Time for smartphone makers to ship verified video capture
Democracy depends on free speech, separation of powers, and smartphone camera design

The problem of AI-generated fakes — a quick recap
Since the 2016 U.S. election, foreign interference executed with modern web marketing techniques on the major social networks has become a reality throughout western countries. Questions about the impact of this kind of interference are naive — it’s a bit like brand advertising: everyone says they are not affected by it, but consumer brands wouldn’t spend the money if it didn’t have an impact. And countries wouldn’t make these efforts if they weren’t effective.
The impact doesn’t need to come in the form of changing the result of an election. Narrowing winning margins to weaken governing mandates, calling results into question, calling democracy into question: these are all wins for an interference effort. And for all the investigations that have happened, it’s simply too explosive for governments to lay out the actual impacts on a vote retrospectively (see reports4 on the Brexit referendum, the 2016 U.S. election, the Canadian 2019 and 2021 federal elections, etc.). So such investigations discuss examples and participating groups, but to say more risks magnifying societal disruption after the fact.
No, the only way to minimize the damage of this interference is to prevent it beforehand, or expose it as it happens.
In 2025, western society is awake to the challenge of foreign interference, but the online tools and tradecraft are more capable than ever. The apparent absence of AI-generated deepfakes in the 2024 U.S. election felt surprising, but shouldn’t give us too much comfort.

It’s interesting to speculate about why we don’t yet seem to have seen AI-generated audio and video used to manipulate western elections and public sentiment:
- lack of operational capability
- ambivalence about goals
- a choice to wait for the tech to get better
Whatever the reason, in the last few months we’ve probably crossed the quality threshold beyond which AI-generated videos cannot be determined with certainty to be real or fake. We should assume this will be operationalized against elections in the near term.
A few days ago, Alibaba released a video generative model called Wan 2.1 and made it open source. Wan’s quality is very good.
Given that this level of quality is available in an open source model, and there are distilled versions of this model that run comfortably on a modern laptop, the proliferation genie is out of the bottle.
So what's to be done?
Productizing verified media capture
Verified media capture is the idea that a piece of media could carry an embedded proof of authenticity, allowing any recipient of the file to independently validate that proof of authenticity, thereby verifying that the media is “real”.
If this seems a little abstract, imagine a UI like this: a small green check in the corner of each video you see, and a yellow triangle on videos lacking verification. Just like the familiar lock icon in your browser that indicates the server you are connected to is the server it claims to be.
Much of the required technical infrastructure already exists. Let’s talk about how it would work.
Most of the capabilities are already in your phone
Fortunately, device manufacturers like Apple have been building similar security features for many years. The iPhone already contains a host of technologies that provide verification of various aspects of the system. This includes a secure boot system, a secure key storage container called the Secure Enclave, and a key-based system for the verification of genuine parts.
Note, I’m using Apple systems for the discussion here for two reasons: (a) they publish a great Platform Security guide that documents all their key security features, and (b) although some of these features have counterparts in the Android ecosystem, Apple’s security is simply more mature and complete1 (sorry folks, don’t @ me).
These technologies are tied together in a chain of trust, such that each layer verifies the next2. In iOS systems, this chain begins with the Boot ROM, a piece of unchangeable code that is the first code to run each time the system starts. Each later piece of software that is loaded is signed in a way that the previous layer can verify: the Boot ROM verifies the bootloader’s signature, the bootloader verifies the OS kernel’s signature, the kernel verifies extensions and device drivers, and so on. The real implementation has further protections and complexities, but this is the basic scheme.
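To make the structure concrete, here is a minimal sketch in Python (using the `cryptography` package). The stage names and payload layout are my own illustration, not Apple’s actual boot format:

```python
# Toy chain of trust: each stage's payload embeds the public key of the
# next stage and is signed by the previous stage. Illustrative only.
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)

def make_stage(name: bytes, signer: Ed25519PrivateKey):
    """Build a signed stage: payload = name + this stage's raw public key."""
    key = Ed25519PrivateKey.generate()            # this stage's own signing key
    payload = name + key.public_key().public_bytes_raw()
    return payload, signer.sign(payload), key

def verify_stage(payload: bytes, sig: bytes, trusted: Ed25519PublicKey) -> Ed25519PublicKey:
    """Check a stage against the previous stage's key; return the key it vouches for."""
    trusted.verify(sig, payload)                  # raises InvalidSignature if tampered
    return Ed25519PublicKey.from_public_bytes(payload[-32:])

# Root of trust: the Boot ROM's baked-in key verifies the bootloader,
# whose key then verifies the kernel, and so on down the chain.
rom_key = Ed25519PrivateKey.generate()
bootloader, bl_sig, bl_key = make_stage(b"bootloader:", rom_key)
kernel, k_sig, _ = make_stage(b"kernel:", bl_key)

trusted = verify_stage(bootloader, bl_sig, rom_key.public_key())
verify_stage(kernel, k_sig, trusted)
print("boot chain verified")
```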
That’s the software side of things. But current iOS devices go further and check the authenticity of various hardware components.
For example, you don’t want a repair shop to have the ability to put an “evil” camera into your phone that injects extra location data into your photos to track you. Or to replace your Face ID sensor with an evil one that remembers your face and replays it to the system in a “replay attack” when it’s actually Sergei from the FSB unlocking your phone. The system enabling this device-authenticity check is similar to the software systems above: for a component to “pair” with the iPhone’s Secure Enclave (a security coprocessor), the hardware component participates in a key-based authentication process3.
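Apple doesn’t disclose the specifics of this pairing process (see note 3), so the following is only a hypothetical sketch of its general shape: a challenge-response in which the component proves possession of a factory-provisioned private key, with a fresh nonce so a recorded answer can’t be replayed.

```python
# Hypothetical pairing sketch; the real protocol is undisclosed (note 3).
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Provisioned at the factory: the component's key pair, with the public
# half known to (or certified for) the phone's Secure Enclave.
component_key = Ed25519PrivateKey.generate()
enclave_known_pub = component_key.public_key()

# Pairing: the enclave issues a fresh nonce to rule out replayed responses.
nonce = os.urandom(32)
response = component_key.sign(nonce)       # component proves it holds the key
enclave_known_pub.verify(response, nonce)  # raises InvalidSignature if not genuine
print("component authenticated")
```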
What’s missing to implement verified videos?
The basic idea is that the chain of trust already implemented in an iPhone enables the OS to sign an image or video generated by the onboard camera. The signing certificate would have its own chain of trust, similar to how web server TLS/SSL certificates are signed by trusted certificate authorities.
What's still needed to create an ecosystem of trusted media capture and distribution?
Modern cryptographic techniques provide everything needed from a software perspective. The validation would have multiple components.
First, a chain of trust running all the way from the physical camera to the encoded capture would certify the capture itself. This alone would be very strong evidence of a real capture.
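As a sketch of how that signing and validation could fit together (again in Python with the `cryptography` package; the manufacturer root, device key, and record format are my stand-ins, not a shipping design):

```python
# Illustrative capture signing and offline validation.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)

# A manufacturer root (the analogue of a TLS certificate authority)
# certifies each device's signing key; here a "certificate" is simply
# the root's signature over the device's raw public key.
manufacturer_root = Ed25519PrivateKey.generate()
device_key = Ed25519PrivateKey.generate()
device_pub = device_key.public_key().public_bytes_raw()
device_cert = manufacturer_root.sign(device_pub)

def sign_capture(media: bytes) -> dict:
    """Bind a signature to the exact captured bytes via their hash."""
    digest = hashlib.sha256(media).digest()
    return {"sha256": digest, "sig": device_key.sign(digest),
            "device_pub": device_pub, "device_cert": device_cert}

def verify_capture(media: bytes, rec: dict, root_pub: Ed25519PublicKey) -> None:
    """Offline check: needs only the manufacturer root key, no cloud lookup.
    Raises InvalidSignature if the media or the chain was tampered with."""
    root_pub.verify(rec["device_cert"], rec["device_pub"])          # chain check
    pub = Ed25519PublicKey.from_public_bytes(rec["device_pub"])
    pub.verify(rec["sig"], hashlib.sha256(media).digest())          # media check

record = sign_capture(b"raw video bytes")
verify_capture(b"raw video bytes", record, manufacturer_root.public_key())
print("capture verified")
```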
Second, one or more digital notary services could witness the file by storing and signing a record of the signed file’s hash and the time it was observed (a use case blockchains are actually tailor-made for). If you trust the notary service, this provides further strong evidence that the media is real, based on its creation time. And if the device’s chain of trust is later broken, such as by the leak of a private signing key, past witnesses of the file prove it existed and was signed before the leak, preserving the verifiability of all those earlier captures.
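A hedged sketch of that witnessing step, assuming a hypothetical notary that signs (hash, timestamp) receipts; the API here is illustrative, not an existing service:

```python
# Hypothetical notary service: it signs (hash, seen_at) receipts.
import json, time
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey,
)

notary_key = Ed25519PrivateKey.generate()     # the notary's signing key

def witness(sha256_hex: str) -> dict:
    """The notary attests: 'I saw this file hash at this time.'"""
    receipt = {"sha256": sha256_hex, "seen_at": int(time.time())}
    blob = json.dumps(receipt, sort_keys=True).encode()
    return {"receipt": receipt, "sig": notary_key.sign(blob)}

def check_witness(rec: dict, notary_pub: Ed25519PublicKey) -> int:
    """If the receipt verifies, the file provably existed at seen_at,
    keeping old captures verifiable even if a device key leaks later."""
    blob = json.dumps(rec["receipt"], sort_keys=True).encode()
    notary_pub.verify(rec["sig"], blob)       # raises InvalidSignature if forged
    return rec["receipt"]["seen_at"]

rec = witness("9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08")
print("witnessed at", check_witness(rec, notary_key.public_key()))
```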
Ensuring privacy for the person / device that captured the media would require generating unique signing keys per capture, so that published captures can’t be linked back to one device or to each other.
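Continuing the sketch, that could look like a fresh key per capture, certified by the device key, though a real design would also need the certification step itself to be unlinkable (for example, via an anonymous-attestation scheme):

```python
# Illustrative per-capture keys: sign media with a fresh key each time,
# certified by the device key, so two clips can't be linked by their
# signatures alone. Caveat: if the device key is unique per phone, the
# certificate could still identify the device; a production design would
# need anonymous attestation to break that link too.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()    # stand-in for the device's long-term key
capture_key = Ed25519PrivateKey.generate()   # generated fresh for each capture
capture_cert = device_key.sign(capture_key.public_key().public_bytes_raw())
# The media record would then carry capture_cert and be signed with capture_key.
```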
So in the end, what does such a system actually give us?
We would have an ecosystem in which devices would provide us with signed media — audio, video, and photos — which could be validated by any other device on the network, without relying on live cloud lookups.
During the COVID-19 pandemic, Google and Apple cooperated to define and release the Google/Apple Exposure Notification System (GAEN). A similar cooperative approach would be beneficial for a Verified Capture standard. The shared standard itself would only need to specify details of the image and metadata signing format.
Both Canon and Nikon have previously shipped systems with the goal of authenticating photo capture (Nikon’s Image Authentication System, Canon’s Original Data Security), but both were poorly designed and failed when researchers extracted the private signing keys they had not properly protected. Apple and Google, however, are experienced at operating this type of infrastructure and are up to the task.
Tech has had a rough decade of public backlash, cynically inflamed by one political interest after another. So much so that people forget technology is also a tool for truth: key moments captured by eyewitnesses, nation-state crimes laid bare by online groups like Bellingcat, and scientific fraud exposed at scale through digital forensics and tireless experts like Elisabeth Bik.
So @tim_cook, @HanJong-hee, @sundarpichai, how about it? Unlike the misinformation battles of the last decade, this one won't turn political. It’s a societal good and defence of democracy that only you can make happen. Everybody wants to know if that video is real, and you've already got the tools to do it.
Wouldn't it be great if smartphones became unassailable sources of truth?
1 Examples I’m referring to: the Apple Secure Enclave Processor vs Android’s use of the ARM Trusted Execution Environment, which is not hardware-isolated; Apple’s Secure Enclave Hardware Binding; the availability of Lockdown Mode; etc.
2 Apple Platform Security, December 2024: https://help.apple.com/pdf/security/en_US/apple-platform-security-guide.pdf
3 Apple doesn’t disclose the specific details of this hardware component pairing process.
4 Selected foreign interference investigations:
- Brexit referendum: UK Intelligence and Security Committee, Russia Report; US Senate Foreign Relations Committee report, Putin’s Asymmetric Assault on Democracy in Russia and Europe: Implications for U.S. National Security
- 2016 U.S. election: ODNI report, Assessing Russian Activities and Intentions in Recent US Elections
- Canadian 2019 and 2021 federal elections: Foreign Interference Commission