You handle authentication with signatures from private keys embedded in hardware modules. An AI isn't going to be able to fake those signatures. Instead, the system will fail because the keys will be extracted from the hardware modules.
For images in particular, hardware attestation fails in several ways:
1. The hardware only verifies that the image was acquired by that particular camera. If an AI generates the thing being photographed (especially if there's a glare/denoising step to make it more photographable), the camera's attestation is suddenly approximately worthless despite being genuine.
2. The problem all such schemes share is that extracting hardware keys is a one-time O(1) cost. It runs millions to tens of millions of dollars today, but the keys are plainly readable by a sufficiently motivated adversary. Those keys might buy us a decade or two, but everything beyond that is up in the air and prone to problems like process node sizes hitting walls while introspection techniques keep getting smaller and cheaper.
3. In the world you describe, you still have to trust the organizations producing the hardware modules -- not just the "organization," but every component in its supply chain. It'd be easy for an internal adversary to produce one in a million cameras that will authenticate any incoming PNG, and sell them for huge profits.
4. The hardware problem you're describing is much more involved than ordinary trusted computing because, in addition to keeping the keys secure, you also need the connection between the sensor and the keys to be secure. Otherwise, anyone could splice in a fake "sensor" that just grabs a signature for their favorite PNG.
4a. Even then, you're only talking O($10k) to O($100k) to build a custom display array that feeds a fake photo into the sensor bank without the artifacts of a normal screen. Even if the secure enclave and sensor are fully protected, you can still cheaply create a device that signs all your favorite photos.
5. How, exactly, do lighting adjustments and the like fit into such a signing scheme? Maybe the RAW is signed and a program for generating the edits is distributed alongside it? Actually replacing general camera use with that sort of thing seemingly has some kinks to work out even if you can fix the security concerns.
These aren't failure points; they're significant roadblocks.
The first way to overcome this is attesting to true raw files, and then mostly just transferring raw files around, possibly supplemented by ZKPs that prove one image is the denoised version of another.
The other blocks are overcome by targeting crime, not nation states. This means you only need stochastic control of the supply chain. Especially because, unlike with DRM keys, leaking a single key doesn't break the whole system: it is very possible to detect misuse of a private key and revoke trust in it.
This won't stop deepfakes of political targets. But it does keep society from being fully incapable of proving to their peers what really happened.
I'm not saying we definitely should do this. But I do think there is a possible setup here that could be made reality, and that would substantially reduce the problem.
(1) is a definite failure point, and (4) is going to be done for free by hobbyists. The best-case scenario is that the proposal helps keep honest people honest, reducing the number of malicious actors.
The problem is that the malicious product is nearly infinitely scalable, enough so that I expect services to crop up whereby people use rooms full of trusted devices to attest to your favorite photo for very low fees. If that's not the particular way this breaks, it's because somebody found something even more efficient, or because demand isn't high enough to be worth circumventing (and in the latter case the proposal is also worthless).