Deep Image Fingerprint: Accurate And Low Budget Synthetic Image Detector

Reichman University

Image generators (e.g., DALL·E 2) leave unique artifacts, known as fingerprints, within the images they produce. With DIF we extract the fingerprint of a generative model from a small set of real and generated images and use it to detect generated images.


Examples of fingerprints extracted from Stable Diffusion v1.4 and GLIDE.


The generation of high-quality images has become widely accessible and is evolving rapidly. As a result, anyone can generate images that are indistinguishable from real ones, enabling a wide range of applications, including malicious use with deception in mind. Despite advances in detection techniques for generated images, a robust detection method still eludes us. In this work, we exploit the inductive bias of convolutional neural networks (CNNs) to develop a new detection method that requires a small number of training samples and achieves accuracy on par with or better than current state-of-the-art methods.


Convolutional Neural Networks (CNNs) have a tendency to produce unique image features known as “fingerprints”. We use this phenomenon to train an additional CNN to mimic the fingerprint of a target image generator. We call this method “Deep Image Fingerprint” (DIF).
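To build intuition for what a generator "fingerprint" is, the sketch below uses the classical residual-averaging idea (averaging high-frequency residuals over a few generated images so per-image content cancels while the shared artifact pattern remains). This is a simplified baseline, not DIF itself, which trains a CNN to mimic the fingerprint; the box-filter high-pass and the synthetic periodic artifact are illustrative assumptions.

```python
import numpy as np

def highpass_residual(img, k=3):
    """High-frequency residual: image minus a k-by-k box-blurred copy.
    (A simple stand-in for the filtering an actual pipeline might use.)"""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    blurred = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            blurred += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    blurred /= k * k
    return img - blurred

def estimate_fingerprint(images):
    """Average residuals over a few generated images: random content
    averages out, the generator's shared artifact pattern survives."""
    return np.mean([highpass_residual(im) for im in images], axis=0)

# Toy demo: 32 "generated" images share a faint periodic artifact.
rng = np.random.default_rng(0)
artifact = 0.2 * np.sin(np.arange(64) * np.pi / 2)[None, :] * np.ones((64, 1))
fakes = [rng.normal(size=(64, 64)) + artifact for _ in range(32)]
fp = estimate_fingerprint(fakes)  # correlates strongly with the artifact
```

With only 32 samples the estimated `fp` already correlates clearly with the planted artifact, which is why fingerprint-style detectors can work in a low-data regime.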

DIF has several applications. First, DIF achieves high accuracy in detecting synthetic images. Second, we show that fine-tuned image generators preserve the fingerprint of their source model, so DIF can be used to trace the origin of a fine-tuned model.

For more details please refer to the paper.

Blank Image Experiment

Although CNNs have strong image generation capabilities, they fail to accurately reproduce an image Y that lacks semantic information; the artifacts in the reconstruction Ŷ are clearly visible in Fourier space (FFT{Ŷ}).
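One well-known source of such periodic artifacts is the upsampling used inside CNN decoders. The 1-D sketch below (an illustrative simplification, not the experiment from the paper) shows that a transposed convolution with stride 2 and kernel size 3 stamps a checkerboard pattern even on a constant "blank" input, because the per-position overlap count alternates; the pattern shows up as a sharp Fourier peak.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """Minimal transposed convolution: each input sample scatters a
    scaled copy of the kernel into the output at `stride` spacing."""
    out = np.zeros(len(x) * stride + len(kernel) - stride)
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += v * kernel
    return out

x = np.ones(64)                                   # "blank" constant signal
y = transposed_conv1d(x, np.ones(3), stride=2)[:128]

# Overlap counts alternate (2, 1, 2, 1, ...), so the constant input comes
# out with a period-2 checkerboard; its spectrum peaks at Nyquist.
spec = np.abs(np.fft.rfft(y - y.mean()))
peak = int(np.argmax(spec))
print(peak)  # -> 64, the Nyquist bin for a length-128 signal
```

This mirrors the blank image experiment qualitatively: the architecture itself, not the image content, is what leaves the trace in Fourier space.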


Generated Image Detection

DIF is on par with, and in some cases outperforms, pre-trained state-of-the-art methods for detecting images produced by GANs and novel text-to-image models, while requiring far fewer training samples.
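Once a fingerprint is available, one simple way to score a test image (an illustrative correlation-based classifier, not necessarily the exact decision rule used in the paper) is to correlate the image's high-frequency residual with the reference fingerprint and threshold the result:

```python
import numpy as np

rng = np.random.default_rng(2)

def residual(img):
    """High-frequency residual via a simple shift-difference high-pass
    (a stand-in for the filtering a real pipeline would use)."""
    return img - 0.5 * (np.roll(img, 1, axis=0) + np.roll(img, 1, axis=1))

# Assumed setup: a known generator fingerprint (synthetic here) and test
# images that either carry it ("fake") or do not ("real").
fingerprint = 0.3 * np.sin(np.arange(64) * np.pi / 2)[None, :] * np.ones((64, 1))
fp_res = residual(fingerprint).ravel()

def score(img):
    """Cosine similarity between the image residual and the fingerprint."""
    r = residual(img).ravel()
    return float(r @ fp_res / (np.linalg.norm(r) * np.linalg.norm(fp_res)))

fake = rng.normal(size=(64, 64)) + fingerprint
real = rng.normal(size=(64, 64))
print(score(fake) > score(real))  # fake scores clearly higher
```

A detection threshold on this score can then be calibrated on a handful of labeled samples, which is consistent with the low-data setting described above.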

Text-to-Image Detection



We showed that image generators retain their unique fingerprints even after being fine-tuned on new data. Following this discovery, we observed that an early version of MidJourney was a pre-trained Stable Diffusion v1.x model. For the full methodology, please refer to the paper.
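Because fine-tuning preserves the source model's fingerprint, lineage tracing can be framed as nearest-fingerprint attribution. The sketch below is a toy illustration of that framing only: the model names are placeholders and the "fingerprints" are synthetic vectors, not outputs of the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: flattened fingerprints estimated from three candidate
# source models, plus one fine-tuned model whose origin we want to trace.
sources = {name: rng.normal(size=4096) for name in ["SD-v1.x", "GLIDE", "DALL-E"]}
finetuned = 0.6 * sources["SD-v1.x"] + rng.normal(size=4096)  # inherits SD's pattern

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Attribute the fine-tuned model to the candidate with the closest fingerprint.
origin = max(sources, key=lambda n: cosine(finetuned, sources[n]))
print(origin)  # -> "SD-v1.x"
```

The inherited component dominates the cosine similarity, so the fine-tuned model is attributed to its true source even though new training data was mixed in.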

Cross-detection results with Stable Diffusion 1.4.


If you find this research useful, please cite the following:

@article{sinitsa2023deep,
      title={Deep Image Fingerprint: Accurate And Low Budget Synthetic Image Detector},
      author={Sergey Sinitsa and Ohad Fried},
      year={2023}
}