Cite this article
  • Wu Mengjie, Yu Jiayi, Wang Run, Ye Xi, Zhang Yuyang, Lin Chenhao, Fang Liming, Wang Lina. Evading Attacks for DeepFake Fake Model Traceability[J]. Journal of Cyber Security, Accepted


Views: 331   Downloads: 0
DOI:
Submitted: 2024-08-22   Revised: 2024-10-29
Funding: National Key R&D Program of China (2021YFB3100700); National Natural Science Foundation of China (62202340, 62372334); NSFC Key Program "Kunpeng" Research Fund (2021yfb3100700); Open Fund of the Henan Key Laboratory of Cyberspace Situation Awareness (CCF-NSFOCUS 2023005); Knowledge Innovation Program of Wuhan (2022010801020127); Fundamental Research Funds for the Central Universities (2042023kf0121); Natural Science Foundation of Hubei Province (2021CFB089)
Evading Attacks for DeepFake Fake Model Traceability
Wu Mengjie1, Yu Jiayi1, Wang Run1, Ye Xi1, Zhang Yuyang1, Lin Chenhao2, Fang Liming3, Wang Lina1,4
(1.School of Cyber Science and Engineering, Wuhan University; 2.School of Cyber Science and Engineering, Xi'an Jiaotong University; 3.College of Computer Science and Technology / College of Artificial Intelligence / College of Software, Nanjing University of Aeronautics and Astronautics; 4.Zhengzhou Xinda Institute of Advanced Technology)
Abstract:
In recent years, DeepFakes have posed severe threats and concerns to both individuals and celebrities, as realistic DeepFakes facilitate the spread of disinformation. Model attribution techniques aim at attributing the forgery models adopted to create DeepFakes for provenance purposes and providing explainable results for DeepFake forensics. However, existing model attribution techniques rely on the traces left during DeepFake creation, and become futile once such traces are removed or disrupted. Motivated by our observation that certain traces used for model attribution appear in both the high-frequency and low-frequency components and play divergent roles in model attribution, we propose, for the first time, a novel training-free evasion attack, TraceEvader, in the most practical non-box setting. Specifically, TraceEvader injects universal imitated traces learned from wild DeepFakes into the high-frequency component and introduces adversarial blur into the low-frequency component, where the added distortion confuses the extraction of the traces used for model attribution. A comprehensive evaluation on 4 state-of-the-art model attribution techniques and fake images generated by 8 generative models, including generative adversarial networks (GANs) and diffusion models (DMs), demonstrates the effectiveness of our method. Overall, TraceEvader achieves the highest average attack success rate of 79% and remains robust against image transformations and dedicated denoising techniques, where the average attack success rate stays around 75%. TraceEvader confirms the limitations of current model attribution techniques and calls on DeepFake researchers and practitioners to develop more robust model attribution techniques.
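For intuition, the core idea the abstract describes is a frequency-domain split: an imitated trace is added on top of the high-frequency component while a blur degrades the low-frequency component. The sketch below illustrates only that split, not the authors' implementation; the FFT cutoff radius, the box blur, and the `trace` and `eps` inputs are all illustrative assumptions standing in for the paper's learned universal trace and adversarial blur.

```python
import numpy as np

def frequency_split(img, radius=16):
    """Split a grayscale image into low- and high-frequency components
    using a circular low-pass mask in the 2-D FFT domain.
    The cutoff radius is an illustrative choice, not from the paper."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~mask)).real
    return low, high

def box_blur(img, k=3):
    """Naive k-by-k box filter with edge padding (a crude stand-in
    for the paper's adversarial blur)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def evade(img, trace, eps=0.05):
    """Blur the low-frequency component and add a (hypothetical)
    imitated trace on top of the high-frequency component."""
    low, high = frequency_split(img)
    return np.clip(box_blur(low) + high + eps * trace, 0.0, 1.0)
```

Because the low-pass mask and its complement partition the spectrum, `low + high` reconstructs the input exactly, so the two bands can be perturbed independently before recombination.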
Key words:  DeepFakes  model attribution  training-free evasion attack  generative adversarial networks  diffusion models