Abstract:
As the most powerful deep learning model in natural language processing (NLP), the Transformer performs excellently in tasks such as machine translation and natural language generation. At the same time, this means that Transformer models face a growing risk of intellectual property rights (IPR) infringement, especially large models whose training costs are very high. Although ownership verification methods exist for models such as Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN), work targeting the Transformer is still lacking. Therefore, to improve the intellectual property protection of the Transformer and enable copyright owners to effectively verify model ownership in both black-box and white-box settings, this paper first proposes a white-box watermarking scheme based on Extra Attention, which embeds the owner's signature into the model and resists various attacks, including ambiguity attacks (which do not destroy the existing watermark but add the attacker's watermark to cause ownership confusion) that ordinary watermarks cannot withstand. This paper then proposes a backdoor insertion scheme based on Hybrid Triggers, which verifies model ownership in the black-box setting, without access to the model's source code, and offers good stealthiness and resistance to removal. In addition, this paper studies a new form of ambiguity attack; experimental results show that, against this attack, the proposed watermarking scheme outperforms existing deep neural network watermarking schemes. This paper provides a more robust watermarking scheme for the Transformer, addresses the limitations of existing techniques, and strengthens the intellectual property protection of the Transformer.
Keywords: deep neural network; intellectual property protection; ownership verification; robust watermark
DOI: 10.19363/J.cnki.cn10-1380/tn.2024.09.06
Submitted: 2023-03-07; Revised: 2023-06-21
Funding: This work was supported by the National Natural Science Foundation of China, among other grants.
|
Robust Watermarking for Protecting Transformer Intellectual Property
WANG Baowei, ZHENG Weiqian
|
(School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China) |
Abstract:
As the most powerful deep learning model in natural language processing (NLP), the Transformer delivers excellent performance in tasks such as machine translation and natural language generation. However, this also means that Transformer models are increasingly exposed to the risk of Intellectual Property Rights (IPR) infringement, especially large models with extremely high training costs. Although ownership verification methods are available for models such as Convolutional Neural Networks (CNN) and Generative Adversarial Networks (GAN), protection work for the Transformer is still lacking. Therefore, in order to effectively verify the ownership of Transformer models in both black-box and white-box settings, this paper first proposes a white-box watermarking scheme that uses an Extra Attention module as the watermark carrier. The resulting watermark is robust against various attacks, including ambiguity attacks (which do not destroy the existing watermark but add the attacker's watermark to cause ownership confusion), which ordinary watermarking schemes cannot resist. Secondly, this paper presents a backdoor insertion scheme based on Hybrid Triggers, which offers good stealthiness and resistance to removal while enabling model ownership verification without access to the source code. In addition, a new form of ambiguity attack is investigated, and experimental results show that the proposed watermarking scheme outperforms existing deep neural network watermarking schemes under such attacks. The watermarking method proposed in this paper addresses the limitations of previous works, provides more robust watermarking for the Transformer, and enhances the intellectual property protection of the model.
Key words: deep neural network; intellectual property protection; ownership verification; robust watermark