  • MIAO Xiaokong, SUN Meng, ZHANG Xiongwei, LI Jiakang, ZHANG Xingyu. Deep Speech Forgery Based on Parameter Transformation and Threat Assessment to Voiceprint Authentication[J]. Journal of Cyber Security, 2020, 5(6): 53-59
Deep Speech Forgery Based on Parameter Transformation and Threat Assessment to Voiceprint Authentication
MIAO Xiaokong, SUN Meng, ZHANG Xiongwei, LI Jiakang, ZHANG Xingyu
(Intelligent Information Processing Laboratory, College of Command and Control Engineering, Army Engineering University, Nanjing, Jiangsu 210007, China)
Automatic speaker verification (ASV), a biometric authentication mechanism, is widely used in daily life. In practice, however, ASV systems are vulnerable to spoofing attacks and face a range of potential risks. Voice conversion (VC) refers to techniques that modify a speaker's voice characteristics so that the speech sounds like another person's while the linguistic content is left unchanged. VC can generate the voice of a specific target speaker, and listeners find it difficult to tell the converted voice from the target voice. Perceptual similarity alone, however, is not enough to deceive a speaker verification system. This paper analyzes the Mel cepstrum, a feature commonly extracted in both voice conversion and speaker verification, and achieves more accurate conversion of Mel cepstra with joint dynamic features by using an improved deep residual bidirectional long short-term memory (BiLSTM) network. In addition, the loss function is modified to optimize the conversion network, and a global mean filter is introduced to remove cepstral noise produced during conversion, which improves the overall quality of the converted voice. As a result, conversion similarity improves without degrading subjective perceptual quality. The converted voice is then used to attack two different speaker verification systems. Experiments show that the converted speech deceives both systems with a high success rate.
Key words:  voice conversion  voiceprint authentication  adversarial attack  deep learning
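
The abstract refers to Mel cepstra "with joint dynamic features" without giving the formula used. As an illustration only, the sketch below assumes a common convention: a first-order delta computed as a central difference with the edge frames repeated, appended to each static frame. The function names `delta` and `joint_features`, the difference window, and the toy input are assumptions for this example and are not taken from the paper.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame]) -> List[Frame]:
    """First-order dynamic features via a central difference.

    Assumed convention (not from the paper): delta[t] = 0.5 * (c[t+1] - c[t-1]),
    with the first and last frames repeated at the edges.
    """
    if not frames:
        return []
    padded = [frames[0]] + frames + [frames[-1]]
    return [
        [0.5 * (nxt - prev) for prev, nxt in zip(padded[t], padded[t + 2])]
        for t in range(len(frames))
    ]


def joint_features(frames: List[Frame]) -> List[Frame]:
    """Concatenate each static frame with its delta frame."""
    return [static + dyn for static, dyn in zip(frames, delta(frames))]


# Toy sequence: 3 frames, 2 cepstral coefficients each (made-up values).
mcep = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
joint = joint_features(mcep)
```

Each output frame has twice the static dimension: the original coefficients followed by their deltas. A conversion network trained on such joint vectors sees both the spectral shape and its local trajectory.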