  • LIU Chao, LOU Chenzhe, YU Min, JIANG Jianguo, HUANG Weiqing. Research on Adversarial Example Generation Method for Malicious PDF Document Classification[J]. Journal of Cyber Security, 2023, 8(5): 14-26
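LIU Chao1, LOU Chenzhe1,2, YU Min1,2, JIANG Jianguo1, HUANG Weiqing1
(1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China)
Keywords:  malicious PDF document|adversarial example|document classification|example generation|robustness
Funding: This work was supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (No. 2021155).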
Research on Adversarial Example Generation Method for Malicious PDF Document Classification
LIU Chao1, LOU Chenzhe1,2, YU Min1,2, JIANG Jianguo1, HUANG Weiqing1
(1.Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China;2.School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China)
The spread of malware via malicious documents is very common on the modern Internet and is one of the highest risks faced by many organizations. PDF is the most widely used document format worldwide, and as a result it is the vehicle for countless attacks. Machine learning is a popular and effective approach to malicious document detection, but machine learning classifiers can exhibit robustness problems when faced with samples carefully crafted by attackers. In computer vision, adversarial learning has proven to be an effective way to improve classifier robustness in many scenarios. For malicious document detection, however, a comprehensive approach to generating adversarial examples across various attack scenarios is still lacking. In this paper, we introduce the basics of the PDF file format, along with effective malicious PDF document detectors and adversarial example generation techniques. We propose a model that generates adversarial examples for adversarial learning in malicious document detection, and we use the generated examples to study detection effectiveness (and evasion effectiveness) in hypothetical scenarios involving multiple detectors. The key operations of the model are association feature extraction and feature modification: association feature extraction finds the associations between different feature spaces, and feature modification maintains the stability of the examples. The final attack algorithm leverages momentum-based iterative gradients to boost the success rate and efficiency of adversarial example generation. We combined several credible datasets, rigorously set up the experimental environment and metrics, and then ran tests of adversarial example attacks and robustness enhancement. The experimental results confirm that the proposed model maintains a high generation rate and success rate. Moreover, the model can be applied to other malware detectors and can contribute to robust optimization.
Key words:  malicious PDF document|adversarial example|document classification|example generation|robustness
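As a rough illustration of the momentum-based iterative gradient idea mentioned in the abstract, the following minimal sketch applies a momentum iterative sign-gradient update to a toy logistic-regression detector over continuous feature vectors. It is not the authors' model: the detector, the function names, the parameters `eps`, `steps`, `mu`, and the `mask` of modifiable features are all assumptions made for illustration, and the association feature extraction and feature modification steps of the paper are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the toy detector."""
    return 1.0 / (1.0 + np.exp(-z))


def malicious_score(x, w, b):
    """Toy detector (assumption): probability that feature vector x is malicious."""
    return sigmoid(np.dot(w, x) + b)


def momentum_iterative_attack(x, w, b, eps=0.5, steps=10, mu=1.0, mask=None):
    """Momentum iterative sign-gradient attack on the toy detector.

    x    : feature vector of a sample labelled malicious (label 1)
    eps  : maximum per-feature perturbation (L-infinity budget)
    mu   : momentum decay factor
    mask : 1.0 for features the attacker may modify, 0.0 otherwise (hypothetical)
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mask = np.ones_like(x) if mask is None else np.asarray(mask, dtype=float)
    alpha = eps / steps          # step size per iteration
    g = np.zeros_like(x)         # accumulated (momentum) gradient
    x_adv = x.copy()
    for _ in range(steps):
        p = malicious_score(x_adv, w, b)
        # Gradient of the cross-entropy loss -log(p) for true label 1 w.r.t. x.
        grad = -(1.0 - p) * w
        l1 = np.sum(np.abs(grad))
        if l1 > 0:
            grad = grad / l1     # L1-normalise before accumulating
        g = mu * g + grad
        # Ascend the loss, touching only the modifiable features.
        x_adv = x_adv + alpha * np.sign(g) * mask
        # Stay inside the eps-ball around the original sample.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Usage with made-up weights and features.
w_demo = np.array([1.0, -2.0, 0.5])
x_demo = np.array([2.0, 0.0, 1.0])
x_adv_demo = momentum_iterative_attack(x_demo, w_demo, b=0.0)
print(malicious_score(x_demo, w_demo, 0.0), malicious_score(x_adv_demo, w_demo, 0.0))
```

Real PDF features are often discrete or structurally constrained, so a practical attack would also need to map the perturbed feature vector back to a valid document; this sketch only shows the gradient update itself.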