Cite this article:
CHEN Ruoxi, JIN Haibo, CHEN Jinyin, ZHENG Haibin, LI Xiaohao. Deep Learning Testing for Reliability: A Survey[J]. Journal of Cyber Security, 2024, 9(1): 33-55
Deep Learning Testing for Reliability: A Survey
CHEN Ruoxi1, JIN Haibo1, CHEN Jinyin1,2, ZHENG Haibin1,2, LI Xiaohao1
(1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; 2. Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China)
Abstract:
Deep neural networks (DNNs) have been widely applied in various areas due to their impressive capabilities and outstanding performance. However, they can exhibit unexpected erroneous behaviors when faced with uncertain inputs, which may lead to disastrous consequences in safety-critical applications such as autonomous driving systems. The reliability of deep models has therefore drawn widespread attention in both academia and industry, and it is urgent to test deep models systematically before deployment: by generating test examples and deriving test reports from the models' outputs, their reliability can be evaluated and potential defects can be found in advance. A large number of researchers have studied DNN testing in depth and proposed a series of testing methods driven by different testing objectives. However, existing surveys of testing methods focus only on the security of DNNs, neglect other testing objectives, and do not cover recently published techniques. In contrast, this article surveys existing testing techniques from four reliability objectives, i.e., task performance, security, fairness and privacy, and comprehensively organizes, analyzes and summarizes them. Specifically, the basic concepts of deep learning testing are first introduced. Then, according to the different testing objectives, the testing methods and metrics from 79 papers are classified and introduced in detail. Next, current applications of DNN reliability testing in three industrial scenarios, namely autonomous driving, speech recognition and natural language processing, are summarized, and 24 datasets, 7 online model zoos and common toolkits that can be used for deep model testing are provided. Finally, in light of the challenges and opportunities ahead, future research directions of deep learning testing are summarized, providing a reference for building systematic, efficient and trustworthy deep model testing. Notably, the datasets, models, code of the testing methods and evaluation metrics covered in this article are collected at https://github.com/Allen-piexl/Testing-Zoo for researchers to download and use.
Key words: deep neural networks; deep testing; reliability; security; fairness; privacy
DOI: 10.19363/J.cnki.cn10-1380/tn.2024.01.03
Received: 2022-04-06; Revised: 2022-06-22
Funding: This work was supported by the National Natural Science Foundation of China (No. 62072406), the Key Laboratory of Information System Security Technology Fund (No. 61421110502), the National Key R&D Program of China (No. 2018AAA0100801), the Joint Key Project of the National Natural Science Foundation of China (No. U21B2001), and the Key R&D Program of Zhejiang Province (No. 2022C01018).
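As a minimal illustration of the kind of testing metric the survey catalogs, the sketch below computes neuron coverage, a widely used adequacy criterion in the DNN-testing literature: the fraction of neurons activated above a threshold by at least one test input. This is not a method from this paper; the min-max scaling scheme and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons whose scaled activation exceeds `threshold`
    on at least one test input.

    `activations`: array-like of shape (n_test_inputs, n_neurons),
    e.g. the recorded outputs of one hidden layer over a test suite.
    """
    acts = np.asarray(activations, dtype=float)
    # Scale each neuron's activations to [0, 1] across the test suite
    # (constant neurons scale to 0 and count as uncovered).
    lo, hi = acts.min(axis=0), acts.max(axis=0)
    scaled = (acts - lo) / np.where(hi > lo, hi - lo, 1.0)
    # A neuron is covered if any test input pushes it past the threshold.
    covered = (scaled > threshold).any(axis=0)
    return covered.mean()
```

A test-generation loop would then try to maximize this value, keeping generated inputs that raise coverage and inspecting the model's outputs on them for erroneous behavior.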
|