Kernel Fuzzing Model Based on Intelligent Seed Generation

WANG Mingyi; GAN Shuitao; WANG Xiaofeng; LIU Yuan

Cite this article:

WANG Mingyi, GAN Shuitao, WANG Xiaofeng, LIU Yuan. Kernel Fuzzing Model Base on Intelligent Seed Generation[J]. Journal of Cyber Security (信息安全学报), 2024, 9(3): 124-137

WANG Mingyi¹, GAN Shuitao²,³, WANG Xiaofeng¹,⁴, LIU Yuan¹
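(1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; 2. Institute for Network, Tsinghua University, Beijing 100084, China; 3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214083, China; 4. Peng Cheng Laboratory, Shenzhen 518005, China)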

Abstract:

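Operating systems have a very large user base, so a kernel vulnerability tends to affect a wide range of systems. Fuzzing, an efficient vulnerability discovery method, has also been applied to operating system kernels and has produced good results. However, Syzkaller, the currently popular kernel fuzzing model, generates seeds somewhat blindly and cannot automatically produce system calls that depend on one another, which limits the code coverage that fuzzing can achieve. To solve these problems, this paper proposes and implements SyzMix, a kernel fuzzing model based on intelligent seed generation. On the one hand, the model combines an LSTM (Long Short-Term Memory) neural network with syntax templates and, through serialization and deserialization operations, automatically generates more system call sequences that contain potential dependencies, which effectively raises the success rate of seed execution. On the other hand, it obtains explicit dependencies between system calls through static analysis and implicit dependencies through dynamic analysis, uses these dependencies to further optimize the relationships among system calls inside a seed, and, combined with the test case generation and mutation strategies, significantly improves the accuracy of system call selection. Taken together, these methods give SyzMix higher code coverage and a higher code coverage speed-up. To verify the effectiveness and practicality of the model, SyzMix and Syzkaller were tested on different kernel versions: the seed execution success rate improved by 16%, the accuracy of system call selection improved by 88.8%, kernel code coverage improved by 7.87%, and the code coverage speed-up reached 132.3%. In addition, SyzMix found 8 previously unknown bugs in different kernel versions, and the CVE identifier CVE-2021-45868 was applied for and obtained.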
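The abstract does not spell out how serialization and deserialization work, so the following is only a minimal sketch of the general idea: a system call sequence is flattened into a token list that a sequence model such as an LSTM could consume, and a generated token list is parsed back into a program. The `Call` record, the token layout, and the function names are hypothetical and are not taken from SyzMix or Syzkaller.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified system call record (not Syzkaller's format)."""
    name: str
    args: Tuple[str, ...]          # literals or resource references such as "r0"
    result: Optional[str] = None   # name of the resource the call returns, if any


def serialize(program: List[Call]) -> List[str]:
    """Flatten a program into tokens: [result, "="] name "(" args... ")"."""
    tokens: List[str] = []
    for call in program:
        if call.result is not None:
            tokens += [call.result, "="]
        tokens += [call.name, "("]
        tokens.extend(call.args)
        tokens.append(")")
    return tokens


def deserialize(tokens: List[str]) -> List[Call]:
    """Parse a token list produced by serialize() back into a program.

    Assumes that "(", ")" and "=" never occur as argument values.
    """
    program: List[Call] = []
    i = 0
    while i < len(tokens):
        result = None
        if i + 1 < len(tokens) and tokens[i + 1] == "=":
            result = tokens[i]
            i += 2
        if i + 1 >= len(tokens) or tokens[i + 1] != "(":
            raise ValueError(f"malformed call at token {i}")
        name = tokens[i]
        close = tokens.index(")", i + 2)   # raises ValueError if unbalanced
        program.append(Call(name, tuple(tokens[i + 2:close]), result))
        i = close + 1
    return program


if __name__ == "__main__":
    seed = [
        Call("open", ("path", "0x0"), result="r0"),
        Call("read", ("r0", "buf", "0x10")),
        Call("close", ("r0",)),
    ]
    tokens = serialize(seed)
    print(tokens)
    print(deserialize(tokens) == seed)
```

In this sketch the resource name `r0` survives the round trip, which is what lets a generated sequence keep the link between the call that creates a resource and the calls that use it. A malformed token list raises `ValueError`, so a model output that does not fit the template can be rejected before it is executed.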

Key words: fuzzing; vulnerability discovery; operating system kernel; neural network

DOI：10.19363/J.cnki.cn10-1380/tn.2024.05.09

Received: 2022-06-17; Revised: 2022-09-28

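Funding: This work was supported by the Major Key Project of Peng Cheng Laboratory (No. PCL2022A03) and the National Natural Science Foundation of China (No. 62172191, No. 61972182).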

Kernel Fuzzing Model Based on Intelligent Seed Generation

WANG Mingyi¹, GAN Shuitao²,³, WANG Xiaofeng¹,⁴, LIU Yuan¹

(1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; 2. Institute for Network, Tsinghua University, Beijing 100084, China; 3. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214083, China; 4. Peng Cheng Laboratory, Shenzhen 518005, China)

Abstract:

Operating systems have a large user base, so kernel vulnerabilities affect a wide range of systems. Fuzzing, an efficient vulnerability discovery method, has been applied to operating system kernels with good results. However, Syzkaller, the popular kernel fuzzing model, generates seeds somewhat blindly and cannot automatically produce system calls that depend on one another, which limits the code coverage that fuzzing can reach. To address these problems, this paper proposes and implements SyzMix, a kernel fuzzing model based on intelligent seed generation. On the one hand, the model combines an LSTM (Long Short-Term Memory) neural network with syntax templates and, through serialization and deserialization, automatically generates more system call sequences that carry potential dependencies, which effectively improves the success rate of seed execution. On the other hand, it obtains explicit dependencies between system calls by static analysis and implicit dependencies by dynamic analysis, and uses these dependencies to further optimize the relationships among system calls within a seed. Combined with the generation and mutation strategies for test cases, this significantly improves the accuracy of system call selection. Together, these methods give SyzMix higher code coverage and a higher code coverage speed-up. To verify the effectiveness and practicality of the model, SyzMix and Syzkaller were tested on different kernel versions. Compared with Syzkaller, SyzMix improved the seed execution success rate by 16%, the accuracy of system call selection by 88.8%, and kernel code coverage by 7.87%, and reached a code coverage speed-up of 132.3%. In addition, SyzMix found eight previously unknown bugs in different kernel versions, and the CVE identifier CVE-2021-45868 was applied for and obtained.
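As an illustration of what an explicit dependency between system calls can look like, the sketch below treats a call as dependent on any call that produces a resource it consumes (for example, `read` needs a file descriptor that `open` returns). This is one plausible reading under stated assumptions, not the static analysis used in SyzMix: the `DESCRIPTIONS` table is hand-written and hypothetical, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical descriptions: call name -> (resources consumed, resources produced).
DESCRIPTIONS: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "open":   ((), ("fd",)),
    "socket": ((), ("sock",)),
    "read":   (("fd",), ()),
    "bind":   (("sock",), ()),
    "accept": (("sock",), ("sock",)),
    "getpid": ((), ()),
}


def explicit_dependencies(descs) -> Dict[str, Set[str]]:
    """Map each call to the set of calls that can produce a resource it consumes."""
    producers: Dict[str, Set[str]] = defaultdict(set)
    for call, (_, produced) in descs.items():
        for resource in produced:
            producers[resource].add(call)

    deps: Dict[str, Set[str]] = {}
    for call, (consumed, _) in descs.items():
        deps[call] = set()
        for resource in consumed:
            deps[call] |= producers[resource]
    return deps


def candidates(prefix: List[str], deps: Dict[str, Set[str]]) -> List[str]:
    """Calls that may follow `prefix`: those needing no resource, or whose
    dependency set contains at least one call already in the prefix."""
    present = set(prefix)
    return sorted(
        call for call, needed in deps.items()
        if not needed or needed & present
    )


if __name__ == "__main__":
    deps = explicit_dependencies(DESCRIPTIONS)
    print(deps["read"])
    print(candidates(["open"], deps))
```

Filtering candidates this way keeps a generator from appending `bind` to a sequence that has not yet created a socket, which is the kind of blind choice the abstract attributes to dependency-unaware seed generation. Implicit dependencies, which the abstract says are found by dynamic analysis, are outside the scope of this sketch.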

Key words: fuzzing; vulnerability discovery; operating system kernel; neural network