Cite this article
  • 李澍非, 梁瑞刚, 黄伟豪, 相璐, 孟国柱. Linux malware lineage analysis method based on family core functions [J]. Journal of Cyber Security (信息安全学报), accepted.
  • Li Shufei, Liang Ruigang, Huang Weihao, Xiang Lu, Meng Guozhu. Malware family core function-based Linux malware lineage analysis [J]. Journal of Cyber Security, accepted.
Linux malware lineage analysis method based on family core functions
李澍非, 梁瑞刚, 黄伟豪, 相璐, 孟国柱
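(Institute of Information Engineering)
Abstract:
In recent years, malware has developed rapidly and evolved at an ever faster pace, spreading in families, and it has become a focus of attention and research in both academia and industry. However, traditional methods that rely on manual analysis scale poorly, are inefficient, and struggle to cope with the growing volume of malware. Machine-learning-based malware detection and classification techniques, for their part, cannot discover and warn of unknown malware families in time, and lack accurate analysis and understanding of the potential lineage relationships among families. To address these challenges, this paper proposes FCF-MLA (Family Core Functions-based Malware Lineage Analysis), a malware lineage analysis method based on family core functions, which locates the core functions of unknown malware families and from them infers the potential lineage relationships among families. The method first assigns labels to unknown malware using a code-similarity clustering method based on label voting. Second, it filters the disassembled functions of the malware and extracts each family's group of core functions according to the coverage of sets of similar functions within the family; these serve as the feature input for the subsequent lineage analysis. Finally, the paper defines and quantifies the lineage relationships that exist among real-world families, grades the similarity distances between the core functions of different families, uses the statistics of the grading results to characterize the overall similarity between families, and combines this with the order-of-magnitude difference between families to infer potential lineage relationships. A prototype system was implemented based on the proposed method and tested on a dataset of 10,578 real-world malware samples from 10 families. The experimental results show that the method achieves a classification accuracy of 98.26% and can accurately locate family core functions and characterize the lineage relationships among families.
Keywords:  malware classification  unsupervised learning  similarity measure  core function extraction  malware lineage analysis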
DOI:10.19363/J.cnki.cn10-1380/tn.2024.08.04
Received: 2023-02-07  Revised: 2023-05-09
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Malware family core function-based Linux malware lineage analysis
Li Shufei, Liang Ruigang, Huang Weihao, Xiang Lu, Meng Guozhu
(Institute of Information Engineering)
Abstract:
In recent years, malware has developed rapidly and shows a trend of family-based proliferation, which has made it a hot topic in academia and industry. However, traditional methods that rely on manual analysis suffer from poor scalability and low efficiency, and have difficulty coping with the growing volume of malware. Machine-learning-based detection and classification techniques are also unable to detect and warn of unknown malware in time, and lack accurate analysis and understanding of the potential lineage relationships between families. To address these problems, this paper proposes FCF-MLA (Family Core Functions-based Malware Lineage Analysis), a family core function-based method for malware classification and family lineage analysis. It can discover and capture unknown malware families while locating and tagging their core code, enabling accurate inference of the potential lineage relationships among families. The method first assigns malware labels using a code similarity clustering method based on label voting. Second, it filters the functions of each malware family and extracts the family's core function group according to the coverage rate of similar function sets within the family. Finally, it quantifies the lineage relationships between real-world families: it grades the similarity distances between the core functions of different families, uses the statistics of these grades to characterize the similarity between families, and combines them with the magnitude difference between families to infer lineage relationships. A prototype system based on the proposed method was implemented and tested on a real-world dataset of 10,578 malware samples from 10 families. The experimental results show that the method classifies malware with an accuracy of 98.26% and can accurately infer the lineage relationships among malware families.
Keywords:  malware classification  unsupervised learning  similarity measure  core function extraction  malware lineage analysis
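
The three steps named in the abstract (label voting, coverage-based core function extraction, and distance grading) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds (`min_share`, `min_coverage`, `bounds`), the family labels, and the assumption that similar functions have already been grouped under a shared identifier are all hypothetical choices made for illustration.

```python
from collections import Counter


def vote_label(cluster_labels, min_share=0.5):
    """Label a cluster by majority vote over its known labels.

    `None` marks an unlabeled sample. Returns None when no known label
    reaches `min_share` (an assumed threshold), which would flag the
    cluster as a candidate unknown family.
    """
    known = [lab for lab in cluster_labels if lab is not None]
    if not known:
        return None
    label, count = Counter(known).most_common(1)[0]
    return label if count / len(known) >= min_share else None


def core_functions(family_samples, min_coverage=0.8):
    """Pick function groups that appear in most samples of one family.

    `family_samples` maps a sample id to the set of function-group ids
    found in that sample (similar functions are assumed to share an id).
    """
    n = len(family_samples)
    if n == 0:
        return set()
    counts = Counter()
    for groups in family_samples.values():
        counts.update(set(groups))  # count each group once per sample
    return {g for g, c in counts.items() if c / n >= min_coverage}


def grade_histogram(distances, bounds=(0.1, 0.3, 0.6)):
    """Bucket pairwise core-function distances into grades 0..len(bounds).

    Grade 0 holds the closest pairs; the cut points are assumed values.
    """
    hist = [0] * (len(bounds) + 1)
    for d in distances:
        hist[sum(d > b for b in bounds)] += 1
    return hist


if __name__ == "__main__":
    print(vote_label(["famA", "famA", None, "famB"]))
    samples = {"s1": {"f1", "f2"}, "s2": {"f1", "f3"}, "s3": {"f1", "f2"}}
    print(sorted(core_functions(samples, min_coverage=0.6)))
    print(grade_histogram([0.05, 0.2, 0.5, 0.9, 0.1]))
```

In this sketch, the grade histogram for a pair of families stands in for the "statistical grading result" mentioned in the abstract; how the paper combines it with the magnitude difference between families is not specified here and is left out.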