Android Malware Detection Based on Multi-view Representation Learning

ZHAO Wenxiang; MENG Zhaoyi; XIONG Yan; HUANG Wenchao

Cite this article:

ZHAO Wenxiang, MENG Zhaoyi, XIONG Yan, HUANG Wenchao. Android Malware Detection Based on Multi-view Representation Learning[J]. Journal of Cyber Security, 2024, 9(5): 162-177

(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Abstract:

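Android has held a large market share since its release. Because Android applications are numerous, offer a wide range of functions, and exhibit complex behavioral semantics, attackers can use a variety of techniques to hide their real attack intent inside legitimate functionality. Existing detection schemes, however, can often identify only a limited range of malicious applications and behaviors. To address this problem, this paper uses heterogeneous information networks to abstract existing representative detection schemes at a high level, and applies multi-view representation learning and multi-view fusion to mine them in depth and fuse them cooperatively, so as to fully exploit the detection potential of the different schemes and build a more accurate and comprehensive malware detection system. To this end, the paper proposes and implements MVFDroid, an Android malware detection system based on multi-view representation learning. The system first observes Android applications from three perspectives (sensitive data flows, suspicious control conditions, and permissions) and builds a heterogeneous information network that describes the execution logic of application behaviors and the relationships among them. It then samples the heterogeneous information network with view-based walks to generate application behavior representation vectors under each view. Finally, a multi-view-fusion-based detection method fuses the representation vectors and feeds them into a deep neural network (DNN) classifier, which judges the maliciousness of the target application from the different perspectives jointly. Experiments show that the proposed method detects Android malware effectively, with an accuracy of 96.57% and an F1 score of 95.56%, both better than the representative detection schemes Drebin, HinDroid, and MaMaDroid. The results also show that the view-fusion-based representation learning method used in this paper is well suited to Android malware detection and outperforms the baseline methods DeepWalk, node2vec, and metapath2vec.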
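The view-based sampling step in the abstract can be illustrated with a minimal sketch. It assumes a toy network in which every application links to feature nodes of one type per view; the `VIEWS` dictionary, the node names, and the uniform walk are hypothetical stand-ins for illustration, not the sampling strategy MVFDroid actually implements.

```python
import random

# Hypothetical toy data: for each view, the feature nodes each app is linked to.
VIEWS = {
    "data_flow": {
        "app_a": ["flow:location->network"],
        "app_b": ["flow:location->network", "flow:contacts->sms"],
        "app_c": ["flow:contacts->sms"],
    },
    "control_condition": {
        "app_a": ["cond:time_check"],
        "app_b": ["cond:time_check"],
        "app_c": ["cond:emulator_check"],
    },
    "permission": {
        "app_a": ["perm:INTERNET"],
        "app_b": ["perm:INTERNET", "perm:SEND_SMS"],
        "app_c": ["perm:SEND_SMS"],
    },
}


def build_adjacency(app_to_features):
    """Build an undirected app<->feature adjacency map for one view."""
    adjacency = {}
    for app, features in app_to_features.items():
        for feature in features:
            adjacency.setdefault(app, []).append(feature)
            adjacency.setdefault(feature, []).append(app)
    return adjacency


def view_walk(adjacency, start, length, rng):
    """Uniform random walk of at most `length` nodes that stays inside one view."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            break  # isolated node: stop early
        walk.append(rng.choice(neighbors))
    return walk


def sample_walks(views, walks_per_app, length, seed=0):
    """Return {view_name: [walk, ...]}, starting walks from every app in each view."""
    rng = random.Random(seed)
    corpus = {}
    for view_name, app_to_features in views.items():
        adjacency = build_adjacency(app_to_features)
        corpus[view_name] = [
            view_walk(adjacency, app, length, rng)
            for app in sorted(app_to_features)
            for _ in range(walks_per_app)
        ]
    return corpus


if __name__ == "__main__":
    for view_name, walks in sample_walks(VIEWS, walks_per_app=1, length=5).items():
        print(view_name, walks[0])
```

Each view yields its own corpus of node sequences. In a pipeline of this kind, such sequences would typically be passed to a skip-gram style embedding model to obtain one vector per application per view, which a fusion step then combines before classification.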

Keywords: Android malware detection; heterogeneous information network; multi-view fusion; graph representation learning

DOI: 10.19363/J.cnki.cn10-1380/tn.2024.09.04

Received: 2022-08-18; Revised: 2023-01-06

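Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62102385, 62372422, 62272434, 61972369), the Natural Science Foundation of Anhui Province (No. 2108085QF262), and the Fundamental Research Funds for the Central Universities (No. WK2150110024).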

Android Malware Detection Based on Multi-view Representation Learning

ZHAO Wenxiang, MENG Zhaoyi, XIONG Yan, HUANG Wenchao

(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Abstract:

The Android operating system has maintained a high market share since its release. Because Android applications are numerous, varied in function, and complex in behavioral semantics, attackers can use many techniques to hide their true attack intent within legitimate functionality. However, existing detection schemes often identify only limited types of malicious applications and behaviors. To solve this problem, we use heterogeneous information networks to abstract existing representative detection schemes at a high level, and use multi-view representation learning and multi-view fusion to mine and fuse them, in order to fully exploit the detection potential of the different schemes and build a more accurate and comprehensive malware detection system. To this end, this paper proposes and implements MVFDroid, an Android malware detection system based on multi-view representation learning. The system first observes Android applications from three perspectives (sensitive data flows, suspicious control conditions, and permissions) to build a heterogeneous information network that describes the execution logic of application behaviors and the relationships among those behaviors. It then samples the heterogeneous information network with a view-based walk strategy to generate application behavior representation vectors under the different views. Finally, it uses a multi-view-fusion-based detection method to fuse the representation vectors and feed them into a deep neural network (DNN) classifier, which determines the maliciousness of the target application from the different perspectives jointly. Experiments show that the proposed method detects Android malware effectively, with an accuracy of 96.57% and an F1 score of 95.56%, both better than the representative detection schemes Drebin, HinDroid, and MaMaDroid. The experimental results also show that the view-fusion-based representation learning method used in this paper can be applied effectively to Android malware detection and outperforms the baseline methods DeepWalk, node2vec, and metapath2vec.
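For readers who want the two reported metrics spelled out, the following sketch computes accuracy and F1 from confusion-matrix counts using their standard definitions. It assumes malware is the positive class; the counts in the usage lines are hypothetical and are not taken from the paper's experiments.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all samples classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (malware = positive class).
    tp, fp, tn, fn = 8, 2, 9, 1
    print(f"accuracy = {accuracy(tp, fp, tn, fn):.4f}")
    print(f"F1       = {f1_score(tp, fp, fn):.4f}")
```

Accuracy counts benign and malicious samples alike, while F1 ignores true negatives, so reporting both gives a clearer picture when the two classes are not balanced.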

Key words: Android malware detection; heterogeneous information network; multi-view fusion; graph representation learning