API2Vec: Learning Representations of API Sequences for Malware Detection (ISSTA 2023 - Technical Papers)

Who

Lei Cui, Jiancong Cui, Yuede Ji, Zhiyu Hao, Lun Li, Zhenquan Ding

Track

ISSTA 2023 Technical Papers

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

Use conference time zone: (GMT-07:00) Pacific Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Tue 18 Jul 2023 13:30 - 13:45 at Amazon Auditorium (Gates G20) - ISSTA 3: Deep-Learning for Software Analysis Chair(s): Shiyi Wei

Abstract

Analyzing malware based on API call sequence is an effective approach as the sequence reflects the dynamic execution behavior of malware.Recent advancements in deep learning have led to the application of these techniques for mining useful information from API call sequences. However, these methods mainly operate on raw sequences and may not effectively capture important information especially for multi-process malware, mainly due to the API call interleaving problem.

Motivated by that, this paper presents API2Vec, a graph based API embedding method for malware detection. First, we build a graph model to represent the raw sequence. In particular, we design the temporal process graph (TPG) to model inter-process behavior and temporal API graph (TAG) to model intra-process behavior. With such graphs, we design a heuristic random walk algorithm to generate a number of paths that can capture the fine-grained malware behavior. By pre-training the paths using the Doc2Vec model, we are able to generate the embeddings of paths and APIs, which can further be used for malware detection. The experiments on a real malware dataset demonstrate that API2Vec outperforms the state-of-the-art embedding methods and detection methods for both accuracy and robustness, especially for multi-process malware.

DOI

https://doi.org/10.1145/3597926.3598054

Lei Cui

Zhongguancun Laboratory

China

Jiancong Cui

University of Chinese Academy of Sciences; Institute of Information Engineering at Chinese Academy of Sciences

China

Yuede Ji

University of North Texas

United States

Zhiyu Hao

Zhongguancun Laboratory

China

Lun Li

Institute of Information Engineering at Chinese Academy of Sciences

China

Zhenquan Ding

Institute of Information Engineering at Chinese Academy of Sciences

China

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

Use conference time zone: (GMT-07:00) Pacific Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Tue 18 Jul
Displayed time zone: Pacific Time (US & Canada) change

13:30 - 15:00	ISSTA 3: Deep-Learning for Software AnalysisTechnical Papers at Amazon Auditorium (Gates G20) Chair(s): Shiyi Wei University of Texas at Dallas

13:30 15m Talk		API2Vec: Learning Representations of API Sequences for Malware Detection Technical Papers Lei Cui Zhongguancun Laboratory, Jiancong Cui University of Chinese Academy of Sciences; Institute of Information Engineering at Chinese Academy of Sciences, Yuede Ji University of North Texas, Zhiyu Hao Zhongguancun Laboratory, Lun Li Institute of Information Engineering at Chinese Academy of Sciences, Zhenquan Ding Institute of Information Engineering at Chinese Academy of Sciences DOI
13:45 15m Talk		CONCORD: Clone-Aware Contrastive Learning for Source CodeACM SIGSOFT Distinguished Paper Technical Papers Yangruibo Ding Columbia University, Saikat Chakraborty Microsoft Research, Luca Buratti IBM Research, Saurabh Pujar IBM, Alessandro Morari IBM Research, Gail Kaiser Columbia University, Baishakhi Ray Columbia University DOI
14:00 15m Talk		Type Batched Program Reduction Technical Papers Golnaz Gharachorlu Simon Fraser University, Nick Sumner Simon Fraser University DOI
14:15 15m Talk		CodeGrid: A Grid Representation of Code Technical Papers Abdoul Kader Kaboré University of Luxembourg, Earl T. Barr University College London; Google DeepMind, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg DOI
14:30 15m Talk		Guided Retraining to Enhance the Detection of Difficult Android Malware Technical Papers Nadia Daoudi University of Luxembourg, Kevin Allix CentraleSupélec, Tegawendé F. Bissyandé University of Luxembourg, Jacques Klein University of Luxembourg DOI
14:45 15m Talk		Automatically Reproducing Android Bug Reports using Natural Language Processing and Reinforcement Learning Technical Papers Zhaoxu Zhang University of Southern California, Robert Winn University of Southern California, Yu Zhao University of Central Missouri, Tingting Yu University of Cincinnati, William G.J. Halfond University of Southern California DOI