Machine learning is increasingly being applied to malicious JavaScript detection in response to the growing number of Web attacks and the attendant costly manual identification. In practice, to hide their malicious behaviors or protect intellectual copyrights, both malicious and benign scripts tend to obfuscate their own code before uploading. While obfuscation is beneficial, it also introduces some additional code features (e.g., dead code) into the code. When machine learning is employed to learn a malicious JavaScript detector, these additional features can affect the model to make it less effective. However, there is still a lack of clear understanding of how robust existing machine learning-based detectors are on different obfuscators.
In this paper, we conduct the first empirical study to figure out how obfuscation affects machine learning detectors based on static features. Through the results, we observe several findings: 1) Obfuscation has a significant impact on the effectiveness of detectors, causing an increase both in false negative rate (FNR) and false positive rate (FPR), and the bias of obfuscation in the training
set induces detectors to detect obfuscation rather than malicious behaviors. 2) The common measures such as improving the quality of the training set by adding relevant obfuscated samples and leveraging state-of-the-art deep learning models can not work well.3) The root cause of obfuscation effects on these detectors is that feature spaces they use can only reflect shallow differences in code, not about the nature of benign and malicious, which can be easily affected by the differences brought by obfuscation. 4) Obfuscation has a similar effect on realistic detectors in VirusTotal, indicating that this is a common real-world problem.

Tue 18 Jul

Displayed time zone: Pacific Time (US & Canada) change

15:30 - 17:00
ISSTA Online 3: Empirical StudiesTechnical Papers at Bezos Seminar Room (Gates G04)
Chair(s): Jordan Samhi University of Luxembourg
15:30
10m
Talk
Understanding Breaking Changes in the Wild
Technical Papers
Dhanushka Jayasuriya University of Auckland, Valerio Terragni University of Auckland, Jens Dietrich Victoria University of Wellington, Samuel Ou University of Auckland, Kelly Blincoe University of Auckland
DOI
15:40
10m
Talk
LiResolver: License Incompatibility Resolution for Open Source Software
Technical Papers
Sihan Xu Nankai University, Ya Gao Nankai University, Lingling Fan Nankai University, Linyu Li Nankai University, Xiangrui Cai Nankai University, Zheli Liu Nankai University
DOI
15:50
10m
Talk
An Empirical Study on Concurrency Bugs in Interrupt-Driven Embedded Software
Technical Papers
Chao Li Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Rui Chen Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Boxiang Wang Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Zhixuan Wang Xidian University, Tingting Yu Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Yunsong Jiang Beijing Institute of Control Engineering; Beijing Sunwise Information Technology, Mengfei Yang China Academy of Space Technology
DOI
16:00
10m
Talk
An Empirical Study of Functional Bugs in Android AppsACM SIGSOFT Distinguished Paper
Technical Papers
Yiheng Xiong East China Normal University, Mengqian Xu East China Normal University, Ting Su East China Normal University, Jingling Sun East China Normal University, Jue Wang Nanjing University, He Wen East China Normal University, Geguang Pu East China Normal University, Jifeng He East China Normal University, Zhendong Su ETH Zurich
DOI
16:10
10m
Talk
Testing the Compiler for a New-Born Programming Language: An Industrial Case Study (Experience Paper)
Technical Papers
Yingquan Zhao Tianjin University, Junjie Chen Tianjin University, Ruifeng Fu Tianjin University, Haojie Ye Huawei, Zan Wang Tianjin University
DOI
16:20
10m
Talk
An Empirical Study on the Effects of Obfuscation on Static Machine Learning-Based Malicious JavaScript Detectors
Technical Papers
Kunlun Ren Huazhong University of Science and Technology, Qiang Weizhong Huazhong University of Science and Technology, Yueming Wu Nanyang Technological University, yi zhou Huazhong University of Science and Technology, Deqing Zou Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology
DOI
16:30
10m
Talk
Security Checking of Trigger-Action-Programming Smart Home Integrations
Technical Papers
Lei Bu Nanjing University, Qiuping Zhang Nanjing University, Suwan Li Nanjing University, Jinglin Dai Nanjing University, Guangdong Bai University of Queensland, Kai Chen Institute of Information Engineering at Chinese Academy of Sciences, Xuandong Li Nanjing University
DOI
16:40
10m
Talk
Third-Party Library Dependency for Large-Scale SCA in the C/C++ Ecosystem: How Far Are We?
Technical Papers
Ling Jiang Southern University of Science and Technology, Hengchen Yuan Southern University of Science and Technology, Qiyi Tang Tencent Security Keen Lab, Sen Nie Tencent Security Keen Lab, Shi Wu Tencent Security Keen Lab, Yuqun Zhang Southern University of Science and Technology
DOI
16:50
10m
Talk
Who Judges the Judge: An Empirical Study on Online Judge Tests
Technical Papers
Kaibo Liu Peking University, Yudong Han Peking University, Jie M. Zhang King’s College London, Zhenpeng Chen University College London, Federica Sarro University College London, Mark Harman University College London, Gang Huang Peking University; National Key Laboratory of Data Space Technology and System, Yun Ma Peking University
DOI