Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper)
Software system complexity and the diversity of security vulnerabilities are plausible sources of the persistent challenges in software vulnerability research. Deep learning methods have proven effective at complementing traditional approaches to automatic vulnerability detection. Unfortunately, the lack of high-quality benchmark datasets can critically restrict the effectiveness of deep learning-based vulnerability detection techniques. In particular, the long-standing presence of erroneous labels in existing vulnerability datasets may lead to inaccurate, biased, and even flawed results.
In this paper, we aim to obtain an in-depth understanding and explanation of the causes of label errors. To this end, we systematically analyze the diverse datasets used by state-of-the-art learning-based vulnerability detection approaches and examine their techniques for collecting vulnerable source code. We find that label errors heavily impact mainstream vulnerability detection models, with a worst-case average F1 drop of 20.7%. As mitigation, we introduce two approaches to dataset denoising, which improve model performance by an average of 10.4%. Leveraging these dataset denoising methods, we provide a feasible path toward obtaining high-quality labeled datasets.
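The abstract does not detail the paper's two denoising approaches. As a purely illustrative sketch of the general idea behind dataset denoising, the snippet below flags samples whose out-of-fold predicted probability for their assigned label is low (a confident-learning-style heuristic). All names, thresholds, and the synthetic data are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of label-noise filtering: flag samples whose
# out-of-fold self-confidence (predicted probability of the given
# label) is lowest. NOT the paper's exact technique.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=500, n_features=20, random_state=0)

# Simulate erroneous labels by flipping 10% of them.
y_noisy = y_true.copy()
flipped = rng.choice(len(y_noisy), size=50, replace=False)
y_noisy[flipped] ^= 1

# Out-of-fold probabilities prevent the model from scoring
# samples it was trained on (and thus memorizing the noise).
proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y_noisy,
    cv=5, method="predict_proba",
)
self_confidence = proba[np.arange(len(y_noisy)), y_noisy]

# The least-confident labels are the suspected errors.
suspects = np.argsort(self_confidence)[:50]
recovered = np.intersect1d(suspects, flipped)
print(f"flagged {len(suspects)} samples; {len(recovered)} are injected errors")
```

In practice a denoising pipeline would re-inspect or relabel the flagged samples rather than drop them blindly, since some low-confidence samples are merely hard, not mislabeled.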
Wed 19 Jul (displayed time zone: Pacific Time, US & Canada)
10:30 - 12:00 | ISSTA 5: Improving Deep Learning Systems (Technical Papers) at Smith Classroom (Gates G10). Chair(s): Michael Pradel, University of Stuttgart

10:30, 15m talk | Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper). Xu Nie (Huazhong University of Science and Technology; Beijing University of Posts and Telecommunications), Ningke Li (Huazhong University of Science and Technology), Kailong Wang (Huazhong University of Science and Technology), Shangguang Wang (Beijing University of Posts and Telecommunications), Xiapu Luo (Hong Kong Polytechnic University), Haoyu Wang (Huazhong University of Science and Technology). DOI

10:45, 15m talk | Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis. Xiangzhe Xu, Shiwei Feng, Yapeng Ye, Guangyu Shen, Zian Su, Siyuan Cheng, Guanhong Tao, Qingkai Shi, Zhuo Zhang, Xiangyu Zhang (all Purdue University). DOI

11:00, 15m talk | CILIATE: Towards Fairer Class-Based Incremental Learning by Dataset and Training Refinement. Xuanqi Gao (Xi’an Jiaotong University), Juan Zhai (University of Massachusetts Amherst), Shiqing Ma (UMass Amherst), Chao Shen (Xi’an Jiaotong University), Yufei Chen (Xi’an Jiaotong University; City University of Hong Kong), Shiwei Wang (Xi’an Jiaotong University). DOI, Pre-print

11:15, 15m talk | DeepAtash: Focused Test Generation for Deep Learning Systems. DOI

11:30, 15m talk | Systematic Testing of the Data-Poisoning Robustness of KNN. Yannan Li, Jingbo Wang, Chao Wang (all University of Southern California). DOI

11:45, 15m talk | Semantic-Based Neural Network Repair. DOI