Towards More Realistic Evaluation for Neural Test Oracle Generation
Unit testing has become an essential practice during software development and maintenance. Effective unit tests can help guard and improve software quality but require a substantial amount of time and effort to write and maintain. A unit test consists of a test prefix and a test oracle. Synthesizing test oracles, especially functional oracles, is a well-known challenging problem. Recent studies proposed to leverage neural models to generate test oracles, i.e., neural test oracle generation (NTOG), and obtained promising results. However, after a systematic inspection, we find there are some inappropriate settings in existing evaluation methods for NTOG. These settings could mislead the understanding of existing NTOG approaches' performance. We summarize them as 1) generating test prefixes from bug-fixed program versions, 2) evaluating with an unrealistic metric, and 3) lacking a straightforward baseline. In this paper, we first investigate the impacts of these settings on evaluating and understanding the performance of NTOG approaches. We find that 1) unrealistically generating test prefixes from bug-fixed program versions inflates the number of bugs found by the state-of-the-art NTOG approach TOGA by 61.8%, 2) FPR (False Positive Rate) is not a realistic evaluation metric and the Precision of TOGA is only 0.38%, and 3) a straightforward baseline NoException, which simply expects no exception should be raised, can find 61% of the bugs found by TOGA with twice the Precision. Furthermore, we introduce an additional ranking step to existing evaluation methods and propose an evaluation metric named Found@K to better measure the cost-effectiveness of NTOG approaches in terms of bug-finding. We propose a novel unsupervised ranking method to instantiate this ranking step, significantly improving the cost-effectiveness of TOGA. Eventually, based on our experimental results and observations, we propose a more realistic evaluation method TEval+ for NTOG and summarize seven rules of thumb to boost NTOG approaches into their practical usages.
Tue 18 JulDisplayed time zone: Pacific Time (US & Canada) change
15:30 - 17:00 | ISSTA Online 1: SE and Deep LearningTechnical Papers at Smith Classroom (Gates G10) Chair(s): Myra Cohen Iowa State University | ||
15:30 10mTalk | COME: Commit Message Generation with Modification Embedding Technical Papers Yichen He Beihang University, Liran Wang Beihang University, Kaiyi Wang Beihang University, Yupeng Zhang Beihang University, Hang Zhang Beihang University, Zhoujun Li Beihang University DOI | ||
15:40 10mTalk | CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation Technical Papers DOI Pre-print | ||
15:50 10mTalk | Towards More Realistic Evaluation for Neural Test Oracle Generation Technical Papers DOI Pre-print | ||
16:00 10mTalk | Detecting Condition-Related Bugs with Control Flow Graph Neural Network Technical Papers Jian Zhang Beihang University, Xu Wang Beihang University, Hongyu Zhang Chongqing University, Hailong Sun Beihang University, Xudong Liu Beihang University, Chunming Hu Beihang University, Yang Liu Nanyang Technological University DOI | ||
16:10 10mTalk | RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring Technical Papers Hao Liu Xiamen University, Yanlin Wang Sun Yat-sen University, Zhao Wei Tencent, Yong Xu Tencent, Juhong Wang Tencent, Hui Li Xiamen University, Rongrong Ji Xiamen University DOI Pre-print | ||
16:20 10mTalk | Interpreters for GNN-Based Vulnerability Detection: Are We There Yet? Technical Papers Yutao Hu Huazhong University of Science and Technology, Suyuan Wang Huazhong University of Science and Technology, Wenke Li Huazhong University of Science and Technology, Junru Peng Wuhan University, Yueming Wu Nanyang Technological University, Deqing Zou Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology DOI | ||
16:30 10mTalk | Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond Technical Papers Ensheng Shi Xi’an Jiaotong University, Yanlin Wang Sun Yat-sen University, Hongyu Zhang Chongqing University, Lun Du Microsoft Research, Shi Han Microsoft Research, Dongmei Zhang Microsoft Research, Hongbin Sun Xi’an Jiaotong University DOI |