Systematically Producing Test Orders to Detect Order-Dependent Flaky Tests
Software testing suffers from the presence of flaky tests, which can pass or fail when run on the same version of code. Order-dependent tests (OD tests) are flaky tests whose outcome depends on the order in which they are run. An OD test can be detected when specific tests are run, or not run, before it, resulting in a difference in its outcome. While prior work has proposed rerunning tests in different random test orders, this approach provides no guarantee of detecting all OD tests. Later work proposed a more systematic approach to ordering tests, but it still fails to account for the relationships between all tests in the test suite.
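To make this concrete, consider a minimal, hypothetical JUnit example of an OD test pair (the class and test names are invented for illustration): one test mutates shared static state (often called a "polluter" in the flaky-test literature), and another test (a "victim") passes only when run before it.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical example of an order-dependent test pair.
public class CounterTest {
    static int counter = 0;  // shared mutable state that couples the tests

    @Test
    public void testIncrement() {  // the "polluter": leaves state behind
        counter++;
        assertEquals(1, counter);
    }

    @Test
    public void testFresh() {      // the "victim": assumes unpolluted state
        assertEquals(0, counter);
    }
}
```

In the order testFresh, testIncrement both tests pass; in the order testIncrement, testFresh the second test fails. Thus testFresh is an OD test, and it is detected only by a test order that happens to run testIncrement before it.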
We propose three new techniques to detect OD tests through a more systematic means of producing test orders. Our techniques build upon prior work on Tuscan squares to cover test pairs in a minimal set of test orders while also obeying the constraints on how tests can be positioned in a test order with respect to their test classes. Further, as there are many test pairs that need to be covered, we develop a technique that can take a specified set of test pairs to cover and produce test orders that aim to cover just those pairs. Our evaluation with 289 known OD tests across 47 test suites from open-source projects shows that our most cost-effective technique can detect 97.2% of the known OD tests with, on average, 104.7 test orders per subject. While all techniques produce a relatively large number of test orders, our analysis of the minimal set of test orders needed to detect the OD tests shows that far fewer orders would suffice, representing an opportunity for future work to prioritize test orders.
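As a rough sketch of the pair-covering idea, and not the paper's exact algorithm (which must also respect the test-class placement constraints mentioned above), the following uses the classic Williams construction of a row-complete Latin square, one standard way to realize a Tuscan square when the number of tests n is even: the n generated orders place every ordered pair of distinct tests in adjacent positions exactly once. The class and method names here are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: generate n test orders over tests 0..n-1 (n even) such that
// every ordered pair of distinct tests appears in adjacent positions
// exactly once across the orders (Williams' row-complete Latin square).
public class TuscanOrders {
    static List<int[]> generateOrders(int n) {
        if (n % 2 != 0) {
            // A common workaround for odd n is to pad with a dummy test.
            throw new IllegalArgumentException("n must be even");
        }
        // First order: 0, 1, n-1, 2, n-2, 3, n-3, ...
        int[] first = new int[n];
        int lo = 1, hi = n - 1;
        for (int idx = 1; idx < n; ) {
            first[idx++] = lo++;
            if (idx < n) {
                first[idx++] = hi--;
            }
        }
        // Every later order shifts the previous one by 1 modulo n.
        List<int[]> orders = new ArrayList<>();
        for (int r = 0; r < n; r++) {
            int[] order = new int[n];
            for (int c = 0; c < n; c++) {
                order[c] = (first[c] + r) % n;
            }
            orders.add(order);
        }
        return orders;
    }

    public static void main(String[] args) {
        // For 4 tests: 4 orders covering all 12 ordered pairs adjacently.
        for (int[] order : generateOrders(4)) {
            System.out.println(Arrays.toString(order));
        }
    }
}
```

Since each order of n tests contributes n - 1 adjacent pairs and there are n(n - 1) ordered pairs in total, n orders is the fewest possible, which is what makes Tuscan-square-style constructions attractive as a systematic alternative to random test orders.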
Thu 20 Jul (displayed time zone: Pacific Time, US & Canada)

10:30 - 12:00 | ISSTA 9: Testing 2 (Technical Papers) | Amazon Auditorium (Gates G20) | Chair(s): Cristian Cadar (Imperial College London)

10:30 (15m) Talk | A Comprehensive Study on Quality Assurance Tools for Java | Han Liu (East China Normal University), Sen Chen (Tianjin University), Ruitao Feng (UNSW), Chengwei Liu (Nanyang Technological University), Kaixuan Li (East China Normal University), Zhengzi Xu (Nanyang Technological University), Liming Nie (Nanyang Technological University), Yang Liu (Nanyang Technological University), Yixiang Chen (East China Normal University)

10:45 (15m) Talk | Transforming Test Suites into Croissants | Yang Chen (University of Illinois at Urbana-Champaign), Alperen Yildiz (Sabanci University), Darko Marinov (University of Illinois at Urbana-Champaign), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)

11:00 (15m) Talk | SlipCover: Near Zero-Overhead Code Coverage for Python | Juan Altmayer Pizzorno (University of Massachusetts Amherst), Emery D. Berger (University of Massachusetts Amherst)

11:15 (15m) Talk | To Kill a Mutant: An Empirical Study of Mutation Testing Kills | Hang Du (University of California at Irvine), Vijay Krishna Palepu (Microsoft), James Jones (University of California at Irvine)

11:30 (15m) Talk | Systematically Producing Test Orders to Detect Order-Dependent Flaky Tests | Chengpeng Li (University of Texas at Austin), M. Mahdi Khosravi (Middle East Technical University), Wing Lam (George Mason University), August Shi (University of Texas at Austin)

11:45 (15m) Talk | Extracting Inline Tests from Unit Tests | Yu Liu (University of Texas at Austin), Pengyu Nie (University of Texas at Austin), Anna Guo (University of Texas at Austin), Milos Gligoric (University of Texas at Austin), Owolabi Legunsen (Cornell University)