Code representation is a key step in the application of AI to software engineering.
Generic NLP representations are effective but do not exploit all the rich structure inherent to code.
Recent work has focused on extracting abstract syntax trees (ASTs) and integrating their structural information into code representations. These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, despite the fact that they must learn to even recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just {\em what signal image-based approaches are exploiting}.

We answer this question. We show that code is \emph{spatial} and exploit this fact to propose \toolname, a new representation that embeds tokens into a grid that preserves code layout.
Unlike some of the existing state of the art, \toolname is agnostic to the downstream task: whether that task is generation or classification, \toolname can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially aware models, can exploit \toolname's outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection, and vulnerability detection. PixelCNN leverages \toolname's grid representations to perform code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid. To demonstrate its generality, we show that \toolname augments existing models, improving their performance on a range of tasks; on clone detection, for example, \toolname improves ASTNN's F1 score by 3.3%.
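To make the spatial intuition concrete, the following is a minimal sketch, not \toolname's actual implementation, of how tokens can be embedded into a layout-preserving grid: each token is anchored at its (line, column) position in the source file, and the resulting tensor can be fed to a spatially aware model such as a CNN. The whitespace tokenizer, grid dimensions, and hash-based embedding below are illustrative assumptions.

\begin{verbatim}
import re
import numpy as np

def code_to_grid(source, height=32, width=64, embed_dim=8, embed=None):
    """Sketch: place each token's embedding at its (line, column) cell,
    preserving the code's 2D layout for a spatially aware model."""
    if embed is None:
        # Illustrative stand-in for a learned token embedding:
        # a deterministic hash-seeded random vector per token.
        embed = lambda tok: np.random.default_rng(
            abs(hash(tok)) % (2 ** 32)).standard_normal(embed_dim)
    grid = np.zeros((height, width, embed_dim), dtype=np.float32)
    for row, line in enumerate(source.splitlines()[:height]):
        for match in re.finditer(r"\S+", line):   # crude whitespace tokenizer
            col = match.start()
            if col < width:
                grid[row, col] = embed(match.group())  # token kept at its column
    return grid  # shape (height, width, embed_dim), ready for a CNN

grid = code_to_grid("def add(a, b):\n    return a + b\n")
print(grid.shape)  # (32, 64, 8)
\end{verbatim}

Because indentation and alignment survive in the grid, a convolutional model can pick up layout cues (e.g., nesting depth) that a flat token sequence discards.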

Tue 18 Jul, 13:30 - 15:00 (Pacific Time, US & Canada)
ISSTA 3: Deep-Learning for Software Analysis (Technical Papers), Amazon Auditorium (Gates G20)
Chair: Shiyi Wei (University of Texas at Dallas)

13:30  API2Vec: Learning Representations of API Sequences for Malware Detection
       Lei Cui (Zhongguancun Laboratory), Jiancong Cui (University of Chinese Academy of Sciences; Institute of Information Engineering at Chinese Academy of Sciences), Yuede Ji (University of North Texas), Zhiyu Hao (Zhongguancun Laboratory), Lun Li (Institute of Information Engineering at Chinese Academy of Sciences), Zhenquan Ding (Institute of Information Engineering at Chinese Academy of Sciences)

13:45  CONCORD: Clone-Aware Contrastive Learning for Source Code (ACM SIGSOFT Distinguished Paper)
       Yangruibo Ding (Columbia University), Saikat Chakraborty (Microsoft Research), Luca Buratti (IBM Research), Saurabh Pujar (IBM), Alessandro Morari (IBM Research), Gail Kaiser (Columbia University), Baishakhi Ray (Columbia University)

14:00  Type Batched Program Reduction
       Golnaz Gharachorlu (Simon Fraser University), Nick Sumner (Simon Fraser University)

14:15  CodeGrid: A Grid Representation of Code
       Abdoul Kader Kaboré (University of Luxembourg), Earl T. Barr (University College London; Google DeepMind), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)

14:30  Guided Retraining to Enhance the Detection of Difficult Android Malware
       Nadia Daoudi (University of Luxembourg), Kevin Allix (CentraleSupélec), Tegawendé F. Bissyandé (University of Luxembourg), Jacques Klein (University of Luxembourg)

14:45  Automatically Reproducing Android Bug Reports using Natural Language Processing and Reinforcement Learning
       Zhaoxu Zhang (University of Southern California), Robert Winn (University of Southern California), Yu Zhao (University of Central Missouri), Tingting Yu (University of Cincinnati), William G.J. Halfond (University of Southern California)