CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation
General-purpose code generation aims to automatically convert a natural language description into code snippets in a general-purpose programming language (GPL) such as Python. During code generation, it is essential to guarantee that the generated code satisfies the grammatical constraints of the GPL. However, existing sequence-to-sequence (Seq2Seq) approaches neglect grammar rules when generating GPL code. In this paper, we devise a pushdown automaton (PDA)-based methodology that makes the first attempt at grammatical Seq2Seq models for general-purpose code generation, exploiting the principle that a GPL is a subset of the language recognizable by a PDA, so code accepted by the PDA is guaranteed to be grammatical. Specifically, we construct a PDA module and design an algorithm that constrains the generation of Seq2Seq models to ensure grammatical correctness. Guided by this methodology, we further propose CODEP, a code generation framework equipped with a PDA module, which integrates PDA deduction into deep learning. The framework leverages the state of the PDA deduction (including state representation, a state prediction task, and joint prediction with states) to help models learn the deduction process. To evaluate CODEP comprehensively, we construct a PDA for Python and conduct extensive experiments on four public benchmark datasets. CODEP can employ existing sequence-based models as base models, and it achieves a 100% grammatical correctness rate on these benchmarks. Relative to the base models, CODEP improves CodeBLEU by 17% on CONALA, exact match (EM) by 8% on DJANGO, and CodeBLEU by 15% on JUICE-10K. Moreover, the PDA module also yields significant improvements when applied to pre-trained models.
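The core mechanism, constraining each decoding step so that only tokens the PDA can accept are eligible, can be illustrated with a minimal sketch. The toy bracket-grammar PDA, the vocabulary, and the `model_scores` stand-in for a trained model below are hypothetical simplifications for illustration, not CODEP's actual grammar, vocabulary, or model.

```python
# Minimal sketch of PDA-constrained decoding (illustrative, not CODEP's code):
# a toy pushdown automaton tracks bracket nesting, and at every step the
# tokens the PDA cannot accept are masked out before the argmax.

OPEN = {"(": ")", "[": "]"}          # open bracket -> matching close
VOCAB = ["(", ")", "[", "]", "x", "<eos>"]

def valid_next_tokens(stack):
    """Tokens the toy PDA accepts given its current stack."""
    valid = {"(", "[", "x"}          # opens and plain symbols are always legal
    if stack:
        valid.add(OPEN[stack[-1]])   # only the matching close may appear
    else:
        valid.add("<eos>")           # may stop only when all brackets are closed
    return valid

def constrained_decode(model_scores, max_len=10):
    """Greedy decoding with PDA-invalid tokens removed at each step."""
    stack, output = [], []
    for step in range(max_len):
        allowed = valid_next_tokens(stack)
        scores = {tok: s for tok, s in zip(VOCAB, model_scores(step))
                  if tok in allowed}             # the grammar mask
        token = max(scores, key=scores.get)      # greedy pick among legal tokens
        if token == "<eos>":
            break
        output.append(token)
        if token in OPEN:
            stack.append(token)                  # push on an open bracket
        elif token in OPEN.values():
            stack.pop()                          # pop on the matching close
    return "".join(output)

# A fake "model" whose preference drifts from '(' toward ')':
# the mask still guarantees a well-formed result, here "(())".
print(constrained_decode(lambda step: [0.9 - 0.2 * step,   # "("
                                       0.3 + 0.2 * step,   # ")"
                                       0.0, 0.0,           # "[", "]"
                                       0.1,                # "x"
                                       0.05 * step]))      # "<eos>"
```

Even when the unconstrained model would score an ungrammatical token highest, the mask forces a grammatical continuation; CODEP applies the same principle with a full PDA constructed for Python's grammar.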
Tue 18 Jul (Pacific Time, US & Canada)
15:30 - 17:00 | ISSTA Online 1: SE and Deep Learning | Technical Papers at Smith Classroom (Gates G10) | Chair(s): Myra Cohen (Iowa State University)

15:30 (10m Talk) | COME: Commit Message Generation with Modification Embedding
Yichen He, Liran Wang, Kaiyi Wang, Yupeng Zhang, Hang Zhang, Zhoujun Li (Beihang University)
DOI

15:40 (10m Talk) | CODEP: Grammatical Seq2Seq Model for General-Purpose Code Generation
DOI | Pre-print

15:50 (10m Talk) | Towards More Realistic Evaluation for Neural Test Oracle Generation
DOI | Pre-print

16:00 (10m Talk) | Detecting Condition-Related Bugs with Control Flow Graph Neural Network
Jian Zhang (Beihang University), Xu Wang (Beihang University), Hongyu Zhang (Chongqing University), Hailong Sun (Beihang University), Xudong Liu (Beihang University), Chunming Hu (Beihang University), Yang Liu (Nanyang Technological University)
DOI

16:10 (10m Talk) | RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring
Hao Liu (Xiamen University), Yanlin Wang (Sun Yat-sen University), Zhao Wei (Tencent), Yong Xu (Tencent), Juhong Wang (Tencent), Hui Li (Xiamen University), Rongrong Ji (Xiamen University)
DOI | Pre-print

16:20 (10m Talk) | Interpreters for GNN-Based Vulnerability Detection: Are We There Yet?
Yutao Hu (Huazhong University of Science and Technology), Suyuan Wang (Huazhong University of Science and Technology), Wenke Li (Huazhong University of Science and Technology), Junru Peng (Wuhan University), Yueming Wu (Nanyang Technological University), Deqing Zou (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
DOI

16:30 (10m Talk) | Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond
Ensheng Shi (Xi’an Jiaotong University), Yanlin Wang (Sun Yat-sen University), Hongyu Zhang (Chongqing University), Lun Du (Microsoft Research), Shi Han (Microsoft Research), Dongmei Zhang (Microsoft Research), Hongbin Sun (Xi’an Jiaotong University)
DOI