A machine-learning algorithm with disjunctive model for data-driven program analysis

Minseok Jeon, Sehun Jeong, Sungdeok Cha, Hakjoo Oh

Research output: Contribution to journal › Article › peer-review

11 Citations (Scopus)

Abstract

We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simple-minded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.
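The idea of a disjunctive model over atomic features can be illustrated with a small sketch. The code below is not the algorithm from the article: it is a generic sequential-covering learner, swapped in only to show why a disjunction of conjunctions can express a property (here, XOR of two features) that no single conjunction captures. All names (`holds`, `select`, `learn_clause`, `learn_dnf`), the scoring rule, and the toy data are assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

Literal = Tuple[int, bool]   # (feature index, required truth value)
Clause = FrozenSet[Literal]  # conjunction of literals
Formula = List[Clause]       # disjunction of clauses (DNF)


def holds(clause: Clause, x: Sequence[bool]) -> bool:
    """True if every literal of the conjunction is satisfied by x."""
    return all(x[i] == v for i, v in clause)


def select(formula: Formula, x: Sequence[bool]) -> bool:
    """The disjunctive model selects x if any clause holds."""
    return any(holds(c, x) for c in formula)


def learn_clause(pos, neg, n_features: int) -> Clause:
    """Greedily add literals until no negative example is covered
    (or until no literal makes progress)."""
    clause: Clause = frozenset()
    pos, neg = list(pos), list(neg)
    while neg:
        best, best_key = None, None
        for i in range(n_features):
            for v in (True, False):
                if (i, v) in clause or (i, not v) in clause:
                    continue
                p = sum(1 for x in pos if x[i] == v)
                n = sum(1 for x in neg if x[i] == v)
                # Must keep a positive and exclude at least one negative.
                if p == 0 or n == len(neg):
                    continue
                key = (p - n, p)  # hypothetical scoring rule
                if best_key is None or key > best_key:
                    best, best_key = (i, v), key
        if best is None:
            break
        clause = clause | {best}
        pos = [x for x in pos if x[best[0]] == best[1]]
        neg = [x for x in neg if x[best[0]] == best[1]]
    return clause


def learn_dnf(pos, neg, n_features: int) -> Formula:
    """Stepwise: learn one clause at a time, then drop the positives it covers."""
    formula: Formula = []
    remaining = list(pos)
    while remaining:
        clause = learn_clause(remaining, neg, n_features)
        if any(holds(clause, x) for x in neg):
            break  # could not separate the rest; stop rather than overfit
        formula.append(clause)
        remaining = [x for x in remaining if not holds(clause, x)]
    return formula


# Toy data: the target property is "exactly one of the two features holds".
pos = [(True, False), (False, True)]
neg = [(True, True), (False, False)]
formula = learn_dnf(pos, neg, n_features=2)
```

On this toy data the learner needs two clauses, one per positive example, which is the kind of nonlinear combination a purely conjunctive or linear model over the same features cannot represent.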

Original language: English
Article number: 13
Journal: ACM Transactions on Programming Languages and Systems
Volume: 41
Issue number: 2
DOIs: https://doi.org/10.1145/3293607
Publication status: Published - 2019 Jun

Bibliographical note

Funding Information:
M. Jeon and S. Jeong contributed equally to this work. This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1701-09. This work was supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No.2017-0-00184, Self-Learning Cyber Immune Technology Development). Authors’ addresses: M. Jeon, S. Jeong, S. Cha, and H. Oh (corresponding author), Department of Computer Science and Engineering, Korea University, 145, Anam-ro, Sungbuk-gu, Seoul, 02841, Republic of Korea; emails: {minseok_jeon, gifaranga, scha, hakjoo_oh}@korea.ac.kr. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2019 Association for Computing Machinery. 0164-0925/2019/06-ART13 $15.00 https://doi.org/10.1145/3293607

Publisher Copyright:
© 2019 ACM.

Keywords

  • Context-sensitivity
  • Data-driven program analysis
  • Flow-sensitivity
  • Static analysis

ASJC Scopus subject areas

  • Software
