Automatically generating features for learning program analysis heuristics for C-Like languages

Kwonsoo Chae, Hakjoo Oh, Kihong Heo, Hongseok Yang

Research output: Contribution to journalArticlepeer-review

19 Citations (Scopus)

Abstract

We present a technique for automatically generating features for data-driven program analyses. Recently data-driven approaches for building a program analysis have been developed, which mine existing codebases and automatically learn heuristics for finding a cost-effective abstraction for a given analysis task. Such approaches reduce the burden of the analysis designers, but they do not remove it completely; they still leave the nontrivial task of designing so called features to the hands of the designers. Our technique aims at automating this feature design process. The idea is to use programs as features after reducing and abstracting them. Our technique goes through selected program-query pairs in codebases, and it reduces and abstracts the program in each pair to a few lines of code, while ensuring that the analysis behaves similarly for the original and the new programs with respect to the query. Each reduced program serves as a boolean feature for program-query pairs. This feature evaluates to true for a given program-query pair when (as a program) it is included in the program part of the pair. We have implemented our approach for three real-world static analyses. The experimental results show that these analyses with automatically-generated features are cost-effective and consistently perform well on a wide range of programs.

Original languageEnglish
Article number101
JournalProceedings of the ACM on Programming Languages
Volume1
Issue numberOOPSLA
DOIs
Publication statusPublished - 2017 Oct

Keywords

  • Automatic feature generation
  • Data-driven program analysis

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Fingerprint

Dive into the research topics of 'Automatically generating features for learning program analysis heuristics for C-Like languages'. Together they form a unique fingerprint.

Cite this