A scalable learning algorithm for data-driven program analysis

Sooyoung Cha, Sehun Jeong, Hakjoo Oh

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


Context: Recently, data-driven program analysis has emerged as a promising approach for building cost-effective static analyzers. The ideal static analyzer should apply accurate but costly techniques only when they benefit. However, designing such a strategy for real-world programs is highly nontrivial and requires labor-intensive work. The goal of data-driven program analysis is to automate this process by learning the strategy from data through a learning algorithm.

Objective: Current learning algorithms for data-driven program analysis are not scalable enough to be used with large codebases. The objective of this paper is to overcome this shortcoming and present a new algorithm that is able to efficiently learn a strategy from large codebases.

Method: The key idea is to use an oracle and transform the existing blackbox learning problem into a whitebox one that is much easier to solve. The oracle quantifies the relative importance of each part of the program with respect to the analysis precision. The oracle can be obtained by running the most and least precise analyses only once over the codebase.

Results: Our learning algorithm is much faster than the existing algorithms while producing high-quality strategies. The evaluation is done with 140 open-source C programs, comprising 2.1 MLoC in total. Learning at this large scale was previously impractical.

Conclusion: Our work advances the state of the art of data-driven program analysis by addressing the scalability issue of the existing learning algorithm. Our technique will make the data-driven approach more practical in the real world.
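The abstract does not say how the oracle is actually computed, so the following is only a toy illustration of the general idea, not the authors' algorithm. It assumes, hypothetically, that each analysis run yields a set of alarm identifiers and that each alarm can be attributed to the program parts (for example, variables) it depends on. A part then scores higher the more alarms it touches that the most precise analysis removes. The names `oracle_scores`, `alarms_least`, `alarms_most`, and `parts_of_alarm` are invented for this sketch.

```python
from collections import Counter


def oracle_scores(alarms_least, alarms_most, parts_of_alarm):
    """Toy oracle (an assumption, not the paper's definition).

    Scores each program part by the share of alarms it is involved in
    that are reported by the least precise analysis but eliminated by
    the most precise one.
    """
    # Alarms that extra precision manages to remove.
    removed = alarms_least - alarms_most
    if not removed:
        return {}

    counts = Counter()
    for alarm in removed:
        for part in parts_of_alarm.get(alarm, ()):
            counts[part] += 1

    total = len(removed)
    return {part: n / total for part, n in counts.items()}


# Hypothetical data: one run of each analysis over the same program.
alarms_least = {"a1", "a2", "a3", "a4"}
alarms_most = {"a4"}
parts_of_alarm = {"a1": ["x"], "a2": ["x", "y"], "a3": ["y"], "a4": ["z"]}

scores = oracle_scores(alarms_least, alarms_most, parts_of_alarm)
```

With per-part scores like these available as labels, learning a strategy can be treated as an ordinary supervised problem over program-part features, which is one way to read the "whitebox" framing in the abstract.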

Original language: English
Pages (from-to): 1-13
Number of pages: 13
Journal: Information and Software Technology
Publication status: Published - 2018 Dec

Bibliographical note

Publisher Copyright:
© 2018 Elsevier B.V.


Keywords

  • Data-driven program analysis
  • Learning algorithm

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications

