TY - GEN
T1 - End-to-end prediction of buffer overruns from raw source code via neural memory networks
AU - Choi, Min Je
AU - Jeong, Sehun
AU - Oh, Hakjoo
AU - Choo, Jaegul
N1 - Funding Information:
This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (No. NRF-2016M3C1B6950000, NRF-2016R1C1B2014062). Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.
PY - 2017
Y1 - 2017
AB - Detecting buffer overruns in source code is one of the most common yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in applicability and robustness because of the diverse bug patterns and characteristics found in sophisticated real-world software. In this paper, we propose a novel, data-driven approach that is completely end-to-end and requires no handcrafted features, and is thus free from programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and comparing them quantitatively.
UR - http://www.scopus.com/inward/record.url?scp=85031926400&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2017/214
DO - 10.24963/ijcai.2017/214
M3 - Conference contribution
AN - SCOPUS:85031926400
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1546
EP - 1553
BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
A2 - Sierra, Carles
PB - International Joint Conferences on Artificial Intelligence
T2 - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Y2 - 19 August 2017 through 25 August 2017
ER -