Data-mining based SQL injection attack detection using internal query trees

Mi Yeon Kim, Dong Hoon Lee

Research output: Contribution to journalArticlepeer-review

40 Citations (Scopus)


Detecting SQL injection attacks (SQLIAs) is becoming increasingly important in database-driven web sites. Until now, most of the studies on SQLIA detection have focused on the structured query language (SQL) structure at the application level. Unfortunately, this approach inevitably fails to detect those attacks that use already stored procedure and data within the database system. In this paper, we propose a framework to detect SQLIAs at database level by using SVM classification and various kernel functions. The key issue of SQLIA detection framework is how to represent the internal query tree collected from database log suitable for SVM classification algorithm in order to acquire good performance in detecting SQLIAs. To solve the issue, we first propose a novel method to convert the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The reason that it is difficult to directly convert the query tree into an n-dimensional feature vector is the complexity and variability of the query tree structure. Second, we propose a method to extract the syntactic features, as well as the semantic features when generating feature vector. Third, we propose a method to transform string feature values into numeric feature values, combining multiple statistical models. The combined model maps one string value to one numeric value by containing the multiple characteristic of each string value. In order to demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system based on PostgreSQL, a popular open source database system, and we perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs, with at least 99.6% of the probability that the probability for malicious queries to be correctly predicted as SQLIA is greater than the probability for normal queries to be incorrectly predicted as SQLIA. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and single statistical model based on feature transformation. The experimental results show that our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements, when compared to the previous methods.

Original languageEnglish
Pages (from-to)5416-5430
Number of pages15
JournalExpert Systems With Applications
Issue number11
Publication statusPublished - 2014 Sept 1


  • Data mining
  • Database
  • Intrusion detection
  • SQL injection attack
  • SVM

ASJC Scopus subject areas

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence


Dive into the research topics of 'Data-mining based SQL injection attack detection using internal query trees'. Together they form a unique fingerprint.

Cite this