TY - JOUR
T1 - Entropy analysis to classify unknown packing algorithms for malware detection
AU - Bat-Erdene, Munkhbayar
AU - Park, Hyundo
AU - Li, Hongzhe
AU - Lee, Heejo
AU - Choi, Mahn Soo
N1 - Funding Information:
Acknowledgements: A preliminary version of this paper was presented at the 8th IEEE International Conference on Malware 2013 [52]. M.-S. Choi acknowledges the support of the National Research Foundation of Korea (Grant No. 2015-003689).
PY - 2017/6/1
Y1 - 2017/6/1
N2 - The proportion of packed malware has been growing rapidly and now comprises more than 80 % of all existing malware. In this paper, we propose a method for classifying the packing algorithms of given unknown packed executables, regardless of whether they are malware or benign programs. First, we scale the entropy values of a given executable and convert the entropy values at a particular memory location into symbolic representations. Our proposed method uses symbolic aggregate approximation (SAX), which is known to be effective for converting large volumes of data. Second, we classify the distribution of symbols with supervised learning methods, namely naive Bayes and support vector machines, to detect packing algorithms. Experiments on a collection of 324 packed benign programs and 326 packed malware programs, covering 19 packing algorithms, demonstrate that our method identifies the packing algorithm of a given executable with an accuracy of 95.35 %, a recall of 95.83 %, and a precision of 94.13 %. We propose four similarity measures for detecting packing algorithms, based on SAX representations of the entropy values and an incremental aggregate analysis. Among these four metrics, the fidelity similarity measure yields the best matching results, with accuracy ranging from 95.0 to 99.9 %, which is 2 to 13 percentage points higher than that of the other three metrics. Our study confirms that packing algorithms can be identified through an entropy analysis, which measures the uncertainty of the running processes, without prior knowledge of the executables.
AB - The proportion of packed malware has been growing rapidly and now comprises more than 80 % of all existing malware. In this paper, we propose a method for classifying the packing algorithms of given unknown packed executables, regardless of whether they are malware or benign programs. First, we scale the entropy values of a given executable and convert the entropy values at a particular memory location into symbolic representations. Our proposed method uses symbolic aggregate approximation (SAX), which is known to be effective for converting large volumes of data. Second, we classify the distribution of symbols with supervised learning methods, namely naive Bayes and support vector machines, to detect packing algorithms. Experiments on a collection of 324 packed benign programs and 326 packed malware programs, covering 19 packing algorithms, demonstrate that our method identifies the packing algorithm of a given executable with an accuracy of 95.35 %, a recall of 95.83 %, and a precision of 94.13 %. We propose four similarity measures for detecting packing algorithms, based on SAX representations of the entropy values and an incremental aggregate analysis. Among these four metrics, the fidelity similarity measure yields the best matching results, with accuracy ranging from 95.0 to 99.9 %, which is 2 to 13 percentage points higher than that of the other three metrics. Our study confirms that packing algorithms can be identified through an entropy analysis, which measures the uncertainty of the running processes, without prior knowledge of the executables.
KW - Entropy analysis
KW - Original entry point (OEP)
KW - Piecewise aggregate approximation (PAA)
KW - Symbolic aggregate approximation (SAX)
UR - http://www.scopus.com/inward/record.url?scp=85027987515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027987515&partnerID=8YFLogxK
U2 - 10.1007/s10207-016-0330-4
DO - 10.1007/s10207-016-0330-4
M3 - Article
AN - SCOPUS:85027987515
SN - 1615-5262
VL - 16
SP - 227
EP - 248
JO - International Journal of Information Security
JF - International Journal of Information Security
IS - 3
ER -