Malicious Detection Based on ReliefF and Boosting Multidimensional Features

Yang Xia Luo
School of Information, Xi'an University of Finance and Economics, 710100, China

Abstract—To address the high overhead and low accuracy of identifying obfuscated and malicious code, a new algorithm is proposed that detects malicious code from multidimensional features using ReliefF and Boosting techniques. After disassembly and static analysis of the clustered malicious code families, the algorithm extracts features along four dimensions: two static properties (operation code sequences and bytecode sequences) and two semantic features (the system call graph and the function call graph) that reflect the behavioural characteristics of the malware. It then selects the important feature vectors using ReliefF. Finally, ensemble learning is carried out, and the decision is boosted by weighted voting, with each feature analysis weighted according to its accuracy. Experiments and comparisons show that the algorithm achieves much higher accuracy on the testing dataset with low overhead.
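The final decision step, accuracy-weighted voting across the four feature dimensions, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `weighted_vote`, the view names, and all accuracy values below are hypothetical, and the sketch assumes each feature dimension already has a trained classifier that outputs one label per sample.

```python
from collections import defaultdict


def weighted_vote(predictions, accuracies):
    """Combine per-view labels, using each view's accuracy as its vote weight.

    predictions: dict mapping view name -> predicted label
    accuracies:  dict mapping view name -> accuracy of that view's classifier
    """
    if predictions.keys() != accuracies.keys():
        raise ValueError("each view needs both a prediction and an accuracy")
    scores = defaultdict(float)
    for view, label in predictions.items():
        scores[label] += accuracies[view]
    # The label with the largest total weight wins; on an exact tie,
    # the label that was seen first is returned.
    return max(scores, key=scores.get)


# Hypothetical accuracies for the four feature dimensions (illustrative only).
view_accuracy = {
    "opcode": 0.90,
    "bytecode": 0.80,
    "syscall_graph": 0.85,
    "function_call_graph": 0.70,
}

# Hypothetical per-view predictions for a single sample.
sample_predictions = {
    "opcode": "malicious",
    "bytecode": "benign",
    "syscall_graph": "malicious",
    "function_call_graph": "benign",
}

verdict = weighted_vote(sample_predictions, view_accuracy)
```

With these made-up numbers the "malicious" votes total 1.75 against 1.50 for "benign", so the two more accurate views decide the outcome even though the raw vote count is split two to two.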

Index Terms—Malicious code detection, multidimensional features, Boosting, ReliefF

Cite: Yang Xia Luo, “Malicious Detection Based on ReliefF and Boosting Multidimensional Features,” Journal of Communications, vol. 10, no. 11, pp. 910-917, 2015. DOI: 10.12720/jcm.10.11.910-917