Abstract—To address the high overhead and low accuracy of identifying obfuscated and malicious code, a new algorithm is proposed that detects malicious code by identifying multidimensional features with ReliefF and Boosting techniques. After disassembly and static analysis of the clustered malicious code families, the algorithm extracts features from four dimensions. Two are static properties (operation code sequences and byte code sequences), and two are semantic features (system call graphs and function call graphs) that reflect the behavioural characteristics of the malware. The algorithm then selects the most important feature vectors using ReliefF. Finally, ensemble learning is carried out: the decisions obtained from the different feature analyses are boosted by weighted voting, with each vote weighted by the accuracy of the corresponding analysis. Experiments and comparisons show that the algorithm achieves much higher accuracy on the test dataset with low overhead.
Index Terms—Malicious code detection, multidimensional features, Boosting, ReliefF
Cite: Yang Xia Luo, “Malicious Detection Based on ReliefF and Boosting Multidimensional Features,” Journal of Communications, vol. 10, no. 11, pp. 910-917, 2015. DOI: 10.12720/jcm.10.11.910-917
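ReliefF rates each feature by how strongly it separates a sample from its nearest neighbours in other classes (near misses) compared with its nearest neighbours in the same class (near hits). The following is a minimal pure-Python sketch of the standard ReliefF weight update (k nearest hits and misses, with misses weighted by class prior), assuming numeric feature vectors such as frequency counts derived from opcode sequences. The function name `relieff`, its parameters, and the toy data are hypothetical illustrations; the paper does not state its neighbour count, sample size, or distance metric.

```python
import random
from collections import Counter


def relieff(X, y, n_neighbors=3, n_samples=None, seed=0):
    """Return one ReliefF relevance weight per feature (column) of X."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)  # constant column -> avoid /0

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over normalised features.
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = min(n_samples or n, n)
    # Sample m reference instances (all of them, shuffled, when m == n).
    idx = random.Random(seed).sample(range(n), m)
    w = [0.0] * d
    for i in idx:
        r, cr = X[i], y[i]
        for c in prior:
            # k nearest neighbours of r inside class c, excluding r itself.
            cand = [j for j in range(n) if y[j] == c and j != i]
            near = sorted(cand, key=lambda j: dist(r, X[j]))[:n_neighbors]
            if not near:
                continue
            if c == cr:
                scale = -1.0  # near hits: features that differ are penalised
            else:
                scale = prior[c] / (1.0 - prior[cr])  # near misses, prior-weighted
            for a in range(d):
                w[a] += scale * sum(diff(a, r, X[j]) for j in near) / (m * len(near))
    return w


# Toy data (hypothetical): column 0 tracks the label, column 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.0, 0.5],
     [1.0, 0.3], [0.9, 0.8], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]  # e.g. 0 = benign, 1 = malicious
weights = relieff(X, y, n_neighbors=2)
print(weights)  # column 0 scores far above column 1
```

Feature selection would then keep the columns with the largest weights; how many to keep is a tuning choice the abstract does not specify.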
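The final decision step combines the per-dimension results by accuracy-weighted voting. The sketch below assumes one already-trained classifier per feature dimension whose predicted labels are given as plain lists; each classifier's weight is its accuracy on a held-out validation set, and each sample receives the label with the largest total weight. The view names, the validation split, and the labels are hypothetical. This is plain accuracy-weighted voting as the abstract describes it, not an implementation of a specific boosting algorithm such as AdaBoost, and the paper's exact weighting may differ.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_weighted_vote(val_true, val_preds, test_preds):
    """
    val_true:   true labels of a held-out validation set
    val_preds:  {view_name: labels predicted on the validation set}
    test_preds: {view_name: labels predicted on the samples to classify}
    Returns (fused labels for the test samples, per-view weights).
    """
    view_weights = {v: accuracy(val_true, p) for v, p in val_preds.items()}
    n_test = len(next(iter(test_preds.values())))
    fused = []
    for i in range(n_test):
        score = defaultdict(float)
        for view, preds in test_preds.items():
            score[preds[i]] += view_weights[view]  # vote counts with view accuracy
        fused.append(max(score, key=score.get))
    return fused, view_weights


# Hypothetical views, one per feature dimension; 1 = malicious, 0 = benign.
val_true = [1, 0, 1, 1, 0]
val_preds = {
    "opcode":        [1, 0, 1, 1, 0],  # accuracy 1.0
    "bytecode":      [1, 1, 1, 0, 0],  # accuracy 0.6
    "syscall_graph": [1, 0, 0, 1, 0],  # accuracy 0.8
    "funcall_graph": [0, 1, 1, 0, 0],  # accuracy 0.4
}
test_preds = {
    "opcode":        [1, 0],
    "bytecode":      [0, 1],
    "syscall_graph": [1, 0],
    "funcall_graph": [0, 1],
}
labels, view_weights = accuracy_weighted_vote(val_true, val_preds, test_preds)
print(labels)  # [1, 0]
```

For both toy samples an unweighted majority vote would be a 2-2 tie; the accuracy weights resolve it in favour of the more reliable views.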