Polymorphic attacks against sequence-based software birthmarks


Sequence alignment algorithms have recently been applied to detecting code clones, software plagiarism, code theft, and polymorphic malware. The approach extracts birthmarks (in this case, sequences) from programs and compares them using sequence alignment, a procedure that has been studied intensively in bioinformatics. The idea seems promising. However, we show that an attacker can evade detection by taking into account the positions of inserted dummy code and/or the frequency of function calls, whereas randomly inserting and deleting symbols in the sequence is ineffective. Using birthmark sequences extracted from actual malicious and benign programs, we found that the most effective strategy was a hybrid approach combining "non-consecutive insertion" and "highest frequency deletion". We also discuss the implementation costs of such attacks and propose non-determinism through concurrent programming as an alternative evasion strategy.

Hyoungshick Kim, Wei Ming Khoo, Pietro Lio, Polymorphic attacks against sequence-based software birthmarks, 2nd Software Security and Protection Workshop (SSP'12), 2012
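
The two strategy names in the abstract above can be illustrated with a small sketch. This is not the paper's implementation: the Needleman-Wunsch scoring parameters, the function names, the "DUMMY" symbol, and the toy call sequence are all assumptions made for illustration, and the reading of "highest frequency deletion" (drop every occurrence of the most frequent symbol) and "non-consecutive insertion" (never place two dummy symbols next to each other) is a guess at the intent that may differ from the paper's definitions.

```python
from collections import Counter


def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two symbol sequences.

    The scoring values are illustrative assumptions, not the paper's.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delete_highest_frequency(seq):
    """Assumed reading: remove every occurrence of the most frequent symbol."""
    if not seq:
        return []
    top, _ = Counter(seq).most_common(1)[0]
    return [s for s in seq if s != top]


def insert_non_consecutive(seq, dummy="DUMMY"):
    """Assumed reading: add a dummy after each symbol, so dummies are never adjacent."""
    out = []
    for s in seq:
        out.append(s)
        out.append(dummy)
    return out


# Toy birthmark: a made-up sequence of function-call symbols.
original = ["open", "read", "read", "write", "read", "close"]
hybrid = insert_non_consecutive(delete_highest_frequency(original))

baseline = alignment_score(original, original)
attacked = alignment_score(original, hybrid)
print(baseline, attacked)
```

On this toy input the attacked sequence shares only three symbols with the original, so its alignment score is necessarily lower than the self-alignment score. Whether that drop is enough to evade a real detector depends on the detector's threshold, which this sketch does not model.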

Unity in diversity: Phylogenetic-inspired techniques for reverse engineering and detection of malware families


We developed a framework for abstracting, aligning and analysing malware execution traces, and performed a preliminary exploration of state-of-the-art phylogenetic methods, whose strengths lie in pattern recognition and visualisation, to derive the statistical relationships within two contemporary malware families. We used phylogenetic trees and networks, motifs, logos, composition biases, and tree topology comparison methods to identify common functionality and to study sources of variation in related samples. Networks were more useful than trees for visualising short nop-equivalent code metamorphism; tree topology comparison was suited to studying variations in multiple sets of homologous procedures. We found that logos could be used for code normalisation, which reduced the number of instructions by 33% to 62%. A motif search showed that API sequences related to the management of memory, I/O, libraries and threading do not change significantly amongst malware variants; composition bias provided an efficient way to distinguish between families. Using context-sensitive procedure analysis, we found that 100% of a set of memory management procedures used by the FakeAV-DO and “Skyhoo” malware families were uniquely identifiable. We discuss how phylogenetic techniques can aid the reverse engineering and detection of malware families and describe some related challenges.

Wei Ming Khoo, Pietro Lio, Unity in diversity: Phylogenetic-inspired techniques for reverse engineering and detection of malware families, 1st SysSec Workshop, 2011
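
As a rough illustration of how composition bias might separate families, the sketch below compares the relative symbol frequencies of two traces. It is a minimal sketch under stated assumptions, not the framework described in the abstract: the function names, the use of an L1 distance between frequency profiles, and the toy traces are all hypothetical.

```python
from collections import Counter


def composition(trace):
    """Relative frequency of each symbol in a trace (empty trace gives {})."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {sym: n / total for sym, n in counts.items()} if total else {}


def composition_distance(p, q):
    """L1 distance between two profiles: 0 for identical, 2 for disjoint.

    The choice of L1 distance is an assumption made for this sketch.
    """
    symbols = set(p) | set(q)
    return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in symbols)


# Toy traces of API-call symbols; the data is invented for illustration.
variant_a = ["VirtualAlloc", "WriteFile", "VirtualAlloc", "VirtualFree"]
variant_b = ["VirtualAlloc", "VirtualAlloc", "WriteFile", "VirtualFree"]
other = ["CreateThread", "Sleep", "Sleep", "Sleep"]

same_family = composition_distance(composition(variant_a), composition(variant_b))
cross_family = composition_distance(composition(variant_a), composition(other))
print(same_family, cross_family)
```

Because the profile ignores ordering, reordered variants such as `variant_a` and `variant_b` have distance zero, while traces with no symbols in common reach the maximum distance of 2.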


Contact Information

Wei Ming Khoo
University of Cambridge
Computer Laboratory
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom