Rendezvous: A Search Engine for Binary Code

Abstract

Matching binaries is important for software copyright enforcement and for identifying disclosed vulnerabilities in software. We present a search engine prototype called Rendezvous, which enables indexing and searching for code in binary form. Rendezvous identifies binary code using a statistical model comprising instruction mnemonics, control-flow sub-graphs and data constants. These features are simple to extract from a disassembly, while also normalising for differences between compilers and optimisations. Experiments show that Rendezvous achieves F2 measures of 86.7% on the GNU C library compiled with different compiler optimisations and 83.0% on the GNU coreutils suite compiled with gcc and clang. These two code bases together comprise more than one million lines of code. Rendezvous will bring significant changes to the way patch management and copyright enforcement are currently performed.
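For readers unfamiliar with the F2 measure quoted in the abstract, the following is a minimal sketch of the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 2 so that recall is weighted more heavily than precision. The function name `f_beta` and the precision and recall values in the usage lines are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta=2 gives the F2 measure.

    Hypothetical helper for illustration, not taken from Rendezvous.
    """
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing relevant was retrieved.
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: a query with perfect recall but 50% precision.
f2 = f_beta(precision=0.5, recall=1.0)
f1 = f_beta(precision=0.5, recall=1.0, beta=1.0)
print(f"F2 = {f2:.4f}, F1 = {f1:.4f}")
```

With these made-up inputs F2 is higher than F1, which reflects the design choice behind F2: a search engine that misses a matching function is penalised more than one that returns some false positives.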


Wei Ming Khoo, Alan Mycroft, Ross Anderson, Rendezvous: A Search Engine for Binary Code, The 10th Working Conference on Mining Software Repositories (MSR'13), 2013

Contact Information

Wei Ming Khoo
University of Cambridge
Computer Laboratory
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom

wmk26[AT]cam[DOT]ac[DOT]uk