libdft

Data flow tracking thing. Existing systems are ad-hoc and problem-specific. Flexible ones tend to have poor performance or other limitations.

libdft -- data flow tracking framework. Fast (by which he means less than 10x slowdown), reusable, and fine for commodity systems. Only cares about explicit data flows, i.e. not ones implicit in the control flow. Seems to be implemented on Pin.

Performance: need to be careful about the actual instrumentation implementation, at an instruction-counting level. Data type of the tag pool is important. Tags can be either one bit or one byte. Memory is duplicated to a shadow memory area which contains the pools, identity indexed.

Perf looks to be roughly where you'd expect: about a factor of 4 or 5 for CPU-intensive stuff.

Also built a real thing to protect against stack smashes in the usual way. Overhead 7% on top of libdft itself.

Q: What about memory/jump-intensive benchmarks?
A: Up to a factor of ten performance hit.

Q: How does multithreading work?
A: If you're doing 1-to-1 byte tag shadows you don't need to worry. For bit tracking you need, in principle, locking, but they don't do it.

Q: Comparison to other frameworks, e.g. DynamoRIO?
A: Similar feature set. DynamoRIO has more granularity. libdft is built on a different underlying framework. libdft does reasonably well on performance.

Q: How much work do you do in the initial analysis pass? How much dynamic information do you need?
A: Depends on the underlying framework. Basically take what Pin gives us.

Q: Is there any hope for DFT with usable performance for real applications?
A: Inherently hard while maintaining flexibility.

------------------------
Transparent dynamic instrumentation
Derek Bruening

Whole system virtualisation -> hard to see what's going on in a single process. Do process virtualisation instead, e.g. Rosetta for Apple, Dr. Memory debugging tool on DynamoRIO.

Inherently invasive. Makes transparency tricky. Tends to be done ad-hoc: run some apps, fix what breaks, run some more apps.
Transparency is often deliberately sacrificed in corner cases to gain performance.

Transparency principles:
-- As few changes as possible, i.e. as close to native execution as possible.
-- Try to hide the changes which are made from the application.
-- Separate resources.

e.g. principle one says don't do binary patching of application code.
e.g. if you're doing an emulator with a code cache, you don't want to change the addresses. He claims that the vast majority of real apps, and most benchmarks, will occasionally read and compute on their own return addresses? Huh?

Also discusses the instruction cache coherency issue.

Now discussing data. He says you should keep stack, heap, registers, and condition flags the same. e.g. decent performance win if you assume that the flags are dead across ret instructions, but Visual Studio sometimes generates code that keeps flags live across a ret. Also, some apps put data beyond the top of the stack (e.g. MS Office) and some use RSP as a general register (something by Adobe).

So do separate resources -> disjoint stack, heap. Tricky on Windows because win32.dll can end up calling into the heap implicitly. Fix is to just load multiple copies and redirect calls from the tool to the new library.

Still can't solve timing, or external observers, or address space transparency, ...

Conclusions: if you're doing DBT then you need to think about transparency from the start.

Q: Are there big differences between Linux and Windows when implementing this kind of thing?
A: Yes. The ISA emulation itself is common, but running most apps requires a lot of platform-dependent stuff. e.g. some Microsoft apps assume that TLS slots are contiguous, even though the API says otherwise. Windows is generally harder due to lack of information.

Q: Assuming that you're in the same process. What are the alternatives?
A: Either a debugger-like external interface, or external to the OS. Both have problems. Somewhat application-dependent.

Q: Instead of emphasising transparency you could emphasise security. Is that a similar problem?
A: Yes.
A lot of the same principles apply.

---------------
Execution mining
Geoffrey Lefebvre

Most of the time, if you want to understand kernel code you use printk. Which is sad, because the kernel is quite complicated. Also need to do a lot of iterating with different hypotheses.

Problem is that existing tools are all live. Another problem: if you look at low-level stuff you get a huge semantic gap.

Answer: Tralfamadore. Basic idea: run under QEMU, record everything, then run analyses offline later on. Want to use sane DRS rather than QEMU.

3 primitives: streams represent execution, annotations describe events in the stream, and operators look for patterns in the annotations.
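
Rough sketch of how the three primitives might fit together. This is not Tralfamadore's actual API; the `Annotation` fields, the `select` and `match_calls` operators, and the sample trace are all made up for illustration. A stream is modelled as any iterable of annotations, and an operator is just a function from a stream to a new stream.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Annotation:
    """One event in a recorded execution (hypothetical shape)."""
    index: int    # position in the recorded trace
    kind: str     # e.g. "call", "ret", "mem_write"
    detail: str   # e.g. a function name or an address


# A stream is assumed here to be any iterable of annotations, in trace order.
Stream = Iterable[Annotation]


def select(stream: Stream, kind: str) -> Iterator[Annotation]:
    """Operator: keep only the annotations of one kind."""
    return (a for a in stream if a.kind == kind)


def match_calls(stream: Stream, func: str) -> Iterator[Tuple[Annotation, Annotation]]:
    """Operator: pair each call to `func` with the ret that closes it."""
    pending: List[Annotation] = []
    for a in stream:
        if a.kind == "call" and a.detail == func:
            pending.append(a)
        elif a.kind == "ret" and a.detail == func and pending:
            yield pending.pop(), a


# Illustrative trace only; the function names are sample data.
trace = [
    Annotation(0, "call", "sys_read"),
    Annotation(1, "call", "vfs_read"),
    Annotation(2, "mem_write", "0xc0ffee00"),
    Annotation(3, "ret", "vfs_read"),
    Annotation(4, "ret", "sys_read"),
]

for call, ret in match_calls(trace, "vfs_read"):
    print(f"vfs_read spans trace positions {call.index}..{ret.index}")
```

Because the trace is recorded once and the operators are lazy generators, a new hypothesis is just a new operator run over the same stream, with no need to re-run the kernel.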