Alan Mycroft's student project suggestions
|Computer Laboratory > Alan Mycroft > Teaching > Alan Mycroft's student project suggestions|
Program this in Java, possible with various snooping and malicious process variants.
The aim of a decompiler is to produce as readable as possible high-level output (e.g. C or higher) from low-level source (e.g. assembler or binary). A simple system can be got working quite quickly e.g. produce C statements like "AX=AX+BX;" for assembly code like addl %eax,%ebx Doing it better has almost limitless possibilities. See Cifuentes' information: The Decompilation Page or see some work of mine Type-based decompilation for some work to get back structs/unions/pointers from code. (You'll note from her review of the paper that results in practice will be worth publishing.)
Pretty much a similar idea to the project above, but using VHDL. Can you transform a netlist (or even place-and-routed input suitable for FPGA or "tapeout") form of a chip back into readable VHDL or Verilog?
Lucid is a simple dataflow language. Operators effectively operate
on streams of data, for example
Many modern processors have significant non-sequential behaviour (e.g. multiple issue, pipelinining, VLIW). See for example http://www.eetimes.com/conferences/epf/story/OEG20010613S0082. Compilers are often not good at generating code which exploits instruction-level parallelism. So let's write a source-to-source optimiser at the assembly-code level which would produce better code. This project would be about statically analysing a sequence of instructions to determine the data flow, the implied latencies and resource contentions inherent in the sequence and rearranging the instructions into a sequence which achieves minimum cycle count. Use an processor of your choice, or design your own!
There's a fair bit of interest in the automotive embedded programming world regarding the extent to which it is possible to take a program written in C which uses floating point data types and have it translated into one which uses only fixed point data types. At one end of the spectrum lies manual translation; at the other, fully automatic translation. Probably there is a practical tool somewhere between the two.
Some of the variables in the program will have constraints on the ranges of values they can hold. They may also have constraints by way of a minimum resolution. The constraints can be used to suggest a range of fixed point types for the variables. Static analysis of the program could then be used to infer constraints on expressions and therefore to place constraints on the values of other variables. Having done this, there will still be a pool (probably quite large!) of variables and sub-expressions which do not have any candidate fixed point types. This might be the point to inject statistics on the ranges which variables and expressions might take, obtained dynamically either from unit tests, simulations or in-vehicle experiments. Or, one could ask an intelligent user to come up with some direction as to how to resolve conflicts. Or some disambiguating rule could be applied when faced with deciding between (a) preserving resoluting and running the risk of overflow, or (b) avoiding overflow at the expense of resolution.
A good project might well lead to a job, but remember the project is to demonstrate your computer science skills.
Necula and Lee at CMU proposed the idea of proof-carrying code for various applications. Basically proving that (e.g. binary) code originating from elsewhere is `correct' is in general infeasible. However, checking that code accompanied with a specification and proof meets the given safety specification is essentially linear. (One can see the Java so-called `Bytecode verifier' as rather a weak special case which essentially only checks that each path through a method has appropriately matching pushes and pops.)
For more technical information see the papers on http://foxnet.cs.cmu.edu/papers.html by Necula.
There are many possible applications of this idea from Operating Systems (e.g. register your own privileged code as an interrupt handler or operating system component) to Proved program compilers (given a source program annotated proof statements, compiler the sources to object code and the proof of the source into a proof of the binary which can then be checked possibly by a linker).
There has become significant interest in using C code to describe hardware, for example SystemC and Handel-C. Various control structures map ideally: integers are bundles of wire or latches; loops represent clocked logic; arrays could be local pieces of RAM.
The are many possible projects within this general area. They vary over several dimensions: should the target language be Verilog, should the compiler perform analysis to identify C code which might parallelise into two hardware components, should one carefully analyse the C to choose the optimal representation etc etc. Talking to various compiler/hardware people will clearly be helpful to get a project which suits you as well as possible.
BCPL is a typeless fore-runner of C whose advantage for this project is that we have source access to Dr. Martin Richards' BCPLKIT compiler which consists of around 2000 lines and is hence easy to read and modify. The current structure of the compiler reads source into a parse tree; generates stack-oriented intermediate code (OCODE); this is then translated into code for the target architecture. There is no global optimisation and it is hard to exploit efficiently multi-register machines (e.g. typical RISC).
This project would replace the intermediate code with 3-address code and then perform optimisations on the intermediate code such as `global common sub-expression elimination' or `register allocation by colouring' (see e.g. Alan Mycroft's Optimising Compiler course).
INMOS developed the OCCAM language as a low-level (about the level of C but smaller) programming language for the transputer. It provides PAR, ALT and SEQ operations for parallel, alternative (e.g. if-then-else) and sequential composition of basic commands. In addition there are language primitives for synchronised message passing between processes.
A basic project could just provide a parser for OCCAM and an interpreter. More sophisticated projects could include compiling OCCAM to machine code on a single processor (with run-time support for parallelism) or even to Unix (or Java) process primitives.
An alternative would be to develop tools to transform OCCAM programs (e.g. to reduce the amount of task creation in favour of communication with pre-created tasks).
Abadi and Cardelli gave a very simple `object calculus' on around the same level as the lambda-calculus in their ESOP'94 paper. The calculus is best seen as a very simple (but expressive) object-oriented programming language. There are both typed and untyped variants. Because of the simplicity an interpreter (or compiler) can be written within the time-frame of a CST or Diploma project (but prior experience of programming languages would probably be required for the latter). The mathematical presentation of the Abadi and Cardelli paper requires some initial effort to read.