Compiler Construction Exercises 2010. (C) David Greaves, Computer Laboratory, Cambridge.

There is a large body of relevant exam questions on past papers, and they should serve as a main source of revision material. The exercises on this page are therefore slightly different: some of them are practical exercises, and some are less elaborate, or more challenging, than typical Tripos 20-mark questions.

---

Exercises that involve ML programming can be done in any language of your choice, but you will find that the programs are longer (perhaps three times longer in PHP or Perl, and six times longer in Java).

Exercise 1

Run the examples provided with the Syntax Tree/Lambda evaluator on this web page.

Exercise 2

Extend the syntax tree evaluator so that it supports named functions, instead of just anonymous functions denoted with the "Fn" form.

Exercise 3

For the syntax tree evaluator, implement a named function called triangle that computes triangle numbers, using the named-function mechanism introduced in Exercise 2.

  tri 1 = 1
  tri N = sum 1 to N

NB: Since we have not yet studied parsing concrete syntax into abstract syntax, you should define functions by creating an initialised data structure of type Expr.

Exercise 4

Extend the syntax tree evaluator so that it supports global variables, as shown in the lecture notes, but also make them mutable and introduce an assignment operator.

-----------------

Lexing exercises:

Exercise L1

Why are regular languages suitable for lexing?

Exercise L2

Define a regular expression for date strings and then code it in the style accepted by lex or JLex. Your regular expression should cope with the sort of thing generated by the default output of the Unix date command:

  $ date
  Fri Jan 22 11:31:22 GMT 2010
  $

Exercise L3

Think of a language where it would sometimes be useful for some adjacent characters to be treated as a single lexical token, and at other times to be treated as more than one token.
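For Exercise L2, it can help to prototype the pattern with an ordinary regular-expression library before transcribing it into lex/JLex syntax. A minimal sketch in Python follows; the day/month abbreviation sets and the all-capitals time-zone field are assumptions about what `date` can emit, not a complete specification:

```python
import re

# Sketch only: matches the default Unix `date` output,
# e.g. "Fri Jan 22 11:31:22 GMT 2010".
DAY   = r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
TIME  = r"([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]"
ZONE  = r"[A-Z]{2,5}"          # assumed zone form, e.g. GMT, BST
DATE_RE = re.compile(
    rf"{DAY} {MONTH} [ 1-3][0-9] {TIME} {ZONE} [0-9]{{4}}"
)

print(bool(DATE_RE.fullmatch("Fri Jan 22 11:31:22 GMT 2010")))  # True
```

The same alternations and character classes carry over almost verbatim into a lex rule; the main difference is that lex matches at the start of the remaining input rather than against a whole string.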
Exercise L4

Suppose a wacky language enables user-defined lexical tokens, so that the parsing of parts of the input source is affected by what has come before. How might this be implemented?

NB: After studying both lex and yacc we will see that lex is generally called as a subroutine from yacc, but it could be an independent process provided it does not run too far ahead: e.g. it has a blocking write of its output tokens that experiences back pressure.

-----------------

Parsing exercises:

Exercise P1:
What does it mean for a grammar to be ambiguous? Give an example and say why such problems can often be fixed by adding an extra non-terminal. Does such an extra production have to appear in the abstract syntax tree?

Exercise P2:
What is the difference between concrete syntax and abstract syntax? How might operator binding precedence be indicated in each? Why are parentheses often found only in the concrete syntax form?

Exercise P3:
Show with examples how a rewriting parser (i.e. one that operates by iteratively rewriting the current sentential form, replacing strings that match the r.h.s. of a production with the appropriate non-terminal from the l.h.s.) might behave when faced with a) ambiguity, b) operator precedence, c) parenthesis matching.

Exercise P4:
How could the type-checking phase of a compiler be fully implemented by its parser, with well-formedness in the typing sense being intrinsic to the language grammar? Why might this be inconvenient for the language designer or user (give an example or two)?

Exercise P5:

-----------------

Translation exercises:

Exercise T1:
Give identities for the four comparison connectives < > <= >= that express three of them as 'syntactic sugar' for the fourth. Explain why a two-handed IF parse tree node does not need its own AST node. Give another two examples where common concrete language forms do not need dedicated AST forms.
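The identities asked for in Exercise T1 can be checked mechanically by a small desugaring pass over the AST. A hedged sketch, taking < as the primitive and using an invented tuple encoding of expressions (the constructor names "lt", "gt", "le", "ge", "not" are assumptions, not the course's Expr type):

```python
# Invented AST encoding: ("lt", a, b) means a < b, and so on.
# Identities used:
#   a >  b  is  b < a
#   a <= b  is  not (b < a)
#   a >= b  is  not (a < b)

def desugar(e):
    """Rewrite gt/le/ge comparison nodes into lt plus boolean not."""
    if not isinstance(e, tuple):
        return e                       # leaf: variable or constant
    op, *args = e
    args = [desugar(a) for a in args]  # desugar subtrees first
    if op == "gt":
        return ("lt", args[1], args[0])
    if op == "le":
        return ("not", ("lt", args[1], args[0]))
    if op == "ge":
        return ("not", ("lt", args[0], args[1]))
    return (op, *args)

print(desugar(("ge", ("gt", "x", "y"), "z")))
# → ('not', ('lt', ('lt', 'y', 'x'), 'z'))
```

The same shape of rewrite answers the two-handed IF part of the question: `if c then a` can be emitted as the three-handed form with a do-nothing else branch, so no separate AST node is needed.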
Exercise T2:
What extensions are needed to the toy language in the lecture notes to include arrays as a first-class part of the language?

Exercise T3:
How is logical OR different from bitwise OR, and how is this difference implemented in the translation phase of a compiler?

Exercise T4:
What is the typical calling convention for stack-based intermediate code? How could differences between functions that return zero results and one result be accommodated?

Exercise T5:
A simple implementation of a virtual machine for intermediate code keeps all of the goto destination labels on a linear list. Explain how performance can be improved, both with and without using a separate assembly phase.

Exercise T6:
Given that a 'typeof' function, as presented in the lecture slides, exists, sketch the modifications to a 'list flattening' style of intermediate code generator that emits the necessary 'i2f' instructions. If you wish, consider that the language contains strings as well as ints and floats, and provides the sort of coercion provided by PHP, JavaScript and so on.

Exercise T7:
The non-strict logical connectives (and and or) were presented in lectures as syntactic sugar for the predicated expression operator (question mark/colon). Implement direct conversions of these operators instead.

-----------------

Code generation exercises:

Exercise C1:
Implement the toy code generator, extended with a 'stackcache' as explained in lectures. Add the other extensions mentioned, up to the point of preserving values in registers until a join in the control graph.

Exercise C2:
Describe the operation of the Intel 'leave' instruction used to deallocate a stack frame. Design your own custom instructions for use at the beginning and end of a function call, paying close attention to the partition of work between caller and callee, to optimisations that may be possible for a leaf routine, and to functions that accept a varying number of arguments (such as printf).
Exercise C3:
Describe the key functions of a link editor using about half a page of pseudo-code.

END
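As a warm-up for Exercise C3, the two core jobs of a link editor are concatenating the code of the input modules while building a global symbol table, and then patching each relocation site with the final address of the symbol it names. A hedged sketch follows; the object-module format (a dict of code words, defined symbols and relocation sites) is invented purely for illustration:

```python
# Invented object-module format, for illustration only:
#   code    : list of words (ints); unresolved reference sites hold 0
#   symbols : name -> offset of the definition within this module
#   relocs  : list of (offset, symbol_name) sites to patch

def link(modules, base=0):
    """Concatenate module code, build the global symbol table,
    then patch each relocation site with the symbol's final address."""
    image, symtab, relocs = [], {}, []
    for m in modules:
        start = base + len(image)          # load address of this module
        for name, off in m["symbols"].items():
            if name in symtab:
                raise ValueError("multiply defined: " + name)
            symtab[name] = start + off
        for off, name in m["relocs"]:
            relocs.append((len(image) + off, name))
        image.extend(m["code"])
    for off, name in relocs:               # second pass: patch the sites
        if name not in symtab:
            raise ValueError("undefined symbol: " + name)
        image[off] = symtab[name]
    return image, symtab

a = {"code": [10, 0], "symbols": {"main": 0}, "relocs": [(1, "lib")]}
b = {"code": [99], "symbols": {"lib": 0}, "relocs": []}
image, symtab = link([a, b], base=100)
print(image, symtab)  # [10, 102, 99] {'main': 100, 'lib': 102}
```

A real link editor also handles sections with different attributes, library search, and common blocks, but the half page of pseudo-code the exercise asks for should centre on these two passes.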