Compiler Construction: With C# and DotNet Intermediate Language

These notes and example programs augment the introductory Compiler Construction course given by Dr. Mycroft. The course itself covers the fundamental principles, and some of the practice, of implementing compilers in a modern language.

This webpage builds on this with working examples showing how modern compiler tools (ANTLR) and modern implementation language (C#) can be used to construct a compiler for a simple high-level C-like language, and targetting the recently-introduced DotNet intermediate language from Microsoft.

A report going into further details of the programs presented here is available.


1. Hello World

The first example presented here shows the output of three compilers and their target languages. All example source and target code is available for download.

  1. C

    #include <stdio.h>
    
    int main( void )
    {
        puts( "Hello World!" );
        return 0;
    }
    

    When compiled with GCC for a PowerPC processor running Mac OSX, the output is:

    .data
    .cstring
            .align 2
    LC0:
            .ascii "Hello World!\0"
    .text
            .align 2
            .globl _main
    _main:
            mflr r0
            stmw r30,-8(r1)
            stw r0,8(r1)
            stwu r1,-80(r1)
            mr r30,r1
            bcl 20,31,L1$pb
    L1$pb:
            mflr r31
            addis r3,r31,ha16(LC0-L1$pb)
            la r3,lo16(LC0-L1$pb)(r3)
            bl L_puts$stub
            li r0,0
            mr r3,r0
            lwz r1,0(r1)
            lwz r0,8(r1)
            mtlr r0
            lmw r30,-8(r1)
            blr
    .data
    .picsymbol_stub
    L_puts$stub:
            .indirect_symbol _puts
            mflr r0
            bcl 20,31,L0$_puts
    L0$_puts:
            mflr r11
            addis r11,r11,ha16(L_puts$lazy_ptr-L0$_puts)
            mtlr r0
            lwz r12,lo16(L_puts$lazy_ptr-L0$_puts)(r11)
            mtctr r12
            addi r11,r11,lo16(L_puts$lazy_ptr-L0$_puts)
            bctr
    .data
    .lazy_symbol_pointer
    L_puts$lazy_ptr:
            .indirect_symbol _puts
            .long dyld_stub_binding_helper
    

  2. Java and the JVM

    Again, here is the basic hello world program in Java:

    public class Hello 
    {
        public static void main( String[] args )
        {
            System.out.print( "Hello world.\n" );
        }
    }
    

    The Java compiler, javac, compiles this into JVM bytecodes, which are then interpreted by the Java runtime. For this example, note the difference in the number of instructions in the program compared to the native-compiled C example above.

    Method Hello()
       0 aload_0
       1 invokespecial #1 <Method java.lang.Object()>
       4 return
    
    Method void main(java.lang.String[])
       0 getstatic #2 <Field java.io.PrintStream out>
       3 ldc #3 <String "Hello world.
    ">
       5 invokevirtual #4 <Method void print(java.lang.String)>
       8 return
    

    Far fewer instructions. Why is this?

  3. C# and DotNet

    Finally, here is the C# version and the corresponding DotNet Intermediate Code:

    using System;
    
    public class HelloApp
    {
        public static void Main()
        {
            Console.WriteLine( "Hello World!" );
        }
    }
    

    The Java compiler, javac, compiles this into JVM bytecodes, which are then interpreted by the Java runtime. For this example, note the difference in the number of instructions in the program compared to the native-compiled C example above.

    .class public auto ansi beforefieldinit HelloApp
           extends [mscorlib]System.Object
    {
      .method public hidebysig static void  Main() cil managed
      {
        .entrypoint
        // Code size       11 (0xb)
        .maxstack  1
        IL_0000:  ldstr      "Hello World!"
        IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)
        IL_000a:  ret
      } // end of method HelloApp::Main
    
      .method public hidebysig specialname rtspecialname 
              instance void  .ctor() cil managed
      {
        // Code size       7 (0x7)
        .maxstack  1
        IL_0000:  ldarg.0
        IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
        IL_0006:  ret
      } // end of method HelloApp::.ctor
    
    } // end of class HelloApp
    
    

    Note that this is a subset of the output of the disassembler showing just the actual executable code.


From this point on we will use C# and DotNet as the implementation language and the target code respectively. To compile and run these examples you will need some form of DotNet runtime. There are currently two main versions:

  1. Microsoft DotNet

    The basic requirement is a C# compiler and the DotNet runtime. There are two versions available, the commercial version, and the research-oriented ROTOR project. There is a free, cut-down, version of the commercial version, and ROTOR is free.

  2. Mono

    The Mono Project is creating an open-source implementation of the C# and DotNet runtime as defined by the ECMA Standards that describe them. It has been ported to many operating systems, notably Linux, Mac OSX and Windows.


2. Four-Function Calculator

Two versions of the basic exeercise are presented, one using the recursive-descent techniques presented in the lecture course, the other using the compiler-generator tool ANTLR.

Hand-written recursive-descent parser.

Download files

To build the executable, you need a DotNet system installed (either Rotor or the full version). Then compile the application with:

csc 4calc.cs

and run with:

clix 4calc.exe

Enter expressions as you would expect, terminating each expressions with ';' (they can span multiple lines) and hit Return to evaluate them and display the result. To quit, enter Ctrl-D followed by Return.

Four-Function Calculator using ANTLR

This is a version of the 4-function calculator using ANTLR to generate a lexer and parser automatically from grammer rules.

Download files

You will need to install ANTLR before you can compile and run this example. How you achieve this will depend on which platform you are running on. For Rotor on MAC OSX, you cannot compile up the parts of the ANTLR runtime that use the Windows Forms libraries, which are not included in Rotor.

The user interface is exactly the same as for the hand-written parser version. However the internal operation is completely different. Whereas the earlier version computed intermediate results as soon as a rule was matched (e.g., "1+2" -> 3) the ANTLR-generated version first constructs a parse-tree of the input, and only then does the tree walker traverse the tree bottom-up computing the result.


Elisa --- A Simple Programming Language and its Compiler

Download files

Elisa is a simple C-like language with sufficient constructs to be able to write working programs that demonstrate the basics of a compiler. It is not meant to be a fully-developed programming language to write the next greatest operating system, but it could form the start of many student projects to build compilers or code analysers.

The Elisa language has a few simple operators, integers and arrays of integers, and functions.

The compiler itself consists of a handful of files, brought together by a Makefile:

Elisa.cs The main compiler driver: processes command line flags, then calls the parser and code generator.
CodeGen.cs This is the code generator interface, which defines the functionality that a code generator must provide.
DotNetCodeGen.cs A specific instance of the Code Generator class which generates DotNet output.
Symboltable.cs Type, symbol, and symbol table classes.
ElisaParser.g The ANTLR lexer and grammar file for Elisa. As for the four-function calculator above, this single file encompasses the lexer, the parser, and the parse tree walker.

The syntax of Elisa looks very much like C (or any other Algol-derived programming language). It has:

For example, here is a bubblesort function in Elisa:

// Integer bubblesort
// Takes an array of integers and the number of elements in the array
// and sorts them in-place

int bubblesort(int a[], int n)
{
    int t;
    int i;
    int done;

    done = 0;

    while (!done) {
        i = 1;
        done = 1;
        while (i < n) {
            if (a[i - 1] > a[i]) {
                done     = 0;
                t        = a[i - 1];
                a[i - 1] = a[i];
                a[i]     = t;
            }
            i = i + 1;
        }
    }

    return 0;
}

Note, there is already a simplified C compiler example project in the ROTOR package, called MyC, as well as a simplified LISP interpreter/compiler. You might wish to compare the compiler for Elisa with the MyC compiler.


Links To Related Material