THE COMPILER GENERATOR                  -- by L.C. Paulson
			September 1982, revised August 1984 and January 1988


DOCUMENTATION:

The complete documentation is my thesis (Paulson, 1981):  it explains how to
write semantic grammars and operate the compiler generator.  It also presents
the design issues and details of the implementation.  My paper in the 1982
Symposium on Principles of Programming Languages is a brief introduction to
the compiler generator (Paulson, 1982).  I have recently written a survey
article (Paulson, 1984).

In the thesis, Chapter 3 describes semantic grammars, Chapter 4 tells how to
define static semantics, and Chapter 5 is an introduction to dynamic
semantics.  Appendixes A through C are user manuals for
each component of the compiler generator.



SOURCE FILES of the compiler generator:

ga.p                  the Grammar Analyzer
ut.p                  the Universal Translator
sm.p                  the Stack Machine

Compile these using the `pc' command, or use the trivial Makefile.  The
switches of the compiler generator are controlled by assigning to Boolean
variables through the Pascal debugger. These are mostly listing options, but
there is also the option of generating tagfields at runtime, as the thesis
explains (section 6.3.6).  

The Grammar Analyzer and Universal Translator produce listing files ending
with `.lst';  the Stack Machine produces an execution log file ending with
`.log'.  Language Description Files end with `.ldf';  object files for the
Stack Machine end with `.obj'.


SEMANTIC GRAMMARS:  any file ending with .sg
p2.sg            the Pascal grammar
for.sg           the Fortran grammar
p0v.sg           generates verification conditions for language p0

TEST PROGRAMS:
*.p2                    Pascal test programs
*.for                   Fortran test programs

THE LR(0) EXAMPLE:
lr0.p2                  source file for LR(0) parser generator
lr0.data                program's input data:  a small expression grammar
                                from Aho and Ullman (1977), coded as numbers
lr0.result              what the answer should be

All three components of the compiler generator prompt for their inputs.  For
the LR(0) example, the keyboard input to the Stack Machine is stored in
lr0.data;  this example can be run by the Unix command
    sm < lr0.data
The output of the Stack Machine will be a list of integers;  each kernel
begins with its kernel number, ends with 0, and lists the items as pairs of
integers.



PORTING THE COMPILER GENERATOR

I have ported the compiler generator to VAX/Unix (Berkeley Pascal), and also
to Multics (Grenoble Pascal).  Each task required just a few afternoons.
The main porting problems are calls to nonstandard procedures:

	procedure	purpose
	---------	-------
	date		date in characters
	time		time of day in characters
	clock		elapsed runtime, as an integer (milliseconds)
	assert		print error message if argument is false

The programs also call reset and rewrite with additional parameters
specifying disk files, which is nonstandard.  Many of the write statements
give a format width of "1", to avoid printing blank space in front of
numbers;  some Pascal systems print a warning when this width is exceeded.
When printing strings, Pascal is supposed to print only as many characters
as the width allows;  my histogram-printing procedures depend on this.

There are a few machine dependencies, all marked with the comment "machine
dependent".  The most serious one is the overlaying of pointers and integers
in the array dagArg of the Universal Translator.  However, this code worked
without change both times I ported it.

All identifiers are distinct in their first ten characters.


BUGS

I am aware of only one:  the stack machine crashes while executing
"ex4.asple".  I have not investigated this at all, as I am not studying
compiler generation at the moment.

The system may crash with a subscript error if you give it a grammar that
exceeds one of the limits set in the constant declarations of the system.


ARRAY TREES

I have discovered a new implementation of arrays that performs loads and
stores in time logarithmic in the subscript, and whose "store" creates a
new array, leaving the old array unchanged.  These "array trees" make the array
compactor obsolete.  While they are only slightly faster than the array
compactor, they are much simpler and easier to implement. I suspect that
they are more robust;  the array compactor's performance is difficult to
predict. This note sketches how array trees work.

Consider the following binary tree:


                                1

                2                               3

        4               6               5               7
        
    8      12      10      14       9      13       11      15


An array tree X stores the element X[i] in the corresponding position of the
above binary tree.  Load and store locate the desired element by following
the binary decomposition of the subscript;  store then rebuilds the tree,
constructing new nodes to contain the data passed over when descending into
the tree.  The old tree is unchanged, as the stack machine requires.  The
stack machine (sm.p) implements load and store as the Pascal procedures
arrayLookup and arrayEnter.
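The scheme above can be sketched as follows.  This is a sketch in Python,
not the original Pascal;  the names Node, lookup and update are my own
(the stack machine calls them arrayLookup and arrayEnter).  It follows the
bits of the subscript from least significant upward, matching the node
numbering of the tree pictured above.

```python
class Node:
    """One tree node, holding the array element stored at this position."""
    __slots__ = ("value", "left", "right")

    def __init__(self, value=None, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def lookup(tree, i):
    """Return X[i], reading the bits of i from least significant upward."""
    while i > 1:
        tree = tree.right if i & 1 else tree.left
        i >>= 1
    return tree.value


def update(tree, i, v):
    """Return a new tree with X[i] = v.  Only the nodes on the path to i
    are rebuilt;  the old tree is left unchanged, as the stack machine
    requires."""
    if i == 1:
        return Node(v,
                    tree.left if tree else None,
                    tree.right if tree else None)
    if tree is None:
        tree = Node()           # position never written:  an empty node
    if i & 1:
        return Node(tree.value, tree.left, update(tree.right, i >> 1, v))
    return Node(tree.value, update(tree.left, i >> 1, v), tree.right)
```

For example, after a2 = update(a1, 5, v), both lookup(a2, 5) and
lookup(a1, j) for previously stored j give the expected answers:  the two
versions share every node not on the path to element 5.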



POSSIBLE IMPROVEMENTS

Section 8.3.1 of the thesis suggests splitting the simplifier into an
evaluator and an optimizer.  The section claims that you must first split
the semantic language into static and dynamic parts, but this is
unnecessary. An ordinary call-by-value evaluator for my semantic language
would compute attributes correctly, and much more efficiently than my
simplifier does.  The optimizer would need to perform only beta-reductions;
it would be called on the bodies of lambda-expressions, just before
translating them into SECD instructions. Then top-level recursive functions
would become semantically equivalent to locally defined recursive functions
(compare with section B.1). Many of the optimizations described in Chapter 7
would become unnecessary.
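To make the claim about the optimizer concrete, here is a toy
beta-reducer in Python.  The term representation and function names are
my own, chosen only for illustration;  nothing here is taken from the
compiler generator or its semantic language.  It rewrites
(lambda x. body) arg  to  body[x := arg], reducing in normal order.

```python
def subst(term, name, arg):
    """term[name := arg].  Capture-naive substitution, which is adequate
    for this toy example with distinct bound-variable names."""
    kind = term[0]
    if kind == "var":
        return arg if term[1] == name else term
    if kind == "lam":
        x, body = term[1], term[2]
        return term if x == name else ("lam", x, subst(body, name, arg))
    if kind == "app":
        return ("app", subst(term[1], name, arg), subst(term[2], name, arg))
    return term                       # constants pass through unchanged


def beta(term):
    """Contract beta-redexes in normal order until none remain."""
    kind = term[0]
    if kind == "app":
        f = beta(term[1])
        if f[0] == "lam":
            return beta(subst(f[2], f[1], term[2]))
        return ("app", f, beta(term[2]))
    if kind == "lam":
        return ("lam", term[1], beta(term[2]))
    return term
```

For instance, beta applied to (lambda x. lambda y. x) A B yields A:  two
beta-reductions and nothing else, which is all the proposed optimizer
would have to do before translation into SECD instructions.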

Data types (domains) in my semantic language are clumsy.  Much better would be
the polymorphic type system used in the language ML (Gordon et al., 1979).
This would retain the error-detection one expects of strong typing, while
getting rid of the need to specify the type of every variable, attribute,
recursive function, and bottom element (BOT).  Forward declarations would
disappear.

The ML mechanism for generating and trapping failures could produce readable
messages for both compile- and run-time errors.  (Currently it is almost
impossible to diagnose run-time errors in the stack machine.)  But first
the compiler generator would have to provide more general string operations.
Perhaps the next compiler generator should be written on top of ML, in the
style of LCF (Gordon et al., 1979).

Uwe Pleban and his colleagues have done some interesting work using the 
compiler generator (Milos et al., 1984).

REFERENCES

A.V. Aho and J.D. Ullman (1977),
Principles of Compiler Design,
Addison-Wesley.

M.J.C. Gordon, R. Milner, C.W. Wadsworth (1979),
Edinburgh LCF,
Springer-Verlag LNCS 78.

D. Milos, U. Pleban, G. Loegel (1984),
Direct Implementation of Compiler Specifications,
11th POPL.

L.C. Paulson (1981),
A Compiler Generator for Semantic Grammars,
PhD Thesis, Stanford University.

L.C. Paulson (1982),
A semantics-directed compiler generator,
Ninth ACM Symposium on Principles of Programming Languages, pages 224-233.

L.C. Paulson (1984),
Compiler generation from denotational semantics,
in B. Lorho, editor,
Methods and Tools for Compiler Construction,
Cambridge University Press, pages 219-250.
