@part(user, root "th.mss")
@appendix(Using the Grammar Analyzer)

When started, the grammar analyzer prints:

@begin(example)

Semantic Grammar Analyzer

Language:
@end(example)

If you reply ``ln,'' then the analyzer will look for a semantic grammar
with file name ln.SG, write the language description file on ln.LDF,
and produce a listing file on ln.LST.
The analyzer will optionally list the LALR(1) states or attribute
dependencies of the grammar.


@appendixSection(Lexical Conventions for Semantic Grammars)

The grammar analyzer uses the following ASCII representations of
special symbols in semantic grammars:

@begin(display)

@tabClear@tabSet(1inch,2.5inch)
@i(symbol @\representation @\note)
@z(_) @\@t(->)
@g(l) @\@t(\)
| @\@t(!)
@z(X) @\@t(X)  @\@b(x) is a reserved word!
@z(T) @\@t(BOT[)@r(@i(domain))@t(])  @\the domain must be given
@end(display)

Names are strings of up to fifteen
letters, digits, and underscores, beginning
with a letter.
Only variable names may contain digits.
The case of letters is not significant.
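
The naming rules can be sketched as a small checker.
This is only an illustration in Python, not the analyzer's own code;
the function names and the is_variable flag are hypothetical,
and the sketch does not check names against the reserved words.

```python
import re

# A name: a letter, then up to fourteen more letters, digits, or underscores
# (fifteen characters in total).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,14}")


def is_valid_name(text, is_variable=False):
    """Return True if text obeys the naming rules stated above.

    is_variable is a hypothetical flag: only variable names may contain digits.
    """
    if NAME_PATTERN.fullmatch(text) is None:
        return False
    if not is_variable and any(ch in "0123456789" for ch in text):
        return False
    return True


def same_name(a, b):
    """The case of letters is not significant."""
    return a.lower() == b.lower()
```

For instance, is_valid_name("env1") is false unless is_variable is set,
and same_name("Expr", "EXPR") is true.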

Reserved words:

@begin(display, faceCode b)

@tabDivide(5)
and @\attribute @\bot @\case @\define
div @\domain @\else @\end @\eq
esac @\false @\fi @\fix @\forward
ge @\gt @\if @\in @\is
le @\left @\let @\letrec @\lt
mod @\ne @\nonassoc @\not @\of
or @\propagate @\resolution @\right @\rule
then @\true @\unspec @\with @\x
@end(display)

Comments begin with @t(#) and continue until the end of the line.
A separator is a space, end of line, or comment.
Separators are required between reserved words, names, and numbers.


@appendixSection(Resolution of Parser Conflicts)

To resolve the dangling @b(else) conflict,
include @b(right) "@b(else)" in the resolution part,
and use the syntax
@begin(display)
statement = @!"@b(if)" expression "@b(then)" statement "@b(else)" statement |
@/"@b(if)" expression "@b(then)" statement | . . .
@end(display)
not
@begin(display)
statement = "@b(if)" expression "@b(then)" statement elsePart
elsePart = "@b(else)" statement | empty
@end(display)

To resolve operator precedence, use appropriate resolution declarations
and the syntax
@begin(display)
expression = expression "+" expression | expression "*" expression | . . .
@end(display)
not
@begin(display)
expression = expression operator expression | . . .
operator = "+" | "*" | . . .
@end(display)

To use the resolution part properly,
you must know exactly how the grammar analyzer resolves
conflicts in the parse table.
Suppose there is a conflict between @i(shift) on symbol s and
@i(reduce) by a rule whose rightmost terminal is t.
The conflict is unresolvable if the rule has no terminals,
or if neither s nor t is mentioned in the resolution part.
Otherwise,
the grammar analyzer resolves the conflict according to which operator
has the higher binding power @cite[aho78]:

@begin(display)

@tabClear@tabset(1inch,2inch)
@i(priority @\grouping@\parse table entry)
s<t @\any @\reduce
s>t @\any @\shift
s=t @\@b(left) @\reduce
s=t @\@b(right) @\shift
s=t @\@b(nonassoc) @\error
@end(display)

The analyzer does not resolve reduce-reduce conflicts.
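
The table above can be read as a small decision procedure.
The following Python sketch is an illustration only, not the analyzer's implementation.
It assumes a hypothetical dictionary mapping each symbol in the resolution part
to a positive priority and a grouping, with symbols of equal priority sharing one grouping.
It also assumes that a symbol missing from the resolution part ranks below every declared symbol;
the text above does not say how such a symbol is ranked.

```python
def resolve_shift_reduce(s, t, resolution):
    """Choose the parse table entry for a shift-reduce conflict.

    s          -- the symbol that would be shifted
    t          -- rightmost terminal of the rule to reduce, or None if it has none
    resolution -- hypothetical dict: symbol -> (priority, grouping),
                  priority a positive integer,
                  grouping one of "left", "right", "nonassoc"
    """
    if t is None:
        return "unresolvable"          # the rule has no terminals
    if s not in resolution and t not in resolution:
        return "unresolvable"          # neither symbol is mentioned
    # Assumption: an unmentioned symbol has priority 0, below all declared ones.
    s_priority, _ = resolution.get(s, (0, None))
    t_priority, t_grouping = resolution.get(t, (0, None))
    if s_priority < t_priority:
        return "reduce"
    if s_priority > t_priority:
        return "shift"
    # Equal priority: both symbols are declared, so the grouping decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[t_grouping]


# Hypothetical resolution part: "*" binds tighter than "+"; "else" groups right.
RESOLUTION = {"+": (1, "left"), "*": (2, "left"), "else": (3, "right")}
```

Under these assumptions, a conflict between shifting "else" and reducing by the
rule ending in "then" yields a shift, which attaches each @b(else) to the nearest @b(if).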

@appendixSection(Terminal Symbols)

Terminals in semantic grammars may contain alphanumeric (including underscore)
characters or non-alphanumeric characters, but not both.
They may not be empty, contain spaces, or begin with a digit.
A non-alphanumeric terminal may not be longer than two characters.
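
As a rough illustration of these restrictions, the following Python sketch
accepts or rejects a candidate terminal.
The function name is hypothetical, and the sketch assumes ASCII input;
it is not the analyzer's own check.

```python
import string

# Alphanumeric characters, including the underscore.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def is_valid_terminal(text):
    """Return True if text satisfies the restrictions on terminals stated above."""
    if text == "" or " " in text:
        return False                   # not empty, no spaces
    if text[0] in string.digits:
        return False                   # may not begin with a digit
    word_flags = [ch in WORD_CHARS for ch in text]
    if all(word_flags):
        return True                    # purely alphanumeric terminal
    if any(word_flags):
        return False                   # mixes the two kinds of character
    return len(text) <= 2              # non-alphanumeric: at most two characters
```

So "begin" and ":=" would be accepted, while "a+", "<=>", and "1x" would be rejected.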


@appendixSection(Error Messages)

The grammar analyzer does not produce a language description file
if there are errors in the semantic grammar.
It reports only the first error encountered in a
definition or rule.
The messages are:

@begin(description)

@b(Illegal character) @i(char) @\
The semantic grammar contains a non-printing character.

@b(15 characters maximum) @\
A name is longer than fifteen characters.

@b(No digits allowed here) @i(name) @\
Only variable names may contain digits.

@b(Another symbol expected) @i(symbol) @\
Syntax error in the input grammar.

@b(Undeclared identifier in expression) @i(name) @\
The given name is undefined.

@b(Command improperly terminated) @i(symbol) @\
A semicolon is expected after every definition and rule.

@b(Unmatched quotes) @\
A string constant or terminal symbol runs past the end of a line.

@b(Line too long  truncated) @\
The input line is too long to process.

@b(Factor expected) @\
The current symbol cannot begin a factor.

@b(Function required here) @i(domain) @\
The current operand must be a function.

@b(Sum domain required here) @i(domain) @\
The current operand must belong to a sum domain.

@b(Product domain required here) @i(domain) @\
The current operand must belong to a product domain.

@b(Equality undefined for this domain) @i(domain) @\
The current operand must allow equality testing;
it must not contain a function domain.

@b(No such domain) @i(name) @\
There is no domain of the listed name.
The name of a variable must, after digits are removed,
be a domain name, unless the domain of the variable is explicitly given.

@b(Domain mismatch) @i(domain1 domain2) @\
The two domains are required to be the same.

@b(Illegal redefinition) @i(name) @\
Attempt to redefine the given name.

@b(Wrong number of attributes) @\
The current nonterminal symbol is supplied
the wrong number of attributes.

@b(Illegal in a defining position) @\
The current attribute expression is in a defining position,
and may contain only constants, variables, tuples, and injections.

@b(WITH depends on other WITHs) @\
Pseudo-attributes may not depend upon each other.
Either a with name or a uniqueName attribute
appears in another with clause.

@b(Not a selector of a sum domain) @i(name) @\
The given name appears where a tag is required.

@b(Wrong selector for case) @\
The tags in a case expression must appear in the same order
as in the definition of the union domain.

@b(No such symbol in grammar) @i(symbol) @\
The given symbol is not a terminal symbol of the grammar.

@b(Illegal terminal symbol) @\
The symbol may not be used as a terminal because it violates the
restrictions on terminal symbols given above; for example, it contains spaces.

@b(No such nonterminal in grammar) @i(name) @\
The given name is not a nonterminal of the grammar.

@b(No strings generated by nonterminal) @i(name) @\
The given nonterminal cannot derive any string of terminal symbols.

@b(Unreachable nonterminal) @i(name) @\
The given nonterminal cannot be produced from the start symbol.

@b(Unresolvable conflict in kernel) @i(kernel number) @\
The given parsing kernel has a shift-reduce or reduce-reduce conflict,
using LALR(1) lookahead.

@b(Unresolved forward reference to) @i(name) @\
The given name was never defined,
but was used as a domain, expression, or attribute variable.

@b(Missing final end token) @\
Every grammar must be terminated by the token @b(end).

@end(description)



@appendix(Using the Universal Translator)

When started, the translator prints:

@begin(example)

Universal Translator

Program:
@end(example)

It expects a file name of the form pn.ln,
where pn is the program name and ln is the language name.
It uses pn.ln as the program source file,
looks for a language description file named ln.LDF,
writes a listing on pn.LST,
and writes object code on pn.COD.
The listing includes storage and run-time statistics,
the simplified semantics of the program,
and the generated code.
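
The file-naming convention can be summarized by a short Python sketch.
The function and the dictionary keys are hypothetical names chosen for illustration;
the translator itself is not written this way.

```python
def translator_files(reply):
    """Split a reply of the form pn.ln into the files the translator uses."""
    program, dot, language = reply.partition(".")
    if not dot or not program or not language:
        raise ValueError("expected a file name of the form pn.ln")
    return {
        "source": reply,                  # program source file, pn.ln
        "description": language + ".LDF", # language description file
        "listing": program + ".LST",      # listing file
        "code": program + ".COD",         # object code file
    }
```

For a reply of sort.pascal, the sketch names sort.pascal, pascal.LDF, sort.LST, and sort.COD.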

@appendixSection(Recursive Functions)
@label(recursive)

The universal translator distinguishes two kinds of recursive functions.
A function defined using @b(forward) and @b(define) declarations
is @i(compile-time recursive);
the simplifier will always apply it to its arguments, expanding its definition.
Every instance must disappear from the DAG before code generation,
or the translator will signal an error.
Use compile-time recursion for type-checking or bookkeeping functions that
require evaluation during compilation.

A function defined using @b(letrec) or @b(fix)
is @i(run-time recursive);
it is never expanded during simplification.
It will be translated into machine instructions and called at run-time.
Use run-time recursion for the semantics of loops.

@appendixSection(Syntax Errors)

If the parser detects a syntax error,
it prints a list of expected symbols and stops.
(The symbols are the current LALR(1) lookahead;
some of them may not really be valid continuations of the input.)

The scanner detects the following lexical errors:

@begin(description)
@b(Line longer than 80 characters) @\
The input line is too long to process.

@b(Name longer than 15 characters) @\
The current name is longer than fifteen characters.

@b(End of file in comment) @\
A comment has no closing bracket,
but continues until the end of the file.

@b(Illegal character) @\
The program contains a non-printing character.

@b(No such symbol in language) @\
The current character has no meaning in the language.
@end(description)

@appendixSection(Semantic Errors)

The translator prints information to
pinpoint the cause of a semantic error.
The first line of a message names the nonterminal containing the error;
the name is blank if the error occurred in a predefined nonterminal
or @b(with) clause.
@display(@b[Semantic error:]  nonterminal)

If the error is an attribute equal to @z(T),
then the next line lists the name, if any,
of the attribute domain:
@display(@b(Undefined attribute)  domain name)

If the error is a circularly defined attribute,
then the next line is
@display{@b(Circularly defined attribute)  domain name}

If the error is a failed constraint,
then the next message line indicates the constraint's form:

@begin(display)

@tabSet(2inch)
@i(constraint@\message)
x @b(is) tag @\@b(Should be) tag
x @b(eq) y @\@b(Attribute mismatch)
name(args) @\@b(Failed check:) name
other @\no message
@end(display)

Using meaningful names in your semantic grammar will make
the error messages more readable.



@appendix(Using the Stack Machine)

The stack machine prompts for its code, input, and output files.
After reading the object code and input data,
it starts executing the object program.
If the program terminates, the machine prints the current state and the value
on top of the stack.
A value of @z(T) indicates that the program aborted.
The machine also aborts whenever it attempts to perform arithmetic on @z(T).

@appendixSection(Debugging)

The stack machine does not provide a debugger;
the state it prints is usually undecipherable.
The only useful information printed is the program counter.
The listing from the universal translator includes the simplified semantic DAG,
as well as the object code.
Comparing the code with the DAG,
and the DAG with the source program,
will locate where the program aborted.

If the machine crashes because the dump has overflowed,
then the object program contains a function that loops by calling itself.
(Less likely, the program may contain a tail-recursive call that the machine
does not recognize.)
The return address, repeated in almost every element of the dump,
pinpoints the erroneous function call.

@appendixSection(Input and Output)

A semantic grammar must be compatible with the stack machine's input/output
conventions for programs to run properly.
Input and output are sequences of integers;
a list k@-(1), . . ., k@-(n) is represented as some permutation of:
@display{ ([1@z(_)k@-(1)] . . . [n@z(_)k@-(n)] @z(T), n) }
The semantics of a program is a mapping from the input file to the output file:

@begin(display)

@tabSet(3inch)
INTMAP = INT @Z(_) INT; @\#contents of a file
FILE = INTMAP @Z(X) INT; @\#file including length
PROGRAM = FILE @Z(_) FILE; @\#input/output semantics
@end(display)

The Pascal grammar uses a dangerous trick:
it defines INTMAP as INT@Z(_)VALUE instead of INT@Z(_)INT.
Since VALUE is a union domain that includes INT,
this works if tagfields are suppressed in the object code.
The grammar specifies type-checking to guarantee that the VALUEs in the
Pascal files really are INTegers.

The grammar must build the output file entirely from function updates,
not lambda-abstractions.
Although abstractions define legitimate functional values,
the stack machine's output routine cannot evaluate them.
Instead of
@display{(@g(l)int.int+10, 3)}
the machine requires
@display{([1@z(_)11][2@z(_)12][3@z(_)13] @z(T), 3)}
or another permutation, such as
@display{([3@z(_)13][1@z(_)11][2@z(_)12] @z(T), 3)}
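
The difference between the two forms can be illustrated in Python.
In this sketch, which is an analogy and not the stack machine's representation,
a dictionary stands for a chain of function updates applied to the undefined function,
and a Python lambda stands for an abstraction.
The names materialize and read_file are hypothetical.

```python
def materialize(function, length):
    """Build a file (contents, length) from explicit updates only.

    Each assignment plays the role of one function update [index -> value];
    an index that was never updated stays undefined (absent from the dict).
    """
    contents = {}
    for index in range(1, length + 1):
        contents[index] = function(index)
    return (contents, length)


def read_file(file):
    """List the values of a file in index order, as an output routine might."""
    contents, length = file
    return [contents[index] for index in range(1, length + 1)]


# The abstraction from the text, turned into three explicit updates.
output_file = materialize(lambda n: n + 10, 3)
```

Because the dictionary records only which index maps to which value,
the order in which the updates were made does not matter,
just as any permutation of the updates is acceptable to the machine.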


@appendixSection(A Cautionary Note)
@label(cautionary)

A semantic formula may be correct but
extremely inefficient to execute.

I represent a Pascal set by its characteristic function:
@b(true) for the members of the set,
@b(false) for the non-members.
@display(SET = INT @Z(_) BOOL;)
The obvious definition of the empty set, @g(l)int.@b(false),
works in practice;
the obvious definitions of the set operations, such as union, do not.

My first attempt to define union was
@display{set1 @z(U) set2   =    @g(l)int. set1(int) @b(or) set2(int) }
As a program executed,
each set operation created a closure as the set's value.
Each closure referenced two others.
Over time, each set became a tree of closures.
Testing set membership caused evaluation of the entire tree.
Furthermore, the closures all contained environments referencing the state.
The array compactor bogged down in the hundreds of extra references.

I solved the problem by defining the union operation to construct a new set by
inserting the elements of set1 and set2 into the empty set, as function updates.
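
The two definitions can be contrasted in a Python sketch.
This is an analogy only: Python closures stand in for the closures built by the stack machine,
and a dictionary of explicit members stands in for a chain of function updates.
All names are hypothetical, and the sketch assumes that the members of an
update-based set can be enumerated, which the text above does not spell out.

```python
# Closure representation: a set is its characteristic function.
def empty_closure(n):
    return False


def singleton_closure(k):
    return lambda n: n == k


def union_closure(set1, set2):
    # Each union builds a new closure that references both operands,
    # so repeated unions grow into a tree that a membership test must walk.
    return lambda n: set1(n) or set2(n)


# Update representation: a set is the empty set plus explicit updates.
def union_updates(set1, set2):
    result = {}                        # the empty set: every test yields False
    for member in set1:
        result[member] = True          # one function update per element
    for member in set2:
        result[member] = True
    return result


def is_member(updates, n):
    return updates.get(n, False)
```

With closures, a set built by fifty unions answers a membership test by calling up to
fifty nested functions; with updates, the same test is a single lookup.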
