# The Syntax of HOL Expressions

The basic language of the HOL logic is that of a typed lambda calculus. This simple syntax is augmented with many constructs that make expressions easier to input and to read. In the following, we discuss various aspects of the syntax of terms: how to directly construct them as well as how to parse and prettyprint them.

## The lexical structure of types and terms

• Type variables are like SML type variables:
```    :* -> **
```
from hol88 would be
```    :'a -> 'b
```
in hol90. Type constructors can be any identifier, i.e., an alphabetic character followed by any number of alphanumeric characters.
• The lexical structure for term identifiers is also much like that for SML: identifiers can be alphanumeric or symbolic. Variables must be alphanumeric. A symbolic identifier is any concatenation of the characters in the following list
```    explode "#?+*/\\=<>&%@!,:;_|~-";
```
with the exception of the keywords
```    ["\\", ";", "=>", "|", ":" ]
```
• Any alphanumeric can be a constant except the keywords
```    [ "let", "in", "and", "of" ].
```
• There are also two infinite families of constants: numbers and strings (if you have loaded the string library). Numbers are exactly the same as in hol88, while strings are much like SML strings.

• The separator for enumerated lists and sets is ";", for example
```    --`[1;2;3]`--
```
gives a three element list of type `:num list`.

## Functions for manipulating types and terms

hol90 supports a large number of primitive and derived functions for building and manipulating types and terms. These are originally defined in the modules Type, Term, and Dsyntax (which stands for "Derived syntax"). By and large, these functions have record types. Some users have asked that these functions be given tuple types, as in hol88. To meet these requests, we have gathered up all the built in term manipulation functions that have record types into two structures: Psyntax and Rsyntax. These structures are variants of each other, featuring identical functions with different types. The functions found in Rsyntax have record types, while those found in Psyntax have tuple or curried types, as in hol88. The functions in Rsyntax are initially in the pervasive environment. For example the function "dest_abs" in the initial hol90 environment has type
```    term -> {Bvar: term, Body :term}
```
Both paired and record versions may be intermixed through use of long identifiers. Typically, a user will do either
```    open Psyntax;
```
or
```    open Rsyntax;
```
and work in whatever environment they find most comfortable. Those who write packages for general purpose use will need to bear this in mind and, e.g., ensure that their package does not make assumptions about whether Psyntax or Rsyntax is open.

## Parsing and Quotation and Antiquotation

It is a bit tedious just to construct terms and types by direct construction. To alleviate this, HOL systems come with parsers for the type and term languages. These parsers come in various flavours, but all are found in the Parse structure. One important type of input to a parser is quotations. Quotation, and its sibling antiquotation, are features appearing in the original LCF system that have been ported to SML/NJ. A quotation is delimited by backquote marks, for example
```    `A /\ B`
```
In SML/NJ, quotations correspond to arbitrary object language expressions and hence they are a more general notion than in LCF and hol88, where a quotation can only be either a type or term. There can be quotations from many different object languages in the same SML/NJ session. A consequence of this is that quotations must be parsed explicitly, instead of implicitly, as in hol88. For example, the above quotation could be parsed as:
```    Parse.term_parser `A /\ B`
```
or, since the function "--" is a bit of syntactic sugaring for term_parser (suggested by Richard Boulton of Cambridge University), as
```    --`A /\ B`--
```
Similarly, types could be parsed as
```    Parse.type_parser `:(bool -> 'a) -> ('a # 'b) list`
```
or as
```    ==`:(bool -> 'a) -> ('a # 'b) list`==
```
One nicety of the system is that the hol90 prettyprinter will automatically surround a term or type with syntax that will re-parse it:
```    - Parse.term_parser `X \/ Y`;
val it = (--`X \/ Y`--) : term
```
This can be modified by altering the variables
```      Globals.type_pp_prefix     (* initially "(--"  *)
Globals.type_pp_suffix     (* initially "--)"  *)
Globals.term_pp_prefix     (* initially "(=="  *)
Globals.term_pp_suffix     (* initially "==)"  *)
```

## Antiquotation

For terms, antiquotes are much the same as in hol88. So
```    val tm = --`A /\ B`--;
val tm' = --`C /\ ^tm`--;
```
gives the same result as parsing
```    val tm' = --`C /\ A /\ B`--
```
For types, it is necessary to use a somewhat more devious approach: if you are going to use a type as an antiquotation, you must wrap the constructor ty_antiq around it:
```    val ty = ty_antiq(==`:('a -> 'b) -> 'a list -> 'b list`==)
val tm = --`(MAP:^ty) f`--;
```
In hol90, type antiquotations are needed to get the effect of type abbreviations in hol88.

## The grammar for terms

```TERM    :: SUFFIX
| APP_EXP SUFFIX
| APP_EXP => TERM | TERM   (* conditional expression *)
| APP_EXP
APP_EXP :: APP_EXP AEXP : TYPE       (* constrained application *)
| APP_EXP AEXP
| AEXP : TYPE               (* constrained atomic term *)
| AEXP
AEXP    :: identifier
| symbolic_identifier
| ^<ML-expression>  (* Antiquote *)
| "string"  (* string literal; only with theory of strings *)
| ( TERM )
| [ LIST ]
| { LIST }        (* when a set theory is loaded *)
| { TERM | TERM } (* when a set theory is loaded *)
SUFFIX  :: binder BV_LIST :: TERM . TERM (* restricted binder *)
| binder BV_LIST . TERM
| let BLIST in TERM
BV_LIST :: BV BV_LIST
| BV
BV      :: ( BV )
| identifier
| ^ <ML-expression>  (* Antiquote *)
| BV : TYPE
| ( BV , VSTRUCT )
VSTRUCT :: BV
| BV , VSTRUCT
BLIST   :: BV_LIST = TERM
| BV_LIST = TERM and BLIST
LIST    :: (* nothing, interpreted as the empty list *)
| TERM
| TERM ; LIST
```

## Some expansions performed in parsing

### Lambda Abstractions and Binders

```   <binder> x. tm --->  <binder>(\x. tm)

\(x,y). tm ---> UNCURRY(\x y. tm)

<binder> (x,y). tm --> <binder>(UNCURRY(\x y. tm))
```

### let terms

```    let x = 1 in tm  --> LET(\x.tm) 1

let x = 1
in let y = x + 1
in y+2  --> LET(\x. LET(\y. y+2)(x+1)) 1

let x = 1
and y = 2
in x+y     --> LET (LET (\x y. x+y) 1) 2
```

### Set abstractions

A set abstraction {tm | p} is parsed into
```    GSPEC (\vtuple. (tm,p))
```
where vtuple is the tuple made from the intersection of the free variables in tm and p.

## Precedence

Infixes in hol90 have precedence, and associate to the right, e.g., if "op" is an infix, then
```    A op B op C
```
parses to the same term as
```    A op (B op C).
```
Functions that introduce constants into the system have systematically been upgraded to include the parsing status of the constant. For example,
```  val new_infix : {Name:string,  Ty:hol_type, Prec:int} -> unit

val new_specification
:{name:string, sat_thm:thm,
consts:{const_name:string, fixity:fixity} list}
-> thm

val new_recursive_definition
:{def:term, fixity:fixity,
name:string, rec_axiom:thm} -> thm

val define_type
: {fixities:fixity list, name:string,
type_spec:term frag list} -> thm

```
where fixity is described by the following datatype:
```    datatype fixity = Prefix | Binder | Infix of int
```
To give constant c1 a higher precedence than constant c2, give it a higher number. The current precedences are
```          \$,  ---> 50
\$=  ---> 100
\$==>  ---> 200
\$\/  ---> 300
\$/\  ---> 400
\$>, \$<  ---> 450
\$>=, \$<=  ---> 450
\$+, \$-  ---> 500
\$*, \$DIV  ---> 600
\$MOD  ---> 650
\$EXP  ---> 700
\$o    ---> 800
```
For the library "sets", the precedences are
```         \$IN  ---> 450
\$SUBSET  ---> 450
\$PSUBSET  ---> 450
\$UNION  ---> 500
\$DIFF  ---> 500
\$INSERT  ---> 500
\$DELETE  ---> 500
\$INTER  ---> 600
```
The hol90 prettyprinter also uses precedence to eliminate as many brackets as possible.

## How terms are parsed

The information in this section is probably too detailed to include in a user manual; it is included anyway, for those intrepid users who want to know a bit more detail. Terms are currently parsed in 4 passes. The first pass leaves us with a structure where everything is associated to the left, conforming to the fact that application associates to the left and is highest in the precedence pecking order. The second pass sorts out the precedences of the user-defined infixes. This leaves us in the type of preterms.

The third pass is type inference. Type inference is by a variant of Milner's algorithm W; unification of types is done by Robinson's side-effecting unification algorithm. The declaration of hol_type allows two kinds of type variables: user and system. A user-given type variable is treated as a type constant during unification: it would be frustrating to constrain a term to have a certain type, only to have that type change by unification. A system type variable is modifiable by unification, but may not persist after the end of parsing.

The last pass removes dollarsigns from atom names and checks that all type variables are constrained. This pass maps from preterms to terms.

### How precedence is enforced (gory detail)

As already explained, the first pass of parsing has left us with a structure where everything is associated to the left, conforming to the fact that application associates to the left and is highest in the precedence pecking order. For us, all infixes associate to the right, with differing precedences. (The only exception to this is negation (~), a prefix operator.)

So we have the following preterm structure:

```    (...((X f) arg))
```
Since we have not "flipped" infixes to be prefixes yet, if the above expression is an infixed expression, f will be an infix constant, with (naively) X as its left arm and arg as its right arm. If we have an "infix chain", i.e., X is itself of the form (...((X' g) arg')), with g infix, then we have a tug-o-war between f and g for arg'. There are two cases, based on the precedences of f and g:
1. The precedence of g is strictly larger than f. Then arg' forms the right arm of g. The left arm of g is a suffix of X'. Then ((g left_arm) arg') becomes the left arm to f:
```  ((f ((g left_arm) arg')) arg)
```
2. Otherwise. arg' goes into ((f arg') arg), which forms the right arm for g.
We have glossed a bit, since at the time of looking at an infix, it is not possible to know what its left arm will be, so we put the infix and its right arm on a stack. We then move to the left in the term. Assume the next template is that of an infix: we have the two cases above. In case 1, we already have <f,arg> on the stack; we push <g,arg'> on top of that and continue. In case 2, we pop <f,arg> off the stack and make <g,((f arg') arg)>. We continue popping the stack and making values until we empty the stack or we encounter a <h,tm> pair on the stack with a precedence strictly less than g.

How does this terminate? When we run into a situation that doesn't meet the (...((X f) arg)) template. In that case, we unwind the stack by using a "dummy infix" with binding value less than anything on the stack.

#### Example

Suppose that we have declared infixes &, +, =, and ?, with the relative precedences being (& > +), (+ > ?), and (? > =). Our example is
```    A & B + C & D = A ? B + C ? D
```
Since there are no nested applications, we can treat this as a string. Put on the stack, the lookahead (LA) is +C. Since + > ?, we push <+,C> onto the stack and the algorithm traces as follows:
```
LA            stack
-------------------
+C

?B            <+,C>

=A

&D            <=,((? A) ((? ((+ B) C)) D))>

+C            <&,D>
<=,((? A) ((? ((+ B) C)) D))>

&B            <+,((& C) D)>
<=,((? A) ((? ((+ B) C)) D))>

A            <&,B>
<+,((& C) D)>
<=,((? A) ((? ((+ B) C)) D))>
```
Now unwind, by filling in all left hand sides on the stack.
```    1. ((& A) B)
2. ((+ ((& A) B)) ((& C) D))
3. ((= ((+ ((& A) B)) ((& C) D)))
((? A) ((? ((+ B) C)) D)))
```
If we pretty printed the result, we would get
```    (A & B) + (C & D) = A ? ((B + C) ? D)
```

## The grammar for types

```TYPE    :: TYPE -> TYPE
| TYPE + TYPE
| TYPE # TYPE
| APP_TY

APP_TY  :: TY_ARG identifier
| BASIC

TY_ARG  :: TY_ARG identifier
| BTY_ARG

BTY_ARG :: ( TYPE , TY_LIST )
| BASIC

TY_LIST :: TYPE , TY_LIST
| TYPE

BASIC   :: type_variable
| identifier      (* Type constant *)
| ^ <ML-expression>      (* Antiquoted type *)
| ( TYPE )
```

## The grammar for define_type specifications

```TYPE_SPEC :: ident = CLAUSES

CLAUSES :: CLAUSE
| CLAUSE | CLAUSES

CLAUSE  :: ""  (* definition of the empty string in the string library *)
| CONSTRUCTOR
| CONSTRUCTOR of TYPE
| CONSTRUCTOR of TYPE => CURRIED_TYPE

CURRIED_TYPE :: TYPE
| TYPE => CURRIED_TYPE

CONSTRUCTOR :: identifier
| symbolic_identifier
```

## Special cases in parsing

• The inclusion of ":" in the "symbolics" means that some constraints may need to be separated by white space. For example
```    \$=:bool->bool->bool
```
will be broken up by the lexer as
```    \$=: bool -> bool -> bool
```
and parsed as an application of the symbolic identifier "\$=:" to the term argument list ["bool", "->", "bool", "->", "bool"]. But
```    \$= :bool->bool->bool
```
is OK, being parsed as the symbolic identifier "=" constrained by a type.

• Another special case is that of ~, which is used for negation. It is the only prefix parsed identifier in the system. For example, the usual parsing of `A B C` is equivalent to `(A B) C`, since application is viewed as an invisible left associative infix operator with the top binding power. However, parsing `~ ~ A` is equivalent to parsing `~ (~ A)`. Furthermore, the tilde is really in a lexical class of its own, since it "breaks up" symbolic identifiers it's with (including itself): for example, if we take _|_ as an existing symbolic constant, then `~~_|_` is equivalent to `~(~( _|_ ))`. There is a "lexical shenanigan" that one can play to add identifiers with tilde as symbolic identifiers: add the lexeme of interest to the list referenced by the variable Globals.tilde_symbols. For example,
```    Globals.tilde_symbols := ["~*"];
new_infix{Name = "~*",
Ty   = `:bool -> bool -> bool`,
Prec = 450}
```
allows `A ~* B` to parse happily. Adding a "tilde"d symbolic identifier to the tilde_symbols suppresses the special nature of tilde in the symbolic identifier.
• A difference in the parsing of hol88 and hol90 is how type constraints work with infixes. In hol88
```    term infix term : hol_type
```
gets bracketed thus:
```    term infix (term : hol_type)
```
whereas in hol90, it is equivalent to the following parse:
```    (term infix term) : hol_type
```

Beginning of HOL syntax