The Syntax of HOL Expressions

The basic language of the HOL logic is that of a typed lambda calculus. This simple syntax is augmented with many constructs that make expressions easier to input and to read. In the following, we discuss various aspects of the syntax of terms: how to directly construct them as well as how to parse and prettyprint them.

User Manual Table of Contents


The lexical structure of types and terms

Beginning of HOL syntax
User Manual Table of Contents


Functions for manipulating types and terms

hol90 supports a large number of primitive and derived functions for building and manipulating types and terms. These are originally defined in the modules Type, Term, and Dsyntax (which stands for "Derived syntax"). By and large, these functions have record types. Some users have asked that these functions be given tuple types, as in hol88. To meet these requests, we have gathered up all the built in term manipulation functions that have record types into two structures: Psyntax and Rsyntax. These structures are variants of each other, featuring identical functions with different types. The functions found in Rsyntax have record types, while those found in Psyntax have tuple or curried types, as in hol88. The functions in Rsyntax are initially in the pervasive environment. For example the function "dest_abs" in the initial hol90 environment has type
    term -> {Bvar: term, Body :term}
Both paired and record versions may be intermixed through use of long identifiers. Typically, a user will do either
    open Psyntax;
or
    open Rsyntax;
and work in whatever environment they find most comfortable. Those who write packages for general purpose use will need to bear this in mind and, e.g., ensure that their package does not make assumptions about whether Psyntax or Rsyntax is open.

Beginning of HOL syntax
User Manual Table of Contents


Parsing and Quotation and Antiquotation

It is a bit tedious just to construct terms and types by direct construction. To alleviate this, HOL systems come with parsers for the type and term languages. These parsers come in various flavours, but all are found in the Parse structure. One important type of input to a parser is quotations. Quotation, and its sibling antiquotation, are features appearing in the original LCF system that have been ported to SML/NJ. A quotation is delimited by backquote marks, for example
    `A /\ B`
In SML/NJ, quotations correspond to arbitrary object language expressions and hence they are a more general notion than in LCF and hol88, where a quotation can only be either a type or term. There can be quotations from many different object languages in the same SML/NJ session. A consequence of this is that quotations must be parsed explicitly, instead of implicitly, as in hol88. For example, the above quotation could be parsed as:
    Parse.term_parser `A /\ B`
or, since the function "--" is a bit of syntactic sugaring for term_parser (suggested by Richard Boulton of Cambridge University), as
    --`A /\ B`--
Similarly, types could be parsed as
    Parse.type_parser `:(bool -> 'a) -> ('a # 'b) list`
or as
    ==`:(bool -> 'a) -> ('a # 'b) list`==
One nicety of the system is that the hol90 prettyprinter will automatically surround a term or type with syntax that will re-parse it:
    - Parse.term_parser `X \/ Y`;
    val it = (--`X \/ Y`--) : term
This can be modified by altering the variables
      Globals.type_pp_prefix     (* initially "(--"  *)
      Globals.type_pp_suffix     (* initially "--)"  *)
      Globals.term_pp_prefix     (* initially "(=="  *)
      Globals.term_pp_suffix     (* initially "==)"  *)

Antiquotation

For terms, antiquotes are much the same as in hol88. So
    val tm = --`A /\ B`--;
    val tm' = --`C /\ ^tm`--;
gives the same result as parsing
    val tm' = --`C /\ A /\ B`--
For types, it is necessary to use a somewhat more devious approach: if you are going to use a type as an antiquotation, you must wrap the constructor ty_antiq around it:
    val ty = ty_antiq(==`:('a -> 'b) -> 'a list -> 'b list`==)
    val tm = --`(MAP:^ty) f`--;
In hol90, type antiquotations are needed to get the effect of type abbreviations in hol88.

Beginning of HOL syntax
User Manual Table of Contents


The grammar for terms

TERM    :: SUFFIX 
         | APP_EXP SUFFIX 
         | APP_EXP => TERM | TERM   (* conditional expression *)
         | APP_EXP
APP_EXP :: APP_EXP AEXP : TYPE       (* constrained application *)
         | APP_EXP AEXP
         | AEXP : TYPE               (* constrained atomic term *)
         | AEXP 
AEXP    :: identifier
         | symbolic_identifier
         | ^<ML-expression>  (* Antiquote *)
         | "string"  (* string literal; only with theory of strings *)
         | ( TERM )
         | [ LIST ]
         | { LIST }        (* when a set theory is loaded *)
         | { TERM | TERM } (* when a set theory is loaded *)
SUFFIX  :: binder BV_LIST :: TERM . TERM (* restricted binder *)
         | binder BV_LIST . TERM 
         | let BLIST in TERM 
BV_LIST :: BV BV_LIST 
         | BV
BV      :: ( BV )
         | identifier
         | ^ <ML-expression>  (* Antiquote *)
         | BV : TYPE 
         | ( BV , VSTRUCT )
VSTRUCT :: BV
         | BV , VSTRUCT
BLIST   :: BV_LIST = TERM 
         | BV_LIST = TERM and BLIST
LIST    :: (* nothing, interpreted as the empty list *)
         | TERM
         | TERM ; LIST

Beginning of HOL syntax
User Manual Table of Contents


Some expansions performed in parsing

Lambda Abstractions and Binders

   <binder> x. tm --->  <binder>(\x. tm)

   \(x,y). tm ---> UNCURRY(\x y. tm)

   <binder> (x,y). tm --> <binder>(UNCURRY(\x y. tm))

let terms

    let x = 1 in tm  --> LET(\x.tm) 1

    let x = 1 
    in let y = x + 1 
       in y+2  --> LET(\x. LET(\y. y+2)(x+1)) 1

    let x = 1 
    and y = 2 
    in x+y     --> LET (LET (\x y. x+y) 1) 2

Set abstractions

A set abstraction {tm | p} is parsed into
    GSPEC (\vtuple. (tm,p))
where vtuple is the tuple made from the intersection of the free variables in tm and p.

Beginning of HOL syntax
User Manual Table of Contents


Precedence

Infixes in hol90 have precedence, and associate to the right, e.g., if "op" is an infix, then
    A op B op C 
parses to the same term as
    A op (B op C). 
Functions that introduce constants into the system have systematically been upgraded to include the parsing status of the constant. For example,
  val new_infix : {Name:string,  Ty:hol_type, Prec:int} -> unit

  val new_specification 
     :{name:string, sat_thm:thm,
       consts:{const_name:string, fixity:fixity} list}
      -> thm

  val new_recursive_definition 
     :{def:term, fixity:fixity, 
       name:string, rec_axiom:thm} -> thm

  val define_type
     : {fixities:fixity list, name:string, 
        type_spec:term frag list} -> thm

where fixity is described by the following datatype:
    datatype fixity = Prefix | Binder | Infix of int
To give constant c1 a higher precedence than constant c2, give it a higher number. The current precedences are
          $,  ---> 50
          $=  ---> 100
        $==>  ---> 200
         $\/  ---> 300
         $/\  ---> 400
      $>, $<  ---> 450
    $>=, $<=  ---> 450
      $+, $-  ---> 500
    $*, $DIV  ---> 600
        $MOD  ---> 650
        $EXP  ---> 700
        $o    ---> 800
For the library "sets", the precedences are
         $IN  ---> 450
     $SUBSET  ---> 450
    $PSUBSET  ---> 450
      $UNION  ---> 500
       $DIFF  ---> 500
     $INSERT  ---> 500
     $DELETE  ---> 500
      $INTER  ---> 600
The hol90 prettyprinter also uses precedence to eliminate as many brackets as possible.

Beginning of HOL syntax
User Manual Table of Contents


How terms are parsed

The information in this section is probably too detailed to include in a user manual; it is included anyway, for those intrepid users who want to know a bit more detail. Terms are currently parsed in 4 passes. The first pass leaves us with a structure where everything is associated to the left, conforming to the fact that application associates to the left and is highest in the precedence pecking order. The second pass sorts out the precedences of the user-defined infixes. This leaves us in the type of preterms.

The third pass is type inference. Type inference is by a variant of Milner's algorithm W; unification of types is done by Robinson's side-effecting unification algorithm. The declaration of hol_type allows two kinds of type variables: user and system. A user-given type variable is treated as a type constant during unification: it would be frustrating to constrain a term to have a certain type, only to have that type change by unification. A system type variable is modifiable by unification, but may not persist after the end of parsing.

The last pass removes dollarsigns from atom names and checks that all type variables are constrained. This pass maps from preterms to terms.

How precedence is enforced (gory detail)

As already explained, the first pass of parsing has left us with a structure where everything is associated to the left, conforming to the fact that application associates to the left and is highest in the precedence pecking order. For us, all infixes associate to the right, with differing precedences. (The only exception to this is negation (~), a prefix operator.)

So we have the following preterm structure:

    (...((X f) arg))
Since we have not "flipped" infixes to be prefixes yet, if the above expression is an infixed expression, f will be an infix constant, with (naively) X as its left arm and arg as its right arm. If we have an "infix chain", i.e., X is itself of the form (...((X' g) arg')), with g infix, then we have a tug-o-war between f and g for arg'. There are two cases, based on the precedences of f and g:
  1. The precedence of g is strictly larger than f. Then arg' forms the right arm of g. The left arm of g is a suffix of X'. Then ((g left_arm) arg') becomes the left arm to f:
      ((f ((g left_arm) arg')) arg)
    
  2. Otherwise. arg' goes into ((f arg') arg), which forms the right arm for g.
We have glossed a bit, since at the time of looking at an infix, it is not possible to know what its left arm will be, so we put the infix and its right arm on a stack. We then move to the left in the term. Assume the next template is that of an infix: we have the two cases above. In case 1, we already have <f,arg> on the stack; we push <g,arg'> on top of that and continue. In case 2, we pop <f,arg> off the stack and make <g,((f arg') arg)>. We continue popping the stack and making values until we empty the stack or we encounter a <h,tm> pair on the stack with a precedence strictly less than g.

How does this terminate? When we run into a situation that doesn't meet the (...((X f) arg)) template. In that case, we unwind the stack by using a "dummy infix" with binding value less than anything on the stack.

Example

Suppose that we have declared infixes &, +, =, and ?, with the relative precedences being (& > +), (+ > ?), and (? > =). Our example is
    A & B + C & D = A ? B + C ? D
Since there are no nested applications, we can treat this as a string. Put on the stack, the lookahead (LA) is +C. Since + > ?, we push <+,C> onto the stack and the algorithm traces as follows:

     LA            stack
     -------------------
     +C            

     ?B            <+,C>
                   

                   
                   

     =A            

     &D            <=,((? A) ((? ((+ B) C)) D))>

     +C            <&,D>
                   <=,((? A) ((? ((+ B) C)) D))>

     &B            <+,((& C) D)>
                   <=,((? A) ((? ((+ B) C)) D))>

      A            <&,B>
                   <+,((& C) D)>
                   <=,((? A) ((? ((+ B) C)) D))>
Now unwind, by filling in all left hand sides on the stack.
    1. ((& A) B)
    2. ((+ ((& A) B)) ((& C) D))
    3. ((= ((+ ((& A) B)) ((& C) D)))
        ((? A) ((? ((+ B) C)) D)))
If we pretty printed the result, we would get
    (A & B) + (C & D) = A ? ((B + C) ? D)

Beginning of HOL syntax
User Manual Table of Contents


The grammar for types

TYPE    :: TYPE -> TYPE 
         | TYPE + TYPE
         | TYPE # TYPE
         | APP_TY

APP_TY  :: TY_ARG identifier
         | BASIC 

TY_ARG  :: TY_ARG identifier
         | BTY_ARG

BTY_ARG :: ( TYPE , TY_LIST )
         | BASIC

TY_LIST :: TYPE , TY_LIST
         | TYPE 

BASIC   :: type_variable
         | identifier      (* Type constant *)
         | ^ <ML-expression>      (* Antiquoted type *)
         | ( TYPE )

Beginning of HOL syntax
User Manual Table of Contents


The grammar for define_type specifications

TYPE_SPEC :: ident = CLAUSES 

CLAUSES :: CLAUSE
         | CLAUSE | CLAUSES

CLAUSE  :: ""  (* definition of the empty string in the string library *)
         | CONSTRUCTOR
         | CONSTRUCTOR of TYPE 
         | CONSTRUCTOR of TYPE => CURRIED_TYPE

CURRIED_TYPE :: TYPE 
              | TYPE => CURRIED_TYPE

CONSTRUCTOR :: identifier 
             | symbolic_identifier

Beginning of HOL syntax
User Manual Table of Contents


Special cases in parsing


Beginning of HOL syntax
User Manual Table of Contents
Konrad Slind, slind@informatik.tu-muenchen.de