Delivery-Date: 
Sender: info-hol-request@lal.cs.byu.edu
Errors-To: info-hol-request@lal.cs.byu.edu
Precedence: bulk
From: Konrad Slind <slind@informatik.tu-muenchen.de>
To: reetz@ira.uka.de, info-hol@leopard.cs.byu.edu
In-Reply-To: <"i80fs2.ira.490:11.04.94.11.27.50"@ira.uka.de> (message from reetz on Wed, 11 May 1994 19:27:47 +0200)
Subject: Re: does there exists a hol-lex?
Message-Id: <94May11.161355met_dst.8109@sunbroy14.informatik.tu-muenchen.de>
Date: Wed, 11 May 1994 16:13:55 +0200


reetz@ira.uka.de asks, I think, about extending the parsing of HOL terms
in hol90. There are a couple of ways to do this that I know of:

    A. Write your own parser for all HOL terms, including your new
       language.

    B. Write a parser for your new language, and somehow insert its
       results into HOL terms.

The second option is much easier and the "somehow insert its results
into HOL terms" bit is implemented by antiquotation. It suffers, however
when your new language is polymorphic, since you would like to use type
inference to help find types in that case.

If we turn to option A, as far as I know, nobody besides me has done
much in the way of extending the hol90 parser for new object languages.
However, the infrastructure is there, and the hol90 system does not
depend on which parser is in force. One just needs to modify the current
HOL grammar (in src/0/hol_lex and src/0/hol_yak) to accomodate the new
syntax, and rebuild the Parse structure (src/0/parse.sml). I can give
many more details on request.

> lexgen/mlyacc won't work well for a set of states large than 255,
> because then, it switches to integer arrays to represent them and
> there is a known bug with integer arrays, which causes SML/NJ to
> increase memory to 100MB or even more. So, for large grammars
> lexgen/mlyacc won't work very well up to now.

I think I was the first to discover this bug when I added
quote/antiquote to the SML/NJ compiler, bumping the lexer from 242
states to just over 255. It's very easy to get around: rather than
having each alphanumeric keyword (say) as a separate production, have a
single production for all alphanumerics and inside that, do a case
statement on the string denoted by "yytext".

Konrad.