Return-Path: <John.Harrison-request@cl.cam.ac.uk>
Delivery-Date: 
Received: from leopard.cs.byu.edu (no rfc931) by nene.cl.cam.ac.uk 
          with SMTP (PP-6.5) outside ac.uk; Fri, 17 Mar 1995 19:13:44 +0000
Received: by leopard.cs.byu.edu (1.37.109.15/16.2) id AA163245234;
          Fri, 17 Mar 1995 11:33:54 -0700
Sender: info-hol-request@lal.cs.byu.edu
Errors-To: info-hol-request@lal.cs.byu.edu
Precedence: list
Received: from swan.cl.cam.ac.uk by leopard.cs.byu.edu 
          with ESMTP (1.37.109.15/16.2) id AA162745208;
          Fri, 17 Mar 1995 11:33:29 -0700
Received: from auk.cl.cam.ac.uk (user jrh (rfc931)) by swan.cl.cam.ac.uk 
          with SMTP (PP-6.5) to cl; Fri, 17 Mar 1995 18:17:17 +0000
To: info-hol@leopard.cs.byu.edu
Subject: Re: memory consumption of load_theory in HOL90.7
In-Reply-To: Your message of "Fri, 17 Mar 1995 17:15:04 +0100." <95Mar17.171508met.8081@sunbroy14.informatik.tu-muenchen.de>
Date: Fri, 17 Mar 1995 18:17:12 +0000
From: John Harrison <John.Harrison@cl.cam.ac.uk>
Message-Id: <"swan.cl.cam.:290530:950317181721"@cl.cam.ac.uk>


One can split the "why C is more efficient than SML" reasons into two
groups. In the first group we have ways in which C disallows or
discourages inefficient programming constructs; SML programmers could
impose the same restrictions on themselves and hope the compiler will
understand.

* There are no dynamic free variables so everything can be accessed
  directly off two index registers without multiple dereferences.

* All dynamic variables are of fixed size, so stack adjustment on
  function call can be done by adding a constant.

* Facilities for direct manipulation of composite data structures and
  higher-order objects are minimal.

Then there are more fundamental reasons why C will always beat ML:

* Array dereferences are not bounds checked.

* There is real pointer arithmetic.

* One can hack data structures easily to fit the machine.

It seems these would lead to inconsistency if copied in SML (e.g one
could have an array of theorems, and pick up random data of type "thm" by
reading outside its bounds). However it's not clear these problems are
terribly important in practice. When I wrote identicalish BDD algorithms
in C and SML, I found C was about 10 times the speed; I suspect largely
because of the efficient pointer manipulations in C (using complement edges
probably exaggerates the difference). Don't forget also:

* There are greater commercial incentives to have efficient C compilers.

Then there is the vexed question of garbage collection. It really seems
that having a garbage collected language is a tremendous plus for theorem
provers like HOL, and any other symbolic computation systems. The
convenience value is tremendous. Doing it transparently in C *should* be
less efficient. Having said that, symbolic computation may be unusual.
In most systems software, for example, I suspect one wants a bit of
extensibility in the data structures, but their allocation/deallocation
follows the structure of the program very neatly, and will be far more
efficient with C mallocs. However SML fans may disagree. Are there any
papers which study this "objectively"?

In any case SML/NJ is a heavyweight system, as Konrad remarks. There are
some very economical versions of ML (e.g. CAML Light). However it remains
to be seen how they cope with really big examples.

John.
