Delivery-Date: 
Sender: info-hol-request@lal.cs.byu.edu
Errors-To: info-hol-request@lal.cs.byu.edu
Precedence: bulk
To: info-hol@leopard.cs.byu.edu
Subject: Re: Difficulties with large terms
In-Reply-To: Your message of "Fri, 06 Jan 1995 11:02:07 GMT." <3547.9501061102@frogland.inmos.co.uk>
Date: Wed, 25 Jan 1995 21:30:19 +0000
From: John Harrison <John.Harrison@cl.cam.ac.uk>
Message-Id: <"swan.cl.cam.:186310:950125213211"@cl.cam.ac.uk>


David Shepherd writes:

| The problem with using equality is that (if its permitted) it will perform
| a traversal down the trees to see if they are equal - if this goes in
| aconv then it could make aconv quadratic!

Yes, if you put it in at each stage. But one could, as Tim says, simply do an
equality test once before calling aconv at all. At worst this would take twice
as long, and many cases, like the one I was concerned about, would be
dramatically improved. By the way, I recall doing some tests of equality and
aconv between big terms which were equal structurally but were made of
different cons cells. Equality was *still* significantly faster than aconv,
even though in principle aconv does less work (i.e. ignores the names of bound
variables, though, ahem, not their type). This isn't entirely surprising as =
is hardwired into the compiler.

Anyway, Konrad showed me how to get an EQ test in SML/NJ, but it's a bit of a
hack:

    local open System.Unsafe
     in fun EQ (M:term,N:term) = ((cast M:int) = (cast N:int))
     end;

EQ certainly isn't in the Standard, I'm told (I wonder why?)

| ... HOL seems to have several things which turn out to be quadratic (or worse)
| but only get noticed on very big terms!

Yes, which makes "stress-testing" it rather instructive. For example, I
sometimes wonder about the efficiency of de Bruijn terms when the terms are (a)
very large and (b) rich in nested abstractions. Simply traversing the term,
regardless of what you actually do, could then be quadratic, because each
dest_abs stimulates a subtraversal replacing the dB index with a free variable
(lots more consing too...) On the other hand I suppose situations where (a) and
(b) are both true are not easy to imagine.

By the way, here are some results from my BDD package on standard benchmarks.

  Example    |  Nodes  |    C   |  ML  | Inf(1) |  Inf(2)
  ---------------------------------------------------------
  add1.be    |    425  |  0.01  | 0.23 |  12.32 |    24.94
  add2.be    |   1784  |  0.06  | 0.63 |  54.53 |   295.48
  add3.be    |   5471  |  0.40  | 3.47 | 174.67 |  2572.52
  add4.be    |  11251  |  0.80  | 8.71 | 373.40 | 10074.00
  ---------------------------------------------------------

  Example = Name of test circuit
  Nodes   = Number of nodes in BDD graph
  C       = Time for C implementation
  ML      = Time for straight SML implementation
  Inf(1)  = Time for derived rule after putting equality tests in TRANS & EQ_MP
  Inf(2)  = Time for derived rule without tweaking primitive rules

As you can see, Inf(1) is linear in the number of nodes, whereas Inf(2) is
quadratic. Maybe with enough ingenuity one can do better without changing the
primitive rules, but as I said originally, there are some quite general
arguments in favour of putting an equality test in there.

John.