Comparing proof search in LEO-II and Isabelle/HOL
Nik Sultana and Lawrence Paulson, April 2011
The file "results.csv" contains the results of experiments to compare
the strength of two systems: LEO-II (v1.2.6 and linked to E 1.2) and
Isabelle/HOL 2011. Isabelle/HOL is an interactive theorem-prover first
and foremost, but it also offers a comparatively advanced level of
automation for an ITP. We tested the following tactics available in
Isabelle/HOL, of which further description can be obtained from
Isabelle's documentation:
- auto
- force
- sledmetis (this refers to Sledgehammer coupled with Metis, run after
having first massaged the problem with 'auto'),
- bestsimp
- fastsimp
- metis
- slowsimp
- sled (this is identical to sledmetis save for the
preprocessing using 'auto').
These tactics were pitted against LEO-II on all the THF (higher-order) problems
in TPTP 5.1.0. This set consisted of 2798 problems. Each system was
run for 120 seconds.
The CSV file contains the outcome of these experiments: it contains
a line for every problem, and details the following in order:
- problem name
- problem category (from TPTP classification)
- SZS annotation given in problem file
(i.e. this indicates the "expected outcome" of running
a prover on this problem).
- 2 columns per prover indicating the outcome of the
experiment and the time taken to reach it.
Times are shown in milliseconds. The outcome of the experiment is
shown using two ontologies:
- LEO-II results use the SZS ontology
- Isabelle results use a simpler classification:
- OK means that a theorem was produced
- NOK means that a theorem could not be produced.
- TIME means that the process timed out, and a theorem
was not produced.
Apart from the "TIME" status, a timeout is also indicated by
having a "time taken" of either -1 or 120000.
Please send any queries to nik.sultana "at" cl.cam.ac.uk or lp15 "at" cam.ac.uk.
Last updated: 1st Apr 2011
Related links: