1. Title: Applying machine learning to the problem of choosing a heuristic
to select the variable ordering for cylindrical algebraic decomposition
2. Sources:
(a) Zongyan Huang, Lawrence C Paulson and James Bridge
University of Cambridge
Computer Laboratory
William Gates Building
15 JJ Thomson Avenue
Cambridge CB3 0FD, UK
{zh242, lp15, jpb65}@cam.ac.uk
Matthew England, David Wilson and James H Davenport
University of Bath
Department of Computer Science
Bath, BA2 7AY, UK
{J.H.Davenport, M.England, D.J.Wilson}@bath.ac.uk
(b) Zongyan Huang - details as (a).
(c) Date: 31st March 2014
3. Past Usage:
(a) Applying machine learning to the problem of choosing a heuristic to select the variable
ordering for cylindrical algebraic decomposition, Zongyan Huang, Matthew England,
David Wilson, James H Davenport, Lawrence C Paulson and James Bridge, submitted to
Conferences on Intelligent Computer Mathematics, CICM 2014
(b) We wish to predict which of three CAD heuristics will suggest an optimal variable
ordering, i.e. one minimising the number of cells in the computed CADs.
(c) We have investigated the use of machine learning to choose the variable ordering
for CAD and for quantifier elimination by CAD. In both cases we found that machine
learning allows a superior choice to any of the three individual heuristics tested.
4. Relevant Information Paragraph:
Files:
raw.csv            - raw data used to derive the training,
                     validation and test data.
CAD-statistics.txt - min, max, mean and standard deviation
                     for the raw data. (Tabulated below.)
train.csv,
validation.csv,    - the actual training, validation and
test.csv             test sets used.
Raw data:
Problems were taken from the nlsat dataset. This was chosen as it was the
only available set with sufficient numbers of problems for machine learning.
7001 three-variable CAD problems were extracted for our experiment.
Two separate experiments were undertaken, applying machine learning to
CAD itself and to quantifier elimination by CAD. The problems from the nlsat
database are all fully existential (satisfiability or SAT problems) and for
the quantifier-free experiments we simply removed all quantifiers.
Columns 1 to 11 are the features selected. (See the paper for a description of
features used.)
Training, validation and test data:
In both experiments, the data was randomly split into a training set (3545 problems),
a validation set (1735 problems) and a test set (1721 problems). The features in the
training set are normalized to zero mean and unit variance. The validation and test data
were normalized using the coefficients computed from the training set. The total number
of problems for which a heuristic suggested an optimal variable ordering was used to
measure the efficacy of the machine-learned selection process. Labels are in the final six columns.
The first three of those columns correspond to the three heuristics (Brown, sotd and ndrr)
for the quantifier-free problems, and the last three to the same heuristics for the
quantified problems. Each contains +1 if the corresponding heuristic suggested a variable
ordering with the lowest number of cells, or −1 otherwise.
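The split-then-normalise procedure described above can be sketched as follows (a minimal illustration; the helper names, and the assumption that each CSV row holds the 11 feature columns followed by the 6 label columns, are ours):

```python
import csv

def load_rows(path):
    """Read a CSV of floats: 11 feature columns followed by 6 label columns."""
    with open(path) as f:
        return [[float(x) for x in row] for row in csv.reader(f)]

def column_stats(rows, n_features=11):
    """Mean and standard deviation of each feature column (training set only)."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(n_features)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5
           for j in range(n_features)]
    return means, sds

def normalise(rows, means, sds, n_features=11):
    """Zero-mean, unit-variance scaling; label columns pass through unchanged.

    Validation/test rows are scaled with the training-set means and sds,
    as described in the text.
    """
    return [[(r[j] - means[j]) / sds[j] for j in range(n_features)] + r[n_features:]
            for r in rows]
```

Note the design point from the text: `column_stats` is computed on the training set only, and those same coefficients are reused for the validation and test sets.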
5. Number of Instances:
7001 in the raw data.
The training, validation and test sets have 3545, 1735 and 1721 instances
respectively.
6. Number of Attributes:
There are 11 features for each instance.
7. Description of attributes:
Full name of attributes:
number of polynomials
max total degree of polynomials (the maximum over all polynomials of their total degree)
max degree of x0 among all polynomials
max degree of x1 among all polynomials
max degree of x2 among all polynomials
proportion of polynomials in which x0 occurs (number of polynomials containing the variable x0 divided by total number of polynomials)
proportion of polynomials in which x1 occurs (number of polynomials containing the variable x1 divided by total number of polynomials)
proportion of polynomials in which x2 occurs (number of polynomials containing the variable x2 divided by total number of polynomials)
proportion of monomials in which x0 occurs (number of monomials containing the variable x0 divided by total number of monomials across all polynomials)
proportion of monomials in which x1 occurs (number of monomials containing the variable x1 divided by total number of monomials across all polynomials)
proportion of monomials in which x2 occurs (number of monomials containing the variable x2 divided by total number of monomials across all polynomials)
Raw data: all attributes are numeric and continuous.
Training, validation and test data: all data are numeric and continuous.
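The 11 features can be computed directly from the input polynomials. A minimal sketch (our own illustration, not the original extraction pipeline; each polynomial is represented as a list of exponent tuples, one per monomial, over the variables (x0, x1, x2)):

```python
# Each polynomial is a list of monomials; each monomial is an exponent
# tuple (e0, e1, e2) giving the degrees of (x0, x1, x2) in that term.
def features(polys):
    """Compute the 11 features of section 7 for a three-variable problem."""
    monos = [m for p in polys for m in p]              # all monomials pooled
    f = [len(polys),                                   # 1: number of polynomials
         max(sum(m) for m in monos)]                   # 2: max total degree
    for i in range(3):                                 # 3-5: max degree of x_i
        f.append(max(m[i] for m in monos))
    for i in range(3):                                 # 6-8: proportion of
        f.append(sum(1 for p in polys                  #      polynomials containing x_i
                     if any(m[i] > 0 for m in p)) / len(polys))
    for i in range(3):                                 # 9-11: proportion of
        f.append(sum(1 for m in monos                  #       monomials containing x_i
                     if m[i] > 0) / len(monos))
    return f
```

For example, the pair x0**2 + x1 and x1*x2 + 1 would be encoded as [(2, 0, 0), (0, 1, 0)] and [(0, 1, 1), (0, 0, 0)].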
8. Missing Attribute Values:
There are no missing values.
9. Class Distribution: number of positive instances in the sets for
each heuristic. (More than one heuristic can achieve the optimal ordering
for a given problem, so the counts in a row may sum to more than the
size of the set.)
Quantifier free
Brown sotd ndrr
Training set: 2270 2121 1862
Validation set: 1146 1061 886
Test set: 1107 1039 872
Quantified case
Brown sotd ndrr
Training set: 2601 2323 2033
Validation set: 1295 1171 1016
Test set: 1270 1109 951
10. Attribute statistics:
Statistics for the raw data (7001 problems):
Attribute Min Max Mean SD
1 1 25 7.96486 3.68819
2 1 44 7.07128 6.13013
3 1 44 6.08856 6.20234
4 1 22 2.33338 2.14282
5 1 7 1.23882 0.51474
6 0.125 1 0.56049 0.15526
7 0.0588 1 0.51096 0.14110
8 0.0588 1 0.40315 0.14471
9 0.0588 1 0.32563 0.10346
10 0.0323 0.8889 0.29831 0.10455
11 0.0323 0.8889 0.20837 0.08596
Statistics for the combined training, validation and test data:
We do not include mean and standard deviation as the data are
normalized.
                            Correlation with predicted attribute
                            Quantifier free               Quantified
Attribute   Min      Max    Brown    sotd     ndrr        Brown    sotd     ndrr
1 -1.8941 4.6051 -0.02811 0.06264 0.15782 -0.00753 0.03197 0.10014
2 -0.9928 5.9944 -0.01006 -0.12853 -0.16914 -0.06095 -0.12492 -0.16391
3 -0.8259 6.0828 -0.01067 -0.15758 -0.21268 -0.05271 -0.14019 -0.19365
4 -0.6196 8.8794 -0.19152 -0.02973 0.12475 -0.03824 0.04362 0.05879
5 -0.4546 11.3667 0.10261 0.08730 0.17186 0.10427 0.06649 0.18545
6 -2.8087 2.8893 0.13011 -0.04733 0.04724 0.13360 -0.00431 0.06817
7 -3.1824 3.4612 -0.00906 0.03095 0.15695 -0.02572 0.03783 0.11560
8 -2.376 4.1408 0.00477 0.11239 0.19709 0.00481 0.09790 0.22226
9 -2.5881 6.5909 0.08594 -0.01240 0.11293 0.06036 0.00521 0.06612
10 -2.5487 5.6949 -0.02975 0.06023 0.24894 -0.03304 0.05329 0.16939
11 -2.0685 8.0357 -0.01730 0.07064 0.15874 0.02064 0.07123 0.18664