=========================================================================
  Some Notes on MetiTarski Tracing for Machine Learning :: 22-Nov-2011
=========================================================================
 
 I - Key commands
  
   i. --ml_trace : Output a trace file (X.trace if the problem is X.tptp)
         consisting of machine-learning data for the input problem.

      Example:

       > metit --ml_trace --verbose 0 exp-361-2.tptp

        ----------------------------------------------------
         Gathering machine learning data for an RCF problem
        ----------------------------------------------------
        - Analysing QepcadB:     ..
        - Analysing Mathematica: ................
        ----------------------------------------------------

        ----------------------------------------------------
         Gathering machine learning data for an RCF problem
        ----------------------------------------------------
        - Analysing QepcadB:     ..
        - Analysing Mathematica: ................
        ----------------------------------------------------

        ----------------------------------------------------
         Gathering machine learning data for an RCF problem
        ----------------------------------------------------
        - Analysing QepcadB:     ..
        - Analysing Mathematica: ................
        ----------------------------------------------------

     ... And so on.
         For this example, trace data will be in exp-361-2.trace.
         (See Section II - Timing below for an example trace data
          record.)




  ii. --ml_calibrate : Perform some sampling and statistics on
         Mathematica and QepcadB to determine the average CPU time
         spent during tool initialisation. (See Section II - Timing
         below for why this is necessary.)

      Example:

       > metit --ml_calibrate exp-361-2.tptp  

        ------------------------------------------------------------------------- 
         Calibrating average initialisation CPU time for QepcadB and Mathematica 
        ------------------------------------------------------------------------- 
        - # Samples to be taken for each tool: 50.
        - Sampling QepcadB: .....................................................
          QepcadB ave.: 0.09324 sec.
          QepcadB std.: 0.00026 sec.
          QepcadB bcd.: 0.00026 sec.
        - Sampling Mathmca: .....................................................
          Mathmca ave.: 0.20603 sec.
          Mathmca std.: 0.00250 sec.
          Mathmca bcd.: 0.00255 sec.
        ------------------------------------------------------------------------- 

      Key: `ave.' - arithmetic mean of initialisation CPU time across all samples
           `std.' - standard deviation of the samples (uncorrected)
           `bcd.' - Bessel-corrected standard deviation of the samples
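
      The three statistics in the key can be reproduced outside
      MetiTarski.  The following is a minimal Python sketch, not the
      code MetiTarski itself uses; the function name `summarise' and
      the sample values are hypothetical and chosen only for
      illustration.

```python
import statistics


def summarise(samples):
    """Return (ave, std, bcd) for a list of CPU-time samples in seconds."""
    ave = statistics.fmean(samples)   # arithmetic mean
    std = statistics.pstdev(samples)  # uncorrected SD: divides by n
    bcd = statistics.stdev(samples)   # Bessel-corrected SD: divides by n - 1
    return ave, std, bcd


# Hypothetical initialisation times, not real measurements.
samples = [0.093, 0.094, 0.093, 0.092, 0.094]
ave, std, bcd = summarise(samples)
print(f"ave.: {ave:.5f} sec.")
print(f"std.: {std:.5f} sec.")
print(f"bcd.: {bcd:.5f} sec.")
```

      With 50 samples the two deviations differ only by a factor of
      sqrt(50/49), which is why `std.' and `bcd.' are nearly
      identical in the example output above.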




 II - Timing
  
  There are some subtleties with timing the execution of external
  tools.  Let EADM (`external algebraic decision method') be either
  Mathematica or QepcadB.  For this machine learning data, we have:

    1) A collection of different EADM settings S_1, ..., S_k,
    2) A collection of different RCF problems P_1, ..., P_m,

  and we would like to perform the following actions:

    For each RCF problem P_i
      - Compute a list of measurements upon P_i (`problem features')
      - For each EADM setting S_j
          - Compute the timing of EADM with setting S_j upon problem P_i.

  For a fixed problem P_i, such a record might look as follows (in
   fact, the following, modulo the manual shortening of the Problem
   string, was machine-generated by MetiTarski):


   >>>[new problem record]--->>>
   Problem: Resolve[Exists[{skoX,pi}, And[LessEqual[...REMOVED...]]]]
   Features: 
    Dim: 2
    Symbols: 183
    Atoms: 6
    Monomials: 15
    Max MVTDeg: 20
   Results:
    QepcadB: (Decision, Time_Spent) = (Sat, 0.216)
    QepcadB with Proj_Ord: (Decision, Time_Spent) = (Sat, 0.206)
    Mathematica Setting #1: (Decision, Time_Spent) = (Sat, 0.282)
    Mathematica Setting #2: (Decision, Time_Spent) = (Sat, 0.272)
    Mathematica Setting #3: (Decision, Time_Spent) = (Sat, 0.285)
    Mathematica Setting #4: (Decision, Time_Spent) = (Sat, 0.274)
    Mathematica Setting #5: (Decision, Time_Spent) = (Sat, 0.286)
    Mathematica Setting #6: (Decision, Time_Spent) = (Sat, 0.273)
    Mathematica Setting #7: (Decision, Time_Spent) = (Sat, 0.287)
    Mathematica Setting #8: (Decision, Time_Spent) = (Sat, 0.274)
    Mathematica Setting #9: (Decision, Time_Spent) = (Sat, 0.284)
    Mathematica Setting #10: (Decision, Time_Spent) = (Sat, 0.272)
    Mathematica Setting #11: (Decision, Time_Spent) = (Sat, 0.286)
    Mathematica Setting #12: (Decision, Time_Spent) = (Sat, 0.274)
    Mathematica Setting #13: (Decision, Time_Spent) = (Sat, 0.285)
    Mathematica Setting #14: (Decision, Time_Spent) = (Unknown, 0.280)
    Mathematica Setting #15: (Decision, Time_Spent) = (Unknown, 0.292)
    Mathematica Setting #16: (Decision, Time_Spent) = (Sat, 0.273)
   <<<[end problem record]---<<<
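
  A record in this layout can be read back with a few regular
  expressions.  The following Python sketch assumes the trace file
  uses exactly the layout shown above; the function name
  `parse_records' and the dictionary keys are my own choices, not
  part of MetiTarski.

```python
import re

# "<setting>: (Decision, Time_Spent) = (<decision>, <seconds>)"
RESULT_RE = re.compile(
    r"^(?P<setting>.+?): \(Decision, Time_Spent\) = "
    r"\((?P<decision>\w+), (?P<time>[0-9.]+)\)$"
)
# "<feature name>: <integer>"
FEATURE_RE = re.compile(r"^(?P<name>[A-Za-z ]+): (?P<value>\d+)$")


def parse_records(text):
    """Parse trace text into a list of dicts, one per problem record."""
    records, current, section = [], None, None
    for line in text.splitlines():
        s = line.strip()
        if s.startswith(">>>[new problem record]"):
            current = {"problem": None, "features": {}, "results": {}}
            section = None
        elif s.startswith("<<<[end problem record]"):
            if current is not None:
                records.append(current)
            current, section = None, None
        elif current is None:
            continue  # ignore anything outside a record
        elif s.startswith("Problem:"):
            current["problem"] = s[len("Problem:"):].strip()
        elif s == "Features:":
            section = "features"
        elif s == "Results:":
            section = "results"
        elif section == "features":
            m = FEATURE_RE.match(s)
            if m:
                current["features"][m["name"]] = int(m["value"])
        elif section == "results":
            m = RESULT_RE.match(s)
            if m:
                current["results"][m["setting"]] = (m["decision"], float(m["time"]))
    return records


# A shortened copy of the record above, used only to exercise the parser.
sample = """\
>>>[new problem record]--->>>
Problem: Resolve[...REMOVED...]
Features:
 Dim: 2
 Atoms: 6
Results:
 QepcadB: (Decision, Time_Spent) = (Sat, 0.216)
 Mathematica Setting #14: (Decision, Time_Spent) = (Unknown, 0.280)
<<<[end problem record]---<<<
"""
for record in parse_records(sample):
    print(record["features"], record["results"])
```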



  However, consider the following points:


    - We care only to measure CPU time, not `system' or `wall clock'
      time.

    - During a MetiTarski run, each tool is only initialised once.
    - Thus, for us, the most meaningful time measures are those which
      consist of the CPU time of the EADM run upon a problem *after*
      the EADM has been initialised.  I.e., given problem P_i and
      setting S_j, we want:

       (CPU time [EADM with setting S_j] takes to solve P_i)
         - (CPU time EADM takes to initialise itself).

    - Standard ML only allows us to measure the CPU time taken by an
      external process once that process has been killed.  For
      example, to measure CPU time QepcadB takes on a single problem
      P_i, we have to:

       1) start a new QepcadB process
       2) run QepcadB upon problem P_i
       3) kill QepcadB
       
      and then ask Standard ML to measure the time taken.  We cannot
      measure the time taken by QepcadB on the problem before we kill
      the QepcadB process which was run on the problem.

    - QepcadB itself gives us no way of measuring the CPU time it
      spent on a problem.  If it did (i.e., if we could ask QepcadB to
      output the CPU time it took on a problem after initialisation),
      then things would be much simpler!  (QepcadB does compute the
      `system time' taken by SACLIB, both in initialisation and after
      initialisation, but that is not what we want.)
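
  The start/run/kill constraint is not peculiar to Standard ML.  As
  an analogy only (this is Python, not MetiTarski's code, and it
  assumes a Unix-like system), the sketch below reads the CPU time
  accumulated by child processes before and after running one to
  completion.  The child's time becomes visible to the parent only
  once the child has exited and been waited for.  The function name
  `child_cpu_time' is hypothetical.

```python
import os
import subprocess
import sys


def child_cpu_time(argv):
    """Run argv to completion; return (user, system) CPU seconds it used."""
    before = os.times()
    # subprocess.run returns only after the child has exited and been
    # waited for, which is when its CPU time is added to the parent's
    # children_user / children_system counters.
    subprocess.run(argv, check=True)
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return user, system


# A stand-in workload; a real experiment would launch the EADM here.
user, system = child_cpu_time([sys.executable, "-c", "sum(range(2_000_000))"])
print(f"child user CPU:   {user:.3f} sec.")
print(f"child system CPU: {system:.3f} sec.")
```

  Note that the figure obtained this way necessarily covers the
  whole life of the child process, initialisation included.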


  The combination of the above constraints makes things difficult.
  To get around this, we do the following:

   Let EADM be either Mathematica or QepcadB.
    Given an RCF problem P_i,
     Given an EADM setting S_j,
      Record the total CPU time [EADM with setting S_j] took to solve P_i.

  So, our timing measurements include the CPU time taken by each tool in 
  initialisation.

  But we provide an extra tool: the `calibration' command
  (--ml_calibrate, see Section I above), which allows the person
  performing the tracing experiments to gather statistics on how
  much CPU time each EADM takes to initialise on their machine.
  
  For machine learning, it is then up to the researcher gathering the
  tracing data to decide how the CPU times reported in the trace files
  should be modified/normalised based upon the EADM initialisation time
  statistics.
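
  One simple normalisation is to subtract the calibrated mean
  initialisation time from each reported time.  The Python sketch
  below is only one possible choice, not a prescribed method.  It
  reuses the `ave.' figures from the calibration example in Section
  I and a few times from the example record above purely as
  illustrative inputs; calibration should be rerun on the machine
  that produced the trace.  The names `subtract_init', `tool_of'
  and `INIT_AVE' are hypothetical.

```python
# Mean initialisation CPU times (seconds), taken from the example
# --ml_calibrate output in Section I; replace with your own figures.
INIT_AVE = {"QepcadB": 0.09324, "Mathematica": 0.20603}


def tool_of(setting):
    """Map a result label such as 'Mathematica Setting #3' to its tool."""
    return "QepcadB" if setting.startswith("QepcadB") else "Mathematica"


def subtract_init(time_spent, init_ave):
    """Estimate post-initialisation CPU time, clamped at zero."""
    # Clamping is a design choice: measurement noise could otherwise
    # yield a small negative time for very easy problems.
    return max(0.0, time_spent - init_ave)


# Total CPU times (seconds) copied from the example record.
results = {
    "QepcadB": 0.216,
    "QepcadB with Proj_Ord": 0.206,
    "Mathematica Setting #1": 0.282,
}
adjusted = {
    setting: subtract_init(t, INIT_AVE[tool_of(setting)])
    for setting, t in results.items()
}
for setting, t in adjusted.items():
    print(f"{setting}: {t:.5f} sec.")
```

  The `std.' and `bcd.' figures indicate how much of the remaining
  difference between two settings could be initialisation noise
  rather than a real effect.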


 III - Modifying the code

  i. The Mathematica setting options can be found near the top of
     RCF/Mathematica.sml.  Please experiment with changing them.
     The key value over which the tracing mechanism iterates
     is opts_lst : mk_opts list.

  ii. To modify the formatting of the data printed to the trace
      files (to make them easier to parse, for instance), modify
      the function mk_s_resolve_ml, also found in RCF/Mathematica.sml.


 I'll stop here for now.  Please ask me any questions you may have!
 -Grant
