Orangepath/HPR L/S Project: Hardware and Embedded Software Synthesis from Executable Specifications.
Direct Synthesis of Logic using Search Techniques Page.

Direct Synthesis of Logic Using a SAT Solver and Other Search Techniques

The HPR L/S Project.

Research Note

DJ Greaves

A White Paper based on this web page was published at RSP 2004. Download PDF.

The work on this page has now been taken forward with a model checker instead of a SAT solver, to handle startup and error recovery.

DS using SAT

The recent development in the size of problems that may be tackled with SAT solvers now enables problems of 100K variables to be readily solved. This is a useful size both for hardware and for software synthesis, provided the problem and the possible solutions can be phrased as logic functions.

There are many ways of using a SAT solver to generate hardware designs, but one of the most simple is to postulate a pseudo-FPGA architecture and allow the SAT solver to program it. In the basic case, the problem to be solved is phrased as a formal logic function and the behaviour of the FPGA given a given programming is expressed similarly. Then a bi-equivalence equation can be solved by a SAT solver to generate solutions. Un-used gates in the FPGA are then removed using standard identities to trim the solution for ASIC or other flows. Finally, we can select from a number of solutions either by refining the query or selecting from the generated solutions using cost evaluations (e.g. lines of code or gate count).

Converting a high-level specification of the problem to a logic function presents two main challenges. Firstly there tends to be exponential blow-up when converting to conjunctive normal form and secondly there is a desire to mix other forms of design specification with the formal specification, such as structural instantiation of existing design sections and behavioural specification. We address these later.

The selection of an appropriate pseudo-FPGA architecture is also a challange, once we realise that for the approach to be useful, we need to instantiate RAMs, ALUs and even microprocessors in the target hardware, if it is to be a comprehensive solution to today's design tasks. Finally, and for completeness, we would like to generate software for embedded processors within the same architecture.

Although SAT solvers may be able to tackle the whole of a problem in one go (by which we mean a task of comparable size to that typically handled in a run of Synopsys Design Compiler), for a tool flow to be practical, there needs to be consistency and reuse of solutions from one run to another. Otherwise, the whole of a design might change as a result of a very minor change in the specification.

HPR L/S Specification Algebra 1

In the HPR L/S project we define a number of fundamental hardware problems. A given hardware solution can implement any number of problems at once, although there may be additional restrictions on concurrency.

HPRLS-1 is a form that separates control and data according to strict rules. It applies to non-pipelined problems that operate in cycles where there is no control state at the end of a cycle. Put another way, the controller returns to a known state periodically. The data is represented purely symbolicly. Predicate functions on the data provide for data-dependent control flow. As we shall see, an HPRLS-1 specification is frequently augmented with some sort of data conservation specification which is a direct consequence of many hardware designs being concerned with buffering, filtering or otherwise transforming streams of data [ Data Conservation Primitives].

HPRLS-1P is a varient of HPRLS-1 that allows pipelined solutions. The cycle boundaries may not exist in concrete form, but still logically exist.

Notation

We use a simple non-branching time notation in this work. Predicate rules may refer to a variable, v, meaning its value at current time t, or X(v), meaning its value at time t+1 within the current timeframe. We can also write X(v,t+n) and X(v, n) where n is a small integer. These two respectively mean the value of the variable at time t+n or absolute time n. In the example below a range of values for n is given. Such ranges cause macro expansion of the rule into multiple similar rules, each with its own integer value plugged in.

Tiny Example Problem

A tiny example is presented in full on this separate web page: Almost Trivial, Divide by 3 Counter.

Example Problems

There are two small examples on these two links:

Please see Two Rail Signalling.

Please see GPIB to Centronix Example.

Here is a simple example:

A very simple problem to solve is the generation of a pulse sequence consiting of an infinitely repeating pattern consisting of 101 followed by six 0's before starting to repeat again. The period is therefore 9 clocks.

If the output is a sequence x[t] where t is time measured in units of a master clock, the problem may be specified formally as


     \A(t). (0<=t<=8) ? (x[t]==(t==0 || t==2)):x[t] = x[t-9].

We manually convert the specification into our own notation as input to the HPRLS expander:


    X(x, t+n:(0..8)) := (n==0 || n==2); 
    X(x, t+9) := x;

The expander converts the assignments into bi-implication predicates and creates a sequence of variables for each variable in the input code together with a one step delay predicate relating each to its predecessor in the sequence. The sequence length is determined by symbolic enumeration of the set of rules and is sufficient to encompass the time horizon thus created. Hence ten copies of x are encounted while macro-expanding this section of code and one of the rules is that x_0 = x_9, so there are only 8 different values. Each macro-expansion results in a bi-implication clause that is then expanded with de Morgan to a number of DNF clauses for feeding to the SAT solver. Note that the lhs and the rhs of each assignment could have been swapped over and the program would still be valid.

The specification is translated into the following set of clauses for conversion to DNF:


     x_0 == 1;
     x_1 == 0;
     x_2 == 1;
     x_3 == 0;
     x_4 == 0;
     x_5 == 0;
     x_6 == 0;
     x_7 == 0;
     x_8 == 0;
     x_9 == x_0;
     X(x_0) == x_1;
     X(x_1) == x_2;
     X(x_2) == x_3;
     X(x_3) == x_4;
     X(x_4) == x_5;
     X(x_5) == x_6;
     X(x_6) == x_7;
     X(x_7) == x_8;
     X(x_8) == x_9;

This structure looks a bit like a one-hot encoding of a divide-by-nine counter. Actually, any count sequence of length nine, whatsoever, will meet this specification and the resulting design is just a matter of which one(s) the SAT solver returns first. Later we show that a number of hand-crafted Verilog RTL designs for the example problem also meet this specification.

Example Pseudo FPGA or PAL

A suitable hardware target for this class of problem is a sea of CLBs (configurable logic blocks) where each element may either be a four input combinatorial function or the same followed by a D-type flip-flop. There are 16 bits of LUT (look-up-table) to implement the combinatorial function and one to specify sequential or combinatorial operation, making 17 bits per CLB as seen by the SAT solver. All clock-inputs are hardwired to a master free-running clock.

The wiring interconnection pattern is not restricted by 2-D layout restrictions of real FPGAs, and so could be some sort of n-dimensional hypercube. Instead we use a tree structure with the expectation that CLBs closer to the root will be used mainly for signal routing rather than sequential logic.

The behaviour of the pseudo FPGA is readily converted to a formal logic specification using conventional techniques. The programming information for the FPGA turns into unconstrained inputs. Actual run-time input data can be handled via this approach as well.

Here is the Verilog for one CLB


// Configurable logic block
module hpr_fpga_clb(clk, d0, d1, d2, d3, y);

  input clk;
  input d0, d1, d2, d3;
  output y;

  reg [15:0] clk_lut;     // This is initialised during FPGA programming
  reg clb_comb_mode;      // This is initialised during FPGA programming

  reg clb_reg;

  wire [3:0] din = { d3, d2, d1, d0 };

  wire v = clk_lut[din];
  always @(posedge clk) clb_reg <= v;
  assign y = (clb_comb_mode) ? v: clb_reg;

endmodule

Here is the scalarised version of one CLB, ready for converting to DNF.

/* cv2.100 -root hpr_fpga_clb hprfpga.v -orangepath */ /* User=djg */ /* output from CBG CV2.100 CBG cv2.100 V4.1c VC10b TT CVA EDA CORE Version=1.0.6 (17-Jan-2003) UX */ X(clb_reg) := ~clk&X(clk)&v # ~(~clk&X(clk))&clb_reg; y := clb_comb_mode&v # ~clb_comb_mode&clb_reg; v := (din__3 # din__2 # din__1 # din__0)&((din__3 # din__2 # din__1 # ~din__0)&((din__3 # din__2 # ~din__1 # din__0)&((din__3 # din__2 # ~din__1 # ~din__0)&((din__3 # ~din__2 # din__1 # din__0)&((din__3 # ~din__2 # din__1 # ~din__0)&((din__3 # ~din__2 # ~din__1 # din__0)&((din__3 # ~din__2 # ~din__1 # ~din__0)&((~din__3 # din__2 # din__1 # din__0)&((~din__3 # din__2 # din__1 # ~din__0)&((~din__3 # din__2 # ~din__1 # din__0)&((~din__3 # din__2 # ~din__1 # ~din__0)&((~din__3 # ~din__2 # din__1 # din__0)&((~din__3 # ~din__2 # din__1 # ~din__0)&((~din__3 # ~din__2 # ~din__1 # din__0)&~(~din__3 # ~din__2 # ~din__1 # ~din__0)&clk_lut__15 # ~(~din__3 # ~din__2 # ~din__1 # din__0)&clk_lut__14) # ~(~din__3 # ~din__2 # din__1 # ~din__0)&clk_lut__13) # ~(~din__3 # ~din__2 # din__1 # din__0)&clk_lut__12) # ~(~din__3 # din__2 # ~din__1 # ~din__0)&clk_lut__11) # ~(~din__3 # din__2 # ~din__1 # din__0)&clk_lut__10) # ~(~din__3 # din__2 # din__1 # ~din__0)&clk_lut__9) # ~(~din__3 # din__2 # din__1 # din__0)&clk_lut__8) # ~(din__3 # ~din__2 # ~din__1 # ~din__0)&clk_lut__7) # ~(din__3 # ~din__2 # ~din__1 # din__0)&clk_lut__6) # ~(din__3 # ~din__2 # din__1 # ~din__0)&clk_lut__5) # ~(din__3 # ~din__2 # din__1 # din__0)&clk_lut__4) # ~(din__3 # din__2 # ~din__1 # ~din__0)&clk_lut__3) # ~(din__3 # din__2 # ~din__1 # din__0)&clk_lut__2) # ~(din__3 # din__2 # din__1 # ~din__0)&clk_lut__1) # ~(din__3 # din__2 # din__1 # din__0)&clk_lut__0; /* eof */

Here is the structure used to nest CLBs into a tree. A tree with five levels generates a 32 CLB structure.


module hpr_fpga_nest(clk, d0, d1, d2, d3, y);

  input clk;
  input d0, d1, d2, d3;
  output y;

  hpr_fpga_clb tt(.clk(clk), .d0(d0), .d1(d1), .d2(d2), .d3(q), .y(y));
  hpr_fpga_clb bb(.clk(clk), .d0(y), .d1(d1), .d2(d2), .d3(d3), .y(q));
endmodule

As can be seen, the CLBs are paired with one input of each taken from the output of the other. This results in a potential combinatorial loop, depending on the programming of the clb_comb_mode bits in each cell.

A problem arises from the combinatorial loops intrinsic to all unprogrammed FPGA structures: these loops might be used as latch storage in synthesised designs. This is addressed during conversion to DNF in the HPR L/S expander. The first step of expansion is a pseudo macro expansion of combinatorial nets by recursively substituting their r.h.s. expression at places where they are referenced. Where a macro expansion touches a combinatorial net that is already on the expander's stack as work in progress, instead of entering into an infinite descent, the tool generates a new 'DG' variable. It then either creates a bi-implication assertion that that variable is always the same as the expression it is replacing, or it leaves it undriven. After SAT solving, each DG variables will be allocated a static value and this value is then assimilated into the netlist when it is trimmed.

Run-time input signals.

There are no inputs in this simple example. Where inputs are needed, the design must work for all possible input values, and so the clauses that reference them are macro-expanded for all possible combinations of the inputs. Alternatively, they can be universally qualified at the head of the design. Either way, this is another source of exponential growth in this naive approach.

Top-Level Goal

Since our example target specification has a single output bit, the top-level goal for SAT solution is created with a single additional specification that at all times, one of the nets in the FPGA is identical to the desired output net from the specification.


     \A(t). x[t]==pga_clb0_q

Here pga_clb0_q is the output net from one of the root-level CLBs. We could equally have chosen any other output or internal net.

SAT solving

A pseudo FPGA of 32 CLBs was generated, resulting in 32 pre-DNF clauses over 544 terms.

The Walksat solver freely available from Princeton was used to solve the example task. This solver supports upto 150000 terms as dimensioned in the download. SAT problems grow exponentially in terms of clauses as new conditions are added, so a better measure of relative complexity is to take logarithms of the number of clauses.

In addition, our example design uses a fraction of the FPGA capacity.

References

S. Devadas, "Optimal layout via Boolean satisfiability," in IEEE Int. Conf. Computer-Aided Design, Nov. 1989, pp. 294--297.

This is a mini-project to back up an EPSRC research proposal and is currently work in progress, so watch this space for the continuation.