Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 09 Mar 2011

Program specialization (is not just partial evaluation)

I've been thinking a lot about various techniques in program analysis, transformation and verification recently. There's certainly a lot to think about.

One idea I'm exploring is looking at verification problems as program specialization exercises. There is a recurring two-stage process in verification. First, transform your program so that a single execution captures all possible inputs. For an explicit-state model checker like CMC, we do this by putting our program in a harness that systematically explores its state space. Alternatively, for approaches based on predicate abstraction, we replace all input-dependent transitions in the program with nondeterministic choice. The effect is the same: we now have one program encoding all possible behaviours. The second step is then to specialize our program for answering the question we care about, such as “does this assertion ever fail?”. We rely on this specialization to give us a new, simpler, faster program that we can exhaustively check, or can check to greater depth, without exhausting resources (time or memory).
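
To make that first step concrete, here is a minimal sketch of what “replacing input-dependent transitions with nondeterministic choice” might look like for a toy C program. The nondet_int() and VERIFY_assert() helpers are hypothetical stand-ins for whatever primitives a particular checker provides, not any specific tool's API; the rand() fallback is only there so the sketch compiles with an ordinary compiler.

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical primitive: a model checker would treat this as a free
     * choice over all ints; the rand() body is just a placeholder so the
     * sketch builds and runs with an ordinary compiler. */
    static int nondet_int(void) { return rand(); }

    /* The property we care about, i.e. "does this assertion ever fail?" */
    #define VERIFY_assert(cond) assert(cond)

    /* The code under test: saturates its result rather than growing it. */
    static int saturating_double(int x)
    {
        if (x > 1000) return 2000;   /* saturate */
        return 2 * x;
    }

    int main(void)
    {
        /* The original program read its input from somewhere concrete;
         * the harness replaces that read with a nondeterministic choice,
         * so one "execution" stands for all possible inputs. */
        int input = nondet_int();
        if (input < 0 || input > 2000) return 0;  /* restrict to the modelled range */

        int result = saturating_double(input);
        VERIFY_assert(result <= 2000);
        return 0;
    }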

It's the specialization step I'm thinking about right now. How much of our program's computation can we throw away, while still computing the answer to our question? CEGAR approaches work from the bottom up: we start from a trivial abstraction and refine it to compute something close to the smallest program which either establishes the absence of bugs or finds at least one non-spurious bug. This process need not terminate; I'm not yet clear on its other failure modes, but am fairly sure there are some. Meanwhile, a top-down approach also exists. CMC is a useful tool even though it doesn't do any specialization of the computation per se. (It does support some other kinds of abstraction for reducing the state space by defining equivalences, which have a similar effect but are of limited applicability.) To improve on this, we could exploit the fact that throwing away unwanted computation is something we already know a lot about: compilers have been doing it since compilers began. “Program specialization” is a term used mainly by compiler-minded people rather than verification people. Can we apply ideas from one world to the other?

“Program specialization” in the literature is often used to mean partial evaluation. With partial evaluation, we take a program of n inputs, say, and then produce a smaller, simpler, faster version where some of these inputs are replaced by fixed values. This is typical of optimisation problems, where “faster” is the key requirement, and the input constraints have usually been derived from some other analysis. However, there is a converse case of program specialization which the same literature often ignores. This is where we take a program of n outputs, and then produce a smaller, simpler, faster version where we “don't care” about some of these outputs. This is typical of verification problems, where “simpler” is the key requirement, and the selection of don't-care outputs is a consequence of the specification being considered.
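
As a toy illustration of the difference, here is a made-up C function with two inputs and two outputs, together with hand-written versions of the two kinds of specialization. The names (stats and friends) are mine, and a real specializer would of course derive the residual code automatically rather than by hand.

    #include <stdio.h>

    /* The general program: two inputs, two outputs (written through pointers). */
    static void stats(int a, int b, int *sum, int *product)
    {
        *sum = a + b;
        *product = a * b;
    }

    /* Partial evaluation: specialize on a known input, b == 10.
     * The residual program is smaller and faster for that case. */
    static void stats_b10(int a, int *sum, int *product)
    {
        *sum = a + 10;
        *product = a * 10;
    }

    /* The converse: we "don't care" about the product output, so every
     * computation feeding it can be thrown away. */
    static void stats_sum_only(int a, int b, int *sum)
    {
        *sum = a + b;
    }

    int main(void)
    {
        int s, p;
        stats(3, 10, &s, &p);       printf("%d %d\n", s, p);
        stats_b10(3, &s, &p);       printf("%d %d\n", s, p);
        stats_sum_only(3, 10, &s);  printf("%d\n", s);
        return 0;
    }

In the first specialization “faster” is the payoff; in the second it is “simpler”: the don't-care version has strictly less computation for a checker to explore.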

Predicate abstraction is doing this, but with some added room for manoeuvre---since it's open to finding sound approximations rather than precise specializations---and also with some added constraints, since it's interested in predicates that can be input to a SAT or SMT solver to perform the abstraction-refinement. Dave provided a valuable link in a productive coffee break this morning, by noting that program slicing is also an instance of specializing for don't-care outputs. What happens if we use slicing techniques to do a top-down specialization? I'm worried the answer is “not enough” or “strictly worse than abstraction-refinement”, but I'll keep thinking about it.
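
To see why slicing fits, here is a small hand-worked example (mine, not the output of any particular slicer): slicing the loop below with respect to the assertion about negatives lets us discard the logging computation, since it can never affect whether that assertion fails.

    #include <assert.h>
    #include <stdio.h>

    /* Original: interleaves the computation the assertion depends on (the
     * count of negative elements) with computation it doesn't (a tally of
     * the bytes that logging would have produced). */
    static int count_negatives(const int *v, int n)
    {
        int negatives = 0;
        int log_bytes = 0;                              /* don't-care */
        for (int i = 0; i < n; i++) {
            if (v[i] < 0) negatives++;
            log_bytes += snprintf(NULL, 0, "saw %d\n", v[i]);
        }
        (void)log_bytes;
        return negatives;
    }

    /* Backward slice w.r.t. the criterion "value of negatives at the end":
     * only statements the criterion transitively depends on remain. */
    static int count_negatives_sliced(const int *v, int n)
    {
        int negatives = 0;
        for (int i = 0; i < n; i++) {
            if (v[i] < 0) negatives++;
        }
        return negatives;
    }

    int main(void)
    {
        int data[] = { 3, -1, 4, -1, -5 };
        int n = (int)(sizeof data / sizeof data[0]);
        assert(count_negatives_sliced(data, n) == count_negatives(data, n));
        assert(count_negatives_sliced(data, n) <= n);   /* the property of interest */
        return 0;
    }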

[/research] permanent link contact

Greek talk

One of the reasons why I'm not a theoretical computer scientist is that I am very, very averse to mathematical notation. “It's like Greek to me!”---no applause, please. Certainly, it's common to see very highly abbreviated notation that takes some serious cognitive gear-turning to decode. If I'm faced with a Greek-heavy paper, I usually skim over the symbolic stuff and look for an explanation in words. Sometimes it's there, and sometimes it isn't. In the cases where it's not, I rarely have the stamina to wade through the Greek.

Natural language, for all its imprecision, is---unsurprisingly---more “natural”! In fact, I'll wager that most of the infamous imprecision found in natural language specifications could be fixed by more precise natural language. Perhaps a semantic checker for English is in order. Diagrams are even better than natural language, of course, although they rarely stand alone.

It strikes me that formalism is primarily useful for avoiding mistakes. By turning complex reasoning into simple pattern-recognition and symbol-pushing, correctness can be checked fairly dumbly. The cost is that although it's hard to make mistakes, it's hard to make progress: there are reams of applicable rules, and expressing anything complex requires a whole lot of symbols. So I'm going to go out on a limb and claim that formalism is notably not very good for acquiring understanding. In a lecture, diagrams and examples and words have always been far more useful to me than slides full of Greek. I'm also going to assert (without proof!) that formalism is not useful for artifact construction, except where mistake-avoidance is paramount. We should allow programmers to make imprecise statements, and refine them later, because humans can be a lot more productive this way. In particular, we can make progress before we fully understand the problem! Only when the cost of the smallest mistake is so great that we really want to rein things in should we resort to fully rigorous constructive methods (such as formal refinement processes, the B method, etc.). This argument also encompasses many of the usual arguments in favour of dynamic languages over statically typed ones.

Of course, that doesn't mean that any formal notation is to be avoided. By whatever quirk of evolution, humans have some aptitude for written language---and that includes the more mathematical, symbolic notations just as much as plain old words made of letters. So mathematical notation is fine if it stays within a particular comfort zone. I can read basic logic and basic algebra without much cognitive burden. Only when the formal notation passes a certain density threshold do I suddenly hit problems. I suspect that most theoretical computer scientists (and mathematicians) have a much higher threshold than I do.

[/research] permanent link contact

