Theory Functions
theory Functions
imports Main
begin
section ‹Function Definitions for Dummies›
text ‹
  In most cases, defining a recursive function is just as simple as other definitions:
›
fun fib :: "nat ⇒ nat"
where
  "fib 0 = 1"
| "fib (Suc 0) = 1"
| "fib (Suc (Suc n)) = fib n + fib (Suc n)"
text ‹
  The syntax is rather self-explanatory: We introduce a function by
  giving its name, its type, 
  and a set of defining recursive equations.
  If we leave out the type, the most general type will be
  inferred, which can sometimes lead to surprises: Since both \<^term>‹1::nat› and ‹+› are overloaded, we would end up
  with ‹fib :: nat ⇒ 'a::{one,plus}›.
›
text ‹
  The function always terminates, since its argument gets smaller in
  every recursive call. 
  Since HOL is a logic of total functions, termination is a
  fundamental requirement to prevent inconsistencies\footnote{From the
  \qt{definition} ‹f(n) = f(n) + 1› we could prove 
  ‹0 = 1› by subtracting ‹f(n)› on both sides.}.
  Isabelle tries to prove termination automatically when a definition
  is made. In \S\ref{termination}, we will look at cases where this
  fails and see what to do then.
›
subsection ‹Pattern matching›
text ‹\label{patmatch}
  Like in functional programming, we can use pattern matching to
  define functions. At the moment we will only consider \emph{constructor
  patterns}, which only consist of datatype constructors and
  variables. Furthermore, patterns must be linear, i.e.\ all variables
  on the left hand side of an equation must be distinct. In
  \S\ref{genpats} we discuss more general pattern matching.
  If patterns overlap, the order of the equations is taken into
  account. The following function inserts a fixed element between any
  two elements of a list:
›
fun sep :: "'a ⇒ 'a list ⇒ 'a list"
where
  "sep a (x#y#xs) = x # a # sep a (y # xs)"
| "sep a xs       = xs"
text ‹
  Overlapping patterns are interpreted as \qt{increments} to what is
  already there: The second equation is only meant for the cases where
  the first one does not match. Consequently, Isabelle replaces it
  internally by the remaining cases, making the patterns disjoint:
›
thm sep.simps
text ‹@{thm [display] sep.simps[no_vars]}›
text ‹
  \noindent The equations from function definitions are automatically used in
  simplification:
›
lemma "sep 0 [1, 2, 3] = [1, 0, 2, 0, 3]"
by simp
subsection ‹Induction›
text ‹
  Isabelle provides customized induction rules for recursive
  functions. These rules follow the recursive structure of the
  definition. Here is the rule @{thm [source] sep.induct} arising from the
  above definition of \<^const>‹sep›:
  @{thm [display] sep.induct}
  
  We have a step case for list with at least two elements, and two
  base cases for the zero- and the one-element list. Here is a simple
  proof about \<^const>‹sep› and \<^const>‹map›
›
lemma "map f (sep x ys) = sep (f x) (map f ys)"
apply (induct x ys rule: sep.induct)
text ‹
  We get three cases, like in the definition.
  @{subgoals [display]}
›
apply auto 
done
text ‹
  With the \cmd{fun} command, you can define about 80\% of the
  functions that occur in practice. The rest of this tutorial explains
  the remaining 20\%.
›
section ‹fun vs.\ function›
text ‹
  The \cmd{fun} command provides a
  convenient shorthand notation for simple function definitions. In
  this mode, Isabelle tries to solve all the necessary proof obligations
  automatically. If any proof fails, the definition is
  rejected. This can either mean that the definition is indeed faulty,
  or that the default proof procedures are just not smart enough (or
  rather: not designed) to handle the definition.
  By expanding the abbreviation to the more verbose \cmd{function} command, these proof obligations become visible and can be analyzed or
  solved manually. The expansion from \cmd{fun} to \cmd{function} is as follows:
\end{isamarkuptext}
\[\left[\;\begin{minipage}{0.25\textwidth}\vspace{6pt}
\cmd{fun} ‹f :: τ›\\%
\cmd{where}\\%
\hspace*{2ex}{\it equations}\\%
\hspace*{2ex}\vdots\vspace*{6pt}
\end{minipage}\right]
\quad\equiv\quad
\left[\;\begin{minipage}{0.48\textwidth}\vspace{6pt}
\cmd{function} ‹(›\cmd{sequential}‹) f :: τ›\\%
\cmd{where}\\%
\hspace*{2ex}{\it equations}\\%
\hspace*{2ex}\vdots\\%
\cmd{by} ‹pat_completeness auto›\\%
\cmd{termination by} ‹lexicographic_order›\vspace{6pt}
\end{minipage}
\right]\]
\begin{isamarkuptext}
  \vspace*{1em}
  \noindent Some details have now become explicit:
  \begin{enumerate}
  \item The \cmd{sequential} option enables the preprocessing of
  pattern overlaps which we already saw. Without this option, the equations
  must already be disjoint and complete. The automatic completion only
  works with constructor patterns.
  \item A function definition produces a proof obligation which
  expresses completeness and compatibility of patterns (we talk about
  this later). The combination of the methods ‹pat_completeness› and
  ‹auto› is used to solve this proof obligation.
  \item A termination proof follows the definition, started by the
  \cmd{termination} command. This will be explained in \S\ref{termination}.
 \end{enumerate}
  Whenever a \cmd{fun} command fails, it is usually a good idea to
  expand the syntax to the more verbose \cmd{function} form, to see
  what is actually going on.
›
section ‹Termination›
text ‹\label{termination}
  The method ‹lexicographic_order› is the default method for
  termination proofs. It can prove termination of a
  certain class of functions by searching for a suitable lexicographic
  combination of size measures. Of course, not all functions have such
  a simple termination argument. For them, we can specify the termination
  relation manually.
›
subsection ‹The {\tt relation} method›
text‹
  Consider the following function, which sums up natural numbers up to
  ‹N›, using a counter ‹i›:
›
function sum :: "nat ⇒ nat ⇒ nat"
where
  "sum i N = (if i > N then 0 else i + sum (Suc i) N)"
by pat_completeness auto
text ‹
  \noindent The ‹lexicographic_order› method fails on this example, because none of the
  arguments decreases in the recursive call, with respect to the standard size ordering.
  To prove termination manually, we must provide a custom wellfounded relation.
  The termination argument for ‹sum› is based on the fact that
  the \emph{difference} between ‹i› and ‹N› gets
  smaller in every step, and that the recursion stops when ‹i›
  is greater than ‹N›. Phrased differently, the expression 
  ‹N + 1 - i› always decreases.
  We can use this expression as a measure function suitable to prove termination.
›
termination sum
apply (relation "measure (λ(i,N). N + 1 - i)")
text ‹
  The \cmd{termination} command sets up the termination goal for the
  specified function ‹sum›. If the function name is omitted, it
  implicitly refers to the last function definition.
  The ‹relation› method takes a relation of
  type \<^typ>‹('a × 'a) set›, where \<^typ>‹'a› is the argument type of
  the function. If the function has multiple curried arguments, then
  these are packed together into a tuple, as it happened in the above
  example.
  The predefined function @{term[source] "measure :: ('a ⇒ nat) ⇒ ('a × 'a) set"} constructs a
  wellfounded relation from a mapping into the natural numbers (a
  \emph{measure function}). 
  After the invocation of ‹relation›, we must prove that (a)
  the relation we supplied is wellfounded, and (b) that the arguments
  of recursive calls indeed decrease with respect to the
  relation:
  @{subgoals[display,indent=0]}
  These goals are all solved by ‹auto›:
›
apply auto
done
text ‹
  Let us complicate the function a little, by adding some more
  recursive calls: 
›
function foo :: "nat ⇒ nat ⇒ nat"
where
  "foo i N = (if i > N 
              then (if N = 0 then 0 else foo 0 (N - 1))
              else i + foo (Suc i) N)"
by pat_completeness auto
text ‹
  When ‹i› has reached ‹N›, it starts at zero again
  and ‹N› is decremented.
  This corresponds to a nested
  loop where one index counts up and the other down. Termination can
  be proved using a lexicographic combination of two measures, namely
  the value of ‹N› and the above difference. The \<^const>‹measures› combinator generalizes ‹measure› by taking a
  list of measure functions.  
›
termination 
by (relation "measures [λ(i, N). N, λ(i,N). N + 1 - i]") auto
subsection ‹How ‹lexicographic_order› works›
text ‹
  To see how the automatic termination proofs work, let's look at an
  example where it fails\footnote{For a detailed discussion of the
  termination prover, see \<^cite>‹bulwahnKN07›}:
\end{isamarkuptext}  
\cmd{fun} ‹fails :: "nat ⇒ nat list ⇒ nat"›\\%
\cmd{where}\\%
\hspace*{2ex}‹"fails a [] = a"›\\%
|\hspace*{1.5ex}‹"fails a (x#xs) = fails (x + a) (x#xs)"›\\
\begin{isamarkuptext}
\noindent Isabelle responds with the following error:
\begin{isabelle}
*** Unfinished subgoals:\newline
*** (a, 1, <):\newline
*** \ 1.~‹⋀x. x = 0›\newline
*** (a, 1, <=):\newline
*** \ 1.~False\newline
*** (a, 2, <):\newline
*** \ 1.~False\newline
*** Calls:\newline
*** a) ‹(a, x # xs) -->> (x + a, x # xs)›\newline
*** Measures:\newline
*** 1) ‹λx. size (fst x)›\newline
*** 2) ‹λx. size (snd x)›\newline
*** Result matrix:\newline
*** \ \ \ \ 1\ \ 2  \newline
*** a:  ?   <= \newline
*** Could not find lexicographic termination order.\newline
*** At command "fun".\newline
\end{isabelle}
›
text ‹
  The key to this error message is the matrix at the bottom. The rows
  of that matrix correspond to the different recursive calls (In our
  case, there is just one). The columns are the function's arguments 
  (expressed through different measure functions, which map the
  argument tuple to a natural number). 
  The contents of the matrix summarize what is known about argument
  descents: The second argument has a weak descent (‹<=›) at the
  recursive call, and for the first argument nothing could be proved,
  which is expressed by ‹?›. In general, there are the values
  ‹<›, ‹<=› and ‹?›.
  For the failed proof attempts, the unfinished subgoals are also
  printed. Looking at these will often point to a missing lemma.
›
subsection ‹The ‹size_change› method›
text ‹
  Some termination goals that are beyond the powers of
  ‹lexicographic_order› can be solved automatically by the
  more powerful ‹size_change› method, which uses a variant of
  the size-change principle, together with some other
  techniques. While the details are discussed
  elsewhere \<^cite>‹krauss_phd›,
  here are a few typical situations where
  ‹lexicographic_order› has difficulties and ‹size_change›
  may be worth a try:
  \begin{itemize}
  \item Arguments are permuted in a recursive call.
  \item Several mutually recursive functions with multiple arguments.
  \item Unusual control flow (e.g., when some recursive calls cannot
  occur in sequence).
  \end{itemize}
  Loading the theory ‹Multiset› makes the ‹size_change›
  method a bit stronger: it can then use multiset orders internally.
›
subsection ‹Configuring simplification rules for termination proofs›
text ‹
  Since both ‹lexicographic_order› and ‹size_change› rely on the simplifier internally,
  there can sometimes be the need for adding additional simp rules to them.
  This can be done either as arguments to the methods themselves, or globally via the
  theorem attribute ‹termination_simp›, which is useful in rare cases.
›
section ‹Mutual Recursion›
text ‹
  If two or more functions call one another mutually, they have to be defined
  in one step. Here are ‹even› and ‹odd›:
›
function even :: "nat ⇒ bool"
    and odd  :: "nat ⇒ bool"
where
  "even 0 = True"
| "odd 0 = False"
| "even (Suc n) = odd n"
| "odd (Suc n) = even n"
by pat_completeness auto
text ‹
  To eliminate the mutual dependencies, Isabelle internally
  creates a single function operating on the sum
  type \<^typ>‹nat + nat›. Then, \<^const>‹even› and \<^const>‹odd› are
  defined as projections. Consequently, termination has to be proved
  simultaneously for both functions, by specifying a measure on the
  sum type: 
›
termination 
by (relation "measure (λx. case x of Inl n ⇒ n | Inr n ⇒ n)") auto
text ‹
  We could also have used ‹lexicographic_order›, which
  supports mutual recursive termination proofs to a certain extent.
›
subsection ‹Induction for mutual recursion›
text ‹
  When functions are mutually recursive, proving properties about them
  generally requires simultaneous induction. The induction rule @{thm [source] "even_odd.induct"}
  generated from the above definition reflects this.
  Let us prove something about \<^const>‹even› and \<^const>‹odd›:
›
lemma even_odd_mod2:
  "even n = (n mod 2 = 0)"
  "odd n = (n mod 2 = 1)"
text ‹
  We apply simultaneous induction, specifying the induction variable
  for both goals, separated by \cmd{and}:›
apply (induct n and n rule: even_odd.induct)
text ‹
  We get four subgoals, which correspond to the clauses in the
  definition of \<^const>‹even› and \<^const>‹odd›:
  @{subgoals[display,indent=0]}
  Simplification solves the first two goals, leaving us with two
  statements about the ‹mod› operation to prove:
›
apply simp_all
text ‹
  @{subgoals[display,indent=0]} 
  \noindent These can be handled by Isabelle's arithmetic decision procedures.
  
›
apply arith
apply arith
done
text ‹
  In proofs like this, the simultaneous induction is really essential:
  Even if we are just interested in one of the results, the other
  one is necessary to strengthen the induction hypothesis. If we leave
  out the statement about \<^const>‹odd› and just write \<^term>‹True› instead,
  the same proof fails:
›
lemma failed_attempt:
  "even n = (n mod 2 = 0)"
  "True"
apply (induct n rule: even_odd.induct)
text ‹
  \noindent Now the third subgoal is a dead end, since we have no
  useful induction hypothesis available:
  @{subgoals[display,indent=0]} 
›
oops
section ‹Elimination›
text ‹
  A definition of function ‹f› gives rise to two kinds of elimination rules. Rule ‹f.cases›
  simply describes case analysis according to the patterns used in the definition:
›
fun list_to_option :: "'a list ⇒ 'a option"
where
  "list_to_option [x] = Some x"
| "list_to_option _ = None"
thm list_to_option.cases
text ‹
  @{thm[display] list_to_option.cases}
  Note that this rule does not mention the function at all, but only describes the cases used for
  defining it. In contrast, the rule @{thm[source] list_to_option.elims} also tell us what the function
  value will be in each case:
›
thm list_to_option.elims
text ‹
  @{thm[display] list_to_option.elims}
  \noindent
  This lets us eliminate an assumption of the form \<^prop>‹list_to_option xs = y› and replace it
  with the two cases, e.g.:
›
lemma "list_to_option xs = y ⟹ P"
proof (erule list_to_option.elims)
  fix x assume "xs = [x]" "y = Some x" thus P sorry
next
  assume "xs = []" "y = None" thus P sorry
next
  fix a b xs' assume "xs = a # b # xs'" "y = None" thus P sorry
qed
text ‹
  Sometimes it is convenient to derive specialized versions of the ‹elim› rules above and
  keep them around as facts explicitly. For example, it is natural to show that if 
  \<^prop>‹list_to_option xs = Some y›, then \<^term>‹xs› must be a singleton. The command 
  \cmd{fun\_cases} derives such facts automatically, by instantiating and simplifying the general 
  elimination rules given some pattern:
›
fun_cases list_to_option_SomeE[elim]: "list_to_option xs = Some y"
thm list_to_option_SomeE
text ‹
  @{thm[display] list_to_option_SomeE}
›
section ‹General pattern matching›
text‹\label{genpats}›
subsection ‹Avoiding automatic pattern splitting›
text ‹
  Up to now, we used pattern matching only on datatypes, and the
  patterns were always disjoint and complete, and if they weren't,
  they were made disjoint automatically like in the definition of
  \<^const>‹sep› in \S\ref{patmatch}.
  This automatic splitting can significantly increase the number of
  equations involved, and this is not always desirable. The following
  example shows the problem:
  
  Suppose we are modeling incomplete knowledge about the world by a
  three-valued datatype, which has values \<^term>‹T›, \<^term>‹F›
  and \<^term>‹X› for true, false and uncertain propositions, respectively. 
›