header {* Regular Expressions *}
(*<*)theory Solution4 imports Main begin(*>*)
text {*
This assignment \emph{will be assessed} to determine 50\% of your
final mark. Please complete the indicated tasks and write a brief
document explaining your work. You may prepare this document using
Isabelle's theory presentation facility, but this is not required. (A
very simple way to print a theory file legibly is to use the Proof
General command Isabelle~$>$ Commands~$>$ Display draft. You can
combine the resulting output with a document produced using your
favourite word processing package.) A clear write-up describing
elegant, clearly structured proofs of all tasks will receive maximum
credit.
You must work on this assignment as an individual. Collaboration is
not permitted.
*}
text {*
Consider reading, e.g.,
\url{http://en.wikipedia.org/wiki/Regular_expression} to refresh your
knowledge of regular expressions.
\smallskip
For this assignment, we define \emph{regular expressions} (over an
arbitrary type @{typ 'a} of characters) as follows:
\begin{enumerate}
\item $\emptyset$ is a regular expression.
\item $\varepsilon$ is a regular expression.
\item If $c$ is of type @{typ 'a}, then $\mathit{Atom}(c)$ is a
regular expression.
\item If $x$ and $y$ are regular expressions, then $xy$ is a regular
expression.
\item If $x$ and $y$ are regular expressions, then $x+y$ is a regular
expression.
\item If $x$ is a regular expression, then $x^*$ is a regular
expression.
\end{enumerate}
Nothing else is a regular expression.
\smallskip
$\rhd$ Define a corresponding Isabelle/HOL data type. (Your concrete
syntax may be different from that used above. For instance, you could
write @{term "Star x"} for $x^*$.)
*}
(*<*)
no_notation plus (infixl "+" 65)
no_notation rtrancl ("(_\<^sup>*)" [1000] 999)
(*>*)
datatype 'a regexp = EmptySet ("\")
| EmptyWord ("\")
| Atom 'a
| Seq "'a regexp" "'a regexp" (infixl "\" 70)
| Sum "'a regexp" "'a regexp" (infixl "+" 65)
| Star "'a regexp" ("_\<^sup>*" [80] 80)
subsection {* Regular Languages *}
text {* A \emph{word} is a list of characters: *}
type_synonym 'a word = "'a list"
text {*
Regular expressions denote formal languages, i.e., sets of words. For
$x$ a regular expression, we define its \emph{language} $L(x)$ as
follows:
\begin{enumerate}
\item $L(\emptyset) = \emptyset$.
\item $L(\varepsilon) = \{[\,]\}$.
\item $L(\mathit{Atom}(c)) = \{[c]\}$.
\item $L(xy) = \{ uv \mid u \in L(x) \wedge v \in L(y) \}$.
\item $L(x+y) = L(x) \cup L(y)$.
\item $L( x^* )$ is the smallest set that contains the empty word and
is closed under concatenation with words in $L(x)$. That is,
(i)~$[\,] \in L( x^* )$, and (ii)~if $u \in L(x)$ and $v \in L( x^*
)$, then $uv \in L( x^* )$.
\end{enumerate}
$\rhd$ Define a function @{term L} that maps regular expressions to
their language.
*}
inductive_set KleeneStar :: "'a word set \ 'a word set"
for x :: "'a word set" where
KleeneStar_epsilon [simp]: "[] \ KleeneStar x"
| KleeneStar_step: "\ u \ x; v \ KleeneStar x \ \ u @ v \ KleeneStar x"
fun L :: "'a regexp \ 'a word set" where
"L \ = {}"
| "L \ = {[]}"
| "L (Atom c) = {[c]}"
| "L (x\y) = {u @ v | u v. u\L x \ v\L y}"
| "L (x+y) = L x \ L y"
| "L (x\<^sup>*) = KleeneStar (L x)"
text {*
$\rhd$ Prove the following lemma.
*}
lemma KleeneStar_mono [simp]: "u \ x \ u \ KleeneStar x"
by (metis append_Nil2 KleeneStar_epsilon KleeneStar_step)
lemma KleeneStar_append [simp]:
"\ u \ KleeneStar x; v \ KleeneStar x \ \ u @ v \ KleeneStar x"
by (induct u rule: KleeneStar.induct) (simp, simp add: KleeneStar_step)
lemma KleeneStar_idem:
"u \ KleeneStar (KleeneStar x) \ u \ KleeneStar x"
by (induct u rule: KleeneStar.induct) simp_all
lemma "L (Star (Star x)) = L (Star x)"
by auto (erule KleeneStar_idem)
subsection {* Matching via Derivatives *}
text {*
We now consider regular expression \emph{matching}: the problem of
determining whether a given word is in the language of a given regular
expression. You are about to develop your own verified regular
expression matcher. We need some auxiliary notions first.
A regular expression is called \emph{nullable} iff its language
contains the empty word.
$\rhd$ Define a recursive function @{term "nullable x"} that computes
(by recursion over @{term x}, i.e., without explicit use of @{term L})
whether a regular expression is nullable.
*}
fun nullable :: "'a regexp \ bool" where
"nullable \ = False"
| "nullable \ = True"
| "nullable (Atom c) = False"
| "nullable (x\y) = (nullable x \ nullable y)"
| "nullable (x+y) = (nullable x \ nullable y)"
| "nullable (x\<^sup>*) = True"
text {*
$\rhd$ Prove the following lemma.
*}
lemma(*<*)nullable_correct:(*>*) "nullable x = ([] \ L x)"
by (induct x) auto
text {*
\pagebreak
The \emph{derivative} of a language~$\mathcal{L}$ with respect to a
word~$u$ is defined to be $\delta_u\,\mathcal{L} = \{ v \mid uv \in
\mathcal{L} \}$.
For languages that are given by regular expressions, there is a
natural algorithm to compute the derivative as another regular
expression.
$\rhd$ Define a recursive function @{term "\ c x"} that computes (by
recursion over @{term x}) a regular expression whose language is the
derivative of @{term "L x"} with respect to the single-character word
@{term "[c]"}.
*}
fun \ :: "'a \ 'a regexp \ 'a regexp" where
"\ c \ = \"
| "\ c \ = \"
| "\ c (Atom a) = (if c = a then \ else \)"
| "\ c (x\y) = \ c x \ y + (if nullable x then \ c y else \)"
| "\ c (x+y) = \ c x + \ c y"
| "\ c (x\<^sup>*) = \ c x \ x\<^sup>*"
text {*
Hint: @{term nullable} might come in handy.
\smallskip
$\rhd$ Prove the following lemma.
*}
lemma KleeneStar_append_Cons [simp]:
"\ c # u \ KleeneStar x; v \ KleeneStar x \ \ c # u @ v \ KleeneStar x"
by (metis KleeneStar_append append_Cons)
lemma KleeneStar_split_nonempty:
"c # w \ KleeneStar x \ \u v. w = u @ v \ c # u \ x \ v \ KleeneStar x"
by (induct "c # w" rule: KleeneStar.induct) (auto simp add: append_eq_Cons_conv)
text {*
Alternatively, we can introduce a fresh variable as an abbreviation
for the term @{term "c#w"} that we want to induct over:
*}
lemma "y \ KleeneStar x \ y = c # w \ \u v. w = u @ v \ c # u \ x \ v \ KleeneStar x"
by (induct y rule: KleeneStar.induct) (auto simp add: append_eq_Cons_conv)
lemma(*<*)Delta_correct:(*>*) "u \ L (\ c x) = (c#u \ L x)"
proof (induct x arbitrary: u)
case Seq thus ?case
by (auto simp add: nullable_correct) (metis append_Cons, metis append_eq_Cons_conv)+
case Star thus ?case
by (auto simp add: KleeneStar_split_nonempty)
qed simp_all -- {* the remaining cases are solved by simplification *}
text {*
Hint: see the \emph{Tutorial on Isabelle/HOL} and the \emph{Tutorial
on Isar} for advanced induction techniques.
\smallskip
$\rhd$ Define a recursive function @{term \} that lifts @{term \} from
single characters to words, i.e., @{term "\ u x"} is a regular
expression whose language is the derivative of @{term "L x"} with
respect to the word @{term u}.
*}
fun \ :: "'a word \ 'a regexp \ 'a regexp" where
"\ [] x = x"
| "\ (c#cs) x = \ cs (\ c x)"
text {*
$\rhd$ Prove the following lemma.
*}
lemma(*<*)delta_correct:(*>*) "u \ L (\ v x) = (v @ u \ L x)"
by (induct v arbitrary: x) (simp, simp add: Delta_correct)
text {*
To obtain a regular expression matcher, we finally observe that @{term
"u \ L x"} if and only if @{term "\ u x"} is nullable.
*}
definition match :: "'a word \ 'a regexp \ bool" where
"match u x = nullable (\ u x)"
text {*
$\rhd$ Prove correctness of @{term match}.
*}
theorem "match u x = (u \ L x)"
by (simp add: match_def nullable_correct delta_correct)
(*<*)end(*>*)