header {* Regular Expressions *}
(*<*)theory Exercises4 imports Main begin(*>*)
text {*
This assignment \emph{will be assessed} to determine 50\% of your
final mark. Please complete the indicated tasks and write a brief
document explaining your work. You may prepare this document using
Isabelle's theory presentation facility, but this is not required. (A
very simple way to print a theory file legibly is to use the Proof
General command Isabelle~$>$ Commands~$>$ Display draft. You can
combine the resulting output with a document produced using your
favourite word processing package.) A clear write-up describing
elegant, clearly structured proofs of all tasks will receive maximum
credit.
You must work on this assignment as an individual. Collaboration is
not permitted.
*}
text {*
Consider reading, e.g.,
\url{http://en.wikipedia.org/wiki/Regular_expression} to refresh your
knowledge of regular expressions.
\smallskip
For this assignment, we define \emph{regular expressions} (over an
arbitrary type @{typ 'a} of characters) as follows:
\begin{enumerate}
\item $\emptyset$ is a regular expression.
\item $\varepsilon$ is a regular expression.
\item If $c$ is of type @{typ 'a}, then $\mathit{Atom}(c)$ is a
regular expression.
\item If $x$ and $y$ are regular expressions, then $xy$ is a regular
expression.
\item If $x$ and $y$ are regular expressions, then $x+y$ is a regular
expression.
\item If $x$ is a regular expression, then $x^*$ is a regular
expression.
\end{enumerate}
Nothing else is a regular expression.
\smallskip
$\rhd$ Define a corresponding Isabelle/HOL data type. (Your concrete
syntax may be different from that used above. For instance, you could
write @{term "Star x"} for $x^*$.)
*}
(*<*)typedecl 'a regexp(*>*)
subsection {* Regular Languages *}
text {* A \emph{word} is a list of characters: *}
type_synonym 'a word = "'a list"
text {*
Regular expressions denote formal languages, i.e., sets of words. For
$x$ a regular expression, we define its \emph{language} $L(x)$ as
follows:
\begin{enumerate}
\item $L(\emptyset) = \emptyset$.
\item $L(\varepsilon) = \{[\,]\}$.
\item $L(\mathit{Atom}(c)) = \{[c]\}$.
\item $L(xy) = \{ uv \mid u \in L(x) \wedge v \in L(y) \}$.
\item $L(x+y) = L(x) \cup L(y)$.
\item $L( x^* )$ is the smallest set that contains the empty word and
is closed under concatenation with words in $L(x)$. That is,
(i)~$[\,] \in L( x^* )$, and (ii)~if $u \in L(x)$ and $v \in L( x^*
)$, then $uv \in L( x^* )$.
\end{enumerate}
$\rhd$ Define a function @{term L} that maps regular expressions to
their language.
*}
(*<*)consts(*>*) L :: "'a regexp \ 'a word set"
text {*
$\rhd$ Prove the following lemma.
*}
lemma "L (Star (Star x)) = L (Star x)"
(*<*)oops(*>*)
subsection {* Matching via Derivatives *}
text {*
We now consider regular expression \emph{matching}: the problem of
determining whether a given word is in the language of a given regular
expression. You are about to develop your own verified regular
expression matcher. We need some auxiliary notions first.
A regular expression is called \emph{nullable} iff its language
contains the empty word.
$\rhd$ Define a recursive function @{term "nullable x"} that computes
(by recursion over @{term x}, i.e., without explicit use of @{term L})
whether a regular expression is nullable.
*}
(*<*)consts(*>*) nullable :: "'a regexp \ bool"
text {*
$\rhd$ Prove the following lemma.
*}
lemma(*<*)nullable_correct:(*>*) "nullable x = ([] \ L x)"
(*<*)oops(*>*)
text {*
\pagebreak
The \emph{derivative} of a language~$\mathcal{L}$ with respect to a
word~$u$ is defined to be $\delta_u\,\mathcal{L} = \{ v \mid uv \in
\mathcal{L} \}$.
For languages that are given by regular expressions, there is a
natural algorithm to compute the derivative as another regular
expression.
$\rhd$ Define a recursive function @{term "\ c x"} that computes (by
recursion over @{term x}) a regular expression whose language is the
derivative of @{term "L x"} with respect to the single-character word
@{term "[c]"}.
*}
(*<*)consts(*>*) \ :: "'a \ 'a regexp \ 'a regexp"
text {*
Hint: @{term nullable} might come in handy.
\smallskip
$\rhd$ Prove the following lemma.
*}
lemma(*<*)Delta_correct:(*>*) "u \ L (\ c x) = (c#u \ L x)"
(*<*)oops(*>*)
text {*
Hint: see the \emph{Tutorial on Isabelle/HOL} and the \emph{Tutorial
on Isar} for advanced induction techniques.
\smallskip
$\rhd$ Define a recursive function @{term \} that lifts @{term \} from
single characters to words, i.e., @{term "\ u x"} is a regular
expression whose language is the derivative of @{term "L x"} with
respect to the word @{term u}.
*}
(*<*)consts(*>*) \ :: "'a word \ 'a regexp \ 'a regexp"
text {*
$\rhd$ Prove the following lemma.
*}
lemma(*<*)delta_correct:(*>*) "u \ L (\ v x) = (v @ u \ L x)"
(*<*)oops(*>*)
text {*
To obtain a regular expression matcher, we finally observe that @{term
"u \ L x"} if and only if @{term "\ u x"} is nullable.
*}
definition match :: "'a word \ 'a regexp \ bool" where
"match u x = nullable (\ u x)"
text {*
$\rhd$ Prove correctness of @{term match}.
*}
theorem "match u x = (u \ L x)"
(*<*)oops(*>*)
(*<*)end(*>*)