header {* Regular Expressions *} (*<*)theory Exercises4 imports Main begin(*>*) text {* This assignment \emph{will be assessed} to determine 50\% of your final mark. Please complete the indicated tasks and write a brief document explaining your work. You may prepare this document using Isabelle's theory presentation facility, but this is not required. (A very simple way to print a theory file legibly is to use the Proof General command Isabelle~$>$ Commands~$>$ Display draft. You can combine the resulting output with a document produced using your favourite word processing package.) A clear write-up describing elegant, clearly structured proofs of all tasks will receive maximum credit. You must work on this assignment as an individual. Collaboration is not permitted. *} text {* Consider reading, e.g., \url{http://en.wikipedia.org/wiki/Regular_expression} to refresh your knowledge of regular expressions. \smallskip For this assignment, we define \emph{regular expressions} (over an arbitrary type @{typ 'a} of characters) as follows: \begin{enumerate} \item $\emptyset$ is a regular expression. \item $\varepsilon$ is a regular expression. \item If $c$ is of type @{typ 'a}, then $\mathit{Atom}(c)$ is a regular expression. \item If $x$ and $y$ are regular expressions, then $xy$ is a regular expression. \item If $x$ and $y$ are regular expressions, then $x+y$ is a regular expression. \item If $x$ is a regular expression, then $x^*$ is a regular expression. \end{enumerate} Nothing else is a regular expression. \smallskip $\rhd$ Define a corresponding Isabelle/HOL data type. (Your concrete syntax may be different from that used above. For instance, you could write @{term "Star x"} for $x^*$.) *} (*<*)typedecl 'a regexp(*>*) subsection {* Regular Languages *} text {* A \emph{word} is a list of characters: *} type_synonym 'a word = "'a list" text {* Regular expressions denote formal languages, i.e., sets of words. For $x$ a regular expression, we define its \emph{language} $L(x)$ as follows: \begin{enumerate} \item $L(\emptyset) = \emptyset$. \item $L(\varepsilon) = \{[\,]\}$. \item $L(\mathit{Atom}(c)) = \{[c]\}$. \item $L(xy) = \{ uv \mid u \in L(x) \wedge v \in L(y) \}$. \item $L(x+y) = L(x) \cup L(y)$. \item $L( x^* )$ is the smallest set that contains the empty word and is closed under concatenation with words in $L(x)$. That is, (i)~$[\,] \in L( x^* )$, and (ii)~if $u \in L(x)$ and $v \in L( x^* )$, then $uv \in L( x^* )$. \end{enumerate} $\rhd$ Define a function @{term L} that maps regular expressions to their language. *} (*<*)consts(*>*) L :: "'a regexp \ 'a word set" text {* $\rhd$ Prove the following lemma. *} lemma "L (Star (Star x)) = L (Star x)" (*<*)oops(*>*) subsection {* Matching via Derivatives *} text {* We now consider regular expression \emph{matching}: the problem of determining whether a given word is in the language of a given regular expression. You are about to develop your own verified regular expression matcher. We need some auxiliary notions first. A regular expression is called \emph{nullable} iff its language contains the empty word. $\rhd$ Define a recursive function @{term "nullable x"} that computes (by recursion over @{term x}, i.e., without explicit use of @{term L}) whether a regular expression is nullable. *} (*<*)consts(*>*) nullable :: "'a regexp \ bool" text {* $\rhd$ Prove the following lemma. *} lemma(*<*)nullable_correct:(*>*) "nullable x = ([] \ L x)" (*<*)oops(*>*) text {* \pagebreak The \emph{derivative} of a language~$\mathcal{L}$ with respect to a word~$u$ is defined to be $\delta_u\,\mathcal{L} = \{ v \mid uv \in \mathcal{L} \}$. For languages that are given by regular expressions, there is a natural algorithm to compute the derivative as another regular expression. $\rhd$ Define a recursive function @{term "\ c x"} that computes (by recursion over @{term x}) a regular expression whose language is the derivative of @{term "L x"} with respect to the single-character word @{term "[c]"}. *} (*<*)consts(*>*) \ :: "'a \ 'a regexp \ 'a regexp" text {* Hint: @{term nullable} might come in handy. \smallskip $\rhd$ Prove the following lemma. *} lemma(*<*)Delta_correct:(*>*) "u \ L (\ c x) = (c#u \ L x)" (*<*)oops(*>*) text {* Hint: see the \emph{Tutorial on Isabelle/HOL} and the \emph{Tutorial on Isar} for advanced induction techniques. \smallskip $\rhd$ Define a recursive function @{term \} that lifts @{term \} from single characters to words, i.e., @{term "\ u x"} is a regular expression whose language is the derivative of @{term "L x"} with respect to the word @{term u}. *} (*<*)consts(*>*) \ :: "'a word \ 'a regexp \ 'a regexp" text {* $\rhd$ Prove the following lemma. *} lemma(*<*)delta_correct:(*>*) "u \ L (\ v x) = (v @ u \ L x)" (*<*)oops(*>*) text {* To obtain a regular expression matcher, we finally observe that @{term "u \ L x"} if and only if @{term "\ u x"} is nullable. *} definition match :: "'a word \ 'a regexp \ bool" where "match u x = nullable (\ u x)" text {* $\rhd$ Prove correctness of @{term match}. *} theorem "match u x = (u \ L x)" (*<*)oops(*>*) (*<*)end(*>*)