header {* Regular Expressions *} (*<*)theory Solution4 imports Main begin(*>*) text {* This assignment \emph{will be assessed} to determine 50\% of your final mark. Please complete the indicated tasks and write a brief document explaining your work. You may prepare this document using Isabelle's theory presentation facility, but this is not required. (A very simple way to print a theory file legibly is to use the Proof General command Isabelle~$>$ Commands~$>$ Display draft. You can combine the resulting output with a document produced using your favourite word processing package.) A clear write-up describing elegant, clearly structured proofs of all tasks will receive maximum credit. You must work on this assignment as an individual. Collaboration is not permitted. *} text {* Consider reading, e.g., \url{http://en.wikipedia.org/wiki/Regular_expression} to refresh your knowledge of regular expressions. \smallskip For this assignment, we define \emph{regular expressions} (over an arbitrary type @{typ 'a} of characters) as follows: \begin{enumerate} \item $\emptyset$ is a regular expression. \item $\varepsilon$ is a regular expression. \item If $c$ is of type @{typ 'a}, then $\mathit{Atom}(c)$ is a regular expression. \item If $x$ and $y$ are regular expressions, then $xy$ is a regular expression. \item If $x$ and $y$ are regular expressions, then $x+y$ is a regular expression. \item If $x$ is a regular expression, then $x^*$ is a regular expression. \end{enumerate} Nothing else is a regular expression. \smallskip $\rhd$ Define a corresponding Isabelle/HOL data type. (Your concrete syntax may be different from that used above. For instance, you could write @{term "Star x"} for $x^*$.) *} (*<*) no_notation plus (infixl "+" 65) no_notation rtrancl ("(_\<^sup>*)" [1000] 999) (*>*) datatype 'a regexp = EmptySet ("\") | EmptyWord ("\") | Atom 'a | Seq "'a regexp" "'a regexp" (infixl "\" 70) | Sum "'a regexp" "'a regexp" (infixl "+" 65) | Star "'a regexp" ("_\<^sup>*" [80] 80) subsection {* Regular Languages *} text {* A \emph{word} is a list of characters: *} type_synonym 'a word = "'a list" text {* Regular expressions denote formal languages, i.e., sets of words. For $x$ a regular expression, we define its \emph{language} $L(x)$ as follows: \begin{enumerate} \item $L(\emptyset) = \emptyset$. \item $L(\varepsilon) = \{[\,]\}$. \item $L(\mathit{Atom}(c)) = \{[c]\}$. \item $L(xy) = \{ uv \mid u \in L(x) \wedge v \in L(y) \}$. \item $L(x+y) = L(x) \cup L(y)$. \item $L( x^* )$ is the smallest set that contains the empty word and is closed under concatenation with words in $L(x)$. That is, (i)~$[\,] \in L( x^* )$, and (ii)~if $u \in L(x)$ and $v \in L( x^* )$, then $uv \in L( x^* )$. \end{enumerate} $\rhd$ Define a function @{term L} that maps regular expressions to their language. *} inductive_set KleeneStar :: "'a word set \ 'a word set" for x :: "'a word set" where KleeneStar_epsilon [simp]: "[] \ KleeneStar x" | KleeneStar_step: "\ u \ x; v \ KleeneStar x \ \ u @ v \ KleeneStar x" fun L :: "'a regexp \ 'a word set" where "L \ = {}" | "L \ = {[]}" | "L (Atom c) = {[c]}" | "L (x\y) = {u @ v | u v. u\L x \ v\L y}" | "L (x+y) = L x \ L y" | "L (x\<^sup>*) = KleeneStar (L x)" text {* $\rhd$ Prove the following lemma. *} lemma KleeneStar_mono [simp]: "u \ x \ u \ KleeneStar x" by (metis append_Nil2 KleeneStar_epsilon KleeneStar_step) lemma KleeneStar_append [simp]: "\ u \ KleeneStar x; v \ KleeneStar x \ \ u @ v \ KleeneStar x" by (induct u rule: KleeneStar.induct) (simp, simp add: KleeneStar_step) lemma KleeneStar_idem: "u \ KleeneStar (KleeneStar x) \ u \ KleeneStar x" by (induct u rule: KleeneStar.induct) simp_all lemma "L (Star (Star x)) = L (Star x)" by auto (erule KleeneStar_idem) subsection {* Matching via Derivatives *} text {* We now consider regular expression \emph{matching}: the problem of determining whether a given word is in the language of a given regular expression. You are about to develop your own verified regular expression matcher. We need some auxiliary notions first. A regular expression is called \emph{nullable} iff its language contains the empty word. $\rhd$ Define a recursive function @{term "nullable x"} that computes (by recursion over @{term x}, i.e., without explicit use of @{term L}) whether a regular expression is nullable. *} fun nullable :: "'a regexp \ bool" where "nullable \ = False" | "nullable \ = True" | "nullable (Atom c) = False" | "nullable (x\y) = (nullable x \ nullable y)" | "nullable (x+y) = (nullable x \ nullable y)" | "nullable (x\<^sup>*) = True" text {* $\rhd$ Prove the following lemma. *} lemma(*<*)nullable_correct:(*>*) "nullable x = ([] \ L x)" by (induct x) auto text {* \pagebreak The \emph{derivative} of a language~$\mathcal{L}$ with respect to a word~$u$ is defined to be $\delta_u\,\mathcal{L} = \{ v \mid uv \in \mathcal{L} \}$. For languages that are given by regular expressions, there is a natural algorithm to compute the derivative as another regular expression. $\rhd$ Define a recursive function @{term "\ c x"} that computes (by recursion over @{term x}) a regular expression whose language is the derivative of @{term "L x"} with respect to the single-character word @{term "[c]"}. *} fun \ :: "'a \ 'a regexp \ 'a regexp" where "\ c \ = \" | "\ c \ = \" | "\ c (Atom a) = (if c = a then \ else \)" | "\ c (x\y) = \ c x \ y + (if nullable x then \ c y else \)" | "\ c (x+y) = \ c x + \ c y" | "\ c (x\<^sup>*) = \ c x \ x\<^sup>*" text {* Hint: @{term nullable} might come in handy. \smallskip $\rhd$ Prove the following lemma. *} lemma KleeneStar_append_Cons [simp]: "\ c # u \ KleeneStar x; v \ KleeneStar x \ \ c # u @ v \ KleeneStar x" by (metis KleeneStar_append append_Cons) lemma KleeneStar_split_nonempty: "c # w \ KleeneStar x \ \u v. w = u @ v \ c # u \ x \ v \ KleeneStar x" by (induct "c # w" rule: KleeneStar.induct) (auto simp add: append_eq_Cons_conv) text {* Alternatively, we can introduce a fresh variable as an abbreviation for the term @{term "c#w"} that we want to induct over: *} lemma "y \ KleeneStar x \ y = c # w \ \u v. w = u @ v \ c # u \ x \ v \ KleeneStar x" by (induct y rule: KleeneStar.induct) (auto simp add: append_eq_Cons_conv) lemma(*<*)Delta_correct:(*>*) "u \ L (\ c x) = (c#u \ L x)" proof (induct x arbitrary: u) case Seq thus ?case by (auto simp add: nullable_correct) (metis append_Cons, metis append_eq_Cons_conv)+ case Star thus ?case by (auto simp add: KleeneStar_split_nonempty) qed simp_all -- {* the remaining cases are solved by simplification *} text {* Hint: see the \emph{Tutorial on Isabelle/HOL} and the \emph{Tutorial on Isar} for advanced induction techniques. \smallskip $\rhd$ Define a recursive function @{term \} that lifts @{term \} from single characters to words, i.e., @{term "\ u x"} is a regular expression whose language is the derivative of @{term "L x"} with respect to the word @{term u}. *} fun \ :: "'a word \ 'a regexp \ 'a regexp" where "\ [] x = x" | "\ (c#cs) x = \ cs (\ c x)" text {* $\rhd$ Prove the following lemma. *} lemma(*<*)delta_correct:(*>*) "u \ L (\ v x) = (v @ u \ L x)" by (induct v arbitrary: x) (simp, simp add: Delta_correct) text {* To obtain a regular expression matcher, we finally observe that @{term "u \ L x"} if and only if @{term "\ u x"} is nullable. *} definition match :: "'a word \ 'a regexp \ bool" where "match u x = nullable (\ u x)" text {* $\rhd$ Prove correctness of @{term match}. *} theorem "match u x = (u \ L x)" by (simp add: match_def nullable_correct delta_correct) (*<*)end(*>*)