Atlas.TheoryOfComputation.code.GNFAToRegex

theorem Sipser.cons_kstar {α : Type u_1} {L : Language α} {w p : List α} (hw : w ∈ L) (hp : p ∈ KStar.kstar L) :

Prepending a single word w ∈ L to a Kleene-star word p ∈ L* yields a Kleene-star word: w ++ p ∈ L*.

theorem Sipser.kstar_append_kstar {α : Type u_1} {L : Language α} {p q : List α} (hp : p ∈ KStar.kstar L) (hq : q ∈ KStar.kstar L) :

p ++ q ∈ KStar.kstar L

The Kleene star is closed under concatenation: if p, q ∈ L* then p ++ q ∈ L*.

source

theorem Sipser.kstar_split {α : Type u_1} (x y : List α) (L : Language α) (S : List (List α)) (hxy : x ++ y = S.flatten) (hS : ∀ w ∈ S, w ∈ L) :

(∃ (p : List α) (c : List α) (d : List α) (rest : List α), p ∈ KStar.kstar L ∧ x = p ++ c ∧ c ++ d ∈ L ∧ rest ∈ KStar.kstar L ∧ y = d ++ rest) ∨ x ∈ KStar.kstar L ∧ y = []

Splitting a Kleene-star concatenation across a fixed cut.

If x ++ y equals the flattening of a list S of L-words, then either:

the cut lies inside some element of S — there is a prefix p ∈ L* with x = p ++ c, an L-word c ++ d straddling the cut, and a suffix rest ∈ L* with y = d ++ rest; or
the cut lies after every element, in which case x ∈ L* and y = [].

source

theorem Sipser.regular_kstar {α : Type u_1} {L : Language α} (hL : L.IsRegular) :

(KStar.kstar L).IsRegular

Theorem (Closure under star). If L is a regular language then so is its Kleene star L*.

source

theorem Sipser.eq_of_card_eq_two {Q : Type u_1} [Fintype Q] [DecidableEq Q] {a b : Q} (hab : a ≠ b) (hcard : Fintype.card Q = 2) (q : Q) :

q = a ∨ q = b

In a 2-element type with two distinguished distinct elements a, b, every element equals a or b.

source

theorem Sipser.exists_third {Q : Type u_1} [Fintype Q] [DecidableEq Q] {a b : Q} (hab : a ≠ b) (hcard : 2 < Fintype.card Q) :

∃ (x : Q), x ≠ a ∧ x ≠ b

In a type with more than two elements, given two distinct elements a, b, there exists a third element distinct from both — used to pick a state to "rip" in the GNFA → regex construction.

source

theorem Sipser.gnfa_language_of_two_states {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (hcard : Fintype.card Q = 2) :

G.language = (G.δ G.start G.accept).matches'

Base case of GNFA → regex. When the GNFA has only two states (the start and accept states), its language coincides with the language of the regular expression labelling the single arrow from start to accept.

source

noncomputable def Sipser.GNFA.rip {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x ≠ G.start) (hxa : x ≠ G.accept) :

GNFA { q : Q // q ≠ x } σ

The "rip" construction (state elimination).

Given a GNFA G and a state x that is neither the start nor the accept state, G.rip x hxs hxa is the GNFA on the states {q // q ≠ x} obtained by removing x and, for every remaining pair of states qᵢ, qⱼ, replacing the label δ(qᵢ, qⱼ) by the union

δ(qᵢ, qⱼ) ∪ δ(qᵢ, x) · δ(x, x)* · δ(x, qⱼ)

so that every path that used to pass through x is now captured by a direct arrow. This is the inductive step of the proof that every GNFA has an equivalent regular expression.

Instances For

source

theorem Sipser.card_rip_lt {Q : Type u_1} [Fintype Q] [DecidableEq Q] (x : Q) :

Fintype.card { q : Q // q ≠ x } < Fintype.card Q

Ripping out the state x strictly decreases the cardinality, providing the well-founded measure used in the recursion gnfa_to_regex_aux.

source

theorem Sipser.rip_forward_A {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x ≠ G.start) (hxa : x ≠ G.accept) :

(∀ {q q' : Q} {w : List σ} (hq : q ≠ x) (hq' : q' ≠ x), G.AcceptPath q q' w → (G.rip x hxs hxa).AcceptPath ⟨q, hq⟩ ⟨q', hq'⟩ w) ∧ ∀ {q : Q} {w : List σ} (hq : q ≠ x), G.AcceptPath q x w → ∃ (ql : Q) (hql : ql ≠ x) (wp : List σ) (wt : List σ) (wl : List σ), (G.rip x hxs hxa).AcceptPath ⟨q, hq⟩ ⟨ql, hql⟩ wp ∧ wt ∈ (G.δ ql x).matches' ∧ wl ∈ (G.δ x x).star.matches' ∧ w = wp ++ wt ++ wl

Forward direction of the rip-equivalence, in two parts.

Let G' = G.rip x hxs hxa. For any accepting path of G from q to q' labelled by w:

if both endpoints avoid x (q ≠ x and q' ≠ x), then the same word w already labels an accepting path of G';
if the source q avoids x but the target equals x, then the path can be decomposed as w = wp ++ wt ++ wl where wp is realised in G' reaching some ql ≠ x, wt ∈ δ(ql, x) is the final exit into x, and wl ∈ δ(x, x)* is a (possibly empty) loop at x.

Together these are used to show that every word accepted by G is also accepted by G.rip x ….

source

theorem Sipser.rip_language_forward {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x ≠ G.start) (hxa : x ≠ G.accept) (w : List σ) :

w ∈ G.language → w ∈ (G.rip x hxs hxa).language

Every word accepted by G is also accepted by the ripped GNFA G.rip x …: L(G) ⊆ L(G.rip x …).

source

theorem Sipser.kstar_to_acceptPath {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (q : Q) {v : List σ} (hv : v ∈ KStar.kstar (G.δ q q).matches') :

G.AcceptPath q q v

Any word in the Kleene star of the self-loop regex δ(q, q)* labels some accepting path of G from q back to q.

source

theorem Sipser.rip_language_backward {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x ≠ G.start) (hxa : x ≠ G.accept) (w : List σ) :

w ∈ (G.rip x hxs hxa).language → w ∈ G.language

Every word accepted by the ripped GNFA is accepted by the original: L(G.rip x …) ⊆ L(G). Combined with rip_language_forward, this gives L(G.rip x …) = L(G).

source

theorem Sipser.gnfa_to_regex_aux (n : ℕ) (Q : Type u_1) (σ : Type u_2) [inst1 : Fintype Q] [inst2 : DecidableEq Q] :

Fintype.card Q = n → ∀ (G : GNFA Q σ), ∃ (R : RegularExpression σ), G.language = R.matches'

Auxiliary induction on the number of states. For every n, every GNFA with exactly n states has an equivalent regular expression: the base case n = 2 is handled by gnfa_language_of_two_states, and the inductive step uses GNFA.rip to remove an interior state.

source

theorem Sipser.gnfa_to_regex {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) :

∃ (R : RegularExpression σ), G.language = R.matches'

Lemma (GNFA → Regular Expressions). Every GNFA G has an equivalent regular expression R, i.e. L(G) = L(R).

source

def Sipser.regexCharUnion {α : Type u_1} :

List α → RegularExpression α

The regular expression matching the single-symbol language {[a] | a ∈ l} — a finite union of char regexes (or zero if the list is empty).

Instances For

source

theorem Sipser.mem_regexCharUnion_matches {α : Type u_1} {l w : List α} :

w ∈ (regexCharUnion l).matches' ↔ ∃ a ∈ l, w = [a]

Characterisation of the language matched by regexCharUnion l: w is matched iff w = [a] for some a appearing in l.

source

noncomputable def Sipser.dfaCharRegex {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq σ] (M : DFA α σ) (s s' : σ) :

RegularExpression α

For a DFA M and states s, s', dfaCharRegex M s s' is the regular expression matching exactly the single-letter words [a] for which M.step s a = s'.

Instances For

source

theorem Sipser.mem_dfaCharRegex_matches {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (s s' : σ) (w : List α) :

w ∈ (dfaCharRegex M s s').matches' ↔ ∃ (a : α), M.step s a = s' ∧ w = [a]

The matching characterisation for dfaCharRegex: w is matched iff w = [a] for some symbol a with M.step s a = s'.

source

noncomputable def Sipser.dfaToGNFA {α : Type u_1} {σ : Type u_2} [Fintype α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) :

GNFA (Option (Option σ)) α

DFA → GNFA conversion.

Given a DFA M with states σ, build an equivalent GNFA whose state type is Option (Option σ):

the outer none is the new GNFA start state;
some none is the new GNFA accept state;
some (some s) corresponds to an original DFA state s ∈ σ.

Transitions are: an ε-edge from start to some (some M.start) (and directly to the accept state when M.start ∈ M.accept), an ε-edge from every DFA-accept state to the new accept state, and between two DFA states s, s' the regex dfaCharRegex M s s'. All other arrows are zero, and the no-enter-start / no-exit-accept conditions hold by construction.

Instances For

source

theorem Sipser.toGNFA_acceptPath_dfa {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (s : σ) (w : List α) (hw : M.evalFrom s w ∈ M.accept) :

(dfaToGNFA M).AcceptPath (some (some s)) (some none) w

Simulating a DFA computation inside dfaToGNFA: if running M from state s on input w leads to an accept state, then w labels a GNFA accept path from some (some s) to the new accept state some none.

source

theorem Sipser.toGNFA_language_forward {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (w : List α) (hw : w ∈ M.accepts) :

w ∈ (dfaToGNFA M).language

Forward inclusion L(M) ⊆ L(dfaToGNFA M): every word accepted by the DFA is accepted by the converted GNFA.

source

theorem Sipser.toGNFA_decompose_from_start {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {q₂ : Option (Option σ)} {w : List α} (hp : (dfaToGNFA M).AcceptPath none q₂ w) :

q₂ = none ∧ w = [] ∨ ∃ (q_first : Option (Option σ)) (w_exit : List α) (w_rest : List α), w_exit ∈ ((dfaToGNFA M).δ none q_first).matches' ∧ (dfaToGNFA M).AcceptPath q_first q₂ w_rest ∧ w = w_exit ++ w_rest

Decomposition of any accept path starting at the GNFA start state none: either the path is the trivial empty self-loop, or it consists of a first exit-step from none to some q_first (consuming w_exit) followed by a residual accept path from q_first (consuming w_rest), with w = w_exit ++ w_rest.

source

theorem Sipser.toGNFA_dfa_path {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {s : σ} {q₂ : Option (Option σ)} {u : List α} (hp : (dfaToGNFA M).AcceptPath (some (some s)) q₂ u) :

(∃ (s' : σ), q₂ = some (some s') ∧ M.evalFrom s u = s') ∨ q₂ = some none ∧ M.evalFrom s u ∈ M.accept

Any GNFA accept path starting from a DFA-state some (some s) mirrors a DFA computation: either it ends in another DFA-state some (some s') with M.evalFrom s u = s', or it ends in the GNFA accept state some none and M.evalFrom s u is an accepting state of M.

source

theorem Sipser.toGNFA_accept_path_nil {α : Type u_1} {σ : Type u_2} [Fintype α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {q₂ : Option (Option σ)} {u : List α} (hp : (dfaToGNFA M).AcceptPath (some none) q₂ u) :

q₂ = some none ∧ u = []

The GNFA accept state some none has no outgoing edges (all labels are zero), so any accept path starting there must be trivial: the endpoint is still some none and the consumed word is empty.

source

theorem Sipser.toGNFA_language_backward {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (w : List α) (hw : w ∈ (dfaToGNFA M).language) :

w ∈ M.accepts

Backward inclusion L(dfaToGNFA M) ⊆ L(M): every word accepted by the converted GNFA is accepted by the original DFA.

source

theorem Sipser.toGNFA_language {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) :

(dfaToGNFA M).language = M.accepts

DFA → GNFA correctness. The GNFA dfaToGNFA M recognises exactly the same language as the DFA M: L(dfaToGNFA M) = L(M). Combined with gnfa_to_regex, this shows every regular language is described by a regular expression.