Documentation

Atlas.TheoryOfComputation.code.GNFAToRegex

theorem Sipser.cons_kstar {α : Type u_1} {L : Language α} {w p : List α} (hw : w L) (hp : p KStar.kstar L) :

Prepending a single word w ∈ L to a Kleene-star word p ∈ L* yields a Kleene-star word: w ++ p ∈ L*.

theorem Sipser.kstar_append_kstar {α : Type u_1} {L : Language α} {p q : List α} (hp : p KStar.kstar L) (hq : q KStar.kstar L) :

The Kleene star is closed under concatenation: if p, q ∈ L* then p ++ q ∈ L*.

theorem Sipser.kstar_split {α : Type u_1} (x y : List α) (L : Language α) (S : List (List α)) (hxy : x ++ y = S.flatten) (hS : wS, w L) :
(∃ (p : List α) (c : List α) (d : List α) (rest : List α), p KStar.kstar L x = p ++ c c ++ d L rest KStar.kstar L y = d ++ rest) x KStar.kstar L y = []

Splitting a Kleene-star concatenation across a fixed cut.

If x ++ y equals the flattening of a list S of L-words, then either:

  • the cut lies inside some element of S — there is a prefix p ∈ L* with x = p ++ c, an L-word c ++ d straddling the cut, and a suffix rest ∈ L* with y = d ++ rest; or
  • the cut lies after every element, in which case x ∈ L* and y = [].
theorem Sipser.regular_kstar {α : Type u_1} {L : Language α} (hL : L.IsRegular) :

Theorem (Closure under star). If L is a regular language then so is its Kleene star L*.

theorem Sipser.eq_of_card_eq_two {Q : Type u_1} [Fintype Q] [DecidableEq Q] {a b : Q} (hab : a b) (hcard : Fintype.card Q = 2) (q : Q) :
q = a q = b

In a 2-element type with two distinguished distinct elements a, b, every element equals a or b.

theorem Sipser.exists_third {Q : Type u_1} [Fintype Q] [DecidableEq Q] {a b : Q} (hab : a b) (hcard : 2 < Fintype.card Q) :
∃ (x : Q), x a x b

In a type with more than two elements, given two distinct elements a, b, there exists a third element distinct from both — used to pick a state to "rip" in the GNFA → regex construction.

theorem Sipser.gnfa_language_of_two_states {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (hcard : Fintype.card Q = 2) :

Base case of GNFA → regex. When the GNFA has only two states (the start and accept states), its language coincides with the language of the regular expression labelling the single arrow from start to accept.

noncomputable def Sipser.GNFA.rip {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x G.start) (hxa : x G.accept) :
GNFA { q : Q // q x } σ

The "rip" construction (state elimination).

Given a GNFA G and a state x that is neither the start nor the accept state, G.rip x hxs hxa is the GNFA on the states {q // q ≠ x} obtained by removing x and, for every remaining pair of states qᵢ, qⱼ, replacing the label δ(qᵢ, qⱼ) by the union

δ(qᵢ, qⱼ) ∪ δ(qᵢ, x) · δ(x, x)* · δ(x, qⱼ)

so that every path that used to pass through x is now captured by a direct arrow. This is the inductive step of the proof that every GNFA has an equivalent regular expression.

Instances For
    theorem Sipser.card_rip_lt {Q : Type u_1} [Fintype Q] [DecidableEq Q] (x : Q) :

    Ripping out the state x strictly decreases the cardinality, providing the well-founded measure used in the recursion gnfa_to_regex_aux.

    theorem Sipser.rip_forward_A {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x G.start) (hxa : x G.accept) :
    (∀ {q q' : Q} {w : List σ} (hq : q x) (hq' : q' x), G.AcceptPath q q' w(G.rip x hxs hxa).AcceptPath q, hq q', hq' w) ∀ {q : Q} {w : List σ} (hq : q x), G.AcceptPath q x w∃ (ql : Q) (hql : ql x) (wp : List σ) (wt : List σ) (wl : List σ), (G.rip x hxs hxa).AcceptPath q, hq ql, hql wp wt (G.δ ql x).matches' wl (G.δ x x).star.matches' w = wp ++ wt ++ wl

    Forward direction of the rip-equivalence, in two parts.

    Let G' = G.rip x hxs hxa. For any accepting path of G from q to q' labelled by w:

    1. if both endpoints avoid x (q ≠ x and q' ≠ x), then the same word w already labels an accepting path of G';
    2. if the source q avoids x but the target equals x, then the path can be decomposed as w = wp ++ wt ++ wl where wp is realised in G' reaching some ql ≠ x, wt ∈ δ(ql, x) is the final exit into x, and wl ∈ δ(x, x)* is a (possibly empty) loop at x.

    Together these are used to show that every word accepted by G is also accepted by G.rip x ….

    theorem Sipser.rip_language_forward {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x G.start) (hxa : x G.accept) (w : List σ) :
    w G.languagew (G.rip x hxs hxa).language

    Every word accepted by G is also accepted by the ripped GNFA G.rip x …: L(G) ⊆ L(G.rip x …).

    theorem Sipser.kstar_to_acceptPath {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (q : Q) {v : List σ} (hv : v KStar.kstar (G.δ q q).matches') :
    G.AcceptPath q q v

    Any word in the Kleene star of the self-loop regex δ(q, q)* labels some accepting path of G from q back to q.

    theorem Sipser.rip_language_backward {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) (x : Q) (hxs : x G.start) (hxa : x G.accept) (w : List σ) :
    w (G.rip x hxs hxa).languagew G.language

    Every word accepted by the ripped GNFA is accepted by the original: L(G.rip x …) ⊆ L(G). Combined with rip_language_forward, this gives L(G.rip x …) = L(G).

    theorem Sipser.gnfa_to_regex_aux (n : ) (Q : Type u_1) (σ : Type u_2) [inst1 : Fintype Q] [inst2 : DecidableEq Q] :
    Fintype.card Q = n∀ (G : GNFA Q σ), ∃ (R : RegularExpression σ), G.language = R.matches'

    Auxiliary induction on the number of states. For every n, every GNFA with exactly n states has an equivalent regular expression: the base case n = 2 is handled by gnfa_language_of_two_states, and the inductive step uses GNFA.rip to remove an interior state.

    theorem Sipser.gnfa_to_regex {Q : Type u_1} {σ : Type u_2} [Fintype Q] [DecidableEq Q] (G : GNFA Q σ) :

    Lemma (GNFA → Regular Expressions). Every GNFA G has an equivalent regular expression R, i.e. L(G) = L(R).

    The regular expression matching the single-symbol language {[a] | a ∈ l} — a finite union of char regexes (or zero if the list is empty).

    Instances For
      theorem Sipser.mem_regexCharUnion_matches {α : Type u_1} {l w : List α} :
      w (regexCharUnion l).matches' al, w = [a]

      Characterisation of the language matched by regexCharUnion l: w is matched iff w = [a] for some a appearing in l.

      noncomputable def Sipser.dfaCharRegex {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq σ] (M : DFA α σ) (s s' : σ) :

      For a DFA M and states s, s', dfaCharRegex M s s' is the regular expression matching exactly the single-letter words [a] for which M.step s a = s'.

      Instances For
        theorem Sipser.mem_dfaCharRegex_matches {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (s s' : σ) (w : List α) :
        w (dfaCharRegex M s s').matches' ∃ (a : α), M.step s a = s' w = [a]

        The matching characterisation for dfaCharRegex: w is matched iff w = [a] for some symbol a with M.step s a = s'.

        noncomputable def Sipser.dfaToGNFA {α : Type u_1} {σ : Type u_2} [Fintype α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) :
        GNFA (Option (Option σ)) α

        DFA → GNFA conversion.

        Given a DFA M with states σ, build an equivalent GNFA whose state type is Option (Option σ):

        • the outer none is the new GNFA start state;
        • some none is the new GNFA accept state;
        • some (some s) corresponds to an original DFA state s ∈ σ.

        Transitions are: an ε-edge from start to some (some M.start) (and directly to the accept state when M.start ∈ M.accept), an ε-edge from every DFA-accept state to the new accept state, and between two DFA states s, s' the regex dfaCharRegex M s s'. All other arrows are zero, and the no-enter-start / no-exit-accept conditions hold by construction.

        Instances For
          theorem Sipser.toGNFA_acceptPath_dfa {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (s : σ) (w : List α) (hw : M.evalFrom s w M.accept) :

          Simulating a DFA computation inside dfaToGNFA: if running M from state s on input w leads to an accept state, then w labels a GNFA accept path from some (some s) to the new accept state some none.

          theorem Sipser.toGNFA_language_forward {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (w : List α) (hw : w M.accepts) :

          Forward inclusion L(M) ⊆ L(dfaToGNFA M): every word accepted by the DFA is accepted by the converted GNFA.

          theorem Sipser.toGNFA_decompose_from_start {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {q₂ : Option (Option σ)} {w : List α} (hp : (dfaToGNFA M).AcceptPath none q₂ w) :
          q₂ = none w = [] ∃ (q_first : Option (Option σ)) (w_exit : List α) (w_rest : List α), w_exit ((dfaToGNFA M).δ none q_first).matches' (dfaToGNFA M).AcceptPath q_first q₂ w_rest w = w_exit ++ w_rest

          Decomposition of any accept path starting at the GNFA start state none: either the path is the trivial empty self-loop, or it consists of a first exit-step from none to some q_first (consuming w_exit) followed by a residual accept path from q_first (consuming w_rest), with w = w_exit ++ w_rest.

          theorem Sipser.toGNFA_dfa_path {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {s : σ} {q₂ : Option (Option σ)} {u : List α} (hp : (dfaToGNFA M).AcceptPath (some (some s)) q₂ u) :
          (∃ (s' : σ), q₂ = some (some s') M.evalFrom s u = s') q₂ = some none M.evalFrom s u M.accept

          Any GNFA accept path starting from a DFA-state some (some s) mirrors a DFA computation: either it ends in another DFA-state some (some s') with M.evalFrom s u = s', or it ends in the GNFA accept state some none and M.evalFrom s u is an accepting state of M.

          theorem Sipser.toGNFA_accept_path_nil {α : Type u_1} {σ : Type u_2} [Fintype α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) {q₂ : Option (Option σ)} {u : List α} (hp : (dfaToGNFA M).AcceptPath (some none) q₂ u) :
          q₂ = some none u = []

          The GNFA accept state some none has no outgoing edges (all labels are zero), so any accept path starting there must be trivial: the endpoint is still some none and the consumed word is empty.

          theorem Sipser.toGNFA_language_backward {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) (w : List α) (hw : w (dfaToGNFA M).language) :

          Backward inclusion L(dfaToGNFA M) ⊆ L(M): every word accepted by the converted GNFA is accepted by the original DFA.

          theorem Sipser.toGNFA_language {α : Type u_1} {σ : Type u_2} [Fintype α] [DecidableEq α] [Fintype σ] [DecidableEq σ] (M : DFA α σ) :

          DFA → GNFA correctness. The GNFA dfaToGNFA M recognises exactly the same language as the DFA M: L(dfaToGNFA M) = L(M). Combined with gnfa_to_regex, this shows every regular language is described by a regular expression.