Prepending a single word w ∈ L to a Kleene-star word p ∈ L* yields
a Kleene-star word: w ++ p ∈ L*.
The Kleene star is closed under concatenation: if p, q ∈ L* then
p ++ q ∈ L*.
Splitting a Kleene-star concatenation across a fixed cut.
If x ++ y equals the flattening of a list S of L-words, then either:
- the cut lies inside some element of
S— there is a prefixp ∈ L*withx = p ++ c, anL-wordc ++ dstraddling the cut, and a suffixrest ∈ L*withy = d ++ rest; or - the cut lies after every element, in which case
x ∈ L*andy = [].
Theorem (Closure under star). If L is a regular language then so
is its Kleene star L*.
In a 2-element type with two distinguished distinct elements a, b,
every element equals a or b.
In a type with more than two elements, given two distinct elements
a, b, there exists a third element distinct from both — used to pick a
state to "rip" in the GNFA → regex construction.
Base case of GNFA → regex. When the GNFA has only two states (the
start and accept states), its language coincides with the language of the
regular expression labelling the single arrow from start to accept.
The "rip" construction (state elimination).
Given a GNFA G and a state x that is neither the start nor the accept
state, G.rip x hxs hxa is the GNFA on the states {q // q ≠ x} obtained
by removing x and, for every remaining pair of states qᵢ, qⱼ,
replacing the label δ(qᵢ, qⱼ) by the union
δ(qᵢ, qⱼ) ∪ δ(qᵢ, x) · δ(x, x)* · δ(x, qⱼ)
so that every path that used to pass through x is now captured by a
direct arrow. This is the inductive step of the proof that every GNFA
has an equivalent regular expression.
Instances For
Ripping out the state x strictly decreases the cardinality, providing
the well-founded measure used in the recursion gnfa_to_regex_aux.
Forward direction of the rip-equivalence, in two parts.
Let G' = G.rip x hxs hxa. For any accepting path of G from q to q'
labelled by w:
- if both endpoints avoid
x(q ≠ xandq' ≠ x), then the same wordwalready labels an accepting path ofG'; - if the source
qavoidsxbut the target equalsx, then the path can be decomposed asw = wp ++ wt ++ wlwherewpis realised inG'reaching someql ≠ x,wt ∈ δ(ql, x)is the final exit intox, andwl ∈ δ(x, x)*is a (possibly empty) loop atx.
Together these are used to show that every word accepted by G is also
accepted by G.rip x ….
Any word in the Kleene star of the self-loop regex δ(q, q)* labels
some accepting path of G from q back to q.
Every word accepted by the ripped GNFA is accepted by the original:
L(G.rip x …) ⊆ L(G). Combined with rip_language_forward, this gives
L(G.rip x …) = L(G).
Auxiliary induction on the number of states. For every n, every
GNFA with exactly n states has an equivalent regular expression: the
base case n = 2 is handled by gnfa_language_of_two_states, and the
inductive step uses GNFA.rip to remove an interior state.
Lemma (GNFA → Regular Expressions). Every GNFA G has an
equivalent regular expression R, i.e. L(G) = L(R).
The regular expression matching the single-symbol language
{[a] | a ∈ l} — a finite union of char regexes (or zero if the
list is empty).
Instances For
Characterisation of the language matched by regexCharUnion l:
w is matched iff w = [a] for some a appearing in l.
For a DFA M and states s, s', dfaCharRegex M s s' is the regular
expression matching exactly the single-letter words [a] for which
M.step s a = s'.
Instances For
The matching characterisation for dfaCharRegex: w is matched iff
w = [a] for some symbol a with M.step s a = s'.
DFA → GNFA conversion.
Given a DFA M with states σ, build an equivalent GNFA whose state
type is Option (Option σ):
- the outer
noneis the new GNFAstartstate; some noneis the new GNFAacceptstate;some (some s)corresponds to an original DFA states ∈ σ.
Transitions are: an ε-edge from start to some (some M.start) (and
directly to the accept state when M.start ∈ M.accept), an ε-edge from
every DFA-accept state to the new accept state, and between two DFA
states s, s' the regex dfaCharRegex M s s'. All other arrows are
zero, and the no-enter-start / no-exit-accept conditions hold by
construction.
Instances For
Simulating a DFA computation inside dfaToGNFA: if running M from
state s on input w leads to an accept state, then w labels a GNFA
accept path from some (some s) to the new accept state some none.
Forward inclusion L(M) ⊆ L(dfaToGNFA M): every word accepted by the
DFA is accepted by the converted GNFA.
Decomposition of any accept path starting at the GNFA start state
none: either the path is the trivial empty self-loop, or it consists of
a first exit-step from none to some q_first (consuming w_exit)
followed by a residual accept path from q_first (consuming w_rest),
with w = w_exit ++ w_rest.
Any GNFA accept path starting from a DFA-state some (some s) mirrors
a DFA computation: either it ends in another DFA-state some (some s')
with M.evalFrom s u = s', or it ends in the GNFA accept state
some none and M.evalFrom s u is an accepting state of M.
The GNFA accept state some none has no outgoing edges (all labels
are zero), so any accept path starting there must be trivial: the
endpoint is still some none and the consumed word is empty.
Backward inclusion L(dfaToGNFA M) ⊆ L(M): every word accepted by
the converted GNFA is accepted by the original DFA.
DFA → GNFA correctness. The GNFA dfaToGNFA M recognises exactly
the same language as the DFA M: L(dfaToGNFA M) = L(M). Combined with
gnfa_to_regex, this shows every regular language is described by a
regular expression.