A Context Free Grammar (CFG) over the terminal alphabet T.
In Sipser's notation, a CFG is a 4-tuple G = (V, Σ, R, S) where V is a finite
set of variables (nonterminals), Σ = T is the set of terminal symbols, R is a
finite set of production rules of the form V → (V ∪ Σ)*, and S ∈ V is the
start variable.
Instances For
A production rule of a CFG: an element of the form A → α where A ∈ V is a
nonterminal and α ∈ (V ∪ Σ)* is a sentential form.
Instances For
A grammar symbol: either a terminal t ∈ T or a nonterminal A ∈ N.
Instances For
A sentential form is a finite string over the combined alphabet of terminals
and nonterminals, i.e. an element of (V ∪ Σ)*.
Instances For
The language generated by the grammar g:
L(G) = { w ∈ Σ* | S ⇒* w }.
Instances For
A language L ⊆ T* is a context free language (CFL) if there exists a CFG
G with L = L(G).
Instances For
A rule r : A → α is in Chomsky Normal Form if its right-hand side has one
of the following forms:
A → BCwhereB, Care nonterminals distinct from the start variable, orA → awhereais a single terminal symbol, orS → εwhereSis the start variable (the only allowed ε-production).
Instances For
A grammar g is in Chomsky Normal Form iff every one of its rules is.
Instances For
Lemma 1 (Sipser, Lecture 7): Every context free grammar can be converted into an equivalent grammar in Chomsky Normal Form (i.e. one generating the same language).
Refinement of cfg_to_cnf producing an equivalent CNF grammar whose
nonterminal type is additionally a Fintype with decidable equality. This finite
witness is needed for the pumping-length argument, which counts nonterminals.
DerivesIn g u w n means u ⇒* w in exactly n derivation steps in g.
This is a counted refinement of g.Derives used to formalise step-counting
arguments such as Lemma 2 below.
- refl {T : Type u_1} {g : ContextFreeGrammar T} (u : List (Symbol T g.NT)) : DerivesIn g u u 0
- step {T : Type u_1} {g : ContextFreeGrammar T} {u v w : List (Symbol T g.NT)} {n : ℕ} : g.Produces u v → DerivesIn g v w n → DerivesIn g u w (n + 1)
Instances For
A string consisting only of terminals (as in w.map Symbol.terminal) contains
no nonterminal symbols.
In a CNF grammar, the start variable S never appears on the right-hand side
of any rule. This is forced by the definition of IsCNFRule, which requires the
two nonterminals in A → BC to differ from the start variable.
"Absence of the start variable" is preserved by one-step derivations in a CNF
grammar: if S ∉ sf and sf ⇒ v, then S ∉ v.
Potential function for a single derivation step in a CNF grammar.
Defining Φ(sf) := 2·tCount(sf) + ntCount(sf), one CNF production step (other
than the start ε-rule, which is forbidden once S ∉ sf) increases Φ by exactly
one. This is the key invariant behind Lemma 2 (cnf_derivation_length).
Iterating the per-step potential identity: after n CNF derivation steps
starting from a sentential form not containing S, the potential Φ has
increased by exactly n.
Lemma 2 (Sipser, Lecture 7): If H is in Chomsky Normal Form and
w ∈ L(H) is nonempty, then every derivation of w from the start symbol has
exactly 2|w| − 1 steps. Proved by combining the potential identity
cnf_derivesIn_cost with the fact that the first step must produce either
BC (cost +2) or a single terminal (cost +2 toward 2|w|).
A Pushdown Automaton (PDA), Sipser, Lecture 4.
Formally, a PDA is a 6-tuple (Q, Σ, Γ, δ, q₀, F) where Σ = α is the input
alphabet, Γ = γ is the stack alphabet, Q = σ is the set of states,
q₀ ∈ Q is the start state, F ⊆ Q is the set of accept states, and
δ : Q × Σ_ε × Γ_ε → 𝒫(Q × Γ_ε) is the transition relation (where _ε means an
optional symbol, encoded here by Option). A move (q', push) ∈ step q a pop
means: in state q, optionally reading input a and optionally popping pop
from the stack, go to q' and optionally push push.
- start : σ
- accept : Set σ
Instances For
An instantaneous configuration of a PDA: a triple (q, input, stack)
recording the current state, the remaining input to be read, and the current
stack contents (top at the head of the list).
Instances For
One step of the PDA: read (optionally) an input symbol a, pop (optionally)
pop from the top of the stack, transition to q', and push (optionally) push
on top of the stack.
- mk {α : Type u} {σ : Type v} {γ : Type w} {M : PDA α σ γ} (q q' : σ) (a : Option α) (pop push : Option γ) (input : List α) (stack : List γ) : (q', push) ∈ M.step q a pop → M.Step (q, a.toList ++ input, pop.toList ++ stack) (q', input, push.toList ++ stack)
Instances For
Pumping one level. Given a self-loop derivation
A ⇒* uAv together with any derivation A ⇒* target, we can derive
A ⇒* u·target·v by first taking the loop and then substituting target for the
internal A.
Iterated pumping. Iterating derives_pump_once i times yields
A ⇒* u^i · target · v^i. This is the source of the v^i x y^i structure in the
CFL pumping lemma.
Splitting a one-step derivation across a concatenation. If u ++ v ⇒ w,
then the single rule application happened entirely inside u or entirely inside
v; correspondingly w decomposes as u' ++ v (with u ⇒ u') or as
u ++ v' (with v ⇒ v').
Splitting a multi-step derivation across a concatenation. If
u ++ v ⇒* w, then w = w₁ ++ w₂ with u ⇒* w₁ and v ⇒* w₂.
A string consisting only of terminals cannot derive anything in one step:
a CFG rule requires a nonterminal on its left-hand side, so no rewrite position
exists inside w.map Symbol.terminal.
Any decomposition of w.map Symbol.terminal as w₁ ++ w₂ lifts to a
decomposition of w itself into a ++ b (with w₁ = a.map Symbol.terminal and
w₂ = b.map Symbol.terminal).
A parse tree of a CNF grammar g, rooted at the nonterminal A.
In CNF, each internal node is generated by a rule A → BC and has exactly two
children rooted at B and C; each leaf is generated by a rule A → a for some
terminal a. The CNF ε-rule S → ε is excluded here, since we only build parse
trees for nonempty strings.
- leaf {T : Type u_1} {g : ContextFreeGrammar T} (A : g.NT) (a : T) (hr : { input := A, output := [Symbol.terminal a] } ∈ g.rules) : CNFParseTree g A
- node {T : Type u_1} {g : ContextFreeGrammar T} (A B C : g.NT) (hr : { input := A, output := [Symbol.nonterminal B, Symbol.nonterminal C] } ∈ g.rules) (left : CNFParseTree g B) (right : CNFParseTree g C) : CNFParseTree g A
Instances For
The yield of a parse tree: the list of terminals at its leaves, read left-to-right.
Instances For
The height of a parse tree: leaves have height 0, and an internal node
has height 1 + max(height(left), height(right)).
Instances For
Yield-height bound. In a binary parse tree, the number of leaves is at most
2^h where h is the height. This is the pigeonhole bound that gives the
pumping length 2^(k+1) in the CFL pumping lemma.
Soundness of parse trees. Every parse tree rooted at A witnesses a
derivation A ⇒* yield(t) in the grammar.
A parse tree (which has no ε-leaves by construction) always yields a nonempty string of terminals.
The nonterminal label appearing at depth d along the tallest path of t
(breaking ties by descending into the left subtree). The pumping argument inspects
this sequence of labels to find a repeat by pigeonhole.
Instances For
Decomposition with nonempty context. For any positive depth d along the
tallest path of t, the tree splits as
yield(t) = lc ++ yield(sub) ++ rc
where sub is the subtree rooted at the nonterminal R = ntAtDepth t d, the
derivation A ⇒* lc · R · rc holds, the subtree has height height(t) − d, and
the surrounding context lc ++ rc is nonempty (because we went down at least one
internal node and so picked off a sibling subtree, which yields a nonempty
string). This is what produces the vy ≠ ε clause of the CFL pumping lemma.
Computation rule for ntAtDepth at a node: at depth n > 0, descend into
the taller child and ask for depth n − 1.
Decomposition that is coherent with deeper sampling. Like
exists_decomp_nonempty_ctx, but also guarantees that the nonterminal at depth
d₂ of the extracted subtree sub coincides with the nonterminal at depth
d₁ + d₂ of the original tree t. This coherence lets one apply two layers of
exists_decomp_… in tree_pump and ensure that the inner pumped nonterminal
matches the outer one (so the loop A ⇒* uAv is well-formed).
Inversion principle: a one-step derivation [A] ⇒ v from a singleton
sentential form must use a rule whose left-hand side is A, and v is that
rule's right-hand side.
A derivation [A] ⇒* w to a nonempty terminal string must contain at least
one step, and so factors as a first one-step [A] ⇒ v followed by v ⇒* w.
In a CNF grammar, only the start variable can erase a string: if B ≠ S and
B ⇒* w, then w is nonempty. (The only ε-rule allowed in CNF has S on its
left-hand side, and the start variable never appears on a right-hand side.)
Existence of a parse tree, by strong induction on the length of the yield.
If g is in CNF and [A] ⇒* w with w nonempty and |w| ≤ n, then there is a
CNFParseTree g A whose yield is w. The proof inverts the first derivation
step against the three CNF rule forms and recurses on the two halves produced by
an A → BC rule.
Completeness of CNF parse trees: every nonempty string in L(g) (for a CNF
grammar g) is the yield of some parse tree rooted at the start variable.
Tree-level pumping lemma. If a parse tree t (rooted at A) has height
exceeding the number of nonterminals k = |V|, then by pigeonhole some
nonterminal R repeats along the tallest path. The two occurrences of R
decompose yield(t) = u v x y z with:
A ⇒* u v^i x y^i zfor everyi ≥ 0(usingderives_pump_iter),vy ≠ ε(fromexists_decomp_nonempty_ctx), and|vxy| ≤ 2^(k+1)(from the yield/height bound applied to the inner subtree).
Pumping lemma for CNF grammars. If g is a CNF grammar with k = |V|
nonterminals and s ∈ L(g) has length |s| ≥ 2^(k+1), then s = u v x y z
with u v^i x y^i z ∈ L(g) for all i ≥ 0, vy ≠ ε, and |vxy| ≤ 2^(k+1).
The argument: the height bound |s| ≤ 2^h forces h > k, so tree_pump
applies to a parse tree for s.
Pumping Lemma for Context Free Languages (Sipser, Lecture 5).
For every CFL A there exists a pumping length p ∈ ℕ such that every string
s ∈ A with |s| ≥ p admits a decomposition s = uvxyz satisfying:
u v^i x y^i z ∈ Afor alli ≥ 0,vy ≠ ε, and|vxy| ≤ p.
Proof: pick a CFG for A, convert it to a CNF grammar g with a finite
nonterminal type via ChomskyNormalForm.cfg_to_cnf_fintype, and apply
cnf_pumping_core with p = 2^(|V|+1).