Atlas.TheoryOfComputation.code.ContextFree

@[reducible, inline]

Type (max 1 u_1)

A Context Free Grammar (CFG) over the terminal alphabet T.

In Sipser's notation, a CFG is a 4-tuple G = (V, Σ, R, S) where V is a finite set of variables (nonterminals), Σ = T is the set of terminal symbols, R is a finite set of production rules of the form V → (V ∪ Σ)*, and S ∈ V is the start variable.

Instances For

source

@[reducible, inline]

abbrev ContextFree.Rule (T : Type u_1) (N : Type u_2) :

Type (max u_1 u_2)

A production rule of a CFG: an element of the form A → α where A ∈ V is a nonterminal and α ∈ (V ∪ Σ)* is a sentential form.

Instances For

source

@[reducible, inline]

abbrev ContextFree.Sym (T : Type u_2) (N : Type u_3) :

Type (max u_2 u_3)

A grammar symbol: either a terminal t ∈ T or a nonterminal A ∈ N.

Instances For

source

@[reducible, inline]

abbrev ContextFree.SententialForm (T : Type u_2) (N : Type u_3) :

Type (max u_3 u_2)

A sentential form is a finite string over the combined alphabet of terminals and nonterminals, i.e. an element of (V ∪ Σ)*.

Instances For

source

@[reducible, inline]

abbrev ContextFree.Produces {T : Type u_1} (g : CFG T) (u v : List (Symbol T g.NT)) :

Prop

One-step derivation u ⇒ v: v is obtained from u by applying a single production rule of g at one position.

Instances For

source

@[reducible, inline]

abbrev ContextFree.Derives {T : Type u_1} (g : CFG T) :

List (Symbol T g.NT) → List (Symbol T g.NT) → Prop

Multi-step derivation u ⇒* v: the reflexive transitive closure of Produces g, i.e. v is obtained from u by zero or more rule applications.

Instances For

source

@[reducible, inline]

abbrev ContextFree.Generates {T : Type u_1} (g : CFG T) (s : List (Symbol T g.NT)) :

Prop

g.Generates w means the start symbol S derives the sentential form w, i.e. S ⇒* w (where w is a list of symbols, not yet restricted to terminals).

Instances For

source

@[reducible, inline]

abbrev ContextFree.language {T : Type u_1} (g : CFG T) :

Language T

The language generated by the grammar g: L(G) = { w ∈ Σ* | S ⇒* w }.

Instances For

source

@[reducible, inline]

abbrev ContextFree.IsContextFree {T : Type u_1} (L : Language T) :

Prop

A language L ⊆ T* is a context free language (CFL) if there exists a CFG G with L = L(G).

Instances For

source

def ChomskyNormalForm.IsCNFRule {T : Type u_1} {N : Type u_2} (r : ContextFreeRule T N) (startVar : N) :

Prop

A rule r : A → α is in Chomsky Normal Form if its right-hand side has one of the following forms:

A → BC where B, C are nonterminals distinct from the start variable, or
A → a where a is a single terminal symbol, or
S → ε where S is the start variable (the only allowed ε-production).

Instances For

source

def ChomskyNormalForm.IsCNF {T : Type u_1} (g : ContextFreeGrammar T) :

Prop

A grammar g is in Chomsky Normal Form iff every one of its rules is.

Instances For

source

theorem ChomskyNormalForm.cfg_to_cnf {T : Type u_1} (g : ContextFreeGrammar T) :

∃ (g' : ContextFreeGrammar T), IsCNF g' ∧ g'.language = g.language

Lemma 1 (Sipser, Lecture 7): Every context free grammar can be converted into an equivalent grammar in Chomsky Normal Form (i.e. one generating the same language).

source

theorem ChomskyNormalForm.cfg_to_cnf_fintype {T : Type u_1} (g : ContextFreeGrammar T) :

∃ (g' : ContextFreeGrammar T) (x : Fintype g'.NT) (x : DecidableEq g'.NT), IsCNF g' ∧ g'.language = g.language

Refinement of cfg_to_cnf producing an equivalent CNF grammar whose nonterminal type is additionally a Fintype with decidable equality. This finite witness is needed for the pumping-length argument, which counts nonterminals.

source

inductive ChomskyNormalForm.DerivesIn {T : Type u_1} (g : ContextFreeGrammar T) :

List (Symbol T g.NT) → List (Symbol T g.NT) → ℕ → Prop

DerivesIn g u w n means u ⇒* w in exactly n derivation steps in g. This is a counted refinement of g.Derives used to formalise step-counting arguments such as Lemma 2 below.

refl {T : Type u_1} {g : ContextFreeGrammar T} (u : List (Symbol T g.NT)) : DerivesIn g u u 0
step {T : Type u_1} {g : ContextFreeGrammar T} {u v w : List (Symbol T g.NT)} {n : ℕ} : g.Produces u v → DerivesIn g v w n → DerivesIn g u w (n + 1)

Instances For

source

def ChomskyNormalForm.ntCount {T : Type u_1} {N : Type u_2} :

List (Symbol T N) → ℕ

Counts the number of nonterminal symbols occurring in a sentential form.

Instances For

source

def ChomskyNormalForm.tCount {T : Type u_1} {N : Type u_2} :

List (Symbol T N) → ℕ

Counts the number of terminal symbols occurring in a sentential form.

Instances For

source

theorem ChomskyNormalForm.ntCount_append {T : Type u_1} {N : Type u_2} (l₁ l₂ : List (Symbol T N)) :

ntCount (l₁ ++ l₂) = ntCount l₁ + ntCount l₂

The nonterminal count is additive over list concatenation.

source

theorem ChomskyNormalForm.tCount_append {T : Type u_1} {N : Type u_2} (l₁ l₂ : List (Symbol T N)) :

tCount (l₁ ++ l₂) = tCount l₁ + tCount l₂

The terminal count is additive over list concatenation.

source

theorem ChomskyNormalForm.ntCount_map_terminal {T : Type u_1} {N : Type u_2} (w : List T) :

ntCount (List.map Symbol.terminal w) = 0

A string consisting only of terminals (as in w.map Symbol.terminal) contains no nonterminal symbols.

source

theorem ChomskyNormalForm.tCount_map_terminal {T : Type u_1} {N : Type u_2} (w : List T) :

tCount (List.map Symbol.terminal w) = w.length

A string of terminals contributes exactly w.length terminals to the tCount measure.

source

theorem ChomskyNormalForm.cnf_start_not_in_output {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : IsCNF g) (r : ContextFreeRule T g.NT) (hr : r ∈ g.rules) :

Symbol.nonterminal g.initial ∉ r.output

In a CNF grammar, the start variable S never appears on the right-hand side of any rule. This is forced by the definition of IsCNFRule, which requires the two nonterminals in A → BC to differ from the start variable.

source

theorem ChomskyNormalForm.cnf_start_not_in_produces {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : IsCNF g) {sf v : List (Symbol T g.NT)} (hp : g.Produces sf v) (hsf : Symbol.nonterminal g.initial ∉ sf) :

Symbol.nonterminal g.initial ∉ v

"Absence of the start variable" is preserved by one-step derivations in a CNF grammar: if S ∉ sf and sf ⇒ v, then S ∉ v.

source

theorem ChomskyNormalForm.cnf_produces_cost_step {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : IsCNF g) {sf v : List (Symbol T g.NT)} (hp : g.Produces sf v) (hsf : Symbol.nonterminal g.initial ∉ sf) :

2 * tCount v + ntCount v = 2 * tCount sf + ntCount sf + 1

Potential function for a single derivation step in a CNF grammar. Defining Φ(sf) := 2·tCount(sf) + ntCount(sf), one CNF production step (other than the start ε-rule, which is forbidden once S ∉ sf) increases Φ by exactly one. This is the key invariant behind Lemma 2 (cnf_derivation_length).

source

theorem ChomskyNormalForm.cnf_derivesIn_cost {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : IsCNF g) {sf w : List (Symbol T g.NT)} {n : ℕ} (hd : DerivesIn g sf w n) (hsf : Symbol.nonterminal g.initial ∉ sf) :

2 * tCount w + ntCount w = 2 * tCount sf + ntCount sf + n

Iterating the per-step potential identity: after n CNF derivation steps starting from a sentential form not containing S, the potential Φ has increased by exactly n.

source

theorem ChomskyNormalForm.cnf_derivation_length {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : IsCNF g) (w : List T) (hw : w ≠ []) (hw_mem : w ∈ g.language) (n : ℕ) (hderiv : DerivesIn g [Symbol.nonterminal g.initial] (List.map Symbol.terminal w) n) :

n = 2 * w.length - 1

Lemma 2 (Sipser, Lecture 7): If H is in Chomsky Normal Form and w ∈ L(H) is nonempty, then every derivation of w from the start symbol has exactly 2|w| − 1 steps. Proved by combining the potential identity cnf_derivesIn_cost with the fact that the first step must produce either BC (cost +2) or a single terminal (cost +2 toward 2|w|).

source

structure PDA (α : Type u) (σ : Type v) (γ : Type w) :

Type (max (max u v) w)

A Pushdown Automaton (PDA), Sipser, Lecture 4.

Formally, a PDA is a 6-tuple (Q, Σ, Γ, δ, q₀, F) where Σ = α is the input alphabet, Γ = γ is the stack alphabet, Q = σ is the set of states, q₀ ∈ Q is the start state, F ⊆ Q is the set of accept states, and δ : Q × Σ_ε × Γ_ε → 𝒫(Q × Γ_ε) is the transition relation (where _ε means an optional symbol, encoded here by Option). A move (q', push) ∈ step q a pop means: in state q, optionally reading input a and optionally popping pop from the stack, go to q' and optionally push push.

step : σ → Option α → Option γ → Set (σ × Option γ)
start : σ
accept : Set σ

Instances For

source

@[reducible, inline]

abbrev PDA.Config {α : Type u} {σ : Type v} {γ : Type w} :

Type (max v w u)

An instantaneous configuration of a PDA: a triple (q, input, stack) recording the current state, the remaining input to be read, and the current stack contents (top at the head of the list).

Instances For

source

inductive PDA.Step {α : Type u} {σ : Type v} {γ : Type w} (M : PDA α σ γ) :

Config → Config → Prop

One step of the PDA: read (optionally) an input symbol a, pop (optionally) pop from the top of the stack, transition to q', and push (optionally) push on top of the stack.

mk {α : Type u} {σ : Type v} {γ : Type w} {M : PDA α σ γ} (q q' : σ) (a : Option α) (pop push : Option γ) (input : List α) (stack : List γ) : (q', push) ∈ M.step q a pop → M.Step (q, a.toList ++ input, pop.toList ++ stack) (q', input, push.toList ++ stack)

Instances For

source

def PDA.Reaches {α : Type u} {σ : Type v} {γ : Type w} (M : PDA α σ γ) :

Config → Config → Prop

The reflexive transitive closure of Step: c ⊢* c' means c can reach c' in zero or more PDA moves.

Instances For

source

def PDA.Accepts {α : Type u} {σ : Type v} {γ : Type w} (M : PDA α σ γ) (w : List α) :

Prop

M accepts the input string w if, starting from (q₀, w, ε), some sequence of moves consumes all of w and ends in an accept state (the final stack contents are arbitrary).

Instances For

source

def PDA.language {α : Type u} {σ : Type v} {γ : Type w} (M : PDA α σ γ) :

Language α

The language recognised by the PDA: L(M) = { w ∈ Σ* | M accepts w }.

Instances For

source

def listPow {α : Type u_1} (l : List α) :

ℕ → List α

listPow l n is the list l repeated n times (i.e. l^n in the free monoid of lists). Used to express the pumped strings v^i and y^i in the CFL pumping lemma.

Instances For

source

@[simp]

theorem listPow_succ {α : Type u_1} (l : List α) (n : ℕ) :

listPow l (n + 1) = l ++ listPow l n

l^(n+1) = l ++ l^n.

source

theorem listPow_comm {α : Type u_1} (l : List α) (n : ℕ) :

l ++ listPow l n = listPow l n ++ l

The list power commutes with prepending vs. appending: l ++ l^n = l^n ++ l.

source

theorem listPow_succ_right {α : Type u_1} (l : List α) (n : ℕ) :

listPow l (n + 1) = listPow l n ++ l

Right-recursive form: l^(n+1) = l^n ++ l.

source

theorem CFLPumping.derives_pump_once {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} {u v target : List (Symbol T g.NT)} (hloop : g.Derives [Symbol.nonterminal A] (u ++ [Symbol.nonterminal A] ++ v)) (htarget : g.Derives [Symbol.nonterminal A] target) :

g.Derives [Symbol.nonterminal A] (u ++ target ++ v)

Pumping one level. Given a self-loop derivation A ⇒* uAv together with any derivation A ⇒* target, we can derive A ⇒* u·target·v by first taking the loop and then substituting target for the internal A.

source

theorem CFLPumping.derives_pump_iter {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} {u v target : List (Symbol T g.NT)} (hloop : g.Derives [Symbol.nonterminal A] (u ++ [Symbol.nonterminal A] ++ v)) (htarget : g.Derives [Symbol.nonterminal A] target) (i : ℕ) :

g.Derives [Symbol.nonterminal A] (listPow u i ++ target ++ listPow v i)

Iterated pumping. Iterating derives_pump_once i times yields A ⇒* u^i · target · v^i. This is the source of the v^i x y^i structure in the CFL pumping lemma.

source

theorem CFLPumping.produces_append_split {T : Type u_1} {g : ContextFreeGrammar T} {u v w : List (Symbol T g.NT)} (h : g.Produces (u ++ v) w) :

(∃ (u' : List (Symbol T g.NT)), w = u' ++ v ∧ g.Produces u u') ∨ ∃ (v' : List (Symbol T g.NT)), w = u ++ v' ∧ g.Produces v v'

Splitting a one-step derivation across a concatenation. If u ++ v ⇒ w, then the single rule application happened entirely inside u or entirely inside v; correspondingly w decomposes as u' ++ v (with u ⇒ u') or as u ++ v' (with v ⇒ v').

source

theorem CFLPumping.derives_append_split {T : Type u_1} {g : ContextFreeGrammar T} {u v w : List (Symbol T g.NT)} (h : g.Derives (u ++ v) w) :

∃ (w1 : List (Symbol T g.NT)) (w2 : List (Symbol T g.NT)), w = w1 ++ w2 ∧ g.Derives u w1 ∧ g.Derives v w2

Splitting a multi-step derivation across a concatenation. If u ++ v ⇒* w, then w = w₁ ++ w₂ with u ⇒* w₁ and v ⇒* w₂.

source

theorem CFLPumping.all_terminals_not_produces {T : Type u_1} {g : ContextFreeGrammar T} {w : List T} {v : List (Symbol T g.NT)} (h : g.Produces (List.map Symbol.terminal w) v) :

False

A string consisting only of terminals cannot derive anything in one step: a CFG rule requires a nonterminal on its left-hand side, so no rewrite position exists inside w.map Symbol.terminal.

source

theorem CFLPumping.map_terminal_split {T : Type u_1} {N : Type u_2} (w : List T) (w1 w2 : List (Symbol T N)) (h : List.map Symbol.terminal w = w1 ++ w2) :

∃ (a : List T) (b : List T), w1 = List.map Symbol.terminal a ∧ w2 = List.map Symbol.terminal b ∧ w = a ++ b

Any decomposition of w.map Symbol.terminal as w₁ ++ w₂ lifts to a decomposition of w itself into a ++ b (with w₁ = a.map Symbol.terminal and w₂ = b.map Symbol.terminal).

source

inductive CFLPumping.CNFParseTree {T : Type u_1} (g : ContextFreeGrammar T) :

g.NT → Type u_1

A parse tree of a CNF grammar g, rooted at the nonterminal A.

In CNF, each internal node is generated by a rule A → BC and has exactly two children rooted at B and C; each leaf is generated by a rule A → a for some terminal a. The CNF ε-rule S → ε is excluded here, since we only build parse trees for nonempty strings.

leaf {T : Type u_1} {g : ContextFreeGrammar T} (A : g.NT) (a : T) (hr : { input := A, output := [Symbol.terminal a] } ∈ g.rules) : CNFParseTree g A
node {T : Type u_1} {g : ContextFreeGrammar T} (A B C : g.NT) (hr : { input := A, output := [Symbol.nonterminal B, Symbol.nonterminal C] } ∈ g.rules) (left : CNFParseTree g B) (right : CNFParseTree g C) : CNFParseTree g A

Instances For

source

def CFLPumping.CNFParseTree.yield {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} :

CNFParseTree g A → List T

The yield of a parse tree: the list of terminals at its leaves, read left-to-right.

Instances For

source

def CFLPumping.CNFParseTree.height {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} :

CNFParseTree g A → ℕ

The height of a parse tree: leaves have height 0, and an internal node has height 1 + max(height(left), height(right)).

Instances For

source

theorem CFLPumping.CNFParseTree.yield_length_le_pow_height {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) :

t.yield.length ≤ 2 ^ t.height

Yield-height bound. In a binary parse tree, the number of leaves is at most 2^h where h is the height. This is the pigeonhole bound that gives the pumping length 2^(k+1) in the CFL pumping lemma.

source

theorem CFLPumping.CNFParseTree.tree_derives {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) :

g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal t.yield)

Soundness of parse trees. Every parse tree rooted at A witnesses a derivation A ⇒* yield(t) in the grammar.

source

theorem CFLPumping.CNFParseTree.yield_nonempty {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) :

t.yield ≠ []

A parse tree (which has no ε-leaves by construction) always yields a nonempty string of terminals.

source

def CFLPumping.CNFParseTree.ntAtDepth {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) (d : ℕ) (hd : d ≤ t.height) :

g.NT

The nonterminal label appearing at depth d along the tallest path of t (breaking ties by descending into the left subtree). The pumping argument inspects this sequence of labels to find a repeat by pigeonhole.

Instances For

source

theorem CFLPumping.CNFParseTree.exists_decomp_nonempty_ctx {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) (d : ℕ) (hd : d ≤ t.height) (hd_pos : 0 < d) :

∃ (R : g.NT) (sub : CNFParseTree g R) (lc : List T) (rc : List T), R = t.ntAtDepth d hd ∧ sub.height = t.height - d ∧ t.yield = lc ++ sub.yield ++ rc ∧ g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal lc ++ [Symbol.nonterminal R] ++ List.map Symbol.terminal rc) ∧ (lc ++ rc).length > 0

Decomposition with nonempty context. For any positive depth d along the tallest path of t, the tree splits as

yield(t) = lc ++ yield(sub) ++ rc

where sub is the subtree rooted at the nonterminal R = ntAtDepth t d, the derivation A ⇒* lc · R · rc holds, the subtree has height height(t) − d, and the surrounding context lc ++ rc is nonempty (because we went down at least one internal node and so picked off a sibling subtree, which yields a nonempty string). This is what produces the vy ≠ ε clause of the CFL pumping lemma.

source

theorem CFLPumping.CNFParseTree.ntAtDepth_node_eq {T : Type u_1} {g : ContextFreeGrammar T} {A B C : g.NT} (hr : { input := A, output := [Symbol.nonterminal B, Symbol.nonterminal C] } ∈ g.rules) (l : CNFParseTree g B) (r : CNFParseTree g C) (n : ℕ) (hn : 0 < n) (hd : n ≤ (node A B C hr l r).height) :

(node A B C hr l r).ntAtDepth n hd = if h : l.height ≥ r.height then l.ntAtDepth (n - 1) ⋯ else r.ntAtDepth (n - 1) ⋯

Computation rule for ntAtDepth at a node: at depth n > 0, descend into the taller child and ask for depth n − 1.

source

theorem CFLPumping.CNFParseTree.exists_decomp_with_coherence {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} (t : CNFParseTree g A) (d1 d2 : ℕ) (hd1 : d1 ≤ t.height) (hd12 : d1 + d2 ≤ t.height) :

∃ (R : g.NT) (sub : CNFParseTree g R) (lc : List T) (rc : List T), R = t.ntAtDepth d1 hd1 ∧ sub.height = t.height - d1 ∧ t.yield = lc ++ sub.yield ++ rc ∧ g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal lc ++ [Symbol.nonterminal R] ++ List.map Symbol.terminal rc) ∧ ∀ (hd2 : d2 ≤ sub.height), sub.ntAtDepth d2 hd2 = t.ntAtDepth (d1 + d2) hd12

Decomposition that is coherent with deeper sampling. Like exists_decomp_nonempty_ctx, but also guarantees that the nonterminal at depth d₂ of the extracted subtree sub coincides with the nonterminal at depth d₁ + d₂ of the original tree t. This coherence lets one apply two layers of exists_decomp_… in tree_pump and ensure that the inner pumped nonterminal matches the outer one (so the loop A ⇒* uAv is well-formed).

source

theorem CFLPumping.produces_from_singleton_nonterminal' {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} {v : List (Symbol T g.NT)} (h : g.Produces [Symbol.nonterminal A] v) :

∃ r ∈ g.rules, r.input = A ∧ v = r.output

Inversion principle: a one-step derivation [A] ⇒ v from a singleton sentential form must use a rule whose left-hand side is A, and v is that rule's right-hand side.

source

theorem CFLPumping.derives_from_nt_nontrivial {T : Type u_1} {g : ContextFreeGrammar T} {A : g.NT} {w : List T} (hne : w ≠ []) (hd : g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal w)) :

∃ (v : List (Symbol T g.NT)), g.Produces [Symbol.nonterminal A] v ∧ g.Derives v (List.map Symbol.terminal w)

A derivation [A] ⇒* w to a nonempty terminal string must contain at least one step, and so factors as a first one-step [A] ⇒ v followed by v ⇒* w.

source

theorem CFLPumping.derives_nil_of_nil {T : Type u_1} {g : ContextFreeGrammar T} {w : List (Symbol T g.NT)} (hd : g.Derives [] w) :

w = []

The empty sentential form derives only itself: [] ⇒* w implies w = [], since there is no nonterminal in [] to rewrite.

source

theorem CFLPumping.cnf_nonempty_of_derives_ne_initial {T : Type u_1} {g : ContextFreeGrammar T} (hcnf : ChomskyNormalForm.IsCNF g) {B : g.NT} (hB : B ≠ g.initial) (w : List (Symbol T g.NT)) (hd : g.Derives [Symbol.nonterminal B] w) :

w ≠ []

In a CNF grammar, only the start variable can erase a string: if B ≠ S and B ⇒* w, then w is nonempty. (The only ε-rule allowed in CNF has S on its left-hand side, and the start variable never appears on a right-hand side.)

source

theorem CFLPumping.cnf_tree_of_derives_aux {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : ChomskyNormalForm.IsCNF g) (n : ℕ) (A : g.NT) (w : List T) :

w.length ≤ n → w ≠ [] → g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal w) → ∃ (t : CNFParseTree g A), t.yield = w

Existence of a parse tree, by strong induction on the length of the yield. If g is in CNF and [A] ⇒* w with w nonempty and |w| ≤ n, then there is a CNFParseTree g A whose yield is w. The proof inverts the first derivation step against the three CNF rule forms and recurses on the two halves produced by an A → BC rule.

source

theorem CFLPumping.cnf_tree_of_mem_language {T : Type u_1} (g : ContextFreeGrammar T) (hcnf : ChomskyNormalForm.IsCNF g) (s : List T) (hs : s ∈ g.language) (hne : s ≠ []) :

∃ (t : CNFParseTree g g.initial), t.yield = s

Completeness of CNF parse trees: every nonempty string in L(g) (for a CNF grammar g) is the yield of some parse tree rooted at the start variable.

source

theorem CFLPumping.listPow_map {α : Type u_2} {β : Type u_3} (f : α → β) (l : List α) (n : ℕ) :

List.map f (listPow l n) = listPow (List.map f l) n

List.map distributes over listPow: map f (l^n) = (map f l)^n.

source

theorem CFLPumping.tree_pump {T : Type u_1} (g : ContextFreeGrammar T) [Fintype g.NT] [DecidableEq g.NT] {A : g.NT} (t : CNFParseTree g A) (hh : t.height > Fintype.card g.NT) :

∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), t.yield = u ++ v ++ x ++ y ++ z ∧ (∀ (i : ℕ), g.Derives [Symbol.nonterminal A] (List.map Symbol.terminal (u ++ listPow v i ++ x ++ listPow y i ++ z))) ∧ (v ++ y).length > 0 ∧ (v ++ x ++ y).length ≤ 2 ^ (Fintype.card g.NT + 1)

Tree-level pumping lemma. If a parse tree t (rooted at A) has height exceeding the number of nonterminals k = |V|, then by pigeonhole some nonterminal R repeats along the tallest path. The two occurrences of R decompose yield(t) = u v x y z with:

A ⇒* u v^i x y^i z for every i ≥ 0 (using derives_pump_iter),
vy ≠ ε (from exists_decomp_nonempty_ctx), and
|vxy| ≤ 2^(k+1) (from the yield/height bound applied to the inner subtree).

source

theorem CFLPumping.cnf_pumping_core {T : Type u_1} {g : ContextFreeGrammar T} [Fintype g.NT] [DecidableEq g.NT] (hcnf : ChomskyNormalForm.IsCNF g) (s : List T) (hs : s ∈ g.language) (hlen : s.length ≥ 2 ^ (Fintype.card g.NT + 1)) :

∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), s = u ++ v ++ x ++ y ++ z ∧ (∀ (i : ℕ), u ++ listPow v i ++ x ++ listPow y i ++ z ∈ g.language) ∧ (v ++ y).length > 0 ∧ (v ++ x ++ y).length ≤ 2 ^ (Fintype.card g.NT + 1)

Pumping lemma for CNF grammars. If g is a CNF grammar with k = |V| nonterminals and s ∈ L(g) has length |s| ≥ 2^(k+1), then s = u v x y z with u v^i x y^i z ∈ L(g) for all i ≥ 0, vy ≠ ε, and |vxy| ≤ 2^(k+1). The argument: the height bound |s| ≤ 2^h forces h > k, so tree_pump applies to a parse tree for s.

source

theorem CFLPumping.pumping_lemma_cfl {T : Type u_1} {A : Language T} (hA : A.IsContextFree) :

∃ (p : ℕ), ∀ s ∈ A, s.length ≥ p → ∃ (u : List T) (v : List T) (x : List T) (y : List T) (z : List T), s = u ++ v ++ x ++ y ++ z ∧ (∀ (i : ℕ), u ++ listPow v i ++ x ++ listPow y i ++ z ∈ A) ∧ (v ++ y).length > 0 ∧ (v ++ x ++ y).length ≤ p

Pumping Lemma for Context Free Languages (Sipser, Lecture 5).

For every CFL A there exists a pumping length p ∈ ℕ such that every string s ∈ A with |s| ≥ p admits a decomposition s = uvxyz satisfying:

u v^i x y^i z ∈ A for all i ≥ 0,
vy ≠ ε, and
|vxy| ≤ p.

Proof: pick a CFG for A, convert it to a CNF grammar g with a finite nonterminal type via ChomskyNormalForm.cfg_to_cnf_fintype, and apply cnf_pumping_core with p = 2^(|V|+1).