CS462 Formal Languages and Parsing

Back to UWaterloo

Combinatorics on Word: Definitions

Proof tricks:

Finite Automata and Regular Languages

Proof tricks:

3.8 Automata, graphs and boolean matrices

A transition in a NFA can be represented using an incidence matrix: a boolean matrix where for a given input character a in Sigma, Ma[i][j] = 1 if there is a transition from state i to state j on character a, or delta(qi, a) = qj. (we use AND instead of multiply, OR instead of addition).

If you multiply matrices, we end up "following paths" along the NFA. Ma x Mb gives us the incidence matrix for word w = ab.

We learned from algos cs240 that there are more effective implementations of matrix multiplication than the naive O(n^3). This gives us a more efficient computation for NFA simulation.

There are a finite number of possible matrices, even after infinite operations on them since each state can only be 0 or 1.

Proof trick: construct a second DFA from the original, using tuples of matrices as the states (this works because there are finitely many possible incidence matrices). Define the final states using a sort of "query" on all states that satisfy some property, enabling traversal from the initial state (row 0) to final states in the original DFA.

3.9 Myhill-Nerode theorem

The following are equivalent:

  1. L is regular
  2. L can be written as the union some of the equivalence classes (indistinguishable) of E, where E is a right-invariant (xRy => xzRyz x,y in S, forall z in S) equivalence relation with finite index
  3. RL is of finite index.

3.10 Minimization of finite automata

Myhill-Nerode theorem allows minimal DFAs to be constructed from the Myhill-Nerode equivalence relation

Naively, enumerate all pairs of states. "Mark" all pairs that are immediately distinguishable (one's a final state, the other isn't). Run for n iterations, marking new pairs along the way. Any unmarked states after n iterations yields the equivalence classes.

3.11 State complexity

3.12 Partial Order

4 Context-free grammars

4.3 Ogden's Lemma

4.2 & 4.6 Parikh's Theorem

so all in all, if L is a CFL, then there exists some regular language R s.t. ψ(L) = ψ(R)

so I guess we think about Parikh's theorem as a way to count things. It's kinda just saying if you forget about the order, we're just looking at patterns in the counting of each letter.

5 Parsing and recognition

CYK is a dynamic programming approach to parsing in O(n^3). Practically, large programs require better runtimes, so we use more restricted grammars to achieve these runtimes.

Earley's Method has worst case O(n^3), but runs in O(n^2) for unambiguous and O(n) for LR(1)

Knuth NFA

Turing Machines