Combinatorics on Words: Definitions
 a primitive word x raised to a power k forms the word w = x^k, for integer k
 P_2 means the set of all primitive words over the size-2 alphabet {0,1}

 x Ш y means shuffle, as in shuffling a deck (interleave)
 lexicographic order is defined for same-length strings

radix order sorts first by length, then lexicographically
 z is purely periodic if z = x^omega for some finite x
 z is ultimately periodic if some suffix of z is purely periodic
 x and y are conjugates if x is a cyclic shift of y
 x is a border of y if x is nonempty, x != y, and x is both a prefix and a suffix of y ("entanglement")
 Lyndon-Schützenberger 1st theorem, the "alfalfa" theorem
 squares cannot be avoided over a binary alphabet (they can be avoided over three or more letters)
 a morphism is a map from one alphabet to another, preserving concatenation
 if every subword of an infinite word appears at least twice, then every subword appears infinitely many times
 an infinite word in which every subword appears infinitely many times is called recurrent
 the Thue-Morse sequence is recurrent: a subword runs from position i to j, and by prepending 11 to the binary representation of each position you can always find the same subword ending at the position with binary representation 11j (prepending 11 adds two 1-bits, preserving the digit-sum parity that defines the sequence)
 another way to think about this is the complementary construction: the complement of the complement gets you the original block back at a later position
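The shift-by-11-prefix argument above can be checked directly. A minimal sketch (the function name and the choice of positions are mine): prepending "11" to the binary representation of a position n < 2^m means shifting by 3 * 2^m, and the parity of 1-bits is unchanged, so any block reappears at that shift.

```python
def thue_morse(n):
    """Thue-Morse sequence: t(i) = parity of the number of 1-bits in binary of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))

word = thue_morse(1 << 12)

# a subword occupying positions 5..12 recurs: prepending "11" to the binary
# representation of each position adds two 1-bits, preserving parity, so the
# block reappears shifted by 3 * 2^m for any m with 12 < 2^m
block = word[5:13]
m = 4                      # 2^4 = 16 > 12
shift = 3 * (1 << m)       # 48
assert word[5 + shift : 13 + shift] == block
```

The same shift works for every subword, which is exactly the recurrence claim.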
Proof tricks:
 the pigeonhole principle is useful for showing eventual repetition in infinite strings
Finite Automata and Regular Languages

 Moore vs Mealy machines are DFAs augmented to produce some output during computation
 a Moore machine outputs a value at each state; there exists an equivalent Mealy machine with the same number of states
 a Mealy machine outputs a value at each transition; there exists an equivalent Moore machine with |Q| · |Δ| states
 Quotient operator on languages: L1 / L2 is defined as {x ∈ Σ* : there exists y ∈ L2 with xy ∈ L1}
 we can define prefix(L) = L / Σ*
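For finite languages the quotient can be computed directly from the definition. A minimal sketch (the helper names are mine; for finite L it suffices to let Σ* range over all suffixes of words in L):

```python
def quotient(L1, L2):
    """L1 / L2 = { x : there exists y in L2 with xy in L1 } (finite L1, L2)."""
    return {w[:i] for w in L1 for i in range(len(w) + 1) if w[i:] in L2}

L = {"abc", "ab"}
assert quotient(L, {"c"}) == {"ab"}

# prefix(L) = L / Sigma*; for finite L, only suffixes of words in L matter,
# so we can stand in for Sigma* with the set of all such suffixes
sigma_star = {w[i:] for w in L for i in range(len(w) + 1)}
assert quotient(L, sigma_star) == {w[:i] for w in L for i in range(len(w) + 1)}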
 Concatenation of languages: L1 L2 is {xy : x ∈ L1, y ∈ L2}
 Theorem: for any language L and regular language R, R / L is regular. Idea: start with a DFA for R and modify its set of final (accepting) states
 example: remove trailing zeroes while preserving regularity: (L / 0*) ∩ ({ε} ∪ Σ*(Σ \ {0}))
 2^{Δ*} is the set of all languages over the alphabet Δ

 a substitution maps letters to languages, and preserves concatenation
 "substitution by regular languages" means each letter is mapped to a regular language; regularity is then preserved under substitution
 Theorem: substitution preserves regularity. Prove by induction on the regular expression operators.

the inverse morphism of a language L is, given a morphism h, the set h^{-1}(L) of all words w such that h(w) ∈ L
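The inverse-morphism definition can be made concrete by brute force over short words. A minimal sketch (the function name, the morphism h, and the length bound are all mine, chosen for illustration):

```python
from itertools import product

def inverse_morphism(h, L, max_len):
    """h^{-1}(L): all words w over the domain of h, up to max_len letters,
    such that applying h letterwise (a morphism) gives a word in L."""
    domain = sorted(h)
    result = set()
    for n in range(max_len + 1):
        for letters in product(domain, repeat=n):
            w = "".join(letters)
            if "".join(h[c] for c in w) in L:
                result.add(w)
    return result

h = {"a": "01", "b": "1"}
L = {"011", "1"}
# h("ab") = "011" is in L, and h("b") = "1" is in L
assert inverse_morphism(h, L, 3) == {"ab", "b"}
```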

a transducer is a generalization of the Mealy machine which transforms input into new output; it has finitely many states, nondeterministic transitions, and each transition consumes some input and produces output, possibly over a different alphabet
Proof tricks:
 for induction, always construct the problem recursively. Then make sure the induction step applies the hypothesis to the recursive pieces.
 start with a representation of an object (a regular language represented by a DFA, NFA, etc.) and modify it to reach an answer
3.8 Automata, graphs and boolean matrices
A transition of an NFA can be represented using an incidence matrix: a boolean matrix where, for a given input character a ∈ Σ, M_a[i][j] = 1 if there is a transition from state i to state j on character a, i.e. q_j ∈ δ(q_i, a). (We use AND instead of multiplication, OR instead of addition.)
If we multiply matrices, we end up "following paths" along the NFA: M_a × M_b gives the incidence matrix for the word w = ab.
We learned from algorithms (CS 240) that there are more efficient implementations of matrix multiplication than the naive O(n^3). This gives us a more efficient computation for NFA simulation.
There are only finitely many possible matrices, even after arbitrarily many operations on them, since each entry can only be 0 or 1.
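The boolean-matrix view can be sketched directly. Below, a toy 3-state NFA (my own example, accepting strings containing "ab") is encoded as one boolean matrix per letter; multiplying matrices with AND/OR composes transitions, and entry [0][2] asks whether the initial state 0 can reach the final state 2:

```python
def bool_mult(A, B):
    """Boolean matrix product: OR of ANDs instead of sum of products."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# toy NFA for "contains ab": 0 -a-> {0,1}, 0 -b-> {0}, 1 -b-> {2}, 2 -a,b-> {2}
M = {
    "a": [[1, 1, 0], [0, 0, 0], [0, 0, 1]],
    "b": [[1, 0, 0], [0, 0, 1], [0, 0, 1]],
}

def matrix_for(word):
    acc = [[int(i == j) for j in range(3)] for i in range(3)]  # identity = empty word
    for c in word:
        acc = bool_mult(acc, M[c])
    return acc

# entry [0][2]: can the start state 0 reach the accepting state 2 on this word?
assert matrix_for("aab")[0][2]
assert not matrix_for("ba")[0][2]
```

Since each entry is 0 or 1, there are at most 2^(3×3) distinct matrices here, which is the finiteness used in the proof trick below.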
Proof trick: construct a second DFA from the original, using tuples of matrices as the states (this works because there are finitely many possible incidence matrices). Define the final states using a sort of "query" on all states that satisfy some property, enabling traversal from the initial state (row 0) to final states in the original DFA.
3.9 Myhill-Nerode theorem
The following are equivalent:
 L is regular
 L can be written as the union of some of the equivalence classes (of indistinguishable strings) of E, where E is a right-invariant (x E y => xz E yz for all x, y ∈ Σ* and all z ∈ Σ*) equivalence relation of finite index
 R_L is of finite index
 an equivalence relation R partitions a set into disjoint subsets of "equal" things (equivalence classes)
 R1 is a refinement of R2 if R1 ⊆ R2
 the index is the number of classes

the Myhill-Nerode equivalence relation is the right-invariant relation R_L s.t. x R_L y iff for all z, xz ∈ L iff yz ∈ L
 if L is the union of some of the equivalence classes of a right-invariant equivalence relation E, then E is a refinement of R_L
3.10 Minimization of finite automata
The Myhill-Nerode theorem allows a minimal DFA to be constructed from the Myhill-Nerode equivalence relation
Naively, enumerate all pairs of states. "Mark" all pairs that are immediately distinguishable (one is a final state, the other is not). Run for n iterations, marking new pairs along the way. The unmarked pairs remaining after n iterations yield the equivalence classes.
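The marking procedure above can be sketched as a table-filling algorithm. A minimal version (function name and the toy one-letter DFA are mine; states 1 and 2 are deliberately equivalent):

```python
from itertools import combinations

def minimize_classes(states, sigma, delta, finals):
    """Table-filling: mark distinguishable pairs, return equivalence classes."""
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:  # iterate until no new pair gets marked
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            if any(delta[p][a] != delta[q][a]
                   and frozenset((delta[p][a], delta[q][a])) in marked
                   for a in sigma):
                marked.add(frozenset((p, q)))
                changed = True
    # group states: unmarked pairs are equivalent, so a representative suffices
    classes = []
    for s in states:
        for c in classes:
            if frozenset((s, c[0])) not in marked:
                c.append(s)
                break
        else:
            classes.append([s])
    return [sorted(c) for c in classes]

# toy DFA over {a}: 0 -> 1 -> 2 -> 2, finals {1, 2}; states 1 and 2 merge
delta = {0: {"a": 1}, 1: {"a": 2}, 2: {"a": 2}}
assert sorted(minimize_classes([0, 1, 2], "a", delta, {1, 2})) == [[0], [1, 2]]
```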
3.11 State complexity

 sc(L) is the minimum number of states in a DFA that accepts regular language L
 sc(L1 ∩ L2) <= sc(L1) × sc(L2). This can be seen by forming a product DFA whose states are pairs, one state from a DFA for L1 and one from a DFA for L2. The upper bound is attained by taking L1 = {strings whose number of 0s is ≡ 0 mod n} and L2 = {strings whose number of 1s is ≡ 0 mod m}
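The product construction for exactly this pair of languages can be sketched in a few lines (the function name is mine): the state is a pair (count of 0s mod n, count of 1s mod m), so n · m pair-states suffice, matching the bound sc(L1 ∩ L2) <= sc(L1) · sc(L2).

```python
def product_accepts(w, n, m):
    """Product DFA for (#0s ≡ 0 mod n) ∩ (#1s ≡ 0 mod m); states are pairs."""
    state = (0, 0)  # (zeros mod n, ones mod m)
    for c in w:
        i, j = state
        state = ((i + 1) % n, j) if c == "0" else (i, (j + 1) % m)
    return state == (0, 0)

assert product_accepts("0011" * 3, 2, 3)   # six 0s (≡ 0 mod 2), six 1s (≡ 0 mod 3)
assert not product_accepts("001", 2, 3)    # one 1 is not ≡ 0 mod 3
```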
 NFAs are harder since the minimal NFA is not unique
 we can find lower bounds if we construct the problem in a way that we can apply known theorems to it
 if the shortest string accepted by L has length n, then nsc(L) >= n+1. Prove using the pigeonhole principle to show some state repeats along the accepting path, then use the pumping lemma idea to remove the cycle and find a shorter accepted string.
 if you can find n words in L under the following constraints, then nsc(L) >= n: split each of the n words into a (nonoverlapping) prefix-suffix pair, then show that for every choice of two of the pairs, swapping the prefixes makes at least one of the resulting words not in L.
3.12 Partial Order
 the subword order (contiguous) is one natural partial ordering on strings, and the subsequence order is another
 if R is a partial ordering, then if a R b or b R a, then a and b are comparable (otherwise incomparable)

an antichain is a set of pairwise incomparable elements
 the subword order can have infinite antichains (for example, a b^n a for n = 1, 2, 3, ...)
 the subsequence order has no infinite antichains
 proof by contradiction on minimality, using division-free subsequences
 a minimal element has nothing less than or equal to it except itself
 the set of minimal elements is finite (though hard to find in some cases, like the powers of 2 in base 10)
4 Context-free grammars
 recall pushdown automata. δ(q, a, A) yields pairs of the form (p, γ), where A is the current top of the stack and γ is a string that replaces A. Input is accepted at a final state or when the stack is empty; the stack starts with a unique symbol Z0.
 CFLs are closed under union, concatenation, Kleene star, and substitution (of a CFL for a symbol), hence under morphisms; they are also closed under inverse morphisms
 recall the pumping lemma for context-free languages: if a language is context-free, then there is some pumping length such that all strings in the language longer than the pumping length must possess certain structural properties.
 to show that a language is NOT context-free, we assume some pumping length n exists. Then construct a string in L longer than any choice of n (so it must depend on n). Then we show that the string cannot have the structural properties the pumping lemma guarantees.
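For a small fixed n, this argument can even be verified mechanically. A brute-force sketch (my own code; it checks the classic example L = { a^n b^n c^n } by trying every decomposition uvxyz with |vxy| <= n and |vy| >= 1, and using i = 0 and i = 2 as witnesses):

```python
def in_L(s):
    """L = { a^n b^n c^n }"""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n

def pumping_fails(s, n):
    """True if EVERY decomposition s = uvxyz (|vxy| <= n, |vy| >= 1)
    leaves L when pumped with i = 0 or i = 2."""
    for a in range(len(s) + 1):              # v starts at a
        for b in range(a, len(s) + 1):       # v ends at b
            for c in range(b, len(s) + 1):   # y starts at c
                for d in range(c, len(s) + 1):   # y ends at d
                    if d - a > n or (b - a) + (d - c) == 0:
                        continue  # violates |vxy| <= n or |vy| >= 1
                    u, v, x, y, z = s[:a], s[a:b], s[b:c], s[c:d], s[d:]
                    # a decomposition survives only if pumping stays in L
                    if all(in_L(u + v * i + x + y * i + z) for i in (0, 2)):
                        return False
    return True

n = 3
assert pumping_fails("a" * n + "b" * n + "c" * n, n)
```

Since vxy has length at most n, it touches at most two of the three letter blocks, so pumping unbalances the counts; the exhaustive check confirms no decomposition survives.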
4.3 Ogden's Lemma

Ogden's Lemma is a generalization of the pumping lemma for context-free grammars. Similarly, we have a pumping length based on some n = d^{k+1}, where d is the max length of a production rule's right-hand side (the max number of children of each vertex in the parse tree) and the exponent k+1, for k variables, satisfies the height requirement needed for the pigeonhole argument. The difference here is that we can mark symbols in the string.

proof trick: the factorial trick. n! + n gives a way to trap any a^n: however the pumping lemma decomposes it as a^i (a^j) a^k with a pumpable block a^j (1 <= j <= n), pumping the block n!/j extra times yields exactly a^{n! + n}, for any choice of the decomposition.
4.2 & 4.6 Parikh's Theorem
 all unary CFLs are regular. We can rewrite any unary CFL as a finite union of regular languages, where the number of languages depends on the number of rules and their sizes.
 Parikh's theorem generalizes this to certain CFLs over any alphabet.
 linear and semilinear are general properties of sets.

linear sets of numbers are built from a constant c and finitely many period values p_1, ..., p_r: every element of the set has the form c + a_1 p_1 + ... + a_r p_r for nonnegative integer coefficients a_i. This is generalized in this course to linear sets of k-tuples, described by a constant k-tuple and r period k-tuples.

 semilinear sets are finite unions of linear sets.
 I thought about whether semilinear sets can be re-described using the constraints of a single linear set. You can't. As a geometric example, with r = 1 and k = 2 a linear set plots out some line of points in 2D space. If you take the union of two lines in 2D space, you get an X shape (or parallel lines). But if you tried to represent this with a single linear set with r = 2, you would end up describing a grid of points.
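The X-vs-grid intuition can be sketched by enumerating small linear sets (my own code; coefficients are truncated to a finite range just to make the sets finite):

```python
from itertools import product

def linear_set(c, periods, bound):
    """{ c + a1*p1 + ... + ar*pr } for nonneg integer coefficients a_i < bound."""
    pts = set()
    for coeffs in product(range(bound), repeat=len(periods)):
        pts.add(tuple(c[i] + sum(a * p[i] for a, p in zip(coeffs, periods))
                      for i in range(len(c))))
    return pts

# r = 1 in 2D: each linear set traces a line of points
axis_x = linear_set((0, 0), [(1, 0)], 5)
axis_y = linear_set((0, 0), [(0, 1)], 5)
cross = axis_x | axis_y                        # semilinear: union of two linear sets
# a single linear set with both periods (r = 2) fills in a whole grid instead
grid = linear_set((0, 0), [(1, 0), (0, 1)], 5)
assert cross < grid and (1, 1) in grid and (1, 1) not in cross
```

So the union of two lines is a strict subset of the grid: no single linear set with both periods reproduces just the cross.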
 finally, Parikh's map ψ(w) applies these ideas to formal languages. It maps a string to a k-tuple, where k = |Σ|, counting the occurrences of each letter in the string. It maps a language to the set of k-tuples that represent strings in the language.
 if L is a CFL, then ψ(L) is a semilinear set of k-tuples.
 if X is semilinear, then there exists a regular language L so that ψ(L) = X
 any finite set of strings can be mapped to a semilinear set: for each word in the finite set, take its Parikh vector as the constant ("y-intercept") with all-zero periods, then take the union over the words.
so all in all, if L is a CFL, then there exists some regular language R s.t. ψ(L) = ψ(R)
so I guess we think about Parikh's theorem as a way to count things. It's kinda just saying if you forget about the order, we're just looking at patterns in the counting of each letter.
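That "forget the order" reading is easy to demonstrate (my own sketch): the CFL { a^n b^n } and the regular language (ab)* have exactly the same Parikh image.

```python
from collections import Counter

def parikh(w, sigma="ab"):
    """Parikh map: string -> k-tuple of letter counts (k = |Sigma|)."""
    c = Counter(w)
    return tuple(c[a] for a in sigma)

# psi forgets order: the CFL { a^n b^n } and the regular (ab)* agree
cfl_image = {parikh("a" * n + "b" * n) for n in range(5)}
reg_image = {parikh("ab" * n) for n in range(5)}
assert cfl_image == reg_image == {(n, n) for n in range(5)}
```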
5 Parsing and recognition
CYK is a dynamic programming approach to parsing in O(n^3). Practically, large programs require better runtimes, so we use more restricted grammars to achieve these runtimes.
 first convert the grammar to CNF, then construct an n × n DP table for the string w of length n for which we want to find a parse tree.
 each entry CYK[i][j] is the set of nonterminals that derive the substring w[i..j]
 each table entry can be computed in O(n) by finding all rules that derive any pair in CYK[i][k] × CYK[k][j]
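The table-filling steps above can be sketched directly. A minimal CYK (my own code; the CNF grammar for { a^n b^n : n >= 1 } is an assumed example):

```python
def cyk(w, rules, start="S"):
    """CYK recognition for a grammar in CNF. rules: list of (head, body),
    where body is a single terminal or a 2-character string of nonterminals."""
    n = len(w)
    # T[i][j] = set of nonterminals deriving w[i..j] (inclusive)
    T = [[set() for _ in range(n)] for _ in range(n)]
    for i, c in enumerate(w):
        T[i][i] = {h for h, body in rules if body == c}
    for length in range(2, n + 1):           # substring length
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):            # split point: O(n) work per entry
                for h, body in rules:
                    if len(body) == 2 and body[0] in T[i][k] and body[1] in T[k + 1][j]:
                        T[i][j].add(h)
    return start in T[0][n - 1]

# assumed CNF grammar for { a^n b^n : n >= 1 }:
# S -> AT | AB, T -> SB, A -> a, B -> b
rules = [("S", "AT"), ("S", "AB"), ("T", "SB"), ("A", "a"), ("B", "b")]
assert cyk("aabb", rules)
assert not cyk("aab", rules)
```

The three nested loops over length, start, and split point give the O(n^3) total, with O(n) splits per table entry as noted above.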
Earley's Method has worst case O(n^3), but runs in O(n^2) for unambiguous grammars and O(n) for LR(1) grammars
 again, first convert to Chomsky Normal Form
items have the form A → α . β, where the dot marks "everything before this point has been derived, AKA has a parse tree"
 bottomup parsing algorithm, using a table to store intermediate results
 each table entry M[i][j] is a set of items A → α . β, where α expands to a derivation of w_{i,j} (the substring from i to j) and the start symbol S derives a sentential form beginning with w_{0,i} A
Knuth NFA
 the subword derived by some nonterminal is a handle. At each step of the algorithm, a handle is replaced by a nonterminal based on some item (a production rule augmented with a dot symbol).
 for some grammars, there are multiple handles for a partially parsed word, but LR(0) grammars are defined so that there is only ever one handle; this is how LR(0) grammars are parsable in linear time. From CS 444, we think of the parser as representable as a DFA, hence the linear runtime.
 "What makes LR(0) parsing work is Knuth's amazing observation that the set of items valid for viable prefix γ can be computed by a finite automaton."
 the Knuth NFA's language is the set of all viable prefixes, and states are the different items (production rules augmented with the dot . )
 the Knuth NFA is also used to define the set of valid items for every viable prefix (and a valid item is used to replace some handle with a nonterminal).
 we say an item A → α . β is valid for a viable prefix δα if there is a rightmost derivation from the start symbol, S derives δαβw, and δα is a viable prefix from the Knuth NFA (where δ, α, β are strings of terminals or nonterminals and w is a terminal word).
Turing Machines
 01010101010... is a "simple" string because we can describe a pattern for it
 random digits could be seen as "complex", or without pattern
 complexity and randomness are inherently linked
 the Kolmogorov complexity C(x) of a string x is the length of the shortest program P that prints x and halts
 if C(x) is large, x is hard to describe
 could think of Kolmogorov complexity as encoding or compression (from a storage perspective): store a short program, run it to recover x
 Kolmogorov complexity is upper bounded by |x| + O(1), since a program can simply contain x and print it
 by the pigeonhole principle, there are incompressible (random) strings: for length n there are 2^n strings, but only 2^n - 1 shorter descriptions, like counting the nodes of a binary tree: 2^{n-1} + 2^{n-2} + ... + 4 + 2 + 1 = 2^n - 1
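The counting step in the pigeonhole argument above is just a geometric sum, easy to check:

```python
n = 10
strings = 2 ** n                          # binary strings of length n
shorter = sum(2 ** i for i in range(n))   # descriptions of length 0 .. n-1
assert shorter == strings - 1             # 2^(n-1) + ... + 2 + 1 = 2^n - 1
# so at least one length-n string has no shorter description: it is incompressible
```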
 Kolmogorov complexity is uncomputable