Selection sort is the naive approach: repeatedly swap the minimum remaining element into the leftmost unsorted position.
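A minimal sketch of the idea above, assuming the input is a list of comparable items (the function name `selection_sort` is just an illustrative choice):

```python
def selection_sort(items):
    """Return a sorted copy of items using selection sort (O(n^2) comparisons)."""
    result = list(items)
    n = len(result)
    for left in range(n):
        # Find the index of the minimum element in the unsorted suffix.
        min_index = left
        for j in range(left + 1, n):
            if result[j] < result[min_index]:
                min_index = j
        # Swap it into the leftmost unsorted position.
        result[left], result[min_index] = result[min_index], result[left]
    return result
```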
Mergesort gives us a divide and conquer solution, used in tandem with a merge step.
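One possible sketch of mergesort with its merge helper, assuming list inputs (the names `merge` and `merge_sort` are illustrative):

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list in linear time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Divide the list in half, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```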
Worst-case runtime analysis makes no assumptions about the input, so the bound must hold for every input.
Asymptotic bounds (big O) can be proven using limits, where the limit as n -> infinity of T(n)/f(n) =
The sum of 1/i for i = 1 to n approximates the integral of 1/x. The terms from i = 2 to n sum to strictly less than the actual integral from 1 to n, which is ln n, so the whole sum is at most 1 + ln n, i.e. O(log n).
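A quick numerical sanity check of the integral comparison, as a sketch only (the helper `harmonic` is a hypothetical name). It checks the standard bounds ln(n + 1) < sum <= 1 + ln n for a few values of n:

```python
import math


def harmonic(n):
    """Sum of 1/i for i = 1..n."""
    return sum(1.0 / i for i in range(1, n + 1))


for n in (10, 1000, 100000):
    h = harmonic(n)
    # Lower bound: integral of 1/x from 1 to n+1; upper bound: 1 + integral from 1 to n.
    assert math.log(n + 1) < h <= 1 + math.log(n)
```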
log(n!) = n log n - Theta(n) is a stronger statement than log(n!) = O(n log n).
Reduction (informally): solving a computational problem C1 by using an algorithm that solves another computational problem C2.
A recurrence is an equation describing the runtime of a divide and conquer (or any recursive) algorithm in terms of its runtime on smaller inputs.
Master theorem: a case-based theorem for finding the asymptotic behaviour of recurrence relations where all subproblems are of equal size.
We describe such a recurrence with the form: \(T(n) = a T(\frac{n}{b}) + f(n)\)
To prove it, expand the recursion tree and simplify the expression using a summation over all levels. Then use the formula for the sum of c^j to get three cases based on the value of a/b^d, where f(n) = O(n^d).
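The three cases can be sketched as a small classifier, assuming the common simplified statement with f(n) = O(n^d). The function name `master_theorem` and the string output format are illustrative choices:

```python
import math


def master_theorem(a, b, d):
    """Classify T(n) = a*T(n/b) + O(n^d) by comparing a with b^d."""
    ratio = a / (b ** d)
    if math.isclose(ratio, 1.0):
        # Every level of the recursion tree does the same amount of work.
        return f"O(n^{d} log n)"
    if ratio < 1:
        # Work shrinks geometrically per level, so the root dominates.
        return f"O(n^{d})"
    # Work grows geometrically per level, so the leaves dominate.
    return f"O(n^{math.log(a, b):.3f})"
```

For example, mergesort has a = 2, b = 2, d = 1, which lands in the balanced case.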
Closest pair: find the two closest points in a set.
Naively multiplying two n-digit integers is O(n^2) in the number of digits.
Greedy proofs are probably one of the hardest things to grasp in this course. A greedy algorithm uses some heuristic to construct a solution to some sort of optimization problem (maximize the output of something, minimize the weight of this graph, ...), and the proof must show that this solution is optimal.
Once you come up with an algorithm, a proof goes as follows. This is called an exchange argument.
The key idea here is to show that the greedy solution Sg is optimal. By showing that every optimal solution is just a rearrangement of Sg in some way with equal cost, we prove that the greedy solution is optimal.
An alternative approach with exchange arguments is to take an arbitrary solution as the one we compare against, and show that it can be transformed into the greedy solution in a provably finite number of exchange steps, none of which increases the cost. The key idea here is that if we can reach the greedy solution from EVERY solution using exchanges that never increase the cost, then there can be no solution whose cost is less than that of the greedy solution to begin with.
DP can be used to remove repeated recursive calls.
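As a small illustration (Fibonacci is an example chosen here, not one from these notes), memoization caches each subproblem so the repeated recursive calls disappear:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """n-th Fibonacci number; each value of n is computed only once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

Without the cache the recursion makes exponentially many calls; with it, only O(n) distinct calls are evaluated.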
A Minimum Spanning Tree is a spanning tree (one that includes all vertices in a graph) whose sum of edge weights is minimized (it is the spanning tree with the lowest weight possible). The greedy approach gives us Prim's Algorithm: starting at any vertex, repeatedly add the smallest-weight edge that connects the tree to a vertex not yet in the MST.
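A sketch of Prim's Algorithm using a binary heap, assuming a connected undirected graph given as a dict mapping each vertex to a list of `(neighbour, weight)` pairs (this representation and the name `prim_mst` are assumptions for illustration):

```python
import heapq


def prim_mst(graph, start):
    """Return (total_weight, tree_edges) of an MST grown from start."""
    visited = {start}
    # Heap of candidate edges (weight, from_vertex, to_vertex) leaving the tree.
    frontier = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(frontier)
    tree_edges = []
    total = 0
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # This edge no longer leaves the tree.
        visited.add(v)
        tree_edges.append((u, v, w))
        total += w
        for nxt, nw in graph[v]:
            if nxt not in visited:
                heapq.heappush(frontier, (nw, v, nxt))
    return total, tree_edges
```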
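Topologically sorting a DAG takes O(m+n) using a DFS-based algorithm (the same finishing-order pass that Kosaraju's Algorithm starts with). The intuition is to DFS through the graph and label each vertex in the order the search finishes it, so the leaves of the DAG get the smallest labels. Then sort from largest to smallest label to obtain the sorted list of vertices.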
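The DFS labelling can be sketched as follows, assuming the DAG is a dict mapping every vertex to a list of its successors (the name `topological_sort` is illustrative, and the recursion assumes the graph is small enough for Python's recursion limit):

```python
def topological_sort(graph):
    """Return the vertices of a DAG in topological order."""
    visited = set()
    finish_order = []

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        # u is finished only after everything reachable from it.
        finish_order.append(u)

    for u in graph:
        if u not in visited:
            dfs(u)
    # Latest-finished first gives a valid topological order.
    return finish_order[::-1]
```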
The shortest path problem in a graph has many variations.
Shortest path in a DAG is O(n + m): topologically sort, then use iterative DP to find the shortest path to each vertex from a single start.
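A sketch of the DAG case, assuming a dict mapping every vertex to a list of `(successor, weight)` pairs (the name `dag_shortest_paths` is illustrative). Vertices are relaxed in topological order, so each distance is final by the time it is used:

```python
import math


def dag_shortest_paths(graph, source):
    """Single-source shortest distances in a weighted DAG."""
    visited, order = set(), []

    def dfs(u):
        visited.add(u)
        for v, _ in graph[u]:
            if v not in visited:
                dfs(v)
        order.append(u)

    for u in graph:
        if u not in visited:
            dfs(u)
    order.reverse()  # Reverse finishing order is a topological order.

    dist = {u: math.inf for u in graph}
    dist[source] = 0
    for u in order:
        if dist[u] == math.inf:
            continue  # Not reachable from the source.
        for v, w in graph[u]:
            dist[v] = min(dist[v], dist[u] + w)
    return dist
```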
Shortest path in a graph with non-negative edge weights (directed or undirected): use Dijkstra's Algorithm, a greedy O(m log m) approach.
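A heap-based sketch of Dijkstra's Algorithm, assuming a dict mapping every vertex to a list of `(neighbour, weight)` pairs with non-negative weights (the name `dijkstra` and the lazy-deletion style are choices made here):

```python
import heapq
import math


def dijkstra(graph, source):
    """Single-source shortest distances with non-negative edge weights."""
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # Stale heap entry; a shorter path was already found.
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```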
Single-source shortest paths with possibly negative weights but no negative-weight cycles: use Bellman-Ford, O(nm), an iterative DP solution looking at paths P(k, i), where k is the destination and i is the maximum number of edges allowed in the path.
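A space-saving sketch of Bellman-Ford, assuming vertices are numbered 0..n-1, edges are `(u, v, weight)` triples, and there is no negative-weight cycle (the name `bellman_ford` is illustrative). Round i of the outer loop corresponds to allowing paths with up to i edges:

```python
import math


def bellman_ford(num_vertices, edges, source):
    """Single-source shortest distances; negative edge weights are allowed."""
    dist = [math.inf] * num_vertices
    dist[source] = 0
    # A shortest path uses at most n - 1 edges, so n - 1 rounds suffice.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # No distance improved, so all are final.
    return dist
```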
For shortest paths between all pairs of vertices, there are O(n^2) vertex pairs (nC2): use Floyd-Warshall, an O(n^3) iterative DP looking at paths P(i, j, k), where i is the source, j is the destination, and only the first k vertices may be used as intermediate nodes.
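A sketch of Floyd-Warshall, assuming the graph is an n x n weight matrix with `math.inf` for missing edges and 0 on the diagonal (the name `floyd_warshall` is illustrative). The third DP dimension is folded into in-place updates:

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest distances from an adjacency matrix."""
    n = len(weights)
    dist = [row[:] for row in weights]
    for k in range(n):  # Allow vertex k as an intermediate node.
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```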
Looking at problems for which we do not know any efficient algorithms (SAT, 0/1 Knapsack, Travelling Salesman Problem).
Let's formalize the set of tractable ("easy") problems:
class P: given a computational decision problem C, C is in P iff there exists an algorithm solving C in O(n^k) for some constant k
This is a pretty good definition in practice, but edge cases exist, like n^1000 algorithms or exponential algorithms that run efficiently on average.
Idea behind reduction: if you're trying to sell security software, you can't say that it is not hackable, but you can argue that if your software is hacked, then every other piece of software is hacked too.
Looking at the travelling salesperson problem, there is no known algorithm that can determine a minimum-weight tour in G in polynomial time. It's really hard to prove that absolutely no such algorithm exists. But we can prove that TSP is as difficult as a bunch of other unsolved problems.
How do we prove hardness in a relative sense?
C1 reduces to C2 (C1 <=P C2) if given a polynomial time solution for C2, we can solve C1 in poly-time
We define completeness as follows: let C be a set of problems, and let ci in C be C-complete if for all ck in C, ck reduces to ci (ci is the hardest problem in C).
We can try to argue that TSP is the hardest problem in its class by showing that it is C-complete for some set of problems C. Here that set is C = NP, which is canonically the set of brute-force solvable problems.
Looking at the optimization version of the knapsack problem, it's clear that finding the max value also lets us solve the decision version of it.
C* is NP complete means
Pick some set of problems C and show that C in NP if
So all problems in P are also in NP.
So we have two ways to argue C* is NP-complete
Remember, C1 reduces to C2 => C2 is at least as hard as C1
Prove that 3-SAT reduces to Independent Set (IS)
IS says find some k vertices that have no edges between any of them
3-SAT says given a boolean expression in CNF with three literals per clause, find an assignment to n variables that yields true for the whole expression.
Write a reduction, a polynomial time converter from a 3-SAT problem instance to an IS problem instance.
For each clause, create three vertices labelled with its literals and connect them to each other, and also make an edge for any pair of vertices labelled xi and not xi
State that the reduction is polynomial time
Claim: F is satisfiable iff there exists an independent set S in G of size >= m (for m clauses in the 3-SAT instance)
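The standard converter from a 3-SAT formula to a graph, together with brute-force checkers for the claim on tiny instances, can be sketched as below. The encoding is an assumption made here: a literal is a nonzero integer, `i` for xi and `-i` for not xi, and a vertex is `(clause_index, position, literal)`. All function names are hypothetical:

```python
from itertools import combinations, product


def three_sat_to_is(clauses):
    """Build (vertices, edges, m): one vertex per literal occurrence."""
    vertices = [(c, p, lit) for c, clause in enumerate(clauses)
                for p, lit in enumerate(clause)]
    edges = set()
    for u, v in combinations(vertices, 2):
        same_clause = u[0] == v[0]
        contradictory = u[2] == -v[2]  # Labels xi and not xi.
        if same_clause or contradictory:
            edges.add((u, v))
    return vertices, edges, len(clauses)


def has_independent_set(vertices, edges, k):
    """Brute force (exponential): is there an independent set of size k?"""
    for subset in combinations(vertices, k):
        if all((u, v) not in edges for u, v in combinations(subset, 2)):
            return True
    return False


def is_satisfiable(clauses, num_vars):
    """Brute force (exponential): try every truth assignment."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False
```

The converter itself runs in polynomial time (it looks at every pair of vertices once); only the two checkers are exponential, and they exist purely to compare both sides of the claim on small inputs.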
Can there be an IS of size > n?
Proof of Claim:
An independent set cannot contain vertices labelled xi and not xi at the same time for any xi
Prove that IS reduces to Vertex cover.
Vertex cover outputs yes iff there exists a subset S of vertices of size <= k s.t. for all edges (u,v) in E, either u in S or v in S.
Write a reduction. Notice that they are inverse problems in a sense: if S is a set of vertices that covers all edges, then V - S is an independent set, since V - S cannot have any edges within it; otherwise S wouldn't have been a vertex cover.
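The complement relationship can be checked exhaustively on a small graph. This is a sketch with hypothetical helper names; it also suggests the instance converter, since G has an independent set of size >= k exactly when it has a vertex cover of size <= n - k:

```python
from itertools import combinations


def is_vertex_cover(edges, subset):
    """Every edge has at least one endpoint in subset."""
    return all(u in subset or v in subset for u, v in edges)


def is_independent_set(edges, subset):
    """No edge has both endpoints in subset."""
    return not any(u in subset and v in subset for u, v in edges)


def complement_property_holds(vertices, edges):
    """Check: S is a vertex cover iff V - S is an independent set, for every S."""
    vertices = set(vertices)
    for size in range(len(vertices) + 1):
        for chosen in combinations(sorted(vertices), size):
            cover = set(chosen)
            if is_vertex_cover(edges, cover) != is_independent_set(edges, vertices - cover):
                return False
    return True
```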
Prove that Clique is NPC
A clique is a subset S of V of size >= k in which every possible edge between vertices of S exists.
Notice that we can convert the problem using the idea that a clique is exactly the opposite of an independent set. So we can just complement the edges of the graph (add every missing edge, remove every existing one) and find a solution for Clique; that solution is an independent set in the original graph.
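A sketch of the edge inversion, assuming an undirected graph given as a vertex list and a list of edge pairs (all names here are illustrative):

```python
from itertools import combinations


def complement_edges(vertices, edges):
    """Edges of the complement graph: exactly the pairs missing from edges."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in present]


def is_clique(edges, subset):
    """Every pair of vertices in subset is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(subset, 2))


def is_independent(edges, subset):
    """No pair of vertices in subset is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) not in present for p in combinations(subset, 2))
```

A set is independent in G exactly when it is a clique in the complement of G, and building the complement takes O(n^2) time.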
Subset-Sum reduces to 0-1 Knapsack
Instance converter:
Subset Sum
X = {x1, ..., xn}, t
0/1 Knapsack
values = {x1, ..., xn}
weights = {x1, ..., xn}
W = t
V = t
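The instance converter above can be sketched directly. The decision solver below is a standard capacity DP added here only to exercise the converter; it assumes positive integer weights and reads the decision question as "total weight <= W and total value >= V". Both function names are hypothetical:

```python
def subset_sum_to_knapsack(xs, t):
    """Map a Subset Sum instance (X, t) to a 0/1 Knapsack decision instance."""
    return {"values": list(xs), "weights": list(xs), "W": t, "V": t}


def knapsack_decision(values, weights, W, V):
    """Is there a subset with total weight <= W and total value >= V?"""
    # best[c] = maximum value achievable with total weight at most c.
    best = [0] * (W + 1)
    for value, weight in zip(values, weights):
        # Go downwards so each item is used at most once.
        for c in range(W, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[W] >= V
```

Because values equal weights, a chosen subset has weight <= t and value >= t only when it sums to exactly t.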
The Travelling Salesman Problem (TSP) decision version asks: is there some tour of a given complete graph whose total weight is less than k?
Hamiltonian Cycle asks: is there a cycle that visits every vertex exactly once? This reduces to TSP by converting a graph instance into a complete graph with weight 0 for existing edges and 1 for non-existing edges, then asking TSP for a tour of weight less than 1.
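A sketch of this converter, with a brute-force TSP decision procedure included only to try it on tiny graphs. It assumes an undirected graph with at least three vertices, given as a vertex list and a list of edge pairs; the function names are hypothetical:

```python
from itertools import permutations


def ham_cycle_to_tsp(vertices, edges):
    """Complete-graph weights: 0 for edges of G, 1 for non-edges."""
    present = {frozenset(e) for e in edges}
    weight = {}
    for u in vertices:
        for v in vertices:
            if u != v:
                weight[(u, v)] = 0 if frozenset((u, v)) in present else 1
    return weight


def tsp_decision(vertices, weight, bound):
    """Brute force (factorial time): is there a tour of weight < bound?"""
    first, rest = vertices[0], vertices[1:]
    for perm in permutations(rest):
        tour = (first,) + perm + (first,)
        if sum(weight[(a, b)] for a, b in zip(tour, tour[1:])) < bound:
            return True
    return False
```

A tour of weight less than 1 uses only weight-0 edges, which means it is a Hamiltonian cycle in the original graph.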
0/1 knapsack decision problem finds out whether there is some subset of items (with weight and value) that can fit in a constrained weight backpack, with total value over k.
Independent Set is the same problem we saw before for linear graphs, but generalized to all graphs. Output yes iff there exists a subset S of V of size at least k s.t. no pair of vertices in S share an edge.
Vertex Cover outputs yes iff there is a subset S of V of size at most k s.t. every edge has at least one endpoint in S. This can be reduced easily to Independent Set.
NP-Complete is a set of uniquely difficult problems, but not the most difficult by any means, just uniquely interesting to study. There are more difficult complexity classes of problems. At the top are the ones that are undecidable, i.e. not solvable by any algorithm.
Consider the Halting problem: design an algorithm that determines if a given algorithm and input data halts (terminates). If a program will get stuck in an infinite loop, it does not halt.
But this is undecidable. Suppose a halting algorithm exists, and call it H. H takes an algorithm as input (think of compilers and this is reasonable). Then we can design a self-referential algorithm HSelf:
HSelf(x):
    if H(x, x) == yes: while true {}
    else halt
Running HSelf on itself gives a contradiction, and here's why. Assume HSelf(HSelf) halts. Then H(HSelf, HSelf) returns yes, so by its definition HSelf enters an infinite loop. That means HSelf is non-halting on this input, even though we assumed it halts. The logical conclusion is that HSelf halts implies HSelf doesn't halt, which is a contradiction.
Looking at the other case, assume HSelf(HSelf) doesn't halt. Then HSelf(HSelf) calls H(HSelf, HSelf), which returns no, so HSelf's implementation causes it to halt. The logical statement derived here is that HSelf doesn't halt implies HSelf halts. This statement is also a contradiction.
Since both cases are contradictions, we can conclude that there cannot exist a halting algorithm.
This was proven using a paradox, an "if x, then not x" style of logic. The idea is that we assume an algorithm for a problem exists, and show that if this were true, then for some input every case leads to a contradiction, allowing us to conclude that such an algorithm could never exist.
We can use reduction techniques from the last chapter to show other problems are undecidable. Show that we can phrase some known undecidable problem in terms of some other problem (as a specific input or case of that problem), and this proves that the other problem must also be undecidable.