CS 223 (Randomized Algorithms)
\def\al{\alpha}
\def\Al{\Alpha}
\def\bt{\beta}
\def\Bt{\Beta}
\def\da{\delta}
\def\Da{\Delta}
\def\ep{\epsilon}
\def\Ep{\Epsilon}
\def\gm{\gamma}
\def\Gm{\Gamma}
\def\la{\lambda}
\def\La{\Lambda}
\def\sg{\sigma}
\def\Sg{\Sigma}
\def\te{\theta}
\def\Te{\Theta}
\def\oa{\omega}
\def\Oa{\Omega}
\def\-#1{^{-#1}}
\def\2{\s{0,1}}
\def\abs#1{\left|#1\right|}
\DeclareMathOperator*{\argmin}{arg\,min}
\def\aryc#1{\lr{\begin{array}{c}#1\end{array}}}
\def\aryd#1{\lr{\begin{array}{cc}#1\end{array}}}
\def\arye#1{\lr{\begin{array}{ccc}#1\end{array}}}
\def\aryf#1{\lr{\begin{array}{cccc}#1\end{array}}}
\def\aryg#1{\lr{\begin{array}{ccccc}#1\end{array}}}
\def\blr#1{\big(#1\big)}
\def\co#1{\cx{co-#1}}
\def\cx#1{{\sf #1}}
\def\dd{\ddots}
\def\del{\partial}
\def\dq{:=}
\def\ds#1{\displaystyle{#1}}
\def\e{\equiv}
\def\ex{\exists}
\def\Exp#1{\exp\!\!\left[#1\right]}
\def\f{\frac}
\def\fa{\forall}
\def\grad{\nabla}
\def\hh{\ldots}
\def\im{\Longrightarrow}
\def\IN{\bigcap}
\def\ixl#1{\xleftarrow[#1]{}}
\def\ixr#1{\xrightarrow[#1]{}}
\def\lgr#1#2{\left(#1\;\middle|\;#2\right)}
\def\lr#1{\left(#1\right)}
\def\mi{\Longleftarrow}
\def\mim{\Longleftrightarrow}
\def\n#1{\left|#1\right|}
\def\nf#1#2{{}^{#1}/_{#2}}
\def\nin{\notin}
\def\otr{\overset R\leftarrow}
\def\ox{\otimes}
\def\pcwz#1{\left\{\begin{array}{ll}#1\end{array}\right.}
\def\pelse{&\text{else}}
\def\pif{&\text{if }}
\def\poly{\text{poly}}
\def\pr#1{\cx{pr#1}}
\def\Pr{\text{Pr}}
\def\Prr#1#2{\Pr\left(#1\;\middle|\;#2\right)}
\def\px#1{[\text{#1}]}
\def\qd{=:}
\def\s#1{\left\{#1\right\}}
\def\sbn{\subsetneq}
\def\sbq{\subseteq}
\def\sbs{\subset}
\def\spn{\supsetneq}
\def\spq{\supseteq}
\def\sps{\supset}
\def\st#1#2{\left\{#1\;\middle|\;#2\right\}}
\def\tis#1{\text{ is #1}}
\def\tr{^{\sf t}}
\def\un{\cup}
\def\UN{\bigcup}
\def\vtr{^{\;\sf t}}
\def\x{\times}
\def\xl#1{\xleftarrow{#1}}
\def\xr#1{\xrightarrow{#1}}
\def\xxl#1#2{\xleftarrow[#2]{#1}}
\def\xxr#1#2{\xrightarrow[#2]{#1}}
These are my lecture notes from CS 223: Randomized Algorithms (Piazza), taught by Michael Mitzenmacher at Harvard. I am an enrolled student, and these notes are in no way official; they're reproduced by me from Prof. Mitzenmacher's lectures in Spring 2015, and are licensed Creative Commons Attribution-NonCommercial-ShareAlike 4.0, which supersedes any other site-wide licenses.
\(R_{01}\): Introductions, Some Probability Theory
Two Problems
Problem 1: Dice
Alice suggests the following game, using three fair but nonstandard dice:
- A: 1,1,6,6,8,8
- B: 2,2,4,4,9,9
- C: 3,3,5,5,7,7
Bob will choose a die, then Alice will choose one of the remaining dice. Both will roll; the higher roll wins. What should Bob choose?
Quote: "Well, the meta-level reason that you don't want to play this game [as Bob] is that, if a professor of probability proposes a dice game to you, you probably don't want to play."
Solution: With probability 5/9 each, A beats C, C beats B, and B beats A. Whichever die Bob picks, Alice can pick the die that beats it, so Bob is 5:4 to lose. Read more elsewhere at Wiki | Nontransitive dice.
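A quick brute-force check of those 5/9 figures (my own sketch, not from lecture; die names match the list above):

```python
from itertools import product

# Faces of the three nonstandard dice from the lecture.
dice = {"A": [1, 1, 6, 6, 8, 8],
        "B": [2, 2, 4, 4, 9, 9],
        "C": [3, 3, 5, 5, 7, 7]}

def win_prob(x, y):
    """Probability that die x rolls strictly higher than die y,
    over all 36 equally likely face pairs."""
    return sum(a > b for a, b in product(dice[x], dice[y])) / 36

for x, y in [("A", "C"), ("C", "B"), ("B", "A")]:
    print(f"P({x} beats {y}) = {win_prob(x, y):.4f}")  # each prints 0.5556 = 5/9
```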
High-Order Lesson: You have to be very careful with probability; many notions, like "strength of dice", are unintuitive (e.g. in being nontransitive).
Problem 2: Coins
Bob suggests a different game: Alice will choose a sequence of three outcomes (\(\sf H\) or \(\sf T\)), then Bob will choose a different such sequence. They'll toss a fair coin until one of their sequences appears as a run of consecutive tosses. What should Alice choose?
Solution: Consider the situation if Alice picks \(\sf HHH\); then Bob can pick \(\sf THH\). Unless the first three tosses are all heads (probability \(\nf18\)), some \(\sf T\) precedes the first \(\sf HHH\), so Bob's \(\sf THH\) must complete first; hence Bob is 7:1 to win.
We can extrapolate the full strategy, involving a similar prefixing trick, whereby Bob prepends a single letter to the first two letters of Alice's sequence -- specifically, the opposite of Alice's second letter. Then \(\sf HTH\) is an optimal choice for Alice, \(\sf HHT\) is Bob's optimal response, leaving Bob 2:1 to win. Read more elsewhere at Wiki | Penney's game.
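A Monte Carlo sanity check of the \(\sf HTH\)-vs-\(\sf HHT\) matchup (my own sketch; function names are mine):

```python
import random

def penney_winner(alice, bob, rng):
    """Toss a fair coin until one player's length-3 pattern appears."""
    history = ""
    while True:
        history += rng.choice("HT")
        if history.endswith(alice):
            return "alice"
        if history.endswith(bob):
            return "bob"

rng = random.Random(0)
games = 100_000
bob_wins = sum(penney_winner("HTH", "HHT", rng) == "bob" for _ in range(games))
print(bob_wins / games)  # ~0.667, i.e. Bob is 2:1 to win
```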
Discrete Probability Spaces
- Set-theoretically, a probability space consists of:
- a sample space \(\Oa\) (a set of all possible outcomes)
- an event space \(\F\sbq2^\Oa\) (a collection of subsets of \(\Oa\))
- a function \(\Pr:\F\to[0,1]\).
- Axioms:
- \(\Pr(\Oa)=1\).
- \(\fa x\in\F,0\leq\Pr(x)\leq1\).
- \(\fa x,y\in\F,\Pr(x\un y)=\Pr(x)+\Pr(y)-\Pr(x\cap y)\)
Polynomial Identity Testing
- Problem: \(\ds{f(x)\dq\prod_{i=1}^n(x-a_i)}=^?\sum_{i=0}^nc_ix^i\qd g(x)\).
- Deterministic Algorithm: Expand the LHS, compare symbolically.
- Problem: It might take a long time (exponential blowup).
- Randomized Algorithm: Plug in the same (randomly chosen) \(x'\) on both sides:
- If \(\text{LHS}\lr{x'}=\text{RHS}\lr{x'}\), declare equal.
- If \(\text{LHS}\lr{x'}\neq\text{RHS}\lr{x'}\), declare unequal.
- However, there's a chance of mistakenly declaring them equal if you happen to pick a bad \(x'\).
- More precisely, you'll fail if you picked a root of \(f-g\).
- But since we know that there are at most \(n\) such roots (in the case \(f\neq g\)), we can limit the probability of failure to 1% if we choose from a pool of \(100n\) distinct numbers.
- Gets even better (\(0.01^k\)) if we choose repeatedly; gets even better without replacement; in the limit, we can obtain a (nonrandomized, exact) \(O\lr{n^2}\) solution by simply choosing \(k=n+1\).
- But here, we're interested in a ~linear-time solution, so let's settle for the randomized test (sketched below).
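A minimal sketch of that randomized test (my own code; assumes integer roots and coefficients so the arithmetic is exact; one trial costs \(O(n)\) arithmetic operations):

```python
import random

def probably_equal(roots, coeffs, trials=1):
    """Test prod_i (x - roots[i]) == sum_i coeffs[i] * x**i by evaluating
    both sides at random points from a pool of 100n values; if the
    polynomials differ, each trial wrongly passes w.p. <= 1/100."""
    n = len(roots)
    for _ in range(trials):
        x = random.randint(1, 100 * n)
        lhs = 1
        for a in roots:
            lhs *= x - a
        rhs = 0
        for c in reversed(coeffs):  # Horner's rule: O(n) per evaluation
            rhs = rhs * x + c
        if lhs != rhs:
            return False  # a disagreement at one point is definitive
    return True  # agreed at every sample; error prob <= 0.01 ** trials
```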
More Probability Theory
- Events \(E,F\) are independent iff \(\Pr(E\cap F)=\Pr(E)\cdot\Pr(F)\).
- Events \(E_1,\hh,E_k\) are mutually independent iff \(\fa\mathscr E\sbq\s{E_1,\hh,E_k},\Pr\lr{\IN_{E\in\mathscr E}E}=\prod_{E\in\mathscr E}\Pr(E)\).
Caution: Mutual independence is not the same thing as pairwise independence; consider two independent coin flips together with their XOR, spelled out below.
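In detail (standard example, filling in the hint): let \(E_1=\s{\text{flip 1 is H}}\), \(E_2=\s{\text{flip 2 is H}}\), \(E_3=\s{\text{the two flips differ}}\). Any two are independent -- e.g. \(\Pr(E_1\cap E_3)=\Pr(\sf HT)=\nf14=\Pr(E_1)\Pr(E_3)\) -- but \(\Pr(E_1\cap E_2\cap E_3)=0\neq\nf18=\Pr(E_1)\Pr(E_2)\Pr(E_3)\).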
Another Problem: MatMult Checking (mod 2)
- Problem: \(AB=^?C\).
- Deterministic Algorithm: Do the computation yourself.
- Problem: it's slow (\(O\lr{n^3}\), or slightly faster).
- Randomized Algorithm: Use matrix-vector calculations.
- Pick \(\vec r\) randomly. (Flip a coin for each coordinate: \(2^n\) possible outcomes, all equally likely.)
- Check \(A(B\vec r)=^?C\vec r\). (Takes \(O\lr{n^2}\).)
- Ask: If \(AB\neq C\), what is \(\Pr_{\vec r}(AB\vec r=C\vec r)\)?
\(T_2\):
\(AB=^?C\)
- Claim: Random sampling will catch \(A\lr{B\vec r}\neq C\vec r\) w.p. \(\geq\nf12\) (and 0 error in the other direction).
- Proof: If \(AB\neq C\), then \(D\dq AB-C\) has some nonzero entry \(D_{ij}\). Reveal the random bits of \(\vec r\) with \(r_j\) last: for \((D\vec r)_i=0\), the term \(D_{ij}r_j\) must exactly cancel the sum of the other terms in row \(i\), which happens w.p. at most \(\nf12\) (principle of deferred decisions).
- It follows that the tight bound on the detection probability (in the mod-2 case) is \(1-2^{-k}\), where \(k\) is the rank of \(AB-C\): \(D\vec r=0\) exactly when \(\vec r\) lies in the null space of \(D\), which contains \(2^{n-k}\) of the \(2^n\) possible vectors. (A code sketch follows.)
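A sketch of the check over GF(2) (my own code; matrices as 0/1 lists-of-lists):

```python
import random

def freivalds_mod2(A, B, C, trials=20):
    """Check A @ B == C over GF(2) in O(n^2) per trial; if AB != C,
    each trial passes w.p. <= 1/2, so overall error <= 2**-trials."""
    n = len(A)

    def matvec(M, v):
        # mod-2 matrix-vector product, O(n^2)
        return [sum(M[i][j] & v[j] for j in range(n)) % 2 for i in range(n)]

    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False  # caught a disagreement: definitely AB != C
    return True
```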
Min-Cut
Reminder: Conditional probability: \(\Prr YX\dq\f{\Pr(Y\cap X)}{\Pr(X)}\), requiring \(\Pr(X)>0\).
- Problem: Given \(G=(V,E)\) and \(u,v\in V\), return a partition \(V=S\sqcup T\) with \(u\in S\) and \(v\in T\), minimizing the number (in the unweighted case) of edges crossing between \(S\) and \(T\).
- Algorithm: (Contraction algorithm)
- Pick a random edge (never one that would merge \(u\) with \(v\)); "contract" its endpoints into one vertex -- i.e. commit them to the same side of the cut. Repeat until only two vertices remain, and output the corresponding cut (see the sketch after this list).
- Claim: \(\Pr(\text{got min cut})\geq\f2{n(n-1)}\) -- so boost by running many times and taking min.
- So after \(k\) runs, \(\Pr(\text{fail})\leq\lr{1-\f2{n(n-1)}}^k\leq e^{-\f{2k}{n(n-1)}}\); roughly \(n^2\ln n\) trials drive this below \(n^{-2}\).
- Proof: Let \(E_i\) be the event that the \(i\)th edge contracted was OK to contract -- i.e. not crossing the min cut. Let \(F_i\) be cumulative success: \[F_i\dq\IN_{j=1}^iE_j\]
- Then \(\Pr(\text{succeed})=\prod_{i=1}^{n-2}\Prr{E_i}{F_{i-1}}\).
- Claim: Min cut size \(=k\im\abs E\geq\nf{kn}2\) (every vertex has degree \(\geq k\), else its singleton cut would be smaller).
- So \(\Pr\lr{E_1}=\Prr{E_1}{F_0}\geq1-\f k{kn/2}=1-2/n=\f{n-2}n\).
- Then, since at least \(k\) bad edges remain, \(\Prr{E_2}{F_1}\geq\f{n-3}{n-1}\) -- and the product telescopes: \(\f{n-2}n\cdot\f{n-3}{n-1}\cdots\f24\cdot\f13=\f2{n(n-1)}\).
- Optimization: Since the risk of failure is much larger at the end, you should spend more work re-trying choices there.
- Result: This algorithm has been generalized and derandomized.
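A sketch of one contraction run (my own code, for the global min-cut version without the \(u,v\) constraint; it keeps parallel edges, as the analysis requires, and uses union-find to track merged vertices):

```python
import random

def contract_once(n, edges):
    """One run of the contraction algorithm on vertices 0..n-1; edges is
    a list of (u, v) pairs, parallel edges included. Returns a cut size."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:  # skipping self-loops = uniform over surviving edges
            parent[ru] = rv  # contract: u's side merges into v's
            remaining -= 1
    # cut size = edges whose endpoints ended up in different supervertices
    return sum(1 for u, v in edges if find(u) != find(v))
```

Running it \(\Theta\lr{n^2\ln n}\) times and keeping the smallest cut gives the boosted guarantee above.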
Expectation
- \(E:(\Oa\to\R)\to\R\), defined \(EX\dq\sum_ii\Pr(X=i)\).
- Linearity: \(E\lr{\sum_iX_i}=\sum_iEX_i\) and \(E(cX_i)=cE(X_i)\)
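A standard illustration of the power of linearity (my addition): let \(\pi\) be a uniformly random permutation of \(\s{1,\hh,n}\) and \(X_i\dq1\) if \(\pi(i)=i\), else \(0\). The \(X_i\) are far from independent, yet the expected number of fixed points is \[E\lr{\sum_{i=1}^nX_i}=\sum_{i=1}^nEX_i=\sum_{i=1}^n\f1n=1.\]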
\(T_4\):
Chernoff (-Hoeffding) Bounds
Extend the idea of Chebyshev to the \(n\)th moment (for even \(n\), so the quantity is nonnegative):
\[\Pr\lr{(X-\mu)^n\geq a^n}\leq\f{E(X-\mu)^n}{a^n}\]
Useful w/ MGFs: \(\Pr(X\geq a)\e\Pr\lr{e^{tX}\geq e^{ta}}\leq\f{Ee^{tX}}{e^{ta}}\) for \(t>0\).
Moment-generating function: \(Ee^{tX}\); the \(n\)th derivative at \(t=0\) is the \(n\)th moment of \(X\).
Also, for independent \(X,Y\): \(Ee^{t(X+Y)}=E\lr{e^{tX}e^{tY}}=Ee^{tX}\cdot Ee^{tY}\)
Chernoff Example
\(X_i\dq\) coin flip 1 w/ prob \(p_i\), 0 else. \(X\dq\sum X_i\); \(\mu\dq EX\).
"Multiplicative" bound:
\[\Pr\lr{X\geq(a+\da)EX}\leq\lr{\f{e^\da}{(1+\da)^{1+\da}}}\]
\[\Pr(X\geq2EX)\leq(\nf e4)^n\]
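An empirical sanity check of the multiplicative bound (my own code; parameters are arbitrary):

```python
import math
import random

# Compare the observed tail P(X >= (1+delta)*mu) for X = sum of n fair
# coin flips (mu = n/2) against the multiplicative Chernoff bound.
n, delta, trials = 100, 0.2, 100_000
mu = n / 2
rng = random.Random(1)
hits = sum(sum(rng.random() < 0.5 for _ in range(n)) >= (1 + delta) * mu
           for _ in range(trials))
bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
print(f"empirical {hits / trials:.4f} <= bound {bound:.4f}")  # ~0.028 <= ~0.39
```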
(rest of class ignored; was doing pset)
\(T_5\): Bloom Filters, Coupon Collector Revisited -- Probabilistic Method
Bloom Filters
- Set membership queries -- Is \(x\in S\)? How little space do we need?
- Trading off time/space vs. accuracy
- Basic idea: for each \(x\in S\), for each \(i\in\s{1,\hh,k}\), set \(B[h_i(x)]\).
- Then report \(x\in S\mim\fa i,B[h_i(x)]=1\) -- one-sided (false-positive) error (code sketch after this list)
- Alternative idea: Use perfect hash function \(h:S\bij[0,n-1]\), store (orthogonal) hash fingerprints (\(b\) bits each) -- then false-positive w.p. \(2^{-b}\)
- Finding/representing/computing \(h\) is potentially nontrivial
- BF false-pos w.p. \(\lr{1-e^{-nk/m}}^k\)
- optimal: \(k=\f mn\ln2\)
- Chernoff bounds concentration: real-life false-pos rate \(=(\text{frac. nonempty bins})^k=\lr{1-\f{\#\text{empty bins}}m}^k\)
- But the Poisson model \(\approx\) balls+bins model (\(\leq2e\sqrt m\) times more likely)
- Thm: The f.p.p. of a BF is \(\lr{1-e^{-nk/m}}^k+o(1)\)
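A minimal Bloom filter sketch (my own code; simulating the \(k\) hash functions by salting SHA-256 is an illustrative choice, not from lecture):

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, x):
        # k "independent" hash values in [0, m), via salted SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = 1

    def __contains__(self, x):
        # can err only by returning True for x not in S (false positive)
        return all(self.bits[p] for p in self._positions(x))
```

With \(m=10n\) bits the optimum is \(k=\f mn\ln2\approx7\), giving a false-positive rate of roughly \(\lr{1-e^{-nk/m}}^k\approx0.008\).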
Coupon Collectors
The probability the collector needs more than \(n\ln n+cn\) coupons converges:
\[\lim_{n\to\infty}\Pr\lr{\text{collector needs}>n\ln n+cn\text{ coupons}}=1-e^{-e^{-c}}\]
Equivalently: we throw \(n\ln n+cn\) balls into \(n\) bins -- what's \(\Pr(\ex\text{ empty bin})\)?
- Poisson approx: each bin's load is \(\approx\) Poisson with mean \(\mu=\f{n\ln n+cn}n=\ln n+c\)
- so the \(i\)th bin is empty w.p. \(e^{-\mu}=e^{-(\ln n+c)}=\f{e^{-c}}n\); with the bins independent in the Poisson model, \(\Pr(\text{no empty bin})\approx\lr{1-\f{e^{-c}}n}^n\to e^{-e^{-c}}\)
Proof: The Poisson model throws \(n\ln n+cn+O\lr{\sqrt{n\log n}}\) balls w.p. \(1-o(1)\), and the \(O\lr{\sqrt{n\log n}}\) difference in ball count changes the empty-bin probability by only \(o(1)\).
Any time you get a balls-and-bins problem, first ask: What does the Poisson approximation tell me?
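A quick simulation of the sharp threshold (my own code; parameters arbitrary):

```python
import math
import random

def empty_bin_prob(n, c, trials=1000, seed=2):
    """Fraction of trials in which some bin stays empty after throwing
    n*ln(n) + c*n balls into n bins uniformly at random."""
    rng = random.Random(seed)
    balls = int(n * math.log(n) + c * n)
    misses = 0
    for _ in range(trials):
        seen = [False] * n
        for _ in range(balls):
            seen[rng.randrange(n)] = True
        misses += not all(seen)
    return misses / trials

c = 0.5
print(empty_bin_prob(1000, c), 1 - math.exp(-math.exp(-c)))  # both ~0.45
```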
\(R_5\): Probabilistic Method
Ramsey Theory
Given: the complete graph \(K_n\) on \(n\) vertices, with each edge colored one of two colors. Does there exist a coloring with no monochromatic \(K_k\) subgraph?
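The standard probabilistic-method answer (filling in what I didn't transcribe): color each edge of \(K_n\) independently and uniformly. A fixed set of \(k\) vertices spans a monochromatic \(K_k\) w.p. \(2^{1-\binom k2}\), so by a union bound \[\Pr\lr{\ex\text{ mono }K_k}\leq\binom nk2^{1-\binom k2},\] and whenever this is \(<1\) (e.g. for \(n\leq2^{k/2}\), \(k\geq3\)), some two-coloring of \(K_n\) has no monochromatic \(K_k\).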
\(R_7\): Entropy
Lemma: \(\f{2^{nH(q)}}{n+1}\leq\binom n{nq}\leq2^{nH(q)}\), where \(H(q)\dq-q\log_2q-(1-q)\log_2(1-q)\) is the binary entropy (and \(nq\) is an integer).
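A numeric spot-check of the lemma (my own code; \(n,q\) arbitrary with \(nq\) an integer):

```python
import math

def H(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n, q = 100, 0.3
binom = math.comb(n, int(n * q))
print(2 ** (n * H(q)) / (n + 1) <= binom <= 2 ** (n * H(q)))  # True
```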