CS 223 (Randomized Algorithms)
\def\al{\alpha}
\def\Al{\Alpha}
\def\bt{\beta}
\def\Bt{\Beta}
\def\da{\delta}
\def\Da{\Delta}
\def\ep{\epsilon}
\def\Ep{\Epsilon}
\def\gm{\gamma}
\def\Gm{\Gamma}
\def\la{\lambda}
\def\La{\Lambda}
\def\sg{\sigma}
\def\Sg{\Sigma}
\def\te{\theta}
\def\Te{\Theta}
\def\oa{\omega}
\def\Oa{\Omega}
\def\-#1{^{-#1}}
\def\2{\s{0,1}}
\def\abs#1{\left|#1\right|}
\DeclareMathOperator*{\argmin}{arg\,min}
\def\aryc#1{\lr{\begin{array}{c}#1\end{array}}}
\def\aryd#1{\lr{\begin{array}{cc}#1\end{array}}}
\def\arye#1{\lr{\begin{array}{ccc}#1\end{array}}}
\def\aryf#1{\lr{\begin{array}{cccc}#1\end{array}}}
\def\aryg#1{\lr{\begin{array}{ccccc}#1\end{array}}}
\def\blr#1{\big(#1\big)}
\def\co#1{\cx{co-#1}}
\def\cx#1{{\sf #1}}
\def\dd{\ddots}
\def\del{\partial}
\def\dq{:=}
\def\ds#1{\displaystyle{#1}}
\def\e{\equiv}
\def\ex{\exists}
\def\Exp#1{\exp\!\!\left[#1\right]}
\def\f{\frac}
\def\fa{\forall}
\def\grad{\nabla}
\def\hh{\ldots}
\def\im{\Longrightarrow}
\def\IN{\bigcap}
\def\ixl#1{\xleftarrow[#1]{}}
\def\ixr#1{\xrightarrow[#1]{}}
\def\lgr#1#2{\left(#1\;\middle|\;#2\right)}
\def\lr#1{\left(#1\right)}
\def\mi{\Longleftarrow}
\def\mim{\Longleftrightarrow}
\def\n#1{\left|#1\right|}
\def\nf#1#2{{}^{#1}/_{#2}}
\def\nin{\notin}
\def\otr{\overset R\leftarrow}
\def\ox{\otimes}
\def\pcwz#1{\left\{\begin{array}{ll}#1\end{array}\right.}
\def\pelse{&\text{else}}
\def\pif{&\text{if }}
\def\poly{\text{poly}}
\def\pr#1{\cx{pr#1}}
\def\Pr{\text{Pr}}
\def\Prr#1#2{\Pr\left(#1\;\middle|\;#2\right)}
\def\px#1{[\text{#1}]}
\def\qd{=:}
\def\s#1{\left\{#1\right\}}
\def\sbn{\subsetneq}
\def\sbq{\subseteq}
\def\sbs{\subset}
\def\spn{\supsetneq}
\def\spq{\supseteq}
\def\sps{\supset}
\def\st#1#2{\left\{#1\;\middle|\;#2\right\}}
\def\tis#1{\text{ is #1}}
\def\tr{^{\sf t}}
\def\un{\cup}
\def\UN{\bigcup}
\def\vtr{^{\;\sf t}}
\def\x{\times}
\def\xl#1{\xleftarrow{#1}}
\def\xr#1{\xrightarrow{#1}}
\def\xxl#1#2{\xleftarrow[#2]{#1}}
\def\xxr#1#2{\xrightarrow[#2]{#1}}
These are my lecture notes from CS 223: Randomized Algorithms (Piazza), taught by Michael Mitzenmacher at Harvard. I am an enrolled student, and these notes are in no way official; they're reproduced by me from Prof. Mitzenmacher's lectures in Spring 2015, and are licensed Creative Commons Attribution-NonCommercial-ShareAlike 4.0, which supersedes any other site-wide licenses.
\(R_{01}\): Introductions, Some Probability Theory
Two Problems
Problem 1: Dice
Alice suggests the following game, using three fair but nonstandard dice:
- A: 1,1,6,6,8,8
- B: 2,2,4,4,9,9
- C: 3,3,5,5,7,7
Bob will choose a die, then Alice will choose one of the remaining dice. Both will roll; the higher roll wins. What should Bob choose?
Quote: "Well, the meta-level reason that you don't want to play this game [as Bob] is that, if a professor of probability proposes a dice game to you, you probably don't want to play."
Solution: With probability 5/9 each, A beats C, C beats B, and B beats A. Whichever die Bob picks, Alice can pick the die that beats it, so Bob is 5:4 to lose. Read more elsewhere at Wiki | Nontransitive dice.
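A quick brute-force check of those 5/9 figures (my own sketch, not from lecture; die names match the list above):

```python
from itertools import product

# Faces of the three nonstandard dice from the lecture.
dice = {"A": [1, 1, 6, 6, 8, 8],
        "B": [2, 2, 4, 4, 9, 9],
        "C": [3, 3, 5, 5, 7, 7]}

def win_prob(x, y):
    """Probability that die x rolls strictly higher than die y,
    over all 36 equally likely face pairs."""
    return sum(a > b for a, b in product(dice[x], dice[y])) / 36

for x, y in [("A", "C"), ("C", "B"), ("B", "A")]:
    print(f"P({x} beats {y}) = {win_prob(x, y):.4f}")  # each prints 0.5556 = 5/9
```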
High-Order Lesson: You have to be very careful with probability; many notions, like "strength of dice", are unintuitive (e.g. in being nontransitive).
Problem 2: Coins
Bob suggests a different game: Alice will choose a sequence of three outcomes (\(\sf H\) or \(\sf T\)), then Bob will choose a different such sequence. They'll toss a fair coin until one of their sequences appears as a run of consecutive tosses. What should Alice choose?
Solution: Consider the situation if Alice picks \(\sf HHH\); then Bob can pick \(\sf THH\). Unless the first three tosses are all heads (probability \(\nf18\)), some \(\sf T\) precedes the first \(\sf HHH\), so Bob's \(\sf THH\) must complete first; hence Bob is 7:1 to win.
We can extrapolate the full strategy, involving a similar prefixing trick, whereby Bob prepends a single letter to the first two letters of Alice's sequence -- specifically, the opposite of Alice's second letter. Then \(\sf HTH\) is an optimal choice for Alice, \(\sf HHT\) is Bob's optimal response, leaving Bob 2:1 to win. Read more elsewhere at Wiki | Penney's game.
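A Monte Carlo sanity check of the \(\sf HTH\)-vs-\(\sf HHT\) matchup (my own sketch; function names are mine):

```python
import random

def penney_winner(alice, bob, rng):
    """Toss a fair coin until one player's length-3 pattern appears."""
    history = ""
    while True:
        history += rng.choice("HT")
        if history.endswith(alice):
            return "alice"
        if history.endswith(bob):
            return "bob"

rng = random.Random(0)
games = 100_000
bob_wins = sum(penney_winner("HTH", "HHT", rng) == "bob" for _ in range(games))
print(bob_wins / games)  # ~0.667, i.e. Bob is 2:1 to win
```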
Discrete Probability Spaces
- Set-theoretically, a probability space consists of:
- a sample space \(\Oa\) (a set of all possible outcomes)
- an event space \(\F\sbq2^\Oa\) (a collection of subsets of \(\Oa\))
- a function \(\Pr:\F\to[0,1]\).
- Axioms:
- \(\Pr(\Oa)=1\).
- \(\fa x\in\F,0\leq\Pr(x)\leq1\).
- \(\fa x,y\in\F,\Pr(x\un y)=\Pr(x)+\Pr(y)-\Pr(x\cap y)\)
Polynomial Identity Testing
- Problem: \(\ds{f(x)\dq\prod_{i=1}^n(x-a_i)}=^?\sum_{i=0}^nc_ix^i\qd g(x)\).
- Deterministic Algorithm: Expand the LHS, compare symbolically.
- Problem: It might take a long time (exponential blowup).
- Randomized Algorithm: Plug in the same (randomly chosen) \(x'\) on both sides:
- If \(\text{LHS}\lr{x'}=\text{RHS}\lr{x'}\), declare equal.
- If \(\text{LHS}\lr{x'}\neq\text{RHS}\lr{x'}\), declare unequal.
- However, there's a chance of mistakenly declaring them equal if you happen to pick a bad \(x'\).
- More precisely, you'll fail if you picked a root of \(f-g\).
- But since we know that there are at most \(n\) such roots (in the case \(f\neq g\)), we can limit the probability of failure to 1% if we choose from a pool of \(100n\) distinct numbers.
- Gets even better (\(0.01^k\)) if we choose repeatedly; gets even better without replacement; in the limit, we can obtain a (nonrandomized, exact) \(O\lr{n^2}\) solution by simply choosing \(k=n+1\).
- But here, we're interested in a ~linear-time solution, so let's settle for the randomized test (sketched below).
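A minimal sketch of that randomized test (my own code; assumes integer roots and coefficients so the arithmetic is exact; one trial costs \(O(n)\) arithmetic operations):

```python
import random

def probably_equal(roots, coeffs, trials=1):
    """Test prod_i (x - roots[i]) == sum_i coeffs[i] * x**i by evaluating
    both sides at random points from a pool of 100n values; if the
    polynomials differ, each trial wrongly passes w.p. <= 1/100."""
    n = len(roots)
    for _ in range(trials):
        x = random.randint(1, 100 * n)
        lhs = 1
        for a in roots:
            lhs *= x - a
        rhs = 0
        for c in reversed(coeffs):  # Horner's rule: O(n) per evaluation
            rhs = rhs * x + c
        if lhs != rhs:
            return False  # a disagreement at one point is definitive
    return True  # agreed at every sample; error prob <= 0.01 ** trials
```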
More Probability Theory
- Events \(E,F\) are independent iff \(\Pr(E\cap F)=\Pr(E)\cdot\Pr(F)\).
- Events \(E_1,\hh,E_k\) are mutually independent iff \(\fa\mathscr E\sbq\s{E_1,\hh,E_k},\Pr\lr{\IN_{E\in\mathscr E}E}=\prod_{E\in\mathscr E}\Pr(E)\).
Caution: Mutual independence is not the same thing as pairwise independence; consider two independent coin flips together with their XOR, spelled out below.
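In detail (standard example, filling in the hint): let \(E_1=\s{\text{flip 1 is H}}\), \(E_2=\s{\text{flip 2 is H}}\), \(E_3=\s{\text{the two flips differ}}\). Any two are independent -- e.g. \(\Pr(E_1\cap E_3)=\Pr(\sf HT)=\nf14=\Pr(E_1)\Pr(E_3)\) -- but \(\Pr(E_1\cap E_2\cap E_3)=0\neq\nf18=\Pr(E_1)\Pr(E_2)\Pr(E_3)\).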
Another Problem: MatMult Checking (mod 2)
- Problem: \(AB=^?C\).
- Deterministic Algorithm: Do the computation yourself.
- Problem: it's slow (\(O\lr{n^3}\), or slightly faster).
- Randomized Algorithm: Use matrix-vector calculations.
- Pick \(\vec r\) randomly. (Flip a coin for each coordinate: \(2^n\) possible outcomes, all equally likely.)
- Check \(A(B\vec r)=^?C\vec r\). (Takes \(O\lr{n^2}\).)
- Ask: If \(AB\neq C\), what is \(\Pr_{\vec r}(AB\vec r=C\vec r)\)?
\(T_2\):
\(AB=^?C\)
- Claim: Random sampling will catch \(A\lr{B\vec r}\neq C\vec r\) w.p. \(\geq\nf12\) (and 0 error in the other direction).
- Proof: If \(AB\neq C\), then \(D\dq AB-C\) has some nonzero entry \(D_{ij}\). Reveal the random bits of \(\vec r\) with \(r_j\) last: for \((D\vec r)_i=0\), the term \(D_{ij}r_j\) must exactly cancel the sum of the other terms in row \(i\), which happens w.p. at most \(\nf12\) (principle of deferred decisions).
- It follows that the tight bound on the detection probability (in the mod-2 case) is \(1-2^{-k}\), where \(k\) is the rank of \(AB-C\): \(D\vec r=0\) exactly when \(\vec r\) lies in the null space of \(D\), which contains \(2^{n-k}\) of the \(2^n\) possible vectors. (A code sketch follows.)
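A sketch of the check over GF(2) (my own code; matrices as 0/1 lists-of-lists):

```python
import random

def freivalds_mod2(A, B, C, trials=20):
    """Check A @ B == C over GF(2) in O(n^2) per trial; if AB != C,
    each trial passes w.p. <= 1/2, so overall error <= 2**-trials."""
    n = len(A)

    def matvec(M, v):
        # mod-2 matrix-vector product, O(n^2)
        return [sum(M[i][j] & v[j] for j in range(n)) % 2 for i in range(n)]

    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False  # caught a disagreement: definitely AB != C
    return True
```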
Min-Cut
Reminder: Conditional probability: \(\Prr YX\dq\f{\Pr(Y\cap X)}{\Pr(X)}\), requiring \(\Pr(X)>0\).
- Problem: Given \(G=(V,E)\) and \(u,v\in V\), return a partition \(V=S\sqcup T\) with \(u\in S\) and \(v\in T\), minimizing the number (in the unweighted case) of edges crossing between \(S\) and \(T\).
- Algorithm: (Contraction algorithm)
- Pick a random edge (never one that would merge \(u\) with \(v\)); "contract" its endpoints into one vertex -- i.e. commit them to the same side of the cut. Repeat until only two vertices remain, and output the corresponding cut (see the sketch after this list).
- Claim: \(\Pr(\text{got min cut})\geq\f2{n(n-1)}\) -- so boost by running many times and taking min.
- So after \(k\) runs, \(\Pr(\text{fail})\leq\lr{1-\f2{n(n-1)}}^k\leq e^{-\f{2k}{n(n-1)}}\); roughly \(n^2\ln n\) trials drive this below \(n^{-2}\).
- Proof: Let \(E_i\) be the event that the \(i\)th edge contracted was OK to contract -- i.e. not crossing the min cut. Let \(F_i\) be cumulative success: \[F_i\dq\IN_{j=1}^iE_j\]
- Then \(\Pr(\text{succeed})=\prod_{i=1}^{n-2}\Prr{E_i}{F_{i-1}}\).
- Claim: Min cut size \(=k\im\abs E\geq\nf{kn}2\) (every vertex has degree \(\geq k\), else its singleton cut would be smaller).
- So \(\Pr\lr{E_1}=\Prr{E_1}{F_0}\geq1-\f k{kn/2}=1-2/n=\f{n-2}n\).
- Then, since at least \(k\) bad edges remain, \(\Prr{E_2}{F_1}\geq\f{n-3}{n-1}\) -- and the product telescopes: \(\f{n-2}n\cdot\f{n-3}{n-1}\cdots\f24\cdot\f13=\f2{n(n-1)}\).
- Optimization: Since the risk of failure is much larger at the end, you should spend more work re-trying choices there.
- Result: This algorithm has been generalized and derandomized.
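A sketch of one contraction run (my own code, for the global min-cut version without the \(u,v\) constraint; it keeps parallel edges, as the analysis requires, and uses union-find to track merged vertices):

```python
import random

def contract_once(n, edges):
    """One run of the contraction algorithm on vertices 0..n-1; edges is
    a list of (u, v) pairs, parallel edges included. Returns a cut size."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:  # skipping self-loops = uniform over surviving edges
            parent[ru] = rv  # contract: u's side merges into v's
            remaining -= 1
    # cut size = edges whose endpoints ended up in different supervertices
    return sum(1 for u, v in edges if find(u) != find(v))
```

Running it \(\Theta\lr{n^2\ln n}\) times and keeping the smallest cut gives the boosted guarantee above.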
Expectation
- \(E:(\Oa\to\R)\to\R\), defined \(EX\dq\sum_ii\Pr(X=i)\).
- Linearity: \(E\lr{\sum_iX_i}=\sum_iEX_i\) and \(E(cX_i)=cE(X_i)\)
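A standard illustration of the power of linearity (my addition): let \(\pi\) be a uniformly random permutation of \(\s{1,\hh,n}\) and \(X_i\dq1\) if \(\pi(i)=i\), else \(0\). The \(X_i\) are far from independent, yet the expected number of fixed points is \[E\lr{\sum_{i=1}^nX_i}=\sum_{i=1}^nEX_i=\sum_{i=1}^n\f1n=1.\]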
\(T_4\):
Chernoff (-Hoeffding) Bounds
Extend the idea of Chebyshev to the \(n\)th moment (for even \(n\), so the quantity is nonnegative):
\[\Pr\lr{(X-\mu)^n\geq a^n}\leq\f{E(X-\mu)^n}{a^n}\]
Useful w/ MGFs: \(\Pr(X\geq a)\e\Pr\lr{e^{tX}\geq e^{ta}}\leq\f{Ee^{tX}}{e^{ta}}\) for \(t>0\).
Moment-generating function: \(Ee^{tX}\); the \(n\)th derivative at \(t=0\) is the \(n\)th moment of \(X\).
Also, for independent \(X,Y\): \(Ee^{t(X+Y)}=E\lr{e^{tX}e^{tY}}=Ee^{tX}\cdot Ee^{tY}\)
Chernoff Example
\(X_i\dq\) coin flip 1 w/ prob \(p_i\), 0 else. \(X\dq\sum X_i\); \(\mu\dq EX\).
"Multiplicative" bound:
\[\Pr\lr{X\geq(a+\da)EX}\leq\lr{\f{e^\da}{(1+\da)^{1+\da}}}\]
\[\Pr(X\geq2EX)\leq(\nf e4)^n\]
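An empirical sanity check of the multiplicative bound (my own code; parameters are arbitrary):

```python
import math
import random

# Compare the observed tail P(X >= (1+delta)*mu) for X = sum of n fair
# coin flips (mu = n/2) against the multiplicative Chernoff bound.
n, delta, trials = 100, 0.2, 100_000
mu = n / 2
rng = random.Random(1)
hits = sum(sum(rng.random() < 0.5 for _ in range(n)) >= (1 + delta) * mu
           for _ in range(trials))
bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
print(f"empirical {hits / trials:.4f} <= bound {bound:.4f}")  # ~0.028 <= ~0.39
```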
(rest of class ignored; was doing pset)
\(T_5\): Bloom Filters, Coupon Collector Revisited -- Probabilistic Method
Bloom Filters
- Set membership queries -- Is \(x\in S\)? How little space do we need?
- Trading off time/space vs. accuracy
- Basic idea: for each \(x\in S\), for each \(i\in\s{1,\hh,k}\), set \(B[h_i(x)]\).
- Then report \(x\in S\mim\fa i,B[h_i(x)]=1\) -- one-sided (false-positive) error (code sketch after this list)
- Alternative idea: Use perfect hash function \(h:S\bij[0,n-1]\), store (orthogonal) hash fingerprints (\(b\) bits each) -- then false-positive w.p. \(2^{-b}\)
- Finding/representing/computing \(h\) is potentially nontrivial
- BF false-pos w.p. \(\lr{1-e^{-nk/m}}^k\)
- optimal: \(k=\f mn\ln2\)
- Chernoff bounds concentration: real-life false-pos rate \(=(\text{frac. nonempty bins})^k=\lr{1-\f{\#\text{empty bins}}m}^k\)
- But the Poisson model \(\approx\) balls+bins model (\(\leq2e\sqrt m\) times more likely)
- Thm: The f.p.p. of a BF is \(\lr{1-e^{-nk/m}}^k+o(1)\)
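A minimal Bloom filter sketch (my own code; simulating the \(k\) hash functions by salting SHA-256 is an illustrative choice, not from lecture):

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, x):
        # k "independent" hash values in [0, m), via salted SHA-256
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{x}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = 1

    def __contains__(self, x):
        # can err only by returning True for x not in S (false positive)
        return all(self.bits[p] for p in self._positions(x))
```

With \(m=10n\) bits the optimum is \(k=\f mn\ln2\approx7\), giving a false-positive rate of roughly \(\lr{1-e^{-nk/m}}^k\approx0.008\).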
Coupon Collectors
The probability the collector needs more than \(n\ln n+cn\) coupons converges:
\[\lim_{n\to\infty}\Pr\lr{\text{collector needs}>n\ln n+cn\text{ coupons}}=1-e^{-e^{-c}}\]
Equivalently: we throw \(n\ln n+cn\) balls into \(n\) bins -- what's \(\Pr(\ex\text{ empty bin})\)?
- Poisson approx: each bin's load is \(\approx\) Poisson with mean \(\mu=\f{n\ln n+cn}n=\ln n+c\)
- so the \(i\)th bin is empty w.p. \(e^{-\mu}=e^{-(\ln n+c)}=\f{e^{-c}}n\); with the bins independent in the Poisson model, \(\Pr(\text{no empty bin})\approx\lr{1-\f{e^{-c}}n}^n\to e^{-e^{-c}}\)
Proof: The Poisson model throws \(n\ln n+cn+O\lr{\sqrt{n\log n}}\) balls w.p. \(1-o(1)\), and the \(O\lr{\sqrt{n\log n}}\) difference in ball count changes the empty-bin probability by only \(o(1)\).
Any time you get a balls-and-bins problem, first ask: What does the Poisson approximation tell me?
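A quick simulation of the sharp threshold (my own code; parameters arbitrary):

```python
import math
import random

def empty_bin_prob(n, c, trials=1000, seed=2):
    """Fraction of trials in which some bin stays empty after throwing
    n*ln(n) + c*n balls into n bins uniformly at random."""
    rng = random.Random(seed)
    balls = int(n * math.log(n) + c * n)
    misses = 0
    for _ in range(trials):
        seen = [False] * n
        for _ in range(balls):
            seen[rng.randrange(n)] = True
        misses += not all(seen)
    return misses / trials

c = 0.5
print(empty_bin_prob(1000, c), 1 - math.exp(-math.exp(-c)))  # both ~0.45
```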
\(R_5\): Probabilistic Method
Ramsey Theory
Given: the complete graph \(K_n\) on \(n\) vertices, with each edge colored one of two colors. Does there exist a coloring with no monochromatic \(K_k\) subgraph?
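The standard probabilistic-method answer (filling in what I didn't transcribe): color each edge of \(K_n\) independently and uniformly. A fixed set of \(k\) vertices spans a monochromatic \(K_k\) w.p. \(2^{1-\binom k2}\), so by a union bound \[\Pr\lr{\ex\text{ mono }K_k}\leq\binom nk2^{1-\binom k2},\] and whenever this is \(<1\) (e.g. for \(n\leq2^{k/2}\), \(k\geq3\)), some two-coloring of \(K_n\) has no monochromatic \(K_k\).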
\(R_7\): Entropy
Lemma: \(\f{2^{nH(q)}}{n+1}\leq\binom n{nq}\leq2^{nH(q)}\), where \(H(q)\dq-q\log_2q-(1-q)\log_2(1-q)\) is the binary entropy (and \(nq\) is an integer).
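A numeric spot-check of the lemma (my own code; \(n,q\) arbitrary with \(nq\) an integer):

```python
import math

def H(q):
    """Binary entropy in bits."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

n, q = 100, 0.3
binom = math.comb(n, int(n * q))
print(2 ** (n * H(q)) / (n + 1) <= binom <= 2 ** (n * H(q)))  # True
```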