# CS 223 (Randomized Algorithms)

\def\al{\alpha}

\def\Al{A}

\def\bt{\beta}

\def\Bt{B}

\def\da{\delta}

\def\Da{\Delta}

\def\ep{\epsilon}

\def\Ep{E}

\def\gm{\gamma}

\def\Gm{\Gamma}

\def\la{\lambda}

\def\La{\Lambda}

\def\sg{\sigma}

\def\Sg{\Sigma}

\def\te{\theta}

\def\Te{\Theta}

\def\oa{\omega}

\def\Oa{\Omega}

\def\-#1{^{-#1}}

\def\2{\s{0,1}}

\def\abs#1{\left|#1\right|}

\DeclareMathOperator*{\argmin}{arg\,min}

\def\aryc#1{\lr{\begin{array}{c}#1\end{array}}}

\def\aryd#1{\lr{\begin{array}{cc}#1\end{array}}}

\def\arye#1{\lr{\begin{array}{ccc}#1\end{array}}}

\def\aryf#1{\lr{\begin{array}{cccc}#1\end{array}}}

\def\aryg#1{\lr{\begin{array}{ccccc}#1\end{array}}}

\def\blr#1{\big(#1\big)}

\def\co#1{\cx{co-#1}}

\def\cx#1{{\sf #1}}

\def\dd{\ddots}

\def\del{\partial}

\def\dq{:=}

\def\ds#1{\displaystyle{#1}}

\def\e{\equiv}

\def\ex{\exists}

\def\Exp#1{\exp\!\left[#1\right]}

\def\f{\frac}

\def\fa{\forall}

\def\grad{\nabla}

\def\hh{\ldots}

\def\im{\Longrightarrow}

\def\IN{\bigcap}

\def\ixl#1{\xleftarrow[#1]{}}

\def\ixr#1{\xrightarrow[#1]{}}

\def\lgr#1#2{\left(#1\;\middle|\;#2\right)}

\def\lr#1{\left(#1\right)}

\def\mi{\Longleftarrow}

\def\mim{\Longleftrightarrow}

\def\n#1{\left|#1\right|}

\def\nf#1#2{{}^{#1}/_{#2}}

\def\nin{\notin}

\def\otr{\overset R\leftarrow}

\def\ox{\otimes}

\def\pcwz#1{\left\{\begin{array}{ll}#1\end{array}\right.}

\def\pelse{&\text{else}}

\def\pif{&\text{if }}

\def\poly{\text{poly}}

\def\pr#1{\cx{pr#1}}

\def\Pr{\text{Pr}}

\def\Prr#1#2{\Pr\left(#1\;\middle|\;#2\right)}

\def\px#1{[\text{#1}]}

\def\qd{=:}

\def\s#1{\left\{#1\right\}}

\def\sbn{\subsetneq}

\def\sbq{\subseteq}

\def\sbs{\subset}

\def\spn{\supsetneq}

\def\spq{\supseteq}

\def\sps{\supset}

\def\st#1#2{\left\{#1\;\middle|\;#2\right\}}

\def\tis#1{\text{ is #1}}

\def\tr{^{\sf t}}

\def\un{\cup}

\def\UN{\bigcup}

\def\vtr{^{\;\sf t}}

\def\x{\times}

\def\xl#1{\xleftarrow{#1}}

\def\xr#1{\xrightarrow{#1}}

\def\xxl#1#2{\xleftarrow[#2]{#1}}

\def\xxr#1#2{\xrightarrow[#2]{#1}}


These are my lecture notes from CS 223: Randomized Algorithms (Piazza), taught by Michael Mitzenmacher at Harvard. I am an enrolled student, and these notes are in no way official; they're reproduced by me from Prof. Mitzenmacher's lectures in Spring 2015, and are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0, which supersedes any other site-wide licenses.

# \(R_{01}\): Introductions, Some Probability Theory

## Two Problems

### Problem 1: Dice

Alice suggests the following **Game:** Given three fair but nonstandard dice:

- A: 1,1,6,6,8,8
- B: 2,2,4,4,9,9
- C: 3,3,5,5,7,7

Bob will choose a die, then Alice will choose another die. Both will roll; the highest roller wins. What should Bob choose?

Quote: "Well, the meta-level reason that you don't want to play this game [as Bob] is that, if a professor of probability proposes a dice game to you, you probably don't want to play."

**Solution:** With 5/9 probability each, A beats C beats B beats A. So Bob's choice doesn't matter: whichever die he picks, Alice picks the one that beats it, and Bob is 5:4 to lose. Read more elsewhere at Wiki | Nontransitive dice.

High-Order Lesson: You have to be very careful with probability; lots of notions, like "strength of dice," are unintuitive (e.g. in being nontransitive).
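
The 5/9 cycle can be verified by exact enumeration of the 36 outcomes per pair of dice:

```python
from itertools import product

# The three nonstandard dice from the problem.
dice = {
    "A": [1, 1, 6, 6, 8, 8],
    "B": [2, 2, 4, 4, 9, 9],
    "C": [3, 3, 5, 5, 7, 7],
}

def p_beats(x, y):
    """Exact probability that die x rolls higher than die y."""
    wins = sum(1 for a, b in product(dice[x], dice[y]) if a > b)
    return wins / 36

# Verify the nontransitive cycle: A beats C beats B beats A, each 5/9.
for x, y in [("A", "C"), ("C", "B"), ("B", "A")]:
    assert p_beats(x, y) == 5 / 9
```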

### Problem 2: Coins

Bob suggests a different **Game:** Alice will choose a sequence of three {heads, tails}, then Bob will choose a different sequence. They'll toss a fair coin until one of their sequences comes up as a consecutive subsequence of toss results. What should Alice choose?

**Solution:** Consider the situation if Alice picks \(\sf HHH\); then Bob can pick \(\sf THH\). Once any \(\sf T\) appears, every later occurrence of \(\sf HHH\) is immediately preceded by \(\sf THH\), so the game is effectively the same as if Bob had picked \(\sf T\): it's decided within the first three tosses, and it's easy to see that Bob is 7:1 to win.

We can extrapolate the full strategy from a similar prefixing trick: Bob takes the first two letters of Alice's sequence and prepends *the opposite of Alice's second letter*. Then \(\sf HTH\) is an optimal choice for Alice, \(\sf HHT\) is the optimal response by Bob, giving Bob 2:1 to win. Read more elsewhere at Wiki | Penney's game.
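
A seeded Monte Carlo check of the \(\sf HHH\) vs. \(\sf THH\) matchup (so the 7/8 appears only approximately):

```python
import random

def first_to_appear(a, b, rng):
    """Flip a fair coin until pattern a or b shows up; return the winner."""
    window = ""
    while True:
        window = (window + rng.choice("HT"))[-3:]  # last three tosses
        if window == a:
            return a
        if window == b:
            return b

rng = random.Random(223)
trials = 100_000
bob_wins = sum(first_to_appear("HHH", "THH", rng) == "THH" for _ in range(trials))
print(bob_wins / trials)  # close to 7/8
```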

## Discrete Probability Spaces

- Set-theoretically, a probability space consists of:
  - a sample space \(\Oa\) (the set of all possible outcomes),
  - an event space \(\F\sbq2^\Oa\) (sets of outcomes),
  - a function \(\Pr:\F\to[0,1]\).
- Axioms:
  - \(\Pr(\Oa)=1\).
  - \(\fa x\in\F,0\leq\Pr(x)\leq1\).
  - \(\fa x,y\in\F,\Pr(x\un y)=\Pr(x)+\Pr(y)-\Pr(x\cap y)\).

### Polynomial Identity Testing

**Problem:** \(\ds{f(x)\dq\prod_{i=1}^n(x-a_i)}=^?\sum_{i=0}^nc_ix^i\qd g(x)\).

**Deterministic Algorithm:** Expand the LHS; compare symbolically.
- Problem: it might take a long time (exponential blowup).

**Randomized Algorithm:** Plug the same (randomly chosen) \(x'\) into both sides:
- If \(\text{LHS}\lr{x'}=\text{RHS}\lr{x'}\), declare equal.
- If \(\text{LHS}\lr{x'}\neq\text{RHS}\lr{x'}\), declare unequal.
- However, there's a chance of mistakenly declaring them equal if you pick a bad \(x'\) -- precisely, if you pick a root of \(f-g\).
- But since there are at most \(n\) such roots (when \(f\neq g\)), we can limit the probability of failure to 1% by choosing from a pool of \(100n\) distinct numbers.
- Gets even better (\(0.01^k\)) if we choose \(k\) times repeatedly; better still without replacement; in the limit, choosing \(k=n+1\) distinct points gives a (nonrandomized, exact) \(O\lr{n^2}\) solution.
- But here, we're interested in a ~linear-time solution, so we stick with a single random evaluation.
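The randomized test above can be sketched directly (pool size \(100n\) and the repetition count are from the discussion; evaluation is linear-time per trial):

```python
import random

def poly_product(roots, x):
    """Evaluate prod_i (x - a_i) without expanding -- O(n)."""
    result = 1
    for a in roots:
        result *= x - a
    return result

def poly_coeffs(coeffs, x):
    """Evaluate sum_i c_i x^i by Horner's rule -- O(n)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def probably_equal(roots, coeffs, trials=1):
    """Declare f == g after `trials` random evaluations.

    An "unequal" answer is always correct; a false "equal" requires
    hitting a root of f - g, which (drawing from a pool of 100n
    values) happens w.p. <= 0.01 per trial when f != g.
    """
    n = len(roots)
    for _ in range(trials):
        x = random.randrange(100 * n)
        if poly_product(roots, x) != poly_coeffs(coeffs, x):
            return False
    return True
```

For example, `probably_equal([1, 2], [2, -3, 1])` tests \((x-1)(x-2)=^?x^2-3x+2\), which always comes back equal.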

### More Probability Theory

- Events \(E,F\) are independent iff \(\Pr(E\cap F)=\Pr(E)\cdot\Pr(F)\).
- Events \(E_1,\ldots,E_k\) are mutually independent iff \(\fa\mathscr E\sbq\s{E_1,\hh,E_k},\Pr\lr{\IN_{E\in\mathscr E}E}=\prod_{E\in\mathscr E}\Pr(E)\).

Caution: Mutual independence is not the same thing as pairwise independence; consider three events: each of two fair coin flips being heads, and their xor being 1.
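
The xor example can be checked by direct enumeration over the four equally likely outcomes:

```python
from itertools import product

# Sample space: two fair coin flips, each outcome w.p. 1/4.
outcomes = list(product([0, 1], repeat=2))

# Three events: first flip heads, second flip heads, xor equals 1.
E1 = {(x, y) for x, y in outcomes if x == 1}
E2 = {(x, y) for x, y in outcomes if y == 1}
E3 = {(x, y) for x, y in outcomes if x ^ y == 1}

def pr(event):
    return len(event) / len(outcomes)

# Pairwise independent: every pair's intersection probability multiplies.
for A, B in [(E1, E2), (E1, E3), (E2, E3)]:
    assert pr(A & B) == pr(A) * pr(B)

# Not mutually independent: the triple intersection is empty, not 1/8.
assert pr(E1 & E2 & E3) == 0 != pr(E1) * pr(E2) * pr(E3)
```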

### Another Problem: MatMult Checking (mod 2)

**Problem:** \(AB=^?C\).

**Deterministic Algorithm:** *Do the computation yourself.*
- Problem: it's slow (\(O\lr{n^3}\), or slightly faster).

**Randomized Algorithm:** *Use matrix-vector calculations.*
- Pick \(\vec r\) randomly. (Flip a coin for each coordinate: \(2^n\) possible outcomes, all equally likely.)
- Check \(A(B\vec r)=^?C\vec r\). (Takes \(O\lr{n^2}\).)

**Ask:** If \(AB\neq C\), what is \(\Pr_{\vec r}(AB\vec r=C\vec r)\)?

# \(T_2\): Matrix Multiplication Checking, Min-Cut, Expectation

## \(AB=^?C\)

**Claim:** If \(AB\neq C\), random sampling will catch it (\(A\lr{B\vec r}\neq C\vec r\)) w.p. \(\geq\nf12\) (and zero error in the other direction).

**Proof:** If \(D=AB-C\) is nonzero, it has at least one nonzero entry; wlog reveal the corresponding coordinate of \(\vec r\) last. Then, for a false equality, that coordinate's value must match the sum contributed by the rest of the row -- which it does w.p. only \(\nf12\).
- It follows that the tight bound on the success probability (in the mod-2 case) is \(1-2^{-k}\), where \(k\) is the rank of \(AB-C\).
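
This check is Freivalds' algorithm; a mod-2 sketch with \(k\) repetitions (plain-Python lists for clarity):

```python
import random

def freivalds(A, B, C, k=10):
    """Randomized check of A @ B == C over GF(2).

    If AB != C, each random r catches the discrepancy w.p. >= 1/2,
    so k rounds falsely declare equality w.p. <= 2^-k; an "unequal"
    verdict is always correct.
    """
    n = len(A)

    def matvec(M, v):
        # O(n^2) matrix-vector product mod 2
        return [sum(M[i][j] & v[j] for j in range(n)) % 2 for i in range(n)]

    for _ in range(k):
        r = [random.randrange(2) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False  # witness found: definitely AB != C
    return True  # equal, or we got unlucky k times
```

Each round costs three \(O\lr{n^2}\) matrix-vector products, versus \(O\lr{n^3}\) for recomputing \(AB\).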

## Min-Cut

Reminder: Conditional probability: \(\Prr YX\dq\f{\Pr(Y\cap X)}{\Pr(X)}\), requiring \(\Pr(X)>0\).

**Problem:** Given \(G=(V,E)\) and \(u,v\in V\), return a partition \(U\ni u\), \(W\ni v\) of \(V\) minimizing the number (in the unweighted case) of edges crossing \(U\leftrightarrow W\).

**Algorithm:** (Contraction algorithm)
- Repeatedly pick a random edge (not \((u,v)\)) and "contract" its endpoints -- *i.e.* ensure that they're on the same side of the cut.

**Claim:** \(\Pr(\text{got min cut})\geq\f2{n(n-1)}\) -- so boost by running many times and taking the min.
- After \(k\) runs, \(\Pr(\text{fail})\leq\lr{1-\f2{n(n-1)}}^k\); \(n^2\ln n\) trials suffice for failure probability \(\leq n^{-2}\).

**Proof:** Let \(E_i\) be the event that the \(i\)th edge contracted was OK to contract -- *i.e.* **not** crossing the min cut. Let \(F_i\) be cumulative success: \[F_i\dq\IN_{j=1}^iE_j\]
- Then \(\Pr(\text{succeed})=\prod_{i=1}^{n-2}\Prr{E_i}{F_{i-1}}\).

**Claim:** Min cut size \(=k\im\abs E\geq\nf{kn}2\).
- So \(\Pr\lr{E_1}=\Prr{E_1}{F_0}\geq1-\f k{kn/2}=1-2/n=\f{n-2}n\).
- Then, since there are still \(k\) bad edges (among \(\geq\nf{k(n-1)}2\) total), \(\Prr{E_2}{F_1}\geq\f{n-3}{n-1}\) -- and the telescoping product gives \(\f2{n(n-1)}\).

**Optimization:** Since the risk of failure is much larger near the end, spend more work re-trying the later choices.

**Result:** This algorithm has been generalized and derandomized.
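
A sketch of the contraction algorithm (the global-min-cut version, ignoring the \((u,v)\) pinning; the union-find bookkeeping is my own implementation choice):

```python
import random

def contract(edges, n):
    """One run of the contraction algorithm on vertices 0..n-1.

    Returns the size of the cut it finds; this is the min cut
    w.p. >= 2/(n(n-1)).
    """
    parent = list(range(n))

    def find(x):  # union-find representative, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = n
    edges = list(edges)
    while remaining > 2:
        u, v = random.choice(edges)
        parent[find(u)] = find(v)  # contract the chosen edge
        remaining -= 1
        # drop self-loops inside the new super-vertex
        edges = [(a, b) for a, b in edges if find(a) != find(b)]
    return len(edges)  # edges crossing the final 2-way partition

def min_cut(edges, n, runs):
    """Boost the success probability by taking the best of many runs."""
    return min(contract(edges, n) for _ in range(runs))
```

For example, on a triangle \(\{0,1,2\}\) with vertex 3 hanging off vertex 2, enough runs reliably find the min cut of 1 (cutting edge \((2,3)\)).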

## Expectation

- \(E:(\Oa\to\R)\to\R\), defined \(EX\dq\sum_ii\Pr(X=i)\).
- Linearity: \(E\lr{\sum_iX_i}=\sum_iEX_i\) and \(E(cX_i)=cE(X_i)\)

# \(T_4\): Chernoff Bounds

## Chernoff (-Hoeffding) Bounds

Extend the idea of Chebyshev to \(n\)th moment:

\[\Pr\lr{(X-\mu)^n\geq a^n}\leq\f{E(X-\mu)^n}{a^n}\]

Useful w/ MGFs: \(\Pr(X\geq a)\e\Pr\lr{e^{tX}\geq e^{ta}}\leq\f{Ee^{tX}}{e^{ta}}\) for \(t>0\).

Moment-generating function: \(Ee^{tX}\); the \(n\)th derivative at \(t=0\) is the \(n\)th moment of \(X\).

Also, for independent \(X,Y\): \(Ee^{t(X+Y)}=E\lr{e^{tX}e^{tY}}=Ee^{tX}\cdot Ee^{tY}\)

## Chernoff Example

\(X_i\dq\) coin flip: 1 w.p. \(p_i\), 0 otherwise. \(X\dq\sum X_i\); \(\mu\dq EX\).

"Multiplicative" bound:

\[\Pr\lr{X\geq(1+\da)\mu}\leq\lr{\f{e^\da}{(1+\da)^{1+\da}}}^\mu\]

\[\Pr(X\geq2\mu)\leq\lr{\nf e4}^\mu\]
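
A quick seeded sanity check of the \(\da=1\) bound for fair coins (the parameters \(n=10\) and the trial count are my own choices for illustration):

```python
import math
import random

rng = random.Random(42)
n, trials = 10, 200_000
mu = n / 2  # E[X] for n fair coins

# Empirical Pr(X >= 2*mu) vs. the delta = 1 multiplicative bound (e/4)^mu.
hits = sum(
    sum(rng.randrange(2) for _ in range(n)) >= 2 * mu
    for _ in range(trials)
)
tail = hits / trials
bound = (math.e / 4) ** mu
print(tail, "<=", bound)  # here the exact tail is 2^-10, far below the bound
```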

*(rest of class ignored; was doing pset)*

# \(T_5\): Bloom Filters, Coupon Collector Revisited -- Probabilistic Method

## Bloom Filters

- Set membership queries -- Is \(x\in S\)? How little space do we need?
- Trading off time/space vs. accuracy
- Basic idea: for each \(x\in S\), for each \(i\in\s{1,\hh,k}\), set \(B[h_i(x)]=1\).
- Then \(x\in S\mim\fa i,B[h_i(x)]=1\) -- one-sided (false-pos) error
- Alternative idea: Use perfect hash function \(h:S\bij[0,n-1]\), store (orthogonal) hash fingerprints (\(b\) bits each) -- then false-positive w.p. \(2^{-b}\)
- Finding/representing/computing \(h\) is potentially nontrivial
- BF false-pos w.p. \(\lr{1-e^{-nk/m}}^k\)
- optimal: \(k=\f mn\ln2\)
- Chernoff bounds give concentration: the realized false-pos rate \(=(\text{frac.\ of nonempty bins})^k=\lr{1-\f{\text{\#empty bins}}m}^k\)
- And the Poisson model \(\approx\) the exact balls-and-bins model (events at most \(2e\sqrt m\) times more likely in the exact model)
**Thm:** The f.p.p. of a BF is \(\lr{1-e^{-nk/m}}^k+o(1)\)
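
A minimal Bloom filter sketch; the \(k\) "hash functions" are simulated by seeding a PRNG with the key (fine for illustration, not a real hash family):

```python
import random

class BloomFilter:
    """Bloom filter over an m-bit array with k simulated hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, x):
        rng = random.Random(x)  # same key -> same k positions
        return [rng.randrange(self.m) for _ in range(self.k)]

    def add(self, x):
        for i in self._positions(x):
            self.bits[i] = 1

    def __contains__(self, x):  # may false-positive, never false-negative
        return all(self.bits[i] for i in self._positions(x))
```

With \(m/n=10\) and the optimal \(k=7\approx\f mn\ln2\), the predicted false-positive rate \(\lr{1-e^{-nk/m}}^k\) is about 0.008.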

## Coupon Collectors

The probability that collecting all \(n\) coupons takes \(>n\ln n+cn\) draws converges:

\[\lim_{n\to\infty}\Pr\lr{\cx{cc}>n\ln n+cn}=1-e^{-e^{-c}}\]

We throw \(n\ln n+cn\) balls into \(n\) bins -- what's \(\Pr(\ex\text{ empty bin})\)?

- Poisson approx: each bin's load is \(\approx\) Poisson with mean \(\mu=\f{n\ln n+cn}n=\ln n+c\)
- so the \(i\)th bin is empty w.p. \(e^{-\mu}=e^{-(\ln n+c)}=\f{e^{-c}}n\)
- treating bins as independent, the number of empty bins is \(\approx\) Poisson with mean \(n\cdot\f{e^{-c}}n=e^{-c}\), so \(\Pr(\text{no empty bin})\to e^{-e^{-c}}\)

*Proof sketch:* the Poisson model throws \(n\ln n+cn+O\lr{\sqrt{n\log n}}\) balls w.p. \(1-o(1)\); a union bound shows the difference term is negligible.
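
A seeded simulation of the limit (the parameters \(n=500\), \(c=0.5\) are my own choices for illustration):

```python
import math
import random

def coupons_needed(n, rng):
    """Draw uniform coupons until all n types have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng = random.Random(223)
n, c, trials = 500, 0.5, 2_000
threshold = n * math.log(n) + c * n
p_hat = sum(coupons_needed(n, rng) > threshold for _ in range(trials)) / trials
print(p_hat, "vs limit", 1 - math.exp(-math.exp(-c)))  # both near 0.45
```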

Any time you get a balls-and-bins problem, first ask: What does the Poisson approximation tell me?

# \(R_5\): Probabilistic Method

## Ramsey Theory

Given: the complete graph \(K_n\) on \(n\) vertices, edges two-colored. For which \(n\) (relative to \(k\)) does there exist a coloring with no monochromatic \(K_k\) subgraph?
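
The probabilistic-method calculation this question sets up is, I believe, the standard Erdős argument, sketched here for reference:

```latex
% Color each edge of K_n red/blue independently and uniformly.
% A fixed K_k is monochromatic w.p. 2^{1-\binom k2}; union-bound
% over all \binom nk of them:
\Pr\lr{\ex\text{ monochromatic }K_k}
  \leq \binom nk\,2^{1-\binom k2} < 1
  \quad\text{for }n\leq2^{k/2},\ k\geq3,
% so for such n, some 2-coloring of K_n has no monochromatic K_k.
```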

# \(R_7\): Entropy

Lemma: \(\f{2^{nH(q)}}{n+1}\leq\binom n{nq}\leq2^{nH(q)}\)
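
A one-line sketch of the upper bound via the binomial theorem; the lower bound follows because \(\binom n{nq}q^{nq}(1-q)^{n(1-q)}\) is the largest of the \(n+1\) terms summing to 1:

```latex
1=(q+(1-q))^n\geq\binom n{nq}\,q^{nq}(1-q)^{n(1-q)}
 =\binom n{nq}\,2^{-nH(q)}
 \quad\im\quad\binom n{nq}\leq2^{nH(q)}
```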