# CS 223 (Randomized Algorithms)

These are my lecture notes from CS 223: Randomized Algorithms (Piazza), taught by Michael Mitzenmacher at Harvard. I am an enrolled student, and these notes are in no way official; they're reproduced by me from Prof. Mitzenmacher's lectures in Spring 2015, and are licensed Creative Commons Attribution-NonCommercial-ShareAlike 4.0, which supercedes any other site-wide licenses.

# $R_{01}$: Introductions, Some Probability Theory

## Two Problems

### Problem 1: Dice

Alice suggests the following Game: Given three fair but nonstandard dice:

• A: 1,1,6,6,8,8
• B: 2,2,4,4,9,9
• C: 3,3,5,5,7,7

Bob will choose a die, then Alice will choose another die. Both will roll, the highest roller will win. What should Bob choose?

Quote: "Well, the meta-level reason that you don't want to play this game [as Bob] is that, if a professor of probability proposes a dice game to you, you probably don't want to play."

Solution: With 5/9 probability each, A beats C beats B beats A. Bob's choice doesn't matter, but he's 5:4 to lose no matter his choice. Read more elsewhere at Wiki | Nontransitive dice.

High-Order Lesson: You have to be very careful with probability; lots of things, like "strength of dice" are unintuitive (e.g. in being nontransitive).

### Problem 2: Coins

Bob suggests a different Game: Alice will choose a sequence of three {heads, tails}, then Bob will choose a different sequence. They'll toss a fair coin until one of their sequences comes up as a consecutive subsequence of toss results. What should Alice choose?

Solution: Consider the situation if Alice picks $\sf HHH$; then Bob can pick $\sf THH$. Then the game is the same as if Alice had picked $\sf HHH$ and Bob had picked $\sf T$; the game will be over after three turns, so it's easy to see that Bob is 7:1 to win.

We can extrapolate the full strategy, involving a similar prefixing trick, whereby Bob prepends a single letter to the first two of Alice's -- specifically, the opposite of Alice's second letter. Then $\sf HTH$ is an optimal choice for Alice, $\sf HHT$ is an optimal response by Bob, giving Bob 2:1 to win. Read more elsewhere at Wiki | Penney's game.

## Discrete Probability Spaces

• Set-theoretically, a probability space consists of:
• a sample space $\Oa$ (a set of all possible outcomes)
• an event space $\F\sbq\Oa$
• a function $\Pr:\F\to\R[0,1]$.
• Axioms:
• $\Pr(\Oa)=1$.
• $\fa x\in\F,0\leq\Pr(x)\leq1$.
• $\fa x,y\in\F,\Pr(x\un y)=\Pr(x)+\Pr(y)-\Pr(x\cap y)$

### Polynomial Identity Testing

• Problem: $\ds{f(x)\dq\prod_{i=1}^n(x-a_i)}=^?\sum_{i=0}^nc_ix^i\qd g(x)$.
• Deterministic Algorithm: Expand the LHS, compare symbolically.
• Problem: It might take a long time (exponential blowup).
• Randomized Algorithm: Plug in the same (randomly chosen) $x'$ on both sides:
• If $\text{LHS}\lr{x'}=\text{RHS}\lr{x'}$, declare equal.
• If $\text{LHS}\lr{x'}\neq\text{RHS}\lr{x'}$, declare inequal.
• However, there's a chance of mistakenly declaring them equal, if you picked a bad $x'$ by mistake.
• More precisely, you'll fail if you picked a root of $f-g$.
• But since we know that there are at most $n$ such roots (in the case $f\neq g$), we can limit the probability of failure to 1% if we choose from a pool of $100n$ distinct numbers.
• Gets even better ($0.01^k$) if we choose repeatedly; gets even better without replacement; in the limit, we can obtain a (nonrandomized, exact) $O\lr{n^2}$ solution by simply choosing $k=n+1$.
• But here, we're interested in a ~linear-time solution, so let's

### More Probability Theory

• Events $E,F$ are independent iff $\Pr(E\cap F)=\Pr(E)\cdot\Pr(F)$.
• Events $E_1,\ldots,E_k$ are mutually independent iff $\fa \mathscr E\in P(\{E_1,\ldots,E_k\}),\Pr\lr{\IN\mathscr E}=\prod\Pr(\mathscr E)$.

Caution: Mutual independence is not the same thing as pairwise independence; consider three events consisting of two coin flips and their xor.

### Another Problem: MatMult Checking (mod 2)

• Problem: $AB=^?C$.
• Deterministic Algorithm: Do the computation yourself.
• Problem: it's slow ($O\lr{n^3}$, or slightly faster).
• Randomized Algorithm: Use matrix-vector calculations.
• Pick $\vec r$ randomly. (Flip a coin for each coordinate: $2^n$ possible outcomes, all equally likely.)
• Check $A(B\vec r)=^?C\vec r$. (Takes $O\lr{n^2}$.)
• Ask: If $AB\neq C$, what is $\Pr_{\vec r}(AB\vec r=C\vec r)$?

# $T_2$:

## $AB=^?C$

• Claim: Random sampling will catch $A\lr{B\vec r}\neq C\vec r$ w.p. $\geq\nf12$ (and 0 error in the other direction).
• Proof: There is at least one nonzero entry of $D=AB-C$ if it's nonzero, wlog pick its input last. Then, to give a false-equals, its input value must match the rowsum of the rest -- which it will do w.p. only $\nf12$.
• It follows that the tight bound on the p (in the modulo 2 case) is $1-2^{-k})\, where \(k$ is the rank of $AB-C$

## Min-Cut

Reminder: Conditional probability: $\Prr YX\dq\f{\Pr(Y\cap X)}{\Pr(X)}$, requiring $\Pr(X)>0$.

• Problem: Given $G=(V,E)$ and $u,v\in V$, return a partition $U\ni u,V\ni v$ minimizing the number (for unweighted case) of edges crossing $U\leftrightarrow V$.
• Algorithm: (Contraction algorithm)
• Pick a random edge (not $(u,v)$); "contract" those vertices -- i.e. ensure that they're on the same side of the cut.
• Claim: $\Pr(\text{got min cut})\geq\f2{n(n-1)}$ -- so boost by running many times and taking min.
• So after $k$ runs, $\Pr(\text{fail})\leq\lr{1-\f2{n(n-1)}}^k$, or $n^2\ln n$ trials.
• Proof: Let $E_i$ be the r.v. that the $i$th edge contracted was OK to contract -- i.e. not crossing the mincut. Let $F_i$ be cumulative success: $F_i\dq\IN_{j=1}^iE_i=$
• Then $\Pr{\text{succeed}}=\prod_{i=1}^{n-2}\Prr{E_i}{F_{i-1}}$.
• Claim: Min cut size $=k\im\abs E\geq\nf{kn}2$.
• So $\Pr\lr{E_1}=\Prr{E_1}{F_0}\geq1-\f k{kn/2}=1-2/n=\f{n-2}n$.
• Then, since there are still $k$ bad edges, $\Prr{E_2}{F_1}\geq\f{n-3}{n-1}$ -- and the progressive-fraction-product gives $\f2{n(n-1)}$
• Optimization: Since the risk of failure is much larger at the end, you should spend more work re-trying choices there.
• Result: This algorithm has been generalized and derandomized.

## Expectation

• $E:(\Oa\to\R)\to\R$, defined $EX\dq\sum_ii\Pr(X=i)$.
• Linearity: $E\lr{\sum_iX_i}=\sum_iEX_i$ and $E(cX_i)=cE(X_i)$

# $T_4$:

## Chernoff (-Hoeffding) Bounds

Extend the idea of Chebyshev to $n$th moment:
$\Pr((X-\mu)^n)\geq a^n)\leq\f{E(X-\mu)^n}{a^n}$.

Useful w/ MGFs: $\Pr(X\geq a)\e\Pr\lr{e^{tx}\geq e^{ta}}\leq\f{Ee^{tx}}{e^{ta}}$ for $t>0$.

Moment-generating function: $Ee^{tx}$; the $n$th derivative at $t=0$ is the $n$th moment of $X$.

Also: $Ee^{t(X+Y)}=E\lr{e^{tX}e^{tY}}=Ee^{tX}Ee^{tY}$

## Chernoff Example

$X_i\dq$ coin flip 1 w/ prob $p_i$, 0 else. $X\dq\sum X_i$; $\mu\dq EX$.

"Multiplicative" bound:
$\Pr\lr{X\geq(a+\da)EX}\leq\lr{\f{e^\da}{(1+\da)^{1+\da}}}$
$\Pr(X\geq2EX)\leq(\nf e4)^n$

(rest of class ignored; was doing pset)

# $T_5$: Bloom Filters, Coupon Collector Revisited -- Probabilistic Method

## Bloom Filters

• Set membership queries -- Is $x\in S$? How little space do we need?
• Trading off time/space vs. accuracy
• Basic idea: for each $x\in S$, for each $i\in\s{1,\hh,k}$, set $B[h_i(x)]$.
• Then $x\in S\mim\fa i,B[h_i(x)]=1$ -- one-sided (false-pos) error
• Alternative idea: Use perfect hash function $h:S\bij[0,n-1]$, store (orthogonal) hash fingerprints ($b$ bits each) -- then false-positive w.p. $2^{-b}$
• Finding/representing/computing $h$ is potentially nontrivial
• BF false-pos w.p. $\lr{1-e^{-nk/m}}^k$
• optimal: $k=\f mn\ln2$
• Chernoff bounds concentration: real-life false-pos $=(\text{frac. empty bins})^k=\lr{1-\f{\text{#empty bins}}m}^k$
• But the Poisson model $\approx$ balls+bins model ($\leq2e\sqrt m$ times more likely)
• Thm: The f.p.p. of a BF is $\lr{1-e^{-nk/m}}^k+o(1)$

## Coupon Collectors

Pr takes $>n\ln n+cn$ coupons $\leq$
$\lim_{n\to\infty}(\Pr cc>n\ln n+cn)=1-e^{-e^{-c}}$
We throw $n\ln n+cn$ calls into $n$ bins -- what's Pr $\ex$ empty bin?

• Poisson approx: $i$th bin is empty $\mu=\ln n+c$
• $i$th bin is empty w.p. $e^{-\mu}=e^{-(\ln n+c)}=\f{e^{-c}}n$

Proof: Poisson model yields $n\ln n+cn+O\lr{\rt{n\log n}}$ balls w.p. $O(1)$; union the difference term is tiny, irrelevant.

Any time you get a balls-and-bins problem, first ask: What does the Poisson approximation tell me?

# $R_5$: Probabilistic Method

## Ramsey Theory

Given: Complete graph $K_n$ on $n$ vertices, two-colored. Does there exist a graph with all edges colored, with no mono-colored $K_k$ subgraph?

# $R_7$: Entropy

Lemma: $\f{2^{nH(q)}}{n+1}\leq\binom n{nq}\lesssim2^{nH(q)}$