Mehryar Mohri - Foundations of Machine Learning -- Errata

This page lists errors or typos appearing in the first edition and printing 2 of the book Foundations of Machine Learning as well as their corresponding corrections. We are grateful to all readers who kindly bring those to our attention.

Page 19, paragraph following equation (2.10): "the bound guarantees 90% accuracy" (not 99% accuracy).
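A bound $\epsilon$ on the error is a guarantee of accuracy $1 - \epsilon$, so $\epsilon = 0.1$ corresponds to 90% accuracy. The sketch below computes the required sample size under two assumptions. First, the bound is the standard consistent-case sample complexity for a finite hypothesis set, $m \geq \frac{1}{\epsilon}\big(\log|H| + \log\frac{1}{\delta}\big)$. Second, the illustrative setting is conjunctions of $n$ Boolean literals with $|H| = 3^n$, $n = 10$ and $\delta = 0.02$, which are values chosen here. `consistent_sample_size` is a hypothetical helper.

```python
import math


def consistent_sample_size(log_h: float, epsilon: float, delta: float) -> int:
    """Smallest m with m >= (1 / epsilon) * (log|H| + log(1 / delta)).

    Assumes the consistent-case bound for a finite hypothesis set H:
    with probability at least 1 - delta, the returned hypothesis has
    generalization error at most epsilon, i.e. accuracy at least 1 - epsilon.
    """
    return math.ceil((log_h + math.log(1.0 / delta)) / epsilon)


# Hypothetical illustrative values: conjunctions of n = 10 Boolean literals,
# so |H| = 3**n and log|H| = n * log(3).
n, epsilon, delta = 10, 0.1, 0.02
m = consistent_sample_size(n * math.log(3), epsilon, delta)
print(f"m >= {m}: accuracy >= {1 - epsilon:.0%}, confidence >= {1 - delta:.0%}")
```

With these values the sketch prints `m >= 149: accuracy >= 90%, confidence >= 98%`.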
Page 44, Example 3.3: "figure 3.2(a)" should read "figure 3.3(a)" and "figure 3.2(b)" should read "figure 3.3(b)".
Page 80, proof of Theorem 4.4: there is no need to resort to $\Phi_\rho - 1$, the proof holds directly with $\Phi_\rho$.
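Using $\Phi_\rho$ directly changes nothing, for the following reason. On a fixed sample, adding a constant $c$ to every loss value adds $c \sum_i \sigma_i$ inside the supremum. That term does not depend on the hypothesis and has zero expectation over the Rademacher variables. So the empirical Rademacher complexities of $\Phi_\rho - 1$ and of $\Phi_\rho$ coincide. The minimal sketch below checks this by exact enumeration over all sign vectors. The margin values and helper names are hypothetical.

```python
import itertools


def margin_loss(x: float, rho: float) -> float:
    """Phi_rho(x): 1 if x <= 0, 1 - x / rho if 0 <= x <= rho, 0 if x >= rho."""
    return min(1.0, max(0.0, 1.0 - x / rho))


def empirical_rademacher(vectors):
    """Exact empirical Rademacher complexity of a finite set of loss vectors.

    Averages sup_a (1/m) sum_i sigma_i a_i over all 2**m sign vectors sigma.
    """
    m = len(vectors[0])
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=m):
        total += max(sum(s * a for s, a in zip(sigma, vec)) for vec in vectors) / m
    return total / 2 ** m


rho = 0.5
# Hypothetical margins y_i h(x_i) of three hypotheses on a sample of size 3.
margins = [[0.9, -0.2, 0.3], [0.1, 0.6, -0.4], [0.25, 0.25, 0.8]]
phi = [[margin_loss(x, rho) for x in row] for row in margins]
phi_minus_one = [[v - 1.0 for v in row] for row in phi]

r_phi = empirical_rademacher(phi)
r_shifted = empirical_rademacher(phi_minus_one)
print(r_phi, r_shifted)  # equal up to floating-point rounding
```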
Page 95, first line of the proof: $\Phi(x) \colon {\cal X} \to \Rset$ should read $\Phi(x) \colon {\cal X} \to \Rset^{\cal X}$.
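The codomain $\Rset^{\cal X}$ indicates that the feature map sends each point to a function on ${\cal X}$. The minimal sketch below assumes the canonical map $x \mapsto K(x, \cdot)$, that is $\Phi(x)(x') = K(x, x')$. It uses a Gaussian kernel purely for illustration, and `gaussian_kernel` and `feature_map` are hypothetical names.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, sigma: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), an illustrative PDS kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def feature_map(x: Point) -> Callable[[Point], float]:
    """Return Phi(x), itself a function on X: Phi(x)(x') = K(x, x')."""
    return lambda x_prime: gaussian_kernel(x, x_prime)


phi_x = feature_map([0.0, 1.0])   # an element of R^X, not a real number
print(phi_x([0.0, 1.0]))          # K(x, x) = 1.0 for the Gaussian kernel
print(phi_x([1.0, 1.0]))          # K(x, x') = exp(-1/2)
```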
Page 121, definition 6.1: "$\epsilon > 0$ and" should be removed.
Page 168, pseudocode of the Kernel Perceptron algorithm: $\alpha_{t + 1}$ should be replaced by $\alpha_t$.
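With the fix, a mistake on round $t$ increments the coefficient $\alpha_t$ attached to the current point $x_t$. The minimal sketch below implements the kernel Perceptron under that reading. It assumes zero initial coefficients, repeated passes over a fixed sample, the convention $\mathrm{sgn}(0) = +1$, and a linear kernel in the usage example. These conventions and the function names are assumptions, not a transcription of the book's pseudocode.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """K(x, z) = x . z; any PDS kernel could be used instead."""
    return sum(a * b for a, b in zip(x, z))


def kernel_perceptron(xs: List[Vector], ys: List[int], kernel: Kernel,
                      epochs: int = 10) -> List[int]:
    """Return alpha, where alpha[t] counts the mistakes made on point x_t."""
    alpha = [0] * len(xs)                    # assumption: alpha starts at zero
    for _ in range(epochs):
        mistakes = 0
        for t, (x_t, y_t) in enumerate(zip(xs, ys)):
            score = sum(alpha[s] * ys[s] * kernel(xs[s], x_t) for s in range(len(xs)))
            y_hat = 1 if score >= 0 else -1  # assumption: sgn(0) = +1
            if y_hat != y_t:
                alpha[t] += 1                # alpha_t <- alpha_t + 1, per the erratum
                mistakes += 1
        if mistakes == 0:                    # a full pass without mistakes: stop
            break
    return alpha


def predict(alpha: List[int], xs: List[Vector], ys: List[int], kernel: Kernel,
            x: Vector) -> int:
    """sgn(sum_s alpha_s y_s K(x_s, x)), with the same sgn(0) = +1 convention."""
    score = sum(a_s * y_s * kernel(x_s, x) for a_s, y_s, x_s in zip(alpha, ys, xs))
    return 1 if score >= 0 else -1


# Hypothetical toy sample, linearly separable through the origin.
xs = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]
alpha = kernel_perceptron(xs, ys, linear_kernel)
print(alpha)                                                   # [0, 0, 1, 0]
print([predict(alpha, xs, ys, linear_kernel, x) for x in xs])  # [1, 1, -1, -1]
```

On this toy sample, a single update on the third point suffices.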
Page 170, in the chain of inequalities for $\Phi_{t + 1} - \Phi_{t}$, the following two intermediate lines should be inserted just before the last inequality to make the derivation more explicit:
$$\begin{aligned}
& = \log \E_{i \sim \w_t}\big[ \exp(\eta y_t x_{t, i} - \eta y_t \w_t \cdot x_{t} + \eta y_t \w_t \cdot x_{t}) \big] - \eta \rho_\infty\\
& \leq \log \big[ \exp(\eta^2 (2 r_\infty)^2/8) \big] + \underbrace{\eta y_t (\w_t \cdot x_{t})}_{\leq 0} - \eta \rho_\infty
\end{aligned}$$
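The first inserted line adds and subtracts $\eta y_t \w_t \cdot x_t$ in the exponent. The second factors out $\exp(\eta y_t \w_t \cdot x_t)$ and applies Hoeffding's lemma to the centered remainder. That step relies on three assumptions. First, $y_t \in \{-1, +1\}$ and $|x_{t, i}| \leq r_\infty$, so $y_t x_{t, i}$ lies in an interval of length $2 r_\infty$. Second, $\w_t$ is a probability vector, as the notation $i \sim \w_t$ indicates. Third, as a consequence, $y_t \w_t \cdot x_t$ is the mean of $y_t x_{t, i}$ under $i \sim \w_t$. The minimal numerical check below tests that step on random hypothetical instances.

```python
import math
import random


def log_mgf_centered(weights, values, eta):
    """log E_{i ~ w}[exp(eta * (z_i - E_w[z]))] for a discrete distribution w."""
    mean = sum(w * z for w, z in zip(weights, values))
    return math.log(sum(w * math.exp(eta * (z - mean)) for w, z in zip(weights, values)))


random.seed(0)
worst_gap = float("inf")
for _ in range(1000):
    n = random.randint(2, 6)
    raw = [random.random() + 1e-3 for _ in range(n)]
    total = sum(raw)
    weights = [r / total for r in raw]              # plays the role of w_t
    r_inf = random.uniform(0.1, 2.0)
    values = [random.uniform(-r_inf, r_inf) for _ in range(n)]  # plays y_t x_{t,i}
    eta = random.uniform(0.01, 3.0)
    bound = eta ** 2 * (2 * r_inf) ** 2 / 8         # Hoeffding's lemma, range 2 r_inf
    worst_gap = min(worst_gap, bound - log_mgf_centered(weights, values, eta))
print(f"smallest slack in Hoeffding's lemma over 1000 trials: {worst_gap:.3e}")
```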
Page 181, exercise 7.10, second paragraph: the definition of $m_i$ in the first sentence of that paragraph is given in the special case of the zero-one loss. For the general case, the sentence should be replaced by: "Let $m_i$ be the cumulative loss of hypothesis $h_i$ on the points $(x_i, \ldots, x_T)$, that is $m_i = \sum_{t = i}^T L(h_i(x_t), y_t)$".
Page 181, exercise 7.10, in the text following the inline equation: $i^* = \operatorname{argmin}_i m_i / (T - i)$ should be replaced by $i^* = \operatorname{argmin}_i m_i / (T - i + 1)$.
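There are $T - i + 1$ points $x_i, \ldots, x_T$, so $m_i / (T - i + 1)$ is the average loss of $h_i$ on them. The uncorrected denominator $T - i$ vanishes for $i = T$. The minimal sketch below implements only the corrected selection rule. It assumes 1-based round indices and uses a zero-one loss in the usage example. The helper names and the data are hypothetical.

```python
from typing import Any, Callable, Sequence


def select_hypothesis(hypotheses: Sequence[Callable[[Any], Any]],
                      xs: Sequence[Any], ys: Sequence[Any],
                      loss: Callable[[Any, Any], float]) -> int:
    """Return the 1-based index i* minimizing m_i / (T - i + 1).

    m_i is the cumulative loss of h_i on x_i, ..., x_T, i.e. on T - i + 1
    points, so the ratio is the average loss of h_i on those points.
    """
    T = len(xs)
    best_i, best_avg = 0, float("inf")
    for i in range(1, T + 1):
        h_i = hypotheses[i - 1]
        m_i = sum(loss(h_i(xs[t - 1]), ys[t - 1]) for t in range(i, T + 1))
        avg = m_i / (T - i + 1)   # with T - i, the case i = T would divide by zero
        if avg < best_avg:
            best_i, best_avg = i, avg
    return best_i


def zero_one(y_hat, y) -> float:
    return float(y_hat != y)


# Hypothetical sequence of T = 4 online hypotheses h_1, ..., h_4.
hs = [lambda x: 1, lambda x: 1 if x < 3 else -1, lambda x: -1, lambda x: 1]
xs, ys = [1, 2, 3, 4], [1, 1, -1, -1]
print(select_hypothesis(hs, xs, ys, zero_one))   # 2
```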
Page 189, third paragraph: $W=(w_1^\top, \ldots, w_k^\top)^\top$ should read $W=(w_1, \ldots, w_k)^\top$.
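With the correction, $W$ is the $k \times N$ matrix whose $l$-th row is $w_l^\top$, where $N$ denotes the feature dimension. So $W x$ collects the $k$ scores $w_l \cdot x$. The uncorrected expression $(w_1^\top, \ldots, w_k^\top)^\top$ instead stacks the $w_l$ into a single column of length $kN$. The minimal sketch below contrasts the two shapes with plain Python lists. The weight vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def stack_as_rows(ws: List[Vector]) -> List[Vector]:
    """W = (w_1, ..., w_k)^T: a k x N matrix whose l-th row is w_l^T."""
    return [list(w) for w in ws]


def stack_as_column(ws: List[Vector]) -> List[Vector]:
    """(w_1^T, ..., w_k^T)^T: a single column of length k * N (the uncorrected form)."""
    return [[c] for w in ws for c in w]


def matvec(W: List[Vector], x: Vector) -> Vector:
    """Entry l of W x is w_l . x, the score of class l."""
    return [sum(a * b for a, b in zip(row, x)) for row in W]


ws = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]   # hypothetical k = 3 weight vectors in R^2
W = stack_as_rows(ws)
print(len(W), len(W[0]))                      # 3 2
print(len(stack_as_column(ws)), len(stack_as_column(ws)[0]))  # 6 1
print(matvec(W, [2.0, 3.0]))                  # [2.0, 3.0, -1.0]
```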
Page 190, line 5: the empirical Rademacher complexity symbol should be replaced by that of the Rademacher complexity.
Page 191, equation (8.12): the factor $4k^2$ should be $2k^2$ instead.
Page 191, optimization problem: the constraints $\xi_i \geq 0$ should be added.
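The sketch below assumes two things. First, the constraints take the common multi-class form $w_{y_i} \cdot x_i - w_l \cdot x_i \geq 1 - \xi_i$ for all $l \neq y_i$. Second, the objective penalizes $C \sum_i \xi_i$ with $C > 0$. Under these assumptions, dropping $\xi_i \geq 0$ would let a point separated with a large margin take a negative slack and thereby decrease the penalty. For fixed weights, the sketch computes the smallest feasible slack with and without the nonnegativity constraint. The feature mapping is omitted, class labels are 0-based, and names and data are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def min_slack(W: List[Vector], x: Vector, y: int, nonnegative: bool = True) -> float:
    """Smallest xi satisfying w_y . x - w_l . x >= 1 - xi for every l != y.

    With nonnegative=True the constraint xi >= 0 is also enforced, which
    yields the multi-class hinge loss max(0, 1 - min_{l != y} margin_l).
    """
    worst = max(1.0 - (dot(W[y], x) - dot(W[l], x))
                for l in range(len(W)) if l != y)
    return max(0.0, worst) if nonnegative else worst


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # hypothetical rows w_1^T, w_2^T, w_3^T
x, y = [5.0, 0.0], 0                           # a point of class 0 with a large margin
print(min_slack(W, x, y))                      # 0.0
print(min_slack(W, x, y, nonnegative=False))   # -4.0: would reduce C * sum(xi)
```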
Page 192, section 8.3.2 first paragraph: "exercise 9.5" should read "exercise 8.4".
Page 207, exercise 8.4: "family of base hypothesis" should read "family of base hypotheses".
Page 217, line 3 from bottom: the expression should be replaced by $\sqrt{1 - \frac{(\e^+_t - \e^-_t)^2}{(1 - \e^0_t)}}$, which holds by the concavity of the square-root function.
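The concavity step can be spelled out. Assume, as in the analysis of boosting with base classifiers taking values in $\{-1, 0, +1\}$, that the quantity being bounded is $\e^0_t + 2\sqrt{\e^+_t \e^-_t}$, with $\e^0_t + \e^+_t + \e^-_t = 1$ and $\e^0_t < 1$. Set $s = 1 - \e^0_t = \e^+_t + \e^-_t$ and $d = \e^+_t - \e^-_t$, so that $4 \e^+_t \e^-_t = s^2 - d^2$. Jensen's inequality for the square root with weights $1 - s$ and $s$ then gives:

$$
\begin{aligned}
\e^0_t + 2\sqrt{\e^+_t \e^-_t}
  &= (1 - s)\sqrt{1} + s \sqrt{1 - \frac{d^2}{s^2}}\\
  &\leq \sqrt{(1 - s) \cdot 1 + s \Big(1 - \frac{d^2}{s^2}\Big)}
   = \sqrt{1 - \frac{d^2}{s}}
   = \sqrt{1 - \frac{(\e^+_t - \e^-_t)^2}{1 - \e^0_t}}.
\end{aligned}
$$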
Page 283, inline after (12.2): $U^\top X X^\top U$ should read $\operatorname{Tr}[U^\top X X^\top U]$.
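The trace turns the expression into the scalar objective $\operatorname{Tr}[U^\top X X^\top U] = \|U^\top X\|_F^2$, the total squared norm of the projected data. Without it, $U^\top X X^\top U$ is a $k \times k$ matrix. The minimal sketch below checks this identity on small hypothetical matrices. It assumes $X$ stores the $m$ data points as columns and $U$ has $k$ orthonormal columns.

```python
from typing import List

Matrix = List[List[float]]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def trace(A: Matrix) -> float:
    return sum(A[i][i] for i in range(len(A)))


# Hypothetical data: N = 3 features, m = 4 points stored as columns of X.
X = [[1.0, -1.0, 2.0, -2.0],
     [0.5, 0.5, -0.5, -0.5],
     [0.0, 1.0, 0.0, -1.0]]
# U has k = 2 orthonormal columns (here the first two coordinate axes).
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

UtX = matmul(transpose(U), X)                  # k x m projected data
M = matmul(UtX, matmul(transpose(X), U))       # U^T X X^T U, a k x k matrix
objective = trace(M)                           # Tr[U^T X X^T U], a scalar
frob_sq = sum(v * v for row in UtX for v in row)   # ||U^T X||_F^2
print(len(M), len(M[0]), objective, frob_sq)   # 2 2 11.0 11.0
```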
Page 370, line 5: $\phi'(t) \leq \frac{(b-a)^2}{4}$ should read $\phi''(t) \leq\frac{(b-a)^2}{4}$.
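The Taylor argument needs the second derivative. If $\phi(0) = \phi'(0) = 0$ and $\phi''(t) \leq \frac{(b-a)^2}{4}$ for all $t$, then $\phi(t) \leq \frac{t^2}{2} \cdot \frac{(b-a)^2}{4} = \frac{t^2 (b-a)^2}{8}$. The minimal numerical sketch below assumes $\phi$ is the function of the classical convexity-based proof of Hoeffding's lemma, $\phi(t) = \log\big(\frac{b}{b-a} e^{ta} - \frac{a}{b-a} e^{tb}\big)$ with $a < 0 < b$. The interval $[-1, 3]$ is a hypothetical choice.

```python
import math


def phi(t: float, a: float, b: float) -> float:
    """phi(t) = log(b/(b-a) e^{ta} - a/(b-a) e^{tb}), for a < 0 < b.

    This is the log-MGF of the two-point variable equal to a with probability
    b/(b-a) and to b with probability -a/(b-a); its mean is zero.
    """
    p = b / (b - a)
    return math.log(p * math.exp(t * a) + (1 - p) * math.exp(t * b))


def phi_second(t: float, a: float, b: float) -> float:
    """phi''(t) = q (1 - q) (b - a)^2: the variance under the tilted distribution."""
    p = b / (b - a)
    q = p * math.exp(t * a) / (p * math.exp(t * a) + (1 - p) * math.exp(t * b))
    return q * (1 - q) * (b - a) ** 2


a, b = -1.0, 3.0
bound = (b - a) ** 2 / 4          # since q (1 - q) <= 1/4
for t in (0.0, 0.1, 0.5, 1.0, 2.0):
    # phi(0) = phi'(0) = 0, so Taylor's theorem gives phi(t) <= t^2 (b - a)^2 / 8.
    print(t, phi_second(t, a, b), bound, phi(t, a, b), t ** 2 * (b - a) ** 2 / 8)
```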
Page 371, last line of lemma D.2: $\E[e^{sV} | Z ]$ should read $\E[e^{tV} | Z ]$.
Page 381, first line of section D.1: extra space before the comma should be removed.
Page 364, last line: it should read $(1 - 2t)^{-1/2}$ and not $(1 - 2t)^{1/2}$.
Page 365, first displayed equation: it should read $(1 - 2t)^{-k/2}$ and not $(1 - 2t)^{k/2}$.
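Both exponents follow from a Gaussian integral, assuming these lines concern the moment-generating function of a chi-squared variable. For $X \sim N(0, 1)$ and $t < 1/2$, $\E[e^{tX^2}] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-(1 - 2t)x^2/2}\, dx = (1 - 2t)^{-1/2}$. The moment-generating function of a sum of independent variables is the product of theirs, so a sum of $k$ independent squares has $(1 - 2t)^{-k/2}$. The minimal numerical check below uses the trapezoidal rule, with an arbitrarily chosen truncation width and grid size.

```python
import math


def mgf_square_std_normal(t: float, half_width: float = 40.0, n: int = 20_000) -> float:
    """Trapezoidal estimate of E[exp(t X^2)] for X ~ N(0, 1), valid for t < 1/2."""
    h = 2 * half_width / n
    total = 0.0
    for j in range(n + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, n) else 1.0    # trapezoidal end-point weights
        total += weight * math.exp((t - 0.5) * x * x)  # e^{t x^2} times e^{-x^2/2}
    return total * h / math.sqrt(2 * math.pi)


k = 3
for t in (-1.0, 0.0, 0.25, 0.4):
    one = mgf_square_std_normal(t)
    # numerical vs closed form for one square, then for a sum of k squares
    print(t, one, (1 - 2 * t) ** -0.5, one ** k, (1 - 2 * t) ** (-k / 2))
```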
Page 370, lines 3 and 4 of the proof of Theorem D.1: the factor $\exp(-t \epsilon)$ should be outside the product sign.
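In the usual Chernoff-bounding argument, Markov's inequality is applied once to the whole sum, which produces a single factor $e^{-t\epsilon}$. Independence then factorizes only the expectation. Assuming the proof concerns $S_m = \sum_{i=1}^m X_i$ with independent $X_i \in [a_i, b_i]$ (notation chosen here), the corrected lines have the form:

$$
\Pr\big[S_m - \E[S_m] \geq \epsilon\big]
\leq e^{-t\epsilon}\, \E\big[e^{t(S_m - \E[S_m])}\big]
= e^{-t\epsilon} \prod_{i=1}^m \E\big[e^{t(X_i - \E[X_i])}\big]
\leq e^{-t\epsilon} \prod_{i=1}^m e^{t^2 (b_i - a_i)^2/8}.
$$

Placing the factor inside the product would instead yield $e^{-m t \epsilon}$.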