Mehryar Mohri - Foundations of Machine Learning -- Errata

This page lists errors and typos appearing in the first printing of the first edition of the book Foundations of Machine Learning, together with their corrections. We are grateful to all readers who kindly bring them to our attention.

Page 15, first two paragraphs: \mathsf R_S should be replaced with \mathsf R everywhere except on the 5th line of the first paragraph.
Page 15, lines 7 and 9: "at least $\epsilon/4$" should read "$\epsilon/4$" and similarly $\Pr[r_i] > \epsilon/4$ should read $\Pr[r_i] = \epsilon/4$ in line two of (2.5) in parentheses.
The "at least" wording was in fact originally introduced to correct a technical issue in the proof of Kearns and Vazirani (1994): regions with probability exactly $\epsilon/4$ may not exist in general, unless the measure is assumed to be absolutely continuous. It is not hard to give a proof that holds in all cases, however, and one will be included in the second printing of this book.
Page 15, last line: the inequality should be $\Pr[R(R_S) > \epsilon ] \leq \delta$ (and not $\Pr[R(R_S) > \epsilon ] \leq 1 - \delta$).
Page 19, paragraph following equation (2.10): "the bound guarantees 90% accuracy" (not 99% accuracy).
Page 22, Example 2.6: the second part of the first sentence should read "let our hypothesis be the one that always guesses tails" instead of "let our hypothesis be the one that always guesses heads".
Page 36, line 4 from bottom: "definition 3.2" should be replaced by "definition 3.1".
Page 37, last step of proof of Lemma 3.1: $\R_{S_\cX}(H)$ should read $\widehat \R_{S_\cX}(H)$ (missing hat).
Page 39, line 6 from the bottom: $R$ should be replaced by $r$ in proof of Theorem 3.3.
Page 57, exercise 3.13 should read: "Determine the VC-dimension of the class of hypotheses described by the unions of $k$ halfspaces" (not $k$ intervals).
Page 73, last line: "SVMproblem" should read "SVM problem".
Page 77, Theorem 4.3 and proof: $R$ should be replaced by $r$. Also, the first two expectations in the proof should be preceded by the supremum $\sup_{\| \w \| \leq \Lambda }$.
Page 81, line 4 from the bottom: $\log k = \sqrt{\log \log_2 (1/\rho_k)} \leq \sqrt{\log \log_2 (2/\rho)}$ should read $\sqrt{\log k} = \sqrt{\log \log_2 (1/\rho_k)} \leq \sqrt{\log \log_2 (2/\rho)}$.
Page 84, following exercise 4.1: The "Sparse SVM" problem should appear at the end of the exercise section and be numbered as exercise 4.6.
Page 97, Equation 5.11: $\text{if } (K(x, x) = 0) \wedge (K(x', x') = 0)$ should read $\text{if } (K(x, x) = 0) \vee (K(x', x') = 0)$.
Page 97, Equation 5.12: $\| \x' - \x' \|^2$ should read $\| \x' - \x \|^2$.
Page 102, line 3 from the bottom: "definition 3.2" should be replaced by "definition 3.1".
Page 103, statement of Corollary 5.1: $r = \sup_{x \in \cX} K(x, x)$ should read $r^2 = \sup_{x \in \cX} K(x, x)$.
Page 124, line 8: the equation should read $g_t = \sum_{s = 1}^t \alpha_s h_s$ (and not $f_t = \sum_{s = 1}^t \alpha_t h_t$).
Page 150, statement of Theorem 7.2: "boundfor" should read "bound for" (missing space).
Page 160, line 6 from the bottom: the equation should read $y_t \w_{t+1} \cdot \x_t = y_t \w_t \cdot \x_t + \eta \| \x_t \|^2$ (and not $y_t \w_{t+1} \cdot \x_t = y_t \w_t \cdot y_t \x_t + \eta \| \x_t \|^2$).
Page 176, line 16: "exercise 7.10" should be replaced by "exercises 7.10 and 7.11".
Page 181, exercise 7.10: "margin loss (4.3 def. pag 77)" should be replaced by "soft-margin $(y, y') \in \Rset^2 \mapsto \max\{0, 1 - yy'/\rho \}$".
Page 192, the first constraint $0 \leq \Alpha_i \leq \C$ in the optimization problem at the top of the page should instead read $0 \leq \alpha_{iy_i} \leq C$ and $\alpha_{ij} \leq 0$ for $j \neq y_i$.
Page 196, paragraphs 3 and 4: "convex" should be replaced with "concave".
Page 197, line 7: "see exercise 9.6" should be replaced by "see exercise 8.5".
Page 246, line 4: the last coordinate of $\W$ should be "b" (not "1").
Page 217, line 3 from bottom: the expression should be replaced by $\sqrt{1 - \frac{(\e^+_t - \e^-_t)^2}{(1 - \e^0_t)}}$, which holds by the concavity of the square-root function.
Page 217, line 2 from bottom: the last inequality can be replaced by an equality.
Page 262, line 9 of pseudocode: $\sum_{t = 1}^T \alpha_t K(x_t, \cdot)$ should read $\sum_{t = 1}^T (\alpha'_t - \alpha_t) K(x_t, \cdot)$.
Page 290: "byTenenbaum" should read "by Tenenbaum" (missing space).
Page 341, line 3 from the bottom: missing absolute values in definition, $\| \x \|_\infty = \max_{j \in [1, N]} x_j$ should read instead $\| \x \|_\infty = \max_{j \in [1, N]} |x_j|$.
Page 367, line 1: "two random variables X and Y are independent iff" should read "if two random variables X and Y are independent".
Page 370, lines 3 and 4 of proof of Theorem D.1: the factor $\exp(-t \epsilon)$ should be outside the product sign.
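
Two of the corrections above (page 160 and page 341) can be checked numerically. The sketch below is only an illustration, not part of the book: it assumes the standard Perceptron update $\w_{t+1} = \w_t + \eta y_t \x_t$ with $y_t \in \{-1, +1\}$, and the helper names and vectors are made up for the example.

```python
# Numerical sanity check of two corrected formulas, in pure Python.
# Assumption: Perceptron update w_{t+1} = w_t + eta * y_t * x_t, y_t in {-1, +1}.
# All vectors and helper names below are hypothetical illustration values.

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def perceptron_update(w, x, y, eta):
    """One Perceptron update step under the assumed rule."""
    return [wi + eta * y * xi for wi, xi in zip(w, x)]

def inf_norm(x):
    """Infinity norm with absolute values, as in the page 341 correction."""
    return max(abs(xj) for xj in x)

w_t = [0.5, -1.0, 2.0]
x_t = [1.0, 3.0, -2.0]
y_t = -1
eta = 0.1

# Page 160: y_t w_{t+1}.x_t = y_t w_t.x_t + eta ||x_t||^2 (using y_t^2 = 1).
w_next = perceptron_update(w_t, x_t, y_t, eta)
lhs = y_t * dot(w_next, x_t)
rhs = y_t * dot(w_t, x_t) + eta * dot(x_t, x_t)
identity_gap = abs(lhs - rhs)

# Page 341: without absolute values the maximum misses negative coordinates.
x = [1.0, -3.0, 2.0]
norm_with_abs = inf_norm(x)
norm_without_abs = max(x)
```

Here `identity_gap` is zero up to floating-point rounding, while `norm_with_abs` and `norm_without_abs` differ because the largest coordinate of `x` in magnitude is negative.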