This page lists errors and typos appearing in the first edition,
printing 2, of the book Foundations of Machine Learning, together with
their corrections. We are grateful to all readers who kindly bring
these to our attention.
- Page 19, paragraph following equation (2.10): "the bound guarantees 90% accuracy" (not 99% accuracy).
- Page 44, Example 3.3: "figure 3.2(a)" should read "figure 3.3(a)" and "figure 3.2(b)" should read "figure 3.3(b)".
- Page 80, proof of Theorem 4.4: there is no need to resort to
$\Phi_\rho - 1$, the proof holds directly with $\Phi_\rho$.
- Page 95, first line of the proof: $\Phi(x) \colon {\cal X} \to \Rset$
should read $\Phi(x) \colon {\cal X} \to \Rset^{\cal X}$.
- Page 121, definition 6.1: "$\epsilon > 0$ and" should be removed.
- Page 168, pseudocode of Kernel Perceptron algorithm:
$\alpha_{t + 1}$ should be replaced by $\alpha_t$.
- Page 170, in the inequality for $\Phi_{t + 1} - \Phi_{t}$, the
following two intermediate lines should be inserted just before the
last inequality for more explanation:
$$\begin{aligned}
& = \log \E_{i \sim \w_t}\big[ \exp(\eta y_t x_{t, i} - \eta y_t \w_t \cdot x_{t} + \eta y_t \w_t \cdot x_{t}) \big] - \eta \rho_\infty\\
& \leq \log \big[ \exp(\eta^2 (2 r_\infty)^2/8) \big] + \underbrace{\eta y_t (\w_t \cdot x_{t})}_{\leq 0} - \eta \rho_\infty
\end{aligned}$$
- Page 181, exercise 7.10, second paragraph: the definition of
$m_i$ in the first sentence of that paragraph is given in the special
case of the zero-one loss. For the general case, the sentence should
be replaced by: "Let $m_i$ be the cumulative loss of hypothesis $h_i$
on the points $(x_i, \ldots, x_T)$, that is $m_i = \sum_{t = i}^T
L(h_i(x_t), y_t)$".
- Page 181, exercise 7.10, in the text following the inline equation: $i^* = \operatorname{argmin}_i m_i / (T - i)$ should be replaced by $i^* = \operatorname{argmin}_i m_i / (T - i + 1)$.
- Page 189, third paragraph: $W=(w_1^\top, \ldots, w_k^\top)^\top$ should read $W=(w_1, \ldots, w_k)^\top$.
- Page 190, line 5: the empirical Rademacher complexity symbol should be replaced by that of Rademacher complexity.
- Page 191, equation 8.12: the factor $4k^2$ should be $2k^2$ instead.
- Page 191, optimization problem: the constraints $\xi_i \geq 0$ should be added.
- Page 192, section 8.3.2 first paragraph: "exercise 9.5" should read "exercise 8.4".
- Page 207, exercise 8.4: "family of base hypothesis" should read "family of base hypotheses".
- Page 217, line 3 from bottom: the expression
should be replaced by $\sqrt{1 - \frac{(\e^+_t - \e^-_t)^2}{(1 - \e^0_t)}}$,
which holds by the concavity of the square-root function.
- Page 283, inline after (12.2): $U^\top X X^\top U$ should read $\operatorname{Tr}[U^\top X X^\top U]$.
- Page 370, line 5: $\phi'(t) \leq \frac{(b-a)^2}{4}$ should read
$\phi''(t) \leq\frac{(b-a)^2}{4}$.
- Page 371, last line of lemma D.2: $\E[e^{sV} | Z ]$ should read
$\E[e^{tV} | Z ]$.
- Page 381, first line of section D.1: extra space before the comma should be removed.
- Page 364, last line: it should read $(1 - 2t)^{-1/2}$ and not $(1 - 2t)^{1/2}$.
- Page 365, first displayed equation: it should read $(1 - 2t)^{-k/2}$ and not $(1 - 2t)^{k/2}$.
- Page 370, lines 3 and 4 of proof of Theorem D.1: the factor $\exp(-t \epsilon)$ should be outside the product sign.
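
The sign corrections for pages 364 and 365 can be checked numerically. The sketch below is only an illustration and not part of the book: it assumes the expressions in question are the moment-generating functions $\E[e^{t Z^2}]$ for a standard normal $Z$ and $\E[e^{t X}]$ for a sum $X$ of $k$ independent squared standard normals, both for $t < 1/2$. The function name `mgf_squared_normal` and its parameters `half_width` and `steps` are hypothetical choices made for this example.

```python
import math


def mgf_squared_normal(t, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of E[exp(t * Z^2)] for Z ~ N(0, 1).

    The integrand is exp(t z^2) times the standard normal density,
    that is exp((t - 1/2) z^2) / sqrt(2 pi), integrated over
    [-half_width, half_width]. The truncation is only accurate when
    t is not too close to 1/2.
    """
    if t >= 0.5:
        raise ValueError("the expectation is infinite for t >= 1/2")
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        z = -half_width + (j + 0.5) * h  # midpoint of the j-th cell
        total += math.exp((t - 0.5) * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


t, k = 0.2, 3
numeric_one = mgf_squared_normal(t)
# By independence, the k-term sum has the one-term value raised to the power k.
numeric_k = numeric_one ** k

corrected_one = (1 - 2 * t) ** (-0.5)    # corrected form, page 364
corrected_k = (1 - 2 * t) ** (-k / 2)    # corrected form, page 365
uncorrected_one = (1 - 2 * t) ** 0.5     # form with the wrong sign in the exponent

print(numeric_one, corrected_one, uncorrected_one)
print(numeric_k, corrected_k)
```

Under these assumptions the numerical estimates agree with the corrected negative exponents, while the form with the positive exponent falls below 1 for $t > 0$, which is impossible for the expectation of $e^{t Z^2} \geq 1$.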