Homework 5
Assigned Mon Mar 7, due Tue Mar 22
Nesterov's Optimal Method
In class we derived a lower bound for the complexity of gradient
methods on strongly convex functions; see the last 5 pages of
my notes, which are
taken from pp. 66-68 of
Nesterov's book.
Nesterov then derives several methods that attain this bound, but the
derivations are quite involved, so in class we presented only the
simplest scheme, given on the last page of my notes (see also (2.2.11), p. 81 of his book).
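For orientation, the scheme can be sketched in a few lines. The sketch below is in Python rather than Matlab, and it assumes the constant-momentum form of the method (a gradient step from y_k with stepsize 1/M, followed by extrapolation with fixed coefficient beta = (sqrt(M) - sqrt(m))/(sqrt(M) + sqrt(m))); verify this against the version in the notes before using it. The quadratic at the bottom is only an illustrative stand-in, not one of the assigned examples.

```python
import numpy as np

def gradient_method(grad, x0, t, iters):
    """Ordinary gradient method with fixed stepsize t."""
    x = x0.copy()
    for _ in range(iters):
        x = x - t * grad(x)
    return x

def nesterov_optimal(grad, x0, m, M, iters):
    """Nesterov's simplest optimal scheme in its constant-step form
    (an assumption here; check against the notes): gradient step from
    y_k with t = 1/M, then extrapolation with fixed momentum beta."""
    beta = (np.sqrt(M) - np.sqrt(m)) / (np.sqrt(M) + np.sqrt(m))
    x = x0.copy()
    y = x0.copy()
    for _ in range(iters):
        x_next = y - grad(y) / M          # gradient step with t = 1/M
        y = x_next + beta * (x_next - x)  # extrapolation (momentum) step
        x = x_next
    return x

# Illustrative strongly convex quadratic f(x) = 0.5 x'Ax with m = 1, M = 100.
A = np.diag(np.linspace(1.0, 100.0, 50))
grad = lambda x: A @ x
x0 = np.ones(50)
x_gd = gradient_method(grad, x0, 1.0 / 100.0, 200)
x_nes = nesterov_optimal(grad, x0, 1.0, 100.0, 200)
```

The same `gradient_method` routine serves for both assigned stepsizes, t = 1/M and t = 2/(m+M), by changing the argument `t`.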
This assignment asks you to program this optimal method and compare the
results with those of the ordinary gradient method, using two different
fixed stepsizes: t = 1/M (for which we have the bound
on the first page of my notes, from (9.18) in BV) and t = 2/(m+M) (for which we have
the bound on the second page of my notes, from Theorem 2.1.15 in Nesterov's book),
on three examples:
- the quadratic defined by the Hilbert matrix that you used in HW4,
using the same starting point (the vector of all ones) as in HW4
- the example of Exercise 9.30 (BV page 519) that you used in HW4, using
the same starting point (zero) as in HW4
- Nesterov's worst-case example mentioned above, with the given
starting point (zero) and with the order of the tridiagonal matrix truncated to
n = 10,000, with m = 1
and two values of M: M = 100 and M = 10,000. (Recall that Nesterov uses μ and L for
m and M.) Be sure to store the tridiagonal matrix as a sparse matrix
(type help sparse in Matlab) so that matrix-vector multiplications with it
are efficient.
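In Matlab, sparse (together with spdiags) is the natural tool for this. For concreteness, here is a Python/SciPy sketch of one common form of the truncated worst-case function, assuming f(x) = ((M-m)/8)(x'Ax - 2 x_1) + (m/2)||x||^2 with A the tridiagonal matrix having 2 on the diagonal and -1 off it; its Hessian ((M-m)/4)A + mI then has eigenvalues in [m, M). Check the exact constants against the notes.

```python
import numpy as np
from scipy.sparse import diags

def worst_case(n, m, M):
    """Truncated worst-case function (a sketch; verify the constants
    against the notes).  A is the n x n tridiagonal matrix with 2 on
    the diagonal and -1 off it, stored sparsely so A @ x costs O(n)."""
    A = diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
              offsets=[-1, 0, 1], format="csr")
    e1 = np.zeros(n)
    e1[0] = 1.0
    c = (M - m) / 4.0
    f = lambda x: (c / 2.0) * (x @ (A @ x) - 2 * x[0]) + (m / 2.0) * (x @ x)
    grad = lambda x: c * (A @ x - e1) + m * x
    return f, grad

# e.g. n = 10,000, m = 1, M = 100, starting from x0 = 0:
f, grad = worst_case(10_000, 1.0, 100.0)
x0 = np.zeros(10_000)
```

Because A is stored in sparse (CSR) format, each matrix-vector product, and hence each gradient evaluation, is O(n) rather than O(n^2), which matters at n = 10,000.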
Summarize your results in whatever way seems most convenient.