Assigned Mon Mar 23, due Thu Apr 2 (any time)
Please do not submit homework by email. Submit a hard copy
to me in person, or slide it under my office door later; please
do not leave it in my mailbox.
Ground rules for homework: you can get help from any source (friends,
relatives, books, the web) but you must acknowledge the source
in your submission. This does not mean that the homework can be
done in groups: you must do your own homework, though you may
get help from other sources when you need it. You may not copy
another student's homework solution or another student's computer program.
Generally, homework will not be accepted late unless there is a good reason!
Gradient and Newton Methods
Turn in Matlab code listings as well as answers to the questions below.
Make sure that your codes have plenty of comments explaining what they do.
If anything is not clear, don't hesitate to contact me by email or by
dropping by my office.
- Write a Matlab function to implement the Gradient Method (equivalently,
Steepest Descent in the 2-norm). The first line should be
function [x,f,iters] = gradmeth(fun, x0, tol, maxit), as in the template.
The first input parameter fun is an "anonymous function", which means
that calls to fun inside gradmeth will actually be to some
function whose name is unknown until gradmeth is called.
This is the function that gradmeth is supposed to minimize.
The second input parameter is the starting point, the third is the
tolerance on the norm of the gradient and the fourth is the max number of
iterations to be allowed. The gradmeth code should include the backtracking
line search described on p.464. You can use two nested while loops
(the outer one for the main iteration, the inner one for the backtracking
line search), or you can code the backtracking line search in a separate
function, since we will also need it for Newton's method below. Use α = 1/4, β = 1/2.
Test my template on the simple quadratic function coded
in quad.m by calling the script example.m
and make sure you understand how all this works before replacing the dummy code
in the template by your own code. (If you are using Matlab 6 or earlier, or
a clone that does not support anonymous functions, you can achieve the same
effect with a little more trouble using feval.)
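For concreteness, here is a rough sketch of the intended control flow: an outer iteration with a backtracking inner loop. It is written in Python rather than Matlab purely so the shape is clear (your submission must be Matlab); the assumption that fun returns the pair (f, g) mirrors a Matlab function with two output arguments.

```python
# Illustrative sketch of gradmeth's logic; fun(x) is assumed to return (f, g).
def gradmeth(fun, x0, tol, maxit):
    """Steepest descent with backtracking line search (alpha = 1/4, beta = 1/2)."""
    x = list(x0)
    for iters in range(maxit):
        f, g = fun(x)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 ** 0.5 <= tol:              # stop on small gradient norm
            return x, f, iters
        t = 1.0
        # backtracking: shrink t until the sufficient-decrease condition
        #   f(x - t*g) <= f - alpha * t * ||g||^2   holds
        while fun([xi - t * gi for xi, gi in zip(x, g)])[0] > f - 0.25 * t * gnorm2:
            t *= 0.5
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x, fun(x)[0], maxit

# quick check on a simple quadratic, f(x) = 0.5*x1^2 + x2^2
quad = lambda x: (0.5 * x[0] ** 2 + x[1] ** 2, [x[0], 2.0 * x[1]])
x, f, iters = gradmeth(quad, [1.0, 1.0], 1e-8, 1000)
```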
- Suppose you run your code for 1000 iterations. Let p* denote the
minimum value of fun. By what factor
does the theory that we discussed last lecture (see Section 9.3)
predict that the ratio (fun(x)-p*)/(fun(x0)-p*) will be reduced,
in terms of m and M, the smallest and largest eigenvalues
of the Hessian of fun? How does this compare with what the code
actually computes? You can compute the eigenvalues of the Hessian
by eig(A), and the minimal value p* from evaluating fun
at the exact minimizer -A\b. (Type "help mldivide" at the Matlab
prompt if you are not familiar with the crucial "matrix left divide"
operation.)
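As a reminder, the Section 9.3 bounds have the form f(x_k) - p* ≤ c^k (f(x_0) - p*), with c = 1 - m/M for exact line search and c = 1 - min(2mα, 2βαm/M) for backtracking. A small Python sketch (the function name is made up) of the predicted factor:

```python
# Per-iteration contraction c and the predicted k-step reduction of
# f(x_k) - p*, following the Section 9.3 linear-convergence bounds.
def reduction_factor(m, M, k, alpha=0.25, beta=0.5, exact=False):
    if exact:
        c = 1.0 - m / M                                           # exact line search
    else:
        c = 1.0 - min(2 * m * alpha, 2 * beta * alpha * m / M)    # backtracking
    return c ** k

# Hessian eigenvalues m = 1, M = 2: the exact-line-search bound halves the
# optimality gap every iteration
factor = reduction_factor(1.0, 2.0, 10, exact=True)
```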
- Write another function newtmeth that implements Newton's method in
the same way. Now the function to be minimized must return a third output
argument, the Hessian H. The Newton code must solve a system of
linear equations, which could be done using "\", or "inv", but chol
(Cholesky factorization) is far preferable, as it exploits the symmetry
and positive definiteness of H. Follow the factorization with
two triangular solves using "\" (the implementation of "\" checks whether
the left-hand side is triangular, or even a permuted triangular matrix,
and exploits this with a solve that is only O(n^2) work). Newton's method
minimizes a quadratic function in one step, so, once your code appears
to be working, apply it to the problem in Exercise 9.30 (page 519).
(Ignore the instructions in parts (a) and (b).) Although in principle
writing the code for computing g and H is straightforward,
in practice it is easy to get this wrong. Therefore, check your
derivative formulas against explicit finite difference quotients
(of the function for g, of the gradient for H) before proceeding.
The size of the difference should be in the range 1e-6 to 1e-8; if
you make it too small, rounding errors will make the finite difference
quotients meaningless. For the data defining the problem,
set n to 100, m to 50, and
the vectors ai as the columns of the matrix saved
in Adata.mat (type load Adata at the
prompt). (Note the two different meanings of m in this homework!)
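To make the Cholesky recipe concrete: in Matlab, R = chol(H) returns an upper triangular R with R'*R = H, so the Newton step is d = -(R \ (R' \ g)), i.e. two O(n^2) triangular solves after the factorization. The same idea spelled out in a self-contained Python sketch (the helper names are made up):

```python
# Newton step d solving H d = -g via Cholesky plus two triangular solves,
# assuming H is symmetric positive definite.
def cholesky(A):
    """Lower-triangular L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = ((A[i][i] - s) ** 0.5 if i == j
                       else (A[i][j] - s) / L[j][j])
    return L

def newton_step(H, g):
    """Forward solve L y = -g, then back solve L^T d = y."""
    n = len(g)
    L = cholesky(H)
    y = [0.0] * n
    for i in range(n):                       # forward substitution
        y[i] = (-g[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    d = [0.0] * n
    for i in range(n - 1, -1, -1):           # back substitution with L^T
        d[i] = (y[i] - sum(L[k][i] * d[k] for k in range(i + 1, n))) / L[i][i]
    return d

# for a quadratic 0.5*x'Hx + b'x started at x0 = 0, one Newton step lands
# on the exact minimizer
H = [[4.0, 2.0], [2.0, 3.0]]
b = [1.0, 1.0]
d = newton_step(H, b)
```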
- Using x0=0, how many iterations of Newton's method are required to
minimize this function to approximately the limits of the machine precision
(say what you mean by this)? How does this compare with the
upper bound on the number of Newton iterations from the classical
Newton theory that we discussed last week? You can estimate m and M
by computing them using eig at x0, at the minimizer, and perhaps
at some other points, and you know p* after successfully minimizing
the function. How about the upper bound given by the self-concordance
theory that we will discuss in class today?
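For the self-concordance bound, check the exact statement in Section 9.6; quoting the form from memory, the iteration count is at most ((20 - 8α)/(αβ(1 - 2α)^2))(f(x0) - p*) + log2 log2(1/ε), and with α = 1/4, β = 1/2 the leading constant works out to 576. A throwaway Python sketch of that arithmetic:

```python
import math

# Self-concordance bound on the number of Newton iterations; the formula
# below is quoted from memory, so verify it against Section 9.6:
#   iters <= (20 - 8*alpha) / (alpha*beta*(1 - 2*alpha)**2) * (f(x0) - pstar)
#            + log2(log2(1/eps))
def sc_bound(gap, eps, alpha=0.25, beta=0.5):
    c = (20 - 8 * alpha) / (alpha * beta * (1 - 2 * alpha) ** 2)
    return c * gap + math.log2(math.log2(1.0 / eps))

# optimality gap of 1 with eps at double-precision level; the leading
# constant alone contributes 576
bound = sc_bound(1.0, 2.0 ** -52)
```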
- Using semilogy, plot the norm of the gradient and the Newton
decrement as functions of the iteration count (add output arguments
to newtmeth returning these quantities as vectors so you don't
need to plot inside newtmeth). Use legend, title,
xlabel, ylabel to label the plot nicely.
Do you observe quadratic convergence?
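As a sanity check on what quadratic convergence should look like in such a plot: once the iterates are close, the gradient norm roughly squares from one Newton iteration to the next. A tiny one-dimensional Python illustration (the example function is mine, not part of the assignment):

```python
import math

# Newton's method on f(x) = exp(x) - 2x, whose minimizer is x* = log 2;
# record |f'(x)| each iteration and watch it collapse quadratically.
x = 0.0
gnorms = []
for k in range(6):
    g = math.exp(x) - 2.0      # gradient f'(x)
    H = math.exp(x)            # Hessian f''(x), always positive
    gnorms.append(abs(g))
    x -= g / H                 # full Newton step
```

On a semilogy plot, the recorded gradient norms trace the characteristic steepening curve of quadratic convergence rather than a straight (linear-rate) line.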