Homework 4
Assigned Mon Feb 29, due Thu Mar 10
Gradient and Newton Methods
- Write a Matlab function to implement the Gradient Method (equivalently,
Steepest Descent in the 2-norm). The first line should be
function [x,f,iters] = gradmeth(fun, x0, tol, maxit), as in the
template gradmeth.m.
The first input parameter fun is an "anonymous function": calls to fun
inside gradmeth actually invoke whatever function was passed in, whose name
is not known until gradmeth is called.
This is the function that gradmeth is supposed to minimize;
its implementation must return both the function value f and the gradient g
at a given point x. The second input parameter to gradmeth
is the starting point, the third is the
tolerance on the norm of the gradient, and the fourth is the maximum number of
iterations allowed. You can use a while loop to implement
the main iteration, and code the backtracking line search (see p. 464) in a separate
function, since we will also need it for Newton's method below. Use α=1/4, β=1/2.
Test my template on the simple quadratic function (and starting point) coded
in quad.m by calling the script example.m
and make sure you understand how all this works before replacing the dummy code
in the template by your own code.
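For concreteness, here is a minimal sketch of how gradmeth and a separate
backtracking line search might be organized; the helper name backtrack and the
details of the loop are illustrative choices, not part of the template.

    function [x, f, iters] = gradmeth(fun, x0, tol, maxit)
    % GRADMETH   Gradient method (steepest descent) with backtracking line search.
    % Sketch only: fun returns [f, g] at a given point; stop when norm(g) <= tol
    % or after maxit iterations.
    x = x0;
    [f, g] = fun(x);
    iters = 0;
    while norm(g) > tol && iters < maxit
        d = -g;                              % steepest descent direction
        t = backtrack(fun, x, f, g, d);      % step length from the line search
        x = x + t*d;
        [f, g] = fun(x);
        iters = iters + 1;
    end

    function t = backtrack(fun, x, f, g, d)
    % BACKTRACK   Backtracking line search (p. 464) with alpha = 1/4, beta = 1/2.
    alpha = 0.25;  beta = 0.5;
    t = 1;
    while fun(x + t*d) > f + alpha*t*(g'*d)  % sufficient decrease condition
        t = beta*t;
    end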
- Suppose you run your code for 1000 iterations. Let p* denote the
minimum value of fun. By what factor
does the theory that we discussed in the lecture (see Section 9.3)
predict that the ratio (fun(x)-p*)/(fun(x0)-p*) will be reduced,
in terms of m and M, the smallest and largest eigenvalues
of the Hessian of fun? How does this compare with what the code
actually computes? You can compute the eigenvalues of the Hessian
with eig(A), and the minimal value p* by evaluating fun
at the exact minimizer -A\b. (Type "help \" at the Matlab
prompt if you are not familiar with the crucial "matrix left divide"
operator "\".)
- Write another function newtmeth that implements Newton's method in
the same way. Now the function to be minimized must return a third output
argument, the Hessian H. The Newton code must solve a system of
linear equations, which is most easily done using "\". You should never use inv
to solve a system of equations, because it may introduce unnecessary error and,
especially if the matrix is large and sparse (not the case here), cost far too much.
Matlab checks whether the matrix is symmetric and, if it is,
solves the linear system using Cholesky factorization; if the matrix turns out
not to be positive definite, so that Cholesky breaks down, it then switches to the
LU factorization, but that won't happen in the strictly convex case if you code the
Hessian matrix correctly. You could alternatively explicitly compute the Cholesky
factorization using chol and then solve the system of equations using
forward and backward substitution, but this isn't necessary. Newton's method
minimizes a quadratic function in one step, which gives you an easy first check;
once your code appears to be working, apply it to the problem in Exercise 9.30
(page 519). (Ignore the instructions in parts (a) and (b).) Although in principle
writing the code for computing g and H is straightforward,
in practice it is easy to get this wrong. Therefore, check your
derivative formulas against explicit finite difference quotients
(of the function for g, of the gradient for H) before proceeding.
The size of the difference step should be in the range 1e-6 to 1e-8; if
you make it too small, rounding errors will make the finite difference
quotients meaningless. For the data defining the problem,
set n to 100 and m to 50, and take
the vectors ai to be the columns of the matrix saved
in Adata.mat (type load Adata at the
prompt). (Note the two different meanings of m in this homework!)
NEW: You can define f to be inf (and g, H to be anything; a good
choice is nan*ones(n,1) and nan*ones(n,n)) if x is not in dom f.
Then the line search will reject points that are not in dom f.
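A sketch of newtmeth along these lines, reusing the backtrack helper sketched
earlier (again, an illustration rather than a prescription):

    function [x, f, iters] = newtmeth(fun, x0, tol, maxit)
    % NEWTMETH   Newton's method with backtracking line search (sketch).
    % fun returns [f, g, H], with f = inf (and g, H set to NaNs) outside dom f.
    x = x0;
    [f, g, H] = fun(x);
    iters = 0;
    while norm(g) > tol && iters < maxit
        d = -(H\g);                          % Newton step: solve H*d = -g with "\", not inv
        t = backtrack(fun, x, f, g, d);      % same line search as in gradmeth
        x = x + t*d;
        [f, g, H] = fun(x);
        iters = iters + 1;
    end

And one possible form of the finite-difference check, at some point x in dom f
(for example x0 = 0); the step size h and the loop below are just one way to do it:

    [f, g, H] = fun(x);
    n = length(x);
    h = 1e-7;                                % difference step, in the suggested range
    gfd = zeros(n,1);   Hfd = zeros(n,n);
    for j = 1:n
        e = zeros(n,1);   e(j) = h;
        [fj, gj] = fun(x + e);
        gfd(j)   = (fj - f)/h;               % should approximate g(j)
        Hfd(:,j) = (gj - g)/h;               % should approximate column j of H
    end
    disp([norm(gfd - g)  norm(Hfd - H)])     % both should be small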
- Using x0=0, how many iterations of Newton's method are required to
minimize this function to approximately the limits of the machine precision
(say what you mean by this)? How does this compare with the theory that
we discussed in the lecture (see Section 9.5.3)?
You can estimate m and M by applying eig to the Hessian at x0
and at the minimizer, and perhaps at some other points, and you know p*
after successfully minimizing the function. NEW: Likewise, you can estimate L, the
Lipschitz constant of the Hessian, from the ratio of the norm of H(x)-H(y) to the
norm of x-y for various pairs of points x, y in the domain.
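If it helps, rough numerical estimates along these lines might look as follows,
with xstar denoting the computed minimizer (the variable names are placeholders):

    [~, ~, H0]    = fun(x0);                 % Hessian at the starting point
    [~, ~, Hstar] = fun(xstar);              % Hessian at the computed minimizer
    ev = [eig(H0); eig(Hstar)];
    m_est = min(ev),  M_est = max(ev)        % rough estimates of m and M
    L_est = norm(H0 - Hstar)/norm(x0 - xstar)   % rough estimate of the Lipschitz constant L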
- Using semilogy, plot the norm of the gradient
against the iteration count (add an output argument to newtmeth
that returns all the computed gradient norms so you don't
need to plot inside newtmeth). Use legend, title,
xlabel, ylabel to label the plot nicely.
Do you observe quadratic convergence?
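The plotting might then look something like this, assuming the extra output of
newtmeth is a vector gnorms of gradient norms (the tolerance and iteration limit
below are placeholders):

    [x, f, iters, gnorms] = newtmeth(fun, x0, 1e-10, 50);
    semilogy(0:length(gnorms)-1, gnorms, 'o-')
    xlabel('iteration')
    ylabel('norm of gradient')
    title('Newton''s method on the problem from Exercise 9.30')
    legend('Newton''s method')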
Turn in Matlab code listings as well as answers to the questions above.
Make sure that your codes have plenty of comments explaining what they do.
If anything is not clear, don't hesitate to contact me by email or by
dropping by my office.