G22.2750/G63.2031....... Nonlinear Optimization ....... M. Overton

Homework 1

Assigned: Thu Jan 27 ...... Due: Thu Feb 10

Homework is due at midnight on the date indicated. Homework will be accepted up to one week late with a 20% penalty. Homework is not normally accepted more than a week late. Homework should be given to me or to Celeste in Rm 524, or put under my office door; please do not leave it in my mailbox.
  1. Exercises from the text
    1. Ex 15 on p.30, Chapter 2.
    2. Ex 16 on p.30, Chapter 2.
    3. Ex 7 on p.65, Chapter 3.

  2. The line search algorithm (Algorithms 3.6 and 3.7) of Chapter 3 is, in some respects, the most unappealing algorithm we will see in this course. It takes quite some effort to understand it in detail, and one is left with the lingering feeling that it should be possible to write an algorithm that is just as good but less complicated. This assignment asks you to examine the algorithm carefully and decide for yourself whether it could be improved and whether its complexity is justified.

    1. Carefully review the Matlab codes that I have written. The first two are my implementations of the line search algorithm; these call a third (to do cubic interpolation) and a fourth (to check whether the points generated lie inside an interval). The fifth is a sample test function, and the sixth is a demo script. You should particularly notice, and adopt, the crucial idea of writing a function (not a script, except for the demo) to do a specific job, carefully documenting all input and output parameters, and writing it in a way that will be generally useful later. Notice in particular the use of structures as parameters, so that many things can be collected in the fields of a single structure without creating a complicated parameter list. These fields include the name of the objective function, which is a Matlab string. The objective function is called inside "linesch" and "lszoom" by "feval(pars.fgname, x, pars)" (type "help feval" for more info); a brief sketch of this calling convention appears after this item.

      Either convince yourself that linesch.m and lszoom.m are correct implementations of Algorithms 3.6 and 3.7, or find fault with them, document the fault, and correct it.
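
      As a concrete illustration of the calling convention, here is a minimal sketch. Only pars.fgname and pars.A appear in this assignment; the dimension, the way the matrix is generated, and the exact form of the output arguments are illustrative assumptions, so adjust them to match samplefun.m.

        % Minimal sketch of the structure/feval convention described above.
        n = 5;
        A = randn(n);
        pars.A = A'*A + eye(n);                 % a symmetric positive definite matrix (assumed field)
        pars.fgname = 'samplefun';              % name of the objective function, stored as a string
        x = randn(n,1);
        [f, g] = feval(pars.fgname, x, pars);   % how linesch and lszoom evaluate f and its gradient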

    2. Write and document another Matlab function to implement the steepest descent algorithm for minimizing a function of several variables, using the line search code. The function should have two parameters, "pars" and "options". The "pars" parameter should be a structure defining the objective function, including its name in the string field pars.fgname and any other parameters needed to evaluate the function, just as in "lineschdemo". The "options" parameter should be a structure used to pass options to your steepest descent code, such as "gradnormtol", "maxiter", "wolfeparamc1", "wolfeparamc2", with obvious meanings (and perhaps others too); a skeleton of such a function appears after this item. An important part of the assignment is to write readable, commented code, so that someone else (such as the grader) can look at the code and understand it. Don't forget to indent code appropriately, for example inside a for loop. Look at my codes as examples of (hopefully) good programming practice.

      Test the algorithm on the quadratic function coded in samplefun.m, perhaps with a smaller n. Are the results consistent with what was predicted by the theorem we proved on convergence of steepest descent on a quadratic? Does the line search automatically do an exact line search in one step, and if so why? Note that different calls to lineschdemo will result in different matrices in pars.A and therefore different condition numbers (the ratio of largest to smallest eigenvalue).
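
      A minimal skeleton of such a steepest descent function follows. The option names are the ones suggested above, but the field options.x0 and the calling sequence of linesch shown here are assumptions; adapt them to the input and output parameters actually documented in linesch.m.

        function [x, f, g] = steepdesc(pars, options)
        % STEEPDESC   steepest descent with a Wolfe line search (sketch only)
        %   pars:    structure defining the objective (pars.fgname plus any data fields)
        %   options: structure of options: x0, maxiter, gradnormtol, wolfeparamc1, wolfeparamc2
        x = options.x0;
        [f, g] = feval(pars.fgname, x, pars);
        for iter = 1:options.maxiter
            if norm(g) <= options.gradnormtol    % stop when the gradient is small enough
                break
            end
            d = -g;                              % steepest descent direction
            t = linesch(x, d, pars, options);    % step satisfying the Wolfe conditions
                                                 % (assumed calling sequence: check linesch.m)
            x = x + t*d;
            [f, g] = feval(pars.fgname, x, pars);
        end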

    3. Extend your experiments to two non-quadratic functions, namely
      • Rosenbrock's function (see Exercise 2.1 on p.29; you should implement this as another function file, following the model in samplefun.m, returning both function and gradient values; a sketch of such a file appears after this item)
      • a third function of two or more variables that you make up yourself
      What happens now? Does the choice of Wolfe parameters have much effect? In particular, does choosing the second Wolfe parameter c2 to be small, e.g. 1e-6, reduce the number of steepest descent iterations required to achieve a given accuracy (or, if you prefer, the gradient norm or function value achieved in a fixed number of iterations)? Try a number of different choices of the parameters and compare the results, starting from the same starting point.
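
      For reference, here is a sketch of what a Rosenbrock function file might look like. The formula is the standard one from Exercise 2.1; the calling sequence is assumed to mirror samplefun.m, with pars accepted but unused.

        function [f, g] = rosenbrock(x, pars)
        % ROSENBROCK   Rosenbrock's function and its gradient, for x in R^2.
        %   Follows the [f,g] = fun(x,pars) convention used via feval(pars.fgname,x,pars).
        f = 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
        g = [ -400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1))
               200*(x(2) - x(1)^2) ];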

    4. Is it possible that the cubic interpolation routine might compute the square root of a negative number or divide by 0? What would happen in either case? If you are not sure, try it.
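
      For reference, the standard two-point cubic interpolation formula from the text is written out below on made-up data from a simple quadratic; whether the routine provided computes it in exactly this form is an assumption, so compare with the actual code. The two danger spots are the argument of the square root and the denominator in the last line.

        % Cubic interpolation between a and b, given f and f' at both points.
        % The sample data come from phi(t) = (t - 0.4)^2, whose minimizer is 0.4.
        a = 0;  fa = 0.16;  fpa = -0.8;
        b = 1;  fb = 0.36;  fpb =  1.2;
        d1 = fpa + fpb - 3*(fa - fb)/(a - b);
        d2 = sign(b - a)*sqrt(d1^2 - fpa*fpb);               % sqrt argument could be negative
        t  = b - (b - a)*(fpb + d2 - d1)/(fpb - fpa + 2*d2)  % denominator could be zero; here t = 0.4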

    5. How important is the cubic interpolation? Add an option (this will require editing the line search code) that replaces it by simple bisection, and see if this does the job just as well. Compare the two on a case where you again make the second Wolfe parameter c2 very small, e.g. 1.0e-6. This has the effect of making the line search closely approximate the one-dimensional minimizer along the line at each step of steepest descent. The code with cubic interpolation should converge much more quickly than bisection, and so should require significantly fewer function evaluations per line search. Verify whether or not this is the case.
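
      A sketch of what the added option might look like at the point where the trial step is chosen inside the zoom phase; the bracket endpoint names, the option field, and the name of the cubic routine are all illustrative, not the names used in my codes.

        % Inside the zoom phase, where the next trial step t is chosen.
        if isfield(options, 'interp') && strcmp(options.interp, 'bisect')
            t = 0.5*(tlo + thi);                              % simple bisection of the current bracket
        else
            t = cubicinterp(tlo, flo, fplo, thi, fhi, fphi);  % hypothetical name for the cubic routine
        end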

    6. Add code to print information that tells us the Q-rate of convergence of the cubic interpolation procedure to the one-dimensional minimizer along the line, again forcing the minimizer to be closely approximated by choosing the second Wolfe parameter to be very small. What rate do you observe experimentally?
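
      One way to estimate the Q-order is to compute, for successive errors e(k) = |t(k) - t*|, the ratios log(e(k+1)/e(k)) / log(e(k)/e(k-1)). The data below are made up purely to illustrate the computation; in your code the t(k) would be the trial steps produced by the interpolation and t* the final accepted step.

        % Estimate the order of convergence from a sequence of trial steps t(k).
        t = [1  0.6  0.44  0.4032  0.40002];   % made-up iterates converging to 0.4
        tstar = 0.4;                           % proxy for the exact one-dimensional minimizer
        e = abs(t - tstar);                    % errors
        p = log(e(3:end)./e(2:end-1)) ./ log(e(2:end-1)./e(1:end-2))   % estimated order at each step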

    7. Code a much simpler line search that just starts with a step of one and repeatedly bisects until a reduction in f is achieved, ignoring the Wolfe conditions. Does this work just as well? You may find, to your surprise, that it does. If so, try to cook up examples where it does not.
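
      A minimal sketch of such a line search; the calling sequence is an illustrative assumption and is not the one used by linesch.m.

        function t = halvingls(x, d, f0, pars)
        % HALVINGLS   crude line search: start with t = 1 and halve until f decreases.
        %   Ignores the Wolfe conditions entirely. f0 is the function value at x.
        t = 1;
        while feval(pars.fgname, x + t*d, pars) >= f0
            t = t/2;
            if t < 1e-16       % give up; perhaps d is not a descent direction
                t = 0;
                return
            end
        end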

    Hand in written answers to all these questions along with supporting output from Matlab, selected judiciously. Be sure to include listings of any functions you write, particularly your steepest descent code, but please do not submit a lot of computer output. You may find "diary" to be helpful.