Homework 5. Assigned: Thu Mar 22. Due: Mon Apr 2 at 10 p.m.

  1. Write a Matlab function that does the following:

    Write another function whose input is the page rank vector and a vector of strings called url that specifies the corresponding URLs. Use the built-in function sort to sort the page rank vector and print the corresponding URLs in order from highest to lowest page rank. The second output of sort returns an index vector that can be used to order the URLs.

    Run your program on the adjacency matrices M and URLs url defined in the following Matlab data files (access the data with "load filename"):

    Use tol = 1e-6 and maxit = 1000 in all cases, and run the code for values of alpha equal to 0.95, 0.85, 0.75 and 0.65. For the NYU data sets, print the URLs and their page ranks in sorted order from highest to lowest page rank. Do these vary much for different values of alpha? For all the datasets, plot the number of iterations required against alpha using plot. Do these vary much with alpha? If the number of iterations reaches 1000 iterations without the tolerance being satisfied, there is probably a bug in your program.

  2. As we discussed in class, the rate of convergence depends on the second largest of the absolute values of the eigenvalues of the Google matrix: the closer this is to 1, the slower the convergence. Modifying the power method to compute this eigenvalue is a little beyond the scope of this class, so instead, compute the Google matrix G explicitly for the nyu20.mat data (in contrast to the previous question, where it is important not to compute G explicitly), and compute all the eigenvalues by calling the built-in function eig. Then look at the eigenvalue absolute values using abs (the absolute value of a complex number is the square root of the sum of the squares of the real and imaginary parts). Investigate the truth of the claim in the 4th paragraph of p.11 of the Austin paper: that the second largest of the absolute values of the eigenvalues is alpha. If this claim is true, print output to support it. If it's not, figure out what Austin intended to say instead, printing output to support your claim.

  3. This question demonstrates image compression using low rank approximation of a matrix via the singular value decomposition (SVD). (The SVD is not used for image compression in practice, but it is used in many other applications, such as machine learning.) Write a Matlab function with two input arguments and two output arguments The function must have comments explaining the inputs and outputs and what the function does, which is the following: Furthermore, if the approximate rank input argument is 0, the function should display the singular values in a semilogy plot and prompt the user to enter the desired approximate rank using input. Test your function on one or more gray-scale images from the web or from your personal collection of pictures and

  4. The eigenvector decomposition finds scalars &lambdak and vectors wk such that A wk = &lambdakwk. The singular value decomposition finds scalars &sigmak and vectors uk, vk such that A vk = &sigmakuk. If you compare the eigenvalue decomposition and the SVD for random A you will not see any relationship between them. However, there is a strong relationship if A is symmetric. What is it? Play with eig and svd until you see the relationship and explain it as well as you can.

Submit the homework, including function listings and everything else requested above, either by hard copy left under my office door or as a single pdf file which should be emailed directly to me.

Reading:

  1. How Google Finds Your Needle in the Web's Haystack, by David Austin
  2. Chapter 7 of Ascher-Greif