Homework 6
DUE: WED APR 17, 11:59pm

Write a program that reads an HTML file that I posted on the web at http://cs.nyu.edu/courses/fall11/CSCI-UA.0002-003/declaration.html and determines the frequency of each letter of the alphabet, using a list as your counters (freq = 26*[0] sets all 26 elements of freq to zero). Here are the steps you should follow.

Part A

• Import urllib.request
• In the main() function use
t = urllib.request.urlopen('http://cs.nyu.edu/courses/fall11/CSCI-UA.0002-003/declaration.html')
and then s = t.read() to read the file, where url stands for "uniform resource locator" and t can be any Python variable. Note that although you can see the characters in the file when you click on the file on the course homepage, they are stored in the file as their ASCII codes, so each s[j] is an integer rather than a character.
• Send s as a parameter to function histo(s).
• In histo(s)
• Zero the elements of freq (i.e., freq = 26*[0]).
• Use a for j in range() loop to process the first 500 ASCII codes s[j] in the file. Convert each ASCII code to a character using the chr function. If the character is a letter (.isalpha() tests for this):
• Use the lower() method to convert any uppercase letter to a lowercase one.
• Convert the letter to the int num such that for 'a', num becomes 0; for 'b', it becomes 1, etc. Use an expression with ord() to do this.
• Increment the proper element of freq. So if an 'a' is encountered, freq[0] = freq[0] + 1; if a 'b' is encountered, freq[1] = freq[1] + 1, and in general for the num'th letter, freq[num] = freq[num] + 1.
• At the end of histo print the frequencies for each letter.
• Use histo2.py of APR 2 as a general guide to writing function histo(s) but now the histogramming and printing are done in the same function.
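The counting steps above can be sketched roughly as follows. This is only an illustration, not the required solution: the helper name letter_counts, its limit parameter, and the sample bytes are assumptions made for the example, and it returns the list instead of printing it so the counts are easy to check.

```python
def letter_counts(s, limit=500):
    """Count the letters a-z among the first `limit` byte values of s.

    s is assumed to be a bytes object such as the one returned by t.read().
    """
    freq = 26 * [0]                      # one counter per letter, all zero
    for j in range(min(limit, len(s))):
        ch = chr(s[j])                   # s[j] is an int; chr turns it into a character
        if ch.isalpha():
            num = ord(ch.lower()) - ord('a')   # 'a' -> 0, 'b' -> 1, ...
            if 0 <= num < 26:            # skip accented letters outside a-z
                freq[num] = freq[num] + 1
    return freq


# Made-up sample input standing in for the start of the real file.
sample = b"<html>When in the Course of human events</html>"
freq = letter_counts(sample)
print(freq)
```

Note that the letters inside the mark-up tags are counted too, just as they are for the real file.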

The results should look like the following although the format may be different.
```a: 33aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
b:  3bbb
c: 14cccccccccccccc
d: 14dddddddddddddd
e: 65eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
f:  7fffffff
g:  2gg
h: 33hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
i: 18iiiiiiiiiiiiiiiiii
j:  0
k:  1k
l: 15lllllllllllllll
m: 10mmmmmmmmmm
n: 27nnnnnnnnnnnnnnnnnnnnnnnnnnn
o: 29ooooooooooooooooooooooooooooo
p: 10pppppppppp
q:  3qqq
r: 18rrrrrrrrrrrrrrrrrr
s: 25sssssssssssssssssssssssss
t: 48tttttttttttttttttttttttttttttttttttttttttttttttt
u: 11uuuuuuuuuuu
v:  4vvvv
w:  8wwwwwwww
x:  0
y:  3yyy
z:  0
```
In order to align all the frequencies, use format(freq[j], '3d') in the print statement. It may be concatenated with any other string in the print statement. Note that the numbers of n's and p's are larger than they should be because the html input file contains mark-up characters.
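As a small illustration of the alignment, the sketch below builds one output line by concatenation. The helper name row is hypothetical and the counts passed to it are made up for the example.

```python
def row(letter, count):
    """Build one histogram line, e.g. the letter, a 3-wide count, then the bar."""
    # format(count, '3d') pads the number with spaces to a width of 3,
    # so one-, two-, and three-digit counts all line up.
    return letter + ':' + format(count, '3d') + count * letter


print(row('a', 33))
print(row('b', 3))
print(row('j', 0))
```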

Part B

Run the program for the entire html file. You may use a for c in s: loop. Now, however, because of the volume of data, you must scale the results. Here's how:
• Using the function max(), find the maximum value of the list freq and call that big.
• Form scale = big//60 so that the maximum number of characters in an interval is approximately 60.
• Set mult to 1 if scale is zero, otherwise set it to scale.
• Divide freq[j] by mult, using integer division (//), when it multiplies a letter.
The output should look like:
```
scaling factor is  14
a:477|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
b: 95|bbbbbb
c:184|ccccccccccccc
d:252|dddddddddddddddddd
e:859|eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
f:180|ffffffffffff
g:130|ggggggggg
h:349|hhhhhhhhhhhhhhhhhhhhhhhh
i:449|iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
j: 16|j
k: 14|k
l:228|llllllllllllllll
m:144|mmmmmmmmmm
n:483|nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
o:513|oooooooooooooooooooooooooooooooooooo
p:159|ppppppppppp
q:  6|
r:425|rrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
s:478|ssssssssssssssssssssssssssssssssss
t:639|ttttttttttttttttttttttttttttttttttttttttttttt
u:209|uuuuuuuuuuuuuu
v: 74|vvvvv
w: 97|wwwwww
x:  9|
y: 81|yyyyy
z:  4|
```
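The scaling steps can be sketched as below, under the assumption that freq has already been filled in. The helper name scaled_rows is hypothetical, and the three counts in the example are copied from the sample output above, with every other letter left at zero.

```python
def scaled_rows(freq):
    """Return the scaling factor and the scaled histogram lines for freq."""
    big = max(freq)                      # largest count in the list
    scale = big // 60                    # aim for bars of roughly 60 characters
    mult = 1 if scale == 0 else scale    # avoid dividing by zero for small inputs
    rows = []
    for j in range(26):
        letter = chr(ord('a') + j)
        bar = (freq[j] // mult) * letter   # integer division keeps the repeat count an int
        rows.append(letter + ':' + format(freq[j], '3d') + '|' + bar)
    return mult, rows


freq = 26 * [0]
freq[0], freq[4], freq[16] = 477, 859, 6   # counts for a, e, q from the sample output
mult, rows = scaled_rows(freq)
print('scaling factor is', format(mult, '3d'))
for line in rows:
    print(line)
```

With these counts the scaling factor works out to 859//60 = 14, so a letter that occurs fewer than 14 times, such as q, gets an empty bar.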
Hand in only Part B.