DUE: TUE NOV 27, 11:59pm
Write a program that reads an html file that I posted on the web
and determines the frequencies of each letter of the alphabet by using a list
as your counters (freq = 26*[0]; here all the 26 elements of freq
are set to zero). Here are the steps you should follow.
- Import urllib.request
- In the main() function use t = urllib.request.urlopen(url)
and then s = t.read() to read the file, where url stands for
"uniform resource locator" and t can be any Python variable. Note
that although you can see the characters in the file when you click on the file
on the course homepage, they are stored as their ascii codes in the file.
- Send s as a parameter to function histo(s).
- In histo(s)
- Zero the elements of freq (i.e., freq = 26*[0]).
- Use a for j in range() loop to process the first 500 ascii codes
s[j] in the file. Convert the ascii codes to characters using the
chr function. If the character is a letter ( .isalpha() tests for this):
- Use function lower() to convert any uppercase letter to a lowercase one.
- convert a letter to the int num such that for 'a', num
becomes 0; for 'b', it becomes 1, etc. Use an expression with ord() to do this.
- increment the proper element of freq. So if an 'a' is encountered,
freq[0] = freq[0] + 1; if a 'b' is encountered, freq[1] = freq[1] +
1, and in general for the num'th letter, freq[num] = freq[num] + 1.
- At the end of histo print the frequencies for each letter.
- Use histoe.py of NOV 12 as a general guide to writing function
histo(s) but now the histogramming and printing are done in the same function.
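The steps above might be sketched as follows. This is only a rough sketch under some assumptions, not the required solution: the limit parameter, the url parameter of main(), the returned list, and the sample bytes are all made up for illustration (the address of the file is not given here, so main() is defined but not called). The extra range check on num is also an addition, to skip non-English letters that .isalpha() accepts.

```python
import urllib.request


def histo(s, limit=500):
    """Count the letters among the first `limit` codes of s, print them, return freq."""
    freq = 26 * [0]                          # 26 counters, all zero
    for j in range(min(limit, len(s))):      # s[j] is an int code when s is bytes
        c = chr(s[j]).lower()                # code -> character, uppercase -> lowercase
        if c.isalpha():
            num = ord(c) - ord('a')          # 'a' -> 0, 'b' -> 1, ...
            if 0 <= num < 26:                # assumption: ignore letters outside a-z
                freq[num] = freq[num] + 1
    for j in range(26):
        print(chr(ord('a') + j) + ' ' + format(freq[j], '3d'))
    return freq                              # returned only so the result can be checked


def main(url):
    # url is passed in as a parameter here (an assumption for this sketch)
    t = urllib.request.urlopen(url)
    s = t.read()                             # s is a bytes object
    histo(s)


# Hypothetical stand-in for the downloaded file, so the sketch runs offline
sample = b"<html><p>Hello, World!</p></html>"
sample_freq = histo(sample)
```

Note that the letters inside the tags (html, p) are counted too, which is why mark-up inflates some of the counts.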
The results should look like the following although the format may be different.
In order to align all the frequencies, use format(freq[j], '3d') in the
print statement. It may be concatenated with any other string in the
print statement. Note that the numbers of n's and p's are larger than
they should be because the html input file contains mark-up characters.
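As a small illustration of the alignment, format(value, '3d') right-justifies an integer in a field three characters wide, so the columns line up for counts of one, two, or three digits. The counts below are made up for the example.

```python
freq = [7, 42, 105]          # made-up counts for 'a', 'b', 'c'
lines = []
for j in range(len(freq)):
    # concatenate the letter, a space, and the count padded to width 3
    lines.append(chr(ord('a') + j) + ' ' + format(freq[j], '3d'))
for line in lines:
    print(line)
```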
Run the program for the entire html file. You may use a
for c in s: loop. Now, however, because of the volume of data, you must
scale the results. Here's how:
- Using the function max(), find the maximum value of the list
freq and call that big.
- Form scale = big//60 so that the maximum number of characters in
an interval is approximately 60.
- Set mult to 1 if scale is zero, otherwise set it to scale.
- Divide freq[j] by mult when it multiplies a letter.
The output should look like:
scaling factor is 14
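The scaling steps might be sketched like this. It is a sketch under assumptions: the function name scaled_rows, the exact row layout (letter, space, then the letter repeated), and the counts in the example are all hypothetical, since the expected output is not reproduced in full here.

```python
def scaled_rows(freq):
    """Return (mult, rows) where each row repeats a letter freq[j] // mult times."""
    big = max(freq)                      # largest count in the list
    scale = big // 60                    # aim for about 60 characters in the longest row
    mult = 1 if scale == 0 else scale    # avoid dividing by zero for small inputs
    rows = []
    for j in range(26):
        letter = chr(ord('a') + j)
        rows.append(letter + ' ' + letter * (freq[j] // mult))
    return mult, rows


# Made-up counts: 450 a's and 900 e's
freq = 26 * [0]
freq[0] = 450
freq[4] = 900
mult, rows = scaled_rows(freq)
print('scaling factor is', mult)
for row in rows:
    print(row)
```

Integer division (//) is used throughout so that the repeat count stays an int; a string cannot be multiplied by a float.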
Hand in only part B.