Internet and Intranet Protocols and Applications

Spring 2002

 

Programming Project 5

 

An HTTP Proxy Server

Modification history

Revision 0:        April  4, 2002

Revision 1:        April 13, 2002 [no caching required in this project]

Summary

You will write a partly functional HTTP proxy server (HPS) that responds to HTTP GET requests.

 

This assignment is due by May 1, 2002. Projects submitted after May 1 but on or before May 8 will incur a 10% penalty. Projects submitted AFTER May 8 WILL NOT BE ACCEPTED, and a grade of zero (0) will be given.

 

 

Introduction

Proxy servers are intermediary servers that accept requests from clients and handle each request in one of the following ways:

  1. Forward the request to the origin server (the server specified in the request).
  2. Deliver the requested resource from their own cache.
  3. Pass the request to another proxy server.

You will implement a simple proxy server that responds to GET requests only.  We discussed proxy servers in lecture 8 and HTTP in lecture 7.  However, your proxy server will not be required to implement ANY caching.  You must forward every VALID request to the origin server.

 

General Requirements

o       HPS must handle all error codes returned by system calls and library routines.

o       HPS must avoid deadlock, avoid infinite resource use, and avoid busy waiting.

o       HPS must handle concurrent requests.

 

There may be command line options.

 

Your HPS must use port 8080 as its default HTTP server port.

Detailed Requirements

Your primary specification for the HPS will be the HTTP/1.1 Specification, RFC 2616.

 

Comply with sections 1.1, 1.4, 3.1, 3.2.2, 4.1, 4.2, 4.3, 4.4, 5.*, 6.*, 8.1, 9.3, 10.2.1, 10.4.1, 10.4.5, 10.4.9, 10.5.5, 10.5.6, 14.10, 14.13, 14.23, 14.38, 14.43, 14.45.

 

You may ignore (pass through) any other headers not listed above.  That is, you are not required to check that other headers not in the above list are valid.

Behave like a proxy

Grading program

The HPS talks HTTP. It receives a GET request from the browser, forwards the request to the origin server (OS), receives the HTTP response, and forwards the response to the browser.

 

Concurrently respond to multiple requests from different clients. In particular, suppose request R1 arrives and HPS is forwarding R1 to the OS. HPS MUST be able to receive and completely serve another request R2 before the OS finishes responding to R1.
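One way to meet the R1/R2 requirement is to give every accepted connection its own thread. The sketch below is only an illustration, not a required design: the names ConnectionDispatcher, Handler and serve are made up for this example, and the handler you pass in would contain your own request processing.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper: accepts clients and serves each one on its own thread.
final class ConnectionDispatcher {
    // Hypothetical callback that holds the per-request proxy logic.
    interface Handler {
        void handle(Socket client) throws IOException;
    }

    // Runs until the listener is closed or accept() fails.
    static void serve(ServerSocket listener, Handler handler) {
        while (true) {
            final Socket client;
            try {
                client = listener.accept(); // blocks, so there is no busy waiting
            } catch (IOException e) {
                if (!listener.isClosed()) {
                    System.err.println("accept failed: " + e.getMessage());
                }
                return;
            }
            // A slow origin server stalls only this worker, not the accept loop.
            Thread worker = new Thread(() -> {
                try (Socket c = client) { // always release the socket
                    handler.handle(c);
                } catch (IOException e) {
                    System.err.println("request failed: " + e.getMessage());
                }
            });
            worker.start();
        }
    }
}
```

A call such as ConnectionDispatcher.serve(new ServerSocket(8080), yourHandler) would start the loop. Note that this sketch creates threads without limit; to respect the "avoid infinite resource use" requirement you would still need some cap on the number of simultaneous workers.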

 

Process (for syntactic correctness) the headers described in sections 14.10, 14.13, 14.23, 14.38, 14.43, 14.45.  You must ignore all headers OTHER than these, but you MUST pass them on.

 

You must insert (or append to) the Via header in requests and responses (see Section Details below).

 

NOTE: The standard (default) port number for HTTP servers is 80. However, the grading program is required to listen for requests on port number 8080. Therefore, your program MUST use port number 8080 as the default HTTP port number. That is, unless the URI specifies a port number, you must use 8080; if the URI includes a port number, then you MUST use THAT port number.
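The port rule above can be sketched as follows. This is a minimal illustration, assuming you have already isolated the "host" or "host:port" part of the URI; the class name OriginAddress and its parse method are hypothetical, not part of any required interface.

```java
// Hypothetical value class holding the origin server's host and port.
final class OriginAddress {
    static final int DEFAULT_PORT = 8080; // assignment default, not the usual 80

    final String host;
    final int port;

    private OriginAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Splits "host" or "host:port". Throws IllegalArgumentException on bad input.
    static OriginAddress parse(String authority) {
        int colon = authority.indexOf(':');
        String host = colon < 0 ? authority : authority.substring(0, colon);
        String portText = colon < 0 ? "" : authority.substring(colon + 1);
        if (host.isEmpty()) {
            throw new IllegalArgumentException("missing host");
        }
        if (portText.isEmpty()) {
            // No port (or an empty one) in the URI: fall back to the default.
            return new OriginAddress(host, DEFAULT_PORT);
        }
        if (!portText.matches("[0-9]{1,5}")) {
            throw new IllegalArgumentException("bad port: " + portText);
        }
        int port = Integer.parseInt(portText);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return new OriginAddress(host, port);
    }
}
```

For example, OriginAddress.parse("192.0.2.7:8000") yields port 8000, while OriginAddress.parse("www.example.com") yields port 8080.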

Section Details

3.2.2

Although the spec says "The use of IP addresses in URLs SHOULD be avoided whenever possible", the HPS MUST support a host that is an IP address. Ignore the last 2 sentences of the section.

4.4

To determine the length of a message body, support these techniques: the Content-Length header, and the server closing the connection. You will never have to support the self-delimiting "multipart/byteranges" technique.
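The two techniques might look like this in code. This is a sketch under the assumption that the headers have already been consumed from the stream and that the Content-Length value (or null if the header was absent) is passed in as text; BodyReader and readBody are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that reads a message body using either technique.
final class BodyReader {
    // contentLength is the header value, or null when the header is absent.
    // A non-numeric value makes Long.parseLong throw NumberFormatException.
    static byte[] readBody(InputStream in, String contentLength) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        if (contentLength != null) {
            // Technique 1: read exactly Content-Length bytes.
            long remaining = Long.parseLong(contentLength.trim());
            if (remaining < 0) {
                throw new IOException("negative Content-Length");
            }
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new EOFException("connection closed before body was complete");
                }
                body.write(buf, 0, n);
                remaining -= n;
            }
        } else {
            // Technique 2: the body ends when the server closes the connection.
            int n;
            while ((n = in.read(buf)) >= 0) {
                body.write(buf, 0, n);
            }
        }
        return body.toByteArray();
    }
}
```

Buffering the whole body in memory keeps the example short; a real proxy may prefer to copy each chunk to the client as it arrives.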

5.1.2

Assume Request-URI = absoluteURI | abs_path.

Change

In order to avoid request loops, a proxy MUST be able to recognize all of its server names, including any aliases, local variations, and the numeric IP address.

To

In order to avoid request loops, a proxy MUST be able to recognize its server name and its numeric IP address.

HPS's client does not know HPS exists. Therefore, the request is formatted for a standard web server. The web server is actually HPS, which behaves 'like' a proxy, but is transparent to the client. Thus, for this assignment an abs_path is a legal Request-URI.
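Since both forms of Request-URI are legal here, the HPS has to work out the origin server differently in each case. The sketch below assumes that an abs_path request names its origin server in the Host header (section 14.23) and that only the http scheme needs to be handled; RequestTarget and resolve are hypothetical names.

```java
// Hypothetical value class: where to send the request, and what path to ask for.
final class RequestTarget {
    final String authority; // "host" or "host:port" of the origin server
    final String path;      // abs_path to send to the origin server

    private RequestTarget(String authority, String path) {
        this.authority = authority;
        this.path = path;
    }

    // hostHeader is the Host header value, or null when the header is absent.
    // Throws IllegalArgumentException for input this sketch cannot resolve.
    static RequestTarget resolve(String requestUri, String hostHeader) {
        if (requestUri.regionMatches(true, 0, "http://", 0, 7)) {
            // absoluteURI: the host comes from the URI itself.
            String rest = requestUri.substring(7);
            int slash = rest.indexOf('/');
            String authority = slash < 0 ? rest : rest.substring(0, slash);
            String path = slash < 0 ? "/" : rest.substring(slash);
            if (authority.isEmpty()) {
                throw new IllegalArgumentException("absoluteURI without a host");
            }
            return new RequestTarget(authority, path);
        }
        if (requestUri.startsWith("/")) {
            // abs_path: the host must come from the Host header.
            if (hostHeader == null || hostHeader.trim().isEmpty()) {
                throw new IllegalArgumentException("abs_path without a Host header");
            }
            return new RequestTarget(hostHeader.trim(), requestUri);
        }
        throw new IllegalArgumentException("unsupported Request-URI: " + requestUri);
    }
}
```

A caller would typically translate the IllegalArgumentException into a 400 response.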

9.3

Assume that HPS will receive only GET requests.

10.4.1

Do not forward syntactically bad HTTP requests. Detect them and return status code 400.
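As an illustration of the kind of check intended, the sketch below tests only the Request-Line (section 5.1) and builds a minimal 400 response. It assumes the trailing CRLF has already been stripped and is deliberately strict about single spaces; a complete HPS would also check the headers listed earlier. RequestLineCheck and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper for rejecting malformed Request-Lines.
final class RequestLineCheck {
    // "token" characters as defined in RFC 2616 section 2.2.
    private static final Pattern TOKEN =
            Pattern.compile("[!#$%&'*+.^_`|~0-9A-Za-z-]+");
    // HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
    private static final Pattern VERSION =
            Pattern.compile("HTTP/[0-9]+\\.[0-9]+");
    // Printable ASCII with no spaces; a coarse check on the Request-URI.
    private static final Pattern URI = Pattern.compile("[!-~]+");

    // Expects the line without its trailing CRLF.
    static boolean isValid(String requestLine) {
        // Request-Line = Method SP Request-URI SP HTTP-Version
        String[] parts = requestLine.split(" ", -1);
        return parts.length == 3
                && TOKEN.matcher(parts[0]).matches()
                && URI.matcher(parts[1]).matches()
                && VERSION.matcher(parts[2]).matches();
    }

    // Minimal 400 response with an empty body.
    static String badRequestResponse() {
        return "HTTP/1.1 400 Bad Request\r\n"
                + "Content-Length: 0\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }
}
```

When isValid returns false, the HPS would write badRequestResponse() to the client and close the connection without contacting the origin server.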

10.4.5

HPS MUST detect requests for HPS’s server name or numeric IP address. This will avoid request loops. Return status code 400 in this case. (A clearer error would be nice, but the HTTP/1.1 Spec doesn't provide one.)
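A loop check of this kind can be sketched as below. The example assumes that InetAddress.getLocalHost() reports the name and address under which the HPS is reachable, which may not hold on every machine, and it compares host names only, as the text above describes. LoopGuard and its methods are hypothetical names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper that recognizes requests aimed at the HPS itself.
final class LoopGuard {
    // Collects this machine's host name (lower-cased) and numeric IP address.
    static Set<String> localNames() throws UnknownHostException {
        InetAddress self = InetAddress.getLocalHost();
        Set<String> names = new HashSet<>();
        names.add(self.getHostName().toLowerCase(Locale.ROOT));
        names.add(self.getHostAddress());
        return names;
    }

    // Host names are case-insensitive, so compare in lower case.
    static boolean targetsSelf(String targetHost, Set<String> ownNames) {
        return ownNames.contains(targetHost.toLowerCase(Locale.ROOT));
    }
}
```

The HPS could call localNames() once at startup and then test each request's host with targetsSelf before opening a connection to the origin server.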

14.45

via = received-by
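With this simplified grammar, inserting or appending the Via header reduces to a small string operation. The sketch assumes received-by is written as the HPS's "host:port" and that an existing Via value is passed in as text (or null if the message had none); ViaHeader and append are hypothetical names.

```java
// Hypothetical helper that produces the Via value to send onward.
final class ViaHeader {
    // existingValue is the current Via header value, or null when absent.
    static String append(String existingValue, String receivedBy) {
        if (existingValue == null || existingValue.trim().isEmpty()) {
            // No Via header yet: insert one naming this proxy.
            return receivedBy;
        }
        // Via is a comma-separated list; this proxy goes last.
        return existingValue.trim() + ", " + receivedBy;
    }
}
```

For example, ViaHeader.append("gw.example.com:3128", "proxy.example.edu:8080") returns "gw.example.com:3128, proxy.example.edu:8080".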

 

 

Implementation

Advice

Start early. The semester will end in 3 weeks.

Programming Language

You MUST write your HPS in the Java language. You are not permitted to use these classes in the java.net package: HttpURLConnection, URLConnection, URLDecoder, URLEncoder, or URLStreamHandler. You may use (at your own risk) the URL class to parse URLs. Be warned, though, that you are responsible for the consequences that arise due to any bugs in this class.

Software tools

Write your server against the Java Socket class network abstraction.

Grading

You will help us grade your HPS by having it communicate via TCP/IP with a grading program (GP). You will start GP from your browser. You will need to tell GP which machine and port number your HPS listens on. The grading program will access your HPS and check its operation.

 

While being graded your server must run on a machine that can be accessed by a grading program running at NYU. This means it cannot be behind a firewall.

 

The Project 5 Grading Program is HERE

 

Using the HPS Grading Program

 

The grading program is a server (servlets, actually) that issues a series of HTTP GET requests to your HPS.  That is, it pretends at this point to be a browser and assumes that your HPS is an HTTP server (remember the "transparency" property of proxies).  The grading program knows the IP address and port because you enter them on the Grading web page (an HTML form).

 

The URLs that you process in these GET requests direct you to the server where the test servlets are running.  So, we are pretending to be the origin server as well.  Our test servlet gives you some information in response to your request, and you then return that information to the requesting servlet (the originator of the GET request).  We check that what we gave you in the response is what you in turn give back to the requesting servlet.

 

There are 20 individual tests, each of which tests a different requirement.  Each test is worth 5% of the score for this project.

You can select a specific test by marking its select box.   You can choose all tests with one click by selecting "ALL".

 

The results of each test will appear as output on your browser screen.  Some tests take longer than others, so be patient.

 

If a test fails, don't ask me why it failed; this is your job. You'll have to look at the test output and read the corresponding part of RFC 2616 CAREFULLY.

 

We will grade your submission using this very same tester.

 

What to Pass In

  1. A text file copy of the output from the grading program.
  2. Your source code
  3. Instructions on how to compile and run your HPS

 

Place these files into a jar file and email the jar file to ME (jconron@cs.nyu.edu).

Name your jar file  <sid>.jar where <sid> is your student ID.

 

REMEMBER to put your name and student ID in the subject line of your email as follows:

 

Project 5  <name>:<sid>

 

 

Computing

You may run HPS on any machine (except that you MUST NOT run HPS on the main CS department server, sparky.cs.nyu.edu).

 

The department has machines named courses[1-5].cs.nyu.edu which you might want to use, as they might be lightly loaded. These are sitting in the server room, so they're remote access only.

There are also 5 Sun Ultra10's in room 505, which are for console use, but they can certainly handle additional cycles from remote users. These are pubsun[1-5].cs.nyu.edu. Of course, since they are accessible to people, there is no guarantee that someone won't hit the power switch at any given moment, despite the signs telling people not to.

Assignment Updates

You must subscribe to and read the class email. Announcements I make on the list become official parts of this assignment. I will refrain from making any announcements less than a few days before an assignment is due.