The choice of project is up to you, as long as it satisfies the following
requirements:
- The project is to develop a system that answers some kind of user query
using content on the web.
Some Examples of Possible Projects.
- The type and form of query (keyword, Boolean, menu-driven, etc.)
is up to you, except that a very large number of queries should be within
the query language (far too many to hand-code all the answers).
- The interface should be web-based; you should set up a web page
that runs the program. Keep the interface as simple as possible, consistent
with the purpose of the project. I do not want you spending time adding
bells and whistles to the interface, even if cool, unless they really
increase the functionality.
- The system should be usable by a naive user. What the system does,
and how it works, should be sufficiently explained by a very brief description
on the web interface page itself.
- How and when you collect pages and process information is up to you.
For example:
  - Crawl at query time.
  - Issue a Google query at query time.
  - Collect pages from pre-determined sites at query time.
  - Do any of the above off-line.
- You may even do manual processing on information collected off-line,
as long as the ratio of manual work to computer work is plausibly small.
- No manual processing may be done by the programmer at query time
(the program is supposed to work while the programmer is asleep).
However, the program is allowed to require interaction with the user.
- You may restrict the category of web pages used in any way you want
(e.g. only HTML pages; only JPEG files; only NYU.EDU pages; only
Wikipedia pages) except that you may not restrict it to highly
structured pages (e.g. databases). You may combine information from a structured
source with information from a non-structured source, as
long as you can make the case that the non-structured information is
being used in an important way.
Similarly, the project should not involve getting information that is
presented in a very predictable and systematic way, even in a non-structured
format. For instance, "Find all the movies that a specified actor has
acted in" is not an acceptable project, because it can easily be done very
reliably by finding the actor's page on www.imdb.com and parsing it.
In general, I would rather see an unsuccessful, imaginative approach
to an open-ended or impossible problem than a successful, pedestrian
approach to an easy problem.
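
To make one of the collection options above concrete, here is a minimal sketch of "collect pages from pre-determined sites at query time". It is only an illustration under stated assumptions, not a recommended design: the function names (`fetch_html`, `page_text`, `score`, `answer_query`) are hypothetical, the ranking is a plain keyword count, and the `fetch` parameter exists only so the logic can be tried without network access.

```python
import re
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fetch_html(url, timeout=10):
    """Download one page. This is the only step that touches the network."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def page_text(html):
    """Reduce an HTML document to whitespace-normalized visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def score(query, text):
    """Count occurrences of the query's keywords in the text (an assumed, naive measure)."""
    keywords = set(re.findall(r"[a-z0-9]+", query.lower()))
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for word in words if word in keywords)


def answer_query(query, site_urls, fetch=fetch_html):
    """Fetch each pre-determined URL at query time and rank pages by keyword score."""
    ranked = []
    for url in site_urls:
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that cannot be retrieved
        page_score = score(query, page_text(html))
        if page_score > 0:
            ranked.append((page_score, url))
    # Highest score first; ties are broken alphabetically by URL.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in ranked]
```

A real project would replace the keyword count with whatever processing its query language calls for; the point of the sketch is only that everything runs without the programmer present.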
Public software and external resources
You may use any open-source software you want, and any web resources
that you want, and I encourage you to, but they must all be credited
in your write-up and in a credits page linked to your query page.
Web pages from which content has been extracted
should be cited, by title and URL, on the results page.
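
As an illustration of citing sources by title and URL on the results page, the following sketch builds an HTML list of citations from the pages a result was drawn from. The helper names (`page_title`, `citation_list`) and the output layout are assumptions made for this example; any format that shows both title and URL would serve.

```python
from html import escape
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Captures the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def page_title(html, fallback="(untitled page)"):
    """Return the page's title with whitespace normalized, or a fallback."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or fallback


def citation_list(sources):
    """Build an HTML <ul> citing each source once, by title and URL.

    `sources` is an iterable of (url, html) pairs for the pages used.
    """
    items = []
    seen = set()
    for url, html in sources:
        if url in seen:
            continue  # cite each page only once
        seen.add(url)
        title = escape(page_title(html))
        href = escape(url, quote=True)
        items.append(f'<li><a href="{href}">{title}</a> ({href})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Escaping both the title and the URL keeps text taken from arbitrary web pages from breaking the markup of the results page.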
You may do a group project, if you want. In that case, I expect a project
that reflects a proportionally greater amount of work. All members of the
group get the same grade on the project. After the project proposal has been
accepted, I will consider changes to the membership of the group only if
I have unanimous agreement from all the members of the group. Proposals and
final projects should be submitted by one member of the group; the other
members should send email confirming their participation.
If you want to use the same project for this class and another class, you may
request this when you submit your project proposal. The other instructor
must also be consulted, and must approve it, at that time. Again, the project
will have to be proportionally larger. Such requests will be considered
case by case.
Evaluation is not required but is certainly welcome, if the nature of the
project allows it.
The project proposal is due Wednesday, March 23. It should contain:
- If a group project, the members of the group.
- Objective of the project.
- Sketch of the architecture. Since the sketch will probably be
over-ambitious, you should state the order in which you will work on
various parts of the system.
- List of external software to be used.
- List of web resources to be used.
- Other pertinent system features (e.g. the programming language to be used).
None of this, except the membership list, constitutes a commitment on your
part, though if you make a major change after the proposal is accepted, please
consult with me. I just want to be sure that by mid-semester you have
started seriously thinking about the project. The proposal will not
be graded; HOWEVER, IT MUST CONFORM TO ACADEMIC STANDARDS REGARDING CITING
WORK THAT YOU HAVE USED.
The project is due Wednesday, May 4. You should email to me and to the
- The URL of the page where your project is running. No credit will
be given for a project that does not run.
- The source code for the project. The code should be executable on
the departmental Linux machines.
- A README file with instructions for compiling and running the project.
- A write-up of the project. This should include the same material as
the proposal (brought up to date) and also:
  - Some examples where the project succeeds (if any) and where it fails.
  - (Optional) Discussion of interesting or unexpected issues met with in
  the project, whether overcome or not.
  - (Optional) Systematic evaluation of the project.
  - (Optional) Discussion of what you would do next to improve the system,
  if you were going to work on it more.
Again, as with all academic projects, CITATIONS MUST BE GIVEN FOR MATERIALS,
SOFTWARE, RESOURCES, IDEAS, TEXT, ETC., AND ACKNOWLEDGEMENTS MUST BE MADE
FOR ANY ASSISTANCE FROM ANY PERSON (except myself). FAILURE
TO DO SO WILL BE CONSIDERED PLAGIARISM AND CHEATING AND WILL RESULT IN
A GRADE OF "F" FOR THE COURSE.