Data Mining
G22.3033-002
Dr. Jean-Claude Franchitti
Computer Science Department
Courant Institute of Mathematical Sciences
Session 4: Proposal Sample
Course Title: Data Mining Course Number: G22.3033-002
Instructor: Jean-Claude Franchitti Session: 4
Title of Project
Group Member 1, Group Member 2
Abstract
The abstract should be one paragraph that
summarizes what you will do for your project.
Introduction
Provide a brief overview of data mining.
Describe what your proposal is about and how the rest of the proposal is
organized. State whether you will be performing data mining tasks,
implementing a new algorithm in Weka (or another data mining tool),
modifying some other system to incorporate data mining features, or doing
something else. In short, make the nature of your project clear. This section
should be a page or less in length.
Data Mining Task
State the specific tasks you will perform on
the data set. Include the specific questions you will investigate and the goals
of each task. Describe the tasks independently of the specific techniques you
will use to achieve your goals. This section should be a page or less.
Data Set
Describe the data set(s) you will be using in
your project. Include the origin of the data set, an overview of the data set
organization, attributes of the data, and challenges of the data set you've
selected. Include any information you have about missing values in the data
set. This should be one to two pages in length.
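As an illustration of how you might gather the missing-value information, here is a minimal Python sketch that counts missing entries per attribute. The sample data, the attribute names, and the set of markers treated as missing are assumptions made for this example; "?" is the marker Weka's ARFF format uses, but your data set may follow a different convention.

```python
import csv
import io
from collections import Counter

# Assumption: these strings denote a missing value. Adjust for your data set.
MISSING_MARKERS = {"", "?", "NA"}

def missing_value_report(rows, markers=MISSING_MARKERS):
    """Return {attribute: (missing_count, missing_fraction)} for dict rows."""
    counts = Counter()
    attributes = []
    total = 0
    for row in rows:
        total += 1
        if not attributes:
            attributes = list(row.keys())
        for name, value in row.items():
            # csv.DictReader yields None for fields absent from a short row.
            if value is None or value.strip() in markers:
                counts[name] += 1
    return {
        name: (counts[name], counts[name] / total if total else 0.0)
        for name in attributes
    }

# Hypothetical sample data, used only to make the sketch runnable.
SAMPLE_CSV = """age,income,class
34,52000,yes
?,48000,no
51,,yes
29,?,no
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = missing_value_report(rows)
for name, (count, fraction) in report.items():
    print(f"{name}: {count} missing ({fraction:.0%})")
```

A per-attribute summary like this can go directly into the Data Set section and helps justify any imputation or attribute removal you propose later.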
Methods and Models
Describe in detail the data mining methods and
models you plan to employ to achieve the goals you set in the Data Mining
Task section of your document. Include some mention of necessary data
transformation. If you're implementing a technique, you should have some idea
of how it will be implemented and incorporated into Weka (or some other data
mining tool). If you are combining techniques, explain how you intend to use
the output of one technique as input into another technique. Be detailed: for
example, explain how you will select the best model from the model space. This
section should be up to five pages in length.
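To make the ideas of chaining techniques and selecting a model concrete, here is a minimal Python sketch. It assumes a min-max normalization step whose output feeds a k-nearest-neighbor classifier, and it picks the value of k with the highest accuracy on a held-out validation set. The techniques, the toy data, and all function names are illustrative assumptions, not a prescribed method; your project may use entirely different ones.

```python
import math

def fit_min_max(rows):
    """Learn per-attribute (min, max) ranges from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, ranges):
    """Rescale attributes using ranges learned on the training set."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # Sorting the labels first makes tie-breaking deterministic.
    return max(sorted(votes), key=votes.get)

def accuracy(train_x, train_y, test_x, test_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

# Hypothetical toy data: two numeric attributes on very different scales.
train_x = [[1, 100], [2, 120], [3, 110], [8, 900], [9, 950], [10, 880]]
train_y = ["low", "low", "low", "high", "high", "high"]
valid_x = [[2, 105], [9, 910], [4, 200], [7, 800]]
valid_y = ["low", "high", "low", "high"]

# Step 1: the transformation is fitted on training data and applied to both sets.
ranges = fit_min_max(train_x)
train_scaled = apply_min_max(train_x, ranges)
valid_scaled = apply_min_max(valid_x, ranges)

# Step 2: the transformed data is the classifier's input; select k on validation data.
candidates = (1, 3, 5)
scores = {k: accuracy(train_scaled, train_y, valid_scaled, valid_y, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
print("validation accuracy per k:", scores, "selected k:", best_k)
```

Note that the normalization ranges are learned from the training rows alone, so no information from the validation set leaks into the transformation. Your proposal should state this kind of detail explicitly.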
Assessment
Discuss the assessment methodology you will use
to validate that you have found meaningful patterns. Will you use n-fold
cross-validation, confidence intervals for accuracy, or other techniques?
How will you create your training and test sets? What baseline models will you
use? This section should be one to two pages in length.
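The questions above can be sketched in a few lines of Python. The example below assumes a shuffled k-fold split, a majority-class baseline, and a normal-approximation (Wald) confidence interval for accuracy. The class distribution and function names are hypothetical, and the interval treats the pooled cross-validation predictions as independent test instances, which is only an approximation.

```python
import math
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def majority_baseline(train_labels):
    """Baseline model: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]

def cross_validated_accuracy(labels, k=5, seed=0):
    """Accuracy of the majority baseline, pooled over all k test folds."""
    folds = k_fold_indices(len(labels), k, seed)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        prediction = majority_baseline(train)
        correct += sum(labels[i] == prediction for i in test_idx)
    return correct / len(labels)

def accuracy_confidence_interval(acc, n, z=1.96):
    """Normal-approximation interval for accuracy measured on n instances."""
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)

# Hypothetical class distribution: 70 positive and 30 negative instances.
labels = ["yes"] * 70 + ["no"] * 30
folds = k_fold_indices(len(labels), 10)
acc = cross_validated_accuracy(labels, k=10)
low, high = accuracy_confidence_interval(acc, len(labels))
print(f"baseline accuracy: {acc:.2f}, 95% interval: ({low:.3f}, {high:.3f})")
```

Any model you propose should beat a baseline of this kind by more than the width of the interval before you claim it has found a meaningful pattern.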
Presentation and Visualization
Describe how your results will be presented and
visualized in a way that shows the meaningful patterns in the data. This
section should be up to a page in length.
Roles
In this section, discuss the roles that each
group member will have in the project. One paragraph per group member is
sufficient.
Schedule
The schedule is a table of dates and the tasks
you plan to complete by those dates. Tasks to be done by the progress report
must be listed, along with any other dates you want to set for yourselves.
Additional deadlines are highly recommended. Be sure to state when you
will have data transformation, modeling, assessment, visualization, etc.
completed.
| Date     | Tasks to be Completed                             |
| ??/??/10 | Tasks completed by chosen date                    |
| ??/??/10 | Tasks to be completed by the progress report date |
| ??/??/10 | Tasks completed by the class presentation         |
Bibliography
List the bibliographic information for every
reference you cite in the proposal. A well-researched proposal should cite
many references.