Data Mining

 

G22.3033-002

 

Dr. Jean-Claude Franchitti

New York University

Computer Science Department

Courant Institute of Mathematical Sciences

 

Session 4: Proposal Sample

 

 

Course Title: Data Mining
Course Number: G22.3033-002
Instructor: Jean-Claude Franchitti
Session: 4

 

Title of Project
Group Member 1, Group Member 2

 

Abstract

The abstract should be one paragraph that summarizes what you will do for your project.

Introduction

Provide a brief overview of data mining. Describe what your proposal is about and how the rest of the proposal is organized. State the nature of your project: whether you will perform data mining tasks, implement a new algorithm in Weka (or another data mining tool), or modify an existing system to incorporate data mining features. This section should be a page or less in length.

Data Mining Task

Describe the specific tasks you will perform on the data set, including the questions you will investigate and the goals of each task. Keep this description independent of the specific techniques you will use to achieve your goals. This section should be a page or less in length.

Data Set

Describe the data set(s) you will be using in your project. Include the origin of the data set, an overview of how it is organized, the attributes of the data, and the challenges posed by the data set you have selected. Include any information you have about missing values in the data set. This section should be one to two pages in length.
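One way to gather the missing-value information requested above is to count missing entries per attribute. The following is a minimal sketch, not a required method. It assumes the records are already loaded as Python dictionaries and that a missing value is marked with "?" (the marker used by Weka's ARFF format) or None. The function name and the sample records are hypothetical and for illustration only.

```python
from collections import Counter

# Assumed marker for a missing value; Weka's ARFF format uses "?".
MISSING = "?"


def missing_value_counts(records, missing=MISSING):
    """Count missing entries per attribute in a list of dict records."""
    counts = Counter()
    for record in records:
        for attribute, value in record.items():
            if value is None or value == missing:
                counts[attribute] += 1
    return dict(counts)


# Made-up illustrative records, not real data.
records = [
    {"age": 34, "income": "?", "class": "yes"},
    {"age": "?", "income": 52000, "class": "no"},
    {"age": 45, "income": "?", "class": "yes"},
]

summary = missing_value_counts(records)
for attribute, count in sorted(summary.items()):
    print(f"{attribute}: {count} missing of {len(records)}")
```

A per-attribute summary like this can be reported directly in the Data Set section and can inform the transformation choices described under Methods and Models.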

Methods and Models

Describe in detail the data mining methods and models you plan to employ to achieve the goals set out in the Data Mining Task section. Describe any necessary data transformation. If you are implementing a technique, explain how it will be implemented and incorporated into Weka (or another data mining tool). If you are combining techniques, explain how you intend to use the output of one technique as input to another. Be detailed: for example, explain how you will select the best model from the model space. This section should be up to 5 pages in length.
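As an illustration of the level of detail expected for a data transformation step, the sketch below rescales a numeric attribute to the range [0, 1] (min-max normalization). This is only one possible transformation, chosen here as an example; the function name and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up attribute values for illustration.
ages = [10, 20, 30]
normalized_ages = min_max_normalize(ages)
print(normalized_ages)
```

If the proposal uses a transformation like this, it should also say whether the minimum and maximum are computed on the training set only, since using the test set would leak information into the model.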

Assessment

Discuss the assessment methodology you will use to validate that you have found meaningful patterns. Will you use n-fold cross-validation, confidence intervals for accuracy, or other methods? How will you create your training and test sets? What baseline models will you use? This section should be one to two pages in length.
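The assessment ideas named above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: the baseline is assumed to be a majority-class predictor, the folds are formed without shuffling or stratification (a real assessment would usually do one or both), and the confidence interval uses the normal approximation to the binomial. All function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def k_fold_indices(n_examples, k):
    """Yield (train, test) index lists for k roughly equal, disjoint folds."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th example, starting at `fold`
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def majority_baseline_accuracy(labels, k):
    """Cross-validated accuracy of always predicting the training majority class."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct += sum(1 for i in test if labels[i] == majority)
    return correct / len(labels)


def accuracy_confidence_interval(accuracy, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n examples.

    z=1.96 corresponds to roughly 95% confidence.
    """
    half_width = z * math.sqrt(accuracy * (1 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# Made-up class labels for illustration.
labels = ["yes"] * 7 + ["no"] * 3
baseline_accuracy = majority_baseline_accuracy(labels, k=5)
low, high = accuracy_confidence_interval(baseline_accuracy, len(labels))
print(f"baseline accuracy {baseline_accuracy:.2f}, interval ({low:.2f}, {high:.2f})")
```

Any model proposed in the Methods and Models section should be evaluated on the same folds as the baseline, so that the difference in accuracy can be attributed to the model and not to the split.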

Presentation and Visualization

Describe how your results will be presented and visualized in a way that shows meaningful patterns in the data. This section should be up to a page in length.

Roles

Discuss the role that each group member will have in the project. One paragraph per group member is sufficient.

Schedule

The schedule is a table of dates and the tasks you plan to complete by those dates. Tasks to be done by the progress report must be listed, along with any other deadlines you set for yourselves; additional deadlines are highly recommended. Be sure to indicate when data transformation, modeling, assessment, visualization, and other major tasks will be completed.

Date        Tasks to be Completed
??/??/10    Tasks completed by chosen date
??/??/10    Tasks to be completed by the progress report date
??/??/10    Tasks completed by the class presentation

Bibliography

List bibliographic information for every reference cited in the proposal. You should have many references.