144 Peter Road, Monash University South Africa
FIT 3002
Edson Zandamela and Mbuto Carlos Machili
Assignment 2 Report
Table of Contents
1. Introduction
2. Problem Definition
2.1 Objective
2.2 Data Characteristics
2.3 Model Evaluation Method
2.4 Budgetary Constraints
2.5 Response rate without a model
3. Data Preparation and Pre-processing
3.1 File formatting
3.2 Missing Values
4. Experiments
4.1 Learning Algorithm Selection
4.2 Iteration Process
4.2.1 Attribute selection
4.2.2 Changing Parameter settings
4.2.3 Data Normalization
4.2.4 Model Recommendation
4.2.4.1 Lift Chart
4.2.4.2 Gain Chart
5. Campaign suggestions
6. Conclusion
1. Introduction
Global Paper’s prime objective is to evaluate the market response rate for a new paper product that it is exploring, by testing the market with a mass mailing campaign. The evaluation considers how strongly the product appeals to people according to their annual salary (<=$50K or >$50K). The company has purchased demographic data sets (the Adult data set and test data) from a known source, and its market research indicates that the new product is likely to appeal to people who earn over $50K a year. This report documents the data mining process, carried out with the data mining tool Weka, of exploring several models and choosing the one most likely to produce the largest profit within the $12,000 budget for the mass mailing campaign.
2. Problem Definition
2.1 Objective
The aim of this data mining project is to test the provided demographic data sets with several models on Weka, and select a single model that will likely