Data Mining Assignment

Submitted By moer1990
Words 316
Pages 2
1. Briefly describe the major differences between data mining and statistics.
a. Statistics is user driven, while data mining is data driven.
b. In statistics, there is an underlying theory about certain relationships in the data, while in data mining there is often no pre-existing theory.
c. In statistics, users apply statistical methods to test hypotheses about the data, while in data mining users often apply different techniques to examine the data and uncover unknown relationships.

2. What can an organization do to deal with data problems such as missing data and outliers?
Missing data:
a. Ignore the tuple.
b. Fill in the missing value manually.
c. Use a proxy variable with no missing values.
Outliers:
a. Delete rows.
b. Recode.
c. Transform variables.
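As a rough illustration of some of the options above, the sketch below drops tuples with a missing value and then treats outliers in three ways: deleting rows, recoding extreme values to a cap, and transforming the variable. The 1.5 x IQR fences, the capping interpretation of "recode", the log transform, the sample records, and all function and field names are assumptions made for illustration; the answer above does not name a specific method.

```python
import math
import statistics


def drop_missing(rows, field):
    """Ignore (drop) every tuple whose `field` is missing (None)."""
    return [r for r in rows if r.get(field) is not None]


def iqr_fences(values, k=1.5):
    """Return (low, high) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def delete_outlier_rows(rows, field, k=1.5):
    """Option a: delete rows whose value falls outside the fences."""
    low, high = iqr_fences([r[field] for r in rows], k)
    return [r for r in rows if low <= r[field] <= high]


def cap_outliers(rows, field, k=1.5):
    """Option b (one reading of 'recode'): clamp values to the fences."""
    low, high = iqr_fences([r[field] for r in rows], k)
    return [{**r, field: min(max(r[field], low), high)} for r in rows]


def log_transform(rows, field):
    """Option c: apply log(1 + x) to compress large values (needs x >= 0)."""
    return [{**r, field: math.log1p(r[field])} for r in rows]


# Hypothetical sample data: one missing value and one extreme value.
records = [
    {"id": 1, "income": 42},
    {"id": 2, "income": 45},
    {"id": 3, "income": None},
    {"id": 4, "income": 47},
    {"id": 5, "income": 50},
    {"id": 6, "income": 52},
    {"id": 7, "income": 400},
]

complete = drop_missing(records, "income")
trimmed = delete_outlier_rows(complete, "income")
capped = cap_outliers(complete, "income")
```

Deleting rows loses the whole record, while capping and transforming keep it, so the choice depends on whether the extreme value is an error or a genuine observation.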

3. In a data mining exercise, a data set is usually partitioned into training, validation, and test data. Briefly describe the roles assumed by these partitions.
a. Training data: used to build and fit models.
b. Validation data: used to monitor and fine-tune the model to improve its generalization. Tuning involves selecting among competing models and optimizing the selected model based on the validation data.
c. Test data: used to test the performance of the final model, giving an unbiased assessment.
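The three partitions described above can be produced with a simple shuffled split. This is a minimal sketch, assuming a 60/20/20 split and a fixed random seed; neither the proportions nor the function name `partition` come from the assignment.

```python
import random


def partition(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into three disjoint partitions.

    Whatever remains after the training and validation shares
    becomes the test partition.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # fit the models here
        shuffled[n_train:n_train + n_val],  # compare and tune models here
        shuffled[n_train + n_val:],         # touch only once, for the final assessment
    )


train_set, validation_set, test_set = partition(range(100))
```

Keeping the test partition untouched until the end is what makes its performance estimate unbiased: no modeling decision has been made with knowledge of it.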

4. Which takes four possible values: freshman, sophomore, junior, and senior.
Recode the variable: replace freshman with 1, sophomore with 2, junior with 3, and senior with 4.

5. Data cleansing
a. I select the whole Comment table and insert a pivot table in a new sheet. Then I summarize the Comment table.
b. I check each thread ID row in the pivot table to find out whether that thread ID is contained in the Thread table, using Find.
c. I use red to highlight the thread IDs that are not found in the Thread table.
d. I make a copy of Comment
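Two of the answers above lend themselves to a short sketch: the ordinal recoding in answer 4 and the thread-ID check in steps a to c of answer 5. The original work was done in a spreadsheet; the Python below is only an equivalent illustration, and the field names `thread_id` and `comment_id`, the sample rows, and both function names are hypothetical.

```python
# Answer 4: ordinal codes for class standing.
YEAR_CODES = {"freshman": 1, "sophomore": 2, "junior": 3, "senior": 4}


def recode_year(values):
    """Replace each class-standing label with its ordinal code."""
    return [YEAR_CODES[v.strip().lower()] for v in values]


def orphan_thread_ids(comments, threads):
    """Return thread IDs used in comments but missing from the Thread table.

    Building a set of distinct IDs plays the role of the pivot table,
    and the set difference replaces the manual Find step.
    """
    known = {t["thread_id"] for t in threads}
    used = {c["thread_id"] for c in comments}
    return sorted(used - known)


# Hypothetical sample tables.
threads = [{"thread_id": 101}, {"thread_id": 102}]
comments = [
    {"comment_id": 1, "thread_id": 101},
    {"comment_id": 2, "thread_id": 103},
    {"comment_id": 3, "thread_id": 103},
    {"comment_id": 4, "thread_id": 102},
]

codes = recode_year(["Junior", "freshman", "senior"])
orphans = orphan_thread_ids(comments, threads)
```

The IDs returned by `orphan_thread_ids` correspond to the ones highlighted in red in step c.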
