Data Mining: Introduction

Submitted By makubexpro
Lecture Notes for Chapter 1 of Introduction to Data Mining, by Tan, Steinbach, Kumar

© Tan, Steinbach, Kumar

Introduction to Data Mining

4/18/2004

Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
Computers have become cheaper and more powerful
Competitive pressure is strong
  – Provide better, customized services for an edge (e.g., in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
Data is collected and stored at enormous speeds (GB/hour)
  – Remote sensors on a satellite
  – Telescopes scanning the skies
  – Microarrays generating gene expression data
  – Scientific simulations generating terabytes of data
Traditional techniques are infeasible for raw data
Data mining may help scientists
  – in classifying and segmenting data
  – in hypothesis formation

Mining Large Data Sets - Motivation
There is often information "hidden" in the data that is not readily evident
Human analysts may take weeks to discover useful information
Much of the data is never analyzed at all
[Figure: The Data Gap. Total new disk storage (TB) since 1995 compared with the number of analysts, 1995 to 1999. From: R. Grossman, C. Kamath, V. Kumar, "Data Mining for Scientific and Engineering Applications".]

What is Data Mining?
Many definitions:
  – Non-trivial extraction of implicit, previously unknown and potentially useful information from data
  – Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
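The second definition, automatic discovery of meaningful patterns, can be made concrete with a toy example. The Python sketch below counts how often pairs of items are bought together across grocery baskets and keeps the pairs that reach a minimum count (a support threshold). The function name frequent_pairs, the basket data, and the threshold are hypothetical choices for illustration; this is only a brute-force first step toward association analysis, not a method prescribed by these notes.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least `min_support` transactions.

    Hypothetical helper for illustration. Each transaction is an iterable of
    item names; repeated items within one transaction are counted once, and
    each pair is stored as a sorted tuple so (a, b) and (b, a) are the same key.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        # Count every unordered pair of distinct items in this basket once.
        counts.update(combinations(items, 2))
    # Keep only the pairs that meet the support threshold.
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Toy grocery baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

print(frequent_pairs(baskets, min_support=3))
```

With min_support=3, four pairs are returned: (bread, milk), (beer, diapers), (bread, diapers) and (diapers, milk), each appearing in three of the five baskets. Enumerating every pair is workable for a handful of items, but the number of pairs grows quadratically with basket size, which is one reason large data sets call for more efficient algorithms.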
