Introduction to Data Mining

Submitted By sum0mer
Data Mining
Module 1
Introduction to Data Mining

Dr. Jason T.L. Wang, Professor
Department of Computer Science New Jersey Institute of Technology

Data Management: Its Evolution


1960s:
– File management and network DBMS

1970s:
– Relational DBMS

1980s:
– Non-first normal form, extended-relational, OO, deductive databases and application-oriented DBMS (spatial, scientific, CAD/CAM, etc.)

1990s - present:
– Data mining, digital library, and Web databases
– Cloud databases, data science, and Big Data
Data Mining © Jason Wang 2

Data Mining: Its Definition


Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

Alternative names:
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, etc.

Data Mining: A Multidisciplinary Field


– Pattern Recognition
– Machine Learning
– Databases
– Statistics
– Information Visualization


Data to be mined


– Text databases
– Web databases
– Scientific and biological databases
– Transactional databases


Knowledge to be discovered


Association (correlation)
– Multi-dimensional vs. single-dimensional association
– age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]
– contains(X, “computer”) → contains(X, “software”) [1%, 75%]
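The bracketed figures attached to the association rules above are a support and a confidence value. The slides do not define them, so the sketch below assumes the standard definitions: support is the fraction of all transactions that contain both sides of the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The function name `rule_metrics` and the toy transactions are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
from typing import FrozenSet, Iterable, Sequence, Tuple


def rule_metrics(
    transactions: Sequence[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float]:
    """Return (support, confidence) for the rule antecedent -> consequent.

    Assumed definitions (standard, but not stated in the slides):
      support    = count(antecedent and consequent) / count(all transactions)
      confidence = count(antecedent and consequent) / count(antecedent)
    """
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)

    n_total = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)    # contain the left-hand side
    n_both = sum(1 for t in transactions if both <= t)  # contain both sides

    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical toy data: each transaction is the set of items it contains.
transactions = [
    frozenset({"computer", "software"}),
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer", "software"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
    frozenset({"printer", "paper"}),
    frozenset({"paper"}),
    frozenset({"software"}),
]

support, confidence = rule_metrics(transactions, {"computer"}, {"software"})
print(f"support = {support:.1%}, confidence = {confidence:.1%}")
# prints "support = 37.5%, confidence = 75.0%"
```

In this toy set, 3 of the 8 transactions contain both items and 4 contain "computer", which gives the 37.5% support and 75% confidence printed above. A multi-dimensional rule such as the age/income example could be handled the same way by encoding each attribute-value pair (for instance "age=20..29") as an item.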


Knowledge to be discovered (cont.)


Classification
– Finding models
