...Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. Ian H. Witten, Eibe Frank, Mark A. Hall. AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO. Morgan Kaufmann Publishers is an imprint of Elsevier, 30 Corporate Drive, Suite 400, Burlington, MA 01803, USA. This book is printed on acid-free paper. Copyright © 2011 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices. Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must...
Words: 194698 - Pages: 779
...A Statistical Perspective on Data Mining Ranjan Maitra∗ Abstract Technological advances have led to new and automated data collection methods. Datasets once at a premium are often plentiful nowadays and sometimes indeed massive. A new breed of challenges are thus presented – primary among them is the need for methodology to analyze such masses of data with a view to understanding complex phenomena and relationships. Such capability is provided by data mining which combines core statistical techniques with those from machine intelligence. This article reviews the current state of the discipline from a statistician’s perspective, illustrates issues with real-life examples, discusses the connections with statistics, the differences, the failings and the challenges ahead. 1 Introduction The information age has been matched by an explosion of data. This surfeit has been a result of modern, improved and, in many cases, automated methods for both data collection and storage. For instance, many stores tag their items with a product-specific bar code, which is scanned in when the corresponding item is bought. This automatically creates a gigantic repository of information on products and product combinations sold. Similar databases are also created by automated book-keeping, digital communication tools or by remote sensing satellites, and aided by the availability of affordable and effective storage mechanisms – magnetic tapes, data warehouses and so on. This has created a situation...
Words: 22784 - Pages: 92
...PRINCIPLES AND PRACTICE OF EDUCATIONAL RESEARCH. Willis Yuko Oso, Faculty of Education and School of Postgraduate Studies, Amoud University - Somaliland. Barkhadleh Printing, BORAMA - SOMALILAND. Typesetting and Printing By Barkhadleh Printing, Borama, Somaliland. Barkhadleh52hotmail.com /0025224509257 Copyright © Willis Yuko Oso, 2013. All rights reserved. No part of this publication may be reproduced in whole or in part or transmitted in any form or by any means (except in the case of brief quotations embodied in critical review for educational purposes) without the express permission of the publisher in writing. Library of Congress Cataloguing in Publication Data: Willis Yuko Oso, Faculty of Education and School of Postgraduate Studies, Amoud University, Somaliland. ISBN: 978-9966-793-32-1. TABLE OF CONTENTS: LIST OF TABLES; LIST OF FIGURES; SYMBOLS USED IN THE TEXT; PREFACE; 1: EDUCATIONAL RESEARCH – CONCEPTUALIZATION; 1.0 Introduction; 1.1 Defining Educational Research; 1.2 Characteristics of Educational Research; 1.3 Purpose of Educational Research; 1.4 Types of Research; 1.4.1 Basic Research; 1.4.2 Applied Research; 1.4.3 Action Research; 1.4.4 Research and Development (R&D); 1.4.5 Operations Research; 2: THE RESEARCH PROCESS; 2.0 Introduction; 2.1 Research Topic; 2.1.1 What is a Research Topic?; 2.1.2 Elements of a Research Topic; 2.1.3 Identifying a Research Topic...
Words: 114525 - Pages: 459
...Marco Brambilla, Sara Comai, and Maristella Matera Mining the Web: Discovering Knowledge from Hypertext Data Soumen Chakrabarti Understanding SQL and Java Together: A Guide to SQLJ, JDBC, and Related Technologies Jim Melton and Andrew Eisenberg Database: Principles, Programming, and Performance, Second Edition Patrick O’Neil and Elizabeth O’Neil The Object Data Standard: ODMG 3.0 Edited by R. G. G. Cattell, Douglas K. Barry, Mark Berler, Jeff Eastman, David Jordan, Craig Russell, Olaf Schadow, Torsten Stanienda, and Fernando Velez Data on the Web: From Relations to Semistructured Data and XML Serge Abiteboul, Peter Buneman, and Dan Suciu Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations Ian H. Witten and Eibe Frank Joe Celko’s SQL for Smarties: Advanced SQL Programming, Second Edition Joe Celko Advanced SQL: 1999—Understanding Object-Relational and Other Advanced Features Jim Melton Joe Celko’s Data and Databases: Concepts in Practice Joe Celko Database Tuning: Principles, Experiments, and Troubleshooting Techniques Dennis Shasha and Philippe Bonnet Developing Time-Oriented Database...
Words: 191947 - Pages: 768
...Mostly Harmless Econometrics: An Empiricist’s Companion. Joshua D. Angrist, Massachusetts Institute of Technology; Jörn-Steffen Pischke, The London School of Economics. March 2008. Contents: Preface; Acknowledgments; Organization of this Book; I Introduction; 1 Questions about Questions; 2 The Experimental Ideal; 2.1 The Selection Problem; 2.2 Random Assignment Solves the Selection Problem; 2.3 Regression Analysis of Experiments; II The Core; 3 Making Regression Make Sense; 3.1 Regression Fundamentals; 3.1.1 Economic Relationships and the Conditional Expectation Function; 3.1.2 Linear Regression and the CEF; 3.1.3 Asymptotic OLS Inference; 3.1.4 Saturated Models, Main Effects, and Other Regression Talk; 3.2 Regression and Causality; 3.2.1 The Conditional Independence Assumption; 3.2.2 The Omitted Variables Bias Formula...
Words: 114745 - Pages: 459
...UNIVERSITY OF NAIROBI COLLEGE OF EDUCATION AND EXTERNAL STUDIES SCHOOL OF CONTINUING AND DISTANCE EDUCATION DEPARTMENT OF EXTRA-MURAL STUDIES. In collaboration with CENTRE FOR OPEN AND DISTANCE LEARNING MASTER IN PROJECT PLANNING AND MANAGEMENT COURSE: LDP 603: RESEARCH METHODS Authored by: Dr. Christopher Mwangi Gakuu, Senior Lecturer, Department of Extra-Mural Studies, University of Nairobi & Dr. Harriet Jepchumba Kidombo, Senior Lecturer, Department of Educational Studies, University of Nairobi. GENERAL INTRODUCTION TO THE COURSE MODULE The Research Methods course is one of the first semester core courses for learners pursuing the Master in Project Planning and Management course. You are aware that any good decision is based on facts. Facts are based on data. The data must be systematically collected, processed, analysed and presented for use. The best-known way of collecting empirical data is through scientific research methods. This is what this course module is all about. The main aims of this course unit are to: 1. Provide you with the basic information needed to understand the research process. 2. Enable you to use the knowledge to design your own research agenda on an area of personal interest or that of an organization. MODULE STRUCTURE The module is covered in Lectures. Each Lecture focuses on an area in research. You will note that in each unit, there is an introduction, unit objectives, contents presented...
Words: 62976 - Pages: 252
...Managers and research The manager and the consultant–researcher Internal versus external consultants/researchers Knowledge about research and managerial effectiveness Ethics and business research Summary Discussion Questions Chapter 2: Scientific investigation The hallmarks of scientific research Some obstacles to conducting scientific research in the management area The hypothetico-deductive method Other types of research Summary Discussion Questions Chapter 3: The research process: the broad problem area and defining the problem statement Broad problem area Preliminary information gathering Literature review Defining the problem statement The research proposal Managerial implications Ethical issues in the preliminary stages of investigation Summary Discussion Questions Practice Projects Appendix Chapter 4: The research process: theoretical framework and hypothesis development The need for a theoretical framework Variables Theoretical framework Hypothesis development Hypothesis testing with qualitative research: negative case analysis Managerial implications Summary Discussion Questions Practice Project Chapter 5: The research process: elements of research design The research design Purpose of the study: exploratory, descriptive, hypothesis testing (analytical and predictive), case study analysis Type of investigation: causal versus correlational Extent of researcher interference with the study Study setting: contrived and noncontrived ...
Words: 119604 - Pages: 479
...For my parents, because "the value of things lies not in how long they last, but in the intensity with which they happen. That is why there are unforgettable moments, inexplicable things and incomparable people" like you! Thank you for everything, Filipe. Abstract The Retail Banking Industry has been severely affected by fraud over the past few years. Indeed, despite all the research and systems available, fraudsters have been able to outsmart and deceive the banks and their customers. With this in mind, we intend to introduce a novel and multi-purpose technology known as Stream Computing, as the basis for a Fraud Detection solution. Indeed, we believe that this architecture will stimulate research, and more importantly organizations, to invest in Analytics and Statistical Fraud-Scoring to be used in conjunction with the already in-place preventive techniques. Therefore, in this research we explore different strategies to build a Stream-based Fraud Detection solution, using advanced Data Mining Algorithms and Statistical Analysis, and show how they lead to increased accuracy in the detection of fraud by at least 78% in our reference dataset. We also discuss how a combination of these strategies can be embedded in a Stream-based application to detect fraud in real-time. From this perspective, our experiments lead to an average processing time of 111,702ms per transaction, while strategies to further improve the performance are discussed. Keywords: Fraud Detection, Stream Computing, Real-Time...
Words: 56858 - Pages: 228
...Outline: RESEARCH 1) NATURE AND SCOPE OF RESEARCH 1.1) Definition – purposive, systematic and scientific process of gathering, analyzing, classifying, organizing, presenting and interpreting data for the solution of a problem, for prediction, for invention, for the discovery of truth, or for the expansion or verification of existing knowledge, all for the preservation and improvement of the quality of human life. 1.1.1) History of Research Historical records reveal that there is no written document on the beginning of business research as an organized business activity, but it is definitely of modern origin. During the Middle Ages, the merchant families of Fugger and Rothschild prospered in part because their organizations enabled them to get information before their competitors did. These studies were unsystematic, but considered to be well organized at the time. In 1879, more by accident than foresight, N. W. Ayer and Son conducted a crude but formal market survey to measure markets for agricultural machinery manufactured by the Nichols-Shepard Company. This market survey is probably the first real attempt at business research in the United States. The Curtis Publishing Company is generally conceded to have formed the first formal business research department with the appointment of Charles Parlin as manager of the Commercial Research Division of the Advertising Department in 1911. His original idea was that advertising space could be sold more effectively...
Words: 30970 - Pages: 124
...REVIEWS, REFINEMENTS AND NEW IDEAS IN FACE RECOGNITION Edited by Peter M. Corcoran Published by InTech Janeza Trdine 9, 51000 Rijeka, Croatia Copyright © 2011 InTech All chapters are Open Access articles distributed under the Creative Commons Non Commercial Share Alike Attribution 3.0 license, which permits copying, distributing, transmitting, and adapting the work in any medium, so long as the original work is properly cited. After this work has been published by InTech, authors have the right to republish it, in whole or part, in any publication of which they are the author, and to make other personal use of the work. Any republication, referencing or personal use of the work must explicitly identify the original source. Statements and opinions expressed in the chapters are those of the individual contributors and not necessarily those of the editors or publisher. No responsibility is accepted for the accuracy of information contained in the published articles. The publisher assumes no responsibility for any damage or injury to persons or property arising out of the use of any materials, instructions, methods or ideas contained in the book. Publishing Process Manager Mirna Cvijic Technical Editor Teodora Smiljanic Cover Designer Jan Hyrat Image Copyright hfng, 2010. Used under license from Shutterstock.com First published July, 2011 Printed in Croatia A free online edition of this book is available...
Words: 33246 - Pages: 133
...Campbell Systematic Reviews 2011:8 First published: 14 November, 2011 Last updated: 14 November, 2011 Search date: April, 2011 Dropout prevention and intervention programs: Effects on school completion and dropout among school-aged children and youth Sandra Jo Wilson, Emily E. Tanner-Smith, Mark W. Lipsey, Katarzyna Steinka-Fry, & Jan Morrison Colophon Title: Dropout prevention and intervention programs: Effects on school completion and dropout among school-aged children and youth. Institution: The Campbell Collaboration. Authors: Wilson, Sandra Jo; Tanner-Smith, Emily E.; Lipsey, Mark W.; Steinka-Fry, Katarzyna; Morrison, Jan. 10.4073/csr.2011.8 62 24 August, 2011 Wilson SJ, Tanner-Smith EE, Lipsey MW, Steinka-Fry K, Morrison J. Dropout prevention and intervention programs: Effects on school completion and dropout among school-aged children and youth. Campbell Systematic Reviews 2011:8 DOI: 10.4073/csr.2011.8 © Wilson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. School dropout, school attendance, early school leaving, school failure Wilson, Tanner-Smith, and Lipsey contributed to the writing and revising of this review and protocol. Wilson, Tanner-Smith, Steinka-Fry and Morrison contributed to information retrieval and data collection. Work on this review was supported by the Campbell Collaboration...
Words: 20551 - Pages: 83
...University of South Florida Scholar Commons Textbooks Collection USF Tampa Library Open Access Collections 2012 Social Science Research: Principles, Methods, and Practices Anol Bhattacherjee University of South Florida, abhatt@usf.edu Follow this and additional works at: http://scholarcommons.usf.edu/oa_textbooks Part of the American Studies Commons, Education Commons, Public Health Commons, and the Social and Behavioral Sciences Commons Recommended Citation Bhattacherjee, Anol, "Social Science Research: Principles, Methods, and Practices" (2012). Textbooks Collection. Book 3. http://scholarcommons.usf.edu/oa_textbooks/3 This Book is brought to you for free and open access by the USF Tampa Library Open Access Collections at Scholar Commons. It has been accepted for inclusion in Textbooks Collection by an authorized administrator of Scholar Commons. For more information, please contact scholarcommons@usf.edu. SOCIAL SCIENCE RESEARCH: PRINCIPLES, METHODS, AND PRACTICES ANOL BHATTACHERJEE SOCIAL SCIENCE RESEARCH: PRINCIPLES, METHODS, AND PRACTICES Anol Bhattacherjee, Ph.D. University of South Florida Tampa, Florida, USA abhatt@usf.edu Second Edition Copyright © 2012 by Anol Bhattacherjee Published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License Social Science Research: Principles, Methods, and Practices, 2nd edition By Anol Bhattacherjee First published 2012 ISBN-13: 978-1475146127 ...
Words: 39864 - Pages: 160
...ANNUAL REPORT 2011-12 Government of India Ministry of Statistics and Programme Implementation Sardar Patel Bhawan New Delhi - 110001 Website: http//mospi.gov.in. CONTENTS Chapters Page Vision Mission Introduction Development and Highlights National Statistical Commission Central Statistical Office National Sample Survey Office Coordination of Statistical Activities Computer Centre Statistical Services Indian Statistical Institute Twenty Point Programme Infrastructure and Projects Monitoring Member of Parliament Local Area Development Scheme Hindi Promotion Other Activities ANNEXES I IA IB IC ID IE IF IG IH II IIIA IIIB IVA IVB IVC V VI VII VIII Organisation Charts Ministry of Statistics & Programme Implementation Administration National Statistical Commission Central Statistical Office National Sample Survey Office Computer Centre Programme Implementation Wing Abbreviations used Allocation of Business to the Ministry Project, Seminar/Conference/Workshop and Travel Grant Assistance sanctioned during 2010-11 Project, Seminar/Conference/Workshop and Travel Grant Assistance sanctioned during 2011-12 (Up to December, 2011) Statement of Budget Estimate (SBE) -2011-12 Total Plan Gross Budgetary Support (GBS) for 2010-11 (BE and RE) for North-Eastern Region. Total Plan Gross Budgetary Support (GBS) for 2011-12 (BE and RE) for North-Eastern Region. Performance of Monthly Monitored Items under TPP-2006 (April, 2010 to March, 2011) Performance of Monthly Monitored Items under TPP-2006...
Words: 58344 - Pages: 234
...NBER WORKING PAPER SERIES FINANCIAL RISK MEASUREMENT FOR FINANCIAL RISK MANAGEMENT Torben G. Andersen Tim Bollerslev Peter F. Christoffersen Francis X. Diebold Working Paper 18084 http://www.nber.org/papers/w18084 NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 May 2012 Forthcoming in Handbook of the Economics of Finance, Volume 2, North Holland, an imprint of Elsevier. For helpful comments we thank Hal Cole and Dongho Song. For research support, Andersen, Bollerslev and Diebold thank the National Science Foundation (U.S.), and Christoffersen thanks the Social Sciences and Humanities Research Council (Canada). We appreciate support from CREATES funded by the Danish National Science Foundation. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research. NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications. © 2012 by Torben G. Andersen, Tim Bollerslev, Peter F. Christoffersen, and Francis X. Diebold. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source. Financial Risk Measurement for Financial Risk Management Torben G. Andersen, Tim Bollerslev, Peter F. Christoffersen, and...
Words: 41700 - Pages: 167