...The Quality of Data

Introduction
Value Errors
Missing Data and Bad Structures
Entity Resolution
Anonymous Resolution
Conclusion

DATA MINING FOR INTELLIGENCE

Introduction

Computer processors are faster than ever, storage is fairly cheap, network bandwidth is continually expanding, and information technologies are capable of integrating massive amounts of data. Even with all of these high-end systems and capabilities, effective analytics remains limited, and much of that limitation stems from the quality of the data collected over the years. The real challenge lies in improving the accuracy of the data through better collection and representation methods. Only when this problem is appropriately addressed can one realistically expect to see improvement in the detection and analysis of fraud, terrorism, money laundering, and other critical areas.

One high-profile situation emphasizes this point. It was reported[1] that Senator Edward Kennedy (Massachusetts) was stopped while boarding airline flights on five different occasions because his name matched an entry on a government no-fly list. Additionally, Congressman John Lewis (Georgia) claims he was required to submit to additional security checks because his name also matched one on a watch list. In both cases, the data processed by these systems represented only a limited portion of what was necessary to perform an appropriate match. Ultimately the situations were...