Knowledge Discovery in Medical Databases Leveraging Data Mining

Abstract

The goal of this master’s thesis is to identify and evaluate data mining algorithms that are commonly implemented in modern Medical Decision Support Systems (MDSS). Such systems are used in healthcare units all over the world. These institutions store large amounts of medical data, which may contain relevant medical information hidden in patterns buried among the records.

As part of the research, several popular MDSSs are analysed in order to determine the data mining algorithms they most commonly use. Three algorithms have been identified: Naïve Bayes, Multilayer Perceptron and C4.5. Before the main analyses, the algorithms are calibrated: several configurations are tested in order to determine the best settings for each algorithm. Afterwards, a final comparison ranks the algorithms with respect to their performance. The evaluation is based on a set of performance metrics. The analyses are conducted in WEKA on five UCI medical datasets: breast cancer, hepatitis, heart disease, dermatology and diabetes.
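
The metric-based ranking described above can be illustrated with a minimal sketch. It is not the procedure used in the thesis: the abstract does not name the metrics, so accuracy, sensitivity and specificity are assumed here, and the thesis itself runs its analyses in WEKA rather than in Python. The classifier names and confusion-matrix counts are hypothetical placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion-matrix counts for one classifier on one dataset."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return accuracy, sensitivity and specificity for the given counts.

    Assumes at least one actual positive and one actual negative case,
    so that no denominator is zero.
    """
    total = cm.tp + cm.fp + cm.tn + cm.fn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),
        "specificity": cm.tn / (cm.tn + cm.fp),
    }


def rank_by_mean_metric(results: dict[str, ConfusionMatrix]) -> list[str]:
    """Order classifier names from best to worst by the mean of their metrics."""
    def mean_score(name: str) -> float:
        values = metrics(results[name]).values()
        return sum(values) / len(values)

    return sorted(results, key=mean_score, reverse=True)


# Hypothetical counts, invented for illustration only (not thesis results).
example = {
    "classifier_a": ConfusionMatrix(tp=40, fp=10, tn=40, fn=10),
    "classifier_b": ConfusionMatrix(tp=45, fp=5, tn=45, fn=5),
    "classifier_c": ConfusionMatrix(tp=30, fp=20, tn=30, fn=20),
}
ranking = rank_by_mean_metric(example)
```

Averaging the metrics with equal weight is only one possible way to order classifiers. In a medical setting, sensitivity might reasonably be weighted more heavily than specificity, which could change the ranking.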

The analyses have shown that it is very difficult to name a single data mining algorithm as the most suitable for medical data. The results obtained for the algorithms were very similar. However, the final evaluation of the outcomes allowed Naïve Bayes to be singled out as the best classifier for the given domain. It was followed by the Multilayer Perceptron and C4.5.

Keywords: Naïve Bayes, Multilayer Perceptron, C4.5, medical data mining, medical decision support

Chapter 1: Introduction to the Study
Introduction
Thesis Structure
Study Overview
Background of the research
Focus Area & Motivation
Aims and Objectives
Research Problems
Motivation and Challenges
Thesis Outline
Intellectual Challenge
Justification for the Research
Methodology
Conclusion Chapter 1: