Data mining and medical knowledge management: cases and applications
Data Mining and Medical
Knowledge Management:
Cases and Applications
Petr Berka
University of Economics, Prague, Czech Republic
Jan Rauch
University of Economics, Prague, Czech Republic
Djamel Abdelkader Zighed
University of Lumiere Lyon 2, France
Hershey • New York
Medical Information Science Reference
Director of Editorial Content: Kristin Klinger
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Sean Woznicki
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com/reference
and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com
Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by
any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does
not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Data mining and medical knowledge management : cases and applications / Petr Berka, Jan Rauch, and Djamel Abdelkader Zighed, editors.
p. ; cm.
Includes bibliographical references and index.
Summary: "This book presents 20 case studies on applications of various modern data mining methods in several important areas of medicine, covering classical data mining methods, elaborated approaches related to mining in EEG and ECG data, and methods related to mining
in genetic data"--Provided by publisher.
ISBN 978-1-60566-218-3 (hardcover)
1. Medicine--Data processing--Case studies. 2. Data mining--Case studies. I. Berka, Petr. II. Rauch, Jan. III. Zighed, Djamel A., 1955-
[DNLM: 1. Medical Informatics--methods--Case Reports. 2. Computational Biology--methods--Case Reports. 3. Information Storage and
Retrieval--methods--Case Reports. 4. Risk Assessment--Case Reports. W 26.5 D2314 2009]
R858.D33 2009
610.0285--dc22
2008028366
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not
necessarily of the publisher.
If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating
the library's complimentary electronic access to this publication.
Editorial Advisory Board
Riccardo Bellazzi, University of Pavia, Italy
Radim Jiroušek, Academy of Sciences, Prague, Czech Republic
Katharina Morik, University of Dortmund, Germany
Ján Paralič, Technical University, Košice, Slovak Republic
Luis Torgo, LIAAD-INESC Porto LA, Portugal
Blaž Župan, University of Ljubljana, Slovenia
List of Reviewers
Riccardo Bellazzi, University of Pavia, Italy
Petr Berka, University of Economics, Prague, Czech Republic
Bruno Crémilleux, University Caen, France
Peter Eklund, Umeå University, Umeå, Sweden
Radim Jiroušek, Academy of Sciences, Prague, Czech Republic
Jiří Kléma, Czech Technical University, Prague, Czech Republic
Mila Kwiatkowska, Thompson Rivers University, Kamloops, Canada
Martin Labský, University of Economics, Prague, Czech Republic
Lenka Lhotská, Czech Technical University, Prague, Czech Republic
Ján Paralič, Technical University, Košice, Slovak Republic
Vincent Pisetta, University Lyon 2, France
Simon Marcellin, University Lyon 2, France
Jan Rauch, University of Economics, Prague, Czech Republic
Marisa Sánchez, National University, Bahía Blanca, Argentina
Ahmed-El Sayed, University Lyon 2, France
Olga Štěpánková, Czech Technical University, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Arnošt Veselý, Czech University of Life Sciences, Prague, Czech Republic
Djamel Zighed, University Lyon 2, France
Table of Contents
Foreword ............................................................................................................................................ xiv
Preface ................................................................................................................................................ xix
Acknowledgment .............................................................................................................................xxiii
Section I
Theoretical Aspects
Chapter I
Data, Information and Knowledge.......................................................................................................... 1
Jana Zvárová, Institute of Computer Science of the Academy of Sciences of the Czech
Republic v.v.i., Czech Republic; Center of Biomedical Informatics, Czech Republic
Arnošt Veselý, Institute of Computer Science of the Academy of Sciences of the Czech Republic
v.v.i., Czech Republic; Czech University of Life Sciences, Czech Republic
Igor Vajda, Institutes of Computer Science and Information Theory and Automation of
the Academy of Sciences of the Czech Republic v.v.i., Czech Republic
Chapter II
Ontologies in the Health Field .............................................................................................................. 37
Michel Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Radja Messai, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Gayo Diallo, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Ana Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Chapter III
Cost-Sensitive Learning in Medicine.................................................................................................... 57
Alberto Freitas, University of Porto, Portugal; CINTESIS, Portugal
Pavel Brazdil, LIAAD - INESC Porto L.A., Portugal; University of Porto, Portugal
Altamiro Costa-Pereira, University of Porto, Portugal; CINTESIS, Portugal
Chapter IV
Classification and Prediction with Neural Networks............................................................................ 76
Arnošt Veselý, Czech University of Life Sciences, Czech Republic
Chapter V
Preprocessing Perceptrons and Multivariate Decision Limits............................................................ 108
Patrik Eklund, Umeå University, Sweden
Lena Kallin Westin, Umeå University, Sweden
Section II
General Applications
Chapter VI
Image Registration for Biomedical Information Integration .............................................................. 122
Xiu Ying Wang, BMIT Research Group, The University of Sydney, Australia
Dagan Feng, BMIT Research Group, The University of Sydney, Australia; Hong Kong Polytechnic
University, Hong Kong
Chapter VII
ECG Processing .................................................................................................................................. 137
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Václav Chudáček, Czech Technical University in Prague, Czech Republic
Michal Huptych, Czech Technical University in Prague, Czech Republic
Chapter VIII
EEG Data Mining Using PCA............................................................................................................ 161
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Vladimír Krajča, Faculty Hospital Na Bulovce, Czech Republic
Jitka Mohylová, Technical University Ostrava, Czech Republic
Svojmil Petránek, Faculty Hospital Na Bulovce, Czech Republic
Václav Gerla, Czech Technical University in Prague, Czech Republic
Chapter IX
Generating and Verifying Risk Prediction Models Using Data Mining ............................................. 181
Darryl N. Davis, University of Hull, UK
Thuy T.T. Nguyen, University of Hull, UK
Chapter X
Management of Medical Website Quality Labels via Web Mining.................................................... 206
Vangelis Karkaletsis, National Center of Scientific Research “Demokritos”, Greece
Konstantinos Stamatakis, National Center of Scientific Research “Demokritos”, Greece
Pythagoras Karampiperis, National Center of Scientific Research “Demokritos”, Greece
Martin Labský, University of Economics, Prague, Czech Republic
Marek Růžička, University of Economics, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Enrique Amigó Cabrera, ETSI Informática, UNED, Spain
Matti Pöllä, Helsinki University of Technology, Finland
Miquel Angel Mayer, Medical Association of Barcelona (COMB), Spain
Dagmar Villarroel Gonzales, Agency for Quality in Medicine (AquMed), Germany
Chapter XI
Two Case-Based Systems for Explaining Exceptions in Medicine .................................................... 227
Rainer Schmidt, University of Rostock, Germany
Section III
Specific Cases
Chapter XII
Discovering Knowledge from Local Patterns in SAGE Data............................................................. 251
Bruno Crémilleux, Université de Caen, France
Arnaud Soulet, Université François Rabelais de Tours, France
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Céline Hébert, Université de Caen, France
Olivier Gandrillon, Université de Lyon, France
Chapter XIII
Gene Expression Mining Guided by Background Knowledge........................................................... 268
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Filip Železný, Czech Technical University in Prague, Czech Republic
Igor Trajkovski, Jožef Stefan Institute, Slovenia
Filip Karel, Czech Technical University in Prague, Czech Republic
Bruno Crémilleux, Université de Caen, France
Jakub Tolar, University of Minnesota, USA
Chapter XIV
Mining Tinnitus Database for Knowledge.......................................................................................... 293
Pamela L. Thompson, University of North Carolina at Charlotte, USA
Xin Zhang, University of North Carolina at Pembroke, USA
Wenxin Jiang, University of North Carolina at Charlotte, USA
Zbigniew W. Ras, University of North Carolina at Charlotte, USA
Pawel Jastreboff, Emory University School of Medicine, USA
Chapter XV
Gaussian-Stacking Multiclassifiers for Human Embryo Selection..................................................... 307
Dinora A. Morales, University of the Basque Country, Spain
Endika Bengoetxea, University of the Basque Country, Spain
Pedro Larrañaga, Universidad Politécnica de Madrid, Spain
Chapter XVI
Mining Tuberculosis Data................................................................................................................... 332
Marisa A. Sánchez, Universidad Nacional del Sur, Argentina
Sonia Uremovich, Universidad Nacional del Sur, Argentina
Pablo Acrogliano, Hospital Interzonal Dr. José Penna, Argentina
Chapter XVII
Knowledge-Based Induction of Clinical Prediction Rules................................................................. 350
Mila Kwiatkowska, Thompson Rivers University, Canada
M. Stella Atkins, Simon Fraser University, Canada
Les Matthews, Thompson Rivers University, Canada
Najib T. Ayas, University of British Columbia, Canada
C. Frank Ryan, University of British Columbia, Canada
Chapter XVIII
Data Mining in Atherosclerosis Risk Factor Data .............................................................................. 376
Petr Berka, University of Economics, Prague, Czech Republic; Academy of Sciences of the
Czech Republic, Prague, Czech Republic
Jan Rauch, University of Economics, Prague, Czech Republic; Academy of Sciences of the
Czech Republic, Prague, Czech Republic
Marie Tomečková, Academy of Sciences of the Czech Republic, Prague, Czech Republic
Compilation of References............................................................................................................... 398
About the Contributors.................................................................................................................... 426
Index................................................................................................................................................... 437
Detailed Table of Contents
Foreword ............................................................................................................................................ xiv
Preface ................................................................................................................................................ xix
Acknowledgment .............................................................................................................................xxiii
Section I
Theoretical Aspects
This section provides a theoretical and methodological background for the remaining parts of the book.
It defines and explains basic notions of data mining and knowledge management, and discusses some
general methods.
Chapter I
Data, Information and Knowledge.......................................................................................................... 1
Jana Zvárová, Institute of Computer Science of the Academy of Sciences of the Czech
Republic v.v.i., Czech Republic; Center of Biomedical Informatics, Czech Republic
Arnošt Veselý, Institute of Computer Science of the Academy of Sciences of the Czech Republic
v.v.i., Czech Republic; Czech University of Life Sciences, Czech Republic
Igor Vajda, Institutes of Computer Science and Information Theory and Automation of
the Academy of Sciences of the Czech Republic v.v.i., Czech Republic
This chapter introduces the basic concepts of medical informatics: data, information, and knowledge. It
shows how these concepts are interrelated and can be used for decision support in medicine. All discussed
approaches are illustrated on one simple medical example.
Chapter II
Ontologies in the Health Field .............................................................................................................. 37
Michel Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Radja Messai, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Gayo Diallo, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
Ana Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé,
France
This chapter introduces the basic notions of ontologies, presents a survey of their use in medicine, and
explores some related issues: knowledge bases, terminology, information retrieval. It also addresses the
issues of ontology design, ontology representation, and the possible interaction between data mining
and ontologies.
Chapter III
Cost-Sensitive Learning in Medicine.................................................................................................... 57
Alberto Freitas, University of Porto, Portugal; CINTESIS, Portugal
Pavel Brazdil, LIAAD - INESC Porto L.A., Portugal; University of Porto, Portugal
Altamiro Costa-Pereira, University of Porto, Portugal; CINTESIS, Portugal
Health managers and clinicians often need models that try to minimize several types of costs associated
with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification
costs (e.g. the cost of a false negative test). This chapter presents some concepts related to cost-sensitive
learning and cost-sensitive classification in medicine and reviews research in this area.
Chapter IV
Classification and Prediction with Neural Networks............................................................................ 76
Arnošt Veselý, Czech University of Life Sciences, Czech Republic
This chapter describes the theoretical background of artificial neural networks (architectures, methods
of learning) and shows how these networks can be used in the medical domain to solve various classification and regression problems.
Chapter V
Preprocessing Perceptrons and Multivariate Decision Limits............................................................ 108
Patrik Eklund, Umeå University, Sweden
Lena Kallin Westin, Umeå University, Sweden
This chapter introduces classification networks composed of preprocessing layers and classification
networks, and compares them with “classical” multilayer perceptrons on three medical case studies.
Section II
General Applications
This section presents work that is general in the sense of a variety of methods or variety of problems
described in each of the chapters.
Chapter VI
Image Registration for Biomedical Information Integration .............................................................. 122
Xiu Ying Wang, BMIT Research Group, The University of Sydney, Australia
Dagan Feng, BMIT Research Group, The University of Sydney, Australia; Hong Kong Polytechnic
University, Hong Kong
In this chapter, biomedical image registration and fusion, which is an effective mechanism to assist medical
knowledge discovery by integrating and simultaneously representing relevant information from diverse
imaging resources, is introduced. This chapter covers fundamental knowledge and major methodologies
of biomedical image registration, and major applications of image registration in biomedicine.
Chapter VII
ECG Processing .................................................................................................................................. 137
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Václav Chudáček, Czech Technical University in Prague, Czech Republic
Michal Huptych, Czech Technical University in Prague, Czech Republic
This chapter describes methods for preprocessing, analysis, feature extraction, visualization, and classification of electrocardiogram (ECG) signals. First, preprocessing methods mainly based on the discrete
wavelet transform are introduced. Then classification methods such as fuzzy rule-based decision trees
and neural networks are presented. Two examples illustrate how these methods are used: visualization
and feature extraction from Body Surface Potential Mapping (BSPM) signals, and classification of
Holter ECGs.
Chapter VIII
EEG Data Mining Using PCA............................................................................................................ 161
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Vladimír Krajča, Faculty Hospital Na Bulovce, Czech Republic
Jitka Mohylová, Technical University Ostrava, Czech Republic
Svojmil Petránek, Faculty Hospital Na Bulovce, Czech Republic
Václav Gerla, Czech Technical University in Prague, Czech Republic
This chapter deals with the application of principal components analysis (PCA) to the field of data mining
in electroencephalogram (EEG) processing. Possible applications of this approach include separation of
different signal components for feature extraction in the field of EEG signal processing, adaptive segmentation, epileptic spike detection, and long-term EEG monitoring evaluation of patients in a coma.
Chapter IX
Generating and Verifying Risk Prediction Models Using Data Mining ............................................. 181
Darryl N. Davis, University of Hull, UK
Thuy T.T. Nguyen, University of Hull, UK
In this chapter, existing clinical risk prediction models are examined and matched to the patient data to
which they may be applied using classification and data mining techniques, such as neural nets. Novel
risk prediction models are derived using unsupervised cluster analysis algorithms. All existing and derived
models are verified as to their usefulness in medical decision support on the basis of their effectiveness
on patient data from two UK sites.
Chapter X
Management of Medical Website Quality Labels via Web Mining.................................................... 206
Vangelis Karkaletsis, National Center of Scientific Research “Demokritos”, Greece
Konstantinos Stamatakis, National Center of Scientific Research “Demokritos”, Greece
Pythagoras Karampiperis, National Center of Scientific Research “Demokritos”, Greece
Martin Labský, University of Economics, Prague, Czech Republic
Marek Růžička, University of Economics, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Enrique Amigó Cabrera, ETSI Informática, UNED, Spain
Matti Pöllä, Helsinki University of Technology, Finland
Miquel Angel Mayer, Medical Association of Barcelona (COMB), Spain
Dagmar Villarroel Gonzales, Agency for Quality in Medicine (AquMed), Germany
This chapter deals with the problem of quality assessment of medical Web sites. The so-called “quality
labeling” process can benefit from employment of Web mining and information extraction techniques,
in combination with flexible methods of Web-based information management developed within the
Semantic Web initiative.
Chapter XI
Two Case-Based Systems for Explaining Exceptions in Medicine .................................................... 227
Rainer Schmidt, University of Rostock, Germany
In medicine, doctors are often confronted with exceptions, both in medical practice and in medical research.
One suitable way to deal with exceptions is to use case-based systems. This chapter presents two such
systems. The first one is a knowledge-based system for therapy support. The second one is designed for
medical studies or research. It helps to explain cases that contradict a theoretical hypothesis.
Section III
Specific Cases
This part shows results of several case studies of (mostly) data mining applied to various specific medical problems. The problems covered by this part range from the discovery of biologically interpretable
knowledge from gene expression data, through human embryo selection for the purpose of human in-vitro
fertilization treatments, to the diagnosis of various diseases based on machine learning techniques.
Chapter XII
Discovering Knowledge from Local Patterns in SAGE Data............................................................. 251
Bruno Crémilleux, Université de Caen, France
Arnaud Soulet, Université François Rabelais de Tours, France
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Céline Hébert, Université de Caen, France
Olivier Gandrillon, Université de Lyon, France
Current gene data analysis is often based on global approaches such as clustering. An alternative way
is to utilize local pattern mining techniques for global modeling and knowledge discovery. This chapter
proposes three data mining methods to deal with the use of local patterns by highlighting the most promising
ones or summarizing them. From the case study of the SAGE gene expression data, it is shown that
this approach allows generating new biological hypotheses with clinical applications.
Chapter XIII
Gene Expression Mining Guided by Background Knowledge........................................................... 268
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Filip Železný, Czech Technical University in Prague, Czech Republic
Igor Trajkovski, Jožef Stefan Institute, Slovenia
Filip Karel, Czech Technical University in Prague, Czech Republic
Bruno Crémilleux, Université de Caen, France
Jakub Tolar, University of Minnesota, USA
This chapter points out the role of genomic background knowledge in gene expression data mining.
Its application is demonstrated in several tasks such as relational descriptive analysis, constraint-based
knowledge discovery, feature selection and construction, or quantitative association rule mining.
Chapter XIV
Mining Tinnitus Database for Knowledge.......................................................................................... 293
Pamela L. Thompson, University of North Carolina at Charlotte, USA
Xin Zhang, University of North Carolina at Pembroke, USA
Wenxin Jiang, University of North Carolina at Charlotte, USA
Zbigniew W. Ras, University of North Carolina at Charlotte, USA
Pawel Jastreboff, Emory University School of Medicine, USA
This chapter describes the process used to mine a database containing data related to patient visits during Tinnitus Retraining Therapy. The presented research focused on analysis of existing data, along with
automating the discovery of new and useful features in order to improve classification and understanding
of tinnitus diagnosis.
Chapter XV
Gaussian-Stacking Multiclassifiers for Human Embryo Selection..................................................... 307
Dinora A. Morales, University of the Basque Country, Spain
Endika Bengoetxea, University of the Basque Country, Spain
Pedro Larrañaga, Universidad Politécnica de Madrid, Spain
This chapter describes a new multi-classification system using Gaussian networks to combine the outputs
(probability distributions) of standard machine learning classification algorithms. This multi-classification technique has been applied to a complex real medical problem: the selection of the most promising
embryo-batch for human in-vitro fertilization treatments.
Chapter XVI
Mining Tuberculosis Data................................................................................................................... 332
Marisa A. Sánchez, Universidad Nacional del Sur, Argentina
Sonia Uremovich, Universidad Nacional del Sur, Argentina
Pablo Acrogliano, Hospital Interzonal Dr. José Penna, Argentina
This chapter reviews current policies of tuberculosis control programs for the diagnosis of tuberculosis.
A data mining project that uses WHO’s Direct Observation of Therapy data to analyze the relationship
among different variables and the tuberculosis diagnostic category registered for each patient is then
presented.
Chapter XVII
Knowledge-Based Induction of Clinical Prediction Rules................................................................. 350
Mila Kwiatkowska, Thompson Rivers University, Canada
M. Stella Atkins, Simon Fraser University, Canada
Les Matthews, Thompson Rivers University, Canada
Najib T. Ayas, University of British Columbia, Canada
C. Frank Ryan, University of British Columbia, Canada
This chapter describes how to integrate medical knowledge with purely inductive (data-driven) methods
for the creation of clinical prediction rules. To address the complexity of the domain knowledge, the
authors have introduced a semio-fuzzy framework, which has its theoretical foundations in semiotics
and fuzzy logic. This integrative framework has been applied to the creation of clinical prediction rules
for the diagnosis of obstructive sleep apnea, a serious and under-diagnosed respiratory disorder.
Chapter XVIII
Data Mining in Atherosclerosis Risk Factor Data .............................................................................. 376
Petr Berka, University of Economics, Prague, Czech Republic; Academy of Sciences of the
Czech Republic, Prague, Czech Republic
Jan Rauch, University of Economics, Prague, Czech Republic; Academy of Sciences of the
Czech Republic, Prague, Czech Republic
Marie Tomečková, Academy of Sciences of the Czech Republic, Prague, Czech Republic
This chapter describes the goals, current results, and further plans of a long-term activity concerning the application of data mining and machine learning methods to a complex medical data set. The analyzed
data set comes from a longitudinal study of atherosclerosis risk factors.
Compilation of References............................................................................................................... 398
About the Contributors.................................................................................................................... 426
Index................................................................................................................................................... 437
Foreword
Current research directions are looking at Data Mining (DM) and Knowledge Management (KM) as
complementary and interrelated fields, aimed at supporting, with algorithms and tools, the lifecycle of
knowledge, including its discovery, formalization, retrieval, reuse, and update. While DM focuses on
the extraction of patterns, information, and ultimately knowledge from data (Giudici, 2003; Fayyad et
al., 1996; Bellazzi, Zupan, 2008), KM deals with eliciting, representing, and storing explicit knowledge,
as well as keeping and externalizing tacit knowledge (Abidi, 2001; Van der Spek, Spijkervet, 1997).
Although DM and KM have stemmed from different cultural backgrounds and their methods and tools
are different, too, it is now clear that they are dealing with the same fundamental issues, and that they
must be combined to effectively support humans in decision making.
The capacity of DM to analyze data and to extract models, which may be meaningfully interpreted
and transformed into knowledge, is a key feature for a KM system. Moreover, DM can be a very useful
instrument to transform the tacit knowledge contained in transactional data into explicit knowledge, by
making experts’ behavior and decision-making activities emerge.
On the other hand, DM is greatly empowered by KM. The available, or background, knowledge (BK)
is exploited to drive data gathering and experimental planning, and to structure the databases and data
warehouses. BK is used to properly select the data, choose the data mining strategies, improve the data
mining algorithms, and finally evaluate the data mining results (Bellazzi, Zupan, 2008; Bellazzi, Zupan,
2008). The output of the data analysis process is an update of the domain knowledge itself, which may
lead to new experiments and new data gathering (see Figure 1).
If the interaction and integration of DM and KM are important in all application areas, in medical
applications they are essential (Cios, Moore, 2002). Data analysis in medicine is typically part of a complex
reasoning process which largely depends on BK. Diagnosis, therapy, monitoring, and molecular research
are always guided by the existing knowledge of the problem domain, of the population of patients, or
of the specific patient under consideration. Since medicine is a safety-critical context (Fox, Das, 2000),
Figure 1. Role of the background knowledge in the data mining process (diagram labels: Background Knowledge; Experimental design; Database design; Data extraction; Case-base definition; Data Mining; Patterns interpretation)