
Intelligent Systems Reference Library 72

Salvador García
Julián Luengo
Francisco Herrera

Data Preprocessing in Data Mining

Intelligent Systems Reference Library

Volume 72

Series editors

Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: [email protected]

Lakhmi C. Jain, University of Canberra, Canberra, Australia
e-mail: [email protected]

About this Series

The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included.

More information about this series at http://www.springer.com/series/8578

Salvador García • Julián Luengo • Francisco Herrera

Data Preprocessing in Data Mining


Salvador García
Department of Computer Science
University of Jaén
Jaén, Spain

Julián Luengo
Department of Civil Engineering
University of Burgos
Burgos, Spain

Francisco Herrera
Department of Computer Science and Artificial Intelligence
University of Granada
Granada, Spain

ISSN 1868-4394 ISSN 1868-4408 (electronic)

ISBN 978-3-319-10246-7 ISBN 978-3-319-10247-4 (eBook)

DOI 10.1007/978-3-319-10247-4

Library of Congress Control Number: 2014946771

Springer Cham Heidelberg New York Dordrecht London

© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

This book is dedicated to all the people with whom we have worked over the years and who have made it possible to reach this moment. Thanks to the members of the research group “Soft Computing and Intelligent Information Systems”.

To our families.

Preface

Data preprocessing is an often neglected but major step in the data mining process. Data collection is usually a loosely controlled process, resulting in out-of-range values, impossible data combinations (e.g., Gender: Male; Pregnant: Yes), missing values, and so on. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data come first and foremost before running an analysis. If there is much irrelevant and redundant information present, or the data are noisy and unreliable, then knowledge discovery is more difficult to conduct. Data preparation can take a considerable amount of processing time.
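
To make the screening step concrete, here is a minimal Python/pandas sketch (our illustration, not taken from the book) that flags the three kinds of problems just mentioned; the column names, value ranges, and records are all invented:

    # Hypothetical screening of a raw dataset for out-of-range values,
    # impossible combinations, and missing values (all data invented).
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "age":      [25, 31, -4, 58, np.nan],          # -4 is out of range
        "gender":   ["M", "F", "F", "M", "F"],
        "pregnant": ["No", "Yes", "No", "Yes", "No"],  # row 3: Male + Yes
    })

    out_of_range = df["age"].lt(0) | df["age"].gt(120)
    impossible   = (df["gender"] == "M") & (df["pregnant"] == "Yes")
    missing      = df.isna().any(axis=1)

    # Records that must be cleaned or imputed before mining
    print(df[out_of_range | impossible | missing])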

Data preprocessing includes data preparation, comprising the integration, cleaning, normalization, and transformation of data, and data reduction tasks, such as feature selection, instance selection, discretization, etc. The result expected after a reliable chaining of data preprocessing tasks is a final dataset that can be considered correct and useful for further data mining algorithms.
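
As one hedged illustration of such a chaining (ours, not the authors’), a scikit-learn pipeline can compose a preparation step, a normalization step, and a reduction step into a single transformation; the concrete estimators and the choice of k=10 are illustrative assumptions:

    # A sketch of one possible preprocessing chain; the estimator choices
    # (mean imputation, min-max normalization, univariate selection of the
    # 10 best features) are illustrative, not the book's prescriptions.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    preprocess = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # data cleaning
        ("scale",  MinMaxScaler()),                   # data normalization
        ("select", SelectKBest(f_classif, k=10)),     # data reduction
    ])

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    X_final = preprocess.fit_transform(X, y)          # the "final dataset"
    print(X_final.shape)                              # (200, 10)

Note that the order of the steps matters in such a chain: imputation must precede normalization so that the scaling statistics are computed on complete data.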

This book covers the set of techniques under the umbrella of data preprocessing, being a comprehensive volume devoted completely to this part of the data mining field and including all important details and aspects of the techniques that belong to these families. In recent years, this area has become of great importance because data mining algorithms require meaningful and manageable data in order to operate correctly and to provide useful knowledge, predictions, or descriptions. It is well known that most of the effort made in a knowledge discovery application is dedicated to data preparation and reduction tasks. Both theoreticians and practitioners are constantly searching for data preprocessing techniques that ensure reliable and accurate results while trading off efficiency and time complexity. Thus, an exhaustive and updated background in the topic can be very effective in areas such as data mining, machine learning, and pattern recognition. This book invites readers to explore the many advantages that data preparation and reduction provide:


• To adapt and particularize the data for each data mining algorithm.
• To reduce the amount of data required for a suitable learning task, also decreasing its time complexity.
• To increase the effectiveness and accuracy of predictive tasks.
• To make possible the otherwise impossible with raw data, allowing data mining algorithms to be applied to high volumes of data.
• To support the understanding of the data.
• To be useful for various tasks, such as classification, regression, and unsupervised learning.

The target audience for this book is anyone who wants a better understanding of the current state of the art in a crucial part of knowledge discovery from data: data preprocessing. Practitioners in industry and enterprise should find new insights and possibilities in the breadth of topics covered. Researchers and data scientists and/or analysts in universities, research centers, and government can find a comprehensive review of the topic addressed and new ideas for productive research efforts.

Granada, Spain, June 2014

Salvador García
Julián Luengo
Francisco Herrera


Contents

1 Introduction ........................................ 1

1.1 Data Mining and Knowledge Discovery. . . . . . . . . . . . . . . . . 1

1.2 Data Mining Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.4 Unsupervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4.1 Pattern Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4.2 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5 Other Learning Paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5.1 Imbalanced Learning . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5.2 Multi-instance Learning . . . . . . . . . . . . . . . . . . . . . . 9

1.5.3 Multi-label Classification . . . . . . . . . . . . . . . . . . . . . 9

1.5.4 Semi-supervised Learning . . . . . . . . . . . . . . . . . . . . 9

1.5.5 Subgroup Discovery . . . . . . . . . . . . . . . . . . . . . . . . 9

1.5.6 Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.5.7 Data Stream Learning . . . . . . . . . . . . . . . . . . . . . . . 10

1.6 Introduction to Data Preprocessing . . . . . . . . . . . . . . . . . . . . 10

1.6.1 Data Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.6.2 Data Reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Data Sets and Proper Statistical Analysis of Data Mining Techniques . . . . . . . . . . . . 19

2.1 Data Sets and Partitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1.1 Data Set Partitioning . . . . . . . . . . . . . . . . . . . . . . . . 21

2.1.2 Performance Measures. . . . . . . . . . . . . . . . . . . . . . . 24

2.2 Using Statistical Tests to Compare Methods. . . . . . . . . . . . . . 25

2.2.1 Conditions for the Safe Use of Parametric Tests . . . . . 26

2.2.2 Normality Test over the Group of Data Sets and Algorithms . . . . . . . . . . . . 27


2.2.3 Non-parametric Tests for Comparing Two Algorithms in Multiple Data Set Analysis . . . . . . . . . . . . 29

2.2.4 Non-parametric Tests for Multiple Comparisons Among More than Two Algorithms . . . . . . . . . . . . 32

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3 Data Preparation Basic Models . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.1 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.2 Data Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.2.1 Finding Redundant Attributes . . . . . . . . . . . . . . . . . . 41

3.2.2 Detecting Tuple Duplication and Inconsistency. . . . . . 43

3.3 Data Cleaning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.4 Data Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.4.1 Min-Max Normalization . . . . . . . . . . . . . . . . . . . . . 46

3.4.2 Z-score Normalization . . . . . . . . . . . . . . . . . . . . . . . 47

3.4.3 Decimal Scaling Normalization. . . . . . . . . . . . . . . . . 48

3.5 Data Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.5.1 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 49

3.5.2 Quadratic Transformations . . . . . . . . . . . . . . . . . . . . 49

3.5.3 Non-polynomial Approximations of Transformations . . . . . . . . . . . . 50

3.5.4 Polynomial Approximations of Transformations . . . . . 51

3.5.5 Rank Transformations . . . . . . . . . . . . . . . . . . . . . . . 52

3.5.6 Box-Cox Transformations . . . . . . . . . . . . . . . . . . . . 53

3.5.7 Spreading the Histogram . . . . . . . . . . . . . . . . . . . . . 54

3.5.8 Nominal to Binary Transformation . . . . . . . . . . . . . . 54

3.5.9 Transformations via Data Reduction . . . . . . . . . . . . . 55

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4 Dealing with Missing Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.2 Assumptions and Missing Data Mechanisms . . . . . . . . . . . . . 61

4.3 Simple Approaches to Missing Data . . . . . . . . . . . . . . . . . . . 63

4.4 Maximum Likelihood Imputation Methods. . . . . . . . . . . . . . . 64

4.4.1 Expectation-Maximization (EM) . . . . . . . . . . . . . . . . 65

4.4.2 Multiple Imputation . . . . . . . . . . . . . . . . . . . . . . . . 68

4.4.3 Bayesian Principal Component Analysis (BPCA) . . . . 72

4.5 Imputation of Missing Values. Machine Learning Based Methods . . . . . . . . . . . . 76

4.5.1 Imputation with K-Nearest Neighbor (KNNI) . . . . . . . 76

4.5.2 Weighted Imputation with K-Nearest Neighbour (WKNNI) . . . . . . . . . . . . 77

4.5.3 K-means Clustering Imputation (KMI). . . . . . . . . . . . 78


4.5.4 Imputation with Fuzzy K-means Clustering (FKMI) . . . . . . . . . . . . 78

4.5.5 Support Vector Machines Imputation (SVMI). . . . . . . 79

4.5.6 Event Covering (EC). . . . . . . . . . . . . . . . . . . . . . . . 82

4.5.7 Singular Value Decomposition Imputation (SVDI) . . . 86

4.5.8 Local Least Squares Imputation (LLSI) . . . . . . . . . . . 86

4.5.9 Recent Machine Learning Approaches to Missing Values Imputation . . . . . . . . . . . . 90

4.6 Experimental Comparative Analysis . . . . . . . . . . . . . . . . . . . 90

4.6.1 Effect of the Imputation Methods in the Attributes’ Relationships . . . . . . . . . . . . 90

4.6.2 Best Imputation Methods for Classification Methods . . . . . . . . . . . . 97

4.6.3 Interesting Comments . . . . . . . . . . . . . . . . . . . . . . . 100

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Dealing with Noisy Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.1 Identifying Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.2 Types of Noise Data: Class Noise and Attribute Noise . . . . . . 110

5.2.1 Noise Introduction Mechanisms . . . . . . . . . . . . . . . . 111

5.2.2 Simulating the Noise of Real-World Data Sets . . . . . . 114

5.3 Noise Filtering at Data Level . . . . . . . . . . . . . . . . . . . . . . . . 115

5.3.1 Ensemble Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

5.3.2 Cross-Validated Committees Filter . . . . . . . . . . . . . . 117

5.3.3 Iterative-Partitioning Filter . . . . . . . . . . . . . . . . . . . . 117

5.3.4 More Filtering Methods . . . . . . . . . . . . . . . . . . . . . . 118

5.4 Robust Learners Against Noise. . . . . . . . . . . . . . . . . . . . . . . 118

5.4.1 Multiple Classifier Systems for Classification Tasks . . . . . . . . . . . . 120

5.4.2 Addressing Multi-class Classification Problems by Decomposition . . . . . . . . . . . . 123

5.5 Empirical Analysis of Noise Filters and Robust Strategies . . . . 125

5.5.1 Noise Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.5.2 Noise Filters for Class Noise . . . . . . . . . . . . . . . . . . 127

5.5.3 Noise Filtering Efficacy Prediction by Data Complexity Measures . . . . . . . . . . . . 129

5.5.4 Multiple Classifier Systems with Noise . . . . . . . . . . . 133

5.5.5 Analysis of the OVO Decomposition with Noise . . . . 136

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

6 Data Reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.1 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

6.2 The Curse of Dimensionality . . . . . . . . . . . . . . . . . . . . . . . . 148


6.2.1 Principal Components Analysis. . . . . . . . . . . . . . . . . 149

6.2.2 Factor Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

6.2.3 Multidimensional Scaling. . . . . . . . . . . . . . . . . . . . . 152

6.2.4 Locally Linear Embedding . . . . . . . . . . . . . . . . . . . . 155

6.3 Data Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

6.3.1 Data Condensation . . . . . . . . . . . . . . . . . . . . . . . . . 158

6.3.2 Data Squashing . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

6.3.3 Data Clustering. . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

6.4 Binning and Reduction of Cardinality . . . . . . . . . . . . . . . . . . 161

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

7 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.1 Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

7.2 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

7.2.1 The Search of a Subset of Features . . . . . . . . . . . . . . 164

7.2.2 Selection Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 168

7.2.3 Filter, Wrapper and Embedded Feature Selection . . . . 173

7.3 Aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

7.3.1 Output of Feature Selection . . . . . . . . . . . . . . . . . . . 176

7.3.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

7.3.3 Drawbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

7.3.4 Using Decision Trees for Feature Selection . . . . . . . . 179

7.4 Description of the Most Representative Feature Selection Methods . . . . . . . . . . . . 180

7.4.1 Exhaustive Methods . . . . . . . . . . . . . . . . . . . . . . . . 181

7.4.2 Heuristic Methods. . . . . . . . . . . . . . . . . . . . . . . . . . 182

7.4.3 Nondeterministic Methods . . . . . . . . . . . . . . . . . . . . 182

7.4.4 Feature Weighting Methods . . . . . . . . . . . . . . . . . . . 184

7.5 Related and Advanced Topics . . . . . . . . . . . . . . . . . . . . . . . 185

7.5.1 Leading and Recent Feature Selection Techniques. . . . 186

7.5.2 Feature Extraction. . . . . . . . . . . . . . . . . . . . . . . . . . 188

7.5.3 Feature Construction . . . . . . . . . . . . . . . . . . . . . . . . 189

7.6 Experimental Comparative Analyses in Feature Selection. . . . . 190

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

8 Instance Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

8.2 Training Set Selection Versus Prototype Selection. . . . . . . . . . 197

8.3 Prototype Selection Taxonomy . . . . . . . . . . . . . . . . . . . . . . . 199

8.3.1 Common Properties in Prototype Selection Methods . . . . . . . . . . . . 199

8.3.2 Prototype Selection Methods . . . . . . . . . . . . . . . . . . 202

8.3.3 Taxonomy of Prototype Selection Methods . . . . . . . . 202


8.4 Description of Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

8.4.1 Condensation Algorithms. . . . . . . . . . . . . . . . . . . . . 206

8.4.2 Edition Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 210

8.4.3 Hybrid Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 212

8.5 Related and Advanced Topics . . . . . . . . . . . . . . . . . . . . . . . 221

8.5.1 Prototype Generation. . . . . . . . . . . . . . . . . . . . . . . . 221

8.5.2 Distance Metrics, Feature Weighting and Combinations with Feature Selection . . . . . . . . . . . . 221

8.5.3 Hybridizations with Other Learning Methods and Ensembles . . . . . . . . . . . . 222

8.5.4 Scaling-Up Approaches . . . . . . . . . . . . . . . . . . . . . . 223

8.5.5 Data Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . 223

8.6 Experimental Comparative Analysis in Prototype Selection . . . 224

8.6.1 Analysis and Empirical Results on Small Size Data Sets . . . . . . . . . . . . 225

8.6.2 Analysis and Empirical Results on Medium Size Data Sets . . . . . . . . . . . . 230

8.6.3 Global View of the Obtained Results . . . . . . . . . . . . 231

8.6.4 Visualization of Data Subsets: A Case Study Based on the Banana Data Set . . . . . . . . . . . . 233

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

9 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245

9.2 Perspectives and Background . . . . . . . . . . . . . . . . . . . . . . . . 247

9.2.1 Discretization Process . . . . . . . . . . . . . . . . . . . . . . . 247

9.2.2 Related and Advanced Work . . . . . . . . . . . . . . . . . . 250

9.3 Properties and Taxonomy. . . . . . . . . . . . . . . . . . . . . . . . . . . 251

9.3.1 Common Properties. . . . . . . . . . . . . . . . . . . . . . . . . 251

9.3.2 Methods and Taxonomy . . . . . . . . . . . . . . . . . . . . . 255

9.3.3 Description of the Most Representative Discretization Methods . . . . . . . . . . . . 259

9.4 Experimental Comparative Analysis . . . . . . . . . . . . . . . . . . . 265

9.4.1 Experimental Set up . . . . . . . . . . . . . . . . . . . . . . . . 265

9.4.2 Analysis and Empirical Results. . . . . . . . . . . . . . . . . 268

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278

10 A Data Mining Software Package Including Data Preparation and Reduction: KEEL . . . . . . . . . . . . 285

10.1 Data Mining Softwares and Toolboxes . . . . . . . . . . . . . . . . . 285

10.2 KEEL: Knowledge Extraction Based on Evolutionary Learning . . . . . . . . . . . . 287

10.2.1 Main Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

10.2.2 Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . 289


10.2.3 Design of Experiments: Off-Line Module . . . . . . . . . 291

10.2.4 Computer-Based Education: On-Line Module. . . . . . . 293

10.3 KEEL-Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

10.3.1 Data Sets Web Pages . . . . . . . . . . . . . . . . . . . . . . . 294

10.3.2 Experimental Study Web Pages . . . . . . . . . . . . . . . . 297

10.4 Integration of New Algorithms into the KEEL Tool . . . . . . . . 298

10.4.1 Introduction to the KEEL Codification Features . . . . . 298

10.5 KEEL Statistical Tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

10.5.1 Case Study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304

10.6 Summarizing Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . 310

References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315


Acronyms

ANN Artificial Neural Network

CV Cross Validation

DM Data Mining

DR Dimensionality Reduction

EM Expectation-Maximization

FCV Fold Cross Validation

FS Feature Selection

IS Instance Selection

KDD Knowledge Discovery in Data

KEEL Knowledge Extraction based on Evolutionary Learning

KNN K-Nearest Neighbors

LLE Locally Linear Embedding

LVQ Learning Vector Quantization

MDS Multidimensional Scaling

MI Mutual Information

ML Machine Learning

MLP Multi-Layer Perceptron

MV Missing Value

PCA Principal Components Analysis

RBFN Radial Basis Function Network

SONN Self Organizing Neural Network

SVM Support Vector Machine

