Introduction to Pattern Recognition and Machine Learning (IISc Lecture Notes Series, ISSN 2010-2402; Vol. 5)
IISc Lecture Notes Series ISSN: 2010-2402
Editor-in-Chief: Gadadhar Misra
Editors: Chandrashekar S Jog
Joy Kuri
K L Sebastian
Diptiman Sen
Sandhya Visweswariah
Published:
Vol. 1: Introduction to Algebraic Geometry and Commutative Algebra
by Dilip P Patil & Uwe Storch
Vol. 2: Schwarz’s Lemma from a Differential Geometric Viewpoint
by Kang-Tae Kim & Hanjin Lee
Vol. 3: Noise and Vibration Control
by M L Munjal
Vol. 4: Game Theory and Mechanism Design
by Y Narahari
Vol. 5: Introduction to Pattern Recognition and Machine Learning
by M. Narasimha Murty & V. Susheela Devi
World Scientific
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Murty, M. Narasimha.
Introduction to pattern recognition and machine learning / by M Narasimha Murty &
V Susheela Devi (Indian Institute of Science, India).
pages cm. -- (IISc lecture notes series, 2010–2402 ; vol. 5)
ISBN 978-9814335454
1. Pattern recognition systems. 2. Machine learning. I. Devi, V. Susheela. II. Title.
TK7882.P3M87 2015
006.4--dc23
2014044796
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Copyright © 2015 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
In-house Editors: Chandra Nugraha/Dipasri Sardar
Typeset by Stallion Press
Email: [email protected]
Printed in Singapore
Series Preface
World Scientific Publishing Company - Indian Institute of Science Collaboration
IISc Press and WSPC are co-publishing books authored by world-renowned scientists and engineers. This collaboration, started in 2008 during IISc’s centenary year under a Memorandum of Understanding between IISc and WSPC, has resulted in the establishment of three series: the IISc Centenary Lectures Series (ICLS), the IISc Research Monographs Series (IRMS), and the IISc Lecture Notes Series (ILNS). This pioneering collaboration will contribute significantly to disseminating current Indian scientific advancement worldwide.
The “IISc Centenary Lectures Series” will comprise lectures by designated Centenary Lecturers, eminent teachers and researchers from all over the world.
The “IISc Research Monographs Series” will comprise state-of-the-art monographs written by experts in specific areas. They will include, but will not be limited to, the authors’ own research work.
The “IISc Lecture Notes Series” will consist of books that are reasonably self-contained and can be used either as textbooks or for self-study at the postgraduate level in science and engineering. The books will be based on material that has been class-tested for the most part.
Editorial Board for the IISc Lecture Notes Series (ILNS):
Gadadhar Misra, Editor-in-Chief ([email protected])
Chandrashekar S Jog ([email protected])
Joy Kuri ([email protected])
K L Sebastian ([email protected])
Diptiman Sen ([email protected])
Sandhya Visweswariah ([email protected])
Table of Contents
About the Authors xiii
Preface xv
1. Introduction 1
1. Classifiers: An Introduction . . . . . . . . . . . . . . 5
2. An Introduction to Clustering . . . . . . . . . . . . . 14
3. Machine Learning . . . . . . . . . . . . . . . . . . . 25
2. Types of Data 37
1. Features and Patterns . . . . . . . . . . . . . . . . . 37
2. Domain of a Variable . . . . . . . . . . . . . . . . . 39
3. Types of Features . . . . . . . . . . . . . . . . . . . 41
3.1. Nominal data . . . . . . . . . . . . . . . . . . 41
3.2. Ordinal data . . . . . . . . . . . . . . . . . . . 45
3.3. Interval-valued variables . . . . . . . . . . . . 48
3.4. Ratio variables . . . . . . . . . . . . . . . . . . 49
3.5. Spatio-temporal data . . . . . . . . . . . . . . 49
4. Proximity measures . . . . . . . . . . . . . . . . . . 50
4.1. Fractional norms . . . . . . . . . . . . . . . . 56
4.2. Are metrics essential? . . . . . . . . . . . . . . 57
4.3. Similarity between vectors . . . . . . . . . . . 59
4.4. Proximity between spatial patterns . . . . . . 61
4.5. Proximity between temporal patterns . . . . . 62
4.6. Mean dissimilarity . . . . . . . . . . . . . . . . 63
4.7. Peak dissimilarity . . . . . . . . . . . . . . . . 63
4.8. Correlation coefficient . . . . . . . . . . . . . . 64
4.9. Dynamic Time Warping (DTW) distance . . . 64
3. Feature Extraction and Feature Selection 75
1. Types of Feature Selection . . . . . . . . . . . . . . . 76
2. Mutual Information (MI) for Feature Selection . . . 78
3. Chi-square Statistic . . . . . . . . . . . . . . . . . . 79
4. Goodman–Kruskal Measure . . . . . . . . . . . . . . 81
5. Laplacian Score . . . . . . . . . . . . . . . . . . . . . 81
6. Singular Value Decomposition (SVD) . . . . . . . . 83
7. Non-negative Matrix Factorization (NMF) . . . . . . 84
8. Random Projections (RPs) for Feature Extraction . . . . 86
8.1. Advantages of random projections . . . . . . . 88
9. Locality Sensitive Hashing (LSH) . . . . . . . . . . . 88
10. Class Separability . . . . . . . . . . . . . . . . . . . 90
11. Genetic and Evolutionary Algorithms . . . . . . . . 91
11.1. Hybrid GA for feature selection . . . . . . . . 92
12. Ranking for Feature Selection . . . . . . . . . . . . . 96
12.1. Feature selection based on an optimization formulation . . . . . . . . . . . . . . . . . . . . . 97
12.2. Feature ranking using F-score . . . . . . . . . 99
12.3. Feature ranking using linear support vector machine (SVM) weight vector . . . . . . . . . . . 100
12.4. Ensemble feature ranking . . . . . . . . . . . . 101
12.5. Feature ranking using number of label changes . . . . . . . . . . . . . . . . . . . . . . . 103
13. Feature Selection for Time Series Data . . . . . . . . 103
13.1. Piecewise aggregate approximation . . . . . . 103
13.2. Spectral decomposition . . . . . . . . . . . . . 104
13.3. Wavelet decomposition . . . . . . . . . . . . . 104
13.4. Singular Value Decomposition (SVD) . . . . . 104
13.5. Common principal component loading based variable subset selection (CLeVer) . . . . . . . . 104
4. Bayesian Learning 111
1. Document Classification . . . . . . . . . . . . . . . . 111
2. Naive Bayes Classifier . . . . . . . . . . . . . . . . . 113
3. Frequency-Based Estimation of Probabilities . . . . 115
4. Posterior Probability . . . . . . . . . . . . . . . . . . 117
5. Density Estimation . . . . . . . . . . . . . . . . . . . 119
6. Conjugate Priors . . . . . . . . . . . . . . . . . . . . 126
5. Classification 135
1. Classification Without Learning . . . . . . . . . . . 135
2. Classification in High-Dimensional Spaces . . . . . . 139
2.1. Fractional distance metrics . . . . . . . . . . . 141
2.2. Shrinkage–divergence proximity (SDP) . . . . 143
3. Random Forests . . . . . . . . . . . . . . . . . . . . 144
3.1. Fuzzy random forests . . . . . . . . . . . . . . 148
4. Linear Support Vector Machine (SVM) . . . . . . . 150
4.1. SVM–kNN . . . . . . . . . . . . . . . . . . . . 153
4.2. Adaptation of cutting plane algorithm . . . . 154
4.3. Nystrom approximated SVM . . . . . . . . . . 155
5. Logistic Regression . . . . . . . . . . . . . . . . . . . 156
6. Semi-supervised Classification . . . . . . . . . . . . . 159
6.1. Using clustering algorithms . . . . . . . . . . . 160
6.2. Using generative models . . . . . . . . . . . . 160
6.3. Using low density separation . . . . . . . . . . 161
6.4. Using graph-based methods . . . . . . . . . . 162
6.5. Using co-training methods . . . . . . . . . . . 164
6.6. Using self-training methods . . . . . . . . . . . 165
6.7. SVM for semi-supervised classification . . . . 166
6.8. Random forests for semi-supervised classification . . . . . . . . . . . . . . . . . . . . 166
7. Classification of Time-Series Data . . . . . . . . . . 167
7.1. Distance-based classification . . . . . . . . . . 168
7.2. Feature-based classification . . . . . . . . . . . 169
7.3. Model-based classification . . . . . . . . . . . 170
6. Classification using Soft Computing Techniques 177
1. Introduction . . . . . . . . . . . . . . . . . . . . . . 177
2. Fuzzy Classification . . . . . . . . . . . . . . . . . . 178
2.1. Fuzzy k-nearest neighbor algorithm . . . . . . 179
3. Rough Classification . . . . . . . . . . . . . . . . . . 179
3.1. Rough set attribute reduction . . . . . . . . . 180
3.2. Generating decision rules . . . . . . . . . . . . 181
4. GAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.1. Weighting of attributes using GA . . . . . . . 182
4.2. Binary pattern classification using GA . . . . 184
4.3. Rule-based classification using GAs . . . . . . 185
4.4. Time series classification . . . . . . . . . . . . 187
4.5. Using generalized Choquet integral with signed fuzzy measure for classification using GAs . . . . 187
4.6. Decision tree induction using evolutionary algorithms . . . . . . . . . . . . . . . . . . . . . 191
5. Neural Networks for Classification . . . . . . . . . . 195
5.1. Multi-layer feedforward network with backpropagation . . . . . . . . . . . . . . . . . 197
5.2. Training a feedforward neural network using GAs . . . . . . . . . . . . . . . . . . . . . 199
6. Multi-label Classification . . . . . . . . . . . . . . . 202
6.1. Multi-label kNN (mL-kNN) . . . . . . . . . . 203
6.2. Probabilistic classifier chains (PCC) . . . . . . 204
6.3. Binary relevance (BR) . . . . . . . . . . . . . 205
6.4. Using label powersets (LP) . . . . . . . . . . . 205
6.5. Neural networks for multi-label classification . . . . . . . . . . . . . . . . . . . . 206
6.6. Evaluation of multi-label classification . . . . 209
7. Data Clustering 215
1. Number of Partitions . . . . . . . . . . . . . . . . . 215
2. Clustering Algorithms . . . . . . . . . . . . . . . . . 218
2.1. K-means algorithm . . . . . . . . . . . . . . . 219
2.2. Leader algorithm . . . . . . . . . . . . . . . . 223
2.3. BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies . . . . . . . . . . . 225
2.4. Clustering based on graphs . . . . . . . . . . . 230
3. Why Clustering? . . . . . . . . . . . . . . . . . . . . 241
3.1. Data compression . . . . . . . . . . . . . . . . 241
3.2. Outlier detection . . . . . . . . . . . . . . . . 242
3.3. Pattern synthesis . . . . . . . . . . . . . . . . 243
4. Clustering Labeled Data . . . . . . . . . . . . . . . . 246
4.1. Clustering for classification . . . . . . . . . . . 246
4.2. Knowledge-based clustering . . . . . . . . . . 250
5. Combination of Clusterings . . . . . . . . . . . . . . 255
8. Soft Clustering 263
1. Soft Clustering Paradigms . . . . . . . . . . . . . . . 264
2. Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . 266
2.1. Fuzzy K-means algorithm . . . . . . . . . . . 267
3. Rough Clustering . . . . . . . . . . . . . . . . . . . . 269
3.1. Rough K-means algorithm . . . . . . . . . . . 271
4. Clustering Based on Evolutionary Algorithms . . . . 272
5. Clustering Based on Neural Networks . . . . . . . . 281
6. Statistical Clustering . . . . . . . . . . . . . . . . . . 282
6.1. OKM algorithm . . . . . . . . . . . . . . . . . 283
6.2. EM-based clustering . . . . . . . . . . . . . . . 285
7. Topic Models . . . . . . . . . . . . . . . . . . . . . . 293
7.1. Matrix factorization-based methods . . . . . . 295
7.2. Divide-and-conquer approach . . . . . . . . . 296
7.3. Latent Semantic Analysis (LSA) . . . . . . . . 299
7.4. SVD and PCA . . . . . . . . . . . . . . . . . . 302
7.5. Probabilistic Latent Semantic Analysis (PLSA) . . . . . . . . . . . . . . . . . . . . . . . 307
7.6. Non-negative Matrix Factorization (NMF) . . . . . . . . . . . . . . . . . . . . . . . 310
7.7. LDA . . . . . . . . . . . . . . . . . . . . . . . 311
7.8. Concept and topic . . . . . . . . . . . . . . . . 316
9. Application — Social and Information Networks 321
1. Introduction . . . . . . . . . . . . . . . . . . . . . . 321
2. Patterns in Graphs . . . . . . . . . . . . . . . . . . . 322
3. Identification of Communities in Networks . . . . . . 326
3.1. Graph partitioning . . . . . . . . . . . . . . . 328
3.2. Spectral clustering . . . . . . . . . . . . . . . . 329
3.3. Linkage-based clustering . . . . . . . . . . . . 331
3.4. Hierarchical clustering . . . . . . . . . . . . . 331
3.5. Modularity optimization for partitioning graphs . . . . . . . . . . . . . . . . . . . . . . . 333
4. Link Prediction . . . . . . . . . . . . . . . . . . . . . 340
4.1. Proximity functions . . . . . . . . . . . . . . . 341
5. Information Diffusion . . . . . . . . . . . . . . . . . 347
5.1. Graph-based approaches . . . . . . . . . . . . 348
5.2. Non-graph approaches . . . . . . . . . . . . . 349
6. Identifying Specific Nodes in a Social Network . . . 353
7. Topic Models . . . . . . . . . . . . . . . . . . . . . . 355
7.1. Probabilistic latent semantic analysis (pLSA) . . . . . . . . . . . . . . . . . . . . . . . 355
7.2. Latent Dirichlet allocation (LDA) . . . . . . . 357
7.3. Author–topic model . . . . . . . . . . . . . . . 359
Index 365
About the Authors
Professor M. Narasimha Murty completed his B.E., M.E., and Ph.D. at the Indian Institute of Science (IISc), Bangalore. He joined IISc as an Assistant Professor in 1984, became a Professor in 1996, and is currently the Dean of the Engineering Faculty at IISc. He has guided more than 20 doctoral students and several master’s students over the past 30 years at IISc; most of these students have worked in the areas of Pattern Recognition, Machine Learning, and Data Mining. A paper he co-authored on pattern clustering has around 9600 citations as reported by Google Scholar. A team led by him won the KDD Cup on the citation prediction task organized by Cornell University in 2003. He is an elected fellow of both the Indian National Academy of Engineering and the National Academy of Sciences.
Dr. V. Susheela Devi completed her Ph.D. at the Indian Institute of Science in 2000. Since then she has worked as a faculty member in the Department of Computer Science and Automation at the Indian Institute of Science. She works in the areas of Pattern Recognition, Data Mining, Machine Learning, and Soft Computing. She has taught courses on Data Mining, Pattern Recognition, Data Structures and Algorithms, Computational Methods of Optimization, and Artificial Intelligence. She has published a number of papers in international conferences and journals.