Introduction to Pattern Recognition and Machine Learning (IISc Lecture Notes Series, ISSN 2010-2402; Vol. 5)
IISc Lecture Notes Series ISSN: 2010-2402
Editor-in-Chief: Gadadhar Misra
Editors: Chandrashekar S Jog
Joy Kuri
K L Sebastian
Diptiman Sen
Sandhya Visweswariah
Published:
Vol. 1: Introduction to Algebraic Geometry and Commutative Algebra
by Dilip P Patil & Uwe Storch
Vol. 2: Schwarz’s Lemma from a Differential Geometric Viewpoint
by Kang-Tae Kim & Hanjin Lee
Vol. 3: Noise and Vibration Control
by M L Munjal
Vol. 4: Game Theory and Mechanism Design
by Y Narahari
Vol. 5: Introduction to Pattern Recognition and Machine Learning
by M. Narasimha Murty & V. Susheela Devi
World Scientific
Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Murty, M. Narasimha.
Introduction to pattern recognition and machine learning / by M Narasimha Murty &
V Susheela Devi (Indian Institute of Science, India).
pages cm. -- (IISc lecture notes series, 2010–2402 ; vol. 5)
ISBN 978-9814335454
1. Pattern recognition systems. 2. Machine learning. I. Devi, V. Susheela. II. Title.
TK7882.P3M87 2015
006.4--dc23
2014044796
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Copyright © 2015 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means,
electronic or mechanical, including photocopying, recording or any information storage and retrieval
system now known or to be invented, without written permission from the publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy
is not required from the publisher.
In-house Editors: Chandra Nugraha/Dipasri Sardar
Typeset by Stallion Press
Email: [email protected]
Printed in Singapore
Series Preface
World Scientific Publishing Company - Indian Institute of Science Collaboration
IISc Press and WSPC are co-publishing books authored by world-renowned scientists and engineers. This collaboration, started in 2008 during IISc’s centenary year under a Memorandum of Understanding between IISc and WSPC, has resulted in the establishment of three series: the IISc Centenary Lectures Series (ICLS), the IISc Research Monographs Series (IRMS), and the IISc Lecture Notes Series (ILNS). This pioneering collaboration will contribute significantly to disseminating current Indian scientific advancement worldwide.
The “IISc Centenary Lectures Series” will comprise lectures by designated Centenary Lecturers, eminent teachers and researchers from all over the world.
The “IISc Research Monographs Series” will comprise state-of-the-art monographs written by experts in specific areas. They will include, but will not be limited to, the authors’ own research work.
The “IISc Lecture Notes Series” will consist of books that are reasonably self-contained and can be used either as textbooks or for self-study at the postgraduate level in science and engineering. The books will be based on material that has been class-tested for the most part.
Editorial Board for the IISc Lecture Notes Series (ILNS):
Gadadhar Misra, Editor-in-Chief ([email protected])
Chandrashekar S Jog ([email protected])
Joy Kuri ([email protected])
K L Sebastian ([email protected])
Diptiman Sen ([email protected])
Sandhya Visweswariah ([email protected])
Table of Contents
About the Authors xiii
Preface xv
1. Introduction 1
1. Classifiers: An Introduction . . . . . . . . . . . . . . 5
2. An Introduction to Clustering . . . . . . . . . . . . . 14
3. Machine Learning . . . . . . . . . . . . . . . . . . . 25
2. Types of Data 37
1. Features and Patterns . . . . . . . . . . . . . . . . . 37
2. Domain of a Variable . . . . . . . . . . . . . . . . . 39
3. Types of Features . . . . . . . . . . . . . . . . . . . 41
3.1. Nominal data . . . . . . . . . . . . . . . . . . 41
3.2. Ordinal data . . . . . . . . . . . . . . . . . . . 45
3.3. Interval-valued variables . . . . . . . . . . . . 48
3.4. Ratio variables . . . . . . . . . . . . . . . . . . 49
3.5. Spatio-temporal data . . . . . . . . . . . . . . 49
4. Proximity measures . . . . . . . . . . . . . . . . . . 50
4.1. Fractional norms . . . . . . . . . . . . . . . . 56
4.2. Are metrics essential? . . . . . . . . . . . . . . 57
4.3. Similarity between vectors . . . . . . . . . . . 59
4.4. Proximity between spatial patterns . . . . . . 61
4.5. Proximity between temporal patterns . . . . . 62
4.6. Mean dissimilarity . . . . . . . . . . . . . . . . 63
4.7. Peak dissimilarity . . . . . . . . . . . . . . . . 63
4.8. Correlation coefficient . . . . . . . . . . . . . . 64
4.9. Dynamic Time Warping (DTW) distance . . . 64
3. Feature Extraction and Feature Selection 75
1. Types of Feature Selection . . . . . . . . . . . . . . . 76
2. Mutual Information (MI) for Feature Selection . . . 78
3. Chi-square Statistic . . . . . . . . . . . . . . . . . . 79
4. Goodman–Kruskal Measure . . . . . . . . . . . . . . 81
5. Laplacian Score . . . . . . . . . . . . . . . . . . . . . 81
6. Singular Value Decomposition (SVD) . . . . . . . . 83
7. Non-negative Matrix Factorization (NMF) . . . . . . 84
8. Random Projections (RPs) for Feature Extraction . . . . 86
8.1. Advantages of random projections . . . . . . . 88
9. Locality Sensitive Hashing (LSH) . . . . . . . . . . . 88
10. Class Separability . . . . . . . . . . . . . . . . . . . 90
11. Genetic and Evolutionary Algorithms . . . . . . . . 91
11.1. Hybrid GA for feature selection . . . . . . . . 92
12. Ranking for Feature Selection . . . . . . . . . . . . . 96
12.1. Feature selection based on an optimization formulation . . . . . . . . . . . . . . . . . . . . . 97
12.2. Feature ranking using F-score . . . . . . . . . 99
12.3. Feature ranking using linear support vector machine (SVM) weight vector . . . . . . . . . . . 100
12.4. Ensemble feature ranking . . . . . . . . . . . . 101
12.5. Feature ranking using number of label changes . . . . . . . . . . . . . . . . . . . . . . . 103
13. Feature Selection for Time Series Data . . . . . . . . 103
13.1. Piecewise aggregate approximation . . . . . . 103
13.2. Spectral decomposition . . . . . . . . . . . . . 104
13.3. Wavelet decomposition . . . . . . . . . . . . . 104
13.4. Singular Value Decomposition (SVD) . . . . . 104
13.5. Common principal component loading based variable subset selection (CLeVer) . . . . . . . . 104
4. Bayesian Learning 111
1. Document Classification . . . . . . . . . . . . . . . . 111
2. Naive Bayes Classifier . . . . . . . . . . . . . . . . . 113
3. Frequency-Based Estimation of Probabilities . . . . 115
4. Posterior Probability . . . . . . . . . . . . . . . . . . 117
5. Density Estimation . . . . . . . . . . . . . . . . . . . 119
6. Conjugate Priors . . . . . . . . . . . . . . . . . . . . 126
5. Classification 135
1. Classification Without Learning . . . . . . . . . . . 135
2. Classification in High-Dimensional Spaces . . . . . . 139
2.1. Fractional distance metrics . . . . . . . . . . . 141
2.2. Shrinkage–divergence proximity (SDP) . . . . 143
3. Random Forests . . . . . . . . . . . . . . . . . . . . 144
3.1. Fuzzy random forests . . . . . . . . . . . . . . 148
4. Linear Support Vector Machine (SVM) . . . . . . . 150
4.1. SVM–kNN . . . . . . . . . . . . . . . . . . . . 153
4.2. Adaptation of cutting plane algorithm . . . . 154
4.3. Nystrom approximated SVM . . . . . . . . . . 155
5. Logistic Regression . . . . . . . . . . . . . . . . . . . 156
6. Semi-supervised Classification . . . . . . . . . . . . . 159
6.1. Using clustering algorithms . . . . . . . . . . . 160
6.2. Using generative models . . . . . . . . . . . . 160
6.3. Using low density separation . . . . . . . . . . 161
6.4. Using graph-based methods . . . . . . . . . . 162
6.5. Using co-training methods . . . . . . . . . . . 164
6.6. Using self-training methods . . . . . . . . . . . 165
6.7. SVM for semi-supervised classification . . . . 166
6.8. Random forests for semi-supervised classification . . . . . . . . . . . . . . . . . . . . 166
7. Classification of Time-Series Data . . . . . . . . . . 167
7.1. Distance-based classification . . . . . . . . . . 168
7.2. Feature-based classification . . . . . . . . . . . 169
7.3. Model-based classification . . . . . . . . . . . 170
6. Classification using Soft Computing Techniques 177
1. Introduction . . . . . . . . . . . . . . . . . . . . . . 177
2. Fuzzy Classification . . . . . . . . . . . . . . . . . . 178
2.1. Fuzzy k-nearest neighbor algorithm . . . . . . 179
3. Rough Classification . . . . . . . . . . . . . . . . . . 179
3.1. Rough set attribute reduction . . . . . . . . . 180
3.2. Generating decision rules . . . . . . . . . . . . 181
4. GAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.1. Weighting of attributes using GA . . . . . . . 182
4.2. Binary pattern classification using GA . . . . 184
4.3. Rule-based classification using GAs . . . . . . 185
4.4. Time series classification . . . . . . . . . . . . 187
4.5. Using generalized Choquet integral with signed fuzzy measure for classification using GAs . . . . 187
4.6. Decision tree induction using evolutionary algorithms . . . . . . . . . . . . . . . . . . . . . 191
5. Neural Networks for Classification . . . . . . . . . . 195
5.1. Multi-layer feedforward network with backpropagation . . . . . . . . . . . . . . . . . 197
5.2. Training a feedforward neural network using GAs . . . . . . . . . . . . . . . . . . . . . 199
6. Multi-label Classification . . . . . . . . . . . . . . . 202
6.1. Multi-label kNN (mL-kNN) . . . . . . . . . . 203
6.2. Probabilistic classifier chains (PCC) . . . . . . 204
6.3. Binary relevance (BR) . . . . . . . . . . . . . 205
6.4. Using label powersets (LP) . . . . . . . . . . . 205
6.5. Neural networks for multi-label classification . . . . . . . . . . . . . . . . . . . . 206
6.6. Evaluation of multi-label classification . . . . 209
7. Data Clustering 215
1. Number of Partitions . . . . . . . . . . . . . . . . . 215
2. Clustering Algorithms . . . . . . . . . . . . . . . . . 218
2.1. K-means algorithm . . . . . . . . . . . . . . . 219
2.2. Leader algorithm . . . . . . . . . . . . . . . . 223
2.3. BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies . . . . . . . . . . . 225
2.4. Clustering based on graphs . . . . . . . . . . . 230
3. Why Clustering? . . . . . . . . . . . . . . . . . . . . 241
3.1. Data compression . . . . . . . . . . . . . . . . 241
3.2. Outlier detection . . . . . . . . . . . . . . . . 242
3.3. Pattern synthesis . . . . . . . . . . . . . . . . 243
4. Clustering Labeled Data . . . . . . . . . . . . . . . . 246
4.1. Clustering for classification . . . . . . . . . . . 246
4.2. Knowledge-based clustering . . . . . . . . . . 250
5. Combination of Clusterings . . . . . . . . . . . . . . 255
8. Soft Clustering 263
1. Soft Clustering Paradigms . . . . . . . . . . . . . . . 264
2. Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . 266
2.1. Fuzzy K-means algorithm . . . . . . . . . . . 267
3. Rough Clustering . . . . . . . . . . . . . . . . . . . . 269
3.1. Rough K-means algorithm . . . . . . . . . . . 271
4. Clustering Based on Evolutionary Algorithms . . . . 272
5. Clustering Based on Neural Networks . . . . . . . . 281
6. Statistical Clustering . . . . . . . . . . . . . . . . . . 282
6.1. OKM algorithm . . . . . . . . . . . . . . . . . 283
6.2. EM-based clustering . . . . . . . . . . . . . . . 285
7. Topic Models . . . . . . . . . . . . . . . . . . . . . . 293
7.1. Matrix factorization-based methods . . . . . . 295
7.2. Divide-and-conquer approach . . . . . . . . . 296
7.3. Latent Semantic Analysis (LSA) . . . . . . . . 299
7.4. SVD and PCA . . . . . . . . . . . . . . . . . . 302
7.5. Probabilistic Latent Semantic Analysis (PLSA) . . . . . . . . . . . . . . . . . . . . . . . 307
7.6. Non-negative Matrix Factorization (NMF) . . . . . . . . . . . . . . . . . . . . . . . 310
7.7. LDA . . . . . . . . . . . . . . . . . . . . . . . 311
7.8. Concept and topic . . . . . . . . . . . . . . . . 316
9. Application — Social and Information Networks 321
1. Introduction . . . . . . . . . . . . . . . . . . . . . . 321
2. Patterns in Graphs . . . . . . . . . . . . . . . . . . . 322
3. Identification of Communities in Networks . . . . . . 326
3.1. Graph partitioning . . . . . . . . . . . . . . . 328
3.2. Spectral clustering . . . . . . . . . . . . . . . . 329
3.3. Linkage-based clustering . . . . . . . . . . . . 331
3.4. Hierarchical clustering . . . . . . . . . . . . . 331
3.5. Modularity optimization for partitioning graphs . . . . . . . . . . . . . . . . . . . . . . . 333
4. Link Prediction . . . . . . . . . . . . . . . . . . . . . 340
4.1. Proximity functions . . . . . . . . . . . . . . . 341
5. Information Diffusion . . . . . . . . . . . . . . . . . 347
5.1. Graph-based approaches . . . . . . . . . . . . 348
5.2. Non-graph approaches . . . . . . . . . . . . . 349
6. Identifying Specific Nodes in a Social Network . . . 353
7. Topic Models . . . . . . . . . . . . . . . . . . . . . . 355
7.1. Probabilistic latent semantic analysis (pLSA) . . . . . . . . . . . . . . . . . . . . . . . 355
7.2. Latent Dirichlet allocation (LDA) . . . . . . . 357
7.3. Author–topic model . . . . . . . . . . . . . . . 359
Index 365
About the Authors
Professor M. Narasimha Murty completed his B.E., M.E., and Ph.D. at the Indian Institute of Science (IISc), Bangalore. He joined IISc as an Assistant Professor in 1984, became a Professor in 1996, and is currently the Dean of the Engineering Faculty at IISc. He has guided more than 20 doctoral students and several master’s students over the past 30 years at IISc; most of these students have worked in the areas of Pattern Recognition, Machine Learning, and Data Mining. A paper he co-authored on pattern clustering has around 9600 citations as reported by Google Scholar. A team led by him won the KDD Cup on the citation prediction task organized by Cornell University in 2003. He is an elected fellow of both the Indian National Academy of Engineering and the National Academy of Sciences.
Dr. V. Susheela Devi completed her Ph.D. at the Indian Institute of Science in 2000. Since then she has worked as a faculty member in the Department of Computer Science and Automation at the Indian Institute of Science. She works in the areas of Pattern Recognition, Data Mining, Machine Learning, and Soft Computing. She has taught courses on Data Mining, Pattern Recognition, Data Structures and Algorithms, Computational Methods of Optimization, and Artificial Intelligence. She has published a number of papers in international conferences and journals.