Data Mining for Business Intelligence
Contents
Foreword
Preface to the second edition
Preface to the first edition
Acknowledgments
PART I PRELIMINARIES
Chapter 1 Introduction
1.1 What Is Data Mining?
1.2 Where Is Data Mining Used?
1.3 Origins of Data Mining
1.4 Rapid Growth of Data Mining
1.5 Why Are There So Many Different Methods?
1.6 Terminology and Notation
1.7 Road Maps to This Book
Chapter 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
2.3 Supervised and Unsupervised Learning
2.4 Steps in Data Mining
2.5 Preliminary Steps
2.6 Building a Model: Example with Linear Regression
2.7 Using Excel for Data Mining
PROBLEMS
PART II DATA EXPLORATION AND DIMENSION REDUCTION
Chapter 3 Data Visualization
3.1 Uses of Data Visualization
3.2 Data Examples
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots
3.4 Multidimensional Visualization
3.5 Specialized Visualizations
3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal
PROBLEMS
Chapter 4 Dimension Reduction
4.1 Introduction
4.2 Practical Considerations
4.3 Data Summaries
4.4 Correlation Analysis
4.5 Reducing the Number of Categories in Categorical Variables
4.6 Converting a Categorical Variable to a Numerical Variable
4.7 Principal Components Analysis
4.8 Dimension Reduction Using Regression Models
4.9 Dimension Reduction Using Classification and Regression Trees
PROBLEMS
PART III PERFORMANCE EVALUATION
Chapter 5 Evaluating Classification and Predictive Performance
5.1 Introduction
5.2 Judging Classification Performance
5.3 Evaluating Predictive Performance
PROBLEMS
PART IV PREDICTION AND CLASSIFICATION METHODS
Chapter 6 Multiple Linear Regression
6.1 Introduction
6.2 Explanatory versus Predictive Modeling
6.3 Estimating the Regression Equation and Prediction
6.4 Variable Selection in Linear Regression
PROBLEMS
Chapter 7 k-Nearest Neighbors (k-NN)
7.1 k-NN Classifier (categorical outcome)
7.2 k-NN for a Numerical Response
7.3 Advantages and Shortcomings of k-NN Algorithms
PROBLEMS
Chapter 8 Naive Bayes
8.1 Introduction
8.2 Applying the Full (Exact) Bayesian Classifier
8.3 Advantages and Shortcomings of the Naive Bayes Classifier
PROBLEMS
Chapter 9 Classification and Regression Trees
9.1 Introduction
9.2 Classification Trees
9.3 Measures of Impurity
9.4 Evaluating the Performance of a Classification Tree
9.5 Avoiding Overfitting
9.6 Classification Rules from Trees
9.7 Classification Trees for More Than Two Classes
9.8 Regression Trees
9.9 Advantages, Weaknesses, and Extensions
PROBLEMS
Chapter 10 Logistic Regression
10.1 Introduction
10.2 Logistic Regression Model
10.3 Evaluating Classification Performance
10.4 Example of Complete Analysis: Predicting Delayed Flights
10.5 Appendix: Logistic Regression for Profiling
PROBLEMS
Chapter 11 Neural Nets
11.1 Introduction
11.2 Concept and Structure of a Neural Network
11.3 Fitting a Network to Data
11.4 Required User Input
11.5 Exploring the Relationship Between Predictors and Response
11.6 Advantages and Weaknesses of Neural Networks
PROBLEMS
Chapter 12 Discriminant Analysis
12.1 Introduction
12.2 Distance of an Observation from a Class
12.3 Fisher’s Linear Classification Functions
12.4 Classification Performance of Discriminant Analysis
12.5 Prior Probabilities
12.6 Unequal Misclassification Costs
12.7 Classifying More Than Two Classes
12.8 Advantages and Weaknesses
PROBLEMS
PART V MINING RELATIONSHIPS AMONG RECORDS
Chapter 13 Association Rules
13.1 Introduction
13.2 Discovering Association Rules in Transaction Databases
13.3 Generating Candidate Rules
13.4 Selecting Strong Rules
13.5 Summary
PROBLEMS
Chapter 14 Cluster Analysis
14.1 Introduction
14.2 Measuring Distance Between Two Records
14.3 Measuring Distance Between Two Clusters
14.4 Hierarchical (Agglomerative) Clustering
14.5 Nonhierarchical Clustering: The k-Means Algorithm
PROBLEMS
PART VI FORECASTING TIME SERIES
Chapter 15 Handling Time Series
15.1 Introduction
15.2 Explanatory versus Predictive Modeling
15.3 Popular Forecasting Methods in Business
15.4 Time Series Components
15.5 Data Partitioning
PROBLEMS
Chapter 16 Regression-Based Forecasting
16.1 Model with Trend
16.2 Model with Seasonality
16.3 Model with Trend and Seasonality
16.4 Autocorrelation and ARIMA Models
PROBLEMS
Chapter 17 Smoothing Methods
17.1 Introduction
17.2 Moving Average
17.3 Simple Exponential Smoothing
17.4 Advanced Exponential Smoothing
PROBLEMS
PART VII CASES
Chapter 18 Cases
18.1 Charles Book Club
18.2 German Credit
18.3 Tayko Software Cataloger
18.4 Segmenting Consumers of Bath Soap
18.5 Direct-Mail Fundraising
18.6 Catalog Cross Selling
18.7 Predicting Bankruptcy
18.8 Time Series Case: Forecasting Public Transportation Demand
References
Index
To our families
Boaz and Noa
Tehmi, Arjun, and in
memory of Aneesh
Liz, Lisa, and Allison
Copyright 2010 by John Wiley & Sons, Inc. All rights
reserved
Published by John Wiley & Sons, Inc., Hoboken, New
Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a
retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording,
scanning, or otherwise, except as permitted under Section
107 or 108 of the 1976 United States Copyright Act,
without either the prior written permission of the
Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance
Center, Inc., 222 Rosewood Drive, Danvers, MA 01923,
(978) 750-8400, fax (978) 750-4470, or on the web at
www.copyright.com. Requests to the Publisher for
permission should be addressed to the Permissions
Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008,
or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the
publisher and author have used their best efforts in
preparing this book, they make no representations or
warranties with respect to the accuracy or completeness of
the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or
extended by sales representatives or written sales
materials. The advice and strategies contained herein may
not be suitable for your situation. You should consult with
a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special,
incidental, consequential, or other damages.
For general information on our other products and services
or for technical support, please contact our Customer Care
Department within the United States at (800) 762-2974,
outside the United States at (317) 572-3993 or fax
(317) 572-4002.
Wiley also publishes its books in a variety of electronic
formats. Some content that appears in print may not be
available in electronic formats. For more information
about Wiley products, visit our web site at
www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Shmueli, Galit, 1971-
Data mining for business intelligence: concepts,
techniques, and applications in Microsoft Office Excel
with XLMiner / Galit Shmueli, Nitin R. Patel, Peter C.
Bruce. – 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-52682-8 (cloth)