Data Mining for Business Intelligence
Pages: 726
File size: 12.2 MB
Format: PDF

Contents

Foreword

Preface to the second edition

Preface to the first edition

Acknowledgments

PART I PRELIMINARIES

Chapter 1 Introduction

1.1 What Is Data Mining?

1.2 Where Is Data Mining Used?

1.3 Origins of Data Mining

1.4 Rapid Growth of Data Mining

1.5 Why Are There So Many Different Methods?

1.6 Terminology and Notation

1.7 Road Maps to This Book

Chapter 2 Overview of the Data Mining Process

2.1 Introduction

2.2 Core Ideas in Data Mining

2.3 Supervised and Unsupervised Learning

2.4 Steps in Data Mining

2.5 Preliminary Steps

2.6 Building a Model: Example with Linear Regression

2.7 Using Excel for Data Mining

PROBLEMS

PART II DATA EXPLORATION AND DIMENSION REDUCTION

Chapter 3 Data Visualization

3.1 Uses of Data Visualization

3.2 Data Examples

3.3 Basic Charts: Bar Charts, Line Graphs, and Scatterplots

3.4 Multidimensional Visualization

3.5 Specialized Visualizations

3.6 Summary of Major Visualizations and Operations, According to Data Mining Goal

PROBLEMS

Chapter 4 Dimension Reduction

4.1 Introduction

4.2 Practical Considerations

4.3 Data Summaries

4.4 Correlation Analysis

4.5 Reducing the Number of Categories in Categorical Variables

4.6 Converting a Categorical Variable to a Numerical Variable

4.7 Principal Components Analysis

4.8 Dimension Reduction Using Regression Models

4.9 Dimension Reduction Using Classification and Regression Trees

PROBLEMS

PART III PERFORMANCE EVALUATION

Chapter 5 Evaluating Classification and Predictive Performance

5.1 Introduction

5.2 Judging Classification Performance

5.3 Evaluating Predictive Performance

PROBLEMS

PART IV PREDICTION AND CLASSIFICATION METHODS

Chapter 6 Multiple Linear Regression

6.1 Introduction

6.2 Explanatory versus Predictive Modeling

6.3 Estimating the Regression Equation and Prediction

6.4 Variable Selection in Linear Regression

PROBLEMS

Chapter 7 k-Nearest Neighbors (k-NN)

7.1 k-NN Classifier (categorical outcome)

7.2 k-NN for a Numerical Response

7.3 Advantages and Shortcomings of k-NN Algorithms

PROBLEMS

Chapter 8 Naive Bayes

8.1 Introduction

8.2 Applying the Full (Exact) Bayesian Classifier

8.3 Advantages and Shortcomings of the Naive Bayes Classifier

PROBLEMS

Chapter 9 Classification and Regression Trees

9.1 Introduction

9.2 Classification Trees

9.3 Measures of Impurity

9.4 Evaluating the Performance of a Classification Tree

9.5 Avoiding Overfitting

9.6 Classification Rules from Trees

9.7 Classification Trees for More Than Two Classes

9.8 Regression Trees

9.9 Advantages, Weaknesses, and Extensions

PROBLEMS

Chapter 10 Logistic Regression

10.1 Introduction

10.2 Logistic Regression Model

10.3 Evaluating Classification Performance

10.4 Example of Complete Analysis: Predicting Delayed Flights

10.5 Appendix: Logistic Regression for Profiling

PROBLEMS

Chapter 11 Neural Nets

11.1 Introduction

11.2 Concept and Structure of a Neural Network

11.3 Fitting a Network to Data

11.4 Required User Input

11.5 Exploring the Relationship Between Predictors and Response

11.6 Advantages and Weaknesses of Neural Networks

PROBLEMS

Chapter 12 Discriminant Analysis

12.1 Introduction

12.2 Distance of an Observation from a Class

12.3 Fisher’s Linear Classification Functions

12.4 Classification Performance of Discriminant Analysis

12.5 Prior Probabilities

12.6 Unequal Misclassification Costs

12.7 Classifying More Than Two Classes

12.8 Advantages and Weaknesses

PROBLEMS

PART V MINING RELATIONSHIPS AMONG RECORDS

Chapter 13 Association Rules

13.1 Introduction

13.2 Discovering Association Rules in Transaction Databases

13.3 Generating Candidate Rules

13.4 Selecting Strong Rules

13.5 Summary

PROBLEMS

Chapter 14 Cluster Analysis

14.1 Introduction

14.2 Measuring Distance Between Two Records

14.3 Measuring Distance Between Two Clusters

14.4 Hierarchical (Agglomerative) Clustering

14.5 Nonhierarchical Clustering: The k-Means Algorithm

PROBLEMS

PART VI FORECASTING TIME SERIES

Chapter 15 Handling Time Series

15.1 Introduction

15.2 Explanatory versus Predictive Modeling

15.3 Popular Forecasting Methods in Business

15.4 Time Series Components

15.5 Data Partitioning

PROBLEMS

Chapter 16 Regression-Based Forecasting

16.1 Model with Trend

16.2 Model with Seasonality

16.3 Model with Trend and Seasonality

16.4 Autocorrelation and ARIMA Models

PROBLEMS

Chapter 17 Smoothing Methods

17.1 Introduction

17.2 Moving Average

17.3 Simple Exponential Smoothing

17.4 Advanced Exponential Smoothing

PROBLEMS

PART VII CASES

Chapter 18 Cases

18.1 Charles Book Club

18.2 German Credit

18.3 Tayko Software Cataloger

18.4 Segmenting Consumers of Bath Soap

18.5 Direct-Mail Fundraising

18.6 Catalog Cross Selling

18.7 Predicting Bankruptcy

18.8 Time Series Case: Forecasting Public Transportation Demand

References

Index

To our families

Boaz and Noa

Tehmi, Arjun, and in memory of Aneesh

Liz, Lisa, and Allison

Copyright 2010 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Shmueli, Galit, 1971-

Data mining for business intelligence: concepts, techniques, and applications in Microsoft Office Excel with XLMiner / Galit Shmueli, Nitin R. Patel, Peter C. Bruce. – 2nd ed.

p. cm.

Includes bibliographical references and index.

ISBN 978-0-470-52682-8 (cloth)
