
Data Warehousing
Copyright © 2006, New Age International (P) Ltd., Publishers
Published by New Age International (P) Ltd., Publishers
All rights reserved.
No part of this ebook may be reproduced in any form, by photostat, microfilm,
xerography, or any other means, or incorporated into any information retrieval
system, electronic or mechanical, without the written permission of the publisher.
All inquiries should be emailed to [email protected]
PUBLISHING FOR ONE WORLD
NEW AGE INTERNATIONAL (P) LIMITED, PUBLISHERS
4835/24, Ansari Road, Daryaganj, New Delhi - 110002
Visit us at www.newagepublishers.com
ISBN (13) : 978-81-224-2705-9
Dedicated
To
Beloved Friends
PREFACE
This book is intended for Information Technology (IT) professionals who have heard about
data warehousing technologies or have been tasked with evaluating, learning, or implementing
them. It also aims to present the fundamental techniques of KDD and data mining, as well as
issues that arise in the practical use of mining tools.
Far from being just a passing fad, data warehousing technology has grown considerably in scale
and reputation in the past few years, as evidenced by the increasing number of products,
vendors, organizations, and yes, even books, devoted to the subject. Enterprises that have
successfully implemented data warehouses find them strategic and often wonder how they ever
managed to survive without them. Knowledge Discovery and Data Mining (KDD) has likewise
emerged as a rapidly growing interdisciplinary field that brings together databases,
statistics, machine learning, and related areas in order to extract valuable information and
knowledge from large volumes of data.
Volume-I is intended for IT professionals who have been tasked with planning, managing,
designing, implementing, supporting, maintaining, and analyzing the organization’s data
warehouse.
The first section introduces the Enterprise Architecture and Data Warehouse concepts,
which form the motivation for writing this book.
The second section focuses on three of the key People in any data warehousing initiative: the Project Sponsor, the CIO, and the Project Manager. This section is devoted to
addressing the primary concerns of these individuals.
The third section presents a Process for planning and implementing a data warehouse
and provides guidelines that will prove extremely helpful for both first-time and experienced
warehouse developers.
The fourth section focuses on the Technology aspect of data warehousing. It lends order
to the dizzying array of technology components that you may use to build your data warehouse.
The fifth section opens a window to the future of data warehousing.
The sixth section deals with On-Line Analytical Processing (OLAP) and describes the
features to consider when selecting tools from different vendors.
Volume-II shows how to succeed in understanding and exploiting large databases
by uncovering valuable information hidden in data: how to learn which data has real meaning
and which simply takes up space, how to determine which methods and tools are most effective
for practical needs, and how to analyze and evaluate the results obtained.
S. NAGABHUSHANA
ACKNOWLEDGEMENTS
My sincere thanks to Prof. P. Rama Murthy, Principal, Intell Engineering College,
Anantapur, for his able guidance and valuable suggestions - in fact, it was he who drew
my attention to the writing of this book. I am grateful to Smt. G. Hampamma, Lecturer in
English, Intell Engineering College, Anantapur, and her whole family for their constant
support and assistance while I was writing the book. Prof. Jeffrey D. Ullman, Department of
Computer Science, Stanford University, U.S.A., deserves my special thanks for providing all
the necessary resources. I am also thankful to Mr. R. Venkat, Senior Technical Associate at
Virtusa, Hyderabad, for going through the manuscript and encouraging me.
Last but not least, I thank Mr. Saumya Gupta, Managing Director, New Age International (P)
Limited, Publishers, New Delhi, for his interest in the publication of the book.
CONTENTS
Preface (vii)
Acknowledgements (ix)
VOLUME I: DATA WAREHOUSING
IMPLEMENTATION AND OLAP
PART I : INTRODUCTION
Chapter 1. The Enterprise IT Architecture 5
1.1 The Past: Evolution of Enterprise Architectures 5
1.2 The Present: The IT Professional’s Responsibility 6
1.3 Business Perspective 7
1.4 Technology Perspective 8
1.5 Architecture Migration Scenarios 12
1.6 Migration Strategy: How do We Move Forward? 20
Chapter 2. Data Warehouse Concepts 24
2.1 Gradual Changes in Computing Focus 24
2.2 Data Warehouse Characteristics and Definition 26
2.3 The Dynamic, Ad Hoc Report 28
2.4 The Purposes of a Data Warehouse 29
2.5 Data Marts 30
2.6 Operational Data Stores 33
2.7 Data Warehouse Cost-Benefit Analysis / Return on Investment 35
PART II : PEOPLE
Chapter 3. The Project Sponsor 39
3.1 How does a Data Warehouse Affect Decision-Making Processes? 39
3.2 How does a Data Warehouse Improve Financial Processes? Marketing? Operations? 40
3.3 When is a Data Warehouse Project Justified? 41
3.4 What Expenses are Involved? 43
3.5 What are the Risks? 45
3.6 Risk-Mitigating Approaches 50
3.7 Is Organization Ready for a Data Warehouse? 51
3.8 How the Results are Measured? 51
Chapter 4. The CIO 54
4.1 How is the Data Warehouse Supported? 54
4.2 How Does Data Warehouse Evolve? 55
4.3 Who should be Involved in a Data Warehouse Project? 56
4.4 What is the Team Structure Like? 60
4.5 What New Skills will People Need? 60
4.6 How Does Data Warehousing Fit into IT Architecture? 62
4.7 How Many Vendors are Needed to Talk to? 63
4.8 What should be Looked for in a Data Warehouse Vendor? 64
4.9 How Does Data Warehousing Affect Existing Systems? 67
4.10 Data Warehousing and its Impact on Other Enterprise Initiatives 68
4.11 When is a Data Warehouse not Appropriate? 69
4.12 How to Manage or Control a Data Warehouse Initiative? 71
Chapter 5. The Project Manager 73
5.1 How to Roll Out a Data Warehouse Initiative? 73
5.2 How Important is the Hardware Platform? 76
5.3 What are the Technologies Involved? 78
5.4 Are the Relational Databases Still Used for Data Warehousing? 79
5.5 How Long Does a Data Warehousing Project Last? 83
5.6 How is a Data Warehouse Different from Other IT Projects? 84
5.7 What are the Critical Success Factors of a Data Warehousing Project? 85
PART III : PROCESS
Chapter 6. Warehousing Strategy 89
6.1 Strategy Components 89
6.2 Determine Organizational Context 90
6.3 Conduct Preliminary Survey of Requirements 90
6.4 Conduct Preliminary Source System Audit 92
6.5 Identify External Data Sources (If Applicable) 93
6.6 Define Warehouse Rollouts (Phased Implementation) 93
6.7 Define Preliminary Data Warehouse Architecture 94
6.8 Evaluate Development and Production Environment and Tools 95
Chapter 7. Warehouse Management and Support Processes 96
7.1 Define Issue Tracking and Resolution Process 96
7.2 Perform Capacity Planning 98
7.3 Define Warehouse Purging Rules 108
7.4 Define Security Management 108
7.5 Define Backup and Recovery Strategy 111
7.6 Set Up Collection of Warehouse Usage Statistics 112
Chapter 8. Data Warehouse Planning 114
8.1 Assemble and Orient Team 114
8.2 Conduct Decisional Requirements Analysis 115
8.3 Conduct Decisional Source System Audit 116
8.4 Design Logical and Physical Warehouse Schema 119
8.5 Produce Source-to-Target Field Mapping 119
8.6 Select Development and Production Environment and Tools 121
8.7 Create Prototype for this Rollout 121
8.8 Create Implementation Plan of this Rollout 122
8.9 Warehouse Planning Tips and Caveats 124
Chapter 9. Data Warehouse Implementation 128
9.1 Acquire and Set Up Development Environment 128
9.2 Obtain Copies of Operational Tables 129
9.3 Finalize Physical Warehouse Schema Design 129
9.4 Build or Configure Extraction and Transformation Subsystems 130
9.5 Build or Configure Data Quality Subsystem 131
9.6 Build Warehouse Load Subsystem 135
9.7 Set Up Warehouse Metadata 138
9.8 Set Up Data Access and Retrieval Tools 138
9.9 Perform the Production Warehouse Load 140
9.10 Conduct User Training 140
9.11 Conduct User Testing and Acceptance 141
PART IV : TECHNOLOGY
Chapter 10. Hardware and Operating Systems 145
10.1 Parallel Hardware Technology 145
10.2 The Data Partitioning Issue 148
10.3 Hardware Selection Criteria 152
Chapter 11. Warehousing Software 154
11.1 Middleware and Connectivity Tools 155
11.2 Extraction Tools 155
11.3 Transformation Tools 156
11.4 Data Quality Tools 158
11.5 Data Loaders 158
11.6 Database Management Systems 159
11.7 Metadata Repository 159
11.8 Data Access and Retrieval Tools 160
11.9 Data Modeling Tools 162
11.10 Warehouse Management Tools 163
11.11 Source Systems 163
Chapter 12. Warehouse Schema Design 165
12.1 OLTP Systems Use Normalized Data Structures 165
12.2 Dimensional Modeling for Decisional Systems 167
12.3 Star Schema 168
12.4 Dimensional Hierarchies and Hierarchical Drilling 169
12.5 The Granularity of the Fact Table 170