Mastering data warehouse aggregates
Christopher Adamson
Mastering Data Warehouse Aggregates
Solutions for Star Schema Performance
Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2006 by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN-13: 978-0-471-77709-0
ISBN-10: 0-471-77709-9
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
1MA/SQ/QW/QW/IN
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as
permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978)
646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley
Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, or
online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or
warranties with respect to the accuracy or completeness of the contents of this work and specifically
disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No
warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the
publisher is not engaged in rendering legal, accounting, or other professional services. If professional
assistance is required, the services of a competent professional person should be sought. Neither the
publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or
Website is referred to in this work as a citation and/or a potential source of further information does not
mean that the author or the publisher endorses the information the organization or Website may provide
or recommendations it may make. Further, readers should be aware that Internet Websites listed in this
work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or to obtain technical support, please contact our Customer Care Department within the U.S. at (800) 762-2974, outside the U.S. at (317) 572-3993
or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic books.
Library of Congress Cataloging-in-Publication Data
Adamson, Christopher, 1967–
Mastering data warehouse aggregates: solutions for star schema performance / Christopher Adamson.
p. cm.
Includes index.
ISBN-13: 978-0-471-77709-0 (pbk.)
ISBN-10: 0-471-77709-9 (pbk.)
1. Data warehousing. I. Title.
QA76.9.D37A333 2006
005.74—dc22
2006011219
Trademarks: Wiley, the Wiley logo, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be
used without written permission. All other trademarks are the property of their respective owners.
Wiley Publishing, Inc., is not associated with any product or vendor mentioned in this book.
For Wayne H. Adamson
1929–2003
Through those whose lives you touched,
your spirit of love endures.
About the Author
Christopher Adamson is a data warehousing consultant and founder of
Oakton Software LLC. An expert in star schema design, he has managed and
executed data warehouse implementations in a variety of industries. His
customers have included Fortune 500 companies, large and small businesses,
government agencies, and data warehousing tool vendors. Mr. Adamson also
teaches dimensional modeling and is a co-author of Data Warehouse Design
Solutions (also from Wiley). He can be contacted through his website,
www.ChrisAdamson.net.
Credits

Executive Editor
Robert Elliott

Development Editor
Brian Herrmann

Technical Editor
Jim Hadley

Copy Editor
Nancy Rapoport

Editorial Manager
Mary Beth Wakefield

Production Manager
Tim Tate

Vice President and Executive Group Publisher
Richard Swadley

Vice President and Executive Publisher
Joseph B. Wikert

Project Coordinator
Michael Kruzil

Graphics and Production Specialists
Jennifer Click
Denny Hager
Stephanie D. Jumper
Heather Ryan

Quality Control Technicians
John Greenough
Brian H. Walls

Proofreading and Indexing
Techbooks
Contents
Foreword
Acknowledgments
Introduction
Chapter 1 Fundamentals of Aggregates
Star Schema Basics
Operational Systems and the Data Warehouse
Operational Systems
Data Warehouse Systems
Facts and Dimensions
The Star Schema
Dimension Tables and Surrogate Keys
Fact Tables and Grain
Using the Star Schema
Multiple Stars and Conformance
Data Warehouse Architecture
Invisible Aggregates
Improving Performance
The Base Schema and the Aggregate Schema
The Aggregate Navigator
Principles of Aggregation
Providing the Same Results
The Same Facts and Dimension Attributes as the Base Schema
Other Types of Summarization
Pre-Joined Aggregates
Derived Tables
Tables with New Facts
Summary
Chapter 2 Choosing Aggregates
What Is a Potential Aggregate?
Aggregate Fact Tables: A Question of Grain
Aggregate Dimensions Must Conform
Pre-Joined Aggregates Have Grain Too
Enumerating Potential Aggregates
Identifying Potentially Useful Aggregates
Drawing on Initial Design
Design Decisions
Listening to Users
Where Subject Areas Meet
The Conformance Bus
Aggregates for Drilling Across
Query Patterns of an Existing System
Analyzing Reports for Potential Aggregates
Choosing Which Reports to Analyze
Assessing the Value of Potential Aggregates
Number of Aggregates
Presence of an Aggregate Navigator
Space Consumed by Aggregate Tables
How Many Rows Are Summarized
Examining the Number of Rows Summarized
The Cardinality Trap and Sparsity
Who Will Benefit from the Aggregate
Summary
Chapter 3 Designing Aggregates
The Base Schema
Identification of Grain
When Grain Is Forgotten
Grain and Aggregates
Conformance Bus
Rollup Dimensions
Aggregation Points
Natural Keys
Source Mapping
Slow Change Processing
Hierarchies
Housekeeping Columns
Design Principles for the Aggregate Schema
A Separate Star for Each Aggregation
Single Schema and the Level Field
Drawbacks to the Single Schema Approach
Advantages of Separate Tables
Pre-Joined Aggregates
Naming Conventions
Naming the Attributes
Naming Aggregate Tables
Aggregate Dimension Design
Attributes of Aggregate Dimensions
Sourcing Aggregate Dimensions
Shared Dimensions
Aggregate Fact Table Design
Aggregate Facts: Names and Data Types
No New Facts, Including Counts
Degenerate Dimensions
Audit Dimension
Sourcing Aggregate Fact Tables
Pre-Joined Aggregate Design
Documenting the Aggregate Schema
Identify Schema Families
Identify Dimensional Conformance
Documenting Aggregate Dimension Tables
Documenting Aggregate Fact Tables
Pre-Joined Aggregates
Materialized Views and Materialized Query Tables
Summary
Chapter 4 Using Aggregates
Which Tables to Use?
The Schema Design
Relative Size
Aggregate Portfolio and Availability
Requirements for the Aggregate Navigator
Why an Aggregate Navigator?
Two Views and Query Rewrite
Dynamic Availability
Multiple Front Ends
Multiple Back Ends
Evaluating Aggregate Navigators
Front-End Aggregate Navigators
Approach
Pros and Cons