International Edition
CD-ROM Included
Applied Multivariate Statistical Analysis
Fifth Edition
Richard A. Johnson
Dean W. Wichern
Digitized by Trung tâm Học liệu – ĐH TN (Learning Resource Center, Thai Nguyen University), http://www.lrc-tnu.edu.vn
FIFTH EDITION
Applied Multivariate
Statistical Analysis
RICHARD A. JOHNSON
University of Wisconsin-Madison
DEAN W. WICHERN
Texas A&M University
Pearson Education International
If you have purchased this book within the United States or Canada you should be aware that
it has been wrongfully imported without the approval of the Publisher or the Author.
Acquisitions Editor: Quincy McDonald
Editor-in-Chief: Sally Yagan
Vice President/Director Production and Manufacturing: David W. Riccardi
Executive Managing Editor: Kathleen Schiaparelli
Senior Managing Editor: Linda Mihatov Behrens
Assistant Managing Editor: Bayani DeLeon
Production Editor: Steven S. Pawlowski
Manufacturing Buyer: Alan Fischer
Manufacturing Manager: Trudy Pisciotti
Marketing Manager: Angela Battle
Editorial Assistant/Supplements Editor: Joanne Wendelken
Managing Editor, Audio/Video Assets: Grace Hazeldine
Art Director: Jayne Conte
Cover Designer: Bruce Kenselaar
Illustrator: Marita Froimson
© 2002, 1998, 1992, 1988, 1982 by Prentice-Hall, Inc.
Upper Saddle River, NJ 07458
All rights reserved. No part of this book may be reproduced, in any form or by any means,
without permission in writing from the publisher.
Printed in the United States of America
10 9 8 7 6 5 4 3
ISBN 0-13-121973-1
Pearson Education LTD.
Pearson Education Australia PTY, Limited
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd.
Pearson Education Canada, Ltd.
Pearson Educacion de Mexico, S.A. de C.V.
Pearson Education - Japan
Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Upper Saddle River, New Jersey
To the memory of my mother and my father.
R. A. J.
To Dorothy, Michael, and Andrew.
D. W. W.
Contents
PREFACE xv
1 ASPECTS OF MULTIVARIATE ANALYSIS 1
1.1 Introduction 1
1.2 Applications of Multivariate Techniques 3
1.3 The Organization of Data 5
Arrays, 5
Descriptive Statistics, 6
Graphical Techniques, 11
1.4 Data Displays and Pictorial Representations 19
Linking Multiple Two-Dimensional Scatter Plots, 20
Graphs of Growth Curves, 24
Stars, 25
Chernoff Faces, 28
1.5 Distance 30
1.6 Final Comments 38
Exercises 38
References 48
2 MATRIX ALGEBRA AND RANDOM VECTORS 50
2.1 Introduction 50
2.2 Some Basics of Matrix and Vector Algebra 50
Vectors, 50
Matrices, 55
2.3 Positive Definite Matrices 61
2.4 A Square-Root Matrix 66
2.5 Random Vectors and Matrices 67
2.6 Mean Vectors and Covariance Matrices 68
Partitioning the Covariance Matrix, 74
The Mean Vector and Covariance Matrix
for Linear Combinations of Random Variables, 76
Partitioning the Sample Mean Vector
and Covariance Matrix, 78
2.7 Matrix Inequalities and Maximization 79
Supplement 2A: Vectors and Matrices: Basic Concepts 84
Vectors, 84
Matrices, 89
Exercises 104
References 111
3 SAMPLE GEOMETRY AND RANDOM SAMPLING 112
3.1 Introduction 112
3.2 The Geometry of the Sample 112
3.3 Random Samples and the Expected Values of the Sample Mean and
Covariance Matrix 120
3.4 Generalized Variance 124
Situations in which the Generalized Sample Variance Is Zero, 130
Generalized Variance Determined by |R|
and Its Geometrical Interpretation, 136
Another Generalization of Variance, 138
3.5 Sample Mean, Covariance, and Correlation
As Matrix Operations 139
3.6 Sample Values of Linear Combinations of Variables 141
Exercises 145
References 148
4 THE MULTIVARIATE NORMAL DISTRIBUTION 149
4.1 Introduction 149
4.2 The Multivariate Normal Density and Its Properties 149
Additional Properties of the Multivariate
Normal Distribution, 156
4.3 Sampling from a Multivariate Normal Distribution
and Maximum Likelihood Estimation 168
The Multivariate Normal Likelihood, 168
Maximum Likelihood Estimation of μ and Σ, 170
Sufficient Statistics, 173
4.4 The Sampling Distribution of X̄ and S 173
Properties of the Wishart Distribution, 174
4.5 Large-Sample Behavior of X̄ and S 175
4.6 Assessing the Assumption of Normality 177
Evaluating the Normality of the Univariate Marginal Distributions, 178
Evaluating Bivariate Normality, 183
4.7 Detecting Outliers and Cleaning Data 189
Steps for Detecting Outliers, 190
4.8 Transformations to Near Normality 194
Transforming Multivariate Observations, 198
Exercises 202
References 209
5 INFERENCES ABOUT A MEAN VECTOR 210
5.1 Introduction 210
5.2 The Plausibility of μ₀ as a Value for a Normal
Population Mean 210
5.3 Hotelling's T² and Likelihood Ratio Tests 216
General Likelihood Ratio Method, 219
5.4 Confidence Regions and Simultaneous Comparisons
of Component Means 220
Simultaneous Confidence Statements, 223
A Comparison of Simultaneous Confidence Intervals
with One-at-a-Time Intervals, 229
The Bonferroni Method of Multiple Comparisons, 232
5.5 Large Sample Inferences about a Population Mean Vector 234
5.6 Multivariate Quality Control Charts 239
Charts for Monitoring a Sample of Individual Multivariate Observations
for Stability, 241
Control Regions for Future Individual Observations, 241
Control Ellipse for Future Observations, 248
T²-Chart for Future Observations, 248
Control Charts Based on Subsample Means, 249
Control Regions for Future Subsample Observations, 251
5.7 Inferences about Mean Vectors
when Some Observations Are Missing 252
5.8 Difficulties Due to Time Dependence
in Multivariate Observations 256
Supplement 5A: Simultaneous Confidence Intervals and Ellipses
as Shadows of the p-Dimensional Ellipsoids 258
Exercises 260
References 270
6 COMPARISONS OF SEVERAL MULTIVARIATE MEANS 272
6.1 Introduction 272
6.2 Paired Comparisons and a Repeated Measures Design 272
Paired Comparisons, 272
A Repeated Measures Design for Comparing Treatments, 278
6.3 Comparing Mean Vectors from Two Populations 283
Assumptions Concerning the Structure of the Data, 283
Further Assumptions When n₁ and n₂ Are Small, 284
Simultaneous Confidence Intervals, 287
The Two-Sample Situation When Σ₁ ≠ Σ₂, 290
6.4 Comparing Several Multivariate Population Means
(One-Way MANOVA) 293
Assumptions about the Structure of the Data for One-Way MANOVA, 293
A Summary of Univariate ANOVA, 293
Multivariate Analysis of Variance (MANOVA), 298
6.5 Simultaneous Confidence Intervals for Treatment Effects 305
6.6 Two-Way Multivariate Analysis of Variance 307
Univariate Two-Way Fixed-Effects Model with Interaction, 307
Multivariate Two-Way Fixed-Effects Model with Interaction, 309
6.7 Profile Analysis 318
6.8 Repeated Measures Designs and Growth Curves 323
6.9 Perspectives and a Strategy for Analyzing
Multivariate Models 327
Exercises 332
References 352
7 MULTIVARIATE LINEAR REGRESSION MODELS 354
7.1 Introduction 354
7.2 The Classical Linear Regression Model 354
7.3 Least Squares Estimation 358
Sum-of-Squares Decomposition, 360
Geometry of Least Squares, 361
Sampling Properties of Classical Least Squares Estimators, 363
7.4 Inferences About the Regression Model 365
Inferences Concerning the Regression Parameters, 365
Likelihood Ratio Tests for the Regression Parameters, 370
7.5 Inferences from the Estimated Regression Function 374
Estimating the Regression Function at z₀, 374
Forecasting a New Observation at z₀, 375
7.6 Model Checking and Other Aspects of Regression 377
Does the Model Fit?, 377
Leverage and Influence, 380
Additional Problems in Linear Regression, 380
7.7 Multivariate Multiple Regression 383
Likelihood Ratio Tests for Regression Parameters, 392
Other Multivariate Test Statistics, 395
Predictions from Multivariate Multiple Regressions, 395
7.8 The Concept of Linear Regression 398
Prediction of Several Variables, 403
Partial Correlation Coefficient, 406
7.9 Comparing the Two Formulations of the Regression Model 407
Mean Corrected Form of the Regression Model, 407
Relating the Formulations, 409
7.10 Multiple Regression Models with Time Dependent Errors 410
Supplement 7A: The Distribution of the Likelihood Ratio
for the Multivariate Multiple Regression Model 415
Exercises 417
References 424
8 PRINCIPAL COMPONENTS 426
8.1 Introduction 426
8.2 Population Principal Components 426
Principal Components Obtained from Standardized Variables, 432
Principal Components for Covariance Matrices
with Special Structures, 435
8.3 Summarizing Sample Variation by Principal Components 437
The Number of Principal Components, 440
Interpretation of the Sample Principal Components, 444
Standardizing the Sample Principal Components, 445
8.4 Graphing the Principal Components 450
8.5 Large Sample Inferences 452
Large Sample Properties of λ̂ᵢ and êᵢ, 452
Testing for the Equal Correlation Structure, 453
8.6 Monitoring Quality with Principal Components 455
Checking a Given Set of Measurements for Stability, 455
Controlling Future Values, 459
Supplement 8A: The Geometry of the Sample Principal
Component Approximation 462
The p-Dimensional Geometrical Interpretation, 464
The n-Dimensional Geometrical Interpretation, 465
Exercises 466
References 475
9 FACTOR ANALYSIS AND INFERENCE
FOR STRUCTURED COVARIANCE MATRICES 477
9.1 Introduction 477
9.2 The Orthogonal Factor Model 478
9.3 Methods of Estimation 484
The Principal Component (and Principal Factor) Method, 484
A Modified Approach: the Principal Factor Solution, 490
The Maximum Likelihood Method, 492
A Large Sample Test for the Number of Common Factors, 498
9.4 Factor Rotation 501
Oblique Rotations, 509
9.5 Factor Scores 510
The Weighted Least Squares Method, 511
The Regression Method, 513
9.6 Perspectives and a Strategy for Factor Analysis 517
9.7 Structural Equation Models 524
The LISREL Model, 525
Construction of a Path Diagram, 525
Covariance Structure, 526
Estimation, 527
Model-Fitting Strategy, 529
Supplement 9A: Some Computational Details
for Maximum Likelihood Estimation 530
Recommended Computational Scheme, 531
Maximum Likelihood Estimators of ρ = L_zL_z′ + Ψ_z, 532
Exercises 533
References 541
10 CANONICAL CORRELATION ANALYSIS 543
10.1 Introduction 543
10.2 Canonical Variates and Canonical Correlations 543
10.3 Interpreting the Population Canonical Variables 551
Identifying the Canonical Variables, 551
Canonical Correlations as Generalizations
of Other Correlation Coefficients, 553
The First r Canonical Variables as a Summary of Variability, 554
A Geometrical Interpretation of the Population Canonical
Correlation Analysis, 555
10.4 The Sample Canonical Variates and Sample
Canonical Correlations 556
10.5 Additional Sample Descriptive Measures 564
Matrices of Errors of Approximations, 564
Proportions of Explained Sample Variance, 567
10.6 Large Sample Inferences 569
Exercises 573
References 580
11 DISCRIMINATION AND CLASSIFICATION 581
11.1 Introduction 581
11.2 Separation and Classification for Two Populations 582
11.3 Classification with Two Multivariate Normal Populations 590
Classification of Normal Populations When Σ₁ = Σ₂ = Σ, 590
Scaling, 595
Classification of Normal Populations When Σ₁ ≠ Σ₂, 596
11.4 Evaluating Classification Functions 598
11.5 Fisher’s Discriminant Function: Separation of Populations 609
11.6 Classification with Several Populations 612
The Minimum Expected Cost of Misclassification Method, 613
Classification with Normal Populations, 616
11.7 Fisher’s Method for Discriminating
among Several Populations 628
Using Fisher's Discriminants to Classify Objects, 635
11.8 Final Comments 641
Including Qualitative Variables, 641
Classification Trees, 641
Neural Networks, 644
Selection of Variables, 645
Testing for Group Differences, 645
Graphics, 646
Practical Considerations Regarding Multivariate Normality, 646
Exercises 647
References 666
12 CLUSTERING, DISTANCE METHODS, AND ORDINATION 668
12.1 Introduction 668
12.2 Similarity Measures 670
Distances and Similarity Coefficients for Pairs of Items, 670
Similarities and Association Measures
for Pairs of Variables, 676
Concluding Comments on Similarity, 677
12.3 Hierarchical Clustering Methods 679
Single Linkage, 681
Complete Linkage, 685
Average Linkage, 689
Ward’s Hierarchical Clustering Method, 690
Final Comments: Hierarchical Procedures, 693
12.4 Nonhierarchical Clustering Methods 694
K-means Method, 694
Final Comments: Nonhierarchical Procedures, 698
12.5 Multidimensional Scaling 700
The Basic Algorithm, 700
12.6 Correspondence Analysis 709
Algebraic Development of Correspondence Analysis, 711
Inertia, 718
Interpretation in Two Dimensions, 719
Final Comments, 719
12.7 Biplots for Viewing Sampling Units and Variables 719
Constructing Biplots, 720
12.8 Procrustes Analysis: A Method
for Comparing Configurations 723
Constructing the Procrustes Measure of Agreement, 724
Supplement 12A: Data Mining 731
Introduction, 731
The Data Mining Process, 732
Model Assessment, 733
Exercises 738
References 745
APPENDIX 748
DATA INDEX 758
SUBJECT INDEX 761
Preface
INTENDED AUDIENCE
This book originally grew out of our lecture notes for an “Applied Multivariate Analysis” course offered jointly by the Statistics Department and the School of Business at
the University of Wisconsin-Madison. Applied Multivariate Statistical Analysis, Fifth
Edition, is concerned with statistical methods for describing and analyzing multivariate data. Data analysis, while interesting with one variable, becomes truly fascinating and challenging when several variables are involved. Researchers in the
biological, physical, and social sciences frequently collect measurements on several
variables. Modern computer packages readily provide the numerical results to rather
complex statistical analyses. We have tried to provide readers with the supporting
knowledge necessary for making proper interpretations, selecting appropriate techniques, and understanding their strengths and weaknesses. We hope our discussions
will meet the needs of experimental scientists, in a wide variety of subject matter
areas, as a readable introduction to the statistical analysis of multivariate observations.
LEVEL
Our aim is to present the concepts and methods of multivariate analysis at a level
that is readily understandable by readers who have taken two or more statistics courses. We emphasize the applications of multivariate methods and, consequently, have
attempted to make the mathematics as palatable as possible. We avoid the use of calculus. On the other hand, the concepts of a matrix and of matrix manipulations are
important. We do not assume the reader is familiar with matrix algebra. Rather, we
introduce matrices as they appear naturally in our discussions, and we then show how
they simplify the presentation of multivariate models and techniques.
The introductory account of matrix algebra, in Chapter 2, highlights the more
important matrix algebra results as they apply to multivariate analysis. The Chapter
2 supplement provides a summary of matrix algebra results for those with little or no
previous exposure to the subject. This supplementary material helps make the book
self-contained and is used to complete proofs. The proofs may be ignored on the first
reading. In this way we hope to make the book accessible to a wide audience.
In our attempt to make the study of multivariate analysis appealing to a large
audience of both practitioners and theoreticians, we have had to sacrifice a consistency
of level. Some sections are harder than others. In particular, we have summarized a
voluminous amount of material on regression in Chapter 7. The resulting presentation is rather succinct and difficult the first time through. We hope instructors will be
able to compensate for the unevenness in level by judiciously choosing those sections, and subsections, appropriate for their students and by toning them down if
necessary.
ORGANIZATION AND APPROACH
The methodological "tools" of multivariate analysis are contained in Chapters 5
through 12. These chapters represent the heart of the book, but they cannot be assimilated without much of the material in the introductory Chapters 1 through 4.
Even those readers with a good knowledge of matrix algebra or those willing to accept the mathematical results on faith should, at the very least, peruse Chapter 3,
“Sample Geometry,” and Chapter 4, “Multivariate Normal Distribution.”
Our approach in the methodological chapters is to keep the discussion direct and
uncluttered. Typically, we start with a formulation of the population models, delineate
the corresponding sample results, and liberally illustrate everything with examples. The
examples are of two types: those that are simple and whose calculations can be easily done by hand, and those that rely on real-world data and computer software. These
will provide an opportunity to (1) duplicate our analyses, (2) carry out the analyses
dictated by exercises, or (3) analyze the data using methods other than the ones we
have used or suggested.
The division of the methodological chapters (5 through 12) into three units allows instructors some flexibility in tailoring a course to their needs. Possible sequences
for a one-semester (two-quarter) course are indicated schematically.
Each instructor will undoubtedly omit certain sections from some chapters to
cover a broader collection of topics than is indicated by these two choices.
For most students, we would suggest a quick pass through the first four chapters (concentrating primarily on the material in Chapter 1; Sections 2.1, 2.2, 2.3, 2.5,
2.6, and 3.6; and the “assessing normality” material in Chapter 4) followed by a selection of methodological topics. For example, one might discuss the comparison of
mean vectors, principal components, factor analysis, discriminant analysis and clustering. The discussions could feature the many “worked out” examples included in