

Graduate Texts in Physics

Massimiliano Bonamente

Statistics and Analysis of Scientific Data

Second Edition

Graduate Texts in Physics

Series editors

Kurt H. Becker, Polytechnic School of Engineering, Brooklyn, USA

Jean-Marc Di Meglio, Université Paris Diderot, Paris, France

Sadri Hassani, Illinois State University, Normal, USA

Bill Munro, NTT Basic Research Laboratories, Atsugi, Japan

Richard Needs, University of Cambridge, Cambridge, UK

William T. Rhodes, Florida Atlantic University, Boca Raton, USA

Susan Scott, Australian National University, Acton, Australia

H. Eugene Stanley, Boston University, Boston, USA

Martin Stutzmann, TU München, Garching, Germany

Andreas Wipf, Friedrich-Schiller-Univ Jena, Jena, Germany

Graduate Texts in Physics

Graduate Texts in Physics publishes core learning/teaching material for graduate- and advanced-level undergraduate courses on topics of current and emerging fields within physics, both pure and applied. These textbooks serve students at the MS- or PhD-level and their instructors as comprehensive sources of principles, definitions, derivations, experiments and applications (as relevant) for their mastery and teaching, respectively. International in scope and relevance, the textbooks correspond to course syllabi sufficiently to serve as required reading. Their didactic style, comprehensiveness and coverage of fundamental material also make them suitable as introductions or references for scientists entering, or requiring timely knowledge of, a research field.

More information about this series at http://www.springer.com/series/8431


Massimiliano Bonamente
University of Alabama
Huntsville
Alabama, USA

ISSN 1868-4513 ISSN 1868-4521 (electronic)

Graduate Texts in Physics

ISBN 978-1-4939-6570-0 ISBN 978-1-4939-6572-4 (eBook)

DOI 10.1007/978-1-4939-6572-4

Library of Congress Control Number: 2016957885

1st edition: © Springer Science+Business Media New York 2013

2nd edition: © Springer Science+Business Media LLC 2017

© Springer Science+Business Media New York 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer Science+Business Media LLC

The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A

To Giorgio and Alida, who taught me the value of a book.

To Carlo and Gaia, to whom I teach the same.

And to Kerry, with whom I share the love of books, and everything else.

Preface to the First Edition

Across all sciences, a quantitative analysis of data is necessary to assess the significance of experiments, observations, and calculations. This book was written over a period of 10 years, as I developed an introductory graduate course on statistics and data analysis at the University of Alabama in Huntsville. My goal was to put together the material that a student needs for the analysis and statistical interpretation of data, including an extensive set of applications and problems that illustrate the practice of statistical data analysis.

The literature offers a variety of books on statistical methods and probability theory. Some are primarily on the mathematical foundations of statistics, some are purely on the theory of probability, and others focus on advanced statistical methods for specific sciences. This textbook contains the foundations of probability, statistics, and data analysis methods that are applicable to a variety of fields (from astronomy to biology, business sciences, chemistry, engineering, physics, and more), with equal emphasis on mathematics and applications. The book is therefore not specific to a given discipline, nor does it attempt to describe every possible statistical method. Instead, it focuses on the fundamental methods that are used across the sciences and that form the basis of the more specific techniques found in specialized textbooks or research articles.

This textbook covers probability theory and random variables, maximum-likelihood methods for single variables and two-variable datasets, and more complex topics of data fitting, estimation of parameters, and confidence intervals. Among the topics that have recently become mainstream, Monte Carlo Markov chains occupy a special role. The last chapter of the book provides a comprehensive overview of Markov chains and Monte Carlo Markov chains, from theory to implementation.

I believe that a description of the mathematical properties of statistical tests is necessary to understand their applicability. This book therefore contains mathematical derivations that I considered particularly useful for a thorough understanding of the subject; for mathematics that goes beyond basic calculus, the book refers the reader to other sources. The reader who is not familiar with calculus may skip those derivations and continue with the applications.


Nonetheless, statistics is necessarily slanted toward applications. To highlight the relevance of the statistical methods described, I have reported original data from four fundamental scientific experiments from the past two centuries: J.J. Thomson’s experiment that led to the discovery of the electron, G. Mendel’s data on plant characteristics that led to the law of independent assortment of species, E. Hubble’s observation of nebulae that uncovered the expansion of the universe, and K. Pearson’s collection of biometric characteristics in the UK in the early twentieth century. These experiments are used throughout the book to illustrate how statistical methods are applied to actual data and are used in several end-of-chapter problems. The reader will therefore have an opportunity to see statistics in action on these classic experiments and several additional examples.

The material presented in this book is aimed at upper-level undergraduate students or beginning graduate students. The reader is expected to be familiar with basic calculus, and no prior knowledge of statistics or probability is assumed. Professional scientists and researchers will find it a useful reference for fundamental methods such as maximum-likelihood fit, error propagation formulas, goodness of fit and model comparison, Monte Carlo methods such as the jackknife and bootstrap, Monte Carlo Markov chains, Kolmogorov-Smirnov tests, and more. All subjects are complemented by an extensive set of numerical tables that make the book completely self-contained.

The material presented in this book can be comfortably covered in a one-semester course, and each chapter ends with several problems that are suitable as homework assignments or exam questions. Problems are of both theoretical and numerical nature, so that emphasis is equally placed on conceptual and practical understanding of the subject. Several datasets, including those in the four “classic experiments,” are used across several chapters, and the students can therefore use them in applications of increasing difficulty.

Huntsville, AL, USA Massimiliano Bonamente

Preface to the Second Edition

The second edition of Statistics and Analysis of Scientific Data was motivated by the overall goal of providing a textbook that is mathematically rigorous and, at the same time, easy to read and use as a reference. Basically, it is a book both for the student who wants to learn in detail the mathematical underpinnings of statistics and for the reader who just wants to find a practical description of how to apply a given statistical method, or to use the book as a reference.

To this end, I first decided that a clearer demarcation between theoretical and practical topics would improve the readability of the book. As a result, several pages (i.e., mathematical derivations) are now clearly marked throughout the book with a vertical line, to indicate material that is primarily aimed at those readers who seek a more thorough mathematical understanding. Those parts are not required to learn how to apply the statistical methods presented in the book. For the reader who uses this book as a reference, this makes it easy to skip such sections and go directly to the main results. At the end of each chapter, I also provide a summary of key concepts, intended for a quick look-up of the results of each chapter.

Secondly, certain existing material needed substantial re-organization and expansion. The second edition now comprises 16 chapters, versus ten in the first edition. A few chapters (Chap. 6 on mean, median, and averages, Chap. 9 on multi-variable regression, and Chap. 11 on systematic errors and intrinsic scatter) contain material that is substantially new. In particular, the topic of multi-variable regression was introduced because of its use in many fields such as business and economics, where it is common to apply the regression method to many independent variables. Other chapters originate from re-arranging existing material more effectively. Some of the numerical tables in both the main body and the appendix have been expanded and re-arranged, so that the reader will find it even easier to use them for a variety of applications and as a reference.

The second edition also contains a new classic experiment, the measurement of iris characteristics by R.A. Fisher and E. Anderson. These new data are used primarily to illustrate the method of regression with many independent variables. The textbook now features a total of five classic experiments (including G. Mendel’s data on the independent assortment of species, J.J. Thomson’s data on the discovery of the electron, K. Pearson’s collection of data on biometric characteristics, and E. Hubble’s measurements of the expansion of the universe). These data and their analysis provide a unique way to learn the statistical methods presented in the book and a resource for the student and the teacher alike. Many of the end-of-chapter problems are based on these experimental data.

Finally, the new edition contains corrections to a number of typos that had inadvertently entered the manuscript. I am very much indebted to many of my students at the University of Alabama in Huntsville for pointing out these typos to me over the past few years, in particular to Zachary Robinson, who has patiently gone through much of the text to find typographical errors.

Huntsville, AL, USA Massimiliano Bonamente

Acknowledgments

In my early postdoc years, I was struggling to solve a complex data analysis problem. My longtime colleague and good friend Dr. Marshall Joy of NASA’s Marshall Space Flight Center one day walked down to my office and said something like “Max, I have a friend in Chicago who told me that there is a method that maybe can help us with our problem. I don’t understand any of it, but here’s a paper that talks about Monte Carlo Markov chains. See if it can help us.” That conversation led to my appreciation of one of the most powerful tools of statistics and data analysis, and opened the door for virtually all the research papers that I have written since. For over a decade, Marshall taught me how to be careful in the analysis of data and the interpretation of results, and he always used a red felt-tip marker to write comments on my papers.

The journey leading to this book started about 10 years ago, when Prof. A. Gordon Emslie, currently provost at Western Kentucky University, and I decided to offer a new course in data analysis and statistics for graduate students in our department. Gordon’s uncanny ability to solve virtually any problem presented to him (and likewise to make even the experienced scientist stumble with his questions) has been a great source of inspiration for this book.

Some of the material presented in this book is derived from Prof. Kyle Siegrist’s lectures on probability and stochastic processes at the University of Alabama in Huntsville. Kyle reinforced my love for mathematics and motivated my desire to emphasize both mathematics and applications for the material presented in this book.


Contents

1 Theory of Probability
1.1 Experiments, Events, and the Sample Space
1.2 Probability of Events
1.2.1 The Kolmogorov Axioms
1.2.2 Frequentist or Classical Method
1.2.3 Bayesian or Empirical Method
1.3 Fundamental Properties of Probability
1.4 Statistical Independence
1.5 Conditional Probability
1.6 A Classic Experiment: Mendel’s Law of Heredity and the Independent Assortment of Species
1.7 The Total Probability Theorem and Bayes’ Theorem

2 Random Variables and Their Distributions
2.1 Random Variables
2.2 Probability Distribution Functions
2.3 Moments of a Distribution Function
2.3.1 The Mean and the Sample Mean
2.3.2 The Variance and the Sample Variance
2.4 A Classic Experiment: J.J. Thomson’s Discovery of the Electron
2.5 Covariance and Correlation Between Random Variables
2.5.1 Joint Distribution and Moments of Two Random Variables
2.5.2 Statistical Independence of Random Variables
2.6 A Classic Experiment: Pearson’s Collection of Data on Biometric Characteristics

3 Three Fundamental Distributions: Binomial, Gaussian, and Poisson
3.1 The Binomial Distribution
3.1.1 Derivation of the Binomial Distribution
3.1.2 Moments of the Binomial Distribution
3.2 The Gaussian Distribution
3.2.1 Derivation of the Gaussian Distribution from the Binomial Distribution
3.2.2 Moments and Properties of the Gaussian Distribution
3.2.3 How to Generate a Gaussian Distribution from a Standard Normal
3.3 The Poisson Distribution
3.3.1 Derivation of the Poisson Distribution
3.3.2 Properties and Interpretation of the Poisson Distribution
3.3.3 The Poisson Distribution and the Poisson Process
3.3.4 An Example on Likelihood and Posterior Probability of a Poisson Variable
3.4 Comparison of Binomial, Gaussian, and Poisson Distributions

4 Functions of Random Variables and Error Propagation
4.1 Linear Combination of Random Variables
4.1.1 General Mean and Variance Formulas
4.1.2 Uncorrelated Variables and the 1/√N Factor
4.2 The Moment Generating Function
4.2.1 Properties of the Moment Generating Function
4.2.2 The Moment Generating Function of the Gaussian and Poisson Distribution
4.3 The Central Limit Theorem
4.4 The Distribution of Functions of Random Variables
4.4.1 The Method of Change of Variables
4.4.2 A Method for Multi-dimensional Functions
4.5 The Law of Large Numbers
4.6 The Mean of Functions of Random Variables
4.7 The Variance of Functions of Random Variables and Error Propagation Formulas
4.7.1 Sum of a Constant
4.7.2 Weighted Sum of Two Variables
4.7.3 Product and Division of Two Random Variables
4.7.4 Power of a Random Variable
4.7.5 Exponential of a Random Variable
4.7.6 Logarithm of a Random Variable
4.8 The Quantile Function and Simulation of Random Variables
4.8.1 General Method to Simulate a Variable
4.8.2 Simulation of a Gaussian Variable

5 Maximum Likelihood and Other Methods to Estimate Variables
5.1 The Maximum Likelihood Method for Gaussian Variables
5.1.1 Estimate of the Mean
5.1.2 Estimate of the Variance
5.1.3 Estimate of Mean for Non-uniform Uncertainties
5.2 The Maximum Likelihood Method for Other Distributions
5.3 Method of Moments
5.4 Quantiles and Confidence Intervals
5.4.1 Confidence Intervals for a Gaussian Variable
5.4.2 Confidence Intervals for the Mean of a Poisson Variable
5.5 Bayesian Methods for the Poisson Mean
5.5.1 Bayesian Expectation of the Poisson Mean
5.5.2 Bayesian Upper and Lower Limits for a Poisson Variable

6 Mean, Median, and Average Values of Variables
6.1 Linear and Weighted Average
6.2 The Median
6.3 The Logarithmic Average and Fractional or Multiplicative Errors
6.3.1 The Weighted Logarithmic Average
6.3.2 The Relative-Error Weighted Average

7 Hypothesis Testing and Statistics
7.1 Statistics and Hypothesis Testing
7.2 The χ² Distribution
7.2.1 The Probability Distribution Function
7.2.2 Moments and Other Properties
7.2.3 Hypothesis Testing
7.3 The Sampling Distribution of the Variance
7.4 The F Statistic
7.4.1 The Probability Distribution Function
7.4.2 Moments and Other Properties
7.4.3 Hypothesis Testing
7.5 The Sampling Distribution of the Mean and the Student’s t Distribution
7.5.1 Comparison of Sample Mean with Parent Mean
7.5.2 Comparison of Two Sample Means and Hypothesis Testing

8 Maximum Likelihood Methods for Two-Variable Datasets
8.1 Measurement of Pairs of Variables
8.2 Maximum Likelihood Method for Gaussian Data
8.3 Least-Squares Fit to a Straight Line, or Linear Regression
8.4 Multiple Linear Regression
8.4.1 Best-Fit Parameters for Multiple Regression
8.4.2 Parameter Errors and Covariances for Multiple Regression
8.4.3 Errors and Covariance for Linear Regression
8.5 Special Cases: Identical Errors or No Errors Available
8.6 A Classic Experiment: Edwin Hubble’s Discovery of the Expansion of the Universe
8.7 Maximum Likelihood Method for Non-linear Functions
8.8 Linear Regression with Poisson Data

9 Multi-Variable Regression
9.1 Multi-Variable Datasets
9.2 A Classic Experiment: The R.A. Fisher and E. Anderson Measurements of Iris Characteristics
9.3 The Multi-Variable Linear Regression
9.4 Tests for Significance of the Multiple Regression Coefficients
9.4.1 T-Test for the Significance of Model Components
9.4.2 F-Test for Goodness of Fit
9.4.3 The Coefficient of Determination

10 Goodness of Fit and Parameter Uncertainty
10.1 Goodness of Fit for the χ²_min Fit Statistic
10.2 Goodness of Fit for the Cash C Statistic
10.3 Confidence Intervals of Parameters for Gaussian Data
10.3.1 Confidence Interval on All Parameters
10.3.2 Confidence Intervals on Reduced Number of Parameters
10.4 Confidence Intervals of Parameters for Poisson Data
10.5 The Linear Correlation Coefficient
10.5.1 The Probability Distribution Function
10.5.2 Hypothesis Testing

11 Systematic Errors and Intrinsic Scatter
11.1 What to Do When the Goodness-of-Fit Test Fails
11.2 Intrinsic Scatter and Debiased Variance
11.2.1 Direct Calculation of the Intrinsic Scatter
11.2.2 Alternative Method to Estimate the Intrinsic Scatter
11.3 Systematic Errors
11.4 Estimate of Model Parameters with Systematic Errors or Intrinsic Scatter

12 Fitting Two-Variable Datasets with Bivariate Errors
12.1 Two-Variable Datasets with Bivariate Errors
12.2 Generalized Least-Squares Linear Fit to Bivariate Data
12.3 Linear Fit Using Bivariate Errors in the χ² Statistic

13 Model Comparison
13.1 The F Test
13.1.1 F-Test for Two Independent χ² Measurements
13.1.2 F-Test for an Additional Model Component
13.2 Kolmogorov–Smirnov Tests
13.2.1 Comparison of Data to a Model
13.2.2 Two-Sample Kolmogorov–Smirnov Test
