Applied Econometrics
Lecture 3: Outliers, Leverage and Influence
‘Life is the art of drawing sufficient conclusions from insufficient premises’
SAMUEL BUTLER
1) Introduction
The estimates of the regression parameters can be strongly influenced by a few extreme observations.
A residual plot lets us pick out the individual data points whose residuals are unusually high or low.
We may therefore use the residual plot to find outliers, i.e., observations that are inadequately
captured by the regression model itself.
2) Identification of outliers
• The percentiles that cut the data into four quarters have special names: the 25th and the 75th
percentiles are called the lower and upper quartiles ($Q_L$ and $Q_U$).
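• The lower quartile is the $d$-th value from the bottom of the ordered list, where
$d = \left(\lfloor (n+1)/2 \rfloor + 1\right)/2$; the upper quartile is the $d$-th value from the top.
When $d$ is a half-integer, the quartile is the average of the two adjacent ordered values.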
• A data point $Y_0$ is considered an outlier if
$Y_0 < Q_L - 1.5\,\mathrm{IQR}$ or $Y_0 > Q_U + 1.5\,\mathrm{IQR}$,
where IQR is the interquartile range ($\mathrm{IQR} = Q_U - Q_L$) (Source: Hoaglin, 1983).
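The quartile and fence rules above can be sketched in Python. This is a minimal illustration assuming plain numeric data; the function names `fourths` and `fence_outliers` are hypothetical, and `k = 1.5` is the multiplier from the rule.

```python
import math

def fourths(data):
    """Return (Q_L, Q_U) using the depth rule above.

    depth = (floor((n + 1) / 2) + 1) / 2, counted from each end of the sorted
    data; a half-integer depth averages the two adjacent ordered values.
    """
    xs = sorted(data)
    n = len(xs)
    depth = (math.floor((n + 1) / 2) + 1) / 2
    lo = math.floor(depth)   # 1-based rank of the lower neighbour
    hi = math.ceil(depth)    # same as lo when depth is a whole number
    q_l = (xs[lo - 1] + xs[hi - 1]) / 2   # counted from the bottom
    q_u = (xs[n - lo] + xs[n - hi]) / 2   # counted from the top
    return q_l, q_u

def fence_outliers(data, k=1.5):
    """Values below Q_L - k*IQR or above Q_U + k*IQR (hypothetical helper)."""
    q_l, q_u = fourths(data)
    iqr = q_u - q_l
    return [y for y in data if y < q_l - k * iqr or y > q_u + k * iqr]

sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]
print(fourths(sample))         # (4.0, 8.0)
print(fence_outliers(sample))  # [30]
```

Here $n = 9$ gives depth 3, so $Q_L = 4$, $Q_U = 8$, IQR = 4, and the fences are $-2$ and $14$; only 30 falls outside them.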
3) Outliers
An outlier is a point that is far removed from its fitted value (i.e., it has a large residual). "Large" in
this context does not refer to the absolute size of a residual but to its size relative to most of the
other residuals in the regression.
In univariate analysis, an outlier is defined with reference to the mean of its own variable. In
bivariate analysis, an outlier is a point with a large residual (i.e., its Y value is far removed from its
fitted value).
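To illustrate the difference between the two views, the sketch below uses made-up data in which the first Y value lies close to the mean of Y, so it looks unremarkable on its own, yet it has the largest residual once Y is regressed on X. Measuring the univariate distance in standard deviations from the mean is an assumption for illustration, and `ols_residuals` is a hypothetical helper.

```python
import statistics

def ols_residuals(x, y):
    """Residuals e_i = y_i - (b0 + b1 * x_i) from the least-squares line."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [10, 4, 6, 8, 10, 12, 14, 16, 18]   # first point breaks the pattern y = 2x

# Univariate view: distance of each Y from its own mean, in standard deviations.
y_bar, y_sd = statistics.mean(y), statistics.stdev(y)
z_univariate = [(yi - y_bar) / y_sd for yi in y]

# Bivariate view: distance of each Y from its fitted value.
resid = ols_residuals(x, y)
worst = max(range(len(resid)), key=lambda i: abs(resid[i]))

print(round(z_univariate[0], 2))  # -0.19: Y = 10 is ordinary on its own
print(worst)                      # 0: yet it has the largest residual
```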
Apart from graphical methods, we can also rely on special statistics to detect outliers. To compare a
large residual with the other residuals, we may calculate the standardized residual, which is simply
the residual divided by the standard error of the estimate, $e_i/s$. However, an outlier in the data set
will inflate the standard error of the regression, which shrinks every standardized residual, including
the outlier's own. Hence we use the studentized residual
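A minimal sketch of the standardized residual $e_i/s$ for a regression with one regressor, assuming $s = \sqrt{\mathrm{SSE}/(n-2)}$ as the standard error of the estimate. The data are made up for illustration and the helper names `ols_fit` and `standardized_residuals` are hypothetical. Refitting without the suspicious point shows how much that single point inflates $s$.

```python
import math

def ols_fit(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standardized_residuals(x, y):
    """Return (e_i, s, e_i / s) with s = sqrt(SSE / (n - 2))."""
    b0, b1 = ols_fit(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (len(x) - 2))
    return resid, s, [e / s for e in resid]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2.1, 3.9, 6.2, 7.8, 20.0, 11.9, 14.2, 15.8, 18.1]  # point at x = 5 is off

_, s_all, z_all = standardized_residuals(x, y)

# Refit without the point at x = 5 to see how much it inflated s.
x_drop, y_drop = x[:4] + x[5:], y[:4] + y[5:]
_, s_drop, _ = standardized_residuals(x_drop, y_drop)

print(round(s_all, 2), round(s_drop, 2))  # 3.57 0.18
print(round(z_all[4], 2))                 # 2.49
```

With the outlier included, its standardized residual is about 2.5, whereas its raw residual (about 8.9) measured against $s$ from the other eight points would be close to 50. This is the inflation problem the text describes.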
Written by Nguyen Hoang Bao May 20, 2004