Outliers and Influential Observations

Here we will focus on the difference between the outliers and influential observations.

The cases (observations or data points) that do not follow the model as the rest of the data are called outliers. In Regression, the cases with large residuals are a candidate for outliers. So an outlier is a data point that diverges from an overall pattern in a sample. Therefore, an outlier can certainly influence the relationship between the variables and may also exert an influence on the slope of the regression line.

An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. An outlier may be due to recording errors (may be correctable), or due to the sample not being entirely from the same population. May also be due to the values from the same population but from the non-normal (heavy-tailed) population. That is, an outliers may be due to incorrect specifications that are based on the wrong distributional assumptions.

An influential observation is often an outlier in the x-direction. Influential observation may arise from

  1. observations that are unusually large or otherwise deviate in unusually extreme forms from the center of a reference distribution,
  2. the observation may be associated with a unit that has lowprobability, and thus having high probability weight.
  3. the observation may have a weight that is very large (relative to the weights of other units in the specified sub-population) due to problems with stratum jumping; sampling of birth units or highly seasonal units; large nonresponse adjustment factors arising from unusually low response rates within a given adjustment cell; unusual calibration-weighting effects; or other factors.

Muhammad Imdad Ullah

Currently working as Assistant Professor of Statistics in Ghazi University, Dera Ghazi Khan. Completed my Ph.D. in Statistics from the Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan. l like Applied Statistics, Mathematics, and Statistical Computing. Statistical and Mathematical software used is SAS, STATA, GRETL, EVIEWS, R, SPSS, VBA in MS-Excel. Like to use type-setting LaTeX for composing Articles, thesis, etc.

You may also like...

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

x Logo: Shield Security
This Site Is Protected By
Shield Security