Bias in Statistics

Bias in Statistics is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. Therefore, the bias is a measure of the systematic error of an estimator. The bias indicates the distance of the estimator from …

Outliers and Influential Observations

Here we focus on the difference between outliers and influential observations.

Outliers

Cases (observations or data points) that do not follow the same model as the rest of the data are called outliers. In regression, cases with large residuals are candidates for outliers. An outlier is thus a data point that diverges from the overall pattern in a sample. It can influence the estimated relationship between the variables and may change the slope of the regression line.

An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. It may be due to a recording error (which may be correctable) or to the sample not coming entirely from the same population. It may also arise when the values come from a single population whose distribution is non-normal (heavy-tailed). That is, apparent outliers may result from a misspecified model based on wrong distributional assumptions.

Influential Observations

An influential observation is often an outlier in the x-direction. Influential observations may arise in several ways:

  1. the observation may be unusually large, or otherwise deviate in an extreme way from the center of a reference distribution;
  2. the observation may be associated with a unit that has a low selection probability and thus a high probability weight;
  3. the observation may have a very large weight (relative to the weights of other units in the specified sub-population) due to problems with stratum jumping; sampling of birth units or highly seasonal units; large nonresponse adjustment factors arising from unusually low response rates within a given adjustment cell; unusual calibration-weighting effects; or other factors.
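The effect of a large weight in items 2 and 3 can be illustrated with a minimal sketch. It assumes that the "probability weight" is the inverse of a unit's selection probability; the values and probabilities below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample: observed values and each unit's selection probability.
values = np.array([10.0, 12.0, 11.0, 9.0, 13.0])
selection_prob = np.array([0.5, 0.5, 0.5, 0.5, 0.01])

# Assumption: the probability weight is the inverse selection probability.
weights = 1.0 / selection_prob

# Weighted estimate of the population total and each unit's share of it.
weighted_total = np.sum(weights * values)
share = weights * values / weighted_total

for v, w, s in zip(values, weights, share):
    print(f"value={v:5.1f}  weight={w:6.1f}  share of total={s:.3f}")
```

The last unit has an ordinary value, yet its weight is so large that it accounts for most of the weighted total. Such a unit is influential without being an outlier in its observed value.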

Importance of Outliers and Influential Observations

Outliers and influential observations are important because:

  • Both outliers and influential observations can potentially mislead the interpretation of the regression model.
  • Outliers might indicate errors in the data or a non-linear relationship that the model cannot capture.
  • Influential observations can make the model seem more accurate than it is, masking underlying issues.
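The distinction between the two can be seen in a small sketch. It fits a least-squares line to hypothetical data three times: once on the clean data, once with an added outlier in the y-direction placed at the mean of x, and once with an added point far out in the x-direction. The data and the helper name `fit_line` are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: y is roughly 2*x + 1 for x = 1, ..., 10.
x = np.arange(1.0, 11.0)
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1])
y = 2.0 * x + 1.0 + noise

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept

slope_clean, intercept_clean = fit_line(x, y)

# Case A: an outlier in the y-direction located at the mean of x.
x_a = np.append(x, x.mean())
y_a = np.append(y, 30.0)
slope_a, intercept_a = fit_line(x_a, y_a)

# Case B: a point far out in the x-direction that does not follow the trend.
x_b = np.append(x, 30.0)
y_b = np.append(y, 12.0)
slope_b, intercept_b = fit_line(x_b, y_b)

print(f"clean data      : slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"y-outlier (A)   : slope={slope_a:.3f}, intercept={intercept_a:.3f}")
print(f"x-direction (B) : slope={slope_b:.3f}, intercept={intercept_b:.3f}")
```

In case A the added point has a large residual but sits at the mean of x, so it shifts the intercept while leaving the slope unchanged. In case B the added point pulls the fitted line toward itself and the slope changes substantially.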

How to identify them?

Both outliers and influential observations can be identified by using:

  • Visual inspection: Scatterplots can reveal outliers as distant points.
  • Residual plots: Plotting residuals against predicted values or independent variables can show patterns indicative of influential observations.
  • Statistical diagnostics: Measures like Cook’s distance or leverage can quantify the influence of each data point.
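As a minimal sketch of the statistical diagnostics mentioned above, the following computes leverage, internally studentized residuals, and Cook's distance for a simple linear regression using only NumPy. The function name `regression_diagnostics` and the data are hypothetical; the formulas are the standard ones based on the diagonal of the hat matrix.

```python
import numpy as np

def regression_diagnostics(x, y):
    """Leverage, internally studentized residuals and Cook's distance
    for a simple linear regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])       # design matrix
    p = X.shape[1]                             # number of coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Diagonal of the hat matrix: h_ii = x_i' (X'X)^{-1} x_i
    xtx_inv = np.linalg.inv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    mse = resid @ resid / (n - p)              # residual variance estimate
    student = resid / np.sqrt(mse * (1.0 - leverage))
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Hypothetical data: ten points near y = 2*x + 1, plus one point far out in x.
x = np.append(np.arange(1.0, 11.0), 30.0)
y = np.append(
    2.0 * np.arange(1.0, 11.0) + 1.0
    + np.array([0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.1, -0.1]),
    12.0,
)

leverage, student, cooks = regression_diagnostics(x, y)
for i in range(x.size):
    print(f"x={x[i]:5.1f}  leverage={leverage[i]:.3f}  "
          f"studentized={student[i]:6.2f}  cooks={cooks[i]:.3f}")
```

In this sketch the last point has by far the highest leverage and Cook's distance. A common rule of thumb treats a Cook's distance above 1 as worth investigating, although the cutoff is a convention rather than a formal test.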

By being aware of outliers and influential observations, one can ensure that the regression analysis provides a more reliable picture of the relationship between variables.

Error and Residual in Regression

In statistics and optimization, statistical errors and residuals are two closely related and easily confused measures of the “deviation of a sample from the mean”. Error is a misnomer; an error is the amount by which an observation differs from its …

P-value Interpretation and Misinterpretation of P-value 2012

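The P-value is a probability, with a value ranging from zero to one. It is a measure of how much evidence we have against the null hypothesis. More precisely, it is the probability, computed under the assumption that $H_0$ is true, of obtaining a result at least as extreme as the one observed; it is not the probability that $H_0$ is false. The smaller the p-value, the …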
