Cases that do not follow the same model as the rest of the data are called outliers. In regression, cases with large residuals are candidates for outliers. An outlier is thus a data point that diverges from the overall pattern in a sample. Consequently, an outlier can influence the estimated relationship between the variables and, in particular, the slope of the regression line.
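As a rough illustration of the residual-based view above, the following sketch fits a least-squares line in plain Python, flags cases whose residual exceeds two residual standard errors, and compares the slope with and without the flagged case. The data, the function names `fit_line` and `flag_large_residuals`, and the cutoff of 2 are assumptions made for the example, not part of the original text.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def flag_large_residuals(x, y, cutoff=2.0):
    """Indices of cases whose |residual| exceeds `cutoff` residual standard errors."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted coefficients).
    s = (sum(r ** 2 for r in residuals) / (len(x) - 2)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) > cutoff * s]


# Hypothetical data: y is roughly 2*x, except for the last case.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 35.0]

candidates = flag_large_residuals(x, y)

# Slope with all cases versus slope after dropping the flagged candidates.
_, slope_all = fit_line(x, y)
kept = [i for i in range(len(x)) if i not in candidates]
_, slope_trimmed = fit_line([x[i] for i in kept], [y[i] for i in kept])
```

With these made-up numbers only the last case is flagged, and dropping it moves the slope from about 2.8 back to about 2.0. Note that the outlier itself inflates the residual standard error, so a fixed cutoff like this one can miss outliers in less clear-cut data.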
An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. Outliers may be due to recording errors, which may be correctable, or to the sample not coming entirely from the same population. They may also be values from the same population when that population is non-normal (heavy-tailed). That is, apparent outliers may result from a model specification based on wrong distributional assumptions.
An influential observation is often an outlier in the x-direction, that is, a case whose predictor value lies far from those of the other cases. Influential observations may arise from:
- observations that are unusually large or otherwise deviate in unusually extreme forms from the center of a reference distribution;
- observations associated with a unit that has a low selection probability and therefore a high probability weight;
- observations whose weight is very large (relative to the weights of other units in the specified subpopulation) due to problems with stratum jumping; sampling of birth units or highly seasonal units; large nonresponse adjustment factors arising from unusually low response rates within a given adjustment cell; unusual calibration-weighting effects; or other factors.
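The two sources of influence described above, an extreme position in the x-direction and an unusually large weight, can be sketched in a few lines of Python. The leverage formula is the standard one for simple linear regression. The data, the function names, the 2p/n rule of thumb, and the reading of "probability weight" as the inverse of the selection probability are assumptions made for this illustration.

```python
from statistics import mean


def leverages(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(x)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def weight_shares(selection_probs):
    """Weights w_i = 1 / pi_i, returned as each unit's share of the total weight."""
    weights = [1 / p for p in selection_probs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical predictor values: the last one lies far from the others.
x = [1, 2, 3, 4, 5, 20]
h = leverages(x)

# Rule of thumb (an assumption here): flag h_i > 2p/n, with p = 2 coefficients.
threshold = 2 * 2 / len(x)
high_leverage = [i for i, hi in enumerate(h) if hi > threshold]

# Hypothetical selection probabilities: the last unit was very unlikely to be sampled.
probs = [0.5, 0.5, 0.4, 0.5, 0.02]
shares = weight_shares(probs)
dominant = [i for i, s in enumerate(shares) if s > 0.5]
```

In this made-up example the case at x = 20 is the only one above the leverage threshold, and the unit with selection probability 0.02 carries more than half of the total weight, so either could dominate an estimate on its own.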