Heteroscedasticity Definition, Reasons, Consequences (2012)

Heteroscedasticity Definition

An important assumption of OLS is that the disturbances $u_i$ appearing in the population regression function are homoscedastic, that is, the error terms all have the same variance.

The variance of each disturbance term $u_i$, conditional on the chosen values of the explanatory variables, is some constant number equal to $\sigma^2$: $E(u_i^2)=\sigma^2$, where $i=1,2,\ldots,n$.
Homo means equal and scedasticity means spread.

Consider the general linear regression model
$$y_i=\beta_1+\beta_2x_{2i}+\beta_3x_{3i}+\cdots+\beta_kx_{ki}+\varepsilon_i$$

If $E(\varepsilon_i^2)=\sigma^2$ for all $i=1,2,\ldots,n$, then the assumption of constant variance of the error term, or homoscedasticity, is satisfied.

If $E(\varepsilon_i^2)\neq\sigma^2$ for some $i$, that is, if the error variance $\sigma_i^2$ differs across observations, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators are unbiased but inefficient.

Examples:

  1. The range in family income between the poorest and richest families in town is the classical example of heteroscedasticity.
  2. The range in annual sales between a corner drug store and a general store.
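A small simulation can make the definition concrete. The sketch below is an illustration only and is not taken from the post: the intercept 2, the slope 3, the error standard deviation 0.5·x, and the helper names `simulate` and `ols_fit` are all assumptions chosen for the example. It fits a simple OLS line and compares the residual variance for small and large values of x.

```python
import random
import statistics

def simulate(n=400, seed=0):
    """Draw (x, y) pairs whose error spread grows with x (hypothetical setup)."""
    rng = random.Random(seed)
    xs = [rng.uniform(1, 10) for _ in range(n)]
    # Error standard deviation is 0.5 * x, so Var(e_i) = (0.5 * x_i)^2.
    ys = [2.0 + 3.0 * x + rng.gauss(0, 0.5 * x) for x in xs]
    return xs, ys

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

xs, ys = simulate()
b1, b2 = ols_fit(xs, ys)
residuals = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]

# Split the residuals at the median of x and compare their spread.
cut = statistics.median(xs)
low = [e for x, e in zip(xs, residuals) if x <= cut]
high = [e for x, e in zip(xs, residuals) if x > cut]
var_low = statistics.pvariance(low)
var_high = statistics.pvariance(high)
print(f"residual variance, low x:  {var_low:.2f}")
print(f"residual variance, high x: {var_high:.2f}")
```

Under this setup the residual variance in the high-x half should come out clearly larger than in the low-x half, which is the pattern the income and sales examples describe.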

Reasons for Heteroscedasticity

There are several reasons why the variance of the error term $u_i$ may vary, some of which are:

  1. Following the error-learning models, as people learn, their errors of behavior become smaller over time. In this case, $\sigma_i^2$ is expected to decrease. For example, the number of typing errors made in a given period on a test tends to fall as the hours put into typing practice increase.
  2. As income grows, people have more discretionary income and so more choice about how to spend it; hence $\sigma_i^2$ is likely to increase with income.
  3. As data-collecting techniques improve, $\sigma_i^2$ is likely to decrease.
  4. Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
  5. Heteroscedasticity can arise from violating the CLRM (classical linear regression model) assumption that the regression model is correctly specified.
  6. Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
  7. Incorrect data transformation and incorrect functional form (linear or log-linear model) are also sources of heteroscedasticity.

Consequences of Heteroscedasticity

  1. The OLS estimators and regression predictions based on them remain unbiased and consistent.
  2. The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
  3. Because the usual OLS estimator of the covariance matrix of the estimated regression coefficients is biased and inconsistent under heteroscedasticity, the standard tests of hypotheses (t-test, F-test) are no longer valid.

Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.
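The first and third consequences can be checked with a small Monte Carlo experiment. This is a sketch under stated assumptions, not a result from the post: the true coefficients, the fixed design for x, the error standard deviation 0.1·x², and the function name `one_replication` are all hypothetical. It repeatedly draws heteroscedastic samples, fits OLS, and compares the actual sampling spread of the slope with the standard error given by the textbook formula that assumes constant variance.

```python
import math
import random
import statistics

TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 3.0   # hypothetical population values
N_OBS, N_REPS = 100, 2000

# Fixed design: x runs evenly from 1 to 10 in every replication.
XS = [1 + 9 * i / (N_OBS - 1) for i in range(N_OBS)]
X_BAR = statistics.fmean(XS)
SXX = sum((x - X_BAR) ** 2 for x in XS)

def one_replication(rng):
    """Return (OLS slope, conventional standard error of the slope)."""
    # Heteroscedastic errors: the standard deviation 0.1 * x^2 grows with x.
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0, 0.1 * x * x) for x in XS]
    y_bar = statistics.fmean(ys)
    slope = sum((x - X_BAR) * (y - y_bar) for x, y in zip(XS, ys)) / SXX
    intercept = y_bar - slope * X_BAR
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(XS, ys))
    # Textbook formula, valid only under homoscedasticity: sqrt(s^2 / Sxx).
    conventional_se = math.sqrt(rss / (N_OBS - 2) / SXX)
    return slope, conventional_se

rng = random.Random(42)
results = [one_replication(rng) for _ in range(N_REPS)]
slopes = [slope for slope, _ in results]
ses = [se for _, se in results]

mean_slope = statistics.fmean(slopes)
empirical_sd = statistics.stdev(slopes)
mean_conventional_se = statistics.fmean(ses)

print(f"mean of slope estimates: {mean_slope:.3f} (true value {TRUE_SLOPE})")
print(f"empirical SD of slope:   {empirical_sd:.3f}")
print(f"mean conventional SE:    {mean_conventional_se:.3f}")
```

In this design the average slope stays close to the true value, while the conventional standard error understates the actual variability of the slope, so t-statistics built from it would be too large. The direction of the distortion depends on how the error variance is related to x, so other designs can overstate it instead.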


FAQs about Heteroscedasticity

  1. Define heteroscedasticity.
  2. What are the major consequences of heteroscedasticity?
  3. What is meant by the constant variance of the error term in linear regression models?
  4. What are the possible reasons that make the error term variance variable?
  5. In what kind of data are problems of heteroscedasticity likely to exist?

Moments In Statistics (2012)

Introduction to Moments in Statistics

The measures of central tendency (location) and the measures of dispersion (variation) are useful for describing a data set, but neither tells us anything about the shape of the distribution. For that we need further measures, called moments. Moments in Statistics are used to describe the shape of a distribution, in particular its skewness and kurtosis.

Moments are fundamental statistical tools for understanding the characteristics of any dataset. They provide quantitative measures that describe the data:

  • Central tendency: The “center” of the data. The mean, which is the first moment about zero, is the most common measure of central tendency.
  • Spread: Indicates how scattered the data is around the central tendency. Common measures of spread include variance and standard deviation.
  • Shape: Describes the overall form of the data distribution. For instance, is it symmetrical? Does it have a long tail on one side? Higher-order moments like skewness and kurtosis help analyze the shape.

Moments about Mean

The moments about the mean are the means of the deviations from the mean after raising them to integer powers. The $r$th population moment about the mean, denoted by $\mu_r$, is

$$\mu_r=\frac{\sum_{i=1}^{N}(y_i-\bar{y})^r}{N}$$

where $r=1,2,\ldots$

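The corresponding sample moment, denoted by $m_r$, is

$$m_r=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^r}{n}$$

Note that if $r=1$, the moment is zero, because $\sum_{i=1}^{n}(y_i-\bar{y})=\sum_{i=1}^{n}y_i-n\bar{y}=0$ and therefore $m_1=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^1}{n}=0$. So the first moment about the mean is always zero.

If $r=2$, then the second moment is the variance, i.e. $m_2=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n}$

Similarly, the 3rd and 4th moments are

$$m_3=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^3}{n}$$

$$m_4=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^4}{n}$$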

For grouped data, the $r$th sample moment about the sample mean $\bar{y}$ is

$$m_r=\frac{\sum_{i=1}^{n}f_i(y_i-\bar{y})^r}{\sum_{i=1}^{n}f_i}$$

where $\sum_{i=1}^{n}f_i=n$
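The formulas above translate directly into a few lines of code. The following is a minimal sketch with made-up data; the function names `central_moment` and `grouped_central_moment` are illustrative choices, not names from the post. It computes the first four sample moments about the mean for ungrouped data, then repeats the calculation from a frequency table of the same values.

```python
def central_moment(ys, r):
    """r-th sample moment about the mean (divisor n) for ungrouped data."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sum((y - y_bar) ** r for y in ys) / n

def grouped_central_moment(values, freqs, r):
    """r-th sample moment about the mean for data given as (value, frequency)."""
    n = sum(freqs)
    y_bar = sum(f * y for y, f in zip(values, freqs)) / n
    return sum(f * (y - y_bar) ** r for y, f in zip(values, freqs)) / n

# Illustrative data (mean = 5) and the same data as a frequency table.
ys = [2, 4, 4, 4, 5, 5, 7, 9]
values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]

ungrouped = [central_moment(ys, r) for r in (1, 2, 3, 4)]
grouped = [grouped_central_moment(values, freqs, r) for r in (1, 2, 3, 4)]
print(ungrouped)  # first entry is 0, second entry is the variance
print(grouped)
```

Both calls give the same four numbers here because the frequency table lists the exact values. With class midpoints in place of exact values, the grouped result would only approximate the ungrouped one.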

Moments about Arbitrary Value

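The $r$th sample moment about any arbitrary origin “a”, denoted by $m'_r$, is
$$m'_r=\frac{\sum_{i=1}^{n}(y_i-a)^r}{n}=\frac{\sum_{i=1}^{n}D_i^r}{n}$$
where $D_i=(y_i-a)$ and $r=1,2,\ldots$

therefore
$$\begin{aligned}
m'_1&=\frac{\sum_{i=1}^{n}(y_i-a)}{n}=\frac{\sum_{i=1}^{n}D_i}{n}\\
m'_2&=\frac{\sum_{i=1}^{n}(y_i-a)^2}{n}=\frac{\sum_{i=1}^{n}D_i^2}{n}\\
m'_3&=\frac{\sum_{i=1}^{n}(y_i-a)^3}{n}=\frac{\sum_{i=1}^{n}D_i^3}{n}\\
m'_4&=\frac{\sum_{i=1}^{n}(y_i-a)^4}{n}=\frac{\sum_{i=1}^{n}D_i^4}{n}
\end{aligned}$$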

The $r$th sample moment for grouped data about any arbitrary origin “a” is

$$m'_r=\frac{\sum_{i=1}^{n}f_i(y_i-a)^r}{\sum_{i=1}^{n}f_i}=\frac{\sum f_iD_i^r}{\sum f_i}$$

The moments about the mean are usually called central moments and the moments about any arbitrary origin “a” are called non-central moments or raw moments.

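One can calculate the moments about the mean ($m_r$) from the moments about an arbitrary value ($m'_r$) using the following relations

$$\begin{aligned}
m_1&=m'_1-(m'_1)=0\\
m_2&=m'_2-(m'_1)^2\\
m_3&=m'_3-3m'_2m'_1+2(m'_1)^3\\
m_4&=m'_4-4m'_3m'_1+6m'_2(m'_1)^2-3(m'_1)^4
\end{aligned}$$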
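These relations can be verified numerically. The sketch below uses made-up data and an arbitrarily chosen origin a = 4; the function names `moment_about` and `central_from_raw` are hypothetical. It computes the moments about the arbitrary origin, converts them with the relations above, and compares the result with the moments computed directly about the mean.

```python
def moment_about(ys, a, r):
    """r-th sample moment about the origin a (divisor n)."""
    return sum((y - a) ** r for y in ys) / len(ys)

def central_from_raw(m1p, m2p, m3p, m4p):
    """Convert moments about an arbitrary origin into moments about the mean."""
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
    m4 = m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4
    return m2, m3, m4

ys = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data, mean = 5
a = 4                            # arbitrary origin chosen for the example

raw = [moment_about(ys, a, r) for r in (1, 2, 3, 4)]
converted = central_from_raw(*raw)

y_bar = sum(ys) / len(ys)
direct = tuple(moment_about(ys, y_bar, r) for r in (2, 3, 4))

print("moments about a:", raw)
print("converted:      ", converted)
print("direct:         ", direct)
```

The converted and direct values agree, which is why a convenient origin (often a value near the middle of the data) can be used to simplify hand calculation.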

Moments about Zero

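If a variable $y$ assumes $n$ values $y_1,y_2,\ldots,y_n$, then the $r$th moment about zero can be obtained by taking $a=0$, so the moment about an arbitrary value becomes
$$m'_r=\frac{\sum y^r}{n}$$

where $r=1,2,3,\ldots$

therefore
$$m'_1=\frac{\sum y}{n},\quad m'_2=\frac{\sum y^2}{n},\quad m'_3=\frac{\sum y^3}{n},\quad m'_4=\frac{\sum y^4}{n}$$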

The third moment is used to define the skewness of a distribution

$$\text{Skewness}=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^3}{ns^3}$$

where $s$ is the standard deviation of the data.

If the distribution is symmetric then the skewness will be zero. Skewness will be positive if there is a long tail in the positive direction and skewness will be negative if there is a long tail in the negative direction.

The fourth moment is used to define the kurtosis of a distribution

$$\text{Kurtosis}=\frac{\sum_{i=1}^{n}(y_i-\bar{y})^4}{ns^4}$$
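As a worked illustration of the two formulas, the sketch below computes skewness and kurtosis for a small made-up data set. One assumption is made explicit: the standard deviation $s$ is computed with divisor $n$ (so $s^2$ equals the second moment about the mean); the post does not say whether $n$ or $n-1$ is intended. The function names are illustrative.

```python
import math

def central_moment(ys, r):
    """r-th sample moment about the mean (divisor n)."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** r for y in ys) / len(ys)

def skewness(ys):
    # Assumption: s uses divisor n, so s**2 equals the second central moment.
    s = math.sqrt(central_moment(ys, 2))
    return central_moment(ys, 3) / s ** 3

def kurtosis(ys):
    # Same assumption about s as in skewness().
    s = math.sqrt(central_moment(ys, 2))
    return central_moment(ys, 4) / s ** 4

ys = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data with a longer right tail
sk = skewness(ys)
ku = kurtosis(ys)
print(f"skewness = {sk:.5f}")
print(f"kurtosis = {ku:.5f}")
```

For this data the third moment is 5.25 and $s^3 = 8$, giving a positive skewness, which matches the longer tail on the right.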


In summary, moments are quantitative measures that describe the distribution of a dataset around its central tendency. Different types of moments provide specific information about the shape and characteristics of the data. By understanding and using moments, one can gain a deeper understanding of the data and make more informed decisions in statistical analysis.

FAQs about Moments in Statistics

  1. Define moments in Statistics.
  2. What is the use of moments?
  3. How are moments used to understand the characteristics of the data?
  4. What is meant by moments about mean?
  5. What are moments about arbitrary value?
  6. What is meant by moments about zero?
  7. Define the different types of moments.

Skewness Formula

The post outlines key skewness formulas providing essential tools for analyzing data distribution asymmetry. The skewness formulas help quantify the direction and degree of skewness, aiding in data analysis and decision-making.

What is Skewness?

Skewness is a statistical measure that describes the asymmetry of a probability distribution around its mean. It indicates whether the data is skewed to the left (negative skew), skewed to the right (positive skew), or symmetrically distributed (zero skew). In short, skewness is the degree of asymmetry, or departure from symmetry, of the distribution of a real-valued random variable.

Positive Skewed

If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right or to be positively skewed. In a positively skewed distribution, the mean is greater than the median and the median is greater than the mode, i.e. Mean>Median>Mode

Negative Skewed

If the frequency curve has a longer tail to the left of the central maximum than to the right, the distribution is said to be skewed to the left or to be negatively skewed. In a negatively skewed distribution, the mode is greater than the median and the median is greater than the mean i.e. Mode>Median>Mean

Zero Skewness

For zero skewness, the data is symmetrically distributed, as in a normal distribution.

Measure of Skewness Formulation

In a symmetrical distribution, the mean, median, and mode coincide. In a skewed distribution, these values are pulled apart.


Pearson’s Coefficient of Skewness Formula

Karl Pearson (1857-1936) introduced a coefficient to measure the degree of skewness of a distribution or curve, which is denoted by $Sk$ and defined by

$$Sk=\frac{\text{Mean}-\text{Mode}}{\text{Standard Deviation}}$$

or, in terms of the median,

$$Sk=\frac{3(\text{Mean}-\text{Median})}{\text{Standard Deviation}}$$

Usually, this coefficient varies between -3 and +3, and the sign indicates the direction of skewness.
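Both versions of Pearson's coefficient can be computed with Python's standard `statistics` module. The sketch below is illustrative: the data are made up, the function names are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption, since the formula does not say which standard deviation is meant.

```python
import statistics

def pearson_mode_skewness(ys):
    """(Mean - Mode) / Standard Deviation, using the population SD."""
    return (statistics.fmean(ys) - statistics.mode(ys)) / statistics.pstdev(ys)

def pearson_median_skewness(ys):
    """3 * (Mean - Median) / Standard Deviation, using the population SD."""
    return 3 * (statistics.fmean(ys) - statistics.median(ys)) / statistics.pstdev(ys)

ys = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, median 4.5, mode 4
sk_mode = pearson_mode_skewness(ys)
sk_median = pearson_median_skewness(ys)
print(f"mode-based coefficient:   {sk_mode:.2f}")
print(f"median-based coefficient: {sk_median:.2f}")
```

Here Mean>Median>Mode, so both coefficients are positive, in line with the description of a positively skewed distribution. The two versions need not give the same number, because the relation between mean, median, and mode that links them holds only approximately.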

Bowley’s Coefficient of Skewness Formula (Quartile Coefficient)

Arthur Lyon Bowley (1869-1957) proposed a measure of skewness based on the median and the two quartiles.

$$Sk=\frac{Q_1+Q_3-2\,\text{Median}}{Q_3-Q_1}$$

Its values lie between -1 and +1.
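A minimal sketch of the quartile coefficient follows. Quartiles can be computed under several conventions, and the result depends on which one is used; the choice of `statistics.quantiles` with `method="inclusive"` is an assumption made for this example, as are the data and the function name `bowley_skewness`.

```python
import statistics

def bowley_skewness(ys):
    """(Q1 + Q3 - 2 * Median) / (Q3 - Q1) with 'inclusive' quartiles."""
    # Assumption: quartiles by linear interpolation over the sorted data.
    q1, q2, q3 = statistics.quantiles(ys, n=4, method="inclusive")
    return (q1 + q3 - 2 * q2) / (q3 - q1)

ys = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data with a longer right tail
sk_bowley = bowley_skewness(ys)
print(f"Bowley coefficient: {sk_bowley:.4f}")
```

Because it uses only the quartiles, this coefficient is not affected by the extreme values in the tails, which is one reason to prefer it when outliers are present.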

Moment Coefficient of Skewness Formula

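This measure of skewness is the third moment expressed in standard units (or the moment ratio), and is given by

$$Sk=\frac{\mu_3}{\sigma^3}$$

For many distributions met in practice its values lie between -2 and +2, although the coefficient has no fixed bounds in general.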

If Sk is greater than zero, the distribution or curve is said to be positively skewed. If Sk is less than zero the distribution or curve is said to be negatively skewed. If Sk is zero the distribution or curve is said to be symmetrical.
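The sign rule can be illustrated by applying the moment coefficient to a data set, its mirror image, and a symmetric data set. This is a sketch with made-up data; the function names `moment_skewness` and `describe` and the tolerance used to treat a value as zero are assumptions for the example.

```python
def moment_skewness(ys):
    """Third central moment divided by the cube of the standard deviation."""
    n = len(ys)
    mean = sum(ys) / n
    mu2 = sum((y - mean) ** 2 for y in ys) / n
    mu3 = sum((y - mean) ** 3 for y in ys) / n
    return mu3 / mu2 ** 1.5

def describe(sk, tol=1e-12):
    """Classify a skewness value by its sign (tol guards against rounding)."""
    if sk > tol:
        return "positively skewed"
    if sk < -tol:
        return "negatively skewed"
    return "symmetrical"

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]      # long tail to the right
left_tail = [-y for y in right_tail]        # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

labels = [describe(moment_skewness(d)) for d in (right_tail, left_tail, symmetric)]
for name, label in zip(("right_tail", "left_tail", "symmetric"), labels):
    print(f"{name}: {label}")
```

Mirroring the data flips the sign of the coefficient without changing its magnitude, which is why the sign alone identifies the direction of the longer tail.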

The skewness of the distribution of a real-valued random variable can easily be seen by drawing a histogram or frequency curve.

When the skewness is very extreme, the distribution is called a J-shaped distribution.

Skewness: J-Shaped Distribution

Skewness helps identify deviations from normality, which is crucial for selecting appropriate statistical methods and interpreting data accurately. It is commonly used in finance, economics, and data analysis to understand the shape and behavior of datasets.

FAQs about Skewness

  1. What is the degree of asymmetry called?
  2. What is a departure from symmetry?
  3. If a distribution is negatively skewed then what is the relation between mean, median, and mode?
  4. If a distribution is positively skewed then what is the relation between mean, median, and mode?
  5. What is the relation between mean, median, and mode for a symmetrical distribution?
  6. What is the range of the moment coefficient of skewness?
