Statistics for Data Analyst - Statistics MCQs, Analysis, Software

Goldfeld Quandt Test: Comparison of Variances of Error Terms

Jun 15, 2024 / Jul 22, 2012 by Muhammad Imdad Ullah

The Goldfeld Quandt test is one of two tests proposed in a 1965 paper by Stephen Goldfeld and Richard Quandt. Both parametric and nonparametric tests are described in the paper, but the term “Goldfeld–Quandt test” is usually associated only with the parametric test.
The Goldfeld-Quandt test is frequently used because it is easy to apply when one of the regressors (or another random variable) is considered the proportionality factor of heteroscedasticity. The test is applicable to large samples: the observations must be at least twice as many as the parameters to be estimated. It assumes normally distributed and serially independent error terms $u_i$.

The Goldfeld-Quandt test compares the variance of the error terms across discrete subgroups, so the data are divided into $h$ subgroups. Usually, the data set is divided into two parts or groups, and hence the test is sometimes called a two-group test.

Before learning how to perform the Goldfeld-Quandt test, you may want to read more about heteroscedasticity, the remedial measures for heteroscedasticity, tests of heteroscedasticity, and the generalized least squares method.

Goldfeld Quandt Test Procedure:

The procedure for conducting the Goldfeld-Quandt test is as follows:

Order the observations according to the magnitude of $X$ (the independent variable which is the proportionality factor).
Select arbitrarily a certain number ($c$) of central observations and omit them from the analysis (for $n=30$, 8 central observations are omitted, i.e. roughly one-fourth of the observations are removed). The remaining $n-c$ observations are divided into two sub-groups of equal size, i.e. $\frac{(n-c)}{2}$ each: one sub-group includes the small values of $X$ and the other includes the large values of $X$, with the data set still arranged according to the magnitude of $X$.
Fit a separate regression to each of the sub-groups, and obtain the sum of squared residuals from each of them.
$\sum e_1^2$ denotes the sum of squared residuals from the sub-sample of low values of $X$ and $\sum e_2^2$ denotes the sum of squared residuals from the sub-sample of large values of $X$; each has $(n-c)/2-k$ df, where $k$ is the total number of parameters.
Compute the ratio $F^* = \frac{RSS_2/df}{RSS_1/df}=\frac{\sum e_2^2/ ((n-c)/2-k)}{\sum e_1^2/((n-c)/2-k) }$

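If the variances differ, $F^*$ will have a large value. Under the null hypothesis of homoscedasticity, $F^*$ follows an $F$ distribution with $(n-c)/2-k$ degrees of freedom in both the numerator and the denominator, so the observed value is compared with the tabulated $F$ value. The higher the observed value of the $F^*$-ratio, the stronger the heteroscedasticity of the $u_i$.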
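The steps of the procedure can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: it handles a simple regression with one regressor (so $k=2$), the function name `goldfeld_quandt` and the artificial data are hypothetical, and the error pattern is constructed deterministically so that its spread grows with $X$.

```python
import numpy as np

def goldfeld_quandt(x, y, c):
    """Return (F*, df) for a simple regression of y on x (hypothetical helper).

    c is the number of central observations to omit; n - c must be even.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if (n - c) % 2 != 0:
        raise ValueError("n - c must be even so both sub-groups have equal size")

    # Step 1: order the observations by the magnitude of X.
    order = np.argsort(x, kind="stable")
    x, y = x[order], y[order]

    # Step 2: omit c central observations, keeping (n - c) / 2 at each end.
    m = (n - c) // 2
    k = 2  # assumption: intercept and one slope per regression
    df = m - k
    if df <= 0:
        raise ValueError("each sub-group needs more observations than parameters")

    def rss(xs, ys):
        # Step 3: fit a separate OLS regression and return its residual sum of squares.
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        resid = ys - X @ beta
        return float(resid @ resid)

    rss1 = rss(x[:m], y[:m])          # low values of X
    rss2 = rss(x[n - m:], y[n - m:])  # large values of X

    # Step 4: ratio of the two residual variances.
    return (rss2 / df) / (rss1 / df), df

# Artificial data (assumption): alternating errors whose size is proportional to X.
x = np.arange(1.0, 31.0)
u = 0.5 * x * (-1.0) ** np.arange(1, 31)
y = 2.0 + 3.0 * x + u

f_star, df = goldfeld_quandt(x, y, c=8)
print(f_star, df)
```

The printed $F^*$ would then be compared with the tabulated $F$ value for `df` and `df` degrees of freedom at the chosen significance level.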

References

Goldfeld, Stephen M.; Quandt, R. E. (June 1965). “Some Tests for Homoscedasticity”. Journal of the American Statistical Association 60 (310): 539–547
Kennedy, Peter (2008). A Guide to Econometrics (6th ed.). Blackwell. p. 116


Heteroscedasticity Definition, Reasons, Consequences (2012)

Sep 21, 2024 / Jul 18, 2012 by Muhammad Imdad Ullah

Heteroscedasticity Definition

An important assumption of OLS is that the disturbances $u_i$ appearing in the population regression function are homoscedastic (Error terms have the same variance).

The variance of each disturbance term $u_i$, conditional on the chosen values of explanatory variables is some constant number equal to $\sigma^2$. $E(u_{i}^{2})=\sigma^2$; where $i=1,2,\cdots, n$.
Homo means equal and scedasticity means spread.

Consider the general linear regression model
\[y_i=\beta_1+\beta_2 x_{2i}+ \beta_3 x_{3i} +\cdots + \beta_k x_{ki} + \varepsilon_i\]

If $E(\varepsilon_{i}^{2})=\sigma^2$ for all $i=1,2,\cdots, n$ then the assumption of constant variance of the error term or homoscedasticity is satisfied.

If $E(\varepsilon_{i}^{2})=\sigma_i^2$ is not the same constant $\sigma^2$ for every observation, then the assumption of homoscedasticity is violated and heteroscedasticity is said to be present. In the case of heteroscedasticity, the OLS estimators are unbiased but inefficient.
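To make the definition concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the sample size, the fixed seed, the constant standard deviation of 2, and the choice $\sigma_i = 0.5\,x_i$ for the heteroscedastic case. It compares the error variance in the lower and upper halves of the $x$ range.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, used only for reproducibility
n = 1000
x = np.linspace(1.0, 10.0, n)

# Homoscedastic errors: the same standard deviation for every observation.
u_homo = rng.normal(0.0, 2.0, size=n)

# Heteroscedastic errors (assumed pattern): standard deviation grows with x.
u_hetero = rng.normal(0.0, 0.5 * x, size=n)

def half_variances(u):
    """Variance of the errors for the lower and the upper half of the x range."""
    half = len(u) // 2
    return float(np.var(u[:half])), float(np.var(u[half:]))

print("homoscedastic:  ", half_variances(u_homo))
print("heteroscedastic:", half_variances(u_hetero))
```

In the homoscedastic case the two variances should be close to each other, while in the heteroscedastic case the variance for large $x$ should be clearly larger.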

Examples:

The range in family income between the poorest and richest families in town is the classical example of heteroscedasticity.
The range in annual sales between a corner drug store and a general store.

Reasons for Heteroscedasticity

There are several reasons why the variances of error term $u_i$ may be variable, some of which are:

Following the error-learning models, as people learn, their errors of behavior become smaller over time. In this case $\sigma_{i}^{2}$ is expected to decrease. For example, the number of typing errors made in a given period on a test becomes smaller and less variable as the hours put into typing practice increase.
As income grows, people have more discretionary income, and hence $\sigma_{i}^{2}$ is likely to increase with income.
As data-collecting techniques improve, $\sigma_{i}^{2}$ is likely to decrease.
Heteroscedasticity can also arise as a result of the presence of outliers. The inclusion or exclusion of such observations, especially when the sample size is small, can substantially alter the results of regression analysis.
Heteroscedasticity can arise from violating the CLRM (classical linear regression model) assumption that the regression model is correctly specified.
Skewness in the distribution of one or more regressors included in the model is another source of heteroscedasticity.
Incorrect data transformation and incorrect functional form (linear or log-linear model) are also sources of heteroscedasticity.

Consequences of Heteroscedasticity

The OLS estimators and regression predictions based on them remain unbiased and consistent.
The OLS estimators are no longer the BLUE (Best Linear Unbiased Estimators) because they are no longer efficient, so the regression predictions will be inefficient too.
Because the usual estimator of the covariance matrix of the estimated regression coefficients is biased and inconsistent, the usual tests of hypotheses (t-test, F-test) are no longer valid.
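The first consequence can be checked with a small Monte Carlo sketch. The setup is entirely assumed for illustration (true intercept 1, true slope 2, $\sigma_i = 0.5\,x_i$, 2000 replications, fixed seed), and the function name `simulate_ols_slopes` is hypothetical. It also reports the spread of the slope estimates next to the average conventional OLS standard error so the two can be inspected side by side; the size and direction of any gap depend on the assumed variance pattern.

```python
import numpy as np

def simulate_ols_slopes(reps=2000, n=50, beta1=1.0, beta2=2.0, seed=0):
    """Simulate OLS slope estimates under errors with sd proportional to x."""
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 10.0, n)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)

    # One row per replication; column i has standard deviation 0.5 * x_i.
    u = rng.normal(0.0, 0.5 * x, size=(reps, n))
    y = beta1 + beta2 * x + u

    # OLS slope and intercept for every replication at once.
    slopes = (y @ xc) / sxx
    intercepts = y.mean(axis=1) - slopes * x.mean()

    # Conventional standard error, which assumes a constant error variance.
    resid = y - intercepts[:, None] - slopes[:, None] * x
    s2 = np.sum(resid ** 2, axis=1) / (n - 2)
    conventional_se = np.sqrt(s2 / sxx)
    return slopes, conventional_se

slopes, conventional_se = simulate_ols_slopes()
print("mean of slope estimates:      ", slopes.mean())
print("std. dev. of slope estimates: ", slopes.std(ddof=1))
print("mean conventional std. error: ", conventional_se.mean())
```

The mean of the slope estimates should stay close to the true value of 2, which is what unbiasedness means here.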

Note: Problems of heteroscedasticity are likely to be more common in cross-sectional than in time series data.

References
Greene, W.H. (1993). Econometric Analysis, Prentice–Hall, ISBN 0-13-013297-7.
Verbeek, Marno (2004). A Guide to Modern Econometrics, 2. ed., Chichester: John Wiley & Sons.
Gujarati, D. N. & Porter, D. C. (2008). Basic Econometrics, 5. ed., McGraw Hill/Irwin.

FAQS about Heteroscedasticity

Define heteroscedasticity.
What are the major consequences that may occur if heteroscedasticity occurs?
What is meant by the constant variance of the error term in linear regression models?
What are the possible reasons that make the error term variance a variable?
In what kind of data are problems of heteroscedasticity likely to exist?

Moments In Statistics (2012)

Sep 23, 2024 / Jul 14, 2012 by Muhammad Imdad Ullah

Introduction to Moments in Statistics

Measures of central tendency (location) and measures of dispersion (variation) are useful for describing a data set, but neither tells us anything about the shape of the distribution. For that we need other measures, called moments. Moments in Statistics are used to describe the shape of the distribution through skewness and kurtosis.

Moments are fundamental statistical tools for understanding the characteristics of any dataset. They provide quantitative measures that describe the data:

Central tendency: The “center” of the data. The mean is the most common measure of central tendency, but other measures can also be used.
Spread: Indicates how scattered the data is around the central tendency. Common measures of spread include variance and standard deviation.
Shape: Describes the overall form of the data distribution. For instance, is it symmetrical? Does it have a long tail on one side? Higher-order moments like skewness and kurtosis help analyze the shape.

Moments about Mean

The moments about the mean are the means of the deviations from the mean after raising them to integer powers. The $r$th population moment about the mean, denoted by $\mu_r$, is

\[\mu_r=\frac{\sum\limits^{N}_{i=1}(y_i - \bar{y} )^r}{N}\]

where $r=1,2,\cdots$

The corresponding sample moment, denoted by $m_r$, is

\[m_r=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^r}{n}\]

Note that for $r=1$ the first moment is zero, as $m_1=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^1}{n}=0$. So the first moment about the mean is always zero.

If $r=2$ then the second moment is the variance, i.e. \[m_2=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^2}{n}\]

Similarly, the 3rd and 4th moments are

\[m_3=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^3}{n}\]

\[m_4=\frac{\sum\limits^{n}_{i=1}(y_i - \bar{y} )^4}{n}\]

For grouped data, the $r$th sample moment about the sample mean $\bar{y}$ is

\[m_r=\frac{\sum\limits^{k}_{i=1}f_i(y_i - \bar{y} )^r}{\sum\limits^{k}_{i=1}f_i}\]

where $k$ is the number of classes and $\sum\limits^{k}_{i=1}f_i=n$
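The formulas for the moments about the mean translate directly into a few lines of Python. The following is a minimal sketch; the function names `central_moment` and `grouped_central_moment` and the small data set are illustrative assumptions, and the divisor is $n$ as in the formulas above.

```python
import numpy as np

def central_moment(y, r):
    """r-th sample moment about the mean (divisor n) for ungrouped data."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** r))

def grouped_central_moment(values, freqs, r):
    """r-th sample moment about the mean for grouped data (values with frequencies)."""
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mean = np.sum(freqs * values) / np.sum(freqs)
    return float(np.sum(freqs * (values - mean) ** r) / np.sum(freqs))

data = [2, 4, 6, 8, 10]
for r in range(1, 5):
    print(r, central_moment(data, r))
```

For this symmetric data set the first and third moments are zero, and the second moment is the variance computed with divisor $n$.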

Moments about Arbitrary Value

The $r$th sample moment about any arbitrary origin “a”, denoted by $m'_r$, is
\[m'_r = \frac{\sum\limits^{n}_{i=1}(y_i - a)^r}{n} = \frac{\sum\limits^{n}_{i=1}D^r_i}{n}\]
where $D_i=(y_i -a)$ and $r=1,2,\cdots$.

Therefore
\begin{eqnarray*}
m'_1&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)}{n}=\frac{\sum\limits^{n}_{i=1}D_i}{n}\\
m'_2&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^2}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^2}{n}\\
m'_3&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^3}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^3}{n}\\
m'_4&=&\frac{\sum\limits^{n}_{i=1}(y_i - a)^4}{n}=\frac{\sum\limits^{n}_{i=1}D_i ^4}{n}
\end{eqnarray*}

The $r$th sample moment for grouped data about any arbitrary origin “a” is

$$m'_r=\frac{\sum\limits^{k}_{i=1}f_i(y_i - a)^r}{\sum\limits^{k}_{i=1}f_i} = \frac{\sum f_i D_i ^r}{\sum f_i}$$

where $k$ is the number of classes.

The moments about the mean are usually called central moments and the moments about any arbitrary origin “a” are called non-central moments or raw moments.

One can calculate the moments about mean from the following relations by calculating the moments about arbitrary value

\begin{eqnarray*}
m_1&=& m'_1 - (m'_1) = 0 \\
m_2 &=& m'_2 - (m'_1)^2\\
m_3 &=& m'_3 - 3m'_2m'_1 +2(m'_1)^3\\
m_4 &=& m'_4 -4 m'_3m'_1 +6m'_2(m'_1)^2 -3(m'_1)^4
\end{eqnarray*}
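These relations can be verified numerically. The sketch below is illustrative only: the function names `raw_moment` and `central_from_raw`, the data set, and the origin $a=3$ are assumptions. It computes the moments about the arbitrary origin and converts them to moments about the mean using the relations above.

```python
import numpy as np

def raw_moment(y, r, a=0.0):
    """r-th sample moment about the arbitrary origin a (divisor n)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - a) ** r))

def central_from_raw(y, a=0.0):
    """Moments about the mean (m1..m4) obtained from moments about the origin a."""
    m1p, m2p, m3p, m4p = (raw_moment(y, r, a) for r in range(1, 5))
    m1 = m1p - m1p
    m2 = m2p - m1p ** 2
    m3 = m3p - 3 * m2p * m1p + 2 * m1p ** 3
    m4 = m4p - 4 * m3p * m1p + 6 * m2p * m1p ** 2 - 3 * m1p ** 4
    return m1, m2, m3, m4

data = [1, 2, 3, 4, 10]
print(central_from_raw(data, a=3.0))
```

Whatever origin is chosen, the converted values should agree with the moments computed directly about the mean, which is why a convenient value of “a” can be used to simplify hand calculation.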

Moments about Zero

If the variable $y$ assumes $n$ values $y_1, y_2, \cdots, y_n$ then the $r$th moment about zero can be obtained by taking $a=0$, so the moment about the arbitrary value becomes
\[m'_r = \frac{\sum y^r}{n}\]

where $r=1,2,3,\cdots$.

Therefore
\begin{eqnarray*}
m'_1&=&\frac{\sum y^1}{n}\\
m'_2 &=&\frac{\sum y^2}{n}\\
m'_3 &=&\frac{\sum y^3}{n}\\
m'_4 &=&\frac{\sum y^4}{n}
\end{eqnarray*}

The third moment is used to define the skewness of a distribution

\[{\rm Skewness} = \frac{\sum\limits_{i=1}^{n} (y_i-\overline{y})^3} {ns^3}\]

where $s$ is the standard deviation of the data.

If the distribution is symmetric then the skewness will be zero. Skewness will be positive if there is a long tail in the positive direction and skewness will be negative if there is a long tail in the negative direction.

The fourth moment is used to define the kurtosis of a distribution

\[{\rm Kurtosis} = \frac{\sum\limits_{i=1}^{n} (y_i -\overline{y})^4}{ns^4}\]
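The skewness and kurtosis formulas can be sketched in Python as below. One assumption is made explicit: $s$ is taken to be the standard deviation with divisor $n$ (so $s^2 = m_2$); using the divisor $n-1$ instead would change the values slightly. The function names and data sets are illustrative.

```python
import numpy as np

def skewness(y):
    """Third moment about the mean divided by s^3 (s computed with divisor n)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    s = np.sqrt(np.mean(d ** 2))
    return float(np.mean(d ** 3) / s ** 3)

def kurtosis(y):
    """Fourth moment about the mean divided by s^4 (s computed with divisor n)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    s = np.sqrt(np.mean(d ** 2))
    return float(np.mean(d ** 4) / s ** 4)

symmetric = [2, 4, 6, 8, 10]
right_tailed = [1, 2, 3, 4, 10]
print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed), kurtosis(right_tailed))
```

The symmetric data set gives a skewness of zero, while the data set with one large value on the right gives a positive skewness.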

In summary, moments are quantitative measures that describe the distribution of a dataset around its central tendency. Different types of moments provide specific information about the shape and characteristics of the data. By understanding and utilizing moments, one can get a deeper understanding of the data and make more informed decisions in statistical analysis.

FAQS about Moments in Statistics

Define moments in Statistics.
What is the use of moments?
How are moments used to understand the characteristics of the data?
What is meant by moments about mean?
What are moments about arbitrary value?
What is meant by moments about zero?
Define the different types of moments.

Skewness Formula, Introduction, Interpretation (2012)

Sep 21, 2024 / Jul 11, 2012 by Muhammad Imdad Ullah

Skewness is the degree of asymmetry or departure from the symmetry of the distribution of a real-valued random variable.

Positively Skewed
If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right or to be positively skewed. In a positively skewed distribution, the mean is greater than the median and the median is greater than the mode, i.e. $$Mean > Median > Mode$$

Negatively Skewed
If the frequency curve has a longer tail to the left of the central maximum than to the right, the distribution is said to be skewed to the left or to be negatively skewed. In a negatively skewed distribution, the mode is greater than the median and the median is greater than the mean, i.e. $$Mode > Median > Mean$$

In a symmetrical distribution, the mean, median, and mode coincide. In a skewed distribution, these values are pulled apart.

Pearson’s Coefficient of Skewness Formula

Karl Pearson (1857-1936) introduced a coefficient to measure the degree of skewness of a distribution or curve, which is denoted by $S_k$ and defined by

\begin{eqnarray*}
S_k &=& \frac{{\rm Mean} - {\rm Mode}}{{\rm Standard\ Deviation}}\\
S_k &=& \frac{3({\rm Mean} - {\rm Median})}{{\rm Standard\ Deviation}}
\end{eqnarray*}
The second form replaces the mode with the median and is used when the mode is ill-defined. Usually, this coefficient varies between -3 (for negative) and +3 (for positive), and the sign indicates the direction of skewness.
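Both forms of Pearson's coefficient can be sketched in Python. The assumptions here are that the standard deviation uses the divisor $n$, and that the mode is supplied by the user, since the mode of raw data need not be unique. The function names and data are illustrative.

```python
import numpy as np

def pearson_median_skewness(y):
    """3 * (mean - median) / standard deviation, with divisor n for the sd."""
    y = np.asarray(y, dtype=float)
    return float(3.0 * (y.mean() - np.median(y)) / y.std())

def pearson_mode_skewness(y, mode):
    """(mean - mode) / standard deviation; the mode is passed in explicitly."""
    y = np.asarray(y, dtype=float)
    return float((y.mean() - mode) / y.std())

print(pearson_median_skewness([1, 2, 3, 4, 10]))
print(pearson_mode_skewness([1, 2, 2, 3, 7], mode=2))
```

Both data sets have a long right tail, so both coefficients come out positive.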

Bowley’s Coefficient of Skewness Formula (Quartile Coefficient)

Arthur Lyon Bowley (1869-1957) proposed a measure of skewness based on the median and the two quartiles.

\[S_k=\frac{Q_1+Q_3-2\,{\rm Median}}{Q_3 - Q_1}\]
Its values lie between -1 and +1.
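Bowley's coefficient can be sketched as follows. Note the assumption: quartiles are computed with `np.percentile` using its default linear interpolation, and other quartile conventions may give slightly different values. The function name and data are illustrative.

```python
import numpy as np

def bowley_skewness(y):
    """(Q1 + Q3 - 2 * median) / (Q3 - Q1) using NumPy's default quartiles."""
    q1, med, q3 = np.percentile(np.asarray(y, dtype=float), [25, 50, 75])
    return float((q1 + q3 - 2.0 * med) / (q3 - q1))

print(bowley_skewness([1, 2, 3, 5, 13]))
```

Because it depends only on the quartiles, this measure is not affected by how extreme the largest and smallest observations are.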

Moment Coefficient of Skewness Formula

This measure of skewness is the third moment expressed in standard units (or the moment ratio) thus given by

\[S_k=\frac{\mu_3}{\sigma^3} \]
For many data sets its values fall between -2 and +2, although the coefficient is not bounded in general.

If $S_k$ is greater than zero, the distribution or curve is said to be positively skewed. If $S_k$ is less than zero the distribution or curve is said to be negatively skewed. If $S_k$ is zero the distribution or curve is said to be symmetrical.

The skewness of the distribution of a real-valued random variable can easily be seen by drawing a histogram or frequency curve.

The skewness may be very extreme and in such a case these are called J-shaped distributions.

FAQs about Skewness

What is the degree of asymmetry called?
What is a departure from symmetry?
If a distribution is negatively skewed then what is the relation between mean, median, and mode?
If a distribution is positively skewed then what is the relation between mean, median, and mode?
What is the relation between mean, median, and mode for a symmetrical distribution?
What is the range of the moment coefficient of skewness?
