Hierarchical Multiple Regression SPSS

In this tutorial, we will learn how to perform hierarchical multiple regression analysis in SPSS. It is a variant of basic multiple regression analysis that allows specifying a fixed order of entry for variables (regressors), either to control for the effects of covariates or to test the effects of certain predictors independent of the influence of others.

Step By Step Procedure of Hierarchical Multiple Regression SPSS

The basic command for hierarchical multiple regression analysis in SPSS is “Analyze -> Regression -> Linear”:

Hierarchical Multiple Regression SPSS

In the main dialog box of linear regression (as given below), input the dependent variable, for example, the “income” variable from the sample file customer_dbase.sav available in the SPSS installation directory.

Next, enter a set of predictor variables into the “Independent(s)” box. These are the variables that you want SPSS to put into the regression model first (the ones you want to control for when testing the variables of interest). For example, in this analysis, we want to find out whether the “Number of people in the house” predicts the “Household income in thousands”.

We are also concerned that other variables like age, education, gender, union member, or retirement might be associated with both the “number of people in the house” and “household income in thousands”. To make sure that these variables (age, education, gender, union member, and retired) do not explain away the entire association between the “number of people in the house” and “Household income in thousands”, let’s put them into the model first.

This ensures that they will get credit for any shared variability that they may have with the predictor that we are interested in, “Number of people in the house”. Any observed effect of “Number of people in the house” can then be said to be “independent” of the effects of these variables that have already been controlled for. See the figure below.

Linear Regression Variable

In the next step, click the “NEXT” button. All of the predictors that were entered previously will disappear from the list. Note that they are still in the model, just not on the current screen (block); you will also see “Block 2 of 2” above the “Independent(s)” box. Now enter the variable that we are interested in, which is the “number of people in the house”, into this second block.

Hierarchical Regression

Now click the “OK” button to run the analysis.

Note you can also hit the “NEXT” button again if you are interested in entering a third or fourth (and so on) block of variables.

Often researchers enter variables as related sets, for example, demographic variables in the first step, all potentially confounding variables in the second step, and the variables of primary interest in the third step. However, this scheme is not mandatory. One can also enter each variable as a separate step if that seems more logical based on the design of the experiment.

Output Hierarchical Multiple Regression Analysis

Using just the default “Enter” method, with all the variables in Block 1 (demographics) entered together, followed by “number of people in the house” as a predictor in Block 2, we get the following output:

Output Hierarchical Regression

The first table in the output window confirms which variables were entered at each step.

The summary table shows the percentage of explained variation in the dependent variable that can be accounted for by all the predictors together. The change in $R^2$ (R-squared) is a way to evaluate how much predictive power was added to the model by the addition of another variable in STEP 2. In our example, predictive power does not improve with the addition of another predictor in STEP 2.
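The change in $R^2$ between the two blocks can also be reproduced outside SPSS. The following is a minimal Python sketch, assuming NumPy is available; the data are synthetic stand-ins (not the customer_dbase.sav file), and the function names `fit_r2` and `r2_change` are illustrative only.

```python
import numpy as np

def fit_r2(X, y):
    """Return R^2 of an OLS fit of y on X (an intercept column is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

def r2_change(X_block1, X_block2, y):
    """R^2 of each step, the R^2 change, and the F statistic for that change."""
    n = len(y)
    X_full = np.column_stack([X_block1, X_block2])
    r2_1 = fit_r2(X_block1, y)
    r2_2 = fit_r2(X_full, y)
    q = X_full.shape[1] - X_block1.shape[1]  # predictors added in Block 2
    k_full = X_full.shape[1]                 # predictors in model 2 (no intercept)
    f_change = ((r2_2 - r2_1) / q) / ((1.0 - r2_2) / (n - k_full - 1))
    return r2_1, r2_2, r2_2 - r2_1, f_change

# Synthetic data (assumption): three control variables drive the outcome,
# while the Block 2 predictor has no true effect.
rng = np.random.default_rng(0)
n = 200
controls = rng.normal(size=(n, 3))    # Block 1
household = rng.normal(size=(n, 1))   # Block 2
income = 50 + controls @ np.array([4.0, 2.0, -1.0]) + rng.normal(scale=5.0, size=n)

r2_1, r2_2, delta, f_change = r2_change(controls, household, income)
print(f"R2 step 1 = {r2_1:.3f}, R2 step 2 = {r2_2:.3f}, change = {delta:.4f}, F change = {f_change:.3f}")
```

A small $R^2$ change with a small F-change statistic corresponds to the situation described above, where the Block 2 predictor adds little predictive power.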

Hierarchical Regression Output

The overall significance of the model can be checked from this ANOVA table. In this case, both models are statistically significant.

Hierarchical Regression Output

The coefficient table is used to check the individual significance of predictors. For model 2, the number of people in the household is statistically non-significant; it can therefore be excluded from the model.

Learn about Multiple Regression Analysis


Model Selection Criteria (2019)

All models are wrong, but some are useful. Model selection criteria are rules used to select a (statistical) model among competing models, based on given data.

Several model selection criteria are used to choose among a set of candidate models and/or to compare models for forecasting purposes.

All model selection criteria aim at minimizing the residual sum of squares (or, equivalently, increasing the coefficient of determination). The adjusted $R^2$, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), also known as the Schwarz Information Criterion (SIC), and Mallows’s $C_p$ impose a penalty for including an increasingly large number of regressors. Therefore, there is a trade-off between the goodness of fit of the model and its complexity, where complexity refers to the number of parameters in the model.

Model Selection Criteria

Model Selection Criteria: Coefficient of Determination ($R^2$)

$$R^2=\frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}=1-\frac{\text{Residual Sum of Squares}}{\text{Total Sum of Squares}}$$

Adding more variables to the model may increase $R^2$ but it may also increase the variance of forecast error.
There are some problems with $R^2$:

  • It measures in-sample goodness of fit (how close the estimated $Y$ values are to the actual values) in the given sample. There is no guarantee that a model with a high $R^2$ will forecast out-of-sample observations well.
  • In comparing two or more $R^2$’s, the dependent variable must be the same.
  • $R^2$ cannot fall when more variables are added to the model.

Model Selection Criteria: Adjusted Coefficient of Determination ($\overline{R}^2$)

$$\overline{R}^2=1-\frac{RSS/(n-k)}{TSS/(n-1)}$$

Here $k$ is the number of estimated parameters including the intercept. $\overline{R}^2 \le R^2$, which shows that the adjusted $R^2$ penalizes for adding more regressors (explanatory variables). Unlike $R^2$, the adjusted $R^2$ will increase only if the absolute $t$-value of the added variable is greater than 1. For comparative purposes, $\overline{R}^2$ is a better measure than $R^2$. The regressand (dependent variable) must be the same for the comparison of models to be valid.
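As a small illustration of the $|t| > 1$ rule, the sketch below compares two nested models using made-up values of RSS, TSS, $n$, and $k$ (these numbers are hypothetical and chosen only for the example). For one added regressor, the squared $t$-value equals the F statistic for the drop in RSS.

```python
import math

def r_squared(rss, tss):
    """Coefficient of determination from residual and total sums of squares."""
    return 1.0 - rss / tss

def adjusted_r_squared(rss, tss, n, k):
    """Adjusted R^2; k counts all estimated parameters, including the intercept."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

# Hypothetical values: the larger model adds one regressor to the smaller one.
n, tss = 50, 1000.0
rss_small, k_small = 400.0, 3
rss_large, k_large = 395.0, 4

r2_small = r_squared(rss_small, tss)
r2_large = r_squared(rss_large, tss)
adj_small = adjusted_r_squared(rss_small, tss, n, k_small)
adj_large = adjusted_r_squared(rss_large, tss, n, k_large)

# |t| of the single added regressor, obtained from the F statistic (t^2 = F).
t_abs = math.sqrt((rss_small - rss_large) / (rss_large / (n - k_large)))

print(f"R2: {r2_small:.4f} -> {r2_large:.4f}")
print(f"Adjusted R2: {adj_small:.4f} -> {adj_large:.4f}")
print(f"|t| of the added regressor: {t_abs:.3f}")
```

With these numbers $R^2$ rises while the adjusted $R^2$ falls, because the added regressor has $|t| < 1$.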

Model Selection Criteria: Akaike’s Information Criterion (AIC)

$$AIC=e^{\frac{2k}{n}}\frac{\sum \hat{u}^2_i}{n}=e^{\frac{2k}{n}}\frac{RSS}{n}$$
where $k$ is the number of regressors including the intercept. Taking natural logs, AIC can be written as

$$\ln AIC = \left(\frac{2k}{n}\right) + \ln \left(\frac{RSS}{n}\right)$$
where $\ln AIC$ is the natural log of AIC and $\frac{2k}{n}$ is the penalty factor.

AIC imposes a harsher penalty than the adjusted coefficient of determination for adding more regressors. In comparing two or more models, the model with the lowest value of AIC is preferred. AIC is useful for assessing both the in-sample and out-of-sample forecasting performance of a regression model. AIC is also used to determine the lag length in an AR(p) model.

Model Selection Criteria: Schwarz’s Information Criterion (SIC)

\begin{align*}
SIC &=n^{\frac{k}{n}}\frac{\sum \hat{u}_i^2}{n}=n^{\frac{k}{n}}\frac{RSS}{n}\\
\ln SIC &= \frac{k}{n} \ln n + \ln \left(\frac{RSS}{n}\right)
\end{align*}
where $\frac{k}{n}\ln\,n$ is the penalty factor. SIC imposes a harsher penalty than AIC whenever $\ln n > 2$, that is, for $n \ge 8$.

Like AIC, SIC is used to compare the in-sample or out-of-sample forecasting performance of a model. The lower the values of SIC, the better the model.
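The log forms of AIC and SIC given above are easy to evaluate directly. The following sketch uses hypothetical values of RSS, $n$, and $k$ for two competing models; the function names are illustrative.

```python
import math

def ln_aic(rss, n, k):
    """ln AIC = 2k/n + ln(RSS/n), with k counting the intercept."""
    return 2.0 * k / n + math.log(rss / n)

def ln_sic(rss, n, k):
    """ln SIC = (k/n) ln n + ln(RSS/n), with k counting the intercept."""
    return (k / n) * math.log(n) + math.log(rss / n)

# Hypothetical competing models fitted to the same n = 50 observations.
n = 50
models = {
    "small": {"rss": 400.0, "k": 3},
    "large": {"rss": 395.0, "k": 4},
}

scores = {
    name: (ln_aic(m["rss"], n, m["k"]), ln_sic(m["rss"], n, m["k"]))
    for name, m in models.items()
}
for name, (aic, sic) in scores.items():
    print(f"{name}: ln AIC = {aic:.4f}, ln SIC = {sic:.4f}")

best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_sic = min(scores, key=lambda name: scores[name][1])
print("preferred by AIC:", best_by_aic, "| preferred by SIC:", best_by_sic)
```

Here the small reduction in RSS does not offset either penalty, so both criteria prefer the smaller model.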

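Model Selection Criteria: Mallows’s $C_p$ Criterion

For model selection, Mallows’s criterion is
$$C_p=\frac{RSS_p}{\hat{\sigma}^2}-(n-2p)$$
where $RSS_p$ is the residual sum of squares of the model with $p$ regressors (including the intercept) and $\hat{\sigma}^2$ is an estimate of the error variance, usually taken from the model containing all candidate regressors.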
\begin{align*}
E(RSS_p)&=(n-p)\sigma^2\\
E(C_p)&\approx \frac{(n-p)\sigma^2}{\sigma^2}-(n-2p)\approx p
\end{align*}
A model that has a low $C_p$ value, about equal to $p$, is preferable.
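A minimal sketch of the $C_p$ calculation follows. It assumes the common convention that $\hat{\sigma}^2$ is estimated from the full model as $RSS_{full}/(n-p_{full})$; all numbers are hypothetical.

```python
def mallows_cp(rss_p, sigma2_hat, n, p):
    """Mallows's C_p for a model with p parameters (intercept included)."""
    return rss_p / sigma2_hat - (n - 2 * p)

# Hypothetical full model with 5 parameters fitted to n = 50 observations.
n = 50
p_full, rss_full = 5, 390.0
sigma2_hat = rss_full / (n - p_full)  # assumption: error variance from the full model

# Hypothetical candidate sub-models given as (p, RSS_p).
candidates = {"two-parameter": (2, 600.0), "three-parameter": (3, 400.0), "full": (p_full, rss_full)}
cp_values = {name: mallows_cp(rss, sigma2_hat, n, p) for name, (p, rss) in candidates.items()}

for name, (p, _) in candidates.items():
    print(f"{name}: p = {p}, C_p = {cp_values[name]:.3f}")
```

By construction the full model always has $C_p = p$, so the criterion is informative only for the sub-models: a $C_p$ far above $p$ suggests that important regressors were left out.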

Model Selection Criteria: Bayesian Information Criterion (BIC)

The Bayesian Information Criterion is based on the likelihood function and is closely related to the AIC. The penalty term in BIC ($k\ln n$) is larger than the corresponding term in AIC ($2k$) whenever $\ln n > 2$.
$$BIC=\ln(n)k-2\ln(\hat{L})$$
where $\hat{L}$ is the maximized value of the likelihood function of the regression model.
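The likelihood form of BIC can be connected to the SIC formula given earlier. The derivation below is a sketch under the assumption of normally distributed errors, with the error variance estimated by its maximum likelihood estimate $RSS/n$ and $k$ counting only the regression coefficients:

\begin{align*}
\ln \hat{L} &= -\frac{n}{2}\left[\ln(2\pi) + \ln\left(\frac{RSS}{n}\right) + 1\right]\\
BIC &= k\ln n - 2\ln \hat{L} = k \ln n + n \ln\left(\frac{RSS}{n}\right) + n\left(1+\ln 2\pi\right)\\
&= n \ln SIC + n\left(1+\ln 2\pi\right)
\end{align*}

Since the last term does not depend on the model, BIC and SIC rank competing models fitted to the same data in the same order under these assumptions.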

Cross-Validation

Cross-validation is a technique where the data is split into training and testing sets. The model is trained on the training data and then evaluated on the unseen testing data. This helps assess how well the model generalizes to unseen data and avoids overfitting.
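The idea can be sketched with a simple $k$-fold scheme, in which every observation is used for testing exactly once. The example below assumes NumPy and uses synthetic data; the function name `kfold_mse` and the choice of five folds are illustrative.

```python
import numpy as np

def kfold_mse(X, y, n_folds=5, seed=0):
    """Average out-of-sample mean squared error of OLS over n_folds folds."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    X1 = np.column_stack([np.ones(n), X])  # add the intercept column
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # Fit on the training part only, then predict the held-out fold.
        beta, *_ = np.linalg.lstsq(X1[train_idx], y[train_idx], rcond=None)
        pred = X1[test_idx] @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumption): y depends on one regressor only;
# ten further columns are pure noise.
rng = np.random.default_rng(1)
n = 120
x = rng.normal(size=(n, 1))
noise_cols = rng.normal(size=(n, 10))
y = 2.0 + 3.0 * x[:, 0] + rng.normal(size=n)

mse_small = kfold_mse(x, y)
mse_large = kfold_mse(np.column_stack([x, noise_cols]), y)
print(f"CV MSE, one regressor: {mse_small:.3f}")
print(f"CV MSE, with ten noise regressors: {mse_large:.3f}")
```

Unlike $R^2$, the cross-validated error does not automatically improve when irrelevant regressors are added, and it typically gets worse.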

Note that none of these criteria is necessarily superior to the others.

Read more about Correlation and Regression Analysis


Coefficient of Determination Formula: Quick Guide 2019

In this post, we will discuss not only the coefficient of determination formula but also the use and computation of the coefficient of determination, which serves as a link between regression and correlation analysis.

Coefficient of Determination $R^2$ in Statistics

The R squared ($r^2$; the square of the correlation coefficient) shows the percentage of the total variation of the dependent variable ($Y$) that can be explained by the independent (explanatory) variable ($X$). For this reason, $r^2$ (r-squared) is sometimes called the coefficient of determination.

The coefficient of determination (R-squared) is commonly used in various fields such as social science, finance, and economics to evaluate the performance of regression models. It helps researchers understand how well their models capture the relationship between the variables being studied.

Since

\[r=\frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}},\]

Coefficient of Determination Formula

\begin{align*}
r^2&=\frac{(\sum x_iy_i)^2}{(\sum x_i^2)(\sum y_i^2)}=\frac{\sum \hat{y}^2}{\sum y^2}\\
&=\frac{\text{Explained Variation}}{\text{Total Variation}}
\end{align*}

where $r$ shows the degree of covariability of $X$ and $Y$. Note that the formula used here is in deviation form, that is, $x=X-\overline{X}$ and $y=Y-\overline{Y}$.

The link of $r^2$ between regression and correlation analysis can be considered from these points.

  • If all the observations lie on the regression line, then there will be no scattered points. In other words, the total variation of variable $Y$ is explained completely by the estimated regression line, so there is no scatter in the data points (no unexplained variation). That is
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=0\]
    Hence, $r^2=1$ and $r=\pm 1$.
  • If the regression line explains only part of the variation in variable $Y$, then there will be some unexplained variation, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}>0\]
    then, $r^2$ will be smaller than 1.
  • If the regression line does not explain any part of the variation of variable $Y$, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=1\Rightarrow \sum y^2 = \sum e^2\]
    then, $r^2=0$.

These three cases follow because $r^2=1-\frac{\text{unexplained variation}}{\text{total variation}}$.

Key Points about Coefficient of Determination

  • Overfitting: A model can achieve a high $R^2$ value by simply memorizing the training data, but the model might not perform well on unseen data.
  • Number of Predictors: Adding more independent variables to a model will tend to increase the $R^2$ value, but it does not necessarily mean the additional variables are statistically significant.
  • Alternative Metrics: To assess the nuance of the model fit, use other metrics like adjusted R-squared or residual analysis.

Keeping in mind the limitations of R-squared, data analysts can use the coefficient of determination as a valuable tool to assess how well their models capture real-world relationships between variables.

Note that there are two main ways to calculate R-squared value:

  1. Squared Correlation Coefficient: R-squared is the square of the correlation coefficient ($r$) between the predicted values ($\hat{y}$) from the model and the actual values of the dependent variable ($y$).
  2. Analysis of Variance (ANOVA): R-squared can also be calculated using the ratio of the explained variance to the total variance (variance in the dependent variable).
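Both routes give the same number for a least-squares fit with an intercept. The sketch below checks this on a small made-up data set using only the Python standard library; the helper names are illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def correlation(u, v):
    """Pearson correlation coefficient in deviation form."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.8]

intercept, slope = fit_line(x, y)
y_hat = [intercept + slope * a for a in x]

# Way 1: squared correlation between predicted and actual values.
r2_corr = correlation(y_hat, y) ** 2

# Way 2: explained variation divided by total variation.
my = sum(y) / len(y)
ess = sum((f - my) ** 2 for f in y_hat)
tss = sum((b - my) ** 2 for b in y)
r2_anova = ess / tss

print(f"squared correlation: {r2_corr:.6f}, explained/total: {r2_anova:.6f}")
```

In simple linear regression both values also equal the squared correlation between $X$ and $Y$.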

FAQs about Coefficient of Determination

  1. For a simple linear regression model, what is the link between the coefficient of correlation and the coefficient of determination?
  2. How is the coefficient of determination interpreted?
  3. How can the coefficient of determination be obtained from the ANOVA table?
  4. How can overfitting be identified from the value of $R^2$?
  5. What are alternatives to $R^2$?
  6. What is the link between total variation, explained variation, and unexplained variation?
  7. What is the impact of adding extra explanatory variables to the linear regression model?
  8. What is the link between explained and unexplained variation?
  9. Give real-life examples of coefficient of determination in which it is high enough.

Learn more about the Coefficient of Determination Formula and Definition in Statistics

https://itfeature.com


Checking Normality of Error Term (2019)

Normality of Error Term

In multiple linear regression models, the sum of squared residuals (SSR) divided by $n-p$ (the degrees of freedom, where $n$ is the total number of observations and $p$ is the number of parameters in the model) is a good estimate of the error variance. In the multiple linear regression model, the residual vector is

\begin{align*}
e &=(I-H)y\\
&=(I-H)(X\beta+\varepsilon)\\
&=(I-H)\varepsilon
\end{align*}

where $H=X(X'X)^{-1}X'$ is the hat matrix for the regression model. The last step uses $HX=X$, so that $(I-H)X\beta=0$.

Each component is $e_i=\varepsilon_i - \sum\limits_{j=1}^n h_{ij} \varepsilon_j$, so every residual is a linear combination of all the error terms. Therefore, in multiple linear regression models, the normality of the residuals is not simply the normality of the error term.

Note that:

\[Cov(\mathbf{e})=(I-H)\sigma^2 (I-H)' = (I-H)\sigma^2\]

We can write $Var(e_i)=(1-h_{ii})\sigma^2$.

Because $\sum_{i=1}^n h_{ii}=p$, the average value of $h_{ii}$ is $p/n$. If the sample size ($n$) is much larger than the number of parameters ($p$) in the model (i.e. $n \gg p$), $h_{ii}$ will be small compared to 1, and $Var(e_i) \approx \sigma^2$.

In multiple regression models, a residual behaves like an error if the sample size is large. However, this is not true for a small sample size.

It is unreliable to check the normality of error term assumption using residuals from multiple linear regression models when the sample size is small.
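The effect of the sample size on $h_{ii}$ can be checked numerically. The following sketch assumes NumPy and uses randomly generated design matrices with $p=4$ columns; the function name `leverages` is illustrative.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X', computed via reduced QR."""
    Q, _ = np.linalg.qr(X)         # X = QR, hence H = Q Q'
    return np.sum(Q ** 2, axis=1)  # h_ii is the squared norm of the i-th row of Q

rng = np.random.default_rng(2)
p = 4  # number of parameters, including the intercept
h_by_n = {}
for n in (10, 1000):
    # Random design matrix (assumption): intercept plus p - 1 normal regressors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    h_by_n[n] = leverages(X)
    h = h_by_n[n]
    print(f"n = {n}: sum of h_ii = {h.sum():.3f}, mean = {h.mean():.4f}, max = {h.max():.4f}")
```

In both cases the $h_{ii}$ add up to $p$, so with $n=10$ the factor $1-h_{ii}$ in $Var(e_i)$ is far from 1, while with $n=1000$ it is close to 1.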

Normality of the Error Term

Learn more about Hat matrix: Role of Hat matrix in Diagnostics of Regression Analysis.
