Category: Statistical Package for the Social Sciences (SPSS)

Statistics Software, SPSS, EViews, Gretl, SAS, Stata, R Language, MS-Excel

Performing Chi-Square test from Crosstabs in SPSS

From the Analyze menu of SPSS, the Crosstabs procedure (under Descriptive Statistics) is used to create contingency tables, also known as two-way frequency tables or cross-tabulations, which describe the association between two categorical variables.

In a crosstab, the categories of one variable determine the rows of the contingency table, and the categories of the other variable determine the columns. The contingency table dimensions can be reported as $R\times C$, where $R$ is the number of categories for the row variable, and $C$ is the number of categories for the column variable. Additionally, a “square” crosstab is one in which the row and column variables have the same number of categories. Tables of dimensions $2 \times 2$, $3\times 3$, $4\times 4$, etc., are all square crosstabs.
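The $R\times C$ structure can be sketched outside SPSS as well. The following Python snippet (using pandas, with made-up data; the column names `shopfor` and `purchase` are assumptions, not the actual satisf.sav variables) builds a small contingency table:

```python
import pandas as pd

# Hypothetical data: shopping frequency (row variable) vs. made purchase (column variable)
df = pd.DataFrame({
    "shopfor":  ["Often", "Rarely", "Often", "Sometimes", "Rarely", "Often"],
    "purchase": ["Yes",   "No",     "Yes",   "No",        "No",     "No"],
})

# pd.crosstab mirrors SPSS's Crosstabs: first argument gives rows, second gives columns
table = pd.crosstab(df["shopfor"], df["purchase"])
print(table)

# R x C dimensions: R row categories, C column categories
print(table.shape)  # (3, 2): 3 shopping-frequency categories, 2 purchase categories
```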

To perform a Chi-Square test on a cross-tabulation in SPSS, first click Analyze from the main menu, then Descriptive Statistics, and then Crosstabs, as shown in the figure below.

Crosstabs in SPSS

As an example, we are using the “satisf.sav” data file that is already available in the SPSS installation folder. Suppose we are interested in finding the relationship between the “Shopping Frequency” and “Made Purchase” variables. For this purpose, shift one of the variables from the left pane to the Row(s) box and the other to the Column(s) box. Here, we are taking “Shopping Frequency” as the row(s) variable and “Made Purchase” as the column(s) variable. Pressing OK will give the contingency table only.

Crosstabs in SPSS

The ROW(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Similarly, the COLUMN(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Note: at least one row and one column variable must be specified.

When you need to find the association between three or more variables, the Layer box is used. When a layer variable is specified, the crosstab between the row and column variables will be created at each level of the layer variable. You can have multiple layers of variables by specifying the first layer variable and then clicking Next to specify the second layer variable. Alternatively, you can analyze multiple variables as separate single layers by putting them all in the Layer 1 of 1 box.
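The layer behavior can be approximated in pandas by passing more than one row key to `pd.crosstab`, which reports the counts separately for each level of the layer variable (again a sketch with hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({
    "gender":   ["M", "M", "M", "F", "F", "F"],
    "shopfor":  ["Often", "Rarely", "Often", "Often", "Rarely", "Rarely"],
    "purchase": ["Yes", "No", "No", "Yes", "No", "Yes"],
})

# The first key acts like the layer variable: the shopfor x purchase
# counts are reported separately for each gender level.
layered = pd.crosstab([df["gender"], df["shopfor"]], df["purchase"])
print(layered)
```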

The STATISTICS button will lead to a dialog box which contains different inferential statistics for finding the association between categorical variables.

The CELLS button will lead to a dialog box which controls which output is displayed in each cell of the crosstab, such as observed frequencies, expected frequencies, percentages, and residuals, as shown below.

Crosstabs cell display

To perform the Chi-Square test on the selected variables, click the “Statistics” button and tick the “Chi-square” option at the top-left of the dialog box shown below. Note that the Chi-square check box must be ticked; otherwise, only the cross-table will be displayed.

Crosstabs Chi-Square Statistics in SPSS

Press the “Continue” button and then the OK button. We will get an output window containing the cross-tabulation results and the Chi-Square statistics, as shown below.

Crosstabs output SPSS windows

The Chi-Square results indicate that there is an association between the categories of the “Shopping Frequency” variable and the “Made Purchase” variable, since the p-value is smaller than, say, the 0.01 level of significance.
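The same test is available in Python's SciPy; the sketch below uses hypothetical counts (not the satisf.sav output) to show how the chi-square statistic, degrees of freedom, and p-value are obtained:

```python
from scipy.stats import chi2_contingency

# Hypothetical observed counts: 3 shopping-frequency rows x 2 purchase columns
observed = [[120, 80],
            [ 60, 90],
            [ 30, 70]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")

# With p below the 0.01 significance level, the null of independence is rejected
significant = p < 0.01
```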

For video lectures on contingency tables and Chi-Square statistics, see the video lectures.

How to perform Select Cases in SPSS

Sometimes you may be interested in analyzing a specific part (subset) of the available dataset. For example, you may want descriptive or inferential statistics for males and females separately. One may also be interested in a certain age range, or may want to study (say) only non-smokers. In such cases, one can use the Select Cases option in SPSS.

Example and Step by Step procedure

For illustrative purposes, I am using the “customer_dbase” file available in the SPSS sample data files. I am using the gender variable to select male customers only and will present some descriptive statistics for males only. For this purpose, follow these steps:

Step 1: Go to the Menu bar, select “Data” and then “Select Cases”.

Select Case

Step 2: A new window called “Select Cases” will open.

Step 3: Tick the box called “If condition is satisfied” as shown in the figure below.

Select Case Dialog box

Step 4: Click on the button “If” highlighted in the above picture.

Step 5: A new window called “Select Cases: If” will open.

Select cases: if dialog box

Step 6: The left box of this dialog box contains all the variables from the data view. Choose the variable (using the left mouse button) that you want to select cases for and use the “arrow” button to move the selected variable to the right box.

Step 7: In this example, the variable gender (for which we want to select only men) is shifted from the left to the right box. In the right box, write “gender=0” (since men have the value 0 code in this dataset).

Select Case if: with condition

Step 8: Click on Continue and then the OK button. Now, only men are selected (and the women’s data values are temporarily filtered out from the dataset).

Note: To “re-select” all cases (complete dataset), you carry out the following steps:

Step a: Go to the Menu bar, choose “Data” and then “Select Cases”.

Step b: From the dialog box of “Select Cases”, tick the box called “All cases”, and then click on the OK button. 

When you use the Select Cases tool in SPSS, a new filter variable will be created in the dataset. If you delete this filter variable, the selection will disappear. The “un-selected” cases are shown crossed out in the Data View window.

Select Case: filter variable in data view
Select case in data view

Note: The selection will be applied to everything you do from the point you select cases until you remove the selection. In other words, all statistics, tables, and graphs will be based only on the selected individuals until you remove (or change) the selection.
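For comparison, the same temporary-subset idea looks like this in Python with pandas (hypothetical stand-in data, with gender coded 0 = male as in the example above):

```python
import pandas as pd

# Hypothetical stand-in for a few customer_dbase cases
df = pd.DataFrame({
    "gender": [0, 1, 0, 1, 0],
    "age":    [34, 29, 41, 52, 25],
})

# Equivalent of Select Cases -> "If condition is satisfied" with gender = 0
mask = df["gender"] == 0        # plays the role of SPSS's filter variable
males = df[mask]
print(males["age"].mean())      # statistics now use only the selected cases
```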

There are other kinds of selection too: a random sample of cases, selection based on a time or case range, or the use of an existing filter variable. The selected cases can be copied to a new dataset, or the unselected cases can be deleted. For this purpose, choose the appropriate option from the Output section of the Select Cases dialog box.
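These other selection modes also have direct pandas counterparts (a sketch, not SPSS itself):

```python
import pandas as pd

df = pd.DataFrame({"id": range(1, 101)})

# "Random sample of cases" (fixed seed for reproducibility)
sample = df.sample(n=10, random_state=1)

# "Based on case range": cases 11 through 20
case_range = df.iloc[10:20]

# "Copy selected cases to a new dataset"
new_dataset = case_range.copy()
print(len(sample), len(case_range), len(new_dataset))
```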

Select cases: output

For other SPSS tutorials, click the links below.

Cronbach’s Alpha Reliability Analysis of Measurement Scales

Reliability analysis is used to study the properties of measurement scales (such as Likert-scale questionnaires) and the items (questions) that make them up. The reliability analysis procedure computes a number of commonly used measures of scale reliability and also provides information about the relationships between individual items in the scale. Intraclass correlation coefficients can be used to compute inter-rater reliability estimates.

Suppose you want to know whether your questionnaire measures customer satisfaction in a useful way. For this purpose, you can use reliability analysis to determine the extent to which the items (questions) in your questionnaire are correlated with each other. An overall index of the reliability or internal consistency of the scale as a whole can be obtained. You can also identify problematic items that should be removed (deleted) from the scale.

As an example, open the “satisf.sav” data file already available in the SPSS sample files. To check the reliability of Likert-scale items, follow the steps given below:

Step 1: On the Menu bar of SPSS, Click Analyze > Scale > Reliability Analysis… option
Reliability SPSS menu


Step 2: Select two or more variables that you want to test and shift them from the left pane to the right pane of the Reliability Analysis dialog box. Note that multiple variables (items) can be selected by holding down the CTRL key and clicking the variables you want. Clicking the arrow button between the left and right panes will shift the variables to the Items pane (right pane).
Reliability Analysis Dialog box
Step 3: Click on the “Statistics” button to select other statistics, such as descriptives (for item, scale, and scale if item deleted), summaries (of means, variances, covariances, and correlations), inter-item statistics (correlations and covariances), and the ANOVA table (none, F-test, Friedman chi-square, or Cochran chi-square).

Reliability Statistics

Click the “Continue” button to save the current statistics options. Then click the OK button in the Reliability Analysis dialog box to run the analysis on the selected items. The output will be shown in the SPSS output window.

Reliability Analysis Output

The Cronbach’s Alpha reliability ($\alpha$) is about 0.827, which is good enough. Note that deleting the item “organization satisfaction” would increase the reliability of the remaining items to 0.860.
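Cronbach's alpha is simple to compute directly from its definition, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ are the item variances, and $s_T^2$ is the variance of the total score. A Python sketch with hypothetical Likert responses (not the satisf.sav items):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-respondent, 3-item responses
scores = [[4, 5, 4],
          [3, 4, 3],
          [5, 5, 4],
          [2, 3, 2],
          [4, 4, 5]]
print(round(cronbach_alpha(scores), 3))

# "Alpha if item deleted": recompute alpha with one item left out at a time
for j in range(3):
    reduced = np.delete(np.asarray(scores), j, axis=1)
    print(f"alpha without item {j}: {cronbach_alpha(reduced):.3f}")
```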

A rule of thumb for interpreting alpha for dichotomous items (questions with only two possible answers) or Likert-scale items (questions with 3, 5, 7, or 9, etc., response options) is:

  • If Cronbach’s Alpha is $\ge 0.9$, the internal consistency of scale is Excellent.
  • If Cronbach’s Alpha is $0.90 > \alpha \ge 0.8$, the internal consistency of scale is Good.
  • If Cronbach’s Alpha is $0.80 > \alpha \ge 0.7$, the internal consistency of scale is Acceptable.
  • If Cronbach’s Alpha is $0.70 > \alpha \ge 0.6$, the internal consistency of scale is Questionable.
  • If Cronbach’s Alpha is $0.60 > \alpha \ge 0.5$, the internal consistency of scale is Poor.
  • If Cronbach’s Alpha is $0.50 > \alpha $, the internal consistency of scale is Unacceptable.

However, the rules of thumb listed above should be used with caution, since Cronbach’s Alpha is sensitive to the number of items in a scale: a larger number of questions can result in a larger alpha, while a smaller number of items may result in a smaller $\alpha$.
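The thresholds above are easy to encode; this small helper (a convenience function of my own, not part of any SPSS output) labels an alpha value:

```python
def interpret_alpha(alpha):
    """Return the rule-of-thumb label for a Cronbach's alpha value."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Good"
    if alpha >= 0.7:
        return "Acceptable"
    if alpha >= 0.6:
        return "Questionable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"

print(interpret_alpha(0.827))  # the scale alpha from the example above -> Good
```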

Independent Sample t test using SPSS

Introduction

A t-test for independent groups is useful when the same variable has been measured in two independent groups and the researcher wants to know whether the difference between group means is statistically significant. “Independent groups” means that the groups have different people in them and that the people in the different groups have not been matched or paired in any way.

Objectives

The independent t-test compares the means of two unrelated/independent groups measured on an interval or ratio scale. The SPSS t-test procedure allows testing of the hypothesis when variances are assumed to be equal and when they are not, and provides the t-value under both assumptions. The procedure also provides the relevant descriptive statistics for both groups.

Assumptions

  • The variable can be classified into two groups independent of each other.
  • The variable is measured on an interval or ratio scale.
  • The measured variable is approximately normally distributed.
  • Both groups have similar variances (homogeneity of variances).

Data

Suppose a researcher wants to discover whether left- and right-handed telephone operators differ in the time it takes them to answer calls. The following reaction-time data were obtained (RTs measured in seconds):

Subject no.    RTs (Left)    Subject no.    RTs (Right)
1              500           11             392
2              513           12             445
3              300           13             271
4              561           14             523
5              483           15             421
6              502           16             489
7              539           17             501
8              467           18             388
9              420           19             411
10             480           20             467
Mean           476.5                        430.8
Variance Ŝ²    5341.167                     5298.84

The mean reaction times suggest that the left-handers were slower, but does a t-test confirm this?
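Before turning to SPSS, the pooled (equal-variances) t-test on these data can be checked in Python with SciPy:

```python
from scipy import stats

left  = [500, 513, 300, 561, 483, 502, 539, 467, 420, 480]  # RTs, left-handers
right = [392, 445, 271, 523, 421, 489, 501, 388, 411, 467]  # RTs, right-handers

# Pooled t-test, the "Equal variances assumed" row of an SPSS output
t, p = stats.ttest_ind(left, right, equal_var=True)
df = len(left) + len(right) - 2
print(f"t = {t:.3f}, df = {df}, p = {p:.3f}")  # t = 1.401, df = 18, p = 0.178
```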

Independent Sample t Test using SPSS

Perform the following steps by running SPSS and entering the dataset in the SPSS Data View:

  1. Click Analyze > Compare Means > Independent-Samples T Test… from the top menu, as shown below.
Menu option for independent sample t test
  • Select continuous variables that you want to test from the list.
    Dialog box for independent sample t test
  • Click on the arrow to move the variable into the “Test Variable(s)” box. You can also double-click the variable to send it to the “Test Variable(s)” box.
  • Select the categorical/grouping variable so that the group comparison can be made, and send it to the “Grouping Variable” box.
  • Click on the “Define Groups” button. A small dialog box will appear asking about the codes used in Variable View for the groups. We used 1 for the left-handed group and 2 for the right-handed group. Click the Continue button when you’re done, then click OK to get the output. See the pictures for a visual view.
    Define Group for Independent sample t test

    Output

    Independent sample t test output

    The first table in the output gives descriptive statistics for your variables: the number of observations, mean, standard deviation, and standard error are available for both groups (left- and right-handed).

    The second table in the output is the important one for testing the hypothesis. You will see that there are two t-tests, and you have to know which one to use. When comparing groups with approximately equal variances, use the first t-test; Levene’s test checks this assumption. If the significance (p-value) for Levene’s test is 0.05 or below, the “Equal variances not assumed” test (the second row) should be used; otherwise, use the “Equal variances assumed” test (the first row). Here the significance is 0.287, so we will use the “Equal variances assumed” first row of the second table.

    In the output table, “t” is the calculated t-value from the test statistic; in this example, the t-value is 1.401.

    df stands for degrees of freedom; in the example, we have 18 degrees of freedom.

    Sig. (2-tailed) is the two-tailed significance value (p-value); in this example, the sig. value is greater than 0.05 (the significance level).

    Decision

    As the p-value of 0.178 is greater than our 0.05 significance level, we fail to reject the null hypothesis (two-tailed case).

    As the p-value of 0.089 is greater than our 0.05 significance level, we fail to reject the null hypothesis (one-tailed case with a 0.05 significance level).
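The Levene-then-t decision rule described above can be scripted; SciPy's `levene` with `center="mean"` is the classical mean-centered Levene test (a sketch using the reaction-time data from the table above):

```python
from scipy import stats

left  = [500, 513, 300, 561, 483, 502, 539, 467, 420, 480]
right = [392, 445, 271, 523, 421, 489, 501, 388, 411, 467]

# Levene's test for equality of variances (mean-centered)
_, p_levene = stats.levene(left, right, center="mean")

# Significance above 0.05 -> read the "Equal variances assumed" row
equal_var = p_levene > 0.05
t, p = stats.ttest_ind(left, right, equal_var=equal_var)
print(f"Levene p = {p_levene:.3f}, equal variances assumed: {equal_var}")
print(f"t = {t:.3f}, p = {p:.3f}")
```

For these data Levene's p is well above 0.05, so the pooled (equal-variances) row is the one to read.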

    As the