Statistics for Data Science & Analytics - MCQs, Software & Data Analysis


In this post, we will learn how to perform the Chi-Square test in SPSS Statistics. The Crosstabs procedure, found under Descriptive Statistics in the Analyze menu of SPSS, creates contingency tables (also known as two-way frequency tables or cross-tabulations), which describe the association between two categorical variables.

In a crosstab, the categories of one variable determine the rows of the contingency table, and the categories of the other variable determine the columns. The dimensions of a contingency table are reported as $R\times C$, where $R$ is the number of categories of the row variable and $C$ is the number of categories of the column variable. A “square” crosstab is one in which the row and column variables have the same number of categories. Tables of dimensions $2 \times 2$, $3\times 3$, $4\times 4$, etc., are all square crosstabs.
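To make the $R\times C$ idea concrete, here is a minimal Python sketch that tallies paired categorical observations into a contingency table and reports its dimensions. The observations and category labels are made up for illustration (they are not taken from any SPSS data file), and the `crosstab` helper is a hypothetical name, not an SPSS or library function.

```python
from collections import Counter

# Hypothetical paired observations: (row category, column category).
observations = [
    ("Weekly", "Yes"), ("Weekly", "Yes"), ("Weekly", "No"),
    ("Monthly", "Yes"), ("Monthly", "No"), ("Monthly", "No"),
    ("Rarely", "No"), ("Rarely", "No"), ("Rarely", "Yes"),
]


def crosstab(pairs):
    """Return (row_labels, col_labels, table); table[i][j] is a cell count."""
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


rows, cols, table = crosstab(observations)
n_rows, n_cols = len(rows), len(cols)

# R is the number of row categories, C the number of column categories.
print(f"Dimensions: {n_rows} x {n_cols}; square: {n_rows == n_cols}")
print("Columns:", cols)
for label, cell_counts in zip(rows, table):
    print(label, cell_counts)
```

With three row categories and two column categories, this table is $3\times 2$ and therefore not square.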

Performing Chi Square Test in SPSS

To perform the Chi-Square test on a cross-tabulation in SPSS, first click Analyze in the main menu, then Descriptive Statistics, and then Crosstabs, as shown in the figure below.

Performing Chi Square Test Crosstabs in SPSS

As an example, we are using the “satisf.sav” data file that is already available in the SPSS installation folder. Suppose we are interested in finding the relationship between the “Shopping Frequency” and the “Made Purchase” variables. For this purpose, move one of the variables from the left pane to the Row(s) box in the right pane and the other to the Column(s) box. Here, we are taking “Shopping Frequency” as the row variable and “Made Purchase” as the column variable. Pressing OK will give the contingency table only.

The ROW(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Similarly, the COLUMN(S) box is used to enter one or more variables to be used in the cross-table and Chi-Square statistics. Note: at least one row variable and one column variable must be specified.

The Layer box is used when you need to find the association between three or more variables. When a layer variable is specified, the crosstab between the row and the column variables is created at each level of the layer variable. You can have multiple layers of variables by specifying the first layer variable and then clicking Next to specify the second layer variable. Alternatively, you can try out multiple variables as single layers, one at a time, by putting them all in the “Layer 1 of 1” box.

The STATISTICS button will lead to a dialog box that contains different inferential statistics for finding the association between categorical variables.

The CELLS button leads to a dialog box that controls which output is displayed in each crosstab cell, such as observed frequency, expected frequency, percentages, residuals, etc., as shown below.

To perform the Chi-Square test on the selected variables, click the “Statistics” button and tick the “Chi-Square” option at the top-left of the dialog box shown below. Note that the Chi-square check box must be ticked; otherwise, only a cross-table will be displayed.

Press the “Continue” button and then the OK button. The output window will show the cross-tabulation together with the Chi-Square statistics, as shown below.

The Chi-Square results indicate an association between the categories of the “Shopping Frequency” variable and the “Made Purchase” variable, since the p-value is smaller than the chosen level of significance (say, 0.01).
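For readers who want to see what lies behind the Pearson Chi-Square line of the output, the following Python sketch computes the statistic, the degrees of freedom $(R-1)(C-1)$, and the p-value for a contingency table. It is an illustration under stated assumptions, not SPSS's implementation: the counts are hypothetical (they are not the values from “satisf.sav”), and the function names `chi_square_independence` and `chi2_sf` are invented for this example. The p-value uses the closed-form upper-tail expression for a chi-square distribution with integer degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum of half-integer powers.
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Hypothetical 4 x 2 table: rows are shopping-frequency categories,
# columns are "made purchase" (no, yes). These counts are made up.
observed = [
    [30, 70],
    [45, 55],
    [60, 40],
    [75, 25],
]

stat, df, expected = chi_square_independence(observed)
p_value = chi2_sf(stat, df)
alpha = 0.01

print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.3g}")
print("Association detected" if p_value < alpha else "No evidence of association")
```

The decision rule matches the interpretation above: the null hypothesis of no association is rejected when the p-value falls below the chosen significance level.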



General Purpose Statistical Software

All of the free statistical software listed below provides a wide variety of statistical analyses. Each package is completely free and can be used in its fully functional mode.

Free Statistical Software

OpenStat

OpenStat is a general-purpose free statistical software package. It supports Windows (Windows XP, Windows 7, Windows 8) and is also available for Linux systems (under Wine). The software was developed by Bill Miller of Iowa State University and offers a very broad range of data manipulation and analysis capabilities. It has an SPSS-like user interface and comes with excellent reference material and video tutorials.

Download Software OpenStat
Download Sample Data files
Download Help Files

OpenStat’s Tutorials

The use of OpenStat to create a file and do several analyses, Tutorial1.zip.
Importing Excel file data into OpenStat Tutorial2.zip.
Use of Multiple regression analysis with OpenStat Tutorial3.zip.
Creating a professional-looking output document in OpenStat Tutorial4.zip.
The Relationship of Analysis of Variance to Multiple Regression Tutorial5.zip.
The use of Simulation procedures in OpenStat Tutorial6.zip.
PowerPoint presentation on the use of the Select Cases in OpenStat Tutorial7.zip.
Converting string group codes to integer group codes and the Recode option, Tutorial8.zip.

SalStat-2

SalStat-2 is a free statistical software written in Python that has a graphical user interface. It is a multi-platform, easy-to-use statistical system that provides data management such as importing, editing, and pivot tables. It provides a range of Numerical Statistical Calculations such as descriptive statistics, probability functions, chi-square, t-tests, One-way ANOVA, Regression Analysis, Correlation, non-parametric tests, and Six Sigma.

It has a graphics system inherited from matplotlib and can produce bar, line, scatter, area, histogram, box and whisker, stem, adaptive, ternary scatter, normal probability, and quality control graphs.
Download Windows Version: Final Windows version S2 V2.1

SOFA (Statistics Open For All)

SOFA is an innovative, user-friendly, open-source package for statistical analysis and reporting. It is available for Windows, Mac, and Linux systems. SOFA has an emphasis on ease of use, learn-as-you-go, and beautifully formatted output. It can help you if you are a researcher, student, data analyst, or anyone who wants to understand their data.

Download SOFA

ViSta

ViSta is a Visual Statistics program that can run under Windows, Mac, and Unix, available in three languages: English, Spanish, and French. ViSta can perform univariate and multivariate visualization and data analysis. ViSta constructs very high-interaction, dynamic graphics that simultaneously show you multiple views of your data. The graphics are designed to augment your visual intuition so that you can better understand your data.

Download Windows version (English)
Download Mac version (English)
Download Unix version (English)

Visit for: Lecture Notes on Statistics and Data Analysis with Vista

PSPP

PSPP is a free replacement for SPSS. Although it currently implements only a small fraction of SPSS’s analyses, it never “expires.” It closely resembles SPSS and even reads native SPSS syntax and files!

Some features…

Supports over 1 billion cases and over 1 billion variables.
Choice of terminal or graphical user interface.
Choice of text, postscript, or HTML output formats.
Interoperates with Gnumeric, Open Office, and other free software.
Easy data import from spreadsheets, text files, and database sources.
Fast statistical procedures, even on very large data sets.
No license fees; no expiration period; no unethical “end-user license agreements”.
Fully indexed user manual.
Cross-platform; Runs on many different computers and operating systems.

Download PSPP Windows Version

OpenEpi

OpenEpi is a free, web-based, open-source program for use in public health and medicine. It provides several epidemiologic and statistical tools. It can be run from a web server or downloaded to run without an internet connection. The programs are written in JavaScript and HTML. It provides stratified analysis with exact confidence limits, matched-pair and person-time analysis, sample size and power calculations, sensitivity, R x C tables, chi-square for dose-response, etc.

Download: OpenEpi

Installation Instructions:

Download and unzip the OpenEpi.zip file. Make sure you have an “OpenEpi” folder after unzipping the file; otherwise, rename it as OpenEpi.zip.
To use the OpenEpi program, find and double-click the index.html file.
Enter the data in the given required tables
Save the output from the browser’s File menu by using the Save As command.

Statext

Statext provides a nice assortment of basic statistical tests with text output; its graph output is also text-based.
Capabilities include: rearranging, transposing, and tabulating data; random sampling and basic descriptive statistics; graphs such as dot plot (text-based), box-and-whiskers plot, stem-and-leaf display, histogram, and scatter plot; parametric tests such as z-values, confidence intervals for means, and t-tests (one group, two groups, and paired); one- and two-way ANOVA; Pearson, Spearman, and Kendall correlation; linear regression analysis; non-parametric tests such as Chi-square goodness-of-fit and independence tests, sign tests, and Mann-Whitney U and Kruskal-Wallis H tests; probability tables such as z, t, Chi-square, F, and U; a random number generator; the Central Limit Theorem; and the Chi-square distribution.

Download Statext

You can also buy Statext software at a cheap price.

MicrOsiris

MicrOsiris is a comprehensive statistical and data management package for Windows, derived from the OSIRIS IV package developed at the University of Michigan. MicrOsiris has special statistical techniques for data mining and analysis of nominal, ordinal, and scaled data.

It can handle data sets of any size and has Excel-type data entry. SPSS, SAS, and Stata data sets can be imported or exported, and MicrOsiris reads ICPSR (OSIRIS) and UNESCO (IDAMS) datasets. It includes an interactive decision tree for selecting appropriate tests, database manipulation, and extensive statistics such as scatter-plots, cross-tabs, ANOVA/MANOVA, log-linear models, and correlation/regression. It also performs logistic, linear, Tobit, Poisson, and proportional hazard regression, as well as cluster, factor, MINISSA, item, survival, and internal consistency analysis. It is fully functional freeware.

Download MicrOsiris

Gnumeric

Gnumeric is a free, fast, and accurate high-powered spreadsheet with better statistical features than Microsoft Excel. It has about 60 extra functions as compared to Excel, with basic support for financial derivatives including Black Scholes and telecommunication engineering-related problem functions, advanced statistical analysis tools, extensive random number generation techniques, linear and non-linear solvers, implicit intersection and iteration, goal seek, and Monte Carlo simulation tools. It also has many features of Excel, such as autofill, automatic input guess, batch process import, and export from and to different file formats.

Download Gnumeric (Linux Version)

Statist

Statist is a compact, portable program with most of the basic statistical capabilities, such as data manipulation (recoding, transforming, selecting), descriptive statistics (including histograms and box plots), correlation and regression analysis, and common significance tests such as chi-square, t-test, etc. This free statistical software is written in C. (Its source code is also available for improvement and further updates.) It can run on Unix/Linux, Windows, and Mac, among other operating systems. Statist is simple to use and can be run in scripts. It also handles big data sets well on small machines.

To download this software, register as a user on this site.

Tanagra

Tanagra is a free, open-source statistical software for data mining, intended for academic and research purposes. It supports the standard “stream diagram” paradigm used by most data-mining systems. The software contains components for Data sources (tab-delimited text), Visualization (grid, scatter-plots), Descriptive statistics (cross-tab, ANOVA, correlation), Instance selection (sampling, stratified), Feature selection and construction, Multiple Linear Regression, Factorial analysis (principal components, multiple correspondences), Clustering (K-means, SOM, LVQ, HAC), Supervised learning (logistic regression, k-NN, multi-layer perceptron, prototype-NN, ID3, discriminant analysis, naive Bayes, radial basis function), Meta-spv learning (instance Spv, arcing, boosting, bagging), Learning assessment (train-test, cross-validation), and Association (Agrawal Apriori).

Download Link Download (XP, Vista, Win 7)

Dap

DAP is a statistics and graphics package developed by Susan Bassein for Unix and Linux systems, with necessary and common data management facilities. It helps to conduct Statistical analysis such as univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses. DAP provides some of the core functionality of SAS and can read and run many SAS program files (but not all).

Download Dap

AM Statistical Software

AM is a free statistical software package for analyzing data from complex samples, especially large-scale assessments, as well as non-assessment survey data. AM has advanced statistical tools, an easy drag-and-drop interface, and an integrated help system that explains the statistics as well as how to use the software. It can estimate statistical models via marginal maximum likelihood (MML), which defines a probability distribution over the proficiency scale. It also analyzes “plausible values” used in programs like NAEP. AM automatically provides appropriate standard errors for complex samples via Taylor-series approximation, jackknife & other replication techniques. This software also offers a set of non-MML statistics, including regression, probit, logit, cross-tabs, and other statistics that are useful for survey data in general.

You can download the AM Statistical software.

Instat Plus

Instat Plus is a statistical computing package from the Statistical Services Centre at the University of Reading, UK. (Do not confuse it with InStat from GraphPad Software.) It is an interactive statistics package for Windows or DOS. This statistical software is simple and useful in teaching statistical ideas and has the power to assist the researcher in any discipline that requires the analysis of data. Instat includes many special facilities for the processing of climatic data.

Download Instat Plus

SSP

SSP (Smith’s Statistical Package) is a simple, user-friendly statistical software package available for both Mac and Windows operating systems. SSP helps you enter, edit, transform, import, and export data. It can calculate basic summaries, prepare charts, evaluate distribution function probabilities, and perform simulations. Many inferential statistical tests are available, such as tests comparing means and proportions, ANOVAs, Chi-Square tests, and simple and multiple regression analyses.

Download SSP Windows Version
Download SSP Mac Version

Dataplot

Dataplot is a software system for scientific visualization, statistical analysis, and non-linear modeling, available for Unix, Linux, PC-DOS, and Windows operating systems. It has extensive mathematical and graphical capabilities. The target Dataplot user is the researcher or analyst engaged in the characterization, modeling, visualization, analysis, monitoring, and optimization of scientific and engineering processes. It is closely integrated with the NIST/SEMATECH Engineering Statistics Handbook.

Download Dataplot Windows Version

Regress+

Regress+ is a professional statistical software package for performing univariate mathematical modeling (equations and distributions). It is promoted as the most powerful software of its kind, with state-of-the-art functionality and user-friendliness. It has 21 built-in equations and 59 built-in distributions.

Download Regress+
Download Compendium of Common Probability Distributions

SISA

SISA (Simple Interactive Statistical Analysis) is a package for PC (DOS) and the Windows operating system from Daan Uitenbroek. It offers an excellent collection of individual Windows and DOS modules for several statistical calculations, including some analyses not readily available elsewhere.

Download the SISA Windows Program

Each of these Windows programs contains a procedure that performs a specific statistical analysis.

Lifetables
It helps to perform Mortality Analysis for Demography and Epidemiology. The Lifetables program calculates the life expectancy, including all intermediary statistics, variance, a confidence interval for the life expectancy, Potential Gains in Life Expectancy (PGLE), Years of Potential Life Lost (YPLL), and Lifetime Years of Potential Life Lost (LYPLL).
Distributions
The SISA-Distributions program allows the user to analyze discrete single-dimensional distributions. The program is based on various manipulations of the Poisson, binomial, and hypergeometric distributions. Available are the probability of an observed number of cases under a given null hypothesis; exact Poisson, binomial, or hypergeometric confidence intervals; the exact and approximate size of a population using catch-recatch methodologies; the full analysis of a Poisson-distributed rate ratio; Fieller analysis; and two versions of the negative binomial distribution, which can be used in various ways.
Multinomial
The multinomial program is the exact solution to the Chi-square Goodness of fit test for testing a difference between an observed and an expected distribution in a one-dimensional array. For the two-category array, the multinomial test provides a two-sided solution for the Binomial test. The multinomial allows you to work with empty ‘0’ observation cells, although every cell must have a nonzero expected value.
Tables
SISA-Tables is a program for the analysis of tables with up to 2*7 and 3*3 cells. This program (Tables) allows for exact and approximate statistics. Fisher exact, Number Needed to Treat, Proportional Reduction in Error Statistics, Normal Approximations, four different Chi-squares, Gamma, Odds-ratio, t-tests, and Kappa are among the many statistical procedures available in the Tables Program.
Weighting
The weighting program by SISA calculates sample weights according to the cell weight procedure. The design factor and the effective sample size for the resulting set of weights are determined. It is possible to specify a value above which extreme weights will be trimmed. The untrimmed weights are then recalculated.
Intra Correlation
The intra-correlation program from SISA calculates intra-correlations and design effects for clustered samples where the outcome measure is the number of positive responses per cluster. Confidence intervals and other statistics corrected for design effects can be calculated. This program also helps to compare two groups of clusters with a t-test procedure.

JASP

JASP is an open-source Free Statistical Software by the University of Amsterdam. It has a user-friendly interface. JASP offers standard statistical analysis routines in both classical and Bayesian forms.

Download: https://jasp-stats.org/download/

Jamovi

Jamovi is another free statistical software package designed to be easy to use and to serve as a good alternative to costly statistical software such as SAS, Minitab, and SPSS. It is integrated with the R language. Jamovi is made by the scientific community and is available in both Desktop and Cloud versions.

Jamovi Cloud Version
Jamovi Desktop Version

Develve

Develve is a free statistical software for experimental data. It is equipped with basic statistics, graphical representation of data, inferential statistics, non-parametric statistics, and many statistics related to the design of experiments.
Download: Develve

Spreadsheets

There are two spreadsheets available: one that does demographic analysis and another for the calculation of intracorrelation coefficients. The spreadsheets are in Microsoft Excel file format. If you have MS Excel installed, your computer will start Excel and load the spreadsheet automatically after you double-click the procedure name.

Lifetable
This life table spreadsheet does a full abridged current life table analysis to obtain the life expectancy of a population. Furthermore, one can calculate Potential Gains in Life Expectancy (PGLE) after removing cause k, considering competing causes of death. The (Premature) Years of Potential Life Lost (YPLL), the Standardized Mortality Ratio (SMR), standardized numbers per 100,000, and the Comparative Mortality Figure (CMF) can also be calculated.
Discounted YPLL
This spreadsheet contains the procedure to discount the YPLL if you only have mortality by age.
Intra Correlation
The spreadsheet performs intra-correlation calculations for dichotomous (binary yes/no) outcome variables according to two different methods proposed for the single cluster, one by Fleiss and another by Bennett et al. A third spreadsheet concerns a method for two clusters by Donner and Klar.
Distributions
22 spreadsheets demonstrate various statistical distributions such as Beta, Binomial, Normal, Poisson, Pareto, etc.

Online Data Analysis Programs

The programs below run directly on the Internet. These procedures (programs) are very fast, and a study guide is available.

Hypergeometric
This procedure calculates the hypergeometric probability distribution, which is used to evaluate hypotheses when sampling without replacement from small populations.
Binomial
This procedure calculates probabilities for sampling with replacement in small populations or without replacement in a very large population. It can be used to approximate the hypergeometric distribution.
Poisson
This procedure calculates probabilities for samples that are very large in an even larger population. It can be used to approximate the binomial distribution.
Negative Binomial 1
It is used to study accidents and is a more general case than the Poisson. It considers the probability of getting into accidents if accidents cluster differently in subgroups of the population.
Negative Binomial 2
Another version of the negative binomial, this one is used to calculate the marginal distribution of binomials. It is often used to predict the termination of real-time events, such as the probability of giving up on a non-answering phone after n rings.
Fisher
This procedure is used to calculate the exact p-value for 2*2 contingency tables. Use the Fisher exact test instead of the Chi-square when you have a small value in one cell or a very uneven marginal distribution.
Chi-Square
This DOS procedure calculates the Chi-square and some other measures for two-dimensional tables.
Downloadable Programs: see the list of different downloadable programs.
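The relationships described in this list (the binomial approximating the hypergeometric, the Poisson approximating the binomial, and the Fisher exact test for 2*2 tables) can be checked numerically. The Python sketch below is only an illustration and is not the code behind the SISA procedures. The function names and all numbers are assumptions chosen for the example, and the two-sided Fisher p-value uses one common convention: summing the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
import math


def hypergeom_pmf(k, population, successes, draws):
    """P(k successes) when drawing `draws` items without replacement."""
    return (
        math.comb(successes, k)
        * math.comb(population - successes, draws - k)
        / math.comb(population, draws)
    )


def binom_pmf(k, n, p):
    """P(k successes) in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = hypergeom_pmf(a, n, col1, row1)
    lowest = max(0, row1 + col1 - n)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (hypergeom_pmf(k, n, col1, row1) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Binomial approximates the hypergeometric when the population is very large.
exact = hypergeom_pmf(3, population=10_000, successes=3_000, draws=10)
approx = binom_pmf(3, n=10, p=0.3)
print(f"hypergeometric = {exact:.5f}, binomial approximation = {approx:.5f}")

# Poisson approximates the binomial for many trials with a small probability.
print(f"binomial = {binom_pmf(2, 1000, 0.002):.5f}, Poisson = {poisson_pmf(2, 2.0):.5f}")

# Fisher exact test on a small hypothetical 2x2 table.
print(f"Fisher two-sided p-value = {fisher_exact_two_sided(3, 1, 1, 3):.4f}")
```

With small cell counts like those in the last table, the chi-square approximation is unreliable, which is why the exact test is preferred there.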

Statistical Software by Paul W. Mielke Jr.

Free Statistical Software by Paul W. Mielke Jr. is a large collection of executable DOS programs and FORTRAN sources. It contains matrix occupancy, exact g-sample empirical coverage tests, and interactions of exact analyses. It also contains spectral decomposition analysis, randomized block (exact mrbp) analyses, the exact multi-response permutation procedure, Fisher’s Exact test for cross-classification, and goodness-of-fit tests. Furthermore, it covers Fisher’s combined p-values (i.e., meta-analysis), the largest part’s proportion test, and the Pearson-Zelterman, Greenwood-Moran, and Kendall-Sherman goodness-of-fit tests, as well as runs tests. The advanced statistical procedures include multivariate Hotelling’s test, least-absolute-deviation (LAD) regression analysis, sequential permutation procedures, principal component analysis, matched pair permutation, r-by-c contingency tables, r-way contingency tables, and Jonckheere-Terpstra.

Download Free Statistical Software by Paul W. Mielke Jr. (Windows Version)
Download Free Statistical Software by Paul W. Mielke Jr. (Unix Version)

If any link is broken or not working, please let me know. Also, if you have the web address of any other free statistical software, tell me and I will update the list.

RegressIt (MS-Excel add-in)

RegressIt is a powerful free Excel add-in that performs multivariate descriptive data analysis and regression analysis with high-quality table and chart output. It’s an excellent tool for instructors who are running online data analysis exercises using platforms such as Zoom. The software includes built-in documentation, and it can embed regression teaching notes in output worksheets in the form of cell comments. It also has some innovative auditing tools that allow instructors to easily review and verify the originality of the complete analysis carried out by every student in a class. RegressIt also has a unique interface with R that allows Excel to be used as a front end for running very detailed linear and logistic regression analyses in R, and which also allows R to be used as a computational engine for running models in Excel. Visit https://regressit.com for complete details and free downloading of the software.
