# Category: Statistical Softwares

## Performing Chi-Square test from Crosstabs in SPSS

From the Analyze menu of SPSS, the Crosstabs procedure under Descriptive Statistics is used to create contingency tables (also known as two-way frequency tables or cross-tabulations), which describe the association between two categorical variables.

In a crosstab, the categories of one variable determine the rows of the contingency table, and the categories of the other variable determine the columns. The contingency table dimensions can be reported as $R\times C$, where $R$ is the number of categories of the row variable, and $C$ is the number of categories of the column variable. Additionally, a “square” crosstab is one in which the row and column variables have the same number of categories. Tables of dimensions $2 \times 2$, $3\times 3$, $4\times 4$, etc., are all square crosstabs.
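To make the row-by-column layout concrete, here is a minimal Python sketch that tallies two categorical variables into an $R \times C$ table. The function name `crosstab` and the six responses are made up for illustration; they are not taken from any SPSS sample file, and this is not a description of how SPSS builds its tables internally.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

# Hypothetical responses: one entry per respondent.
frequency = ["weekly", "weekly", "monthly", "monthly", "monthly", "first time"]
purchase = ["yes", "no", "yes", "yes", "no", "no"]

row_levels, col_levels, table = crosstab(frequency, purchase)
print(col_levels)
for level, counts in zip(row_levels, table):
    print(level, counts)
print("dimensions:", len(row_levels), "x", len(col_levels))
```

Here the row variable has three categories and the column variable has two, so the result is a $3 \times 2$ table, which is not square.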

To perform the Chi-Square test on a cross-tabulation in SPSS, first click Analyze from the main menu, then Descriptive Statistics, and then Crosstabs, as shown in the figure below.

As an example, we use the “satisf.sav” data file, which is already available in the SPSS installation folder. Suppose we are interested in finding the relationship between the “Shopping Frequency” and “Made Purchase” variables. For this purpose, move one of the variables from the left pane to the right pane as Row(s) and the other as Column(s). Here, we take “Shopping Frequency” as the row variable and “Made Purchase” as the column variable. Pressing OK at this point will give the contingency table only.

The ROW(S) box is used to enter one or more variables to be used as rows in the cross-table and Chi-Square statistics. Similarly, the COLUMN(S) box is used to enter one or more variables to be used as columns. Note: at least one row variable and one column variable must be specified.

When you need to find the association between three or more variables, the Layer box is used. When a layer variable is specified, the crosstab between the row and column variables is created at each level of the layer variable. You can have multiple layers of variables by specifying the first layer variable and then clicking Next to specify the second layer variable. Alternatively, you can try out several variables as single layers, one at a time, by putting them all in the “Layer 1 of 1” box.

The STATISTICS button will lead to a dialog box which contains different inferential statistics for finding the association between categorical variables.

The CELL button will lead to a dialog box which controls which output is displayed in each cell of the crosstab, such as observed frequency, expected frequency, percentages, and residuals, etc., as shown below.

To perform the Chi-Square test on the selected variables, click on the “Statistics” button and tick the “Chi-Square” option at the top-left of the dialog box shown below. Note that the Chi-square check box must be ticked; otherwise, only the cross-table will be displayed.

Press the “Continue” button and then the OK button. We will get an output window containing the cross-tabulation results and the Chi-Square statistics, as shown below.

The Chi-Square results indicate that there is an association between the categories of the “Shopping Frequency” variable and the “Made Purchase” variable, since the p-value is smaller than, say, the 0.01 level of significance.
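For readers who want to see what the Pearson Chi-Square statistic in the output is computing, the following is a rough Python sketch using only the standard library. The counts are hypothetical (they are not the “satisf.sav” results), the function names `chi_square_test` and `chi2_sf` are my own, and the p-value routine is a simple recurrence for integer degrees of freedom rather than the algorithm SPSS uses.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # closed form for df = 1
    # Step up two degrees of freedom at a time until df is reached.
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return q

def chi_square_test(observed):
    """Pearson chi-square test of independence for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical 2 x 2 table: rows are two shopper groups, columns are no/yes.
observed = [[10, 20],
            [20, 10]]
stat, df, p_value = chi_square_test(observed)
print(round(stat, 3), df, p_value < 0.01)
```

With these made-up counts every expected frequency is 15, the statistic is $20/3 \approx 6.667$ on 1 degree of freedom, and the p-value falls just below 0.01, so the same conclusion of association would be drawn.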

For more on contingency tables and Chi-Square statistics, see the video lectures.

## General Purpose Statistical Software

All of these statistical software packages provide a wide variety of statistical analyses. The software in the following list is completely free and can be used in its fully functional mode.

### OpenStat

OpenStat is a general-purpose free statistical software package. It supports Windows (Windows XP, Windows 7, Windows 8) and is also available for Linux systems (under Wine). This software was developed by Bill Miller of Iowa State University and has a very broad range of data manipulation and analysis capabilities. It has an SPSS-like user interface, along with excellent reference material and video tutorials.

OpenStat’s Tutorials

• The use of OpenStat to create a file and do several analyses Tutorial1.zip.
• Importing Excel file data into OpenStat Tutorial2.zip.
• Use of Multiple regression analysis with OpenStat Tutorial3.zip.
• Creating a professional-looking output document in OpenStat Tutorial4.zip.
• The relationship of Analysis of Variance to Multiple Regression Tutorial5.zip.
• The use of Simulation procedures in OpenStat Tutorial6.zip.
• PowerPoint presentation on the use of the Select Cases in OpenStat Tutorial7.zip.
• Converting string group codes to integer group codes and the Recode option, Tutorial8.zip.

### SalStat-2

SalStat-2 is a statistical package written in Python with a graphical user interface. It is a multi-platform, easy-to-use statistical system that provides data management features such as importing, editing, and pivot tables. It provides a range of numerical statistical calculations such as descriptive statistics, probability functions, chi-square, t-tests, one-way ANOVA, regression analysis, correlation, non-parametric tests, and Six Sigma.

It has a graphics system inherited from matplotlib and can produce bar, line, scatter, area, histogram, box&whisker, stem, adaptive, ternary scatter, normal probability, quality control graphs.

### SOFA (Statistics Open For All)

SOFA is an innovative, user-friendly, open-source statistics analysis and reporting package. It is available for Windows, Mac, and Linux systems. SOFA has an emphasis on ease of use, learn-as-you-go, and beautifully formatted output. It can help you if you are a researcher, student, data analyst, or anyone who wants to understand their data.

### ViSta

ViSta is a Visual Statistics program that runs under Windows, Mac, and Unix and is available in three languages: English, Spanish, and French. ViSta can perform univariate and multivariate visualization and data analysis. ViSta constructs very-high-interaction, dynamic graphics that show you multiple views of your data simultaneously. The graphics are designed to augment your visual intuition so that you can better understand your data.

### PSPP

PSPP is a free replacement for SPSS, although at this time it implements only a small fraction of SPSS’s analyses. However, it never “expires”. It closely resembles SPSS and even reads native SPSS syntax and files!

Some features…

• Supports over 1 billion cases and over 1 billion variables.
• Choice of the terminal or graphical user interface
• Choice of text, postscript or HTML output formats.
• Inter-operates with Gnumeric, Open Office, and other free software.
• Easy data import from spreadsheets, text files, and database sources.
• Fast statistical procedures, even on very large data sets.
• No license fees; no expiration period; no unethical “end-user license agreements”.
• Fully indexed user manual.
• Cross-platform; Runs on many different computers and operating systems.

### OpenEpi

OpenEpi is a free web-based open-source program for use in public health and medicine. It provides a number of epidemiologic and statistical tools. It can be run from a web server or downloaded to run without an internet connection. The programs are written in JavaScript and HTML. It provides stratified analysis with exact confidence limit, matched pair and person-time analysis, sample size, power test, sensitivity, R x C tables, chi-square for dose-response, etc.

Installation Instructions:

• Download and unzip the OpenEpi.zip file. Be sure you have an “OpenEpi” folder after unzipping the file; otherwise, rename it as OpenEpi.zip.
• To use the OpenEpi program, find and double-click the index.html file.
• Enter the data in the required tables.
• Save the output from the browser’s File menu by using the Save As command.

### Statext

Statext provides a nice assortment of basic statistical tests with text output; even its graph output is text-based. Capabilities include:

• Data handling: data can be rearranged, transposed, and tabulated; random samples; basic descriptive statistics.
• Graphs (text-based): dot plot, box-and-whiskers plot, stem-and-leaf display, histogram, scatter-plot.
• Parametric tests: z-values, confidence intervals for means, t-tests (one group, two groups, and paired), one- and two-way ANOVA, Pearson, Spearman, and Kendall correlation, linear regression analysis.
• Non-parametric tests: Chi-square goodness-of-fit and independence tests, sign test, Mann-Whitney U and Kruskal-Wallis H tests.
• Probability tables: z, t, Chi-square, F, U.
• Other tools: random number generator, Central Limit Theorem, Chi-square distribution.

You can also buy Statext software at a cheap price.

### MicrOsiris

MicrOsiris is a comprehensive statistical and data management package for Windows, derived from the OSIRIS IV package developed at the University of Michigan. MicrOsiris has special statistical techniques for data mining and for the analysis of nominal, ordinal, and scaled data.

It can handle data sets of any size and has Excel-type data entry. SPSS, SAS, and Stata data sets can be imported or exported. MicrOsiris reads ICPSR (OSIRIS) and UNESCO (IDAMS) datasets. It offers an interactive decision tree for selecting appropriate tests, database manipulation, and extensive statistics such as scatter-plots, cross-tabs, ANOVA/MANOVA, log-linear models, correlation/regression, logistic, linear, Tobit, Poisson, and proportional hazard regression, cluster, factor, MINISSA, item analysis, survival analysis, and internal consistency. It is fully functional freeware.

MicrOsiris requires a processor that supports the SSE3 instructions. Although all modern processors have SSE3 instructions, if you are in doubt you can download and run the CPU-Z software to find out whether your computer has these instructions. See the MicrOsiris Guide.

### Gnumeric

Gnumeric is a free, fast, and accurate high-powered spreadsheet with better statistical features than Microsoft Excel. It has about 60 extra functions compared to Excel, with basic support for financial derivatives (including Black-Scholes) and functions for telecommunication engineering problems, advanced statistical analysis tools, extensive random number generation techniques, linear and non-linear solvers, implicit intersection and iteration, goal seek, and Monte Carlo simulation tools. It also has many features of Excel, such as autofill, automatic input guessing, batch processing, and import from and export to different file formats.

Gnumeric Tutorials

### Statist

Statist is a compact, portable program with most of the basic statistical capabilities, such as data manipulation (recoding, transforming, selecting), descriptive statistics (including histograms and box plots), correlation and regression analysis, and the common significance tests such as chi-square, t-test, etc. This statistical software is written in C, and its source code is available for improvement and further updates. It can run on Unix/Linux, Windows, and Mac, among other operating systems. Statist is simple to use and can be run in scripts. It also handles big data sets well on small machines.

To download this software, register on this site as a site user.

### Tanagra

Tanagra is a free (open source) statistical software package for data mining for academic and research purposes. It supports the standard “stream diagram” paradigm used by most data-mining systems. This software contains components for Data sources (tab-delimited text), Visualization (grid, scatter-plots), Descriptive statistics (cross-tab, ANOVA, correlation), Instance selection (sampling, stratified), Feature selection and construction, Multiple Linear Regression, Factorial analysis (principal components, multiple correspondence), Clustering (K-means, SOM, LVQ, HAC), Supervised learning (logistic regression, k-NN, multi-layer perceptron, prototype-NN, ID3, discriminant analysis, naive Bayes, radial basis function), Meta-spv learning (instance Spv, arcing, boosting, bagging), Learning assessment (train-test, cross-validation), and Association (Agrawal a-priori).

Tanagra Functionalities

### Dap

Dap is a statistics and graphics package developed by Susan Bassein for Unix and Linux systems, with the necessary and common data management facilities. It helps to conduct statistical analyses such as univariate statistics, correlations and regression, ANOVA, categorical data analysis, logistic regression, and nonparametric analyses. Dap provides some of the core functionality of SAS and is able to read and run many (but not all) SAS program files.

### AM Statistical Software

AM is a free statistical package for analyzing data from complex samples, especially large-scale assessments, as well as non-assessment survey data. AM has advanced statistical tools, an easy drag & drop interface, and an integrated help system that explains the statistics as well as how to use the software. It can estimate statistical models via marginal maximum likelihood (MML), which defines a probability distribution over the proficiency scale. It also analyzes “plausible values” used in programs like NAEP. AM automatically provides appropriate standard errors for complex samples via Taylor-series approximation, jackknife & other replication techniques. This software also offers a set of non-MML statistics, including regression, probit, logit, cross-tabs, and other statistics that are useful for survey data in general.

### Instat Plus

Instat Plus is a statistical computing package from the Statistical Services Centre at the University of Reading, UK (do not confuse it with Instat from GraphPad Software). It is an interactive statistics package for Windows or DOS. This statistical software is simple and useful in teaching statistical ideas, and it has the power to assist researchers in any discipline that requires the analysis of data. Instat includes many special facilities for the processing of climatic data.

### WinIDAMS

WinIDAMS is a free software package from UNESCO for numerical information processing and statistical data analysis. WinIDAMS provides data manipulation and validation facilities as well as classical and advanced statistical techniques (table building, regression analysis, one-way analysis of variance, etc.), including interactive construction of multidimensional tables, graphical exploration of data sets (such as 3D scattergram spinning), time series analysis, and a large number of multivariate techniques such as discriminant analysis, cluster analysis, principal components factor analysis, analysis of correspondences, partial order scoring, rank ordering of alternatives, segmentation, and iterative typology.

### SSP

SSP (Smith’s Statistical Package) is a simple, user-friendly statistical package available for both Mac and Windows operating systems. SSP helps with entering, editing, transforming, importing, and exporting data. It can calculate basic summaries, prepare charts, evaluate distribution function probabilities, and perform simulations. Many inferential statistical tests are available, such as tests comparing means and proportions, ANOVAs, Chi-Square tests, and simple and multiple regression analysis.

### Dataplot

Dataplot is a software system for scientific visualization, statistical analysis, and non-linear modeling, available for Unix, Linux, PC-DOS, and Windows operating systems. It has extensive mathematical and graphical capabilities. The target Dataplot user is the researcher or analyst engaged in the characterization, modeling, visualization, analysis, monitoring, and optimization of scientific and engineering processes. It is closely integrated with the NIST/SEMATECH Engineering Statistics Handbook.

### Regress+

Regress+ is a professional statistical package for performing univariate mathematical modeling (equations and distributions). It is described as the most powerful software of its kind, with state-of-the-art functionality and user-friendliness. It has 21 built-in equations and 59 built-in distributions.

### SISA

SISA (Simple Interactive Statistical Analysis) is a set of programs for the PC (DOS and Windows operating systems) from Daan Uitenbroek. It is an excellent collection of individual Windows and DOS modules for several statistical calculations, including some analyses not readily available elsewhere.

Each of these Windows programs contains a procedure that performs a specific statistical analysis.

• lifetables
It helps to perform mortality analysis for demography and epidemiology. The Lifetables program calculates the life expectancy, including all intermediary statistics, the variance and a confidence interval for the life expectancy, Potential Gains in Life Expectancy (PGLE), Years of Potential Life Lost (YPLL), and Lifetime Years of Potential Life Lost (LYPLL).
• Distributions
The SISA-Distributions program allows the analysis of discrete single-dimension distributions. The program is based on various manipulations of the Poisson, binomial, and hypergeometric distributions. Available are the probability of an observed number of cases under a given null hypothesis, the calculation of exact Poisson, binomial, or hypergeometric confidence intervals, the exact and approximate size of a population using catch-recatch methodologies, the full analysis of a Poisson distributed rate ratio, Fieller analysis, and two versions of the negative binomial distribution that can be used in various ways.
• Multinomial
The multinomial program provides the exact solution to the Chi-square goodness-of-fit test, which tests for a difference between an observed and an expected distribution in a one-dimensional array. For the two-category array, the multinomial test provides a two-sided solution for the binomial test. The multinomial allows you to work with empty ‘0’ observation cells, although you must have an expectation about each cell.
• Tables
SISA-Tables is a program for the analysis of tables with up to 2*7 and 3*3 cells. This program (Tables) allows for exact and approximate statistics. Fisher exact, Number Needed to Treat, Proportional Reduction in Error Statistics, Normal Approximations, four different Chi-squares, Gamma, Odds-ratio, t-tests, and Kappa are among the many statistical procedures available in the Tables program.
• Weighting
The weighting program by SISA calculates sample weights according to the cell weight procedure. The design factor and the effective sample size for the resulting set of weights are determined. It is possible to specify a value above which extreme weights will be trimmed. The untrimmed weights will then be recalculated.
• Intra Correlation
The intra correlation program from SISA calculates intra correlations and design effects for clustered samples where the outcome measure is the number of positive responses per cluster. Confidence intervals and other statistics corrected for design effects can be calculated. This program also helps to compare two groups of clusters with a t-test procedure.

There are two spreadsheets available: one that does demographic analysis and another for the calculation of intracorrelation coefficients. The spreadsheets are in Microsoft Excel file format; if you have MS Excel installed on the computer, your computer will start Excel and load the spreadsheet automatically after you double-click the procedure name.

• Lifetable
This lifetable spreadsheet does a full abridged current life table analysis to obtain the life expectancy of a population. Furthermore, one can calculate Potential Gains in Life Expectancy (PGLE) after removing cause k, considering competing causes of death; the (Premature) Years of Potential Life Lost (YPLL), the Standardized Mortality Ratio (SMR), standardized numbers per 100,000, and the Comparative Mortality Figure (CMF) can also be calculated.
• Discounted YPLL
This spreadsheet contains the procedure to discount the YPLL if you only have mortality by age.
• Intra Correlation
The spreadsheet performs intra correlation calculations for dichotomous (binary yes/no) outcome variables according to two different methods proposed for the single cluster, one by Fleiss and another by Bennett et al. A third spreadsheet concerns a method for two clusters by Donner and Klar.
• Distributions
There are 22 spreadsheets that demonstrate various statistical distributions such as Beta, Binomial, Normal, Poisson, Pareto, etc.

## MS-DOS Programs

The programs below are for use on MS-DOS (the DOS Command Prompt). These DOS procedures (programs) are no longer maintained, except for bug fixing, and generally have limited statistical capabilities. The DOS procedures are very fast compared with the HTML/JavaScript programs on the website and are also very small in size.

• Hypergeometric
This procedure calculates the hypergeometric probability distribution to evaluate hypotheses about sampling without replacement in small populations.
• Binomial
This procedure calculates probabilities for sampling with replacement in small populations, or without replacement in a very large population. It can be used to approximate the hypergeometric distribution.
• Poisson
This procedure calculates probabilities for samples that are very large in an even larger population. It can be used to approximate the binomial distribution.
• Negative Binomial 1
It is used to study accidents and is a more general case than the Poisson; it considers the probability of accidents when accidents cluster differently in subgroups of the population.
• Negative Binomial 2
Another version of the negative binomial; this one is used for the marginal distribution of binomials. It is often used to predict the termination of real-time events, such as the probability of giving up on an unanswered phone call after n rings.
• Multinomial
Same as described above in Windows Programs.
• Fisher
This procedure is used to calculate the exact p-value for 2*2 contingency tables. Use the Fisher exact test instead of the Chi-square test when you have a small value in one cell or a very uneven marginal distribution.
• SPRT
This DOS procedure is not often used, but it is actually quite good. It applies when phenomena are observed or tested, or data are collected, sequentially in time. It is sometimes used in medical trials to monitor the number of negative side effects and to decide whether the trial should be stopped because the number of side effects is considered high enough.
• Chi-Square
This DOS procedure calculates the Chi-square and some other measures for two-dimensional tables.
• Casro
This DOS program calculates response rates according to different procedures. The CASRO (Council of American Survey Research Organizations) procedure is the “accepted” procedure for surveys.
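The link between the Hypergeometric and Fisher procedures listed above can be sketched in a few lines of Python: with the margins of a 2*2 table held fixed, the count in the first cell follows a hypergeometric distribution, and the exact p-value adds up the probabilities of the possible tables. This is only an illustration under stated assumptions, not the code of the DOS programs; the function names are mine, and the two-sided rule used here (sum every table no more probable than the observed one) is one common convention among several.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    lowest = max(0, row1 + col1 - n)   # smallest feasible value of cell a
    highest = min(row1, col1)          # largest feasible value of cell a
    probs = [hypergeom_pmf(k, n, col1, row1) for k in range(lowest, highest + 1)]
    p_observed = hypergeom_pmf(a, n, col1, row1)
    # Sum every table that is no more likely than the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))

# Hypothetical small table with all margins equal to 4.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```

For this table the feasible values of the first cell are 0 to 4 with probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided p-value is 34/70.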

## Statistical Software by Paul W. Mielke Jr.

Statistical Software by Paul W. Mielke Jr. is a large collection of executable DOS programs and FORTRAN source. It contains matrix occupancy, the exact g-sample empirical coverage test, and interactions of exact analyses. It also contains spectral decomposition analysis, randomized block (exact mrbp) analyses, the exact multi-response permutation procedure, Fisher’s Exact for cross-classification, and goodness-of-fit tests. Furthermore, it includes Fisher’s combined p-values (i.e., meta-analysis), the largest part’s proportion test, Pearson-Zelterman, Greenwood-Moran, and Kendall-Sherman goodness-of-fit tests, and runs tests. The advanced statistical procedures include the multivariate Hotelling’s test, least-absolute-deviation (LAD) regression analysis, sequential permutation procedures, principal component analysis, matched pair permutation, r by c contingency tables, r-way contingency tables, and Jonckheere-Terpstra.

If any link is broken or not working, please let me know. Also, if you have the web address of any other free statistical software, inform me and I will update the list.

RegressIt is a powerful free Excel add-in that performs multivariate descriptive data analysis and regression analysis with high-quality table and chart output.  It’s an excellent tool for instructors who are running online data analysis exercises using platforms such as Zoom.  The software includes built-in documentation and it can embed regression teaching notes in output worksheets in the form of cell comments. It also has some innovative auditing tools that allow instructors to easily review and verify the originality of the complete analysis carried out by every student in a class.  RegressIt also has a unique interface with R that allows Excel to be used as a front end for running very detailed linear and logistic regression analyses in R and which also allows R to be used as a computational engine for running models in Excel. Visit https://regressit.com for complete details and free downloading of the software.

## How to perform Select Cases in SPSS

Sometimes you may be interested in analyzing a specific part (subset) of the available dataset. For example, you may want descriptive or inferential statistics for males and females separately. You may also be interested in a certain age range or may want to study (say) only non-smokers. In such cases, you can use the Select Cases option in SPSS.

### Example and Step by Step procedure

For illustrative purposes, I am using the “customer_dbase” file available in the SPSS sample data files. I will use the gender variable to select male customers only, so that descriptive statistics can be produced for males only. For this purpose, follow these steps:

Step 1: Go to the Menu bar, select “Data” and then “Select Cases”.

Step 2: A new window called “Select Cases” will open.

Step 3: Tick the box called “If condition is satisfied” as shown in the figure below.

Step 4: Click on the button “If” highlighted in the above picture.

Step 5: A new window called “Select Cases: If” will open.

Step 6: The left box of this dialog box contains all the variables from the data view. Choose the variable (using the left mouse button) that you want to select cases for and use the “arrow” button to move the selected variable to the right box.

Step 7: In this example, the variable gender (for which we want to select only men) is moved from the left to the right box. In the right box, write “gender=0” (since men are coded 0 in this dataset).

Step 8: Click on Continue and then the OK button. Now, only men are selected (and the women’s data values are temporarily filtered out from the dataset).

Note: To “re-select” all cases (complete dataset), you carry out the following steps:

Step a: Go to the Menu bar, choose “Data” and then “Select Cases”.

Step b: From the dialog box of “Select Cases”, tick the box called “All cases”, and then click on the OK button.

When you use the Select Cases tool in SPSS, a new filter variable (named “filter_$”) is created in the dataset. If you delete this filter variable, the selection disappears. The “un-selected” cases are crossed out in the Data View window.

Note: The selection will be applied to everything you do from the point you select cases until you remove the selection. In other words, all statistics, tables, and graphs will be based only on the selected individuals until you remove (or change) the selection.
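The idea behind this kind of selection, flagging cases rather than deleting them, can be sketched in Python. The four records, the `filter_flag` key, and the `select_cases` helper are hypothetical illustrations; they are not the contents of “customer_dbase” and not how SPSS implements the feature.

```python
def select_cases(cases, condition):
    """Return copies of all cases, each flagged 1 (selected) or 0 (filtered out)."""
    return [dict(case, filter_flag=int(condition(case))) for case in cases]

# Hypothetical customer records; gender 0 = male, 1 = female.
customers = [
    {"id": 1, "gender": 0, "age": 30},
    {"id": 2, "gender": 1, "age": 45},
    {"id": 3, "gender": 0, "age": 40},
    {"id": 4, "gender": 1, "age": 25},
]

# The condition plays the role of "gender=0" in the Select Cases: If dialog.
flagged = select_cases(customers, lambda case: case["gender"] == 0)

# Later analyses use only the flagged cases; nothing has been deleted.
selected = [case for case in flagged if case["filter_flag"] == 1]
mean_age = sum(case["age"] for case in selected) / len(selected)
print(len(flagged), len(selected), mean_age)
```

Because the unselected records are only flagged, dropping the flag (or selecting all cases again) restores the complete dataset.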

There are other kinds of selection too: for example, a random sample of cases, a selection based on a time or case range, or the use of an existing filter variable. The selected cases can be copied to a new dataset, or the unselected cases can be deleted. For this purpose, choose the appropriate option from the Output section of the Select Cases dialog box.


## Cronbach’s Alpha Reliability Analysis of Measurement Scales

Reliability analysis is used to study the properties of measurement scales (Likert scale questionnaire) and the items (questions) that make them up. The reliability analysis method computes a number of commonly used measures of scale reliability. The reliability analysis also provides information about the relationships between individual items in the scale. The intraclass correlation coefficients can be used to compute the interrater reliability estimates.

Suppose you want to know whether your questionnaire measures customer satisfaction in a useful way. For this purpose, you can use reliability analysis to determine the extent to which the items (questions) in your questionnaire are correlated with each other. An overall index of the reliability or internal consistency of the scale as a whole can be obtained. You can also identify problematic items that should be removed (deleted) from the scale.

As an example, open the data file “satisf.sav”, already available in the SPSS sample files. To check the reliability of Likert scale items, follow the steps given below:

Step 1: On the Menu bar of SPSS, click Analyze > Scale > Reliability Analysis…

Step 2: Select two or more variables that you want to test and move them from the left pane to the right pane of the Reliability Analysis dialog box. Note that multiple variables (items) can be selected by holding down the CTRL key and clicking the variables you want. Clicking the arrow button between the left and right panes will move the variables to the Items pane (right pane).

Step 3: Click on the “Statistics” button to select other statistics such as descriptives (for item, scale, and scale if item deleted), summaries (for means, variances, covariances, and correlations), inter-item (for correlations and covariances), and ANOVA table (none, F-test, Friedman chi-square, and Cochran chi-square) statistics.

Click on the “Continue” button to save the current statistics options for the analysis. Click the OK button in the Reliability Analysis dialog box to run the analysis on the selected items. The output will be shown in the SPSS output window.

The Cronbach’s Alpha Reliability ($\alpha$) is about 0.827, which is good enough. Note that deleting the item “organization satisfaction” will increase the reliability of the remaining items to 0.860.

A rule of thumb for interpreting alpha for dichotomous items (questions with only two possible answers) or Likert scale items (questions with 3, 5, 7, or 9 response options, etc.) is:

• If Cronbach’s Alpha is $\ge 0.9$, the internal consistency of scale is Excellent.
• If Cronbach’s Alpha is $0.90 > \alpha \ge 0.8$, the internal consistency of scale is Good.
• If Cronbach’s Alpha is $0.80 > \alpha \ge 0.7$, the internal consistency of scale is Acceptable.
• If Cronbach’s Alpha is $0.70 > \alpha \ge 0.6$, the internal consistency of scale is Questionable.
• If Cronbach’s Alpha is $0.60 > \alpha \ge 0.5$, the internal consistency of scale is Poor.
• If Cronbach’s Alpha is $0.50 > \alpha$, the internal consistency of scale is Unacceptable.

However, the rules of thumb listed above should be used with caution, since Cronbach’s Alpha reliability is sensitive to the number of items in a scale. A larger number of questions can result in a larger Alpha reliability, while a smaller number of items may result in a smaller $\alpha$.
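As a rough illustration of what the reliability output summarizes, the sketch below computes Cronbach’s Alpha from the usual formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the respondents’ total scores. The three items and five respondents are made up (they are not from “satisf.sav”), and the helper names `cronbach_alpha` and `interpret` are my own; `interpret` simply encodes the rule of thumb listed above.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def interpret(alpha):
    """Map alpha to the rule-of-thumb labels."""
    cutoffs = [(0.9, "Excellent"), (0.8, "Good"), (0.7, "Acceptable"),
               (0.6, "Questionable"), (0.5, "Poor")]
    for cutoff, label in cutoffs:
        if alpha >= cutoff:
            return label
    return "Unacceptable"

# Hypothetical 5-point Likert responses from five respondents.
q1 = [4, 5, 3, 2, 4]
q2 = [4, 4, 3, 2, 5]
q3 = [5, 4, 2, 3, 4]

alpha = cronbach_alpha([q1, q2, q3])
print(round(alpha, 3), interpret(alpha))
```

In this toy data each item has variance 1.3 and the total scores have variance 9.2, so $\alpha = 1.5 \times (1 - 3.9/9.2) \approx 0.864$, which the rule of thumb labels “Good”.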