Statistical Data: Introduction and Real Life Examples (2020)

By statistical data we mean the pieces of information collected for descriptive or inferential statistical analysis. Data is everywhere: anything that has a past and/or measurable features can serve as statistical data.

One can find statistical data in:

  • Any financial or economic data
  • Transactional data (from stores or banks)
  • Surveys and censuses (of unemployment, houses, population, roads, etc.)
  • Medical histories
  • Prices of products
  • Production and yield of a crop
  • Your history and my history are also statistical data

Data

Data is the plural of datum; a datum is a single piece of information. The value of the variable (under study) associated with one element of a population or sample is called a datum (data in a singular sense, or a data point). For example, Mr. Asif entered college at the age of 18 years, his hair is black, his height is 5 feet 7 inches, and he weighs about 140 pounds. The set of values collected for the variable from each of the elements belonging to the sample is called data (data in a plural sense), for example, a set of 25 weights collected from 25 students.

Types of Data

The data can be classified into two general categories: quantitative data and qualitative data. Quantitative (numerical) data can be either discrete or continuous, while qualitative data can be further subdivided into nominal, ordinal, and binary data.

Qualitative data represent information that can be classified by some quality, characteristics, or criterion—for example, the color of a car, religion, blood type, and marital status.

When the characteristic being studied is non-numeric, it is called a qualitative variable or an attribute. A qualitative variable is also known as a categorical variable. A categorical variable is not measured numerically; observations falling in each category (group, class) can only be counted. Examples include gender (male or female), general knowledge (poor, moderate, or good), religious affiliation, type of automobile owned, city of birth, and eye color (blue, green, brown, etc.). Qualitative variables are often summarized in charts and graphs. Typical questions about such variables are: what percent of the total number of cars sold last month were Suzuki, and what percent of the population has blue eyes?

Quantitative data result from a process that quantifies, such as how much or how many. These quantities are measured on a numerical scale. For example, weight, height, length, and volume.

When the variable studied can be reported numerically, it is called a quantitative variable, e.g., the age of the company president, the life of an automobile battery, or the number of children in a family. Quantitative variables are either discrete or continuous.

Statistical Data

Note that some data can be classified as either qualitative or quantitative, depending on how they are used. If a numerical value is used merely as a label for identification, then it is qualitative; otherwise, it is quantitative. For example, if the serial number on a car is used to work out how many cars have been manufactured up to that point, then it is a quantitative measure. However, if this number is used only for identification purposes, then it is qualitative data.

Binary Data

Binary data have only two possible values or states, such as defective or non-defective, yes or no, and true or false. If both values are equally important, the data are called binary symmetric (for example, gender). However, if the two values are not equally important, the data are called binary asymmetric (for example, result: pass or fail; cancer detected: yes or no).

For quantitative data, a count will always give discrete data, for example, the number of leaves on a tree. On the other hand, a measurement of a quantity will usually be continuous, for example, a weight of 160 pounds recorded to the nearest pound; the true weight could be any value in the interval from 159.5 to 160.5 pounds.

The following are some examples of Qualitative Data. Note that the outcomes of all examples of Qualitative Variables are non-numeric.

  • The type of payment (cheque, cash, or credit) used by customers in a store
  • The color of your new cell phone
  • Your eye color
  • The make of the tires on your car
  • The letter grade obtained in an exam

The following are some examples of Quantitative Data. Note that the outcomes of all examples of Quantitative Variables are numeric.

  • The age of a customer in a store
  • The length of telephone calls recorded at a switchboard
  • The cost of your new refrigerator
  • The weight of your watch
  • The air pressure in a tire
  • The weight of a shipment of tomatoes
  • The duration of a flight from place A to B
  • The grade point average
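
A quick illustration of the two data types in R (the values below are made up purely for illustration): qualitative variables are typically stored as factors and can only be counted, whereas quantitative variables are numeric and can be measured and averaged.

# qualitative (categorical) data: observations can only be counted per category
eye_color <- factor(c("blue", "green", "brown", "brown", "blue"))
table(eye_color)

# quantitative data: discrete counts and continuous measurements
children <- c(0, 2, 1, 3, 2)          # discrete (a count)
weight <- c(140.2, 155.8, 161.4)      # continuous (pounds)
mean(weight)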

Learn about the Measures of Central Tendency


Hierarchical Multiple Regression SPSS

In this tutorial, we will learn how to perform hierarchical multiple regression analysis in SPSS. Hierarchical regression is a variant of the basic multiple regression analysis that allows a fixed order of entry to be specified for the variables (regressors), either to control for the effects of covariates or to test the effects of certain predictors independently of the influence of others.

Step-by-Step Procedure for Hierarchical Multiple Regression in SPSS

The basic command for hierarchical multiple regression analysis in SPSS is “Regression -> Linear”:

[Figure: Hierarchical Multiple Regression in SPSS (Regression -> Linear menu)]

In the main dialog box of linear regression (shown below), enter the dependent variable, for example, the “income” variable from the sample file customer_dbase.sav available in the SPSS installation directory.

Next, enter a set of predictor variables into the Independent(s) pane. These are the variables that you want SPSS to put into the regression model first (the variables that you want to control for when testing the other variables). For example, in this analysis, we want to find out whether the “Number of people in the house” predicts the “Household income in thousands”.

We are also concerned that other variables, such as age, education, gender, union membership, or retirement, might be associated with both the “number of people in the house” and the “household income in thousands”. To make sure that these variables (age, education, gender, union member, and retired) do not explain away the entire association between the “number of people in the house” and the “Household income in thousands”, let us put them into the model first.

This ensures that they will get credit for any shared variability that they may have with the predictor that we are interested in, the “Number of people in the house”. Any observed effect of the “Number of people in the house” can then be said to be independent of the effects of the variables that have already been controlled for. See the figure below.

[Figure: Linear Regression variable selection dialog]

In the next step, put in the variable that we are actually interested in, the “number of people in the house”. To include it in the model, click the “NEXT” button. You will see all of the previously entered predictors disappear. Note that they are still in the model, just not on the current screen (block). You will also see “Block 2 of 2” above the Independent(s) pane.

[Figure: Hierarchical Regression dialog, Block 2 of 2]

Now click the “OK” button to run the analysis.

Note that you can also hit the “NEXT” button again if you are interested in entering a third or fourth (and so on) block of variables.

Often researchers enter variables as related sets, for example, demographic variables in the first step, all potentially confounding variables in the second step, and then the variables of most interest in the third step. However, it is not necessary to follow this convention; one can also enter each variable as a separate step if that seems more logical based on the design of the experiment.

Output of the Hierarchical Multiple Regression Analysis

Using just the default “Enter” method, with all the variables in Block 1 (demographics) entered together, followed by “number of people in the house” as a predictor in Block 2, we get the following output:

[Figure: Hierarchical regression output, Variables Entered and Model Summary tables]

The first table of the output confirms which variables were entered at each step.

The Model Summary table shows the percentage of variation in the dependent variable that is explained by all the predictors together. The change in $R^2$ (R-squared) is a way to evaluate how much predictive power was added to the model by the addition of another variable in Step 2. In our example, the predictive power does not improve with the addition of another predictor in Step 2.
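
For readers who prefer working from syntax, the same two-block idea can be sketched in R. This is only a sketch, assuming customer_dbase.sav has been read into a data frame named customers (for example with haven::read_sav) and that the variable names income, age, ed, gender, union, retire, and reside match those in the file; adjust them to your copy of the data.

# read the SPSS sample file (path and variable names are assumptions)
# library(haven)
# customers <- read_sav("customer_dbase.sav")

# Block 1: demographic/control variables
m1 <- lm(income ~ age + ed + gender + union + retire, data = customers)

# Block 2: add the predictor of interest, the number of people in the house
m2 <- update(m1, . ~ . + reside)

# compare the two blocks: the F-test is equivalent to testing the R-squared change
summary(m1)$r.squared
summary(m2)$r.squared
anova(m1, m2)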

[Figure: Hierarchical regression output, ANOVA table]

The overall significance of the model can be checked from this ANOVA table. In this case, both models are statistically significant.

[Figure: Hierarchical regression output, Coefficients table]

The Coefficients table is used to check the individual significance of the predictors. For Model 2, the “Number of people in the household” is statistically non-significant and can therefore be excluded from the model.

Learn about Multiple Regression Analysis


Learn Cholesky Transformation (2020)

Given the covariances between variables, one can write an invertible linear transformation that “uncorrelates” the variables. Conversely, one can transform a set of uncorrelated variables into variables with given covariances. This transformation is called the Cholesky Transformation; it is represented by a matrix that is a “square root” of the covariance matrix.

The Square Root Matrix

Given a covariance matrix $\Sigma$, it can be factored uniquely into a product $\Sigma = U'U$, where $U$ is an upper triangular matrix with positive diagonal entries. The matrix $U$ is the Cholesky (or square root) matrix. If one prefers to work with the lower triangular matrix ($L$), then one can define $$L = U' \quad \Rightarrow \quad \Sigma = LL'.$$
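
As a small worked example (the numbers are chosen purely for illustration), a $2 \times 2$ covariance matrix factors as
$$\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{pmatrix} = U'U, \qquad U = \begin{pmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{pmatrix},$$
where $U$ is upper triangular with positive diagonal entries, as required.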

This is the form of the Cholesky decomposition given by Golub and Van Loan (1996), who provided a proof of the Cholesky decomposition and various ways to compute it.

The Cholesky matrix transforms uncorrelated variables into variables whose variances and covariances are given by $\Sigma$. In particular, if one generates standard normal variates, the Cholesky transformation maps them into variables from the multivariate normal distribution with covariance matrix $\Sigma$ centered at the origin, $MVN(0, \Sigma)$.

Generally, pseudo-random numbers are used to generate two variables sampled from a population with a given degree of correlation. The same idea works for a whole set of variables (correlated or uncorrelated): a given correlation matrix can be imposed by post-multiplying the data matrix $X$ by the upper triangular Cholesky decomposition of the correlation matrix $R$. For the two-variable case:

  • Create two variables using pseudo-random numbers; call them $X$ and $Y$.
  • Create the desired correlation between the variables using $$Y_{new} = r\,X + \sqrt{1-r^2}\,Y,$$
    where $r$ is the desired correlation value. The variables $X$ and $Y_{new}$ will then have approximately the desired relationship between them; over a large number of repetitions, the distribution of the sample correlation will be centered on $r$ (see the R sketch below).
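
A minimal R sketch of this two-variable recipe (the sample size and the value $r = 0.6$ are chosen only for illustration):

# two (approximately) uncorrelated standard normal variables
set.seed(123)
X <- rnorm(10000)
Y <- rnorm(10000)

# impose the desired correlation r on the new variable
r <- 0.6
Y_new <- r * X + sqrt(1 - r^2) * Y
cor(X, Y_new)   # should be close to 0.6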

The Cholesky Transformation: The Simple Case

Suppose you want to generate multivariate normal data that are uncorrelated but have non-unit variances. The covariance matrix is the diagonal matrix of variances: $\Sigma = diag(\sigma_1^2, \sigma_2^2, \cdots, \sigma_p^2)$. The square root $\sqrt{\Sigma}$ is the diagonal matrix $D$ that consists of the standard deviations: $\Sigma = D'D$, where $D = diag(\sigma_1, \sigma_2, \cdots, \sigma_p)$.

Geometrically, the matrix $D$ scales each coordinate direction independently of the other directions. Suppose the $X$-axis is scaled by a factor of 3, whereas the $Y$-axis is unchanged (scale factor of 1). The transformation $D$ is then $diag(3, 1)$, which corresponds to a covariance matrix of $diag(9, 1)$.

Think of the circles in Figure (a) as probability contours for the multivariate distribution $MVN(0, I)$, and of the ellipses in Figure (b) as the corresponding probability contours for the distribution with covariance matrix $diag(9, 1)$.
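
A short R sketch of this diagonal case, using the $diag(3, 1)$ scaling described above:

# uncorrelated standard normal data (1000 rows, 2 columns)
z <- matrix(rnorm(2000), nrow = 1000, ncol = 2)

# D scales the first coordinate by 3 and leaves the second unchanged
D <- diag(c(3, 1))
scaled <- z %*% D

# the sample covariance matrix should be close to diag(9, 1)
cov(scaled)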

The following R code illustrates the Cholesky transformation by imposing a given correlation matrix on uncorrelated random numbers:
# define the correlation matrix
C <- matrix(c(1.0, 0.6, 0.3,
              0.6, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3, ncol = 3)

# find its Cholesky decomposition (upper triangular factor)
U <- chol(C)

# generate correlated random numbers from uncorrelated numbers
# by post-multiplying them with the Cholesky matrix
x <- matrix(rnorm(3000), nrow = 1000, ncol = 3)
xcorr <- x %*% U

# the sample correlation matrix should be close to C
cor(xcorr)
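
The reverse direction mentioned at the start of this section, uncorrelating correlated variables, is a one-line extension of the sketch above (it reuses the objects U and xcorr already created):

# undo the transformation: post-multiply by the inverse of the Cholesky factor
xuncorr <- xcorr %*% solve(U)
cor(xuncorr)   # should be close to the identity matrix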

Reference: Cholesky Transformation to Correlate and Uncorrelate Variables
