How to Convert Continuous Variables in SPSS: A Quick Guide

There are situations in which one may want to convert a continuous variable in SPSS to a categorical one. For example, one may want to find out how many females earn a starting salary of more than 80,000, using data from a university (say, the University of Florida). To answer this, the numeric salary variable needs to be converted into a categorical variable. In SPSS, this type of transformation is called recoding a continuous variable into a categorical one.

Convert Continuous Variables in SPSS to Categorical

Step-by-Step Procedure

In SPSS there are three basic options for recoding the variables.

  • Recode into different variables
  • Recode into the same variable
  • DO IF syntax

Recode into different variables and DO IF syntax create a new variable without modifying the original variable, while Recode into the same variable permanently overwrites the original variable. It is therefore best to recode a variable into a different variable. To recode into different variables,

Click Transform > Recode into different variables

The Recode into different variables dialog box will appear as:

The left-side pane of the dialog box lists all of the variables. Select the variable of interest and move it to the right-side pane by clicking the arrow button between the two panes. In this example, the salary variable is the one to be transformed.

  • Input Variable -> Output Variable
    The center text box lists the selected variable(s). In this case, we have only the salary variable.
  • Output Variable
    Define the name and label (the label is optional) for your recoded variable(s) by typing them in the text fields. For example, name the recoded variable “new_salary” (SPSS variable names cannot contain hyphens) and then click Change.
  • Old and New Values
    Click the “Old and New Values” button to specify the categories of the selected variable. A new dialog box will appear, in which one specifies how the values are to be transformed.

Old Values and New Values

The “Old -> New” box lists the recoding rules that have been defined, each pairing an old value (or a range of old values) with its new value. For example, the range 20000 through the highest value may be recoded to the new value 1.

A short description of “Old Values” options.

  • Value:
    Recode a single specific value. Enter the existing value that should be recoded.
  • System Missing:
    Applies to any system-missing value (.).
  • Range or Through:
    This option is used to enter the lower and upper limits that should be recoded. The recode category includes both limits (inclusive). For example, 20000 to 40000.
  • Range, Lowest through Value:
    Recode all values less than or equal to some number.
  • Range, Value through Highest:
    Recode all values greater than or equal to some number.
  • All Other Values:
    Applies to any value not explicitly accounted for by the previous recoding rules.

A short description of the “Old -> New” option:

Enter the required group/category numerical code in the “New Value” box and then click the Add button. Repeat this step for each group that you wish to recode. Each rule appears in the “Old -> New” box as it is added. Finally, click the Continue button, and then click the OK button to transform the continuous variable into a categorical variable.
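The logic behind these recoding rules can be sketched outside SPSS. The following Python snippet is only an illustration, not SPSS syntax: the salary values, the cut-points, and the names RULES, recode, and salary_cat are all assumptions made for the example. It mimics recoding into a different variable, so the original values are left untouched.

```python
import math

# Hypothetical recoding rules, checked in order.
# Each rule is (lower limit, upper limit, new value); both limits are inclusive.
RULES = [
    (-math.inf, 19999, 1),   # like "Range, LOWEST through value"
    (20000, 40000, 2),       # like "Range ... through ..."
    (40001, math.inf, 3),    # like "Range, value through HIGHEST"
]

def recode(value, rules=RULES, missing=None, other=None):
    """Return the category code for one value.

    None stands in for a system-missing value; 'other' plays the role
    of the "All other values" rule.
    """
    if value is None:
        return missing
    for low, high, new_value in rules:
        if low <= value <= high:
            return new_value
    return other

# Hypothetical salary data; None marks a missing value.
salary = [15000, 20000, 40000, 85000, None]

# Recoding into a *different* variable: salary itself is not modified.
salary_cat = [recode(v) for v in salary]
print(salary_cat)
```

Because both limits of a range are inclusive, neighbouring ranges should not share a limit; a value such as 19999.5 would fall between the assumed rules above and be handled by the "other" case.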

https://gmstat.com, https://rfaqs.com

MCQs Data and Variable 14

The post is about MCQs Data and Variables. There are 20 multiple-choice questions related to variables, data, population, sample, and types of variables. Let us start with MCQs Data and Variable with Answers.

Online Multiple choice questions about Variable and Data with Answers

1. Variables whose measurement is done in terms such as weight, height, and length are classified as
2. A scientist is experimenting to determine the relationship between the consumption of a certain type of food and high blood pressure. He conducts a random sample on 2,000 people and first asks them a “yes” or “no” question: Do you eat this type of food more than once a week? He also takes the blood pressure of each person and records it (for example: 120/80). Which one of the following statements is true?
3. Government and non-government publications are considered as
4. In statistics, conducting a survey means:
5. A data set is a:
6. A quantitative variable is one that can:
7. When data are collected in a statistical study for only a portion or subset of all elements of interest we are using:
8. Time-series data are collected:
9. A qualitative variable is the one that:
10. A statistician wants to determine the total annual medical costs incurred by all districts of Pakistan from 1981 to 2001 as a result of health problems related to smoking. He polls each of the districts annually to obtain health care expenditures, in dollars, on smoking-related illnesses. Which one of the following is not a true statement?
11. In statistics, a population consists of:
12. A variable is a:
13. In statistics, a sample means:
14. Which one of the following is an example of qualitative data?
15. What tasks are involved in data cleaning? Select all that apply
16. What is the main objective of data cleaning?
17. Cross-section data are collected:
18. Which one of the following is a continuous variable?
19. An observation is the:
20. Which one of the following is an example of cross-section data?

Critical Values and Rejection Region

In the statistical hypothesis testing procedure, an important step is deciding whether to reject the null hypothesis. This requires finding the critical values and the rejection region.

Rejection Region and Critical Values

A rejection region for a hypothesis test is the range of values of the standardized test statistic that would lead us to reject the null hypothesis. The critical values for a hypothesis test are the z-scores that separate the rejection region(s) from the non-rejection region (also called the acceptance region of $H_0$). The critical values will be denoted by $Z_0$.

The rejection region for a test is determined by the type of test (left-tailed, right-tailed, or two-tailed) and the level of significance (denoted by $\alpha$) for the test. For a left-tailed test, the rejection region is in the left tail of the normal distribution; for a right-tailed test, it is in the right tail; and for a two-tailed test, there are two equal rejection regions, one in each tail.

Once we establish the critical values and rejection region, the null hypothesis is rejected if the standardized test statistic for a sample data set falls in the rejection region.
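As a rough illustration of how the type of test and $\alpha$ determine the critical values, the following Python sketch uses the standard library's NormalDist class in place of a standard normal table. The function names critical_values and in_rejection_region are made up for this example.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the critical z-value(s) for a left-, right-, or two-tailed Z test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between the tails
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(z_stat, alpha, tail):
    """Return True if the standardized test statistic falls in the rejection region."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return z_stat <= crit[0]
    if tail == "right":
        return z_stat >= crit[0]
    return z_stat <= crit[0] or z_stat >= crit[1]

print(critical_values(0.05, "two"))
print(in_rejection_region(-2.0, 0.05, "left"))
```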

Examples: Critical Values and Rejection Region

Example 1: A university claims that the average SAT score for its incoming freshmen is 1080. A sample of 56 freshmen at the university is drawn and the average SAT score is found to be $\overline{x}=1044$ with a sample standard deviation of $s=94.7$ points.

    In this SAT example, the test is two-tailed, so the rejection region will be the two tails at either end of the normal distribution. For $\alpha=0.05$, the area under the curve in both rejection regions together should be 0.05. For this purpose, we look up $\frac{\alpha}{2}=0.025$ in the standard normal table to get critical values of $Z_0 = \pm 1.96$. The rejection region thus consists of $Z \le -1.96$ and $Z\ge 1.96$. The standardized test statistic is $Z=\frac{\overline{x}-\mu_0}{s/\sqrt{n}}=\frac{1044-1080}{94.7/\sqrt{56}}\approx -2.84$. Since it falls in the rejection region, the university’s claim of $\mu = 1080$ is rejected in this case.
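The arithmetic of Example 1 can be checked with a few lines of Python. This is a minimal sketch that assumes, as the example does, that a Z test with the sample standard deviation is appropriate; the variable names are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 1080     # claimed population mean (null hypothesis)
x_bar = 1044    # sample mean
s = 94.7        # sample standard deviation
n = 56          # sample size
alpha = 0.05    # level of significance

# Standardized test statistic: (sample mean - claimed mean) / standard error
z_stat = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed test: alpha is split equally between the two tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = z_stat <= -z_crit or z_stat >= z_crit
print(z_stat, z_crit, reject_h0)
```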

    Example 2: Consider a left-tailed Z test. For a 0.05 level of significance, the rejection region would be the values in the lowest 5% of the standard normal distribution (the lowest 5% of the area under the normal curve). The corresponding Z-score is $-1.645$, so the critical value $Z_0$ will be $-1.645$ and the rejection region will be $Z\le -1.645$.

    Note that for a right-tailed test, the rejection region would be the values in the highest 5% of the standard normal distribution. The critical Z-score will be $1.645$ and the rejection region will be $Z\ge 1.645$.

    Exercise: Critical Values and Rejection Region

    1. Find the critical values and rejection region(s) for the standardized Z-test of the following:
    • A right-tailed test with $\alpha = 0.05$
    • A left-tailed test with $\alpha = 0.01$
    • A two-tailed test with $\alpha = 0.10$
    • A right-tailed test with $\alpha = 0.02$
    2. Mercury levels in fish are considered dangerous to people if they exceed 0.5 mg of mercury per kilogram of meat. A sample of 50 tuna is collected, and the mean level of mercury in these 50 fish is 0.6 mg/kg, with a standard deviation of 0.2 mg/kg. A health warning will be issued if the claim that the mean exceeds 0.5 mg/kg can be supported at the $\alpha=0.10$ level of significance. Determine the null and alternative hypotheses in this case, the type of the test, the critical value(s), and the rejection region. Find the standardized test statistic for the information given in the exercise. Should the health warning be issued?

    Errors in Statistics: A Comprehensive Guide

    To learn about errors in statistics, we first need to understand the concepts related to true value, accuracy, and precision. Let us start with these basic concepts.

    True Value

    The true value is the value that would be obtained if no errors were made in any way by obtaining the information or computing the characteristics of the population under study.

    The true value of the population can be obtained only if exact procedures are used for collecting the correct data, every element of the population has been covered, and no mistake or even the slightest negligence has occurred during data collection and analysis. It is usually regarded as an unknown constant.

    Accuracy

    Accuracy refers to the difference between the sample result and the true value. The smaller the difference the greater will be the accuracy. Accuracy can be increased by

    • Elimination of technical errors
    • Increasing the sample size

    Precision

    Precision refers to how closely we can reproduce, from a sample, the results that would be obtained if a complete count (census) was taken using the same method of measurement.

    Errors in Statistics

    The difference between an estimated value and the population’s true value is called an error. Since a sample estimate is used to describe a characteristic of a population, and a sample is only a part of the population, it cannot provide a perfect representation of the population (no matter how carefully the sample is selected). An estimate is rarely equal to the true value, so the question is how close the sample estimate will be to the population’s true value. There are two kinds of errors, sampling and non-sampling errors.

    • Sampling error (random error)
    • Non-sampling errors (nonrandom errors)

    Sampling Errors

    A sampling error is the difference between the value of a statistic obtained from an observed random sample and the value of the corresponding population parameter being estimated. Sampling errors occur due to the natural variability between samples. Let $T$ be the sample statistic used to estimate the population parameter $\theta$. The sampling error, denoted by $E$, is

    $$E=T-\theta$$

    The value of the sampling error reveals the precision of the estimate. The smaller the sampling error, the greater will be the precision of the estimate. The sampling error may be reduced in the following ways:

    • By increasing the sample size
    • By improving the sampling design
    • By using the supplementary information

    Usually, sampling error arises when a sample is selected from a larger population to make inferences about the whole population.
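To make $E=T-\theta$ concrete, the following Python sketch lists the sampling error of the sample mean for every possible sample drawn without replacement from a tiny population. The population values and the function name sampling_errors are hypothetical and serve only as an illustration.

```python
from itertools import combinations

population = [2, 4, 6, 8, 10]              # hypothetical population values
theta = sum(population) / len(population)  # population mean (the parameter)

def sampling_errors(n):
    """Return E = T - theta for every possible sample of size n (without replacement)."""
    return [sum(sample) / n - theta for sample in combinations(population, n)]

errors_n2 = sampling_errors(2)
errors_n4 = sampling_errors(4)

# Largest absolute sampling error for each sample size
print(max(abs(e) for e in errors_n2))
print(max(abs(e) for e in errors_n4))
```

In this small setting, the largest possible sampling error shrinks when the sample size is increased from 2 to 4, in line with the first point in the list above.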

    Non-Sampling Errors

    The errors that are caused by sampling the wrong population of interest and by response bias as well as those made by an investigator in collecting, analyzing, and reporting data are all classified as non-sampling errors (or non-random errors). These errors are present in a complete census as well as in a sampling survey.

    Bias

    Bias is the difference between the expected value of a statistic and the true value of the parameter being estimated. Let $T$ be the sample statistic used to estimate the population parameter $\theta$, then the amount of bias is

    $$\text{Bias} = E(T) - \theta$$

    The bias is positive if $E(T)>\theta$, negative if $E(T) <\theta$, and zero if $E(T)=\theta$. The bias is a systematic component of error that refers to the long-run tendency of the sample statistic to differ from the parameter in a particular direction. Bias is cumulative and increases as the sample size increases. If proper methods of selecting the units in a sample are not followed, the sample result will not be free from bias.

    Note that non-sampling errors can be difficult to identify and quantify, so their presence can significantly affect the accuracy of statistical results. By understanding and addressing these errors, researchers can improve the reliability and validity of their statistical findings.
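The definition of bias as $E(T)-\theta$ can be illustrated with a standard textbook case that is not part of the discussion above: estimating the population variance with a divisor of $n$ versus $n-1$. The Python sketch below assumes a small hypothetical population and samples of size 2 drawn with replacement, and computes the expected value exactly by enumerating every possible sample. All names in it are made up for the example.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8, 10]   # hypothetical population values
n = 2                           # sample size (sampling with replacement)

def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))

# Parameter theta: the population variance
pop_mean = mean(population)
theta = mean([(x - pop_mean) ** 2 for x in population])

def var_divide_n(sample):
    """Sample variance with divisor n."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def var_divide_n_minus_1(sample):
    """Sample variance with divisor n - 1."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Every equally likely sample of size n
samples = list(product(population, repeat=n))

def bias(statistic):
    """Bias = E(T) - theta, with E(T) averaged over all possible samples."""
    expected_value = mean([statistic(s) for s in samples])
    return expected_value - theta

print(bias(var_divide_n))          # negative: underestimates on average
print(bias(var_divide_n_minus_1))  # zero: unbiased in this setting
```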
