Interval Estimation and Point Estimation: A Quick Guide 2012

The problem with using a point estimate is that although it is the single best guess you can make about the value of a population parameter, it is also usually wrong. An interval estimate overcomes this problem: it combines the point estimate with a margin of error to give a range of plausible values for the parameter.

Point Estimation

Point estimation involves calculating a single value from sample data to estimate a population parameter. Examples of point estimation include (i) estimating the population mean using the sample mean and (ii) estimating the population proportion using the sample proportion. The common point estimators are listed below, followed by a short R sketch:

  • Sample mean ($\overline{x}$) for the population mean ($\mu$).
  • Sample proportion ($\hat{p}$) for the population proportion ($P$).
  • Sample variance ($s^2$) for the population variance ($\sigma^2$).
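
As a quick illustration, here is a minimal R sketch that computes each of these point estimates; the data values below are made up purely for demonstration:

# Hypothetical sample of 8 measurements (illustrative values only)
x <- c(12.1, 9.8, 11.4, 10.6, 12.9, 10.2, 11.7, 9.5)

mean(x)    # sample mean: point estimate of the population mean (mu)
var(x)     # sample variance: point estimate of the population variance (sigma^2)

# For a proportion: suppose 18 successes were observed in 40 trials (illustrative)
18 / 40    # sample proportion: point estimate of the population proportion (P)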

Interval Estimation

Interval estimation involves calculating a range of values (a set of values: an interval) from sample data to estimate a population parameter. The interval is constructed with a specified level of confidence. The components of an interval estimate are:

  • Confidence level: The probability that the true population parameter lies within the interval.
  • Margin of error: The maximum allowable error (difference between the point estimate and the true population parameter).

The common confidence intervals for the population mean and the population proportion are given below (a short R sketch that computes each of them follows the list):

  • Confidence interval for a large sample (or known population standard deviation $\sigma$):
    $\overline{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ (for a large sample with $\sigma$ unknown, $s$ may be substituted)
  • Confidence interval for small sample (or unknown population standard deviation):
    $\overline{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}$
  • Confidence interval for the population proportion:
    $\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
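
The following minimal R sketch computes each of the three intervals above. The sample size, mean, standard deviation, and success count are assumed values for illustration only, not data from this article:

# Assumed summary statistics (illustrative values only)
n <- 36; xbar <- 50; s <- 8
alpha <- 0.05

# Large-sample (z) interval
xbar + c(-1, 1) * qnorm(1 - alpha/2) * s / sqrt(n)

# Small-sample (t) interval with n - 1 degrees of freedom
xbar + c(-1, 1) * qt(1 - alpha/2, df = n - 1) * s / sqrt(n)

# Proportion interval: assume 24 successes in 60 trials (illustrative)
phat <- 24 / 60
phat + c(-1, 1) * qnorm(1 - alpha/2) * sqrt(phat * (1 - phat) / 60)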

Advantages of Interval Estimation

  • A major advantage of using interval estimation is that you provide a range of values with a known probability of capturing the population parameter (e.g., if SPSS gives you a 95% confidence interval, you can claim 95% confidence that it will include the true population parameter).
  • An interval estimate (i.e., a confidence interval) also helps one not to be so confident that the population value is exactly equal to the single point estimate. That is, it makes us more careful in interpreting our data and helps keep our conclusions in proper perspective.
  • Perhaps the best thing to do is to provide both the point estimate and the interval estimate. For example, our best estimate of the population mean is the value of $32,640 (the point estimate), and our 95% confidence interval is $30,913.71 to $34,366.29.
  • By the way, note that the bigger your sample size, the narrower the confidence interval will be, as the sketch after this list demonstrates.
  • Remember to include many participants in your research study if you want narrow (i.e., precise) confidence intervals.
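
The effect of sample size on interval width is easy to demonstrate. The sketch below, assuming an illustrative standard deviation of 10, prints the 95% margin of error for three sample sizes; note that quadrupling $n$ halves the margin of error:

# Margin of error of a 95% z-interval, assuming s = 10 (illustrative value)
s <- 10
for (n in c(25, 100, 400)) {
  me <- qnorm(0.975) * s / sqrt(n)
  cat("n =", n, "  margin of error =", round(me, 2), "\n")
}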

In essence, interval estimation is a game-changer in the field of statistics. It acknowledges the uncertainty inherent in data, providing a range of plausible values (an interval estimate) instead of a single, potentially misleading point estimate. By incorporating interval estimation into statistical analysis, one gains a more realistic understanding of the data and can make more informed decisions based on evidence, not just a single number.

Learn R Programming Language

https://gmstat.com

Scatter Diagram: Graphical Representation (2012)

A scatterplot (also called a scatter graph or scatter diagram) is used to observe the strength and direction of the relationship between two quantitative variables. In statistics, quantitative variables are those measured on the interval or ratio scale of measurement.

Scatter Diagram

Usually, in a scatter diagram, the independent variable (also called the explanatory, regressor, or predictor variable) is taken on the X-axis (the horizontal axis), while the dependent variable (also called the outcome variable) is taken on the Y-axis (the vertical axis), to measure the strength and direction of the relationship between the variables. However, it is not necessary to take the explanatory variable on the X-axis and the outcome variable on the Y-axis, because the scatter diagram and Pearson’s correlation measure the mutual correlation (interdependence) between the variables, not dependence or cause and effect.

The diagram below describes some possible relationships between two quantitative variables ($X$ & $Y$). A short description is also given of each possible relationship.

Scatter diagram

A scatter diagram can be drawn between two quantitative variables. The lengths (numbers of observations) of the two variables should be equal. Suppose we have two quantitative variables $X$ and $Y$, and we want to observe the strength and direction of the relationship between them. This can be done easily in the R language:

# Paired observations of the two quantitative variables
x <- c(5, 7, 8, 7, 2, 2, 9, 4, 11, 12, 9, 6)
y <- c(99, 86, 87, 88, 111, 103, 87, 94, 78, 77, 85, 86)

# Draw the scatter diagram
plot(x, y)
Scatter Diagram

From the above discussion, it is clear that the main objective of a scatter diagram is to visualize a linear (or some other type of) relationship between two quantitative variables. The visualization also helps depict the trend, strength, and direction of the relationship between the variables.

Limitations of Scatter Diagrams

  • Limited to Two Variables: Scatter plots can only depict the relationship between two variables at a time. If there are more than two variables, one might need to use other visualization techniques.
  • Strength of Correlation: While scatter diagrams can show the direction of a relationship, they don’t necessarily quantify the strength of that correlation. You might need to calculate correlation coefficients to quantify the strength, as the short R example after this list shows.
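
For instance, reusing the x and y vectors from the R example above, a correlation coefficient quantifies the strength that the scatter diagram only suggests:

# Quantify the strength and direction of the relationship seen in the plot
cor(x, y)                       # Pearson correlation coefficient
cor(x, y, method = "spearman")  # rank-based alternative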

In conclusion, scatter diagrams are a powerful and versatile tool for exploring relationships between variables. By understanding how to create and interpret them, one can gain valuable insights from the data and inform decision-making processes across various disciplines.

For more about correlation and regression analysis, visit https://itfeature.com

Learn R Language for Statistical Computing

Pearson’s Correlation Coefficient SPSS (2012)

Pearson’s correlation coefficient (or correlation or simply correlation) is used to find the degree of linear relationship between two continuous variables. The value of the correlation coefficient lies between $-1.00$ (perfect negative correlation) and $+1.00$ (perfect positive correlation), with $0.00$ indicating no linear correlation. Generally, correlations with an absolute value above 0.80 are considered quite high.

Remember:

  1. Correlation is the interdependence of continuous variables; it does not refer to cause and effect.
  2. Correlation is used to determine the linear relationship between variables.
  3. Draw a scatter plot before performing/calculating the correlation (to check the assumption of linearity).

How to Perform Pearson’s Correlation Coefficient SPSS

The command for correlation is found at Analyze –> Correlate –> Bivariate, i.e.:

Correlation Coefficient SPSS

The Bivariate Correlations dialog box will appear:

Pearson's Correlation Coefficient SPSS

Select one of the variables that you want to correlate in the left-hand pane of the Bivariate Correlations dialog box and shift it into the Variables pane on the right-hand side by clicking the arrow button. Now click on the other variable that you want to correlate in the left-hand pane and move it into the Variables pane by clicking the arrow button.

Pearson's Correlation Coefficient SPSS

Correlation Coefficient SPSS Output

Pearson's Correlation Coefficient SPSS

The Correlations table in the output gives the values of the specified correlation tests, such as Pearson’s correlation. Each row of the table corresponds to one of the variables, and similarly, each column corresponds to one of the variables.

Interpreting Correlation Coefficient

For example, the cell at the bottom row of the right column represents the correlation of depression with depression, which is equal to 1.0. Likewise, the cell at the middle row of the middle column represents the correlation of anxiety with anxiety, which is also 1.0. In both cases, this shows that anxiety is perfectly related to anxiety and depression is perfectly related to depression: every variable has a perfect relationship with itself.

The cell in the middle row and right column (or the cell in the bottom row and middle column) is more interesting. This cell represents the correlation between anxiety and depression (or depression with anxiety). There are three numbers in these cells.

  1. The top number is the correlation coefficient value, which is 0.310.
  2. The middle number is the significance of this correlation, which is 0.018.
  3. The bottom number, 46, is the number of observations used to calculate the correlation coefficient between the variables of the study.

Note that the significance tells us whether we would expect a correlation this large purely due to chance factors rather than an actual relation. In this case, it is improbable that we would get an r (correlation coefficient) this big if there were no relation between the variables.
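
The same correlation and significance test can be reproduced in R with cor.test(). The anxiety and depression scores from the SPSS example are not available here, so the vectors below are hypothetical, used only to show the mechanics:

# Hypothetical anxiety and depression scores (illustrative only)
anxiety    <- c(12, 15, 9, 18, 14, 11, 16, 13, 10, 17)
depression <- c(20, 24, 15, 27, 19, 18, 26, 21, 14, 25)

# Pearson correlation with a two-tailed significance test
cor.test(anxiety, depression)   # reports r, the two-tailed p-value, and the df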

Independent Samples t test in SPSS

Introduction (Independent Samples t test using SPSS)

The Independent Samples t test is a test for independent groups and is useful when the same variable has been measured in two independent groups and the researcher wants to know whether the difference between the group means is statistically significant. “Independent groups” means that the groups have different people in them and that the people in the different groups have not been matched or paired in any way.

Objectives of Independent Samples t test

The independent t-test compares the means of two unrelated/independent groups measured on an interval or ratio scale. The SPSS t-test procedure allows testing the hypothesis both when the variances are assumed to be equal and when they are not, and it provides the t-value for both assumptions. The test also provides the relevant descriptive statistics for both groups.

Assumptions (Independent Samples t test)

  • Observations can be classified into two groups that are independent of each other.
  • The variable is measured on an interval or ratio scale.
  • The measured variable is approximately normally distributed.
  • Both groups have similar variances (homogeneity of variances); simple R checks for the last two assumptions are sketched after this list.
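
These assumptions can be checked informally before running the test. The R sketch below uses made-up reaction times: shapiro.test() checks approximate normality, and var.test() is base R’s F test for equal variances (SPSS uses Levene’s test instead, but the purpose is the same):

# Hypothetical reaction times in seconds (illustrative only)
left  <- c(1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.3)
right <- c(1.6, 1.9, 1.8, 2.0, 1.7, 2.1, 1.5, 1.8)

shapiro.test(left)      # approximate normality of group 1
shapiro.test(right)     # approximate normality of group 2
var.test(left, right)   # F test for equality of the two variances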

Data Required for (Independent Samples t test)

Suppose a researcher wants to discover whether left- and right-handed telephone operators differ in the time it takes them to answer calls. The following data on reaction times were obtained (RTs measured in seconds):

[Data table: telephone operators’ reaction times for the Independent Samples t test]

The mean reaction times suggest that the left-handers were slower but does a t-test confirm this?

Independent Samples t Test using SPSS

After entering the data set in the SPSS Data View, perform the following steps to run the Independent Samples t-test:

1) Click Analyze > Compare Means > Independent-Samples T Test… on the top menu as shown below.

Independent Samples t test in SPSS

2) Select continuous variables that you want to test from the list.

independent samples t test - 2

3) Click on the arrow to send the variable into the “Test Variable(s)” box. You can also double-click the variable to send it into the “Test Variable(s)” box.

4) Select the categorical/grouping variable so that group comparison can be made and send it to the “Grouping Variable” box.

5) Click on the “Define Groups” button. A small dialog box will appear, asking about the name/code used in the Variable View for the groups. We used 1 for males and 2 for females. Click the Continue button when you’re done, then click OK when you’re ready to get the output. See the pictures for a visual guide.

independent samples t test - Define groups 3

Independent Samples t-test SPSS Output

independent samples t test - SPSS Output 4

The first table in the output gives descriptive statistics for your variables. The number of observations, mean, standard deviation, and standard error are shown for both groups (male and female).

The second table in the output is the important one for testing the hypothesis. You will see that there are two t-tests, and you have to know which one to use. When comparing groups with approximately similar variances, use the first t-test; Levene’s test checks for this. If the significance of Levene’s test is 0.05 or below, the “Equal Variances Not Assumed” test (the second one) should be used; otherwise, use the “Equal Variances Assumed” test (the first one). Here the significance is 0.287, so we will use the “Equal Variances Assumed” test, i.e., the first row of the second table.

In the output table, “t” is the calculated t-value of the test statistic; in the example, the t-value is 1.401.

df stands for degrees of freedom; in the example, we have 18 degrees of freedom.

Sig (two-tailed) is the two-tailed significance value (p-value); in the example, the sig value (0.178) is greater than 0.05 (the significance level).

Decision

As the p-value of 0.178 is greater than our 0.05 significance level, we fail to reject the null hypothesis (two-tailed case).

As the p-value of 0.089 is greater than our 0.05 significance level, we fail to reject the null hypothesis (one-tailed case with a 0.05 significance level).

As the p-value of 0.089 is smaller than our 0.10 significance level, we reject the null hypothesis and accept the alternative hypothesis (one-tailed case with a 0.10 significance level). In this case, it means that, on average, the left-handed operators have slower reaction times than the right-handed operators.
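
For comparison, the same kind of analysis can be run in R with t.test(). The original reaction-time data are in the table above and are not reproduced here, so the sketch reuses the hypothetical left/right vectors from the assumptions section; var.equal = TRUE corresponds to SPSS’s “Equal Variances Assumed” row, and the default corresponds to “Equal Variances Not Assumed” (Welch’s test):

# Hypothetical reaction times in seconds (illustrative only)
left  <- c(1.8, 2.1, 1.9, 2.4, 2.0, 2.2, 1.7, 2.3)
right <- c(1.6, 1.9, 1.8, 2.0, 1.7, 2.1, 1.5, 1.8)

t.test(left, right, var.equal = TRUE)   # equal-variances t test
t.test(left, right)                     # Welch's t test (unequal variances)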

Other links to study Independent Samples t-test using SPSS

  • https://libguides.library.kent.edu/SPSS/IndependentTTest
  • https://statistics.laerd.com/spss-tutorials/independent-t-test-using-spss-statistics.php

R Programming Language Frequently Asked Questions