The Z-Score Definition, Formula, Real Life Examples (2020)

Z-Score Definition: The Z-Score, also referred to as a standardized raw score (or simply a standard score), is a useful statistic because it not only permits the computation of the probability (chance or likelihood) of a raw score occurring within a normal distribution but also helps to compare two raw scores from different normal distributions. The Z-score is a dimensionless measure since it is derived by subtracting the population mean from an individual raw score and then dividing this difference by the population standard deviation. This computational procedure is called standardizing a raw score, and it is often used in the Z-test for testing of hypotheses.

Any raw score can be converted to a Z-score by the formula

$$Z\text{-Score}=\frac{\text{raw score} - \text{mean}}{\sigma}$$

Z-Score Real Life Examples

Example 1: If the mean = 100 and standard deviation = 10, what would be the Z-scores of the following raw scores?

| Raw Score | Z-Score |
|---|---|
| 90 | $\frac{90-100}{10}=-1$ |
| 110 | $\frac{110-100}{10}=1$ |
| 70 | $\frac{70-100}{10}=-3$ |
| 100 | $\frac{100-100}{10}=0$ |

Note that: If Z-Score,

  • has a zero value then it means that the raw score is equal to the population mean.
  • has a positive value then it means that the raw score is above the population mean.
  • has a negative value then it means that the raw score is below the population mean.
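The calculations of Example 1 can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `z_score` is a hypothetical choice for illustration, and the mean and standard deviation are the values given in the example.

```python
def z_score(raw, mean, sd):
    """Standardize a raw score: (raw - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd


# Raw scores from Example 1 (mean = 100, standard deviation = 10)
raw_scores = [90, 110, 70, 100]
z_scores = [z_score(x, mean=100, sd=10) for x in raw_scores]

for x, z in zip(raw_scores, z_scores):
    # The sign of z tells us on which side of the mean the raw score lies
    position = "at" if z == 0 else ("above" if z > 0 else "below")
    print(f"raw = {x:>3}, z = {z:+.1f} ({position} the mean)")
```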

Example 2: Suppose you got 80 marks in one exam of a class and 70 marks in another exam of that class, and you want to find out in which exam you performed better. Also, suppose that the mean and standard deviation are 90 and 10 for exam 1 and 60 and 5 for exam 2, respectively. Converting both exam marks (raw scores) into standard scores, we get

$Z_1=\frac{80-90}{10} = -1$

The Z-score results ($Z_1=-1$) show that 80 marks are one standard deviation below the class mean.

$Z_2=\frac{70-60}{5}=2$

The Z-score results ($Z_2=2$) show that 70 marks are two standard deviations above the mean.

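Comparing $Z_1$ and $Z_2$ shows that you performed better in the second exam than in the first, relative to the rest of the class. Another way to interpret the Z-score of $-1$ is that, if the marks are normally distributed, about 34.13% of the students scored between your mark and the class average (so about 15.87% scored below you). Similarly, the Z-score of 2 implies that about 47.72% of the students scored between the class average and your mark (so about 97.72% scored below you).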
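The areas under the normal curve that belong to the two Z-scores of Example 2 can be checked numerically. The sketch below assumes that the marks in both exams are normally distributed and uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z1 = (80 - 90) / 10  # exam 1
z2 = (70 - 60) / 5   # exam 2

# Area between the mean (z = 0) and each Z-score
area_mean_to_z1 = std_normal.cdf(0) - std_normal.cdf(z1)
area_mean_to_z2 = std_normal.cdf(z2) - std_normal.cdf(0)

# Share of students expected to score below each mark
below_z1 = std_normal.cdf(z1)
below_z2 = std_normal.cdf(z2)

print(f"z1 = {z1:+.0f}: between mean and mark = {area_mean_to_z1:.4f}, below mark = {below_z1:.4f}")
print(f"z2 = {z2:+.0f}: between mean and mark = {area_mean_to_z2:.4f}, below mark = {below_z2:.4f}")
```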

Applications of the Z-Score

  • Identifying Outliers: The standard score can help in identifying the outliers in a dataset. By looking for data points with very high negative or positive z-scores, one can easily flag potential outliers that might warrant further investigation.
  • Comparing Data Points from Different Datasets: Z-scores allow us to compare data points from different datasets because these scores are expressed in standard deviation units.
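  • Standardizing Data for Statistical Tests: The Z-score can be used to standardize data (transforming it to have a mean of 0 and a standard deviation of 1), which puts variables on a common scale for tests and procedures that expect standardized inputs. Note that standardizing changes only the location and scale of the data, not the shape of its distribution, so it does not make non-normal data normal.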
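As a rough illustration of the first and third applications, the sketch below standardizes a small data set and flags values with a large absolute Z-score. The data, the helper name `standardize`, and the cut-off of 2.5 are all assumptions made for this example; the appropriate cut-off depends on the sample size and the field of application.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the Z-score of every value, using the mean and
    (population) standard deviation of the data set itself."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(v - m) / s for v in values]


# Hypothetical measurements; 40 is deliberately extreme
data = [12, 14, 15, 13, 16, 14, 15, 13, 14, 40]
z = standardize(data)

threshold = 2.5  # assumed cut-off for "very high" |z|
outliers = [v for v, zi in zip(data, z) if abs(zi) > threshold]

print("flagged as potential outliers:", outliers)
print(f"mean of z = {mean(z):.3f}, sd of z = {pstdev(z):.3f}")
```

Because the extreme value itself inflates the mean and the standard deviation, its Z-score is smaller than one might expect, which is why a strict cut-off such as 3 can miss outliers in small samples.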

Limitations of Z-Scores

  • Assumes Normality: Z-scores are most interpretable when the data is normally distributed (a bell-shaped curve). If the data is significantly skewed, the scores might be less informative.
  • Sensitive to Outliers: The presence of extreme outliers can significantly impact the calculation of the mean and standard deviation, which in turn, affects the standard score of all data points.

In conclusion, z-scores are a valuable tool for understanding the relative position of a data point within its dataset. The standard score offers a standardized way to compare data points, identify outliers, and prepare data for statistical analysis. However, it is important to consider the assumptions of normality and the potential influence of outliers when interpreting the z-scores.


Sufficient Estimators and Sufficient Statistics

Introduction to Sufficient Estimator and Sufficient Statistics

An estimator $\hat{\theta}$ is sufficient if it makes so much use of the information in the sample that no other estimator could extract additional information from the sample about the population parameter being estimated.

For example, the sample mean $\overline{X}$ utilizes all the values included in the sample, and for a normal population with known variance it is a sufficient estimator of the population mean $\mu$.

Sufficient estimators are often used to develop the estimator that has minimum variance among all unbiased estimators (MVUE).

If a sufficient estimator exists, no other estimator from the sample can provide additional information about the population being estimated.

If there is a sufficient estimator, then there is no need to consider any of the non-sufficient estimators. A good estimator is a function of sufficient statistics.

Let $X_1, X_2,\cdots, X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. A statistic (estimator) $U=g(X_1, X_2,\cdots, X_n)$ is sufficient for $\theta$ if the conditional distribution of the sample, given the observed value of $U$, does not depend upon the population parameter $\theta$.

Sufficient Statistics Example

The sample mean $\overline{X}$ is sufficient for the population mean $\mu$ of a normal distribution with known variance. Once the sample mean is known, no further information about the population mean $\mu$ can be obtained from the sample itself. The median, by contrast, is not sufficient for the mean: even if the median of the sample is known, knowing the sample itself would provide further information about the population mean $\mu$.

Mathematical Definition of Sufficiency

Suppose that $X_1,X_2,\cdots,X_n \sim p(x;\theta)$. $T$ is sufficient for $\theta$ if the conditional distribution of $X_1,X_2,\cdots, X_n|T$ does not depend upon $\theta$. Thus
\[p(x_1,x_2,\cdots,x_n|t;\theta)=p(x_1,x_2,\cdots,x_n|t)\]
This means that we can replace $X_1,X_2,\cdots,X_n$ with $T(X_1,X_2,\cdots,X_n)$ without losing information.
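The definition can be checked numerically in a simple case. The sketch below uses an illustration that is not part of the text above: independent Bernoulli($p$) observations with $T=\sum X_i$. It computes the conditional probability of one particular sample given $T$ for several values of $p$; if $T$ is sufficient, the result should be the same for every $p$. The function name `conditional_prob` is hypothetical.

```python
from itertools import product


def conditional_prob(sample, p):
    """P(X = sample | T = sum(sample)) for i.i.d. Bernoulli(p) observations."""
    n, t = len(sample), sum(sample)
    joint = p**t * (1 - p) ** (n - t)  # P(X = sample)
    # P(T = t): add up the probability of every sample with the same total
    prob_t = sum(
        p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product([0, 1], repeat=n)
        if sum(x) == t
    )
    return joint / prob_t


sample = (1, 0, 1, 0)
for p in (0.2, 0.5, 0.9):
    # The printed value should not change with p
    print(f"p = {p}: P(sample | T = 2) = {conditional_prob(sample, p):.6f}")
```

In this case the conditional probability equals one divided by the number of arrangements with the same total, whatever the value of $p$, so nothing about $p$ is left in the sample once $T$ is known.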


For further reading visit: https://en.wikipedia.org/wiki/Sufficient_statistic


Consistent Estimator: Easy Learning

A statistic is a consistent estimator of a population parameter if “as the sample size increases, it becomes almost certain that the value of the statistic comes close (closer) to the value of the population parameter”. If an estimator (statistic) is consistent, it becomes more reliable as the sample gets larger ($n \to \infty$). This means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, such that the probability of the estimator being arbitrarily close to $\theta$ converges to one (sure event).

Consistent Estimator

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive $\varepsilon$,
\[\lim_{n \rightarrow \infty} P[|\hat{\theta}_n-\theta| \le \varepsilon]=1\]
or
\[\lim_{n\rightarrow \infty} P[|\hat{\theta}_n-\theta|> \varepsilon]=0\]

Here $\hat{\theta}_n$ denotes the estimator of $\theta$ calculated from a sample of size $n$.
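The definition can be illustrated with a small simulation. The sketch below estimates $P[|\hat{\theta}_n-\theta|>\varepsilon]$ for the sample mean of normal data at several sample sizes. The population values ($\mu=5$, $\sigma=2$), the tolerance $\varepsilon=0.2$, the number of replications, and the function name are all assumptions chosen for illustration.

```python
import random


def prob_far_from_mean(n, eps=0.2, mu=5.0, sigma=2.0, reps=500, seed=0):
    """Monte Carlo estimate of P(|sample mean - mu| > eps) for samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    far = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            far += 1
    return far / reps


for n in (10, 100, 1000):
    print(f"n = {n:>4}: estimated P(|xbar - mu| > 0.2) = {prob_far_from_mean(n):.3f}")
```

The estimated probability should shrink towards zero as $n$ grows, which is the behaviour the second limit in the definition describes.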

  • The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median would approach the population median, not the population mean.
  • The sample estimate of the standard deviation is biased but consistent, as the distribution of $\hat{\sigma}^2$ (and hence of $\hat{\sigma}$) becomes more and more concentrated at $\sigma^2$ as the sample size increases.
  • A sample statistic can be an inconsistent estimator. A consistent estimator is typically unbiased in the limit, but an unbiased estimator may or may not be consistent.

Note that these two are not equivalent: (1) unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (2) consistency is a statement about “where the sampling distribution of the estimator is going” as the sample size increases.
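The difference between the two properties can be seen in a simulation of the variance estimator that divides by $n$ instead of $n-1$. For normal data its expected value is $\frac{n-1}{n}\sigma^2$, so it is biased for every finite $n$, yet the bias vanishes as $n$ grows. The sketch below is only an illustration: the population ($\sigma^2=4$), the sample sizes, and the function names are assumptions.

```python
import random


def variance_divisor_n(xs):
    """Variance estimate with divisor n (biased for finite n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def average_variance_estimate(n, sigma=2.0, reps=1000, seed=42):
    """Average of the divisor-n variance estimate over many samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = [
        variance_divisor_n([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return sum(estimates) / reps


# True variance is sigma^2 = 4
for n in (5, 200):
    print(f"n = {n:>3}: average estimate = {average_variance_estimate(n):.3f}")
```

For the small sample size the average estimate sits clearly below 4 (bias), while for the large sample size it is close to 4 (consistency).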

A consistent estimate has insignificant (non-significant) errors (variations) as sample sizes increase indefinitely. More specifically, the probability that those errors will vary by more than a given amount approaches zero as the sample size increases. In other words, the more data you collect, the more consistent the estimator will be with the real population parameter you’re trying to measure. The sample mean ($\overline{X}$) and sample variance ($S^2$) are two well-known consistent estimators.


Point Estimation of Parameters

Introduction to Point Estimation of Parameters

The objective of point estimation of parameters is to obtain a single number from the sample which will represent the unknown value of the parameter.

In practice, we do not know the population parameters, such as the population mean and standard deviation. However, our goal is to estimate the mean and standard deviation of the population of interest from sample information, which saves time, cost, etc. This can be done by computing the sample mean and standard deviation and using them as the best guess for the true population mean and standard deviation. We can call this estimate a “best guess”; it is termed a “point estimate” because it summarizes the sample in a single number.

Point Estimate

A Point Estimate is a statistic (a statistical measure from the sample) that gives a plausible estimate (or possibly a best guess) for the value in question.

$\overline{x}$ is a point estimate for $\mu$ and $s$ is a point estimate for $\sigma$.

Or we can say that

A statistic used to estimate a parameter is called a point estimator or simply an estimator. The actual numerical value which we obtain for an estimator in a given problem is called an estimate.

Generally, the symbol $\theta$ (an unknown constant) is used to denote a population parameter, which may be a proportion, a mean, or some measure of variability. The available information is in the form of a random sample $X_1, X_2, \cdots, X_n$ of size $n$ drawn from the population. We wish to formulate a function of the sample observations $X_1, X_2, \cdots, X_n$; that is, we look for a statistic such that its value computed from the sample data would reflect the value of the population parameter as closely as possible. The estimator of $\theta$ is commonly denoted by $\hat{\theta}$. Different random samples usually provide different values of the statistic $\hat{\theta}$, which therefore has its own sampling distribution.

Note that Unbiasedness, Efficiency, Consistency, and Sufficiency are the criteria (statistical properties of the estimator) to identify whether a statistic is a “good” estimator.

Application of Point Estimator: Confidence Intervals

We can build confidence intervals because we are interested not only in finding the point estimate for the mean but also in determining how accurate the point estimate is. Here the Central Limit Theorem plays a very important role in building the confidence interval. We assume that the sample standard deviation is close to the population standard deviation (which will almost always be true for large samples). The standard deviation of the sampling distribution of the estimator (here, the sample mean) is

\[\sigma_{\overline{x}} \approx \frac{\sigma}{\sqrt{n}}\]
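The formula for the standard deviation of the sample mean can be checked by simulation. In the sketch below, many samples of size $n$ are drawn from a normal population and the standard deviation of their means is compared with $\sigma/\sqrt{n}$. The population values ($\mu=14$, $\sigma=2$), $n=50$, the number of replications, and the function name are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import stdev


def simulated_standard_error(n, mu=14.0, sigma=2.0, reps=2000, seed=1):
    """Standard deviation of the sample mean, estimated from repeated samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sample_means = [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return stdev(sample_means)


n = 50
print(f"simulated  : {simulated_standard_error(n):.4f}")
print(f"sigma/sqrt(n): {2.0 / sqrt(n):.4f}")
```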

Our interest is to find an interval around $\overline{x}$ such that there is a large probability that the actual (true) mean falls inside the computed interval.  This interval is called a confidence interval and the large probability is called the confidence level.

Example of Point Estimation of Parameters

Question: Suppose that we check for clarity in 50 locations in a lake and discover that the average depth of clarity of the lake is 14 feet with a standard deviation of 2 feet.  What can we conclude about the average clarity of the lake with a 95% confidence level?

Solution: The sample mean $\overline{x}$ (average depth of clarity at the 50 locations) can be used to provide a point estimate for $\mu$, and $s$ to provide a point estimate for $\sigma$. To answer how accurate $\overline{x}$ is as a point estimate, we can construct a 95% confidence interval for $\mu$ as follows.

[Figure: normal curve for the point estimation example, with an area of 0.025 in each tail]

Sketch the normal curve and use the standard normal table to find the z-score associated with a tail probability of 0.025 (there is 0.025 to the left and 0.025 to the right, i.e., the two-tailed case).

The Z-score for a 95% confidence level is about $\pm 1.96$.

\begin{align*}
Z&=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\\
\pm 1.96&=\frac{14-\mu}{\frac{2}{\sqrt{50}}}\\
14-\mu&=\pm 1.96 \times \frac{2}{\sqrt{50}} \approx \pm 0.5544
\end{align*}

Note that $Z\frac{\sigma}{\sqrt{n}}$ is called the margin of error.

The 95% confidence interval for the mean clarity will be (13.45, 14.55)

In other words, we are 95% confident that the mean clarity of the lake is between 13.45 and 14.55 feet.

In general, if $Z_{\alpha/2}$ is the standard normal table value associated with a given level of confidence, then a $100(1-\alpha)\%$ confidence interval for the mean is

\[\overline{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\]
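The general interval can be computed directly. The sketch below is a minimal illustration that assumes a large sample, so that the sample standard deviation can stand in for $\sigma$; the function name `mean_confidence_interval` is hypothetical, and the table value is obtained from `statistics.NormalDist` instead of a printed standard normal table. The usage lines apply it to the lake example (mean 14, standard deviation 2, $n=50$).

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sd, n, confidence=0.95):
    """Normal-based confidence interval for a mean.

    Returns (lower limit, upper limit, margin of error).
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed table value
    margin = z * sd / sqrt(n)                # margin of error
    return xbar - margin, xbar + margin, margin


# Lake example: 50 locations, mean clarity 14 feet, standard deviation 2 feet
low, high, margin = mean_confidence_interval(xbar=14, sd=2, n=50)
print(f"margin of error = {margin:.4f}")
print(f"95% confidence interval = ({low:.2f}, {high:.2f})")
```

Raising the confidence level (for example to 0.99) widens the interval, while a larger sample size narrows it.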

See more at Wikipedia about Point Estimation of Parameters
