Estimation of Population Parameters

Introduction to Estimation of Population Parameters

In statistics, estimating population parameters is important because it allows the researcher to draw conclusions about a population (the whole group) by analyzing a small part of it. Estimation is used when the population under study is too large to examine in full. For example, instead of performing a census, one can draw a random sample from the population and calculate the required sample statistic(s) to draw conclusions about the population.

Important Terminologies

The following terms are needed to understand the estimation of population parameters.

  • Population: The entire collection of individuals or items one is interested in studying. For instance, all the people living in a particular country.
  • Sample: A subgroup (or small portion) chosen from the population that represents the larger group.
  • Parameter: A characteristic that describes the entire population, such as the population mean, median, or standard deviation.
  • Statistic: A characteristic of the sample under study, that is, a value calculated from the sample data and used to estimate the population parameter. For example, the sample mean is an estimate of the population mean.

Various statistical methods are used to estimate population parameters with different levels of accuracy. The accuracy of the estimate depends on the size of the sample and how well the sample represents the population.

We use statistics calculated from the sample data as estimates for the population parameters.

  • Sample mean: used to estimate the population mean. It is calculated by averaging the values of all observations in the sample, that is, the sum of all data values divided by the total number of observations.
  • Sample proportion: used to estimate the population proportion (percentage). It is the number of successes (events of interest) divided by the total sample size.
  • Sample standard deviation: used to estimate the population standard deviation. It reflects how spread out the data points are in the sample.
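
The three sample statistics listed above can be computed with Python's standard library. The following is a minimal sketch; the `heights` and `responses` values are made-up illustrative data, not taken from any real survey.

```python
import statistics

# Hypothetical sample of heights in inches (illustrative values only).
heights = [63.0, 64.5, 65.0, 66.0, 67.5, 62.5, 65.5, 66.0]

# Sample mean: sum of the values divided by the number of observations.
sample_mean = statistics.mean(heights)

# Sample standard deviation: statistics.stdev uses the n - 1 denominator.
sample_sd = statistics.stdev(heights)

# Hypothetical yes/no survey answers; True marks a "success".
responses = [True, True, False, True, False, True, True, False, True, True]

# Sample proportion: number of successes divided by the sample size.
sample_proportion = sum(responses) / len(responses)

print(f"sample mean       = {sample_mean:.2f}")
print(f"sample std. dev.  = {sample_sd:.2f}")
print(f"sample proportion = {sample_proportion:.2f}")
```

Each printed value is a statistic, and it serves as an estimate of the corresponding (unknown) population parameter.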

Types of Estimates

There are two types of estimates:

  • Point Estimate: A single value used to estimate the population parameter. Examples of point estimates are:
    • The mean height of boys in colleges is 65 inches.
    • 65% of Lahore residents support a ban on cell phone use while driving.
  • Interval Estimate: A range of values (an interval) that is expected to contain the population parameter. Examples of interval estimates are:
    • The mean height of boys in colleges lies between 63.5 and 66.5 inches.
    • 65% ($\pm 3$%) of Lahore residents support a ban on cell phone use while driving.
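
To see how a margin such as $\pm 3$% can arise, the sketch below turns a point estimate of a proportion into an interval estimate using the common normal approximation. The sample size `n = 1000` and the 95% confidence level are assumptions made for illustration; the text does not state how the survey was conducted.

```python
import math
from statistics import NormalDist

p_hat = 0.65       # point estimate from the example above
n = 1000           # ASSUMED sample size (not given in the text)
confidence = 0.95  # ASSUMED confidence level

# Critical value of the standard normal distribution (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error under the normal approximation for a proportion.
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"point estimate    : {p_hat:.3f}")
print(f"margin of error   : {margin:.3f}")
print(f"interval estimate : ({lower:.3f}, {upper:.3f})")
```

Under these assumptions, the margin of error comes out close to 0.03, which is consistent with the "65% ($\pm 3$%)" style of reporting.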

Some Examples

Estimation of population parameters is widely used in various fields of life. For example,

  • a company might estimate customer satisfaction through a sample survey,
  • a biologist might estimate the average wingspan of a specific bird species by capturing and measuring a small group.

https://rfaqs.com

https://gmstat.com

Empirical Probability Examples

Introduction to Empirical Probability

An empirical probability (also called experimental probability) is calculated from data collected in past trials of an experiment. The resulting probability is used to predict how likely the event is to occur in the future.

Formula and Examples of Empirical/Experimental Probability

To calculate an empirical/ experimental probability, one can use the formula

$$P(A)=\frac{\text{Number of trials in which } A \text{ occurs}}{\text{Total number of trials}}$$

  • Coin Flip: Suppose we flip a coin 200 times and get heads 105 times. The empirical probability of getting heads is $\frac{105}{200} = 0.525$, or 52.5%.
  • Weather Prediction: Suppose you track the weather for a month and see that it rained on 12 out of 30 days. The empirical probability of rain on a given day that month is $\frac{12}{30} = 0.4$ or 40%.
  • Plant Growth: Suppose you plant 50 seeds and 35 sprout into seedlings. The experimental probability of a seed sprouting is $\frac{35}{50} = 0.70$ or 70%.
  • Board Game: Suppose you play a new board game 10 times and win 6 times. The empirical probability of winning the game is $\frac{6}{10} = 0.6$ or 60%.
  • Customer Preferences: In a survey of 100 customers, 80 prefer chocolate chip cookies over oatmeal raisin cookies. The empirical probability of a customer preferring chocolate chip cookies is $\frac{80}{100} = 0.80$ or 80%.
  • Basketball Game: A basketball player practices free throws and makes 18 out of 25 attempts. The experimental probability of the player making their next free throw is $\frac{18}{25} = 0.72$ or 72%.
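
The formula is simple enough to wrap in a small helper. The sketch below applies it to the counts from the examples above; the function name `empirical_probability` is just an illustrative choice.

```python
def empirical_probability(occurrences, trials):
    """Return occurrences / trials after basic sanity checks."""
    if trials <= 0:
        raise ValueError("trials must be a positive number")
    if not 0 <= occurrences <= trials:
        raise ValueError("occurrences must lie between 0 and trials")
    return occurrences / trials


# (number of trials in which the event occurs, total number of trials)
examples = {
    "coin flip (heads)": (105, 200),
    "rainy day": (12, 30),
    "seed sprouting": (35, 50),
    "winning the board game": (6, 10),
    "prefers chocolate chip": (80, 100),
    "free throw made": (18, 25),
}

for name, (k, n) in examples.items():
    print(f"{name}: {empirical_probability(k, n):.3f}")
```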

Empirical Probability From Frequency Tables

A frequency table can be used to calculate the probability that a data value falls into a given group or class. Consider the frequency table of examination scores in a certain class.

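| Class | Frequency ($f$) | Relative Frequency ($rf$) |
|---|---|---|
| 40 – 49 | 1 | $\frac{1}{20}=0.05$ |
| 50 – 59 | 2 | $\frac{2}{20}=0.10$ |
| 60 – 69 | 3 | $\frac{3}{20}=0.15$ |
| 70 – 79 | 4 | $\frac{4}{20}=0.20$ |
| 80 – 89 | 6 | $\frac{6}{20}=0.30$ |
| 90 – 99 | 4 | $\frac{4}{20}=0.20$ |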

Let event $A$ be the event that a student scores between 90 and 99 on the exam, then

$$P(A) = \frac{\text{Number of students scoring 90-99}}{\text{Total number of students}} = \frac{4}{20} = 0.20$$

Notice that $P(A)$ is the relative frequency of the class 90-99.
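
The relative frequencies can be reproduced in a few lines. This is a minimal sketch that stores the class frequencies of the examination scores in a dictionary (the variable names are illustrative).

```python
# Class frequencies from the examination-score table.
frequencies = {
    "40-49": 1,
    "50-59": 2,
    "60-69": 3,
    "70-79": 4,
    "80-89": 6,
    "90-99": 4,
}

total = sum(frequencies.values())  # total number of students

# Relative frequency of a class = class frequency / total frequency.
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(f"{cls}: {rf:.2f}")

# Empirical probability of event A (a score between 90 and 99).
p_a = relative["90-99"]
print(f"P(A) = {p_a:.2f}")
```

Because the relative frequencies are proportions of the same total, they add up to 1.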


Key Points about Empirical/Experimental Probability

  • It is based on actual data, not theoretical models.
  • It is a good approach when the data is from similar events in the past.
  • The more data you have, the more accurate the estimate will be.
  • It is not always perfect, as past results do not guarantee future outcomes.
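
The point about more data can be illustrated with a small simulation. The sketch below flips a simulated fair coin (true probability of heads assumed to be 0.5) and computes the empirical probability for increasing numbers of flips; the seed value is arbitrary and only makes the run reproducible.

```python
import random


def simulate_heads_probability(n_flips, rng):
    """Empirical probability of heads in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # arbitrary seed for reproducibility

estimates = {
    n: simulate_heads_probability(n, rng)
    for n in (10, 100, 10_000, 100_000)
}

for n, estimate in estimates.items():
    print(f"n = {n:>7}: empirical P(heads) = {estimate:.4f}")
```

With more flips the estimate tends to settle near 0.5, although any single run with few flips can land noticeably above or below it.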

Limitations of Empirical/Experimental Probability

  • It can be time-consuming and expensive to collect enough data.
  • It may not be representative of the future, especially if the underlying conditions change.

FAQs about Empirical/Experimental Probability

  1. Define empirical probability.
  2. How can one compute empirical probability? Write the formula of empirical probability.
  3. Give real-life examples of empirical/experimental probability.
  4. What are the limitations of empirical/experimental probability?
  5. How is empirical/experimental probability related to a frequency distribution? Explain.

Median of Ungrouped Data

Introduction to Median of Ungrouped Data

This post is about calculating the median of ungrouped data. The median is the middlemost value of a set of observations, provided that the observations are arranged in ascending or descending order. The median divides the data into two equal parts; that is its main objective.

It is important to note that the procedures for finding the median of grouped and ungrouped data are different.

The primary and secondary data can be defined as:

  1. Primary data, also called raw or ungrouped data, has not undergone any statistical procedure/method and is not in the form of a frequency distribution.
  2. Secondary data may also be called grouped data if it is in the form of a frequency distribution.

Let us discuss how to find the median for ungrouped data.

There are two cases for ungrouped data, depending on the number of observations $n$: when $n$ is odd, and when $n$ is even.

Median Calculations

The data below contains an odd number of observations.

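| Observation No. (Ascending Order) | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th | 11th |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Values | 81 | 89 | 90 | 96 | 100 | 102 | 103 | 104 | 108 | 109 | 118 |
| Observation No. (Descending Order) | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |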

Since the number of observations is odd ($n = 11$), the central value after arranging the data in ascending order is the 6th value, which is 102. That is, the median of the above data is 102.

The position of the median can be located mathematically, as follows:

\begin{align*}
\tilde{x} &= \left( \frac{n+1}{2} \right)\text{th value}\\
&= \left( \frac{11+1}{2} \right)\text{th value} = 6\text{th value}
\end{align*}

The value at the 6th position (in the sorted data) is 102. The symbol $\tilde{x}$, read as “x-tilde”, is the notation for the median.

Median for Even Numbers of Observations

Consider the following data that contains an even number of observations.

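| Observation No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Data Values | 81 | 100 | 96 | 108 | 90 | 102 | 104 | 103 | 109 | 89 |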

Data after sorting (either in ascending or descending order) is

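| Observation No. | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
|---|---|---|---|---|---|---|---|---|---|---|
| $x$ | 81 | 89 | 90 | 96 | 100 | 102 | 103 | 104 | 108 | 109 |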

Since $n=10$ is even, the central position (that is, the median) lies between the 5th and 6th values of the sorted data. The median is the average of these two central observations. The two central values are 100 and 102, so the median is their average.

$$\text{Median} = \frac{100+102}{2} = 101$$
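
Both cases can be handled by one small function. The following is a minimal sketch (the name `median_ungrouped` is illustrative) applied to the two data sets used above; it sorts the data first and then picks the middle value or the average of the two middle values.

```python
def median_ungrouped(values):
    """Median of raw (ungrouped) data."""
    data = sorted(values)          # the data must be ordered first
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index mid (0-based).
        return data[mid]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th values.
    return (data[mid - 1] + data[mid]) / 2


odd_data = [81, 89, 90, 96, 100, 102, 103, 104, 108, 109, 118]
even_data = [81, 100, 96, 108, 90, 102, 104, 103, 109, 89]

print(median_ungrouped(odd_data))   # 6th value of the sorted data
print(median_ungrouped(even_data))  # average of the 5th and 6th values
```

Python's built-in `statistics.median` follows the same rule, so it can be used as a cross-check.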

Median Formula for Large Data Sets

For data sets of any size, large or small, the position of the median can be found mathematically. The formula is different for an odd and an even number of observations.

The point to remember when computing the median is that

  • For an odd number of observations, the median is the centermost value after sorting the data
  • For an even number of observations, the median is the average of two central values after sorting the data

\begin{align*}
\tilde{x} &= \frac{1}{2} \left[ \left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1 \right)\text{th value} \right] \quad \quad \text{(when the number of observations is even)}\\
\tilde{x} &= \left(\frac{n+1}{2}\right)\text{th value} \quad \quad \text{(when the number of observations is odd)}
\end{align*}

[Figure: Median of ungrouped data (another way of writing the median formula)]

Consider a data set containing 157 observations. To compute the median, first sort the data in either ascending or descending order. The position of the median for this data is

$$\tilde{x} = \left(\frac{n+1}{2}\right)\text{th value} = \left(\frac{157+1}{2}\right)\text{th value} = 79\text{th value}.$$

The 79th observation in the sorted data will be the median of the data.

If there is an even number of observations (say $n=396$), the median will be

\begin{align*}
\tilde{x} &= \frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value} \right]\\
&=\frac{1}{2} \left[\left(\frac{396}{2}\right)\text{th value} + \left(\frac{396}{2}+1\right)\text{th value} \right]\\
&= \frac{1}{2} \left[198\text{th value} + 199\text{th value}\right]
\end{align*}

The average of the 198th and 199th values of the sorted data will be the median of the data.
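
For a large data set, it is often enough to know which positions to look at. The sketch below returns the 1-based position(s) of the median in the sorted data for a given $n$; the function name `median_positions` is an illustrative choice.

```python
def median_positions(n):
    """1-based position(s) in the sorted data that determine the median."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if n % 2 == 1:
        # Odd n: a single central position, (n + 1) / 2.
        return ((n + 1) // 2,)
    # Even n: the two central positions, n / 2 and n / 2 + 1.
    return (n // 2, n // 2 + 1)


for n in (11, 10, 157, 396):
    print(n, median_positions(n))
```

When two positions are returned, the median is the average of the values found at those two positions.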


Statistical Inference: An Introduction

Introduction to Statistical Inference

Inference means conclusion. Statistical inference is the branch of Statistics that deals with methods for drawing conclusions (inferences) about a population (called the reference population or target population) based on sample information. Statistical inference is also known as inferential statistics. Statistics has two branches: descriptive and inferential.

Statistical inference is a cornerstone of many fields. It allows researchers to make informed decisions based on data, even when they cannot study the entire population of interest. Statistical inference has two fields of study: estimation and testing of hypotheses.


Estimation

Estimation is the procedure by which we obtain an estimate of the true but unknown value of a population parameter by using the sample information taken from that population. For example, we can estimate the mean of a population by computing the mean of a sample drawn from that population.

Estimator

An estimator is a statistic (a rule or formula) whose calculated values are used to estimate a population parameter $\theta$. An estimate can be thought of as a wise guess based on the information in the data.

Estimate

An estimate is a particular realization of an estimator $\hat{\theta}$, that is, the numerical value obtained from a given sample. The symbol $\hat{\theta}$ is the notation for the sample statistic.
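
The difference between an estimator (the rule) and an estimate (a value produced by the rule) can be sketched in code. Everything below is simulated: the population, its mean of 65, its standard deviation of 3, and the seed are assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # arbitrary seed for reproducibility

# Hypothetical population of 10,000 values (simulated, not real data).
population = [rng.gauss(65, 3) for _ in range(10_000)]
theta = mean(population)  # the population parameter, usually unknown


def estimator(sample):
    """The estimator: a rule (here, the sample mean) applied to any sample."""
    return mean(sample)


# Two different random samples give two different estimates.
sample_a = rng.sample(population, 50)
sample_b = rng.sample(population, 50)
estimate_a = estimator(sample_a)
estimate_b = estimator(sample_b)

print(f"population mean (theta) = {theta:.2f}")
print(f"estimate from sample A  = {estimate_a:.2f}")
print(f"estimate from sample B  = {estimate_b:.2f}")
```

The function stays the same from sample to sample, while the number it returns changes with the sample drawn.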

Types of Estimates

An estimate can be classified as either a point estimate or an interval estimate.

Point Estimate

A point estimate is a single number that can be regarded as the most plausible value of $\theta$ (the notation for a population parameter).

Interval Estimate

An interval estimate is a range of values, stated with a level of confidence, that is expected to contain the true value of the population parameter $\theta$.
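
As a rough illustration, the sketch below computes a point estimate and an interval estimate of a population mean. It uses the large-sample normal approximation, a simulated sample of 100 values, and a 95% confidence level; all of these are assumptions made for the example, and a $t$-based interval would be more appropriate for small samples.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)  # arbitrary seed for reproducibility

# Hypothetical sample of 100 heights in inches (simulated, not real data).
sample = [rng.gauss(65, 3) for _ in range(100)]

confidence = 0.95  # ASSUMED confidence level
n = len(sample)

# Point estimate: a single number for the population mean.
point_estimate = mean(sample)

# Interval estimate: point estimate plus/minus z times the standard error.
standard_error = stdev(sample) / math.sqrt(n)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"point estimate    : {point_estimate:.2f}")
print(f"interval estimate : ({lower:.2f}, {upper:.2f})")
```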

Testing of Hypothesis

Testing of hypothesis is a procedure that enables us to decide, based on information obtained by sampling, whether to accept or reject a specific statement or hypothesis regarding the value of a parameter in a statistical problem.

Note that since we rely on samples, there is always some chance our inferences are not perfect. Statistical inference acknowledges this by incorporating concepts like probability and confidence intervals. These help us quantify the uncertainty in our estimates and test results.

Important Considerations about Testing of Hypothesis

  • Hypothesis testing does not prove anything; it provides evidence for or against a claim.
  • There is always a chance of making errors (Type I or Type II).
  • The results are specific to the chosen sample and significance level.
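
A simple test illustrates these considerations. The sketch below runs a two-sided one-sample $z$-test for a proportion under assumed inputs: 105 heads in 200 coin flips (hypothetical data), the null hypothesis that the coin is fair ($p_0 = 0.5$), and a 5% significance level. The function name is an illustrative choice, and the test relies on the normal approximation.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 (normal approximation)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 105 heads in 200 flips; H0: the coin is fair.
z, p_value = one_sample_proportion_z_test(105, 200, 0.5)

alpha = 0.05  # ASSUMED significance level
reject = p_value < alpha

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

With these inputs the evidence is too weak to reject the hypothesis of a fair coin at the 5% level, which is not the same as proving that the coin is fair. A different sample or significance level could lead to a different decision.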

FAQs about Statistical Inference

  1. Define the term estimation.
  2. Define the term estimate.
  3. Define the term estimator.
  4. Write a short note on statistical inference.
  5. What is statistical hypothesis testing?
  6. What is estimation in statistics?
  7. What are the types of estimation?
  8. Write about point estimation and interval estimation.
