Muhammad Imdad Ullah - Statistics for Data Science & Analytics

Determination of Sample Size: A Quick Tutorial

Sep 2, 2024 by Muhammad Imdad Ullah

By determination of sample size, we mean selecting an appropriate number of observations (persons, subjects) from a large group to use as a sample. The sample should be large enough that the results are statistically valid and accurately estimate the population parameters.

Importance of Determining the Sample Size

Determination of sample size is important because an appropriate sample size usually saves the time, cost, and labor involved in studying the members of the population. It also helps to select a representative sample of objects/subjects, provided that an appropriate sampling technique is used for the selection.

It is important to remember that a good sample size depends on the context and goals of the research being done. A good sample size yields reliable statistical estimates and represents the population under study accurately. In general, larger samples are considered better because they reduce sampling error. However, the larger the sample, the greater the time, cost, and labor required to collect it. The sample size directly affects the accuracy and reliability of your findings.

For a given confidence level (say $c$) and standard deviation $\sigma$, the margin of error decreases as the sample size increases.
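To see this effect numerically, here is a minimal Python sketch that evaluates the margin-of-error formula $E = Z\sigma/\sqrt{n}$ (derived in the next section) for a few sample sizes. The values $Z = 1.96$ and $\sigma = 18.4$ are borrowed from the example later in this post; the function name `margin_of_error` and the list of sample sizes are assumptions made only for illustration.

```python
import math


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """Margin of error E = z * sigma / sqrt(n) for estimating a mean."""
    return z * sigma / math.sqrt(n)


# Illustrative values: z = 1.96 (95% confidence), sigma = 18.4
for n in (50, 100, 200, 400):
    print(n, round(margin_of_error(1.96, 18.4, n), 2))
```

Because $n$ enters through $\sqrt{n}$, quadrupling the sample size only halves the margin of error.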

Determination of Sample Size and Sample Size Formula

One can determine the sample size if the maximum allowable error and the level of confidence are known. If the population standard deviation can be estimated, then the necessary sample size can be determined by solving the error formula for $n$.

The maximum allowable error is: $E=Z \frac{\sigma}{\sqrt{n}}$

By multiplying both sides with $\sqrt{n}$, we have

$E\sqrt{n} = Z \sigma$

Dividing both sides by $E$, we obtain $\sqrt{n} = \frac{Z\sigma}{E}$

Finally, squaring both sides, we get the sample size formula:
$$n=\left(\frac{Z\sigma}{E}\right)^2$$

Example: Determining Sample Size

Suppose we are interested in estimating the average weight of Pakistani men, and we want to be 95% confident that our estimate falls within $\pm 2$ lbs of the actual mean. Suppose that, according to previous studies, the population standard deviation (or its estimate) is $\sigma = 18.4$ lbs. We want to determine the sample size from this information.

According to the given information, $\alpha = 0.05$, $Z_{\alpha/2} = 1.96$, the desired maximum error is $E=2$, and the estimated $\sigma = 18.4$. Therefore,
$$n=\left(\frac{Z\sigma}{E}\right)^2 = \left( \frac{1.96 \times 18.4}{2}\right)^2 \approx 325.15$$
Since the sample size must be a whole number, we round up: the appropriate sample size for the above scenario is 326 men for the desired level of accuracy.
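The same calculation can be sketched in Python. This is only an illustration under stated assumptions: the function name `required_sample_size` is hypothetical, and the critical value is obtained from the standard library's `statistics.NormalDist` rather than rounded to 1.96 from a table.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence: float, sigma: float, max_error: float) -> int:
    """Smallest whole n satisfying z * sigma / sqrt(n) <= max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return math.ceil((z * sigma / max_error) ** 2)  # always round up


n = required_sample_size(confidence=0.95, sigma=18.4, max_error=2)
print(n)
```

Rounding up with `math.ceil` (rather than rounding to the nearest integer) keeps the margin of error at or below the requested $E$.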

Summary

Note that if the population under study is highly diverse (a heterogeneous population), a larger sample size may be necessary to ensure adequate representation of different subgroups. The type of study (e.g., survey, experiment) and the research questions can also influence the appropriate sample size. Similarly, practical constraints such as budget, time, and accessibility can limit the feasible sample size.

https://rfaqs.com

https://gmstat.com

Properties of Arithmetic Mean with Examples

Aug 31, 2024 / Aug 25, 2024 by Muhammad Imdad Ullah

In this post, we will discuss the properties of the arithmetic mean with examples.

Arithmetic Mean

The arithmetic mean, often simply referred to as the mean, is a statistical measure that represents the central value of a dataset. The arithmetic mean is calculated by summing all the values in the dataset and then dividing by the total number of observations in the data.

The Sum of Deviations From the Mean is Zero

Property 1: The sum of deviations taken from the mean is always equal to zero. Mathematically $\sum\limits_{i=1}^n (x_i-\overline{x}) = 0$

Consider the ungrouped data case.

Obs. No.	$X$	$X_i-\overline{X}$
1	81	-19
2	100	0
3	96	-4
4	108	8
5	90	-10
6	102	2
7	104	4
8	103	3
9	100	0
10	109	9
11	91	-9
12	116	16
Total	$\sum X_i = 1200$	$\sum\limits_{i=1}^n (X_i-\overline{X})=0$
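As a quick check of the table above, the deviations can be recomputed in Python. The variable names below are chosen for illustration only.

```python
# Observations from the ungrouped data table
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]

mean = sum(x) / len(x)                 # arithmetic mean
deviations = [xi - mean for xi in x]   # X_i minus the mean

print(mean)
print(deviations)
print(sum(deviations))                 # sum of deviations from the mean
```

With real-valued data the sum may differ from zero by a tiny floating-point rounding error, so in practice it is safer to compare it with zero using a small tolerance.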

For grouped data, $\sum\limits_{i=1}^k f_i(M_i -\overline{X}) =0$, where $M_i$ is the midpoint of the $i$th class, $f_i$ is its frequency, and $\overline{X} =\frac{\sum\limits_{i=1}^k M_i f_i}{\sum\limits_{i=1}^k f_i}$. Suppose we have the following grouped data:

Classes	$f$	$M$	$fM$	$f_i(M_i - \overline{X})$
65 – 85	9	75	675	$9\times (75 - 123) = -432$
85 – 105	10	95	950	$10\times (95 - 123) = -280$
105 – 125	17	115	1955	$17\times (115 - 123) = -136$
125 – 145	10	135	1350	$10\times (135 - 123) = 120$
145 – 165	5	155	775	$5\times (155 - 123) = 160$
165 – 185	4	175	700	$4\times (175 - 123) = 208$
185 – 205	5	195	975	$5\times (195 - 123) = 360$
Total	$\Sigma f = n = 60$		$\Sigma fM = 7380$	$\sum\limits_{i=1}^k f_i(M_i -\overline{X}) =0$

Mean = $\overline{X} = \frac{\Sigma fM}{\Sigma f} = \frac{7380}{60} = 123$ .
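The grouped-data version of Property 1 can be verified the same way. The sketch below uses the class frequencies and midpoints from the table above; the variable names are illustrative assumptions.

```python
# Class frequencies and midpoints from the grouped data table
freqs = [9, 10, 17, 10, 5, 4, 5]
mids = [75, 95, 115, 135, 155, 175, 195]

n = sum(freqs)
grouped_mean = sum(f * m for f, m in zip(freqs, mids)) / n

# Frequency-weighted deviations of the midpoints from the mean
weighted_devs = [f * (m - grouped_mean) for f, m in zip(freqs, mids)]

print(n, grouped_mean)
print(weighted_devs)
print(sum(weighted_devs))
```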

The Combined Mean of Different Data Sets

Property 2: If there are $k$ different sets of data, then the combined mean (average) is

\begin{align*}
\overline{X}_c &= \frac{n_1 \overline{x}_1 + n_2\overline{x}_2 +\cdots + n_k \overline{x}_k }{n_1+n_2+\cdots + n_k}\\
&=\frac{\Sigma x_1 + \Sigma x_2 + \cdots + \Sigma x_k}{n_1+n_2+\cdots + n_k}
\end{align*}

Suppose we have data on $k = 5$ groups.

Obs. No.	$X_1$	$X_2$	$X_3$	$X_4$	$X_5$
1	81	40	92	107	113
2	100	30	95	110	94
3	96	22	99	114	93
4	108	51	94	109	119
5	90		101	116	105
6	102		103	118
7	104		100	115
8	103		102
9	100		101
10	109
11	91
12	116
Sum	1200	143	887	789	524

The group means and the combined mean are
\begin{align*}
\overline{X}_1 &= \frac{\sum\limits_{i=1}^{n_1} X_{1i}}{n_1} = \frac{1200}{12} = 100\\
\overline{X}_2 &= \frac{\sum\limits_{i=1}^{n_2} X_{2i}}{n_2} = \frac{143}{4} \approx 35.8\\
\overline{X}_3 &= \frac{\sum\limits_{i=1}^{n_3} X_{3i}}{n_3} = \frac{887}{9} \approx 98.6\\
\overline{X}_4 &= \frac{\sum\limits_{i=1}^{n_4} X_{4i}}{n_4} = \frac{789}{7} \approx 112.7\\
\overline{X}_5 &= \frac{\sum\limits_{i=1}^{n_5} X_{5i}}{n_5} = \frac{524}{5} = 104.8\\
\overline{X}_c &= \frac{n_1\overline{X}_1 + n_2 \overline{X}_2 + \cdots + n_5 \overline{X}_5}{n_1+n_2+n_3+n_4+n_5}\\
&=\frac{1200 + 143 + 887 + 789 + 524}{12+4+9+7+5} =\frac{3543}{37} \approx 95.76
\end{align*}

Here each product $n_j\overline{X}_j$ is taken as the exact group sum. Using the rounded means (35.8, 98.6, 112.7) instead would give $3543.5/37 \approx 95.77$, which contains a small rounding error.

For the combined mean, the data sets need not all be ungrouped or all grouped; some data sets may be ungrouped while others are grouped.
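The combined mean of the five groups above can be computed with a short Python sketch. The dictionary layout and variable names are assumptions for illustration. Because the group means are kept at full precision here, the result can differ in the second decimal place from a hand calculation that uses means rounded to one decimal.

```python
# Observations for the five groups from the table above
groups = {
    "X1": [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116],
    "X2": [40, 30, 22, 51],
    "X3": [92, 95, 99, 94, 101, 103, 100, 102, 101],
    "X4": [107, 110, 114, 109, 116, 118, 115],
    "X5": [113, 94, 93, 119, 105],
}

sizes = {name: len(values) for name, values in groups.items()}
means = {name: sum(values) / len(values) for name, values in groups.items()}

# Combined mean as the size-weighted average of the group means
combined = sum(sizes[g] * means[g] for g in groups) / sum(sizes.values())

# Equivalent form: pool all observations and average them directly
pooled = sum(sum(v) for v in groups.values()) / sum(sizes.values())

print(means)
print(round(combined, 4), round(pooled, 4))
```

Both forms agree, which mirrors the two lines of the combined-mean formula in Property 2.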

Sum of Squared Deviations from the Mean is Always Minimum

Property 3: The sum of the squared deviations of the observations from the arithmetic mean is a minimum; that is, it is less than the sum of the squared deviations of the observations from any other value. Mathematically,

For Ungrouped Data: $\Sigma (X_i - \overline{X})^2 < \Sigma (X_i - A)^2$

For Grouped Data: $\Sigma f_i(M_i - \overline{X})^2 < \Sigma f_i(M_i - A)^2$

where $A$ is any arbitrary value, also known as the provisional mean. For this inequality to hold, $A$ must not be equal to the arithmetic mean.

Note that the difference between the sum of deviations and the sum of squared deviations is that in the sum of deviations we take the difference (subtract) of each observation from the mean and then sum all the differences. In the sum of squared deviations, we take the difference of each observation from the mean, then take the square of all the differences, and then sum all the resultant values at the end.

[Figure: Properties of Arithmetic Mean with Examples]

From the above calculations, it can be observed that $\Sigma (X_i - \overline{X})^2$ is smaller than both $\Sigma (X_i - 90)^2$ and $\Sigma (X_i - 99)^2$.
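A comparison of this kind can be reproduced in a few lines of Python. The sketch below assumes the 12 observations from the Property 1 table (mean 100); that choice of data and the helper name `sum_sq_dev` are assumptions made for illustration.

```python
# Assumed data: the 12 observations from the Property 1 table
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]
mean = sum(x) / len(x)


def sum_sq_dev(data, a):
    """Sum of squared deviations of the data from the value a."""
    return sum((xi - a) ** 2 for xi in data)


# Compare the mean with two provisional means, A = 99 and A = 90
for a in (mean, 99, 90):
    print(a, sum_sq_dev(x, a))
```

For this data set, the sum is smallest at the mean and grows as $A$ moves away from it, in line with the identity $\Sigma (X_i - A)^2 = \Sigma (X_i - \overline{X})^2 + n(\overline{X} - A)^2$.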

Not Resistant to Outliers

Property 4: The arithmetic mean is not resistant to outliers. It means that the arithmetic mean can be misleading if there are extreme values in the data.

Arithmetic Mean is Sensitive to Outliers

Property 5: The arithmetic mean is sensitive to extreme values (outliers) in the data. If there are a few very large or very small values, they can significantly influence the mean.
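A small Python sketch can illustrate this sensitivity. The first list is the Property 1 data; the second list is hypothetical and differs only in that the last value, 116, is replaced by an extreme value of 1160 (as if it were mistyped). The median is printed only as a point of comparison.

```python
from statistics import mean, median

# Property 1 data, and a hypothetical copy with one extreme value
clean = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]
with_outlier = clean[:-1] + [1160]  # 116 replaced by 1160 (made-up outlier)

print(mean(clean), median(clean))
print(mean(with_outlier), median(with_outlier))
```

A single extreme observation pulls the mean far above almost every value in the data, while the middle of the data barely moves.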

The Effect of Change in Scale and Origin

Property 6: If a constant value is added to or subtracted from each data point, the mean changes by the same amount.
Similarly, if each data point is multiplied or divided by a constant, the mean is multiplied or divided by the same constant.
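Property 6 can be checked numerically with the Property 1 data. In the sketch below, the shift constant `c = 10` and the scale constant `k = 2` are arbitrary values chosen for illustration.

```python
from statistics import mean

x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]
c = 10  # change of origin (arbitrary illustrative constant)
k = 2   # change of scale (arbitrary illustrative constant)

shifted = [xi + c for xi in x]  # add a constant to every observation
scaled = [xi * k for xi in x]   # multiply every observation by a constant

print(mean(x), mean(shifted), mean(scaled))
```

The mean of the shifted data equals the original mean plus `c`, and the mean of the scaled data equals `k` times the original mean.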

Unique Value

Property 7: For any given dataset, there is only one unique arithmetic mean.

In summary, the arithmetic mean is a widely used statistical measure (a measure of central tendency) that provides a central value for a dataset.
However, it is important to be aware of the properties of arithmetic mean and its limitations, especially when dealing with data containing outliers.

FAQs about Arithmetic Mean Properties

Explain why the sum of deviations from the mean is zero.
What is meant by a unique arithmetic mean for a data set?
What is the arithmetic mean?
Explain how the combined mean of different data sets can be computed.
Explain why the sum of squared deviations from the mean is minimum.
What is the impact of outliers on the arithmetic mean?
How does a change of scale and origin change the arithmetic mean?


Testing a Claim about a Mean Using a Large Sample: Secrets

Sep 2, 2024 / Aug 22, 2024 by Muhammad Imdad Ullah

In this post, we will learn about “Testing a claim about a Mean” using a Large sample. Before going to the main topic, we need to understand some related basics.

Hypothesis Testing

When a hypothesis test involves a claim about a population parameter (in our case, the mean/average), we draw a representative sample from the target population and compute the sample mean to test the claim about the population. If the sample drawn is large enough ($n\ge 30$), then the Central Limit Theorem (CLT) applies, and the distribution of the sample mean is approximately normal with $\mu_{\overline{x}} = \mu$ and $\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$.

Testing a Claim about a Mean

It is worth noting that $s$ and $n$ are known from the sample data, so we have a good estimate of $\sigma_{\overline{x}}$, but the population mean $\mu$ is not known to us. In fact, $\mu$ is the parameter about which we are testing a claim. To have a value for $\mu$, we will always assume that the null hypothesis is true in any hypothesis test.

It is also worth noting that the null hypothesis must be of one of the following types:

$H_0:\mu = \mu_0$
$H_0:\mu \ge \mu_0$
$H_0:\mu \le \mu_0$

where $\mu_0$ is a constant. For the purpose of the test, we will always assume that $\mu=\mu_0$.

Standardized Test Statistic

To determine whether to reject or not reject the null hypothesis, we have two methods namely (i) a standardized value and (ii) a p-value. In both cases, it will be more convenient to convert the sample mean $\overline{x}$ to a Z-score called the standardized test statistic/score.

Since we assumed that $\mu=\mu_0$, we have $\mu_{\overline{x}} =\mu_0$, and the standardized test statistic is:

$$Z = \frac{\overline{x} - \mu_{\overline{x}}}{\sigma_{\overline{x}}} = \frac{\overline{x} - \mu_0}{\frac{s}{\sqrt{n}}}$$

As long as $\mu=\mu_0$ is assumed, the standardized test statistic $Z$ follows (approximately) the standard normal distribution.

Example: Testing a Claim about an Average/ Mean

Suppose it is claimed that the average body temperature of a healthy person is less than the commonly accepted temperature of $98.6^{\circ}$F. Assume that a sample of 60 healthy persons is drawn. The average temperature of these 60 persons is $\overline{x}=98.2^{\circ}$F and the sample standard deviation is $s=1.1^{\circ}$F.

The hypotheses for the above claim would be

$H_0:\mu\ge 98.6$
$H_1:\mu < 98.6$

Note that from the alternative hypothesis, we have a left-tailed test with $\mu_0=98.6$.

Based on our sample data, the standardized test statistic is

\begin{align*}
Z &= \frac{\overline{x} - \mu_{\overline{x}}}{\frac{s}{\sqrt{n}}}\\
&=\frac{98.2 - 98.6}{\frac{1.1}{\sqrt{60}}} \approx -2.82
\end{align*}
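The test statistic above, together with the p-value mentioned earlier, can be sketched in Python using the standard library's `statistics.NormalDist`. The function name `z_statistic` is hypothetical, and the significance level `alpha = 0.05` is an assumption made only for this illustration, since the example does not state one.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """Standardized test statistic Z = (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


z = z_statistic(sample_mean=98.2, mu0=98.6, s=1.1, n=60)

# Left-tailed test (H1: mu < 98.6): p-value is the area to the left of z
p_value = NormalDist().cdf(z)

alpha = 0.05                            # assumed significance level
critical = NormalDist().inv_cdf(alpha)  # left-tail critical value

print(round(z, 2), round(p_value, 4), round(critical, 3))
print("reject H0" if z < critical else "fail to reject H0")
```

Under the assumed 5% level, both approaches lead to the same decision: the standardized value falls below the critical value, and the p-value is smaller than `alpha`.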
