Basic Statistics - Statistics for Data Science & Analytics

Properties of Arithmetic Mean with Examples

Aug 31, 2024 / Aug 25, 2024 by Muhammad Imdad Ullah

In this post, we will discuss the properties of the arithmetic mean with examples.

Arithmetic Mean

The arithmetic mean, often simply referred to as the mean, is a statistical measure that represents the central value of a dataset. The arithmetic mean is calculated by summing all the values in the dataset and then dividing by the total number of observations in the data.

The Sum of Deviations From the Mean is Zero

Property 1: The sum of deviations taken from the mean is always equal to zero. Mathematically, $\sum\limits_{i=1}^n (x_i-\overline{x}) = 0$.

Consider the ungrouped data case.

Obs. No.	$X$	$X_i-\overline{X}$
1	81	-19
2	100	0
3	96	-4
4	108	8
5	90	-10
6	102	2
7	104	4
8	103	3
9	100	0
10	109	9
11	91	-9
12	116	16
Total	$\sum X_i = 1200$	$\sum\limits_{i=1}^n (X_i-\overline{X})=0$
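As a quick check of Property 1, the following minimal Python sketch recomputes the mean and the deviations for the 12 observations in the table. The variable names are illustrative only.

```python
# Observations from the ungrouped-data table
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]

mean = sum(x) / len(x)                 # 1200 / 12
deviations = [xi - mean for xi in x]   # X_i - mean for each observation

print(mean)             # 100.0
print(sum(deviations))  # 0.0
```

The negative deviations (total -42) and the positive deviations (total +42) cancel exactly.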

For grouped data, $\sum\limits_{i=1}^k f_i(M_i -\overline{X}) =0$, where $M_i$ is the midpoint and $f_i$ the frequency of the $i$th class, $k$ is the number of classes, and $\overline{X} =\frac{\sum\limits_{i=1}^k M_i f_i}{\sum\limits_{i=1}^k f_i}$. Suppose we have the following grouped data.

Classes	$f$	$M$	$fM$	$f_i(M_i - \overline{X})$
65 – 85	9	75	675	$9\times (75 - 123) = -432$
85 – 105	10	95	950	$10\times (95 - 123) = -280$
105 – 125	17	115	1955	$17\times (115 - 123) = -136$
125 – 145	10	135	1350	$10\times (135 - 123) = 120$
145 – 165	5	155	775	$5\times (155 - 123) = 160$
165 – 185	4	175	700	$4\times (175 - 123) = 208$
185 – 205	5	195	975	$5\times (195 - 123) = 360$
Total	$\Sigma f = n = 60$		$\Sigma fM = 7380$	$\sum\limits_{i=1}^k f_i(M_i -\overline{X}) =0$

Mean = $\overline{X} = \frac{\Sigma fM}{\Sigma f} = \frac{7380}{60} = 123$ .
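The grouped-data version of Property 1 can be checked the same way. This is a minimal sketch that uses the class midpoints and frequencies from the table; the variable names are illustrative.

```python
# Class frequencies and midpoints from the grouped-data table
f = [9, 10, 17, 10, 5, 4, 5]
m = [75, 95, 115, 135, 155, 175, 195]

n = sum(f)                                                # total frequency
grouped_mean = sum(fi * mi for fi, mi in zip(f, m)) / n   # 7380 / 60

# Frequency-weighted deviations of the midpoints from the grouped mean
weighted_devs = [fi * (mi - grouped_mean) for fi, mi in zip(f, m)]

print(grouped_mean)        # 123.0
print(sum(weighted_devs))  # 0.0
```

The first three classes contribute -848 in total and the last four contribute +848, so the weighted deviations sum to zero.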

The Combined Mean of Different Data Sets

Property 2: If there are $k$ different sets of data, then the combined mean (average) is

\begin{align*}
\overline{X}_c &= \frac{n_1 \overline{x}_1 + n_2\overline{x}_2 +\cdots + n_k \overline{x}_k }{n_1+n_2+\cdots + n_k}\\
&=\frac{\Sigma x_1 + \Sigma x_2 + \cdots + \Sigma x_k}{n_1+n_2+\cdots + n_k}
\end{align*}

Suppose we have data for $k = 5$ groups.

Obs. No.	$X_1$	$X_2$	$X_3$	$X_4$	$X_5$
1	81	40	92	107	113
2	100	30	95	110	94
3	96	22	99	114	93
4	108	51	94	109	119
5	90		101	116	105
6	102		103	118
7	104		100	115
8	103		102
9	100		101
10	109
11	91
12	116
Sum	1200	143	887	789	524

The group means and the combined mean are
\begin{align*}
\overline{X}_1 &= \frac{\Sigma X_1}{n_1} = \frac{1200}{12} = 100\\
\overline{X}_2 &= \frac{\Sigma X_2}{n_2} = \frac{143}{4} = 35.75\\
\overline{X}_3 &= \frac{\Sigma X_3}{n_3} = \frac{887}{9} \approx 98.56\\
\overline{X}_4 &= \frac{\Sigma X_4}{n_4} = \frac{789}{7} \approx 112.71\\
\overline{X}_5 &= \frac{\Sigma X_5}{n_5} = \frac{524}{5} = 104.8\\
\overline{X}_c &= \frac{n_1\overline{X}_1 + n_2 \overline{X}_2 + \cdots + n_5 \overline{X}_5}{n_1+n_2+n_3+n_4+n_5}\\
&=\frac{1200 + 143 + 887 + 789 + 524}{12+4+9+7+5} =\frac{3543}{37} \approx 95.76
\end{align*}

For the combined mean, the data sets need not all be ungrouped or all be grouped: some may be ungrouped and others grouped, because only each set's size $n_i$ and mean $\overline{x}_i$ are required.
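The combined-mean formula can be sketched in Python for the five groups in the table. The dictionary layout and names are illustrative. Working from unrounded group means gives $3543/37 \approx 95.757$; rounding each group mean to one decimal place first, as in a hand calculation, gives 95.7703 instead.

```python
# Observations for the five groups in the table
groups = {
    "X1": [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116],
    "X2": [40, 30, 22, 51],
    "X3": [92, 95, 99, 94, 101, 103, 100, 102, 101],
    "X4": [107, 110, 114, 109, 116, 118, 115],
    "X5": [113, 94, 93, 119, 105],
}

sizes = {name: len(values) for name, values in groups.items()}
means = {name: sum(values) / len(values) for name, values in groups.items()}
total_n = sum(sizes.values())

# Weighted form: sum of n_i * mean_i divided by the total number of observations
combined_mean = sum(sizes[name] * means[name] for name in groups) / total_n

# Equivalent form: pool all observations and take one mean
pooled_mean = sum(sum(values) for values in groups.values()) / total_n

print(total_n)                  # 37
print(round(combined_mean, 4))  # 95.7568
print(round(pooled_mean, 4))    # 95.7568
```

Both forms agree because $n_i \overline{x}_i$ is simply the sum of the observations in group $i$.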

Sum of Squared Deviations from the Mean is Minimum

Property 3: The sum of the squared deviations of the observations from the arithmetic mean is a minimum, that is, it is less than the sum of the squared deviations of the observations from any other value. Mathematically,

For Ungrouped Data: $\Sigma (X_i - \overline{X})^2 < \Sigma (X_i - A)^2$

For Grouped Data: $\Sigma f_i(M_i - \overline{X})^2 < \Sigma f_i(M_i - A)^2$

where $A$ is any arbitrary value, also known as provisional mean. For this condition, $A$ is not equal to the arithmetic mean.
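A short derivation, using only Property 1, shows why the inequality holds. Writing $X_i - A = (X_i - \overline{X}) + (\overline{X} - A)$ and expanding the square (shown here for ungrouped data with $n$ observations):

\begin{align*}
\Sigma (X_i - A)^2 &= \Sigma \left[(X_i - \overline{X}) + (\overline{X} - A)\right]^2\\
&= \Sigma (X_i - \overline{X})^2 + 2(\overline{X} - A)\,\Sigma (X_i - \overline{X}) + n(\overline{X} - A)^2\\
&= \Sigma (X_i - \overline{X})^2 + n(\overline{X} - A)^2
\end{align*}

The middle term vanishes because $\Sigma (X_i - \overline{X}) = 0$, and $n(\overline{X} - A)^2 > 0$ whenever $A \ne \overline{X}$. The sum about $A$ therefore exceeds the sum about the mean by exactly $n(\overline{X} - A)^2$.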

Note the difference between the sum of deviations and the sum of squared deviations. For the sum of deviations, we subtract the mean from each observation and add up the differences. For the sum of squared deviations, we subtract the mean from each observation, square each difference, and then add up the squares.

Figure: Properties of Arithmetic Mean with Examples

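From the above calculations, it can be observed that the sum of squared deviations is smallest when taken about the mean. For the 12 observations in the Property 1 table ($\overline{X} = 100$), $\Sigma (X_i - \overline{X})^2 = 988 < \Sigma (X_i - 99)^2 = 1000 < \Sigma (X_i - 90)^2 = 2188$.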
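The comparison can be reproduced with a short Python sketch. It assumes the calculations refer to the 12 observations from the Property 1 table (mean 100); the helper `ss_about` is a hypothetical name introduced here for illustration.

```python
# Observations from the Property 1 table (assumed to be the data in question)
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]
mean = sum(x) / len(x)

def ss_about(a):
    """Sum of squared deviations of x about the value a."""
    return sum((xi - a) ** 2 for xi in x)

ss_mean = ss_about(mean)  # about the arithmetic mean
ss_99 = ss_about(99)      # about the provisional mean A = 99
ss_90 = ss_about(90)      # about the provisional mean A = 90

print(ss_mean, ss_99, ss_90)  # 988.0 1000 2188
```

The further the provisional mean $A$ lies from the arithmetic mean, the larger the sum of squared deviations becomes.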

Not Resistant to Outliers

Property 4: The arithmetic mean is not resistant to outliers. It means that the arithmetic mean can be misleading if there are extreme values in the data.

Arithmetic Mean is Sensitive to Outliers

Property 5: The arithmetic mean is sensitive to extreme values (outliers) in the data. If there are a few very large or very small values, they can significantly influence the mean.
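A small Python sketch illustrates this sensitivity. The outlier is hypothetical: it assumes the last observation of the Property 1 data (116) was recorded as 1160. The median is shown only for comparison.

```python
from statistics import mean, median

# Observations from the Property 1 table
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]

# Hypothetical data-entry error: 116 recorded as 1160
x_outlier = x[:-1] + [1160]

print(mean(x), mean(x_outlier))      # the mean moves from 100 to 187
print(median(x), median(x_outlier))  # the median stays at 101 in both cases
```

A single extreme value shifts the mean by 87 units, while the median of the same data does not change.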

The Effect of Change in Scale and Origin

Property 6: If a constant is added to or subtracted from each data point, the mean changes by the same amount.
Similarly, if each data point is multiplied or divided by a (nonzero) constant, the mean is multiplied or divided by the same constant.
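The effect of a change of origin and scale can be checked numerically. In this minimal sketch the constants 10, 2, and 4 are arbitrary choices for illustration, and `mean_of` is a hypothetical helper name.

```python
# Observations from the Property 1 table
x = [81, 100, 96, 108, 90, 102, 104, 103, 100, 109, 91, 116]

def mean_of(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

shifted = [xi + 10 for xi in x]   # change of origin: add 10 to every value
scaled = [xi * 2 for xi in x]     # change of scale: multiply every value by 2
divided = [xi / 4 for xi in x]    # change of scale: divide every value by 4

print(mean_of(x))        # 100.0
print(mean_of(shifted))  # 110.0  (mean + 10)
print(mean_of(scaled))   # 200.0  (mean * 2)
print(mean_of(divided))  # 25.0   (mean / 4)
```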

Unique Value

Property 7: For any given dataset, there is only one unique arithmetic mean.

In summary, the arithmetic mean is a widely used statistical measure (a measure of central tendency) that provides a central value for a dataset.
However, it is important to be aware of the properties of arithmetic mean and its limitations, especially when dealing with data containing outliers.

FAQs about Arithmetic Mean Properties

Explain why the sum of deviations from the mean is zero.
What is meant by a unique arithmetic mean for a data set?
What is the arithmetic mean?
Explain how the combined mean of different data sets can be computed.
Why is the sum of squared deviations from the mean a minimum?
What is the impact of outliers on the arithmetic mean?
How does a change of scale and origin change the arithmetic mean?

https://rfaqs.com

https://gmstat.com

One Factor Design: A Comprehensive Guide

Aug 10, 2024 / Aug 8, 2024 by Muhammad Imdad Ullah

One Factor Design: An Introduction

A one factor design, analyzed with a one-way ANOVA, is a statistical method used to determine whether there are significant differences between the means of multiple groups. In this design, there is one independent variable (factor) with multiple levels or categories.

Suppose $y_{ij}$ is the response to the $i$th treatment for the $j$th experimental unit, where $i=1,2,\cdots, I$. The statistical model for a completely randomized one-factor design that leads to a one-way ANOVA is

$$y_{ij} = \mu_i + e_{ij}$$

where $\mu_i$ is the unknown (population) mean for all potential responses to the $i$th treatment, and $e_{ij}$ is the error (deviation of the response from population mean).

The responses within and across treatments are assumed to be independent and normally distributed random variables with constant variance.

One Factor Design’s Statistical Model

Let $\mu = \frac{1}{I} \sum \limits_{i} \mu_i$ be the grand mean or average of the population means. Let $\alpha_i=\mu_i-\mu$ be the $i$th group treatment effect. The treatment effects are constrained to add to zero ($\alpha_1+\alpha_2+\cdots+\alpha_I=0$) and measure the difference between the treatment population means and the grand mean.

Therefore, the one-way ANOVA model is $$y_{ij} = \mu + \alpha_i + e_{ij}$$

$$Response = \text{Grand Mean} + \text{Treatment Effect} + \text{Residuals}$$
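The decomposition can be illustrated with sample estimates. The following Python sketch uses hypothetical responses for three treatments (labelled T1, T2, T3; none of these values come from the article) and estimates the grand mean as the average of the group means, mirroring the definition of $\mu$.

```python
# Hypothetical responses for three treatments (balanced: 3 units each)
data = {
    "T1": [78, 82, 80],
    "T2": [85, 88, 91],
    "T3": [70, 75, 74],
}

# Sample estimates of the treatment means mu_i
group_means = {t: sum(ys) / len(ys) for t, ys in data.items()}

# Estimate of the grand mean mu: average of the treatment means
grand_mean = sum(group_means.values()) / len(group_means)

# Estimated treatment effects alpha_i = mu_i - mu
effects = {t: m - grand_mean for t, m in group_means.items()}

# Residuals e_ij = y_ij - mu_i
residuals = {t: [y - group_means[t] for y in ys] for t, ys in data.items()}

print(group_means)            # {'T1': 80.0, 'T2': 88.0, 'T3': 73.0}
print(sum(effects.values()))  # zero up to floating-point rounding

# Every response is recovered as grand mean + treatment effect + residual
for t, ys in data.items():
    for y, e in zip(ys, residuals[t]):
        assert abs(y - (grand_mean + effects[t] + e)) < 1e-9
```

The estimated effects add to zero by construction, which matches the constraint $\alpha_1+\alpha_2+\cdots+\alpha_I=0$.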

From this model, the hypothesis of interest is whether the population means are equal:

$$H_0:\mu_1=\mu_2= \cdots = \mu_I$$

The hypothesis is equivalent to $H_0:\alpha_1 = \alpha_2 =\cdots = \alpha_I=0$. If $H_0$ is true, then the one-way ANOVA model is

$$ y_{ij} = \mu + e_{ij}$$ where $\mu$ is the common population mean.

One Factor Design Example

Let’s say you want to compare the average test scores of students from three different teaching methods (Method $A$, Method $B$, and Method $C$).

Independent variable: Teaching method (with three levels: $A, B, C$)
Dependent variable: Test scores
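For the teaching-method example, the one-way ANOVA $F$ statistic can be computed by hand as the ratio of the between-group to the within-group mean square. The scores below are hypothetical and chosen only for illustration; this is a sketch of the standard calculation, not the author's own code.

```python
# Hypothetical test scores for three teaching methods (5 students each)
scores = {
    "A": [78, 82, 80, 85, 75],
    "B": [85, 88, 91, 84, 87],
    "C": [70, 75, 74, 72, 69],
}

all_scores = [y for ys in scores.values() for y in ys]
n_total = len(all_scores)   # total number of observations
k = len(scores)             # number of groups (levels of the factor)

grand_mean = sum(all_scores) / n_total
group_means = {g: sum(ys) / len(ys) for g, ys in scores.items()}

# Between-group sum of squares: group sizes times squared distance of group means from the grand mean
ss_between = sum(len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in scores.items())

# Within-group sum of squares: squared deviations of scores from their own group mean
ss_within = sum((y - group_means[g]) ** 2 for g, ys in scores.items() for y in ys)

ms_between = ss_between / (k - 1)        # df = k - 1 = 2
ms_within = ss_within / (n_total - k)    # df = N - k = 12
f_stat = ms_between / ms_within

print(group_means)       # {'A': 80.0, 'B': 87.0, 'C': 72.0}
print(round(f_stat, 2))  # 29.65
```

The statistic is compared with the $F$ distribution with $k-1$ and $N-k$ degrees of freedom. If SciPy is available, `scipy.stats.f_oneway` reports the same statistic together with a p-value.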

When to Use a One Factor Design

Comparing means of multiple groups: When one wants to determine if there are significant differences in the mean of a dependent variable across different groups or levels of a factor.
Exploring the effect of a categorical variable: When one wants to investigate how a categorical variable influences a continuous outcome.

Assumptions of One-Factor ANOVA

Normality: The data within each group should be normally distributed.
Homogeneity of variance (Equality of Variances): The variances of the populations from which the samples are drawn should be equal.
Independence: The observations within each group should be independent of each other.

When to Use One Factor Design

When one wants to compare the means of multiple groups.
When the independent variable has three or more levels (with only two levels, the one-way ANOVA is equivalent to the equal-variance two-sample $t$-test).
When the dependent variable is continuous (e.g., numerical).

Note that if the null hypothesis is rejected, one can perform post-hoc tests (for example, Tukey’s HSD or Bonferroni) to determine which specific groups differ significantly from each other.
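As a minimal sketch of the Bonferroni idea for three groups: the number of pairwise comparisons is counted, and the significance level is divided by that count. The overall level of 0.05 is an assumption made for illustration.

```python
from itertools import combinations

groups = ["A", "B", "C"]   # levels of the factor
alpha = 0.05               # assumed overall (family-wise) significance level

# All pairwise comparisons between groups: I(I-1)/2 of them
pairs = list(combinations(groups, 2))

# Bonferroni correction: test each pair at alpha divided by the number of comparisons
alpha_per_test = alpha / len(pairs)

print(pairs)                     # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(round(alpha_per_test, 4))  # 0.0167
```

Each pairwise test is then judged against the smaller per-comparison level rather than against 0.05.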

Figure: One Factor Design, Design of Experiments

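Remember: While one-factor designs are useful for comparing multiple groups, a significant result by itself only shows that the group means differ. It supports a causal conclusion only when the treatments were randomly assigned to the experimental units, as in a completely randomized design.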


Importance of Statistics in Various Disciplines

Aug 6, 2024 by Muhammad Imdad Ullah

Introduction to the Importance of Statistics

Statistics is used as a tool to make appropriate decisions in the face of uncertainty. We all apply statistical concepts in our daily lives, whether or not we have a formal education. Therefore, the importance of Statistics cannot be ignored.

The information collected in the form of data (observation) from any field/discipline will almost always involve some sort of variability or uncertainty, so this subject has applications in almost all fields of research. The researchers use statistics in the analysis, interpretation, and communication of their research findings.

Some examples of the questions which statistics might help to answer with appropriate data are:

How much better yield of wheat do we get if we use a new fertilizer as opposed to a commonly used fertilizer?
Are the company’s sales likely to increase next year compared with the previous year?
What dose of insecticide successfully controls an insect population?
What is the likely weather in the coming season?

Figure: Importance of Statistics (source: https://www.intellspot.com/applications-of-statistics/)

Application of Statistics

Statistical techniques are powerful tools for analyzing numerical data and are used in almost every branch of learning. Statistics plays a vital role in every field of human activity. It has an important role in determining a country's existing position with respect to per capita income, unemployment, population growth rate, housing, schooling, medical facilities, etc. Statistics now holds a central position in almost every field: Industry, Commerce, Biological and Physical Sciences, Genetics, Agronomy, Anthropometry, Astronomy, Geology, Psychology, Sociology, etc. are the main areas where statistical techniques have been developed and are being used increasingly.

Statistics has applications in almost every field where research is carried out and findings are reported. The application of statistics in different fields is as follows:

Social Sciences

In social sciences, one of the major objectives is to establish the relationship that exists between certain variables. This is achieved by postulating hypotheses and testing them using different statistical techniques. Most areas of the economy can be studied with econometric models because these help in forecasting, and forecasts are important for future planning.

Plant Sciences

The most important aspect of statistics in plant sciences is its role in the efficient planning of experiments and drawing valid conclusions. A technique in statistics known as “Design of Experiments” helps introduce new varieties. Optimum plot sizes can be worked out for different crops like wheat, cotton, sugarcane, and others under different environmental conditions using statistical techniques.

Physical Sciences

The application of statistics in physical sciences is widely accepted. Researchers use these methods in the analysis, interpretation, and communication of their research findings. Linear and nonlinear regression models are used to study cause-and-effect relationships between different variables. Computers have also facilitated experimentation: it is now possible to simulate a process rather than run the experiment.

Medical Sciences

The interest may be in the effectiveness of new drugs, the effect of environmental factors, heritability, standardization of various records, and other related problems. Here statistics comes to the rescue. It helps to plan the next investigation so that trustworthy information is obtained from limited resources. It also helps to analyze the current data and integrate the information with what was previously known.

How statistics is used by banks, insurance companies, business and economic planning and administration, accounting and controlling of events, construction companies, and politicians

Banks

Banks are a very important part of a country's economy. They operate on the assumption that all depositors will not withdraw their money on the same day. Bankers use probability theory to estimate deposits and withdrawal claims.

Insurance Companies

Insurance companies play an important role in economic progress. These companies collect premiums from people. They estimate the death rate, accident rate, and average life expectancy from life tables, and the monthly premium is decided based on these rates.

Business

Business planning for the future, such as the price, quality, quantity, and demand of a particular product, is very important. With statistics, businesspeople can make better decisions about the location of the business, marketing of the products, financial resources, etc. Statistics helps a business plan production according to the tastes of its customers, and the quality of the products can also be checked more efficiently using statistical methods.

The relationship between supply and demand is a very important topic of everyday life. The changes in prices and demands are studied by index numbers. The relation between supply and demand is determined by correlation and regression.

Economic Planning

Economic planning for the future is a very important problem for economists. For example, the opening of new educational institutions such as schools and colleges, the revision of employees' pay scales, the construction of new hospitals, and the preparation of government budgets all require estimates for some future time. This is called forecasting, and it is done by regression analysis. The different sources of earnings, the planning of projects, and the forecasting of economic trends are also handled with various statistical techniques.

Accounting and Controlling of Events

Many events are recorded, for example, births, deaths, imports, exports, and the crops grown by farmers. These are recorded as statistical data and analyzed to make better policies for the nation.

Administrator

An administrator whether in the public or private sector leans on statistical data to provide a factual basis for appropriate decisions.

Politician

A politician uses statistics to lend support and credence to his arguments while elucidating the problems he handles.

Construction Companies

All kinds of construction companies start and run their programs after making judgments about the total cost of the project (job, work). To predict the expenditure, the statistical technique of estimation is used.

Biology

In biology, correlation and regression are used for the analysis of hereditary relations. To classify organisms into different classes according to their age, height, weight, hair color, eyebrow color, etc., the rules of classification and tabulation of statistics are used.
