Multivariate Analysis (2012)

The term Multivariate Analysis covers all statistical methods in which more than two variables are analyzed simultaneously.

Multivariate analysis is based upon an underlying probability model known as the Multivariate Normal Distribution (MND). The objectives of scientific investigation to which multivariate methods most naturally lend themselves include the following.


Objectives of Multivariate Analysis

The following are some basic objectives of multivariate analysis.

  • Data reduction or structural simplification
    The phenomenon being studied is represented as simply as possible without sacrificing valuable information. It is hoped that this will make interpretation easier.
  • Sorting and Grouping
    Graphs of similar objects or variables are created, based on measured characteristics. Alternatively, rules for classifying objects into well-defined groups may be required.
  • Investigation of the dependence among variables
    The nature of the relationships among variables is of interest. Are all the variables mutually independent or are one or more variables dependent based on observation of the other variables?
  • Prediction
    Relationships between variables must be determined for predicting the values of one or more variables based on observation of the other variables.
  • Hypothesis construction and testing
    Specific statistical hypotheses, formulated in terms of the parameters of the multivariate population, are tested. This may be done to validate assumptions or to reinforce prior convictions.

Applications: Multivariate analysis is used in various fields:

  • Social sciences (understanding factors influencing voting behavior)
  • Business (analyzing customer demographics and purchase patterns)
  • Finance (evaluating risk factors in investment portfolios)
  • Natural sciences (studying the relationships between different environmental variables)

Multivariate Data Sets

We are concerned with analyzing measurements made on several variables or characteristics. These measurements (data) frequently must be arranged and displayed in various ways (graphs, tabular form, etc.). Preliminary concepts underlying these first steps of data organization are given below.

Array

Multivariate data arise whenever an investigator, seeking to understand a social or physical phenomenon, selects $p \ge 1$ variables or characteristics to record. The values of these variables are recorded for each distinct item, individual, or experimental unit.

The notation $x_{jk}$ is used to indicate the particular value of the $k$th variable observed on the $j$th item or trial; i.e., $x_{jk}$ is the measurement of the $k$th variable on the $j$th item. So, $n$ measurements on $p$ variables can be displayed as

\[\begin{array}{ccccccc}
 & V_1 & V_2  & \dots  & V_k & \dots  & V_p \\
Item 1 & x_{11} & x_{12} & \dots  & x_{1k} & \dots  & x_{1p} \\
Item 2 & x_{21} & x_{22} & \dots  & x_{2k} & \dots  & x_{2p} \\
\vdots & \vdots  & \vdots  & \vdots & \vdots   & \vdots & \vdots  \\
Item j  & x_{j1}   & x_{j2} & \dots  & x_{jk} & \dots  & x_{jp} \\
\vdots &  \vdots & \vdots & \vdots & \vdots   & \vdots & \vdots  \\
Item n & x_{n1} & x_{n2} & \dots  & x_{nk} & \dots  & x_{np} \\
\end{array}\]

These data can be displayed as a rectangular array $X$ of $n$ rows and $p$ columns:

\[X=\begin{pmatrix}
x_{11}     & x_{12} & \dots  & x_{1k}  & \dots  & x_{1p} \\
x_{21}     & x_{22} & \dots  & x_{2k}  & \dots  & x_{2p} \\
\vdots & \vdots & \ddots  & \vdots &  & \vdots  \\
x_{j1}     & x_{j2} & \dots  & x_{jk}  & \dots  & x_{jp} \\
\vdots  & \vdots &   & \vdots & \ddots & \vdots  \\
x_{n1}     & x_{n2} & \dots  & x_{nk}  & \dots  & x_{np}
\end{pmatrix}\]

This $X$ array contains the data consisting of all of the observations on all of the variables.

Example: Suppose we have data on four sales, recording the total amount of each sale (variable 1) and the number of books sold (variable 2).

Variable 1 (Sales in Dollars)
\[\begin{array}{ccccc}
Data Values: & 42 & 52 & 48 & 63 \\
Notation: & x_{11} & x_{21} & x_{31} & x_{41}
\end{array}\]

Variable 2 (Number of Books sold)
\[\begin{array}{ccccc}
Data Values: & 4 & 2 & 8 & 3 \\
Notation: & x_{12} & x_{22} & x_{32} & x_{42}
\end{array}\]
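As a quick illustration (Python is used here only because the values are small enough to check by hand), the two variables can be stored as a $4 \times 2$ array so that $x_{jk}$ corresponds to `X[j-1][k-1]` with 0-based indexing:

```python
# Book-sales example as a 4 x 2 data array (n = 4 items, p = 2 variables).
# x_jk (1-based, as in the text) maps to X[j-1][k-1] in 0-based Python.
X = [[42, 4],   # item 1: sale in dollars, books sold
     [52, 2],   # item 2
     [48, 8],   # item 3
     [63, 3]]   # item 4

x_31 = X[2][0]  # third sale, variable 1 (dollars) -> 48
x_32 = X[2][1]  # third sale, variable 2 (books)   -> 8
```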


The information available in a multivariate data set can be assessed by calculating certain summary numbers, known as multivariate descriptive statistics: for example, the sample mean (a measure of location) and the average of the squared distances of all of the numbers from the mean (a measure of spread or variation).
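A minimal sketch of these summary numbers, reusing the hypothetical book-sales data above (the divisor $n$ gives the population form of the variance):

```python
# Column-wise mean and variance for an n x p data matrix (book-sales example).
X = [[42, 4], [52, 2], [48, 8], [63, 3]]
n, p = len(X), len(X[0])

# Sample mean of each variable: a measure of location.
means = [sum(row[k] for row in X) / n for k in range(p)]

# Average squared distance from the mean: a measure of spread.
variances = [sum((row[k] - means[k]) ** 2 for row in X) / n for k in range(p)]

print(means, variances)
```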


Measure of Dispersion or Variability (2012)

Introduction to Measure of Dispersion

A measure of location (average, central tendency) alone is not sufficient to describe the characteristics of a distribution, because two or more distributions may have averages that are exactly alike even though the distributions are dissimilar in other respects; a measure of central tendency represents only the typical value of the data set. To give a sensible description of data, we also need a numerical quantity called a measure of dispersion (variability or scatter) that describes the spread of the values in a data set. There are two types of measures of dispersion or variability:

  1. Absolute Measures
  2. Relative Measures

A measure of central tendency together with a measure of dispersion gives a more adequate description of data than a measure of location alone, because averages describe only the balancing point of the data set; they provide no information about the degree to which the data spread or scatter about the average value. A measure of dispersion thus indicates how representative the measure of central tendency is: the smaller the variability of a given data set, the better the average represents the data.

Absolute Measure of Dispersion

Absolute measures are defined in such a way that they have units such as meters, grams, etc., the same as those of the original measurements. Absolute measures cannot be used to compare the variation/spread of two or more data sets measured in different units.
The most common absolute measures of variability are:

  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation (and Variance)
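A short sketch computing common absolute measures for a small hypothetical sample (the quartile deviation is omitted because quartile conventions vary; the standard deviation uses the population divisor $n$):

```python
# Common absolute measures of dispersion; each keeps the units of the data.
data = [42, 52, 48, 63]   # hypothetical sample
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)                         # Range
mean_dev = sum(abs(x - mean) for x in data) / n            # Mean deviation
std_dev = (sum((x - mean) ** 2 for x in data) / n) ** 0.5  # Standard deviation

print(data_range, mean_dev, round(std_dev, 4))
```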

Relative Measures of Dispersion

The relative measures have no units as these are ratios, coefficients, or percentages. Relative measures are independent of units of measurement and are useful for comparing data of different natures.

  • Coefficient of Variation
  • Coefficient of Mean Deviation
  • Coefficient of Quartile Deviation
  • Coefficient of Standard Deviation
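For example, the coefficient of variation expresses the standard deviation as a percentage of the mean, which makes it unit-free. A sketch with hypothetical data, using the population standard deviation from Python's `statistics` module:

```python
from statistics import mean, pstdev

# Coefficient of variation: a ratio, so data in different units can be compared.
data = [42, 52, 48, 63]   # hypothetical sample
cv = pstdev(data) / mean(data) * 100   # expressed as a percentage

print(round(cv, 2))
```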

Different terms are used for the measure of dispersion or variability, such as variability, spread, scatter, measure of uncertainty, and deviation.



Testing of Hypothesis (2012)

Introduction

The objective of testing hypotheses (Testing of Statistical Hypothesis) is to determine if an assumption about some characteristic (parameter) of a population is supported by the information obtained from the sample.

Testing of Hypothesis

The terms hypothesis testing and testing of hypothesis are used interchangeably. A statistical hypothesis (as opposed to an everyday, non-statistical hypothesis) is a statement about a characteristic of one or more populations, such as the population mean. This statement may or may not be true; its validity is checked based on information obtained by sampling from the population.
Testing of Hypothesis refers to the formal procedures used by statisticians to accept or reject statistical hypotheses that include:

i) Formulation of Null and Alternative Hypothesis

Null hypothesis

A hypothesis formulated for the sole purpose of rejecting or nullifying it is called the null hypothesis, usually denoted by $H_0$. There is usually a “not” or a “no” term in the null hypothesis, meaning that there is “no change”.

For example, the null hypothesis may be that the mean age of M.Sc. students is 20 years. Statistically, it can be written as $H_0: \mu = 20$. Generally speaking, the null hypothesis is developed for the purpose of testing.
We should emphasize that if the null hypothesis is not rejected based on the sample data, we cannot say that the null hypothesis is true. In other words, failing to reject the null hypothesis does not prove that $H_0$ is true; it means that we have failed to disprove $H_0$.

For the null hypothesis, we usually state that “there is no significant difference between A and B”, for example, “the mean tensile strength of copper wire is not significantly different from some standard”.

Alternative Hypothesis

Any hypothesis different from the null hypothesis is called an alternative hypothesis, denoted by $H_1$. Equivalently, the alternative hypothesis is the statement that is accepted if the sample data provide sufficient evidence that the null hypothesis is false. The alternative hypothesis is also referred to as the research hypothesis.

It is important to remember that no matter how the problem is stated, the null hypothesis will always contain the equal sign, and the equal sign will never appear in the alternative hypothesis. This is because the null hypothesis is the statement being tested and we need a specific value to include in our calculations. The alternative hypothesis for the example given in the null hypothesis is $H_1: \mu \ne 20$.

Simple and Composite Hypothesis

If a statistical hypothesis completely specifies the form of the distribution as well as the values of all parameters, it is called a simple hypothesis. For example, suppose the age distribution of first-year college students follows $N(16, 25)$; then the null hypothesis $H_0: \mu = 16$ is a simple hypothesis. If a statistical hypothesis does not completely specify the form of the distribution, it is called a composite hypothesis, for example, $H_1: \mu < 16$ or $H_1: \mu > 16$.

ii) Level of Significance

The level of significance (significance level) is denoted by the Greek letter alpha ($\alpha$). It is also called the level of risk (as there is the risk you take of rejecting the null hypothesis when it is true). The level of significance is defined as the probability of making a type-I error. It is the maximum probability with which we would be willing to risk a type-I error. It is usually specified before any sample is drawn so that the results obtained will not influence our choice.

In practice, the 10% (0.10), 5% (0.05), and 1% (0.01) levels of significance are used in testing a given hypothesis. A 5% level of significance means that there are about 5 chances out of 100 that we would reject a true null hypothesis, i.e., we are 95% confident that we have made the right decision. A hypothesis rejected at the 0.05 level of significance means that we could be wrong with a probability of 0.05.

Selection of Level of Significance

In Testing of Hypothesis, the selection of the level of significance depends on the field of study. Traditionally, the 0.05 level is selected for business and science-related problems, 0.01 for quality assurance, and 0.10 for political polling and the social sciences.

Type-I and Type-II Errors

Whenever we accept or reject a statistical hypothesis based on sample data, there are always some chances of making an incorrect decision. Accepting a true null hypothesis or rejecting a false null hypothesis is a correct decision; accepting a false null hypothesis or rejecting a true one is an incorrect decision. The two kinds of incorrect decision are called type-I and type-II errors.
Type-I error: rejecting the null hypothesis ($H_0$) when it is true.
Type-II error: accepting (failing to reject) the null hypothesis when it is false (i.e., when $H_1$ is true).

iii) Test Statistics

The third step of Testing the Hypothesis is a procedure that enables us to decide whether to accept or reject the hypothesis, or to determine whether observed samples differ significantly from expected results. Such procedures are called tests of hypothesis, tests of significance, or rules of decision. We can also say that a test statistic is a value calculated from sample information, used to determine whether to reject the null hypothesis.

The test statistic for the mean $\mu$ when $\sigma$ is known is $Z= \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$, where the Z-value is based on the sampling distribution of $\bar{X}$, which follows the normal distribution with mean $\mu_{\bar{X}}$ equal to $\mu$ and standard deviation $\sigma_{\bar{X}}$ equal to $\sigma/\sqrt{n}$. Thus we determine whether the difference between $\bar{X}$ and $\mu$ is statistically significant by finding the number of standard deviations $\bar{X}$ is from $\mu$ using the Z statistic. Other test statistics are also available, such as $t$, $F$, and $\chi^2$.
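A numerical sketch of this Z statistic with hypothetical values (sample mean 21.5 from $n = 36$ observations, $\sigma = 4$, testing $H_0: \mu = 20$):

```python
from math import sqrt

# Z test statistic for a mean with known sigma (hypothetical numbers).
xbar, mu0, sigma, n = 21.5, 20, 4, 36

# Z = (xbar - mu0) / (sigma / sqrt(n)): standard deviations of xbar from mu0.
z = (xbar - mu0) / (sigma / sqrt(n))

print(round(z, 2))
```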

iv) Critical Region (Formulating Decision Rule)

It must be decided, before the sample is drawn, under what conditions (circumstances) the null hypothesis will be rejected. A dividing line must be drawn separating “probable” from “improbable” sample values, given that the null hypothesis is true. In other words, a decision rule must be formulated specifying the conditions under which the null hypothesis should or should not be rejected. This dividing line defines the region of rejection: those values so large or so small that their probability of occurrence under the null hypothesis is rather remote. The set of possible values of the sample statistic that leads to rejecting the null hypothesis is called the critical region.


One-tailed and two-tailed tests of significance

In testing of hypothesis, if the rejection region lies on only one tail (left or right) of the curve, the test is called one-tailed. This happens when the null hypothesis is tested against an alternative hypothesis of the “greater than” or “less than” type.

If the rejection region lies on both tails of the curve, the test is called two-tailed. This happens when the null hypothesis is tested against an alternative hypothesis of the “not equal to” type.

v) Making a Decision

In this last step of testing hypotheses, the computed value of the test statistic is compared with the critical value. If the sample statistic falls within the rejection region, the null hypothesis is rejected; otherwise, it is not rejected. Note that only one of two decisions is possible in hypothesis testing: either reject or do not reject the null hypothesis. Instead of “accepting” the null hypothesis ($H_0$), some researchers prefer to phrase the decision as “do not reject $H_0$”, “we fail to reject $H_0$”, or “the sample results do not allow us to reject $H_0$”.
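The whole decision step can be sketched as follows, using a hypothetical computed statistic $Z = 2.25$ and the standard-normal two-tailed critical value $\pm 1.96$ for $\alpha = 0.05$:

```python
# Two-tailed decision rule at alpha = 0.05 (hypothetical computed z).
z = 2.25
z_critical = 1.96   # two-tailed critical value for alpha = 0.05

# Reject H0 only if z falls in the rejection (critical) region.
if abs(z) > z_critical:
    decision = "reject H0"
else:
    decision = "fail to reject H0"

print(decision)
```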


The Deciles: Measure of Position Made Easy (2012)

The deciles are the nine values of the variable that divide an ordered (sorted, arranged) data set into ten equal parts, so that each part represents $\frac{1}{10}$ of the sample or population. They are denoted by $D_1, D_2, \cdots, D_9$. The first decile ($D_1$) is the value of the order statistics that exceeds $\frac{1}{10}$ of the observations and is less than the remaining $\frac{9}{10}$. The ninth decile ($D_9$) is the value that exceeds $\frac{9}{10}$ of the observations and is less than the remaining $\frac{1}{10}$. Note that the fifth decile is equal to the median. The deciles determine the values for 10%, 20%, …, and 90% of the data.

Calculating Deciles for Ungrouped Data

To calculate the decile for the ungrouped data, first order all observations according to the magnitudes of the values, then use the following formula for $m$th decile.

\[D_m= m \times \left( \frac{n+1}{10} \right) \mbox{th value}, \qquad \mbox{where } m=1,2,\cdots,9\]

Example: Calculate the 2nd and 8th deciles of the following ordered data 13, 13,13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38, 41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82.
Solution:

\begin{eqnarray*}
D_2 &=& 2 \times \left\{ \frac{(n+1)}{10} \right\} \mbox{th value}\\
&=& 2 \times \frac{30+1}{10}=6.2\mbox{th value}
\end{eqnarray*}

We have to locate the sixth value in the ordered array and then move 0.2 of the distance between the sixth and seventh values. i.e. the value of the 2nd decile can be calculated as
\[6 \mbox{th observation} + \{7 \mbox{th observation} - 6 \mbox{th observation} \}\times 0.2\]
as 6th observation is 27 and 7th observation is 31.
The second decile would be $27+\{31-27\} \times 0.2 = 27.8$

Similarly, $D_8$ can be calculated. $D_8=52.6$.
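The two-step interpolation above can be wrapped in a small function (a sketch; the helper name `decile` is not from the original text):

```python
def decile(data, m):
    """m-th decile of ungrouped data via the D_m = m(n+1)/10-th ordered value."""
    x = sorted(data)
    pos = m * (len(x) + 1) / 10   # 1-based, possibly fractional, position
    j = int(pos)                  # rank of the lower neighbouring value
    frac = pos - j
    if frac == 0 or j >= len(x):  # exact rank, or position beyond the last value
        return x[min(j, len(x)) - 1]
    # Move frac of the distance between the j-th and (j+1)-th ordered values.
    return x[j - 1] + frac * (x[j] - x[j - 1])


data = [13, 13, 13, 20, 26, 27, 31, 34, 34, 34, 35, 35, 36, 37, 38,
        41, 41, 41, 45, 47, 47, 47, 50, 51, 53, 54, 56, 62, 67, 82]
print(round(decile(data, 2), 1), round(decile(data, 8), 1))
```

This reproduces the hand calculation: $D_2 = 27.8$ and $D_8 = 52.6$.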

Calculating Deciles for Grouped Data

The following formula can calculate the $m$th decile for grouped data (in ascending order).

\[D_m=l+\frac{h}{f}\left(\frac{m \, n}{10}-c\right)\]

where

$l$ = is the lower class boundary of the class containing $m$th deciles
$h$ = is the width of the class containing $m$th deciles
$f$ = is the frequency of the class containing $m$th deciles
$n$ = is the total number of frequencies
$c$ = is the cumulative frequency of the class preceding the class containing $m$th deciles

Example: Calculate the first and seventh deciles of the following grouped data

(Grouped frequency distribution table)

Solution: The decile class for $D_1$ is located by $\frac{m \, n}{10} = \frac{1 \times 30}{10} = 3$rd observation. As the 3rd observation lies in the first class (first group), so

\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{m \, n}{10}-c\right)\\
D_1&=&85.5+\frac{5}{6}\left(\frac{1\times30}{10}-0\right)\\
&=&88
\end{eqnarray*}

The decile class for $D_7$ is 100.5–105.5, as $\frac{7 \times 30}{10}=21$st observation, which lies in the fourth class (group).
\begin{eqnarray*}
D_m&=&l+\frac{h}{f}\left(\frac{m \, n}{10}-c\right)\\
D_7&=&100.5+\frac{5}{6}\left(\frac{7\times30}{10}-20\right)\\
&=&101.333
\end{eqnarray*}
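The grouped-data formula translates directly into a small function (a sketch; the name `grouped_decile` is not from the original text). The arguments below are the class boundary, width, frequency, total, and cumulative frequency used in the worked example:

```python
def grouped_decile(m, l, h, f, n, c):
    """m-th decile for grouped data: D_m = l + (h/f) * (m*n/10 - c)."""
    return l + (h / f) * (m * n / 10 - c)


# Values from the worked example above.
d1 = grouped_decile(m=1, l=85.5, h=5, f=6, n=30, c=0)
d7 = grouped_decile(m=7, l=100.5, h=5, f=6, n=30, c=20)
print(round(d1, 3), round(d7, 3))
```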

