Multiple Regression Analysis

Introduction to Multiple Regression Analysis

Francis Galton (a biometrician) examined the relationship between fathers’ and sons’ heights. He analyzed the similarities between the parent and offspring generations of 700 sweet peas. Galton found that the offspring of tall parents tended to be shorter than their parents, and the offspring of short parents tended to be taller than their parents. The height of the children ($Y$) depends upon the height of the parents ($X$). When there is more than one independent variable (IV), we need multiple regression analysis (MRA), also called multiple linear regression (MLR).

Multiple Linear Regression Model

The linear regression model (equation) for two independent variables (regressors) is

$$Y_{i} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_{i}$$

The general linear regression model (equation) for $k$ independent variables is

$$Y_{i} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_{i}$$

The $\beta$s are all regression coefficients (partial slopes) and the $\alpha$ is the intercept.

The sample linear regression model is

$$y_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2x_{2i} + \hat{\varepsilon}_{i}$$

Multiple Regression Coefficients Formula

To fit the MLR equation for two independent variables, one needs to compute the values of $\hat{\beta}_1, \hat{\beta}_2$, and $\hat{\alpha}$.

$$\hat{\beta}_1 = \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2}$$

The first term of the numerator is the “sum of the product of the 1st independent and dependent variables” multiplied by the “sum of the square of the 2nd independent variable”.

The second term of the numerator is the “sum of the product of the 2nd independent and dependent variables” multiplied by the “sum of the product of the two independent variables”.

The first term of the denominator is the “sum of the square of the 1st independent variable” multiplied by the “sum of the square of the 2nd independent variable”.

The second term of the denominator is the “square of the sum of the product of the two independent variables”.

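The formula for the 2nd regression coefficient is

$$\hat{\beta}_2 = \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2}$$

In short, note that $S$ stands for a corrected sum of squares or sum of products, for example $S_{X_1 Y} = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}$.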

Multiple Linear Regression Example

Consider the following data about two regressors ($X_1, X_2$) and one regressand variable ($Y$).

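| $Y$ | $X_1$ | $X_2$ | $X_1 Y$ | $X_2 Y$ | $X_1 X_2$ | $X_1^2$ | $X_2^2$ |
|---|---|---|---|---|---|---|---|
| 30 | 10 | 15 | 300 | 450 | 150 | 100 | 225 |
| 22 | 5 | 8 | 110 | 176 | 40 | 25 | 64 |
| 16 | 10 | 12 | 160 | 192 | 120 | 100 | 144 |
| 7 | 3 | 7 | 21 | 49 | 21 | 9 | 49 |
| 14 | 2 | 10 | 28 | 140 | 20 | 4 | 100 |
| 89 | 30 | 52 | 619 | 1007 | 351 | 238 | 582 |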

\begin{align*}
S_{X_1 Y} &= \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 619 - \frac{30\times 89}{5} = 85\\
S_{X_1 X_2} &= \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 351 - \frac{30 \times 52}{5} = 39\\
S_{X_1^2} &= \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 238 -\frac{30^2}{5} = 58\\
S_{X_2^2} &= \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 582 - \frac{52^2}{5} = 41.2\\
S_{X_2 Y} &= \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} =1007 - \frac{52 \times 89}{5} = 81.4
\end{align*}

\begin{align*}
\hat{\beta}_1 &= \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2} = \frac{(85)(41.2) - (81.4)(39)}{(58)(41.2) - (39)^2} = \frac{327.4}{868.6} = 0.3769\\
\hat{\beta}_2 &= \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1X_2})^2} = \frac{(81.4)(58) - (85)(39)}{(58)(41.2) - (39)^2} = \frac{1406.2}{868.6} = 1.6189\\
\hat{\alpha} &= \overline{Y} - \hat{\beta}_1 \overline{X}_1 - \hat{\beta}_2 \overline{X}_2 = 17.8 - (0.3769)(6) - (1.6189)(10.4) = -1.298\\
\hat{Y} &= -1.298 + 0.377X_1 + 1.619X_2
\end{align*}
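The hand computation can be cross-checked with a short script. The following is a minimal sketch, assuming plain Python with no external libraries; the helper name `corrected_sum` is illustrative and not part of the original text. It recomputes the corrected sums from the five observations in the data table (with $\sum Y = 89$) and then applies the two-regressor formulas.

```python
# Data from the example table: two regressors (X1, X2) and the regressand Y
x1 = [10, 5, 10, 3, 2]
x2 = [15, 8, 12, 7, 10]
y = [30, 22, 16, 7, 14]
n = len(y)


def corrected_sum(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / len(u)


s_x1y = corrected_sum(x1, y)     # 619 - 30*89/5
s_x2y = corrected_sum(x2, y)     # 1007 - 52*89/5
s_x1x2 = corrected_sum(x1, x2)   # 351 - 30*52/5
s_x1x1 = corrected_sum(x1, x1)   # 238 - 30^2/5
s_x2x2 = corrected_sum(x2, x2)   # 582 - 52^2/5

# Partial regression coefficients and intercept for two regressors
denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / denom
b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / denom
alpha = sum(y) / n - b1 * sum(x1) / n - b2 * sum(x2) / n

print(round(b1, 3), round(b2, 3), round(alpha, 3))  # 0.377 1.619 -1.298
```

With these data the script gives $\hat{\beta}_1 \approx 0.377$, $\hat{\beta}_2 \approx 1.619$, and $\hat{\alpha} \approx -1.298$.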

Important Key Points of Multiple Regression

  • Independent variables (predictors, regressors): These are the variables that one believes to influence the dependent variable. One can have two or more independent variables in a multiple-regression model.
  • Dependent variable (outcome, response): This is the variable one is trying to predict or explain using the independent variables.
  • Linear relationship: The core assumption is that the relationship between the independent variables and dependent variable is linear. This means the dependent variable changes at a constant rate for a unit change in the independent variable, holding all other variables constant.

The main goal of multiple regression analysis is to find a linear equation that best fits the data. The multiple regression analysis also allows one to:

  • Predict the value of the dependent variable based on the values of the independent variables.
  • Understand how changes in the independent variables affect the dependent variable while considering the influence of other independent variables.


https://rfaqs.com

https://gmstat.com

Geometric Mean

Introduction to Geometric Mean

The geometric mean (GM) is a way of calculating an average, but instead of adding values like the regular (arithmetic) mean, it multiplies them and then takes a root. The geometric mean is defined as the $n$th root of the product of $n$ positive values.

If we have two observations, say 9 and 4, then the geometric mean is the square root of the product of these values, which is 6 ($\sqrt{9\times 4}=6$). If there are three values, say 3, 9, and 3, then the geometric average will be $\sqrt[3]{3\times 9 \times 3} = \sqrt[3]{81} \approx 4.33$. In a similar pattern, for $n$ observations ($x_1, x_2, \cdots, x_n$), the geometric average formula will be

$$GM = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{\frac{1}{n} }$$


Geometric Mean Example

Suppose we have the following set of values $x=32, 36, 36, 37, 39, 41, 45, 46, 48$. The computation of the geometric mean will be

\begin{align*}
GM &= (32\times 36 \times 36 \times 37 \times 39 \times 41 \times 45 \times 46 \times 48)^{\frac{1}{9}}\\
&=(243790484520960)^{\frac{1}{9}} = 39.7
\end{align*}

For a large number of observations one can compute the GM by taking the log of all observations using the following formula:

$$GM = antilog \left[\frac{\sum\limits_{i=1}^n log\, x}{n} \right]$$

| $x$ | $log\, x$ |
|---|---|
| 32 | log 32 = 1.5051 |
| 36 | log 36 = 1.5563 |
| 36 | log 36 = 1.5563 |
| 37 | log 37 = 1.5682 |
| 39 | log 39 = 1.5911 |
| 41 | log 41 = 1.6128 |
| 45 | log 45 = 1.6532 |
| 46 | log 46 = 1.6628 |
| 48 | log 48 = 1.6812 |
| Total | 14.3870 |

\begin{align*}
GM &= antilog \left[ \frac{\sum\limits_{i=1}^n log\, x}{n} \right]\\
&= antilog \left[\frac{14.3870}{9}\right] = antilog [1.5986]\\
&= 39.7
\end{align*}
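Both routes to the geometric mean can be checked numerically. The sketch below assumes Python 3.8 or later (for `math.prod`) and base-10 logarithms, as in the table; the variable names are illustrative.

```python
import math

x = [32, 36, 36, 37, 39, 41, 45, 46, 48]
n = len(x)

# Direct definition: n-th root of the product of the values
gm_root = math.prod(x) ** (1 / n)

# Log route: antilog (base 10) of the mean of the logarithms
sum_logs = sum(math.log10(v) for v in x)
gm_log = 10 ** (sum_logs / n)

print(round(sum_logs, 4))                   # 14.387
print(round(gm_root, 1), round(gm_log, 1))  # 39.7 39.7
```

The two results agree up to floating-point error, which is why the log form is convenient when the raw product would be very large.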

One important point that should be remembered is that if any value in the data set is zero or negative then the GM cannot be computed.

Geometric Mean for Grouped Data

The GM for grouped data can also be computed using the following formula:

$$GM = antilog \left[ \frac{\Sigma f\times log\, x}{\Sigma f} \right]$$

Suppose we have the following frequency distribution:

| Classes | Frequency |
|---|---|
| 65 to 84 | 9 |
| 85 to 104 | 10 |
| 105 to 124 | 17 |
| 125 to 144 | 10 |
| 145 to 164 | 5 |
| 165 to 184 | 4 |
| 185 to 204 | 5 |
| Total | 60 |

The GM of the above frequency distribution can be computed as follows

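| Classes | $f$ | $X$ | $log\, X$ | $f \times log\, X$ |
|---|---|---|---|---|
| 65-84 | 9 | 74.5 | log 74.5 = 1.8722 | 16.8494 |
| 85-104 | 10 | 94.5 | log 94.5 = 1.9754 | 19.7543 |
| 105-124 | 17 | 114.5 | log 114.5 = 2.0588 | 34.9997 |
| 125-144 | 10 | 134.5 | log 134.5 = 2.1287 | 21.2872 |
| 145-164 | 5 | 154.5 | log 154.5 = 2.1889 | 10.9446 |
| 165-184 | 4 | 174.5 | log 174.5 = 2.2418 | 8.9672 |
| 185-204 | 5 | 194.5 | log 194.5 = 2.2889 | 11.4446 |
| Total | 60 | | | 124.2471 |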

\begin{align*}
GM &= antilog \left[ \frac{124.2471}{60} \right]\\
&=antilog (2.0708) = 117.7
\end{align*}
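The grouped-data formula can be sketched in a few lines. This is a minimal illustration, assuming class midpoints are taken as the average of the class limits (as in the table) and base-10 logarithms; the variable names are not from the original.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table
classes = [(65, 84, 9), (85, 104, 10), (105, 124, 17), (125, 144, 10),
           (145, 164, 5), (165, 184, 4), (185, 204, 5)]

total_f = sum(f for _, _, f in classes)

# Sum of f * log10(midpoint) over all classes
sum_f_log = sum(f * math.log10((lo + hi) / 2) for lo, hi, f in classes)

# Antilog of the frequency-weighted mean of the logs
gm_grouped = 10 ** (sum_f_log / total_f)

print(total_f, round(sum_f_log, 3), round(gm_grouped, 1))  # 60 124.247 117.7
```

Working with unrounded logarithms, the antilog of $124.247/60 \approx 2.0708$ comes out at about 117.7.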

The GM is particularly useful when dealing with rates of change or ratios, such as growth rates in investments. That is because geometric mean considers how things are multiplied over time, rather than simply added.

Use and Application of Geometric Mean

Geometric Mean is useful in situations like:

  • Investment returns: When one looks at average investment growth, one wants to consider how much one’s money is multiplied over time, not just the change each year. That is why the GM is better suited for this scenario.
  • Rates of change: Similar to investment returns, if something is increasing or decreasing by a percentage each time, the GM is a more accurate measure of the overall change.
  • Growth Rates: When dealing with percentages or ratios that change over time (like investment returns or population growth), the geometric mean provides a more accurate picture of the overall change compared to the arithmetic mean.
  • Proportional Changes: It is helpful for situations where changes are multiplied, not added. For example, if a quantity doubles in one period and grows eightfold in the next, the geometric mean $\sqrt{2 \times 8} = 4$ is the constant per-period factor that gives the same overall sixteenfold change.
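To illustrate why the GM suits growth rates, here is a small sketch with hypothetical yearly returns (the figures +10%, +50%, and -30% are made up for illustration and do not come from the text). Each return is converted to a growth factor, and the GM of the factors is compared with their arithmetic mean.

```python
import math

# Hypothetical yearly returns: +10%, +50%, -30% (illustrative values only)
returns = [0.10, 0.50, -0.30]
factors = [1 + r for r in returns]

overall = math.prod(factors)           # total growth over all years
gm = overall ** (1 / len(factors))     # geometric mean growth factor
am = sum(factors) / len(factors)       # arithmetic mean growth factor

# Compounding at the GM reproduces the overall growth; the AM overstates it
print(round(overall, 4), round(gm ** len(factors), 4), round(am ** len(factors), 4))
```

Compounding the geometric mean factor over the same number of periods returns the actual overall growth, whereas compounding the arithmetic mean factor gives a larger, misleading figure.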


Weighted Average Real Life Examples

Introduction to Weighted Averages

The multipliers or sets of numbers that express more or less relative importance of various observations (data points) in a data set are called weights.

The weighted arithmetic mean (simply called weighted average or weighted mean) is similar to an ordinary arithmetic mean except that instead of each data point contributing equally to the final average, some data points contribute more than others. Weighted means are useful in a wide variety of scenarios. Weighted averages are used when there are a bunch of values, but some of those values are more important or contribute more to the overall result.

Example of Weighted Average

For example, a student may use a weighted mean to calculate his/her percentage grade in a course. In such an example, the student would multiply the weight of each assessment item in the course (e.g., assignments, exams, sessionals, quizzes, projects, etc.) by the respective grade obtained in that item.

As an example, suppose a course has a total of 60 marks, distributed as follows: Assignment-1 has a weightage of 10%, Assignment-2 has a weightage of 10%, the mid-term examination has a weightage of 30%, and the final term examination has a weightage of 50%. The scenario is described in the table below:

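| Assessment Item | Weight ($w_i$) | Grades ($x_i$) | Marks | Weighted Marks ($w_ix_i$) |
|---|---|---|---|---|
| Assignment # 1 | 10 % | 70 % | 6 | 7 % |
| Assignment # 2 | 10 % | 65 % | 6 | 6.5 % |
| Midterm Examination | 30 % | 70 % | 18 | 21 % |
| Final Term Examination | 50 % | 85 % | 30 | 42.5 % |
| Total | 100 % | 290 % | 60 | 77 % |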

Weighted Average Formula

Mathematically, the weighted average formula is given as

$$\overline{x}_w = \frac{\sum\limits_{i=1}^n w_i x_i}{\sum\limits_{i=1}^n w_i}$$

Another Example

Consider another example: Suppose we have the monthly expenditures of a family on different items, together with their weights.

| Items | Weights ($w_i$) | Expenses ($x_i$) | Weighted Expenses ($w_ix_i$) |
|---|---|---|---|
| Food | 7.5 | 290 | 2175 |
| Rent | 2.0 | 54 | 108 |
| Clothing | 1.5 | 96 | 144 |
| Fuel and light | 1.0 | 75 | 75 |
| Misc | 0.5 | 75 | 37.5 |
| Total | 12.5 | 590 | 2539.5 |

The average expenses will be: $AM = \frac{590}{5} = 118$.

However, the weighted average of the scenario will be $\overline{x}_w = \frac{\sum\limits_{i=1}^n w_i x_i}{\sum\limits_{i=1}^n w_i} = \frac{2539.5}{12.5}=203.16$

Taking the weights into account, the average monthly expenses of the family were 203.16, not 118.
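The weighted average formula can be sketched as a small function and checked against both worked examples (the course grades and the family expenses). This is a minimal illustration in plain Python; the function name `weighted_mean` is an assumption made for this sketch.

```python
def weighted_mean(weights, values):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("sum of weights must be non-zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Course example: weights (%) and grades (%) of the four assessment items
grade = weighted_mean([10, 10, 30, 50], [70, 65, 70, 85])

# Family expenses example: item weights and expenses
expenses = [290, 54, 96, 75, 75]
weighted_expense = weighted_mean([7.5, 2.0, 1.5, 1.0, 0.5], expenses)
simple_expense = sum(expenses) / len(expenses)

print(grade)                                         # 77.0
print(round(weighted_expense, 2), simple_expense)    # 203.16 118.0
```

Dividing by the sum of the weights means the weights do not need to add up to 1 or 100, which is why the same function handles both examples.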

Note that in a frequency distribution, the computation of relative frequency (rf) is also related to the concept of weighted averages.

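| Classes | Frequency | Mid point ($X$) | rf | Percentage |
|---|---|---|---|---|
| 65-84 | 9 | 74.5 | $\frac{9}{60} = 0.15$ | 15 |
| 85-104 | 10 | 94.5 | $\frac{10}{60} = 0.17$ | 17 |
| 105-124 | 17 | 114.5 | $\frac{17}{60} = 0.28$ | 28 |
| 125-144 | 10 | 134.5 | $\frac{10}{60} = 0.17$ | 17 |
| 145-164 | 5 | 154.5 | $\frac{5}{60} = 0.08$ | 8 |
| 165-184 | 4 | 174.5 | $\frac{4}{60} =0.07$ | 7 |
| 185-204 | 5 | 194.5 | $\frac{5}{60} =0.08$ | 8 |
| Total | 60 | | | |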
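The link between relative frequencies and weighted averages can be made concrete with a short sketch. It assumes the class midpoints represent each class, and it computes the grouped mean in two ways: with the frequencies as weights and with the relative frequencies as weights. The grouped mean itself is not stated in the text and is computed here only for illustration.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [(65, 84, 9), (85, 104, 10), (105, 124, 17), (125, 144, 10),
           (145, 164, 5), (165, 184, 4), (185, 204, 5)]

total = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Relative frequencies act as weights that already sum to 1
rel_freq = [f / total for f in freqs]
percentages = [round(100 * rf) for rf in rel_freq]

# Weighted mean with frequencies as weights ...
mean_from_f = sum(f * x for f, x in zip(freqs, midpoints)) / total
# ... equals the sum of rf * midpoint, since the rf weights sum to 1
mean_from_rf = sum(rf * x for rf, x in zip(rel_freq, midpoints))

print(percentages)                                   # [15, 17, 28, 17, 8, 7, 8]
print(round(mean_from_f, 1), round(mean_from_rf, 1)) # 122.5 122.5
```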

Some Real-World Examples of Weighted Averages

  • Calculating class grade: Different assignments might have different weights (e.g., exams worth more than quizzes). A weighted mean considers these weights to determine the overall grade.
  • Stock market performance: A stock index might use a weighted average to reflect the influence of large companies compared to smaller ones.
  • Customer Satisfaction: Finding the average customer satisfaction score when some customers’ feedback might hold more weight (e.g., frequent buyers).
  • Average Customer Spending: if some customers buy more frequently.
  • Expected Value: Determining the expected value of outcomes with different probabilities.

The following are some important questions. What is the importance of the weighted mean? Describe its advantages and disadvantages. What is an average? What are the qualities of a good average? What is the arithmetic mean? Describe the advantages and disadvantages of the arithmetic mean. In which situations do we apply the arithmetic mean?


Important MCQs Sampling Distribution Quiz with Answer 11

This online Sampling Distribution Quiz with answers helps with preparation for exams and statistical job tests in government, semi-government, and private sector organizations. It is also helpful for admission tests at colleges and universities. There are 20 multiple-choice questions on sampling and sampling distributions. Let us start with the Sampling Distribution Quiz.

Online Multiple Choice Questions about Sampling and Sampling Distributions with Answers

1. The value of $n_2$ by a proportional allocation from the following information is:
$N_1=580$, $N_2=140$ and $n=80$.

 
 
 
 

2. A procedure of selecting elements from the population that is not based on probability is known as:

 
 
 
 

3. Which one of the following is the benefit of using simple random sampling?

 
 
 
 

4. When sampling is done with or without replacement, $E(\overline{y})$ is equal to

 
 
 
 

5. Which one of these sampling methods is a probability method?

 
 
 
 

6. Sampling in which a sampling unit can be repeated more than once is called

 
 
 
 

7. If a researcher randomly samples 100 observations in each population category, then his _________ sample will be _________.

 
 
 
 

8. When the sample size increases and everything else remains the same, the width of a confidence interval for a population parameter will

 
 
 
 

9. A group consists of 300 people, and interviewing all members of the group is called:

 
 
 
 

10. Consider a population of size 700 consisting of three strata such that $N_1=100, N_2=250$, and $N_3=350$. The required sample size is 18. What will be the sample size for stratum-III according to proportional allocation?

 
 
 
 

11. The weight of the stratum is equal to the proportion of

 
 
 
 

12. Which one of the following is the main problem with using non-probability sampling techniques?

 
 
 
 

13. Regardless of the difference in the distribution of the sample and population, the mean of sampling distribution must be equal to

 
 
 
 

14. Consider a population of size 700 consisting of three strata such that $N_1=100, N_2=250$, and $N_3=350$. The required sample size is 18. What will be the sample size for stratum-II according to proportional allocation?

 
 
 
 

15. The weight of the stratum is equal to the proportion of:

 
 
 
 

16. A procedure in which the number of elements in a stratum is proportional to the number of elements in the population is called

 
 
 
 

17. Stratification is to produce estimators with small

 
 
 
 

18. The university has 5000 students belonging to the following classes: (i) 1500 are freshmen, (ii) 1200 are sophomores, (iii) 1400 are juniors, and (iv) 900 are seniors. The university administration wants to get an estimate of all the students’ views on a proposal to help alleviate the parking problem on campus. Suppose that a sample of 100 students is chosen. What is the required sample size for the freshman stratum under proportional allocation?

 
 
 
 

19. The margin of error is the level of _________ you require.

 
 
 
 

20. When the number of observations drawn from a stratum is small relative to the overall size of the stratum, then the _________ will also be small.

 
 
 
 



A sampling distribution depends on several factors:

  • The statistic being used: Is the researcher looking at the mean, median, or something else?
  • The original population’s distribution: Is the population data normally distributed, skewed, or something else?
  • Sample size: Generally, larger samples produce sampling distributions that are less spread out and, for statistics such as the mean, closer to a normal distribution.
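The effect of these factors can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the population is a hypothetical skewed (exponential) distribution with mean 1, the statistic is the sample mean, and the seed, sample sizes, and number of replications are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible


def sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the list of sample means."""
    means = []
    for _ in range(reps):
        # Hypothetical skewed population: exponential with mean 1
        sample = [random.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


results = {}
for n in (5, 50):
    means = sample_means(n)
    # Centre and spread of the simulated sampling distribution of the mean
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2))
```

In runs like this, the average of the sample means stays close to the population mean for both sample sizes, while the spread of the sample means shrinks as the sample size grows.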

In conclusion, sampling distributions are vital tools in statistics. Sampling Distributions help us to understand the variability of statistics calculated from samples and make informed inferences about the population from which the samples were drawn.