Statistics for Data Science & Analytics - Statistics MCQs, Software & Data Analysis

Median of Ungrouped Data

Jul 13, 2024 by Muhammad Imdad Ullah


Introduction to Median of Ungrouped Data

This post is about calculating the median of ungrouped data. The median is the middlemost value of a data set (set of observations) once the observations have been arranged in ascending or descending order. The median divides the data into two equal parts; that is its main purpose.

It is important to note that the criteria for finding the median for grouped and ungrouped data are different.

The primary and secondary data can be defined as:

Primary data, also called raw or ungrouped data, has not undergone any statistical procedure/method and is not in the form of a frequency distribution.
Secondary data may also be called grouped data if it is in the form of a frequency distribution.

Let us discuss how to find the median for ungrouped data.

There are two cases for ungrouped data, depending on the number of observations $n$:

when the number of observations is odd (that is, $n$ is odd), and when the number of observations is even (that is, $n$ is even).

Median Calculations

The data below contains an odd number of observations.

Observation No. (Ascending Order)	1st	2nd	3rd	4th	5th	6th	7th	8th	9th	10th	11th
Data Values	81	89	90	96	100	102	103	104	108	109	118
Observation No. (Descending Order)	11th	10th	9th	8th	7th	6th	5th	4th	3rd	2nd	1st

Since the number of observations is odd ($n = 11$), the central value after arranging the data in ascending order is the 6th value, which is 102. That is, the median of the above data is 102.

The position of the median can be located mathematically, as follows:

\begin{align*}
\tilde{x} &= \left( \frac{n+1}{2} \right)\text{th value}\\
&=\left( \frac{11+1}{2} \right)\text{th value} = 6\text{th value}
\end{align*}

The value at the 6th position (from the sorted data) is 102. The symbol $\tilde{x}$, read as “x-tilde”, is the notation for the median.

Median for Even Numbers of Observations

Consider the following data that contains an even number of observations.

Observation No.	1	2	3	4	5	6	7	8	9	10
Data Values	81	100	96	108	90	102	104	103	109	89

Data after sorting (either in ascending or descending order) is

Observation No.	1st	2nd	3rd	4th	5th	6th	7th	8th	9th	10th
Data Values	81	89	90	96	100	102	103	104	108	109

Since $n=10$ is even, the central position (that is, the median) lies between the 5th value and the 6th value. The average of these two central observations (from the sorted data) is called the median. The two central values are 100 and 102; take the average of these two numbers to find the median.

$$\text{Median} = \frac{100+102}{2} = 101$$
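The two cases above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `median_ungrouped` is an assumption made for this example, and the two lists are the data sets from the tables above.

```python
def median_ungrouped(values):
    """Return the median of raw (ungrouped) observations."""
    data = sorted(values)  # the median is defined on ordered data
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th value, which is 0-based index n // 2
        return data[mid]
    # even n: average of the (n / 2)th and (n / 2 + 1)th values
    return (data[mid - 1] + data[mid]) / 2


odd_data = [81, 89, 90, 96, 100, 102, 103, 104, 108, 109, 118]
even_data = [81, 100, 96, 108, 90, 102, 104, 103, 109, 89]

print(median_ungrouped(odd_data))   # 102
print(median_ungrouped(even_data))  # 101.0
```

Python's standard library offers `statistics.median`, which gives the same results; the hand-written version is shown only to mirror the two rules.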

Median Formula for Large Data Sets

The median of a data set, whether large or small, can be located mathematically. The formulas for an odd number of observations and for an even number of observations are different.

The point to remember when computing the median is that

For an odd number of observations, the median is the centermost value after sorting the data
For an even number of observations, the median is the average of two central values after sorting the data

\begin{align*}
\tilde{x} &= \frac{1}{2} \left[ \left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1 \right)\text{th value} \right]\quad \quad \text{(when the number of observations is even)}\\
\tilde{x} &= \left(\frac{n+1}{2}\right)\text{th value} \quad \quad \text{(when the number of observations is odd)}
\end{align*}

The other way of the median formula is

Consider a data set containing 157 observations. To compute the median, first sort the data in either ascending or descending order. The position of the median for this data will be

$$\tilde{x} = \left(\frac{n+1}{2}\right)\text{th value} = \left(\frac{157+1}{2}\right)\text{th value} = 79\text{th value}$$

The 79th observation in the sorted data will be the median of the data.

If there is an even number of observations (say $n=396$), the median will be

\begin{align*}
\tilde{x} &= \frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th value} + \left(\frac{n}{2}+1\right)\text{th value} \right]\\
&=\frac{1}{2} \left[\left(\frac{396}{2}\right)\text{th value} + \left(\frac{396}{2}+1\right)\text{th value} \right]\\
&= \frac{1}{2} \left[198\text{th value} + 199\text{th value}\right]
\end{align*}

The average of the 198th and 199th values from the sorted data will be the median of the data.
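For large data sets, it can help to compute only the position(s) of the median before looking up the values. The sketch below does this for the two sample sizes discussed above; the helper name `median_positions` is hypothetical and positions are 1-based, as in the text.

```python
def median_positions(n):
    """Return the 1-based position(s) in the sorted data that determine the median."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n % 2 == 1:
        # odd n: a single central position, (n + 1) / 2
        return ((n + 1) // 2,)
    # even n: the two central positions, n / 2 and n / 2 + 1
    return (n // 2, n // 2 + 1)


print(median_positions(157))  # (79,)
print(median_positions(396))  # (198, 199)
```

When two positions are returned, the median is the average of the values found at those positions in the sorted data.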

https://rfaqs.com

https://gmstat.com

Statistical Inference: An Introduction

Mar 18, 2025 / Jul 10, 2024 by Muhammad Imdad Ullah

Introduction to Statistical Inference

Inference means conclusion. Statistical inference is the branch of Statistics that deals with methods for drawing conclusions (inferences) about a population (called the reference population or target population) based on sample information. Statistical inference is also known as inferential statistics. As we know, there are two branches of Statistics: descriptive and inferential.

Statistical inference is a cornerstone of many fields of life. It allows researchers to make informed decisions based on data, even when they cannot study the entire population of interest. Statistical inference has two fields of study:

Estimation

Estimation is the procedure by which we obtain an estimate of the true but unknown value of a population parameter by using the sample information that is taken from that population. For example, we can estimate the mean of a population by computing the mean of a sample drawn from that population.

Estimator

An estimator is a statistic (a rule or formula) whose calculated values are used to estimate (make a wise guess, based on the data, about) a population parameter $\theta$.

Estimate

An estimate is a particular realization (computed value) of an estimator. The notation $\hat{\theta}$ denotes the sample statistic used to estimate $\theta$.

Types of Estimators

An estimate can be classified as either a point estimate or an interval estimate.

Point Estimate

A point estimate is a single number that can be regarded as the most plausible value of $\theta$ (the notation for a population parameter).

Interval Estimate

An interval estimate is a range of values, reported with a stated level of confidence, that is expected to contain the true value of the population parameter $\theta$.
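To make the distinction concrete, the sketch below computes a point estimate (the sample mean) and an approximate 95% interval estimate for a population mean. The sample values are hypothetical and used only for illustration, the function name `mean_with_interval` is an assumption, and the interval relies on a normal approximation; for a sample this small, a t-based interval would usually be preferred.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample, used only for illustration.
sample = [81, 89, 90, 96, 100, 102, 103, 104, 108, 109]


def mean_with_interval(data, confidence=0.95):
    """Return (point estimate, (lower, upper)) for the population mean.

    Assumption: normal approximation, sample standard deviation in place of sigma.
    """
    n = len(data)
    point = mean(data)                              # point estimate of the mean
    std_error = stdev(data) / n ** 0.5              # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point, (point - z * std_error, point + z * std_error)


point, (lower, upper) = mean_with_interval(sample)
print(f"point estimate: {point:.2f}")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")
```

The point estimate is a single number, while the interval estimate is a range around it whose width reflects the sampling uncertainty.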

Testing of Hypothesis

Testing of hypothesis is a procedure that enables us to decide, based on information obtained by a sampling procedure, whether to accept or reject a specific statement or hypothesis regarding the value of a parameter in a statistical problem.

Note that since we rely on samples, there is always some chance our inferences are not perfect. Statistical inference acknowledges this by incorporating concepts like probability and confidence intervals. These help us quantify the uncertainty in our estimates and test results.

Important Considerations about Testing of Hypothesis

Hypothesis testing does not prove anything; it provides evidence for or against a claim.
There is always a chance of making errors (Type I or Type II).
The results are specific to the chosen sample and significance level.
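As a rough illustration of these points, the following sketch runs a two-sided one-sample z test. All numbers are hypothetical, the function name `one_sample_z_test` is an assumption for this example, and the population standard deviation is assumed to be known, which is rarely true in practice.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, sigma, n, alpha=0.05):
    """Two-sided z test of H0: population mean equals hypothesized_mean.

    Assumption: the population standard deviation sigma is known.
    """
    z = (sample_mean - hypothesized_mean) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value, p_value < alpha            # True means "reject H0"


# Hypothetical inputs: sample mean 98.2 from n = 10, H0 mean 100, sigma 9.
z, p_value, reject = one_sample_z_test(98.2, 100, 9.0, 10)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

Failing to reject the hypothesis here does not prove it true; the decision depends on this particular sample and on the chosen significance level `alpha`.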

Statistical Inference in Real-Life

Some real-life examples of inferential statistics:

Medical Trials: When a new drug is developed, it is tested on a sample of patients to infer its effectiveness and safety for the general population. Statistical inference helps determine whether the observed effects are due to the drug or random chance.
Market Research: Companies use inferential statistics to understand consumer preferences and behaviours. By surveying a sample of consumers, they can infer the preferences of the broader market and make informed decisions about product development and marketing strategies.
Public Health: Epidemiologists use statistical inference to track the spread of diseases and the effectiveness of interventions. By analyzing sample data, one can infer the overall impact of a disease and the effectiveness of measures like vaccinations.
Quality Control: Manufacturers use statistical inference to monitor product quality. By sampling a few items from a production batch, they can infer the quality of the entire batch and make decisions about whether to continue production or make adjustments.
Election Polling: Pollsters use samples of voter opinions to infer the likely outcome of an election. Statistical inference helps estimate the proportion of the population that supports each candidate and the margin of error in these estimates.
Education: Educators and policymakers use statistical inference to evaluate the effectiveness of teaching methods and educational programs. By analyzing test scores and other performance metrics from a sample of students, they can infer the impact of these methods on the broader student population.
Environmental Studies: Researchers use statistical inference to assess environmental impacts. For example, by sampling air or water quality in specific locations, they can infer the overall environmental conditions and the effectiveness of pollution control measures.
Sports Analytics: Teams and coaches use statistical inference to evaluate player performance and strategy effectiveness. By analyzing data from a sample of games, they can infer the overall performance trends and make decisions about training and game strategy.
Finance: Investors and financial analysts use statistical inference to make decisions about investments. By analyzing sampled historical data of stocks or other financial instruments, one can infer future performance and make informed investment decisions.
Customer Satisfaction: Businesses use statistical inference to gauge customer satisfaction and loyalty. By surveying a sample of customers, one can infer the overall satisfaction levels and identify areas for improvement.

FAQs about Statistical Inference

Define the term estimation.
Define the term estimate.
Define the term estimator.
Write a short note on statistical inference.
What is statistical hypothesis testing?
What is estimation in statistics?
What are the types of estimation?
Write about point estimation and interval estimation.


Multiple Regression Analysis

Jul 7, 2024 by Muhammad Imdad Ullah


Introduction to Multiple Regression Analysis

Francis Galton (a biometrician) examined the relationship between fathers’ and sons’ heights. He also analyzed the similarities between the parent and child generations of 700 sweet peas. Galton found that the offspring of tall parents tended to be shorter than their parents and the offspring of short parents tended to be taller than their parents. The height of the children ($Y$) depends upon the height of the parents ($X$). When there is more than one independent variable (IV), we need multiple regression analysis (MRA), also called multiple linear regression (MLR).

Multiple Linear Regression Model

The linear regression model (equation) for two independent variables (regressors) is

$$Y_{i} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_{i}$$

The general linear regression model (equation) for $k$ independent variables is

$$Y_{i} = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_{i}$$

The $\beta$s are all regression coefficients (partial slopes) and $\alpha$ is the intercept.

The sample linear regression model is

$$\hat{y}_i = \hat{\alpha} + \hat{\beta}_1 x_{1i} + \hat{\beta}_2x_{2i}$$

where the residual $\hat{\varepsilon}_i = y_i - \hat{y}_i$ is the difference between the observed and fitted values.

Multiple Regression Coefficients Formula

To fit the MLR equation for two independent variables, one needs to compute the values of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\alpha}$.

$$\hat{\beta}_1 = \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2}$$

The first term of the numerator is the (“sum of the product of 1st independent and dependent variables”) multiplied by the (“sum of the square of 2nd independent variable”).

The second term of the numerator is the (“sum of the product of 2nd independent and dependent variables”) multiplied by the (“sum of the product of two independent variables”).

The first term of the denominator is the (“sum of the square of 1st independent variable”) multiplied by the (“sum of the square of 2nd independent variable”).

The second term of the denominator is the (“square of the sum of the product of two independent variables”).

The formula for the 2nd regression coefficient is

$$\hat{\beta}_2 = \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1 Y})(S_{X_1 X_2})}{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2}$$

In short, note that $S$ stands for the corrected sums of squares and sums of products, for example $S_{X_1 Y} = \sum X_1 Y - \frac{\sum X_1 \sum Y}{n}$.

Multiple Linear Regression Example

Consider the following data about two regressors ($X_1, X_2$) and one regressand variable ($Y$).

Obs.	$Y$	$X_1$	$X_2$	$X_1 Y$	$X_2 Y$	$X_1 X_2$	$X_1^2$	$X_2^2$
1	30	10	15	300	450	150	100	225
2	22	5	8	110	176	40	25	64
3	16	10	12	160	192	120	100	144
4	7	3	7	21	49	21	9	49
5	14	2	10	28	140	20	4	100
Total	89	30	52	619	1007	351	238	582

\begin{align*}
S_{X_1 Y} &= \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 619 - \frac{30\times 89}{5} = 85\\
S_{X_1 X_2} &= \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 351 - \frac{30 \times 52}{5} = 39\\
S_{X_1^2} &= \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 238 - \frac{30^2}{5} = 58\\
S_{X_2^2} &= \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 582 - \frac{52^2}{5} = 41.2\\
S_{X_2 Y} &= \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} = 1007 - \frac{52 \times 89}{5} = 81.4
\end{align*}

\begin{align*}
\hat{\beta}_1 &= \frac{(S_{X_1 Y})(S_{X_2^2}) - (S_{X_2 Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2} = \frac{(85)(41.2) - (81.4)(39)}{(58)(41.2) - (39)^2} = \frac{327.4}{868.6} \approx 0.377\\
\hat{\beta}_2 &= \frac{(S_{X_2 Y})(S_{X_1^2}) - (S_{X_1 Y})(S_{X_1 X_2}) }{(S_{X_1^2})(S_{X_2^2}) - (S_{X_1 X_2})^2} = \frac{(81.4)(58) - (85)(39)}{(58)(41.2) - (39)^2} = \frac{1406.2}{868.6} \approx 1.619\\
\hat{\alpha} &= \overline{Y} - \hat{\beta}_1 \overline{X}_1 - \hat{\beta}_2 \overline{X}_2 = 17.8 - (0.377)(6) - (1.619)(10.4) \approx -1.30\\
\hat{Y} &= -1.30 + 0.377X_1 + 1.619X_2
\end{align*}
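The hand calculation can be cross-checked with a short Python sketch that recomputes the corrected sums directly from the raw $Y$, $X_1$, and $X_2$ columns of the table and then applies the coefficient formulas. The function and variable names (`fit_two_regressors`, `corrected_sum`) are assumptions made for this example, and the sketch covers only the two-regressor case.

```python
def fit_two_regressors(y, x1, x2):
    """Least-squares fit of y = alpha + beta1*x1 + beta2*x2 via corrected sums."""
    n = len(y)

    def corrected_sum(u, v):
        # sum(u*v) - sum(u)*sum(v)/n, i.e. the S quantities in the text
        return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

    s_x1y, s_x2y = corrected_sum(x1, y), corrected_sum(x2, y)
    s_x1x1, s_x2x2 = corrected_sum(x1, x1), corrected_sum(x2, x2)
    s_x1x2 = corrected_sum(x1, x2)

    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    if denom == 0:
        raise ValueError("regressors are perfectly collinear; no unique fit")

    beta1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / denom
    beta2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / denom
    alpha = sum(y) / n - beta1 * sum(x1) / n - beta2 * sum(x2) / n
    return alpha, beta1, beta2


# Raw columns from the example table.
y = [30, 22, 16, 7, 14]
x1 = [10, 5, 10, 3, 2]
x2 = [15, 8, 12, 7, 10]

alpha_hat, beta1_hat, beta2_hat = fit_two_regressors(y, x1, x2)
print(round(alpha_hat, 3), round(beta1_hat, 3), round(beta2_hat, 3))
```

A quick sanity check on any least-squares fit with an intercept is that the residuals $y_i - \hat{y}_i$ sum to zero, up to rounding.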

Important Key Points of Multiple Regression

Independent variables (predictors, regressors): These are the variables that one believes to influence the dependent variable. One can have two or more independent variables in a multiple-regression model.
Dependent variable (outcome, response): This is the variable one is trying to predict or explain using the independent variables.
Linear relationship: The core assumption is that the relationship between the independent variables and dependent variable is linear. This means the dependent variable changes at a constant rate for a unit change in the independent variable, holding all other variables constant.

The main goal of multiple regression analysis is to find a linear equation that best fits the data. The multiple regression analysis also allows one to:

Predict the value of the dependent variable based on the values of the independent variables.
Understand how changes in the independent variables affect the dependent variable while considering the influence of other independent variables.
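Both uses can be illustrated with a small sketch. The coefficients below are hypothetical values chosen only to keep the arithmetic easy to follow, and the helper name `predict` is an assumption for this example.

```python
def predict(alpha, betas, xs):
    """Evaluate alpha + beta_1*x_1 + ... + beta_k*x_k for one observation."""
    if len(betas) != len(xs):
        raise ValueError("one coefficient is needed per independent variable")
    return alpha + sum(b * x for b, x in zip(betas, xs))


# Hypothetical fitted coefficients, for illustration only.
alpha, betas = 2.0, (0.5, 1.5)

base = predict(alpha, betas, (10, 15))    # prediction at X1 = 10, X2 = 15
bumped = predict(alpha, betas, (11, 15))  # X1 raised by one unit, X2 held constant

print(base)           # 29.5
print(bumped - base)  # 0.5
```

The difference between the two predictions equals the first partial slope, which is what "holding the other variables constant" means in practice.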

Interpreting the Multiple Regression Coefficient

