Truth about Bias in Statistics

In Statistics, bias is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. The bias is therefore a measure of the systematic error of an estimator: it indicates how far, on average, the estimator lies from the true value of the parameter. For example, if we average a large number of estimates from an unbiased estimator, the result will be close to the correct value.

Bias in Statistics: The Difference between Expected and True Value

In other words, the bias is a systematic error in measurement or sampling; it tells how far off, on average, the estimator is from the truth.

Gauss, C.F. (1821) introduced the concept of an unbiased estimator during his work on the least-squares method.

The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. Bias refers to favoring one group or outcome, intentionally or unintentionally, over the other groups or outcomes available in the population under study. Unlike random error, bias is a serious problem because it cannot be reduced simply by increasing the sample size and averaging the outcomes.


Several types of bias exist; they should not be considered mutually exclusive:

  • Selection Bias (arises due to systematic differences between the groups compared)
  • Exclusion Bias (arises due to the systematic exclusion of certain individuals from the study)
  • Analytical Bias (arises due to the way the results are evaluated)

Mathematically, bias can be defined as follows:

Let a statistic $T$ be used to estimate a parameter $\theta$. If $E(T) = \theta + \text{bias}(\theta)$, then $\text{bias}(\theta)$ is called the bias of the statistic $T$, where $E(T)$ represents the expected value of the statistic $T$.

Note that if $\text{bias}(\theta) = 0$, then $E(T) = \theta$, so $T$ is an unbiased estimator of the true parameter $\theta$.
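
As a minimal illustration of this definition in R (assuming a normal population with true variance $\sigma^2 = 4$ and samples of size 10, both arbitrary choices), the sample variance that divides by $n$ is a biased estimator of $\sigma^2$, while the usual $n-1$ version is unbiased:

```r
# Illustrating bias(theta) = E(T) - theta, using the sample variance as T
set.seed(123)
n      <- 10       # sample size (arbitrary)
sigma2 <- 4        # true population variance, the parameter theta (assumed)
reps   <- 10000    # number of simulated samples

biased <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  sum((x - mean(x))^2) / n          # divides by n: biased estimator
})
unbiased <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  var(x)                            # divides by n - 1: unbiased estimator
})

mean(biased)   - sigma2   # approximately -sigma2/n = -0.4 (the bias)
mean(unbiased) - sigma2   # approximately 0
```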

Types of Sample Selection Bias

Reference:
Gauss, C.F. (1821, 1823, 1826). Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and Supplement. Werke 4, 1–108.

For further reading about Statistical Bias visit: Bias in Statistics.

Learn about Estimation and Types of Estimation

Outliers and Influential Observations

Here we will focus on the difference between outliers and influential observations.

Outliers

The cases (observations or data points) that do not follow the same pattern as the rest of the data under the model are called outliers. In Regression, cases with large residuals are candidates for outliers. An outlier is thus a data point that diverges from the overall pattern in a sample; it can therefore influence the estimated relationship between the variables and may also exert an influence on the slope of the regression line.

An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. An outlier may be due to recording errors (which may be correctable) or due to the sample not being drawn entirely from the same population. Outliers may also occur when the values come from the same population but that population is non-normal (heavy-tailed); that is, outliers may result from an incorrect specification based on the wrong distributional assumptions.
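
To see the effect on the slope concretely, here is a small R sketch with simulated data (the values, including the single hand-added outlying point, are arbitrary choices):

```r
# Effect of a single outlier on the estimated regression slope
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 1)   # data that follow the linear pattern

fit_clean <- lm(y ~ x)

x_out <- c(x, 30)                      # add one outlying point by hand
y_out <- c(y, 2)
fit_out <- lm(y_out ~ x_out)

coef(fit_clean)   # slope close to the true value 0.5
coef(fit_out)     # slope pulled away from 0.5 by the single outlier
```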


Influential Observations

An influential observation is often an outlier in the x-direction. Influential observations may arise from:

  1. observations that are unusually large or otherwise deviate in unusually extreme ways from the center of a reference distribution;
  2. observations associated with a unit that has a low probability of selection and therefore a high sampling weight;
  3. observations with a very large weight (relative to the weights of other units in the specified sub-population) due to problems with stratum jumping; sampling of birth units or highly seasonal units; large nonresponse adjustment factors arising from unusually low response rates within a given adjustment cell; unusual calibration-weighting effects; or other factors.

Importance of Outliers and Influential Observations

Outliers and Influential observations are important because:

  • Both outliers and influential observations can potentially mislead the interpretation of the regression model.
  • Outliers might indicate errors in the data or a non-linear relationship that the model cannot capture.
  • Influential observations can make the model seem more accurate than it is, masking underlying issues.

How to Identify Outliers and Influential Observations

Both outliers and influential observations can be identified by using:

  • Visual inspection: Scatterplots can reveal outliers as distant points.
  • Residual plots: Plotting residuals against predicted values or independent variables can show patterns indicative of influential observations.
  • Statistical diagnostics: Measures like Cook’s distance or leverage can quantify the influence of each data point.
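
A brief R sketch of these diagnostics, using simulated data with one unusual case (`rstandard()`, `hatvalues()`, and `cooks.distance()` are functions in base R's stats package):

```r
# Diagnostics for outliers and influential observations (simulated data)
set.seed(1)
dat <- data.frame(x = 1:25)
dat$y <- 3 + 0.8 * dat$x + rnorm(25, sd = 2)
dat$y[25] <- 40                   # make the last case unusual

fit <- lm(y ~ x, data = dat)

rstandard(fit)                    # standardized residuals (large values flag outliers)
hatvalues(fit)                    # leverage of each case (outlying in the x-direction)
cooks.distance(fit)               # Cook's distance: influence of each case on the fit

plot(fitted(fit), resid(fit),     # residual plot: residuals versus fitted values
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```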

By being aware of outliers and influential observations, one can ensure that the regression analysis provides a more reliable picture of the relationship between variables.

Learn R Programming Language

Error and Residual in Regression


In Statistics and Optimization, statistical errors and residuals are two closely related and easily confused measures of the deviation of an observation from its mean.

The term "error" is something of a misnomer: a statistical error is not a mistake, but the amount by which an observation differs from its expected value. The errors $e$ are unobservable random variables, assumed to have zero mean and to be uncorrelated, each with common variance $\sigma^2$.

A residual, on the other hand, is an observable estimate of the unobservable error. The residuals $\hat{e}$ are computed quantities with mean $E(\hat{e})=0$ and variance $V(\hat{e})=\sigma^2 (I-H)$, where $H$ is the hat matrix.

Like the errors, each of the residuals has zero mean, but each residual may have a different variance. Unlike the errors, the residuals are correlated, because the residuals are linear combinations of the errors. If the errors are normally distributed, so are the residuals.


Note that in a model fitted with an intercept the sum of the residuals is necessarily zero, and thus the residuals are necessarily not independent. The sum of the errors need not be zero; the errors are independent random variables if the individuals are chosen from the population independently.
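
A short simulated example in R makes the contrast visible, because in a simulation the errors are known by construction (the model, coefficients, and error variance below are arbitrary choices):

```r
# Errors are constructed here; residuals come from the fitted model
set.seed(7)
n <- 50
x <- runif(n, 0, 10)
e <- rnorm(n, mean = 0, sd = 2)   # true errors (unobservable in practice)
y <- 1 + 2 * x + e

fit  <- lm(y ~ x)
ehat <- resid(fit)                # residuals: observable estimates of the errors

sum(ehat)                         # essentially zero (the model has an intercept)
sum(e)                            # need not be zero

hatvalues(fit)                    # diagonal h_ii of the hat matrix H;
                                  # Var(ehat_i) = sigma^2 * (1 - h_ii), so the
                                  # residuals have unequal variances
```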

The differences between errors and residuals in Regression are:

| Sr. No. | Errors | Residuals |
|---------|--------|-----------|
| 1 | Error represents the unobservable difference between an actual value $y$ of the dependent variable and its true population mean. | Residuals represent the observable difference between an actual value $y$ of the dependent variable and its predicted value according to the regression model. |
| 2 | Error is a theoretical concept because the true population mean is usually unknown. | One can calculate residuals because we have the data and the fitted model. |
| 3 | Errors are assumed to be random and independent, with a mean of zero. | Residuals are considered estimates of the errors for each data point. |

Residuals are used in various ways to evaluate the regression model, including:

  • Residual plots: Plots of the residuals against the independent variable or the predicted values, used to check the fit visually.
  • Mean Squared Error (MSE): The MSE measures the average of the squared residuals, that is, the average squared difference between the observed and predicted values.
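
As a hedged sketch (reusing the kind of simulated fit shown earlier), the MSE can be obtained directly from the residuals; note that some texts divide by the residual degrees of freedom rather than by $n$:

```r
# MSE computed from the residuals of a fitted regression model
set.seed(7)
x   <- runif(50, 0, 10)
y   <- 1 + 2 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

mean(resid(fit)^2)                      # MSE: average of the squared residuals
sum(resid(fit)^2) / df.residual(fit)    # estimate of sigma^2 (divides by n - p)
```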

In essence, understanding errors and residuals helps the researcher gauge how well the regression model captures the underlying relationship between variables, despite the inherent randomness or “noise” in real-world data.

FAQs about Errors and Residuals

  1. What is an Error?
  2. What are residuals in regression?
  3. What is the purpose of residual plots?
  4. What is a mean squared error (MSE)?
  5. Differentiate between error and residual.
  6. Discuss the sum of residuals and the sum of errors.

Statistics Help: https://itfeature.com

Learn about Simple Linear Regression Models

Statistical Models in R Language

P-value Interpretation and Misinterpretation of P-value

The P-value is a probability, with a value ranging from zero to one. It is a measure of how much evidence the data provide against the null hypothesis: the smaller the P-value, the more evidence we have against $H_0$. Here we will discuss the P-value and its interpretation.

P-value Definition

The largest significance level at which we would fail to reject the null hypothesis (equivalently, the smallest level at which it would be rejected). It enables us to test the hypothesis without first specifying a value for $\alpha$. OR

The probability of observing a sample value as extreme as, or more extreme than, the value observed, given that the null hypothesis is true.
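
As a concrete illustration of this definition (the sample data and the hypothesized mean below are invented), a two-sided one-sample t-test P-value can be computed in R as follows:

```r
# P-value: probability of a test statistic as extreme as, or more extreme
# than, the observed one, computed assuming the null hypothesis is true
set.seed(11)
x   <- rnorm(20, mean = 10.8, sd = 2)   # sample data (illustrative)
mu0 <- 10                               # hypothesized mean under H0

t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
p_val <- 2 * pt(-abs(t_obs), df = length(x) - 1)   # two-sided p-value

p_val
t.test(x, mu = mu0)$p.value             # same value from the built-in test
```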


P-value Interpretation

In general, the P-value is interpreted as follows: if the P-value is smaller than the chosen significance level, $H_0$ (the null hypothesis) is rejected, accepting the risk that a true $H_0$ is being rejected; if the P-value is larger than the significance level, $H_0$ is not rejected.


If the P-value is less than

  • 0.10, we have some evidence that $H_0$ is not true
  • 0.05, we have strong evidence that $H_0$ is not true
  • 0.01, we have very strong evidence that $H_0$ is not true
  • 0.001, we have extremely strong evidence that $H_0$ is not true

Misinterpretation of a P-value

Many people misunderstand P-values. For example, if the P-value is 0.03, it means that there is a 3% chance of observing a difference as large as the one you observed even if the two population means are identical (i.e., the null hypothesis is true). It is tempting to conclude that there is therefore a 97% chance that the difference you observed reflects a real difference between the populations and a 3% chance that it is due to chance; however, that conclusion would be incorrect. What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments.
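
This reading of the P-value can be checked by simulation. The R sketch below assumes two identical normal populations (so the null hypothesis is true) and an "observed" difference chosen so that its two-sided P-value is about 0.03; roughly 3% of simulated experiments then show a larger difference and 97% a smaller one:

```r
# Simulate many two-sample experiments from identical populations (H0 true)
set.seed(2024)
n     <- 30
reps  <- 10000
diffs <- replicate(reps, mean(rnorm(n)) - mean(rnorm(n)))

# Pick an "observed" difference whose two-sided p-value is about 0.03
obs_diff <- quantile(abs(diffs), 0.97)

mean(abs(diffs) >= obs_diff)   # about 0.03: experiments with a larger difference
mean(abs(diffs) <  obs_diff)   # about 0.97: experiments with a smaller difference
```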

Note that p-values are a valuable tool in hypothesis testing, but they should be used thoughtfully and in conjunction with other analyses.

Statistics Help

Read More about P-value Interpretation

Read More on Wikipedia

R Frequently Asked Questions