Simple Linear Regression Model

Frequently, we measure two or more variables on each individual and try to describe the nature of the relationship between them (for example, with a simple linear regression model or correlation analysis). Using the regression technique, we estimate the relationship of one variable with another by expressing one as a linear (or more complex) function of the other. We can also predict the values of one variable from the other. The variables involved in regression and correlation analysis are continuous. In this post, we will learn about the Simple Linear Regression Model.

We are interested in establishing significant functional relationships between two (or more) variables. For example, the function $Y=f(X)=a+bX$ (read as $Y$ is a function of $X$) establishes a relationship to predict the values of variable $Y$ for given values of variable $X$. In statistics (biostatistics), this function is called a simple linear regression model or simply the regression equation.

The variable $Y$ is called the dependent (response) variable, and $X$ is called the independent (regressor or explanatory) variable.

In biology, many relationships are appropriate over only a limited range of values of $X$. Negative values are meaningless for many variables, such as age, height, weight, and body temperature.

The method of linear regression is used to estimate the best-fitting straight line to describe the relationship between variables. The linear regression gives the equation of the straight line that best describes how the outcome of $Y$ increases/decreases with an increase/decrease in the explanatory variable $X$. The equation of the regression line is
$$Y=\beta_0 + \beta_1 X,$$
where $\beta_0$ is the intercept (value of $Y$ when $X=0$) and $\beta_1$ is the slope of the line. Both $\beta_0$ and $\beta_1$ are the parameters (or regression coefficients) of the linear equation.

Estimation of Regression Coefficients in Simple Linear Regression Model

The best-fitting line is derived using the method of \textit{Least Squares} by finding the values of the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared vertical distances of the points from the regression line.

The best-fit line (drawn as a dotted line) passes through the point ($\overline{X}, \overline{Y}$).

The regression line $Y=\beta_0+\beta_1X$ is fit by the least-squares method. The regression coefficients $\beta_0$ and $\beta_1$ are both calculated to minimize the sum of squares of the vertical deviations of the points about the regression line. Each deviation equals the difference between the observed value of $Y$ and the estimated value of $Y$ (the corresponding point on the regression line).
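
As a sketch of where the estimates come from, the least-squares criterion and the two normal equations obtained by setting its partial derivatives to zero can be written as follows. The symbol $S$ and the sample notation $(x_i, y_i)$, $i=1,\dots,n$, are introduced here for illustration only.

\begin{align}
S(\beta_0,\beta_1) &= \sum\limits_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2\\
\frac{\partial S}{\partial \beta_0} = 0 \;&\Rightarrow\; \sum\limits_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum\limits_{i=1}^{n} x_i\\
\frac{\partial S}{\partial \beta_1} = 0 \;&\Rightarrow\; \sum\limits_{i=1}^{n} x_i y_i = \beta_0 \sum\limits_{i=1}^{n} x_i + \beta_1 \sum\limits_{i=1}^{n} x_i^2
\end{align}

Dividing the first normal equation by $n$ gives $\overline{Y}=\beta_0+\beta_1\overline{X}$, which is why the fitted line passes through ($\overline{X}, \overline{Y}$).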

The following table shows the \textit{body weight} and \textit{plasma volume} of eight healthy men.

Subject   Body Weight (kg)   Plasma Volume (liters)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12

Simple Linear Regression Models: Scatter plot with regression line

The parameters $\beta_0$ and $\beta_1$ are estimated using the following formulas (for the simple linear regression model):

\begin{align}
\beta_1 &= \frac{n\sum\limits_{i=1}^{n} x_iy_i -\sum\limits_{i=1}^{n} x_i \sum\limits_{i=1}^{n} y_i} {n \sum\limits_{i=1}^{n} x_i^2 - \left(\sum\limits_{i=1}^{n} x_i \right)^2}\\
\beta_0 &= \overline{Y} - \beta_1 \overline{X}
\end{align}

Regression coefficients are sometimes known as “beta-coefficients”. When the slope is zero ($\beta_1=0$), there is no linear relationship between the $X$ and $Y$ variables. For the data above, the best-fitting straight line describing the relationship between plasma volume and body weight is
$$Plasma\, Volume = 0.0857 +0.0436\times Weight$$
Note that the calculated values for $\beta_0$ and $\beta_1$ are estimates of the population values and are therefore subject to sampling variation.
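
The calculation can be checked with a few lines of Python. This is a minimal sketch that applies the formulas above to the body weight and plasma volume data. The function name `fit_simple_linear_regression` and the 60 kg prediction are illustrative assumptions, not part of the original example.

```python
# Body weight (kg) and plasma volume (liters) for the eight subjects in the table.
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]


def fit_simple_linear_regression(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # Slope: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean(y) - slope * mean(x)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = fit_simple_linear_regression(weight, plasma)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")

# Predicted plasma volume for a hypothetical 60 kg man (illustrative value only).
predicted = b0 + b1 * 60.0
print(f"predicted plasma volume at 60 kg = {predicted:.2f} liters")
```

Rounded to four decimal places, the intercept and slope agree with the fitted equation given above (0.0857 and 0.0436).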

Simple linear regression model equation

https://gmstat.com

https://rfaqs.com

Split Plot Design

A design in which the levels of one factor are applied to large experimental units and the levels of the other factors to the sub-units is known as a “split plot design”.

A split plot experiment is a blocked experiment in which the blocks themselves serve as experimental units for a subset of the factors. After blocking, the levels of the other factors are randomly applied to the large units within blocks, often called whole plots or main plots.

Split plot designs are specifically suited to two-factor experiments that have more treatments than can be accommodated by a complete block design. In a split plot design, the factors are not all of equal importance. For example, in an experiment on varieties and fertilizers, the variety may be less important and the fertilizer more important.

In this design, the experimental units are divided into two parts: (i) main plots and (ii) sub-plots. The levels of one factor are assigned at random to the large experimental units (main plots), and the levels of the other (second) factor are applied at random to the sub-units (sub-plots) within the large experimental units. The sub-units are obtained by dividing the large experimental units.

Note that the assignment of a particular factor to either the main plot or the subplot is extremely important because the plot size and the precision with which effects are measured are not the same for both factors.

The sub-plot treatments are the combination of the levels of different factors.

The split plot design involves assigning the levels of one factor to main plots which may be arranged in a “CRD”, “RCBD” or “LSD”. The levels of the other factor are assigned to subplots within each main plot.

Split Plot Design Layout Example

If there are 3 varieties and 3 fertilizers and we want more precision for fertilizers, then in an RCBD with 3 replications, the varieties are assigned randomly to the main plots within each of the 3 blocks, using a separate randomization for each block. Then the levels of the fertilizer are randomly assigned to the subplots within the main plots, using a separate randomization in each main plot. The layout is shown in the figure below.

Split Plot Design
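
The two-stage randomization described in this layout example can be sketched in Python. This is only an illustration under stated assumptions: the labels `V1`..`V3` and `F1`..`F3`, the function name `split_plot_layout`, and the seed are hypothetical, and the generated layout is not the one in the original figure.

```python
import random

# Factor levels from the example; the labels are placeholders.
varieties = ["V1", "V2", "V3"]      # main-plot factor
fertilizers = ["F1", "F2", "F3"]    # subplot factor
n_blocks = 3                        # replications (blocks) in the RCBD


def split_plot_layout(main_levels, sub_levels, blocks, seed=None):
    """Return {block: [(main_level, [sub_levels in field order]), ...]}."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, blocks + 1):
        # Separate randomization of the main-plot factor within each block.
        main_order = rng.sample(main_levels, k=len(main_levels))
        plots = []
        for main_level in main_order:
            # Separate randomization of the subplot factor within each main plot.
            sub_order = rng.sample(sub_levels, k=len(sub_levels))
            plots.append((main_level, sub_order))
        layout[block] = plots
    return layout


layout = split_plot_layout(varieties, fertilizers, n_blocks, seed=2024)
for block, plots in layout.items():
    print(f"Block {block}")
    for main_level, sub_order in plots:
        print(f"  main plot {main_level}: subplots {' '.join(sub_order)}")
```

Every block contains each variety exactly once, and every main plot contains each fertilizer exactly once. Only the order is random.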

Another Split Plot Design Example

Suppose we want to study the effects of two irrigation methods (factor 1) and two different fertilizer types (factor 2) on four different fields (“whole plots”). While a field can easily be split into two for the two different fertilizers, the field cannot easily be split into two for irrigation: One irrigation system normally covers a whole field and the systems are expensive to replace.

Split Plot Design Example

Advantages and Disadvantages of Split Plot Design

Advantages of Split Plot Design

  • More Practical
    Randomizing hard-to-change factors in groups, rather than randomizing every run, is much less labor- and time-intensive.
  • Pliable
    Factors that naturally have large experimental units can be easily combined with factors having smaller experimental units.
  • More powerful
    Tests for the subplot effects from the easy-to-change factors generally have higher power due to partitioning the variance sources.
  • Adaptable
    New treatments can be introduced to experiments that are already in progress.
  • Cheaper to Run
    In case of a CRD, implementing a new irrigation method for each subplot would be extremely expensive.
  • More Efficient
    Because changing the hard-to-change factors causes more error (increased variance) than changing the easy-to-change factors, a split-plot design is more precise (than a completely randomized run order) for the subplot factors, subplot-by-subplot interactions, and subplot-by-whole-plot interactions.
  • Efficient
    More efficient statistically, with increased precision. It permits efficient application of factors that would be difficult to apply to small plots.
  • Reduced Cost
    They can reduce the cost and complexity of manipulating factors that are difficult or expensive to change.
  • Precision
    The overall precision of split-plot design relative to the randomized complete block design may be increased by designing the main plot treatment in a Latin square design or in an incomplete Latin square design.

Disadvantages of Split Plot Design

  • Less powerful
    Tests for the hard-to-change factors are less powerful, having a larger variance to test against and fewer changes to help overcome the larger error.
  • Unfamiliar
    Analysis requires specialized methods to cope with partitioned variance sources.
  • Different
    Hard-to-change (whole-plot) and easy-to-change (subplot) factor effects are tested against different estimated noise. This can result in large whole-plot effects not being statistically significant, whereas small subplot effects are significant even though they may not be practically important.
  • Precision
    The interaction and the main effects are estimated with different precision.
  • Statistical Analysis
    Complicated statistical analysis.
  • Sources of Variation
    They involve different sources of variation and error for each factor.
  • Missing Data
    When missing data occurs, the analysis is more complex than for a randomized complete block design.
  • Different treatment comparisons have different basic error variances which make the analysis more complex than with the randomized complete block design, especially if some unusual type of comparison is being made.
Design of Experiment


Important MCQs Probability Distributions Quiz 5

This quiz contains MCQs on probability distributions. It covers events, experiments, mutually exclusive events, collectively exhaustive events, sure events, impossible events, addition and multiplication laws of probability, concepts related to discrete and continuous random variables, probability distribution and probability density functions, characteristics and properties of probability distributions, discrete probability distributions, and continuous probability distributions, etc.

Online MCQs about Probability Distributions with Answers

1. If $N$ is population size, $n$ is the sample size, $p$ is probability of success, $K$ is number of successes stated in population, $k$ is the number of observed successes, then the parameters of binomial distribution are

 
 
 
 

2. If the mean of the Poisson distribution is 9, then its standard deviation is

 
 
 
 

3. The formula for the mean of the uniform (rectangular) distribution is

 
 
 
 

4. The parameters of hypergeometric distributions are

Note that $N$ is the population size, $n$ is the sample size, $p$ is the probability of success, $K$ is the number of successes stated in the population, and $k$ is the number of observed successes.

 
 
 
 

5. Which of the distributions has a larger variance than its mean

 
 
 
 

6. A random variable $X$ has a binomial distribution with $n=9$, the variance of $X$ is

 
 
 
 

7. In any normal distribution, the proportion of observations that are outside $\pm$ standard deviation of the mean is closest to

 
 
 
 

8. The mean deviation of a normal distribution is

 
 
 
 

9. The normal distribution is also classified as

 
 
 
 

10. In binomial probability distribution, the formula of calculating standard deviation is

 
 
 
 

11. In a normal distribution, the proportion of observations that lie within 1 standard deviation of the mean is closest to

 
 
 
 

12. If $X$ follows the Geometric distribution with parameter $p$ (probability of success), then the mean of $X$ is

 
 
 
 

13. For beta distribution of 1st kind, the range of $X$ is

 
 
 
 

14. The distribution of square of standard normal random variable will be

 
 
 
 

15. The Chi-Square distribution is a special case of

 
 
 
 

16. When can we use a normal distribution to approximate a binomial distribution?

 
 
 
 

17. An oil company conducts a geological study that indicates that an exploratory oil well should have a 0.25 probability of striking oil. The company is interested in finding the probability that the 3rd strike comes on the 6th well drilled. Which distribution will be used?

 
 
 
 

18. An oil company conducts a geological study that indicates that an exploratory oil well should have a 20% chance of striking oil. The company is interested in finding the probability that the first strike comes on the third well drilled. Which distribution will be used?

 
 
 
 

19. For Beta distribution of 2nd kind, the range of $X$ is

 
 
 
 

20. In binomial probability distributions, the dependents of the standard deviation must include

 
 
 
 


Probability distributions are the foundation for various statistical tests like hypothesis testing. By comparing observed data to a theoretical distribution (the null hypothesis), we can assess the likelihood that the data arose by chance.

Probability distributions are crucial tools in data analysis. They help identify patterns, outliers, and relationships between variables. Furthermore, many statistical models depend on specific probability distributions to function accurately.

Probability Distributions


https://itfeature.com


How to Split Data File in SPSS?

In SPSS (Statistical Package for the Social Sciences), the Split File option lets the user split the data into separate groups for analysis based on the values of one or more grouping variables. If the user selects multiple grouping variables, the cases are grouped by each variable within categories of the preceding variable on the “Groups Based on” list. Let us learn the step-by-step procedure to split a data file in SPSS.

How to Split Data File in SPSS

Suppose you want to compute separate means for males and females (the groups/categories of the gender variable). The Split File option can be used for this.

  • First, open the data file you want to split.
  • Second, from the menu bar, click the Data menu and then the Split File option (Data -> Split File).
Split Data File in SPSS Menu

The “Split File” dialog box will appear. Select the grouping variable from the left pane, then click the radio button titled “Organize output by groups”.

Split File in SPSS Dialog Box Options
  • Select the Gender variable (or whichever grouping variable you want to split by) in the left pane of the dialog box and click the arrow to move it to the “Groups Based on” box.
Split File in SPSS
  • Click the OK button. Now, subsequent analyses will reflect the split.
  • The data in the data window will be logically split. One can then run the required descriptive and inferential analyses on the split data.
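
For readers who want to cross-check a split analysis outside SPSS, the same idea (grouping cases by a variable and summarizing each group separately) can be sketched in Python. This is not SPSS syntax, and the records, the `score` variable, and the helper `split_by` are hypothetical names and values used only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: "gender" is the grouping variable and "score" is the
# variable being summarized. These values do not come from a real data file.
records = [
    {"gender": "Male", "score": 72.0},
    {"gender": "Female", "score": 81.0},
    {"gender": "Male", "score": 65.0},
    {"gender": "Female", "score": 77.0},
    {"gender": "Male", "score": 70.0},
]


def split_by(rows, key):
    """Group rows into a dict keyed by the value of the grouping variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


groups = split_by(records, "gender")
# Compute the mean separately within each group, as a split analysis would.
group_means = {
    name: mean(row["score"] for row in rows) for name, rows in groups.items()
}
for name, value in sorted(group_means.items()):
    print(f"{name}: mean score = {value:.2f}")
```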

Split File Off

  • Most importantly, to get back to ‘normal’, where the data are not split, go back to Data -> Split File and select the option ‘Analyze All cases’.
  • Press OK. SPSS will show SPLIT FILE OFF, and subsequent output will be produced without splitting the file.
