One Way Analysis of Variance: Made Easy

This article introduces one-way Analysis of Variance (ANOVA). In the analysis of variance, the total variation in the sample data is split up into meaningful components that measure different sources of variation. Each component yields an estimate of the population variance, and these estimates are tested for homogeneity by using the F-distribution.

One Way Classification (Single Factor Experiments)

The classification of observations based on a single criterion or factor is called a one-way classification.

In single factor experiments, independent samples are selected from $k$ populations, each with $n$ observations. For samples, the word treatment is used, and each treatment has $n$ repetitions or replications. By treatment, we mean, for example, the fertilizers applied to the fields, the varieties of a crop sown, or the temperature and humidity to which an item is subjected in a production process. The collected data, consisting of $kn$ observations ($k$ samples of $n$ observations each), can be presented as follows.

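$$\begin{array}{c|cccc}
\text{Observation} & \text{Treatment } 1 & \text{Treatment } 2 & \cdots & \text{Treatment } k \\ \hline
1 & X_{11} & X_{12} & \cdots & X_{1k} \\
2 & X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
n & X_{n1} & X_{n2} & \cdots & X_{nk} \\ \hline
\text{Total} & X_{\cdot 1} & X_{\cdot 2} & \cdots & X_{\cdot k} \\
\text{Mean} & \overline{X}_{\cdot 1} & \overline{X}_{\cdot 2} & \cdots & \overline{X}_{\cdot k}
\end{array}$$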

where

$X_{ij}$ is the $i$th observation receiving the $j$th treatment

$X_{\cdot j}=\sum\limits_{i=1}^n X_{ij}$ is the total of the observations receiving the $j$th treatment

$\overline{X}_{\cdot j}=\frac{X_{\cdot j}}{n}$ is the mean of the observations receiving the $j$th treatment

$X_{\cdot \cdot}=\sum\limits_{j=1}^k X_{\cdot j} = \sum\limits_{j=1}^k \sum\limits_{i=1}^n X_{ij}$ is the total of all observations

$\overline{\overline{X}} = \frac{X_{\cdot \cdot}}{kn}$ is the mean of all observations.

The $k$ treatments are assumed to be homogeneous, and the random samples are assumed to be taken from the same parent population, which is approximately normal with mean $\mu$ and variance $\sigma^2$.
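The splitting of the total variation mentioned in the introduction can be written out in the notation above. The labels $SS_{Total}$, $SS_{Treatment}$, and $SS_{Error}$ are names assumed here for illustration; they do not appear elsewhere in the article.

$$\underbrace{\sum_{j=1}^k \sum_{i=1}^n \left(X_{ij}-\overline{\overline{X}}\right)^2}_{SS_{Total}} = \underbrace{n\sum_{j=1}^k \left(\overline{X}_{\cdot j}-\overline{\overline{X}}\right)^2}_{SS_{Treatment}} + \underbrace{\sum_{j=1}^k \sum_{i=1}^n \left(X_{ij}-\overline{X}_{\cdot j}\right)^2}_{SS_{Error}}$$

The degrees of freedom split in the same way, $kn-1 = (k-1) + k(n-1)$, and the test statistic is the ratio of the two mean squares:

$$F = \frac{SS_{Treatment}/(k-1)}{SS_{Error}/\left(k(n-1)\right)}$$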


One Way Analysis of Variance Model

The linear model on which the one way analysis of variance is based is

$$X_{ij} = \mu + \alpha_j + e_{ij}, \quad\quad i=1,2,\cdots, n; \quad j=1,2,\cdots, k$$

where $X_{ij}$ is the $i$th observation in the $j$th treatment, $\mu$ is the overall mean for all treatments, $\alpha_j$ is the effect of the $j$th treatment, and $e_{ij}$ is the random error associated with the $i$th observation in the $j$th treatment.

The One Way Analysis of Variance model is based on the following assumptions:

  • The model assumes that each observation $X_{ij}$ is the sum of three linear components
    • The true mean effect $\mu$
    • The true effect of the $j$th treatment $\alpha_j$
    • The random error $e_{ij}$ associated with the $i$th observation in the $j$th treatment
  • The observations to which the $k$ treatments are applied are homogeneous.
  • Each of the $k$ samples is selected randomly and independently from a normal population with mean $\mu$ and variance $\sigma^2_e$.
  • The random error $e_{ij}$ is a normally distributed random variable with $E(e_{ij})=0$ and $Var(e_{ij})=\sigma^2_e$.
  • The sum of all $k$ treatment effects must be zero $(\sum\limits_{j=1}^k \alpha_j =0)$.

Suppose you are comparing crop yields that were fertilized with different mixtures. The yield (numerical) is the dependent variable, and fertilizer type (categorical with 3 levels) is the independent variable. ANOVA helps you determine if the fertilizer mixtures have a statistically significant effect on the average yield.
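The fertilizer comparison can be sketched in a few lines of plain Python. The yields below are hypothetical numbers made up for illustration (they are not data from any experiment), and the function name `one_way_anova` is an assumption of this sketch. It handles only the balanced case described above, with $k$ treatments of $n$ observations each.

```python
# Hypothetical yields (illustrative only): 3 fertilizer mixtures, 4 plots each.
yields = {
    "F1": [20.0, 21.0, 19.0, 20.0],
    "F2": [22.0, 23.0, 21.0, 22.0],
    "F3": [25.0, 24.0, 26.0, 25.0],
}


def one_way_anova(groups):
    """Return (SS_treatment, SS_error, F) for a balanced one-way layout.

    groups: list of k lists, each holding the n observations of one treatment.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")

    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]

    # Between-treatment and within-treatment sums of squares.
    ss_treat = n * sum((m - grand_mean) ** 2 for m in means)
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    # Mean squares use k - 1 and k(n - 1) degrees of freedom.
    ms_treat = ss_treat / (k - 1)
    ms_error = ss_error / (k * (n - 1))
    return ss_treat, ss_error, ms_treat / ms_error


ss_treat, ss_error, f_stat = one_way_anova(list(yields.values()))
print(f"SS_treatment={ss_treat:.3f}  SS_error={ss_error:.3f}  F={f_stat:.3f}")
```

The resulting $F$ value would then be compared with the F-distribution on $k-1$ and $k(n-1)$ degrees of freedom; that lookup is left out to keep the sketch free of external libraries.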


Block Design, Incidence, and Concurrence Matrix (2018)

Block Design Properties

The necessary conditions that the parameters of a Balanced Incomplete Block Design (BIB design) must satisfy are

  • $bk = vr$, so that each treatment has $r=\frac{bk}{v}$ replications
  • no treatment appears more than once in any block
  • every unordered pair of treatments appears together in exactly $\lambda$ blocks (equi-concurrence),
    where $\lambda=\frac{r(k-1)}{v-1}=\frac{bk(k-1)}{v(v-1)}$ is often referred to as the concurrence parameter of a BIB design.
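The counting conditions above are easy to check mechanically. The following is a minimal sketch, with the function name `bibd_parameters` assumed for illustration. It only tests whether $r$ and $\lambda$ come out as whole numbers; these conditions are necessary, not sufficient, so passing the check does not guarantee that a design exists.

```python
def bibd_parameters(v, b, k):
    """Return (r, lam) if the necessary counting conditions give integers.

    v: number of treatments, b: number of blocks, k: block size.
    Returns None when r = bk/v or lam = r(k-1)/(v-1) is not an integer.
    """
    if v < 2 or (b * k) % v != 0:
        return None
    r = b * k // v
    if (r * (k - 1)) % (v - 1) != 0:
        return None
    lam = r * (k - 1) // (v - 1)
    return r, lam


print(bibd_parameters(4, 4, 3))  # four treatments in four blocks of size three
print(bibd_parameters(5, 5, 3))  # lam would be 6/4, so the check fails
```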

A design, say $d$, with parameters $(v, b, r, k, \lambda)$ can be represented by a $v \times b$ treatment-block incidence matrix (having $v$ rows and $b$ columns). Denote it by $N=(n_{ij})$, whose elements $n_{ij}$ signify the number of units in block $j$ allocated to treatment $i$. The rows of the incidence matrix are labeled with the varieties (treatments) of the design and the columns with the blocks.

We put 1 in the ($i$, $j$)th cell of the matrix if variety $i$ is contained in block $j$, and 0 otherwise. Each row of the incidence matrix has $r$ 1’s, each column has $k$ 1’s, and each pair of distinct rows has 1’s in common in exactly $\lambda$ columns, leading to a useful matrix identity.
The matrix $NN'$ has $v$ rows and $v$ columns and is referred to as the concurrence matrix of design $d$; its entries, the concurrence parameters, are denoted by $\lambda_{dij}$. For a BIBD, $n_{ij}$ is either one or zero, and $n_{ij}^2= n_{ij}$.

Theorem: If $N$ is the incidence matrix of a $(v, b, r, k, \lambda)$-design, then $NN'=(r-\lambda)I+\lambda J$, where $I$ is the $v\times v$ identity matrix and $J$ is the $v\times v$ matrix of all 1’s.

Example: For the block design with blocks {1,2,3}, {2,3,4}, {3,4,1}, {4,1,2}, construct the incidence matrix.

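With rows indexed by treatments 1 to 4 and columns by the blocks in the order listed, the incidence matrix and the concurrence matrix are
\[N=\begin{pmatrix}1&0&1&1\\1&1&0&1\\1&1&1&0\\0&1&1&1\end{pmatrix},\qquad NN'=\begin{pmatrix}3&2&2&2\\2&3&2&2\\2&2&3&2\\2&2&2&3\end{pmatrix}\]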
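The incidence and concurrence matrices for this example can also be built directly from the list of blocks. The sketch below uses plain Python lists; the variable names (`blocks`, `treatments`, `N`, `NNt`) are chosen here for illustration, and rows are taken in treatment order 1 to 4 with columns in the order the blocks are listed.

```python
# Blocks of the example design, in the order given in the text.
blocks = [{1, 2, 3}, {2, 3, 4}, {3, 4, 1}, {4, 1, 2}]
treatments = [1, 2, 3, 4]

# Incidence matrix: N[i][j] = 1 if treatment i lies in block j, else 0.
N = [[1 if t in blk else 0 for blk in blocks] for t in treatments]

# Concurrence matrix: entry (i, h) counts the blocks containing both i and h.
v, b = len(treatments), len(blocks)
NNt = [[sum(N[i][j] * N[h][j] for j in range(b)) for h in range(v)]
       for i in range(v)]

for row in N:
    print(row)
print()
for row in NNt:
    print(row)
```

The diagonal of `NNt` gives the replication $r$ and the off-diagonal entries give $\lambda$, which is the pattern $(r-\lambda)I+\lambda J$ stated in the theorem.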


Denoting the elements of $NN'$ by $q_{ih}$, we see that $q_{ii}=\sum_j n_{ij}^2$ and $q_{ih}=\sum_j n_{ij} n_{hj}, (i \ne h)$. For a BIB design, $NN'$ is the treatment concurrence matrix: its diagonal elements are $q_{ii}=r$, and its off-diagonal elements $q_{ih}=\lambda$ $(i\ne h)$ equal the number of times any pair of treatments occurs together within a block. In a balanced design, the off-diagonal entries of $NN'$ are all equal to a constant $\lambda$; i.e., the common replication for a BIBD is $r$, and the common pairwise treatment concurrence is $\lambda$.

$N$ is a matrix of $v$ rows and $b$ columns, so its rank satisfies $r(N)\le \min(b, v)$. Hence, $t\le \min(b, v)$. If the design is symmetric, $b=v$ (so $r=k$) and $N$ is square; then $|NN'|=|N|^2$, so $|NN'|=(r-\lambda)^{v-1}r^2$ is a perfect square.
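The determinant behind this statement can be written out as a short sketch. The matrix $(r-\lambda)I+\lambda J$ has the eigenvalue $r-\lambda$ with multiplicity $v-1$ and the eigenvalue $r+\lambda(v-1)$ once, and $\lambda(v-1)=r(k-1)$, so

\[|NN'| = (r-\lambda)^{v-1}\left[r+\lambda(v-1)\right] = (r-\lambda)^{v-1}\,rk .\]

In a symmetric design $r=k$, which gives $(r-\lambda)^{v-1}r^2$. For the example design above ($v=b=4$, $r=k=3$, $\lambda=2$),

\[|NN'| = (3-2)^{3}\cdot 3^2 = 9 = 3^2 ,\]

which agrees with $|N|=\pm 3$.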


Latin Square Designs (LSD) Definition and Introduction

Introduction to Latin Square Designs

In Latin Square Designs, the treatments are grouped into replicates in two different ways, such that each row and each column is a complete block. The grouping for a balanced arrangement is performed by requiring that each treatment appear once and only once in each row and once and only once in each column. The experimental material should be arranged and the experiment conducted in such a way that the differences among the rows and among the columns represent major sources of variation.

Hence, a Latin Square Design is an arrangement of $k$ treatments in a $k\times k$ square, where the treatments are grouped in blocks in two directions. It should be noted that in a Latin Square Design the number of rows, the number of columns, and the number of treatments must be equal.

In other words, unlike the Randomized Complete Block Design (RCBD) and the Completely Randomized Design (CRD), a Latin Square Design is a two-restriction design: it provides two blocking factors, which are used to control the effect of two variables that influence the response variable. The design is called a Latin Square because each treatment is represented by a Latin letter that occurs once in each row and once in each column, in such a way that, with respect to one criterion (restriction), rows are completely homogeneous blocks, and, with respect to the other criterion (second restriction), columns are completely homogeneous blocks.

Application of Latin Square Designs

Latin Square Designs are mostly applied in animal science, agriculture, industrial research, etc. An everyday example is the Sudoku puzzle, which is a special case of a Latin square. The main assumption is that there is no interaction between the treatment, row, and column effects.


The general model is defined as
\[Y_{ijk}=\mu+\alpha_i+\beta_j+\tau_k +\varepsilon_{ijk}\]

where $i=1,2,\cdots,t$; $j=1,2,\cdots,t$; and $k=1,2,\cdots,t$, with $t$ treatments, $t$ rows, and $t$ columns,
$\mu$ is the overall mean (general mean) based on all of the observations,
$\alpha_i$ is the effect of the $i$th row,
$\beta_j$ is the effect of the $j$th column,
$\tau_k$ is the effect of the $k$th treatment,
$\varepsilon_{ijk}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance, i.e., $\varepsilon_{ijk}\sim N(0, \sigma^2)$.
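As a brief worked note on this model: a $t\times t$ Latin square has $t^2$ observations, and the total degrees of freedom split across rows, columns, treatments, and error as

\[\underbrace{t^2-1}_{\text{total}} = \underbrace{(t-1)}_{\text{rows}} + \underbrace{(t-1)}_{\text{columns}} + \underbrace{(t-1)}_{\text{treatments}} + \underbrace{(t-1)(t-2)}_{\text{error}} .\]

For $t=4$, for example, this reads $15 = 3+3+3+6$.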

Latin Square Designs Experimental Layout

Suppose we have 4 treatments (namely: $A, B, C$, and $D$), then it means that we have

Number of Treatments = Number of Rows = Number of Columns =4

The Latin Square Designs Layout can be for example

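\[
\begin{array}{|c|c|c|c|}
\hline
A\ (Y_{111}) & B\ (Y_{122}) & C\ (Y_{133}) & D\ (Y_{144}) \\ \hline
B\ (Y_{212}) & C\ (Y_{223}) & D\ (Y_{234}) & A\ (Y_{241}) \\ \hline
C\ (Y_{313}) & D\ (Y_{324}) & A\ (Y_{331}) & B\ (Y_{342}) \\ \hline
D\ (Y_{414}) & A\ (Y_{421}) & B\ (Y_{432}) & C\ (Y_{443}) \\ \hline
\end{array}
\]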

The subscripts represent the row, column (block), and treatment numbers, respectively. For example, $Y_{421}$ means the first treatment in the 4th row and the second column (block).
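A layout of this kind can be generated and checked programmatically. The sketch below is one possible approach, not a prescribed procedure: it builds the cyclic square shown above and then shuffles whole rows and whole columns, which keeps the Latin property. The function names (`cyclic_latin_square`, `is_latin_square`, `randomize_square`) are assumptions made for illustration.

```python
import random


def cyclic_latin_square(treatments):
    """Row i is the treatment list shifted left by i positions."""
    t = len(treatments)
    return [[treatments[(i + j) % t] for j in range(t)] for i in range(t)]


def is_latin_square(square):
    """True if every treatment occurs exactly once in each row and column."""
    t = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == t and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(t)} == symbols for j in range(t))
    return len(symbols) == t and rows_ok and cols_ok


def randomize_square(square, seed=None):
    """Permute whole rows and whole columns; the result is still Latin."""
    rng = random.Random(seed)
    t = len(square)
    row_order = rng.sample(range(t), t)
    col_order = rng.sample(range(t), t)
    return [[square[i][j] for j in col_order] for i in row_order]


base = cyclic_latin_square(["A", "B", "C", "D"])
layout = randomize_square(base, seed=1)
for row in layout:
    print(" ".join(row))
```

Shuffling rows and columns does not reach every possible Latin square, so this is a convenient randomization rather than a uniform draw over all squares.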


Benefits of using Latin Square Designs

  • Efficiency: It allows the examination of multiple factors (treatments) within a single experiment, reducing the time and resources needed.
  • Controlling Variability: By ensuring a balanced distribution of treatments across rows and columns, one can effectively control for two sources of variation that might otherwise influence the results.

Limitations

The following limitations need to be considered:

  • Number of Treatments: The number of rows and columns in the Latin square must be equal to the number of treatments. This means it works best with a small to moderate number of treatments.
  • Interaction Effects: Latin squares are good for analyzing the main effects of different factors, but they cannot account for interaction effects between those factors.


Randomized Complete Block Design (RCBD)

The Randomized Complete Block Design may be defined as the design in which the experimental material is divided into blocks/groups of homogeneous experimental units (experimental units with the same characteristics), and each block/group contains a complete set of treatments, which are assigned at random to the experimental units.

In a Completely Randomized Design (CRD), there is no restriction on the allocation of the treatments to experimental units. In practice, however, there are situations where the experimental material shows relatively large variability, and it is possible to form blocks (in a simpler sense, groups) of relatively homogeneous experimental material or units. The design applied in such situations is called a Randomized Complete Block Design (RCBD).


RCBD is a one-restriction design, used to control a variable that influences the response variable. The main aim of the restriction is to control the variable causing the variability in the response. Blocking is done to create homogeneity within each block, and the blocking factor is itself a source of variability. An example of a blocking factor might be the gender of a patient: by blocking on gender, this source of variability is controlled for, leading to greater accuracy. RCBD is a mixed model in which one factor is fixed and the other is random. The main assumption of the design is that there is no interaction between the treatment and block effects.

The Randomized Complete Block Design is said to be a complete design because, within each block, the number of experimental units equals the number of treatments, so each treatment occurs in each block.

The general model is defined as

\[Y_{ij}=\mu+\eta_i+\xi_j+e_{ij}\]

where $i=1,2,\cdots, t$ and $j=1,2,\cdots, b$ with $t$ treatments and $b$ blocks. $\mu$ is the overall mean based on all observations, $\eta_i$ is the effect of the $i$th treatment, $\xi_j$ is the effect of the $j$th block, and $e_{ij}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance.
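For this model, the total variation splits into treatment, block, and error components. The notation $\overline{Y}_{i\cdot}$ (treatment mean), $\overline{Y}_{\cdot j}$ (block mean), and $\overline{Y}_{\cdot\cdot}$ (grand mean) is assumed here for illustration:

\[\sum_{i=1}^{t}\sum_{j=1}^{b}\left(Y_{ij}-\overline{Y}_{\cdot\cdot}\right)^2 = b\sum_{i=1}^{t}\left(\overline{Y}_{i\cdot}-\overline{Y}_{\cdot\cdot}\right)^2 + t\sum_{j=1}^{b}\left(\overline{Y}_{\cdot j}-\overline{Y}_{\cdot\cdot}\right)^2 + \sum_{i=1}^{t}\sum_{j=1}^{b}\left(Y_{ij}-\overline{Y}_{i\cdot}-\overline{Y}_{\cdot j}+\overline{Y}_{\cdot\cdot}\right)^2\]

with degrees of freedom $tb-1 = (t-1) + (b-1) + (t-1)(b-1)$.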

The main objective of blocking is to reduce the variability among experimental units within a block as much as possible and to maximize the variation among blocks. If the blocks do not differ appreciably from one another, the design would not contribute to improving the precision in detecting treatment differences.

Randomized Complete Block Design Experimental Layout

Suppose there are $t$ treatments and $b$ blocks in a randomized complete block design; then each block contains $t$ homogeneous plots, one for each treatment. An experimental layout for such a design using four treatments in three blocks is as follows.

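\[
\begin{array}{ccc}
\text{Block 1} & \text{Block 2} & \text{Block 3} \\ \hline
A & B & C \\
B & C & D \\
C & D & A \\
D & A & B
\end{array}
\]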

From the RCBD layout, we can see that

  • The treatments are assigned at random within blocks of adjacent subjects and each of the treatments appears once in a block.
  • The number of blocks represents the number of replications
  • Any treatment can be adjacent to any other treatment, but not to the same treatment within the block.
  • Variation in an experiment is controlled by accounting for spatial effects.
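The random assignment of treatments within blocks can be sketched as follows. This is a minimal illustration, with the function name `rcbd_layout` and the `seed` parameter assumed here: every block receives its own independent random ordering of the complete set of treatments.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one random ordering of all treatments for each block."""
    rng = random.Random(seed)
    # sample(..., len(treatments)) is a random permutation without repeats,
    # so each treatment appears exactly once per block.
    return [rng.sample(treatments, len(treatments)) for _ in range(n_blocks)]


layout = rcbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=42)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: {' '.join(block)}")
```

Fixing `seed` makes the layout reproducible; leaving it as `None` draws a fresh randomization each time.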
