# Category: Design of Experiment (DOE)

## MCQs DOE 1

This test contains multiple-choice questions from Design of Experiments (DOE).

1. In one-way ANOVA with 15 total observations and 5 treatments, the total degrees of freedom is

2. In one-way ANOVA, with the usual notation, the error degrees of freedom is

3. Analysis of variance is used to test

4. In one-way ANOVA, given SSB = 2580, SSE = 1656, k = 4, and n = 20, the value of F is

5. The assumption used in ANOVA is

6. In one-way ANOVA, if the calculated F value is less than the table F value, then

7. In two-way ANOVA with $m$ rows and $n$ columns, the error degrees of freedom is

8. Consider $k$ independent samples containing $n_1, n_2, \cdots, n_k$ items, respectively, such that $n_1+n_2+\cdots+ n_k=n$. In ANOVA we use the F-distribution with degrees of freedom

9. In two-way ANOVA with m=5, n=4, then the total degrees of freedom is

10. In ANOVA we use
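
The degrees-of-freedom and F-ratio arithmetic behind questions such as 1, 4, 7, and 9 can be checked with a short sketch. The function names are hypothetical, and the two-way case assumes one observation per cell, which the questions do not state explicitly.

```python
def one_way_df(n, k):
    """Return (between, error, total) degrees of freedom for one-way ANOVA
    with n observations in total and k treatments."""
    return k - 1, n - k, n - 1


def two_way_df(m, n):
    """Return (rows, columns, error, total) degrees of freedom for two-way
    ANOVA with m rows and n columns, assuming one observation per cell."""
    return m - 1, n - 1, (m - 1) * (n - 1), m * n - 1


def f_ratio(ssb, sse, k, n):
    """F = MSB / MSE, with MSB = SSB / (k - 1) and MSE = SSE / (n - k)."""
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return msb / mse


print(one_way_df(n=15, k=5))                      # (4, 10, 14)
print(two_way_df(m=5, n=4))                       # (4, 3, 12, 19)
print(round(f_ratio(2580, 1656, k=4, n=20), 2))   # 8.31
```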

## Block Designs Properties

The necessary conditions that the parameters of a BIB design must satisfy are

• $bk = vr$, so that each treatment has $r=\frac{bk}{v}$ replications
• no treatment appears more than once in any block
• all unordered pairs of treatments appear in exactly $\lambda$ blocks (equiconcurrence), where $\lambda=\frac{r(k-1)}{v-1}=\frac{bk(k-1)}{v(v-1)}$ is often referred to as the concurrence parameter of a BIB design.
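
As a minimal sketch, the two counting conditions above can be checked for a candidate parameter set. The function name is hypothetical, and the conditions are only necessary: passing the check does not guarantee that a design with those parameters exists.

```python
def bibd_necessary_conditions(v, b, r, k, lam):
    """Check the counting identities that BIBD parameters must satisfy.

    Integer forms are used to avoid division:
      bk = vr              (experimental units counted two ways)
      lam (v-1) = r (k-1)  (pairs containing one fixed treatment)
    """
    return b * k == v * r and lam * (v - 1) == r * (k - 1)


print(bibd_necessary_conditions(v=4, b=4, r=3, k=3, lam=2))  # True
print(bibd_necessary_conditions(v=7, b=7, r=3, k=3, lam=1))  # True
print(bibd_necessary_conditions(v=6, b=6, r=3, k=3, lam=1))  # False
```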

A design, say $d$, with parameters $(v, b, r, k, \lambda)$ can be represented as a $v \times b$ treatment-block incidence matrix (having $v$ rows and $b$ columns). Denote it by $N=(n_{ij})$, whose elements $n_{ij}$ signify the number of units in block $j$ allocated to treatment $i$. The rows of the incidence matrix are labeled with the varieties (treatments) of the design and the columns with the blocks. We put 1 in the ($i$, $j$)th cell of the matrix if variety $i$ is contained in block $j$ and 0 otherwise. Each row of the incidence matrix has $r$ 1's, each column has $k$ 1's, and each pair of distinct rows has 1's in common in exactly $\lambda$ columns, which leads to a useful matrix identity.
The matrix $NN'$ has $v$ rows and $v$ columns and is referred to as the concurrence matrix of design $d$; its entries, the concurrence parameters, are denoted by $\lambda_{dij}$. For a BIBD, $n_{ij}$ is either one or zero, so $n_{ij}^2= n_{ij}$.

Theorem: If $N$ is the incidence matrix of a $(v, b, r, k, \lambda)$-design, then $NN'=(r-\lambda)I+\lambda J$, where $I$ is the $v\times v$ identity matrix and $J$ is the $v\times v$ matrix of all 1's.

Example: For the design with blocks {1,2,3}, {2,3,4}, {3,4,1}, {4,1,2}, construct the incidence matrix.
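
One way to work this example is sketched below in plain Python. It builds $N$ from the four blocks and forms $NN'$ directly, so the theorem can be checked for $v=b=4$, $r=k=3$, $\lambda=2$. The variable names are illustrative only.

```python
blocks = [{1, 2, 3}, {2, 3, 4}, {3, 4, 1}, {4, 1, 2}]
treatments = [1, 2, 3, 4]
v, b = len(treatments), len(blocks)

# N[i][j] = 1 if treatment i appears in block j, else 0 (rows = treatments).
N = [[1 if t in blk else 0 for blk in blocks] for t in treatments]

# Concurrence matrix NN': entry (i, h) counts the blocks containing both i and h.
NNt = [[sum(N[i][j] * N[h][j] for j in range(b)) for h in range(v)]
       for i in range(v)]

for row in N:
    print(row)
# [1, 0, 1, 1]
# [1, 1, 0, 1]
# [1, 1, 1, 0]
# [0, 1, 1, 1]

for row in NNt:
    print(row)
# [3, 2, 2, 2]
# [2, 3, 2, 2]
# [2, 2, 3, 2]
# [2, 2, 2, 3]
```

The diagonal entries equal $r=3$ and the off-diagonal entries equal $\lambda=2$, which matches $(r-\lambda)I+\lambda J = I + 2J$.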

Denoting the elements of $NN'$ by $q_{ih}$, we see that $q_{ii}=\sum_j n_{ij}^2$ and $q_{ih}=\sum_j n_{ij} n_{hj}$ $(i \ne h)$. For a BIBD, the diagonal elements of the treatment concurrence matrix $NN'$ are $q_{ii}=r$, and the off-diagonal elements are $q_{ih}=\lambda$ $(i\ne h)$, the number of times any pair of treatments occurs together within a block. In a balanced design, the off-diagonal entries of $NN'$ are all equal to the constant $\lambda$; that is, the common replication for a BIBD is $r$, and the common pairwise treatment concurrence is $\lambda$.

As $N$ is a matrix with $v$ rows and $b$ columns, its rank $t$ satisfies $t\le \min(b, v)$. If the design is symmetric, then $b=v$ and $N$ is square, so $|NN'|=|N|^2$ and hence $(r-\lambda)^{v-1}r^2$ is a perfect square.
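
The perfect-square statement follows from the determinant of $NN'$. The derivation below is a sketch that relies on a standard linear-algebra fact not stated above: the matrix $aI+bJ$ of order $v$ has eigenvalue $a$ with multiplicity $v-1$ and eigenvalue $a+bv$ with multiplicity 1.

$$
\begin{aligned}
|NN'| &= |(r-\lambda)I+\lambda J| = (r-\lambda)^{v-1}\,\bigl[r+\lambda(v-1)\bigr] \\
      &= (r-\lambda)^{v-1}\,rk \qquad \text{since } \lambda(v-1)=r(k-1).
\end{aligned}
$$

For a symmetric design, $b=v$ together with $bk=vr$ gives $k=r$, so $|N|^2=|NN'|=(r-\lambda)^{v-1}r^2$. In the example design ($v=b=4$, $r=k=3$, $\lambda=2$) this is $1^3\cdot 3^2=9=3^2$.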

## Data Collection Methods

There are many ways to collect data, but they can be classified into four main methods (sources) of collecting data for use in statistical inference. These are (i) Survey Method, (ii) Simulation, (iii) Controlled Experiments, and (iv) Observational Study.

## Survey Method

A very popular and widely used method is the survey, where people with special training go out and record observations of, for example, the number of vehicles traveling along a road, the acres of fields that farmers are using to grow a particular food crop, the number of households that own more than one motor vehicle, the number of passengers using Metro transport, and so on. Here the person making the study has no direct control over generating the data that can be recorded, although the recording methods need care and control.

## Simulation

In simulation, a computer model of the operation of a system is set up. For example, the model may describe an industrial system in which an important measurement is the percentage purity of a chemical product. A very large number of realizations of the model can be run in order to look for any pattern in the results. Here the success of the approach depends on how well that measurement can be explained by the model, and this has to be tested by carrying out at least a small amount of work on the actual system in operation.

## Controlled Experiments

An experiment is possible when the background conditions can be controlled, at least to some extent. For example, we may be interested in choosing the best type of grass seed to use on a sports field.

The first stage of work is to grow all the competing varieties of seed at the same place and make suitable records of their growth and development. The competing varieties should be grown in quite small units close together in the field, as in the figure below.

This is a controlled experiment, as it is subject to certain constraints, such as:

i) River on the right side
ii) Shadow of trees on the left side
iii) There are 3 different varieties (say, v1, v2, v3), distributed over 12 units.

This layout gives much more control of local environmental conditions than there would have been if one variety had been placed in the strip in the shelter of the trees, another close by the river, and the third, more exposed, in the center of the field, as in the diagram below:

Here there are 3 experimental units: one close to the stream, one close to the trees, and a third between them, which is in a more favorable position than the others. It is then our choice which variety to place in which unit.

## Observational Study

Like experiments, observational studies try to understand cause-and-effect relationships. However, unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

Note that small units of land or plots are called experimental units or simply units.

There is no “right” size for a unit; it depends on the type of crop, the work that is to be done on it, and the measurements that are to be taken. The measurements upon which inferences are eventually going to be based are to be taken as accurately as possible. The unit must, therefore, not be so large as to make recording very tedious, because that leads to errors and inaccuracy. On the other hand, if a unit is very small, there is the danger that relatively minor physical errors in recording can lead to large percentage errors.

Experimenters, and the statisticians who collaborate with them, need to gain a good knowledge of their experimental material or units as a research program proceeds.

## Basic Principles of Experimental Design

The basic principles of experimental design are (i) Randomization, (ii) Replication, and (iii) Local Control.

1. Randomization

Randomization is the cornerstone underlying the use of statistical methods in experimental designs. Randomization is the random process of assigning treatments to the experimental units. The random process implies that every possible allotment of treatments has the same probability. For example, if the number of treatments is $t = 3$ (say, A, B, and C) and the number of replications is $r = 4$, then the number of elements (experimental units) is $n = t \times r = 3 \times 4 = 12$. Replication here means that each treatment appears 4 times, as $r = 4$. Suppose the design is

 A C B C C B A B A C B A

Note that in this design, elements 1, 7, 9, and 12 are reserved for Treatment A, elements 3, 6, 8, and 11 for Treatment B, and elements 2, 4, 5, and 10 for Treatment C. P(A) = 4/12, P(B) = 4/12, and P(C) = 4/12, meaning that Treatments A, B, and C have equal chances of being selected for any given element.
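
A layout like the one above can be generated by shuffling the list of treatment labels, so that every allotment is equally likely. The following is a minimal sketch using Python's standard `random` module. The function name and the `seed` argument are illustrative assumptions, and the printed layout depends on the seed.

```python
import random
from collections import Counter


def completely_randomized_layout(treatments, r, seed=None):
    """Assign each treatment to r units by shuffling the t * r labels."""
    rng = random.Random(seed)          # seed only makes the run repeatable
    layout = [t for t in treatments for _ in range(r)]
    rng.shuffle(layout)                # uniform random permutation of labels
    return layout


layout = completely_randomized_layout(["A", "B", "C"], r=4, seed=1)
print(" ".join(layout))                # one random allotment of 12 units
print(Counter(layout))                 # each treatment appears 4 times
```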

2. Replication

By replication, we mean the repetition of the basic experiment. For example, if we need to compare the grain yield of two varieties of wheat, then each variety is applied to more than one experimental unit. The number of times each variety is applied to experimental units is called its number of replications. Replication has two important properties:

• It allows the experimenter to obtain an estimate of the experimental error.
• More replication provides increased precision by reducing the standard error (SE) of the mean, $s_{\overline{y}}=\tfrac{s}{\sqrt{r}}$, where $s$ is the sample standard deviation and $r$ is the number of replications. Note that an increase in $r$ decreases $s_{\overline{y}}$ (the standard error of $\overline{y}$).
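
The effect of $r$ on the standard error can be seen numerically. This is a small sketch in which the value $s = 6$ is a hypothetical sample standard deviation chosen only for illustration.

```python
import math


def standard_error(s, r):
    """Standard error of the mean: s / sqrt(r)."""
    return s / math.sqrt(r)


s = 6.0  # hypothetical sample standard deviation
for r in (1, 4, 9, 16):
    print(r, standard_error(s, r))
# 1 6.0
# 4 3.0
# 9 2.0
# 16 1.5
```

Quadrupling the number of replications halves the standard error, so gains in precision become progressively more expensive.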

3. Local Control

It has been observed that randomization and replication do not remove all extraneous sources of variation; that is, they are unable to control every extraneous source of variation.
Thus, we need a refinement in the experimental technique. In other words, we need to choose a design in such a way that all extraneous sources of variation are brought under control. For this purpose, we make use of local control, a term referring to the amount of (i) balancing, (ii) blocking, and (iii) grouping of experimental units.

Balancing: Balancing means that the treatments should be assigned to the experimental units in such a way that the result is a balanced arrangement of treatments.

Blocking: Blocking means that like experimental units should be collected together to form relatively homogeneous groups. A block is also a replicate.

The main objective of local control is to increase the efficiency of the experimental design by decreasing the experimental error.
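
Blocking and randomization can be combined by randomizing the treatments separately within each block, so that each block holds every treatment once and acts as a replicate. The sketch below assumes this arrangement, commonly called a randomized complete block design. That name, the function, and the labels v1, v2, v3 are illustrative assumptions rather than something prescribed above.

```python
import random


def randomized_complete_block_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)          # seed only makes the run repeatable
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)       # every treatment once per block
        rng.shuffle(block)             # random order within the block
        layout.append(block)
    return layout


layout = randomized_complete_block_layout(["v1", "v2", "v3"], n_blocks=4, seed=1)
for i, block in enumerate(layout, start=1):
    print(f"Block {i}:", " ".join(block))
```

Because each block contains all treatments, differences between blocks (for example, nearness to trees or to a river) affect every treatment equally and can be separated from the experimental error.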