Randomized Complete Block Design

In the Completely Randomized Design (CRD), there is no restriction on the allocation of treatments to experimental units. In practice, however, the experimental material is often relatively variable, and it is then possible to form blocks (in the simplest sense, groups) of relatively homogeneous experimental units. The design used in such situations is called the Randomized Complete Block Design (RCBD).

The Randomized Complete Block Design may be defined as a design in which the experimental material is divided into blocks (groups) of homogeneous experimental units (units with similar characteristics), and each block contains a complete set of treatments, which are assigned at random to the experimental units within that block.

RCBD is a one-restriction design: the restriction is used to control a variable that influences the response variable, i.e. the variable causing variability in the response. Blocking is done to create homogeneity within each block. A blocking factor is a source of variability; for example, the gender of a patient may be used as a blocking factor, so that this source of variability is controlled for, leading to greater accuracy. RCBD is a mixed model in which one factor (typically the treatment) is fixed and the other (typically the block) is random. The main assumption of the design is that there is no interaction between the treatment and block effects.

The Randomized Complete Block Design is said to be complete because each block contains as many experimental units as there are treatments, so every treatment occurs in every block.

The general model is defined as

\[Y_{ij}=\mu+\eta_i+\xi_j+e_{ij}\]

where $i=1,2,3,\cdots, t$ and $j=1,2,\cdots, b$, with $t$ treatments and $b$ blocks. Here $\mu$ is the overall mean based on all observations, $\eta_i$ is the effect of the $i$th treatment, $\xi_j$ is the effect of the $j$th block, and $e_{ij}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance.

The main objective of blocking is to reduce the variability among experimental units within a block as much as possible and to maximize the variation among blocks; otherwise, the design would not contribute to improving the precision with which treatment differences are detected.
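To make the model and the role of blocking concrete, here is a minimal Python sketch that computes the usual least-squares estimates ($\hat\mu$ = grand mean, $\hat\eta_i$ = treatment mean minus grand mean, $\hat\xi_j$ = block mean minus grand mean) and the sums of squares for a small table. The data, the function name `rcbd_anova` and the comparison with an analysis that ignores blocks are illustrative assumptions, not part of the original text; the sketch assumes one observation per treatment-block cell and no interaction, as in the model above.

```python
def rcbd_anova(y):
    """Least-squares fit of Y_ij = mu + eta_i + xi_j + e_ij.

    y is a t x b table: rows are treatments (i), columns are blocks (j).
    """
    t, b = len(y), len(y[0])
    n = t * b
    grand = sum(sum(row) for row in y) / n                          # estimate of mu
    trt_means = [sum(row) / b for row in y]                         # treatment means
    blk_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]  # block means

    eta_hat = [m - grand for m in trt_means]                        # treatment effects
    xi_hat = [m - grand for m in blk_means]                         # block effects

    ss_total = sum((y[i][j] - grand) ** 2 for i in range(t) for j in range(b))
    ss_trt = b * sum(e ** 2 for e in eta_hat)
    ss_blk = t * sum(x ** 2 for x in xi_hat)
    ss_err = ss_total - ss_trt - ss_blk                             # error after removing blocks

    df_trt, df_err = t - 1, (t - 1) * (b - 1)
    f_rcbd = (ss_trt / df_trt) / (ss_err / df_err)

    # For comparison only: the same data analysed with blocks ignored (as a CRD),
    # so the block variation stays in the error term.
    ss_err_crd = ss_total - ss_trt
    f_crd = (ss_trt / df_trt) / (ss_err_crd / (n - t))
    return {"mu": grand, "eta": eta_hat, "xi": xi_hat,
            "ss_trt": ss_trt, "ss_blk": ss_blk, "ss_err": ss_err, "f_rcbd": f_rcbd,
            "ss_err_crd": ss_err_crd, "f_crd": f_crd}

# Hypothetical responses: 3 treatments (rows) x 4 blocks (columns)
y = [[10, 12, 14, 16],
     [11, 14, 15, 18],
     [13, 15, 17, 19]]
res = rcbd_anova(y)
print("error SS with / without blocks:", res["ss_err"], res["ss_err_crd"])
print("treatment F with / without blocks:", res["f_rcbd"], res["f_crd"])
```

With these hypothetical numbers most of the variation lies between blocks (block SS about 64.3), so the error SS drops from 65 when blocks are ignored to about 0.67 when they are removed, and the treatment F ratio rises from about 1.25 to about 81. This is the gain in precision that blocking is meant to provide.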

Randomized Complete Block Design Experimental Layout

Suppose there are $t$ treatments and $r$ blocks in a randomized complete block design; then each block contains $t$ homogeneous plots, one for each treatment. An experimental layout for such a design using four treatments in three blocks is as follows.

(Figure: layout of four treatments randomized within Block 1, Block 2 and Block 3.)

From the RCBD layout we can see that

  • The treatments are assigned at random within blocks of adjacent experimental units, and each treatment appears exactly once in each block.
  • The number of blocks equals the number of replications.
  • Any treatment can be adjacent to any other treatment, but not to the same treatment within a block.
  • Variation in the experiment is controlled by accounting for spatial effects.
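A layout like the one described can be generated by randomizing the treatments separately within each block. The sketch below uses Python's standard `random` module; the function name `rcbd_layout`, the treatment labels and the seed are illustrative.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomize the treatments independently within each block (one plot per treatment)."""
    rng = random.Random(seed)
    # random.sample with k = len(treatments) returns a random permutation of the treatments
    return [rng.sample(treatments, k=len(treatments)) for _ in range(n_blocks)]

plan = rcbd_layout(["A", "B", "C", "D"], n_blocks=3, seed=1)
for j, block in enumerate(plan, start=1):
    print(f"Block {j}:", " ".join(block))
```

Because each block is a separate random permutation, every treatment appears exactly once per block, and the number of blocks is the number of replications.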



Covariance and Correlation


Covariance measures the degree to which two variables co-vary (i.e. vary or change together). If the greater values of one variable (say, $X_i$) correspond with the greater values of the other variable (say, $X_j$), i.e. if the variables tend to show similar behaviour, then the covariance between the two variables ($X_i$, $X_j$) will be positive. Similarly, if the smaller values of one variable correspond with the smaller values of the other variable, the covariance will be positive. In contrast, if the greater values of one variable (say, $X_i$) mainly correspond to the smaller values of the other variable (say, $X_j$), i.e. the two variables tend to show opposite behaviour, then the covariance will be negative.

In other words, a positive covariance between two variables means that they vary (change) together in the same direction relative to their expected values (averages): if one variable moves above its average value, the other variable tends to be above its average value as well. Similarly, if the covariance between the two variables is negative, then when one variable is above its expected value, the other tends to be below its expected value. A covariance of zero means that there is no linear dependency between the two variables. Mathematically, the covariance between two random variables $X_i$ and $X_j$ can be written as
\[COV(X_i, X_j)=E[(X_i-\mu_i)(X_j-\mu_j)]\]
where $\mu_i=E(X_i)$ is the mean of the first variable and $\mu_j=E(X_j)$ is the mean of the second variable. Expanding the product and using the linearity of expectation gives

\[
\begin{aligned}
COV(X_i, X_j)&=E[(X_i-\mu_i)(X_j-\mu_j)]\\
&=E[X_i X_j - X_i E(X_j)-X_j E(X_i)+E(X_i)E(X_j)]\\
&=E(X_i X_j)-E(X_i)E(X_j) - E(X_j)E(X_i)+E(X_i)E(X_j)\\
&=E(X_i X_j)-E(X_i)E(X_j)
\end{aligned}
\]

Note that the covariance of a random variable with itself is its variance, i.e. $COV(X_i, X_i)=VAR(X_i)$. If $X_i$ and $X_j$ are independent, then $E(X_i X_j)=E(X_i)E(X_j)$, so $COV(X_i, X_j)=E(X_i X_j)-E(X_i) E(X_j)=0$.
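The shortcut formula can be checked numerically. This sketch treats a short list of hypothetical values as an equally likely population (divisor $n$), which is an assumption made for illustration; a sample covariance would divide by $n-1$ instead.

```python
import statistics

def mean(v):
    return sum(v) / len(v)

def cov_definition(x, y):
    """E[(X - mu_x)(Y - mu_y)], averaging over the listed values (divisor n)."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])

def cov_shortcut(x, y):
    """E(XY) - E(X)E(Y), the simplified form of the same quantity."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)

x = [2, 4, 6, 8]   # hypothetical values
y = [1, 3, 2, 6]

print(cov_definition(x, y), cov_shortcut(x, y))       # both forms agree
print(cov_definition(x, x), statistics.pvariance(x))  # COV(X, X) = VAR(X)
```

Both forms return 3.5 for these values, and the covariance of `x` with itself equals its population variance.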


Correlation and covariance are related but not equivalent statistical measures. The correlation between two variables (say, $X_i$ and $X_j$) is their normalized covariance, defined as
\[\rho_{i,j}=\frac{E[(X_i-\mu_i)(X_j-\mu_j)]}{\sigma_i \sigma_j}\]
where $\sigma_i$ is the standard deviation of $X_i$ and $\sigma_j$ is the standard deviation of $X_j$. For $n$ paired observations on two variables $X$ and $Y$, it is estimated by
\[r=\frac{n \sum XY - \sum X \sum Y}{\sqrt{\left(n \sum X^2 -(\sum X)^2\right)\left(n \sum Y^2 - (\sum Y)^2\right)}}\]

Note that correlation is dimensionless, i.e. a number free of measurement units, and its values lie between $-1$ and $+1$ inclusive. In contrast, covariance has a unit of measure: the product of the units of the two variables.
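Both correlation formulas, and the unit-free property, can be checked with a few lines of Python. The values and their units (metres versus centimetres) are hypothetical; the population divisor $n$ is used for the covariance, and it cancels out in the correlation.

```python
import math

def covariance(x, y):
    """Population covariance (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def corr_normalized(x, y):
    """rho = COV(X, Y) / (sigma_X * sigma_Y)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def corr_computational(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x_m = [2, 4, 6, 8]               # hypothetical lengths in metres
x_cm = [100 * v for v in x_m]    # the same lengths in centimetres
y = [1, 3, 2, 6]

print(covariance(x_m, y), covariance(x_cm, y))               # covariance scales with the units
print(corr_normalized(x_m, y), corr_computational(x_cm, y))  # correlation does not
```

Rescaling `x` from metres to centimetres multiplies the covariance by 100 (from 3.5 to 350), while both versions of the correlation stay at $\sqrt{0.7}\approx 0.837$.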


Data Transformation (Variable Transformation)


A transformation is a rescaling of the data using a function or some mathematical operation applied to each observation. When data are very strongly skewed (negatively or positively), we sometimes transform them so that they are easier to model. Likewise, if a variable does not follow a normal distribution, one may try a data transformation so that the assumptions of a parametric statistical test are met.

The most common data transformation is the log (or natural log) transformation, which is often applied when most of the data values cluster around zero relative to the larger values in the data set and all of the observations are positive.

Transformations can also be applied to one or more variables in scatter plots, correlation and regression analysis to make the relationship between the variables more linear, and hence easier to model with simple methods. Other transformations besides the log include the square root and the reciprocal.

Reciprocal Transformation
The reciprocal transformation, $x$ to $\frac{1}{x}$ (or $-\frac{1}{x}$), is a very strong transformation with a drastic effect on the shape of the distribution. It cannot be applied to zero values, but it can be applied to negative values. However, the reciprocal transformation is not useful unless all of the values are positive, and it reverses the order among values of the same sign, i.e. the largest becomes the smallest; using $-\frac{1}{x}$ preserves the original order.

Logarithmic Transformation
The logarithm, $x$ to $\log_{10} x$ (or the natural log, or log base 2), is another strong transformation with a marked effect on the shape of the distribution. The logarithmic transformation is commonly used to reduce right skewness, but it cannot be applied to zero or negative values.

Square Root Transformation
The square root transformation, $x$ to $x^{\frac{1}{2}}=\sqrt{x}$, has a moderate effect on the shape of the distribution and is weaker than the logarithm. It can be applied to zero values but not to negative values.
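The three transformations, together with their domain restrictions, can be sketched in Python as follows. The function names `transform` and `skewness`, the use of the moment coefficient of skewness, and the data are assumptions chosen for illustration; the reciprocal branch uses $-\frac{1}{x}$ so that the order of positive values is preserved.

```python
import math

def transform(values, kind):
    """Apply a log10, square-root or negative-reciprocal transformation, enforcing its domain."""
    if kind == "log":
        if any(v <= 0 for v in values):
            raise ValueError("log needs strictly positive values")
        return [math.log10(v) for v in values]
    if kind == "sqrt":
        if any(v < 0 for v in values):
            raise ValueError("square root needs non-negative values")
        return [math.sqrt(v) for v in values]
    if kind == "reciprocal":
        if any(v == 0 for v in values):
            raise ValueError("reciprocal is undefined at zero")
        return [-1 / v for v in values]   # -1/x keeps the original ordering
    raise ValueError(f"unknown transformation: {kind}")

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

data = [1, 2, 4, 8, 16, 32, 64, 128]   # hypothetical right-skewed data
print("raw", round(skewness(data), 3))
for kind in ("sqrt", "log", "reciprocal"):
    print(kind, round(skewness(transform(data, kind)), 3))
```

For this doubling series the raw skewness is about 1.45, the square root brings it down to about 0.85, and the log removes it entirely (the logged values are equally spaced). The reciprocal overshoots, producing left skewness of the same size as the original right skewness, which illustrates how drastic it is.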

Goals of transformation
The goals of transformation may be

  • one might want to see the data structure differently
  • one might want to reduce skewness, which assists in modeling
  • one might want to straighten a nonlinear (curvilinear) relationship in a scatter plot
  • one might want to obtain approximately equal dispersion, making the data easier to handle and interpret
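The straightening goal can be illustrated with a hypothetical exponential relationship $y = 3\cdot 2^x$: on the raw scale the Pearson correlation with $x$ is well below 1, while $\log y$ is an exact linear function of $x$. The data and the helper name `pearson` are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 9))
y = [3 * 2 ** v for v in x]          # hypothetical curvilinear (exponential) relationship
log_y = [math.log(v) for v in y]     # log(y) = log(3) + x*log(2), linear in x

print(round(pearson(x, y), 3), round(pearson(x, log_y), 3))
```

Here $r$ is about 0.85 before the transformation and 1 (up to rounding) after it.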



Autocorrelation Time Series Data


The autocorrelation function (also known as serial correlation, or cross-autocorrelation) is a diagnostic tool that helps to describe the evolution of a process through time. Inference based on the autocorrelation function is often called analysis in the time domain.

The autocorrelation of a random process is the measure of correlation (relationship) between observations at different distances apart. These coefficients often provide insight into the probability model that generated the data. One can say that autocorrelation is a mathematical tool for finding repeating patterns in a data series.

Autocorrelation is usually used for the following two purposes:

  1. To help detect non-randomness in data (usually only the first, i.e. lag-1, autocorrelation is computed).
  2. To help identify an appropriate time series model if the data are not random (autocorrelations are usually plotted for many lags).

For simple correlation, let there be $n$ pairs of observations on two variables $x$ and $y$; then the usual correlation coefficient (Pearson's coefficient of correlation) is

\[r=\frac{\sum(x_i-\overline{x})(y_i-\overline{y})}{\sqrt{\sum (x_i-\overline{x})^2 \sum (y_i-\overline{y})^2 }}\]

A similar idea can be applied to a time series to see whether successive observations are correlated. Given $n$ observations $x_1, x_2, \cdots, x_n$ on a discrete time series, we can form $n-1$ pairs of observations $(x_1, x_2), (x_2, x_3), \cdots, (x_{n-1}, x_n)$. In each pair, the first observation is treated as one variable ($x_t$) and the second observation as the other variable ($x_{t+1}$). So the correlation coefficient between $x_t$ and $x_{t+1}$ is

\[r_1=\frac{\sum_{t=1}^{n-1} (x_t-\overline{x}_{(1)})(x_{t+1}-\overline{x}_{(2)})}{\sqrt{\left[\sum_{t=1}^{n-1} (x_t-\overline{x}_{(1)})^2\right]\left[\sum_{t=1}^{n-1} (x_{t+1}-\overline{x}_{(2)})^2\right]}}\]


where $\overline{x}_{(1)}=\sum_{t=1}^{n-1} \frac{x_t}{n-1}$ is the mean of the first $n-1$ observations

$\overline{x}_{(2)}=\sum_{t=2}^{n} \frac{x_t}{n-1}$ is the mean of the last $n-1$ observations
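The lag-1 coefficient defined above, with separate means for the first and last $n-1$ observations, is simply the Pearson correlation of the pairs $(x_t, x_{t+1})$. A minimal Python sketch, assuming equally spaced observations; the series and the function name `lag1_autocorrelation` are hypothetical:

```python
import math

def lag1_autocorrelation(x):
    """r_1 computed from the n-1 pairs (x_t, x_{t+1}) with separate means."""
    first, second = x[:-1], x[1:]       # x_1..x_{n-1} and x_2..x_n
    m1 = sum(first) / len(first)        # mean of the first n-1 observations
    m2 = sum(second) / len(second)      # mean of the last n-1 observations
    num = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    den = math.sqrt(sum((a - m1) ** 2 for a in first) *
                    sum((b - m2) ** 2 for b in second))
    return num / den

series = [5, 7, 6, 8, 9, 8, 10, 11]   # hypothetical equally spaced observations
print(round(lag1_autocorrelation(series), 3))
```

A steadily rising series gives $r_1 = 1$, and a series that strictly alternates between two values gives $r_1 = -1$.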

Note that autocorrelation assumes that the observations are equally spaced (equi-spaced) in time.

This is called the autocorrelation or serial correlation coefficient. For large $n$, $\overline{x}_{(1)}$ and $\overline{x}_{(2)}$ are both close to the overall mean $\overline{x}$, and $r_1$ is approximately

\[r_1=\frac{\frac{\sum_{t=1}^{n-1} (x_t-\overline{x})(x_{t+1}-\overline{x}) }{n-1}}{ \frac{\sum_{t=1}^n (x_t-\overline{x})^2}{n}}\]

Since $n/(n-1)\approx 1$ for large $n$, this simplifies to

\[r_1=\frac{\sum_{t=1}^{n-1} (x_t-\overline{x})(x_{t+1}-\overline{x}) } { \sum_{t=1}^n (x_t-\overline{x})^2}\]

For observations $k$ time units apart, i.e. at lag $k$,

\[r_k=\frac{\sum_{t=1}^{n-k} (x_t-\overline{x})(x_{t+k}-\overline{x}) } { \sum_{t=1}^n (x_t-\overline{x})^2}\]

An $r_k$ value outside the bounds $\pm \frac{2}{\sqrt{n}}$ is significantly different from zero (approximately at the 5% level) and signifies autocorrelation at that lag.
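The large-$n$ formula for $r_k$ and the approximate $\pm 2/\sqrt{n}$ check can be sketched as below. The function names `acf` and `significant_lags` and the alternating test series are illustrative assumptions, not a particular library's API; Python indexes from 0, so the sum runs over `t = 0, ..., n-k-1`.

```python
import math

def acf(x, k):
    """r_k: lag-k cross-products about the overall mean, over the total sum of squares."""
    n = len(x)
    m = sum(x) / n
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return num / den

def significant_lags(x, max_lag):
    """Lags whose r_k falls outside the approximate +/- 2/sqrt(n) bounds."""
    bound = 2 / math.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(acf(x, k)) > bound]

series = [1, -1] * 5   # hypothetical alternating series, n = 10
print([round(acf(series, k), 2) for k in (1, 2, 3)])
print(significant_lags(series, 3))
```

For the alternating series $r_1=-0.9$, $r_2=0.8$ and $r_3=-0.7$, all outside $\pm 2/\sqrt{10}\approx\pm 0.63$, so each of these lags is flagged.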

Applications of Autocorrelation

  • Autocorrelation analysis is widely used in fluorescence correlation spectroscopy.
  • Autocorrelation is used to measure optical spectra and to measure the very-short-duration light pulses produced by lasers.
  • Autocorrelation is used to analyze dynamic light scattering data to determine the particle size distributions of nanometer-sized particles in a fluid. A laser shining into the mixture produces a speckle pattern, and the autocorrelation of the resulting signal can be analyzed in terms of the diffusion of the particles. From this, knowing the fluid viscosity, the sizes of the particles can be calculated.
  • The small-angle X-ray scattering intensity of a nano-structured system is the Fourier transform of the spatial autocorrelation function of the electron density.
  • In optics, normalized autocorrelations and cross-correlations give the degree of coherence of an electromagnetic field.
  • In signal processing, autocorrelation can provide information about repeating events such as musical beats or pulsar frequencies, but it cannot tell the position in time of the beat. It can also be used to estimate the pitch of a musical tone.
  • In music recording, autocorrelation is used as a pitch detection algorithm prior to vocal processing, as a distortion effect or to eliminate undesired mistakes and inaccuracies.
  • In statistics, spatial autocorrelation between sample locations also helps one estimate mean value uncertainties when sampling a heterogeneous population.
  • In astrophysics, autocorrelation is used to study and characterize the spatial distribution of galaxies in the Universe and in multi-wavelength observations of Low Mass X-ray Binaries.
  • In analysis of Markov chain Monte Carlo data, autocorrelation must be taken into account for correct error determination.


Completely Randomized Design (CRD)


The simplest, unrestricted experimental design, in which each treatment has an equal chance of occurring in any experimental unit, every treatment can be accommodated in the plan, and the number of replications may differ between treatments, is known as the completely randomized design (CRD). For this reason it is called an unrestricted design (a design without any condition on the allocation) with one primary factor. Its analysis is the one-way analysis of variance.

Suppose we have three treatments, named A, B, and C, placed randomly in different experimental units.



From the table above we can see that:

  • There may or may not be repetition of a treatment.
  • The only assignable source of variation is the treatment.
  • It is not necessary that a specific treatment comes in a specific unit.
  • There are three treatments and each treatment appears three times, so $P(A)=P(B)=P(C)=3/9$.
  • Here each treatment appears an equal number of times (in general the numbers may be unequal, i.e. the design may be unbalanced).
  • The total number of experimental units is 9.
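A completely randomized layout like the one in the table can be produced by listing every treatment as many times as it is replicated and shuffling the whole list once, with no blocking restriction. The sketch uses Python's standard `random` module; the function name `crd_layout` and the seed are illustrative, and unequal replication is allowed.

```python
import random
from collections import Counter

def crd_layout(replications, seed=None):
    """Assign treatments completely at random to experimental units.

    `replications` maps each treatment to its number of replicates (may be unequal).
    """
    rng = random.Random(seed)
    units = [trt for trt, r in replications.items() for _ in range(r)]
    rng.shuffle(units)   # a single shuffle over all units: no blocking restriction
    return units

layout = crd_layout({"A": 3, "B": 3, "C": 3}, seed=7)
print(layout)
print(Counter(layout))
```

Passing, say, `{"A": 2, "B": 4}` gives an unbalanced CRD with six units.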

Some Advantages of Completely Randomized Design (CRD)

  1. The main advantage of this design is that the analysis of data remains simple even if some units fail to respond for any reason (missing observations).
  2. Another advantage of this design is that it provides the maximum degrees of freedom for error.
  3. This design is mostly used in laboratory experiments where all other factors are under the control of the researcher. For example, in a tube experiment CRD is best because all the factors are under control.

An assumption of the completely randomized design (CRD) is that the observations within each level of a factor are independent of each other.

The general model with one factor can be defined as

\[Y_{ij}=\mu + \eta_i +e_{ij}\]

where $i=1,2,\cdots,t$ and $j=1,2,\cdots, r_i$, with $t$ treatments and $r_i$ replications of the $i$th treatment. $\mu$ is the overall mean based on all observations, $\eta_i$ is the effect of the $i$th treatment, and $e_{ij}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance.
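The estimates in this model, and the one-way analysis of variance built on them, can be sketched as follows. The data (with unequal replication) and the function name `crd_anova` are hypothetical; $\hat\mu$ is the overall mean and $\hat\eta_i$ is the $i$th treatment mean minus $\hat\mu$.

```python
def crd_anova(groups):
    """One-way ANOVA for Y_ij = mu + eta_i + e_ij; groups may have unequal sizes r_i."""
    all_obs = [y for g in groups.values() for y in g]
    n, t = len(all_obs), len(groups)
    mu_hat = sum(all_obs) / n                                            # overall mean
    eta_hat = {k: sum(g) / len(g) - mu_hat for k, g in groups.items()}   # treatment effects
    ss_trt = sum(len(g) * eta_hat[k] ** 2 for k, g in groups.items())
    # residual = observation minus its fitted value mu_hat + eta_hat[k] (the group mean)
    ss_err = sum((y - mu_hat - eta_hat[k]) ** 2 for k, g in groups.items() for y in g)
    df_trt, df_err = t - 1, n - t
    f_stat = (ss_trt / df_trt) / (ss_err / df_err)
    return mu_hat, eta_hat, ss_trt, ss_err, df_err, f_stat

# Hypothetical responses with unequal replication r_A = 3, r_B = 4, r_C = 2
data = {"A": [10, 12, 11], "B": [14, 15, 13, 14], "C": [9, 8]}
mu_hat, eta_hat, ss_trt, ss_err, df_err, f_stat = crd_anova(data)
print(mu_hat, eta_hat)
print(ss_trt, ss_err, df_err, f_stat)
```

With $N=9$ observations and $t=3$ treatments, all $N-t=6$ remaining degrees of freedom go to error, because there is no block term to absorb any of them; this is the "maximum degrees of freedom for error" advantage listed above.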
