Standard Error of Estimate

Standard Error

Standard error is a statistical term used to measure the accuracy with which a sample represents a population of interest. The standard error of the mean measures the variation in the sampling distribution of the sample mean; it is usually denoted by $\sigma_{\overline{x}}$ and is calculated as

\[\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}}\]

where $\sigma$ is the standard deviation of the population and $n$ is the number of observations in the sample.

Drawing (obtaining) different samples from the same population of interest usually results in different values of the sample mean, indicating that there is a distribution of sample means having its own mean (average value) and variance. The standard error of the mean can be considered as the standard deviation of all those possible sample means drawn from the same population.

The size of the standard error is affected by the standard deviation of the population and by the number of observations in the sample, called the sample size. The larger the standard deviation of the population ($\sigma$), the larger the standard error will be, indicating more variability in the sample means. However, the larger the number of observations in the sample, the smaller the standard error will be, indicating less variability in the sample means, where by less variability we mean that the sample mean is more representative of the population of interest.

If the sampled population is not very large, we need to make an adjustment in computing the standard error of the sample mean. For a finite population, in which the total number of objects (observations) is $N$ and the number of objects (observations) in the sample is $n$, the adjustment factor is $\sqrt{\frac{N-n}{N-1}}$. This is called the finite population correction factor. The adjusted standard error is then

\[\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\]
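A minimal numerical sketch of this adjustment in Mathematica, with assumed illustrative values $\sigma = 5$, $N = 200$ and $n = 40$:

    (* standard error of the mean, with and without the finite population correction *)
    (* sigma, nPop and nSample are assumed illustrative values *)
    sigma = 5; nPop = 200; nSample = 40;
    se = sigma/Sqrt[nSample];                           (* unadjusted: about 0.791 *)
    seAdjusted = se Sqrt[(nPop - nSample)/(nPop - 1)];  (* adjusted: about 0.709 *)
    N[{se, seAdjusted}]

Note that the correction factor is less than 1 whenever $n > 1$, so it always shrinks the standard error.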

The standard error is used to:

  1. measure the spread of the values of a statistic about the expected value of that statistic
  2. construct confidence intervals
  3. test the null hypothesis about population parameter(s)

The standard error is computed from sample statistics. The formulas below give the standard error for simple random samples, assuming that the population size ($N$) is at least 20 times larger than the sample size ($n$).
\begin{align*}
\text{Sample mean, } \overline{x} &\Rightarrow SE_{\overline{x}} = \frac{s}{\sqrt{n}}\\
\text{Sample proportion, } p &\Rightarrow SE_{p} = \sqrt{\frac{p(1-p)}{n}}\\
\text{Difference between means, } \overline{x}_1 - \overline{x}_2 &\Rightarrow SE_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\\
\text{Difference between proportions, } p_1-p_2 &\Rightarrow SE_{p_1-p_2}=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{align*}

The standard error formulas are identical in form to those of the standard deviation, except that the standard error uses sample statistics whereas the standard deviation uses population parameters.
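A quick sketch of the first two formulas in Mathematica, using assumed illustrative sample statistics:

    (* standard errors computed from sample statistics; s, n and p are illustrative values *)
    s = 2.5; n = 50; p = 0.4;
    seMean = s/Sqrt[n] // N       (* standard error of the sample mean: about 0.354 *)
    seProp = Sqrt[p (1 - p)/n]    (* standard error of the sample proportion: about 0.069 *)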

For more about the standard error, follow the link: Standard Error of Estimate



Latin Square Design (LSD)


In a Latin Square Design the treatments are grouped into replicates in two different ways, such that each row and each column is a complete block. The grouping for a balanced arrangement is performed by imposing the restriction that each treatment must appear once and only once in each row and once and only once in each column. The experimental material should be arranged, and the experiment conducted, in such a way that the differences among the rows and columns represent major sources of variation.

Hence a Latin Square Design is an arrangement of $k$ treatments in a $k \times k$ square, where the treatments are grouped in blocks in two directions. It should be noted that in a Latin Square Design the number of rows, the number of columns, and the number of treatments must be equal.

In other words, unlike the Randomized Complete Block Design (RCBD) and the Completely Randomized Design (CRD), a Latin Square Design is a two-restriction design, providing two blocking factors that are used to control the effect of two variables that influence the response variable. The design is called a Latin Square because each Latin letter represents a treatment that occurs once in each row and once in each column, in such a way that with respect to one criterion (restriction) the rows are completely homogeneous blocks, and with respect to the other criterion (the second restriction) the columns are completely homogeneous blocks.

Latin Square Designs are applied mostly in animal science, agriculture, and industrial research. A daily-life example is the Sudoku puzzle, which is a special case of a Latin square. The main assumption is that there is no interaction between the treatment, row, and column effects.

The general model is defined as
\[Y_{ijk}=\mu+\alpha_i+\beta_j+\tau_k +\varepsilon_{ijk}\]

where $i=1,2,\cdots,t$; $j=1,2,\cdots,t$ and $k=1,2,\cdots,t$ with $t$ treatments, $t$ rows and $t$ columns,
$\mu$ is the overall mean (general mean) based on all of the observations,
$\alpha_i$ is the effect of the $i$th row,
$\beta_j$ is the effect of the $j$th column,
$\tau_k$ is the effect of the $k$th treatment,
$\varepsilon_{ijk}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance, i.e. $\varepsilon_{ijk}\sim N(0, \sigma^2)$.
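As an illustration, here is a minimal Mathematica sketch that simulates one data set from this model for $t = 4$; all effect sizes are assumed, illustrative values rather than estimates from any real experiment:

    (* simulate observations from the LSD model with assumed effects *)
    t = 4; mu = 10;
    alpha = {1, -1, 0.5, -0.5};      (* row effects, assumed *)
    beta = {0.3, -0.3, 0.2, -0.2};   (* column effects, assumed *)
    tau = {2, 0, -1, -1};            (* treatment effects, assumed *)
    trt[i_, j_] := Mod[i + j - 2, t] + 1;  (* cyclic treatment assignment *)
    y = Table[mu + alpha[[i]] + beta[[j]] + tau[[trt[i, j]]] +
         RandomVariate[NormalDistribution[0, 1]], {i, t}, {j, t}];
    MatrixForm[y]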

Latin Square Design Experimental Layout

Suppose we have 4 treatments (namely: A, B, C and D), then it means that we have

Number of Treatments = Number of Rows = Number of Columns = 4

And the Latin Square Design's layout can be, for example:

                Column 1        Column 2        Column 3        Column 4
    Row 1       A ($Y_{111}$)   B ($Y_{122}$)   C ($Y_{133}$)   D ($Y_{144}$)
    Row 2       B ($Y_{212}$)   C ($Y_{223}$)   D ($Y_{234}$)   A ($Y_{241}$)
    Row 3       C ($Y_{313}$)   D ($Y_{324}$)   A ($Y_{331}$)   B ($Y_{342}$)
    Row 4       D ($Y_{414}$)   A ($Y_{421}$)   B ($Y_{432}$)   C ($Y_{443}$)

The numbers in the subscript represent the row, column (block), and treatment number, respectively. For example, $Y_{421}$ means the first treatment (A) in the 4th row and second column (block).
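A small Mathematica sketch that builds this cyclic square and then randomly permutes its rows and columns, which preserves the Latin square property while randomizing the layout:

    (* construct a cyclic 4 x 4 Latin square and randomize rows and columns *)
    k = 4;
    treatments = {"A", "B", "C", "D"};
    square = Table[treatments[[Mod[i + j - 2, k] + 1]], {i, k}, {j, k}];
    randomized = square[[RandomSample[Range[k]], RandomSample[Range[k]]]];
    Grid[randomized, Frame -> All]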



Creating Matrices in Mathematica

A matrix is an array of numbers arranged in rows and columns. In Mathematica, matrices are represented as a list of rows, each of which is itself a list; that is, a matrix is a list of lists. If a matrix has $n$ rows and $m$ columns we call it an $n \times m$ matrix, and the value in the $i$th row and $j$th column is called the $(i, j)$ entry.

In Mathematica, matrices can be entered with the { } notation, constructed from a formula, or imported from a data file. There are also commands for creating diagonal matrices, constant matrices, and other special matrix types.

Creating matrices in Mathematica

  • Create a matrix using the { } notation
    mat = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
    The output will not be displayed in matrix form; to see it as a matrix, use a command like
    mat // MatrixForm
  • Create a matrix using the Table command
    mat1 = Table[b[row, column], {row, 1, 4, 1}, {column, 1, 2, 1}]
  • Create a symbolic matrix, such as
    mat2 = Table[Subscript[x, i] + Subscript[x, j], {i, 1, 4}, {j, 1, 3}]
  • Create a diagonal matrix with nonzero entries on its diagonal
    DiagonalMatrix[{1, 2, 3, r}] // MatrixForm
  • Create a constant matrix, i.e. a matrix whose entries are all the same
    ConstantArray[3, {2, 4}] // MatrixForm
  • Create an identity matrix of order n × n (for example, n = 4)
    IdentityMatrix[4] // MatrixForm

Matrix Operations in Mathematica

In Mathematica, matrix operations can be performed on both numeric and symbolic matrices.

  • To find the determinant of a matrix: Det[mat]
  • To find the transpose of a matrix: Transpose[mat]
  • To find the inverse of a matrix (e.g. for solving a linear system): Inverse[mat]
  • To find the trace of a matrix, i.e. the sum of the diagonal elements: Tr[mat]
  • To find the eigenvalues of a matrix: Eigenvalues[mat]
  • To find the eigenvectors of a matrix: Eigenvectors[mat]
  • To find both eigenvalues and eigenvectors together: Eigensystem[mat]
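As a quick check, these commands can be applied to a small symmetric matrix (the matrix below is an illustrative example):

    mat3 = {{2, 1}, {1, 2}};
    Det[mat3]          (* 3 *)
    Inverse[mat3]      (* {{2/3, -1/3}, {-1/3, 2/3}} *)
    Tr[mat3]           (* 4 *)
    Eigensystem[mat3]  (* {{3, 1}, {{1, 1}, {-1, 1}}} *)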

Note that the +, *, and ^ operators all automatically work element-wise on matrices; ordinary matrix multiplication is performed with the Dot operator (.).
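A short sketch of the difference between element-wise and matrix multiplication:

    m = {{1, 2}, {3, 4}};
    m * m    (* element-wise product: {{1, 4}, {9, 16}} *)
    m . m    (* matrix product via Dot: {{7, 10}, {15, 22}} *)
    m ^ 2    (* element-wise square, same as m * m *)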

Displaying matrix and its elements

  • mat[[1]]         displays the first row of the matrix, where mat is the matrix created above
  • mat[[1, 2]]     displays the element in the first row and second column, i.e. the $m_{12}$ element of the matrix
  • mat[[All, 2]]  displays the 2nd column of the matrix



Student t test


William Sealy Gosset published his work in 1908 under the pseudonym "Student" to solve problems associated with inference based on sample(s) drawn from a normally distributed population when the population standard deviation is unknown. He developed the t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another; in this case the t-test is called the independent samples t-test, also known as the unpaired samples t-test.

Student's t-test is among the most commonly used statistical techniques for testing hypotheses on the basis of a difference between sample means. The t-statistic can be computed just by knowing the means, standard deviations, and numbers of data points in both samples, using the following formula

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which states that both samples came from the same population, at the given level of significance (level of risk).

If the computed t-statistic from the above formula is greater than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise we fail to reject the null hypothesis.

Note that the t-distribution is a family of curves depending on the degrees of freedom (the number of independent observations in the sample minus the number of parameters). As the sample size increases, the t-distribution approaches the bell-shaped normal distribution.

Example: A production manager wants to compare the number of defective products produced on the day shift with the number produced on the afternoon shift. A sample of production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. At the 0.05 significance level, is there a significant difference in the mean number of defects per shift?

Day shift: 5, 8, 7, 6, 9, 7
Afternoon shift: 8, 10, 7, 11, 9, 12, 14, 9

Some required calculations are:

Mean of samples:

$\overline{X}_1=7$, $\overline{X}_2=10$,

Standard Deviation of samples

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$

Step 1: Null and alternative hypothesis are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test Statistics

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 \left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{7-10}{\sqrt{3.8333 \left(\frac{1}{6}+\frac{1}{8}\right)}} = -2.837\]

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than the absolute value of the table value, i.e. $|t_{calculated}| \ge |t_{tabulated}|$. In this example the tabulated t-value is $\pm 2.179$ with 12 degrees of freedom at the 5% significance level.

Step 5: Conclusion: Since the computed value $|{-2.837}| > |{-2.179}|$, we reject $H_0$ and conclude that the mean number of defects is not the same on the two shifts.
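The whole test can be reproduced in Mathematica; a minimal sketch using the built-in Variance (which uses the $n-1$ denominator):

    (* pooled two-sample t test for the shift data *)
    day = {5, 8, 7, 6, 9, 7};
    afternoon = {8, 10, 7, 11, 9, 12, 14, 9};
    {n1, n2} = Length /@ {day, afternoon};
    sp2 = ((n1 - 1) Variance[day] + (n2 - 1) Variance[afternoon])/(n1 + n2 - 2);
    t = (Mean[day] - Mean[afternoon])/Sqrt[sp2 (1/n1 + 1/n2)] // N     (* about -2.837 *)
    tCrit = InverseCDF[StudentTDistribution[n1 + n2 - 2], 0.975] // N  (* about 2.179 *)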

For a related Mathematica demonstration, see: Student T Distribution


Sum of Squares


In statistics, the sum of squares is a measure of the total variability (spread, variation) within a data set. In other words, the sum of squares measures deviation or variation from the mean value of the given data set. A sum of squares is calculated by first computing the difference between each data point (observation) and the mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is the deviation score for the given data set. Squaring each of these deviation scores and then adding them gives us the sum of squares (SS), which is represented mathematically as

\[SS = \sum x^2 = \sum (X-\overline{X})^2\]

Note that the small letter $x$ usually represents the deviation of each observation from mean value, while capital letter $X$ represents the variable of interest in statistics.

Sum of Squares Example

Consider the following data set {5, 6, 7, 10, 12}. To compute the sum of squares of this data set, follow these steps

  • Calculate the average of the given data by summing all the values in the data set and then dividing this sum by the total number of observations in the data set. Mathematically, it is $\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum of all the numbers $5+6+7+10+12$ and there are 5 observations.
  • Calculate the difference of each observation in the data set from the average computed in step 1. The differences are
    5 – 8 = –3; 6 – 8 = –2; 7 – 8 = –1; 10 – 8 = 2 and 12 – 8 = 4
    Note that the sum of these differences should be zero. (–3 + –2 + –1 + 2 + 4 = 0)
  • Now square each of the differences obtained in step 2. The squares of these differences are
    9, 4, 1, 4 and 16
  • Now add the squared numbers obtained in step 3. The sum of these squared quantities is 9 + 4 + 1 + 4 + 16 = 34, which is the sum of squares of the given data set.
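The same steps can be carried out in Mathematica; a minimal sketch:

    (* sum of squares for the example data set *)
    data = {5, 6, 7, 10, 12};
    xbar = Mean[data]            (* 8 *)
    deviations = data - xbar     (* {-3, -2, -1, 2, 4} *)
    ss = Total[deviations^2]     (* 9 + 4 + 1 + 4 + 16 = 34 *)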

In statistics, sum of squares occurs in different contexts such as

  • Partitioning of Variance (Partition of Sums of Squares)
  • Sum of Squared Deviations (Least Squares)
  • Sum of Squared Differences (Mean Squared Error)
  • Sum of Squared Error (Residual Sum of Squares)
  • Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
  • Sum of Squares for Model Predictions (Explained Sum of Squares)
  • Sum of Squares for Observations (Total Sum of Squares)
  • Sum of Squared Deviation (Squared Deviations)
  • Modeling involving Sum of Squares (Analysis of Variance)
  • Multivariate Generalization of Sum of Square (Multivariate Analysis of Variance)

As previously discussed, the sum of squares is a measure of the total variability of a set of scores around a specific number.

