Standard Error 2: A Quick Guide

Introduction to Standard Errors (SE)

Standard error (SE) is a statistical term used to measure the accuracy of estimates obtained from a sample taken from a population of interest. The standard error of the mean measures the variation in the sampling distribution of the sample mean. It is usually denoted by $\sigma_{\overline{x}}$ and is calculated as

\[\sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}\]

Drawing (obtaining) different samples from the same population of interest usually results in different values of sample means, indicating that there is a distribution of sampled means having its mean (average values) and variance. The standard error of the mean is considered the standard deviation of all those possible samples drawn from the same population.
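The claim that the SE of the mean equals $\sigma/\sqrt{n}$ can be checked with a small simulation. The Python sketch below (the population mean of 100 and $\sigma = 10$ are illustrative assumptions) draws repeated samples and compares the standard deviation of the sample means with $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

random.seed(1)

sigma, n, n_samples = 10.0, 25, 5000

# Draw many samples of size n from N(100, sigma) and record each sample mean
sample_means = [
    statistics.mean(random.gauss(100, sigma) for _ in range(n))
    for _ in range(n_samples)
]

# The SD of the sampling distribution of the mean should be close to sigma/sqrt(n)
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

print(round(theoretical_se, 2))  # 2.0
print(round(empirical_se, 2))    # close to 2.0
```

With 5000 repeated samples, the empirical standard deviation of the sample means settles very close to the theoretical value of $10/\sqrt{25} = 2$.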

Size of the Standard Error

The size of the standard error is affected by the standard deviation of the population and by the number of observations in a sample, called the sample size. The larger the population's standard deviation ($\sigma$), the larger the standard error will be, indicating more variability in the sample means. On the other hand, the larger the number of observations in a sample, the smaller the estimate's SE, indicating less variability in the sample means. By less variability, we mean that the sample is more representative of the population of interest.

Adjustments in Computing SE of Sample Means

If the sampled population is not very large, we need to make an adjustment in computing the SE of the sample means. For a finite population, in which the total number of objects (observations) is $N$ and the number of objects (observations) in a sample is $n$, the adjustment factor is $\sqrt{\frac{N-n}{N-1}}$. This is called the finite population correction factor. The adjusted standard error is then

\[\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}\]
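As a quick sketch (the values of $\sigma$, $n$, and $N$ below are illustrative, not from the text), the adjusted SE can be computed directly from the formula:

```python
import math

def se_mean_fpc(sigma: float, n: int, N: int) -> float:
    """SE of the sample mean with the finite population correction factor."""
    return (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))

# Illustrative values: population of N = 500, sample of n = 50, sigma = 12
print(round(se_mean_fpc(12, 50, 500), 4))  # 1.6116
```

Because the correction factor is always less than 1 for $n > 1$, the adjusted SE is smaller than the uncorrected $\sigma/\sqrt{n} \approx 1.6971$.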

Uses of Standard Error

  1. It measures the spread of the values of a statistic about the expected value of that statistic, and thus helps us understand how well a sample represents the entire population.
  2. It is used to construct confidence intervals, which provide a range of values likely to contain the true population parameter.
  3. It is used in tests of hypotheses about population parameters, such as t-tests and z-tests, to determine the significance of differences between sample means or between a sample mean and a population mean.
  4. It helps in determining the required sample size for a study to achieve a desired level of precision.
  5. By comparing the standard errors of different samples or estimates, one can assess their relative variability and reliability.
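Use 2 above can be sketched in a few lines of Python; the data values and the large-sample $z$ value of 1.96 are illustrative assumptions:

```python
import math
import statistics

data = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2]  # illustrative sample

xbar = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))  # SE of the mean

# Approximate 95% confidence interval: xbar +/- 1.96 * SE (large-sample z value)
lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(round(xbar, 4), round(se, 4))
print(round(lower, 4), round(upper, 4))
```

The smaller the SE, the narrower this interval, which is exactly what "more precision" means in practice.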
Standard Error Formulas

The SE is computed from sample statistics. The formulas below apply to simple random samples and assume that the population size ($N$) is at least 20 times the sample size ($n$).
\begin{align*}
\text{Sample mean, } \overline{x} & \Rightarrow SE_{\overline{x}} = \frac{s}{\sqrt{n}}\\
\text{Sample proportion, } p &\Rightarrow SE_{p} = \sqrt{\frac{p(1-p)}{n}}\\
\text{Difference between means, } \overline{x}_1 - \overline{x}_2 &\Rightarrow SE_{\overline{x}_1-\overline{x}_2}=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\\
\text{Difference between proportions, } p_1-p_2 &\Rightarrow SE_{p_1-p_2}=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}
\end{align*}
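These formulas translate directly into code. The following sketch (the helper names are mine, not from any standard library) implements each one:

```python
import math

def se_mean(s: float, n: int) -> float:
    """SE of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p: float, n: int) -> float:
    """SE of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_means(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of the difference between two sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """SE of the difference between two sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(round(se_mean(10, 25), 2))          # 2.0
print(round(se_proportion(0.5, 100), 2))  # 0.05
```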

Summary

The SE provides valuable insight into the reliability and precision of sample-based estimates. By understanding the SE, a researcher can make more informed decisions and draw more accurate conclusions from the data under study. The SE is analogous to the standard deviation, except that the standard deviation describes the variability of individual observations, whereas the SE describes the variability of a sample statistic.

FAQS about SE

  1. What is the SE, and how is it computed?
  2. What are the uses of SE?
  3. What factors affect the size of the SE?
  4. When will the SE be large?
  5. When will the SE be small?
  6. What will be the standard error for proportion?

For more about SE, follow the link: Standard Error of Estimate


Latin Square Designs (LSD) Definition and Introduction

Introduction to Latin Square Designs

In Latin Square Designs, the treatments are grouped into replicates in two different ways, such that each row and each column is a complete block. The grouping for a balanced arrangement is achieved by requiring that each treatment appear once and only once in each row and once and only once in each column. The experimental material should be arranged, and the experiment conducted, in such a way that the differences among rows and columns represent major sources of variation.

Hence, a Latin Square Design is an arrangement of $k$ treatments in a $k\times k$ square, where the treatments are grouped in blocks in two directions. Note that in a Latin Square Design the number of rows, the number of columns, and the number of treatments must all be equal.

In other words, unlike the Randomized Complete Block Design (RCBD) and the Completely Randomized Design (CRD), a Latin Square Design is a two-restriction design: it provides two blocking factors that are used to control the effects of two variables that influence the response variable. The design is called a Latin Square because each Latin letter represents a treatment that occurs exactly once in each row and once in each column, so that for one criterion (restriction) the rows are completely homogeneous blocks, and for the second criterion (restriction) the columns are completely homogeneous blocks.

Application of Latin Square Designs

Latin Square Designs are applied mostly in animal science, agriculture, industrial research, etc. A daily-life example is the Sudoku puzzle, which is a special case of a Latin square. The main assumption is that there is no interaction between the treatment, row, and column effects.

The Statistical Model of Latin Square Designs

The general model is defined as
\[Y_{ijk}=\mu+\alpha_i+\beta_j+\tau_k +\varepsilon_{ijk}\]

where $i=1,2,\cdots,t$; $j=1,2,\cdots,t$; and $k=1,2,\cdots,t$, with $t$ treatments, $t$ rows, and $t$ columns,
$\mu$ is the overall mean (general mean) based on all of the observations,
$\alpha_i$ is the effect of the $i$th row,
$\beta_j$ is the effect of the $j$th column,
$\tau_k$ is the effect of the $k$th treatment, and
$\varepsilon_{ijk}$ is the corresponding error term, which is assumed to be independently and normally distributed with mean zero and constant variance, i.e., $\varepsilon_{ijk}\sim N(0, \sigma^2)$.

Latin Square Designs Experimental Layout

Suppose we have 4 treatments (namely: $A, B, C$, and $D$), then it means that we have

Number of Treatments = Number of Rows = Number of Columns = 4

The Latin Square Design layout can be, for example:

A $Y_{111}$   B $Y_{122}$   C $Y_{133}$   D $Y_{144}$
B $Y_{212}$   C $Y_{223}$   D $Y_{234}$   A $Y_{241}$
C $Y_{313}$   D $Y_{324}$   A $Y_{331}$   B $Y_{342}$
D $Y_{414}$   A $Y_{421}$   B $Y_{432}$   C $Y_{443}$

The subscripts represent the row, column (block), and treatment numbers, respectively. For example, $Y_{421}$ denotes the observation on the first treatment ($A$) in the 4th row and 2nd column (block).
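A cyclic layout like the one above (each row shifts the treatment labels one position) can be generated with a short sketch in Python; in a real experiment, the rows, columns, and letters would then be randomized:

```python
def latin_square(treatments):
    """Build a k x k Latin square by cyclic shifts: row i starts at treatment i."""
    k = len(treatments)
    return [[treatments[(i + j) % k] for j in range(k)] for i in range(k)]

square = latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
# A B C D
# B C D A
# C D A B
# D A B C

# Check the defining property: each treatment appears exactly once
# in every row and in every column
k = len(square)
assert all(len(set(row)) == k for row in square)
assert all(len({square[i][j] for i in range(k)}) == k for j in range(k))
```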


Benefits of using Latin Square Designs

  • Efficiency: It allows the examination of multiple factors (treatments) within a single experiment, reducing the time and resources needed.
  • Controlling Variability: By ensuring a balanced distribution of treatments across rows and columns, one can effectively control for two sources of variation that might otherwise influence the results.

Limitations

The following limitations need to be considered:

  • Number of Treatments: The number of rows and columns in the Latin square must be equal to the number of treatments. This means it works best with a small to moderate number of treatments.
  • Interaction Effects: Latin squares are good for analyzing the main effects of different factors, but they cannot account for interaction effects between those factors.


Creating Matrices in Mathematica (2015)

In this article, we will discuss creating matrices in Mathematica.

Matrices in Mathematica

A matrix is an array of numbers arranged in rows and columns. In Mathematica, a matrix is expressed as a list of rows, each of which is itself a list; that is, a matrix is a list of lists. If a matrix has $n$ rows and $m$ columns, we call it an $n$ by $m$ matrix. The value in the $i$th row and $j$th column is called the $(i,j)$ entry.

In Mathematica, matrices can be entered with the { } notation, constructed from a formula, or imported from a data file. There are also commands for creating diagonal matrices, constant matrices, and other special matrix types.

Creating Matrices in Mathematica

  • Create a matrix using { } notation
    mat={{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
    The output will not be displayed in matrix form; to get matrix form, use a command like
    mat//MatrixForm
  • Creating a matrix using the Table command
    mat1 = Table[b[row, column], {row, 1, 4, 1}, {column, 1, 2, 1}];
    MatrixForm[mat1]
  • Creating symbolic matrices, such as
    mat2 = Table[Subscript[x, i] + Subscript[x, j], {i, 1, 4}, {j, 1, 3}]
    mat2//MatrixForm
  • Creating a diagonal matrix with nonzero entries at its diagonal
    DiagonalMatrix[{1, 2, 3, r}]//MatrixForm
  • Creating a matrix with the same entries i.e. a constant matrix
    ConstantArray[3, {2, 4}]//MatrixForm
  • Creating an identity matrix of order $n\times n$
    IdentityMatrix[4]

Matrix Operations in Mathematica

In Mathematica, matrix operations can be performed on both numeric and symbolic matrices.

  • To find the determinant of a matrix
    Det[mat]
  • To find the transpose of a matrix
    Transpose[mat]
  • To find the inverse of a matrix (e.g. for solving a linear system)
    Inverse[mat]
  • To find the Trace of a matrix i.e. sum of diagonal elements in a matrix
    Tr[mat]
  • To find the Eigenvalues of a matrix
    Eigenvalues[mat]
  • To find the Eigenvectors of a matrix
    Eigenvectors[mat]
  • To find both Eigenvalues and Eigenvectors together
    Eigensystem[mat]

Note that the +, *, and ^ operators all automatically work element-wise; matrix multiplication uses the Dot operator (.).

Displaying Matrix and its Elements

  • mat[[1]]         displays the first row of the matrix mat created above
  • mat[[1, 2]]     displays the element in the first row and second column, i.e., the $m_{12}$ entry of the matrix
  • mat[[All, 2]]  displays the 2nd column of the matrix

Interactive Input (Menu)

  1. Go to Insert > Table/Matrix > New…
  2. Select Matrix (List of lists).
  3. Define the number of rows and columns.
  4. Click OK.
  5. Use the provided interface to enter values in each cell.

Predefined Matrices

Mathematica provides functions to generate specific types of matrices:

  • IdentityMatrix: Creates an identity matrix.
  • DiagonalMatrix: Creates a diagonal matrix from a specified list.
  • HilbertMatrix: Generates a Hilbert matrix.
  • VandermondeMatrix: Creates a Vandermonde matrix.

Importing from Files

  • Use the Import function to read data from various file formats like CSV, TSV, or Excel spreadsheets and convert them into matrices.


Student t-test Comparison Test (2015)

In 1908, William Sealy Gosset published his work under the pseudonym “Student” to solve problems of inference based on sample(s) drawn from a normally distributed population when the population standard deviation is unknown. He developed the Student t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another; in this case, the t-test is called the independent samples t-test (also known as the unpaired samples t-test).

The Student t-test is one of the most commonly used statistical techniques for testing hypotheses based on the difference between sample means. The Student t-test can be computed just by knowing the means, standard deviations, and numbers of data points of both samples, using the following formula

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$ which means that both samples came from the same population under the given “level of significance” or “level of risk”.

If the t-statistic computed from the above formula is greater in absolute value than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, we fail to reject the null hypothesis.

Note that the t-distribution is a family of curves depending on the degrees of freedom (the number of independent observations in the sample minus the number of parameters). As the sample size increases, the t-distribution approaches the bell-shaped normal distribution.

Student t-test Example

The production manager wants to compare the number of defective products produced on the day shift with the number on the afternoon shift. A sample of production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. The production manager wants to check, at the 0.05 significance level, whether there is a significant difference in the mean number of defects per shift.

Day shift: 5, 8, 7, 6, 9, 7
Afternoon shift: 8, 10, 7, 11, 9, 12, 14, 9

Some required calculations for the Student t-test are:

The mean of samples:

$\overline{X}_1=7$, $\overline{X}_2=10$,

Standard Deviation of samples

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$

Step 1: Null and alternative hypothesis are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test Statistics

$\begin{aligned}
t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\
&=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837
\end{aligned}$

Step 4: Critical value or rejection region (reject $H_0$ if the absolute value of the t-statistic calculated in Step 3 is greater than or equal to the absolute table value, i.e., $|t_{calculated}|\ge |t_{tabulated}|$). In this example, the tabulated t value is $\pm 2.179$ with 12 degrees of freedom at a significance level of 5%.

Step 5: Conclusion: Since the computed value $|-2.837| > |2.179|$, we reject $H_0$; the mean number of defects is not the same on the two shifts.
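The whole five-step calculation can be reproduced with the Python standard library (a sketch of the same arithmetic, not a library t-test routine):

```python
import math
import statistics

day = [5, 8, 7, 6, 9, 7]                  # defects on 6 day shifts
afternoon = [8, 10, 7, 11, 9, 12, 14, 9]  # defects on 8 afternoon shifts

n1, n2 = len(day), len(afternoon)
x1, x2 = statistics.mean(day), statistics.mean(afternoon)
s1, s2 = statistics.stdev(day), statistics.stdev(afternoon)

# Pooled (combined) variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Independent-samples t statistic
t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(sp2, 4))  # 3.8333
print(round(t, 3))    # -2.837
```

The output matches the hand calculation above: $s_p^2 = 3.8333$ and $t = -2.837$.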

Different Types of Comparison Tests

  • Independent Samples t-test: This compares the means of two independent groups. For example, you might use this to see if a new fertilizer increases plant growth compared to a control group.
  • Paired Samples t-test: This compares the means from the same group at different times or under various conditions. Imagine testing the same group’s performance on a task before and after training.
  • One-Sample t-test: This compares the mean of a single group to a hypothesized value. For instance, you could use this to see if students’ average exam scores significantly differ from 75%.

The summary of key differences between the comparison tests:

|             | Independent Samples | Paired Samples | One-Sample |
|-------------|---------------------|----------------|------------|
| Groups      | Independent groups  | Same group at different times | Single group |
| Hypothesis  | Means are different | Means are different | Mean differs from a hypothesized value |
| Assumptions | Normally distributed data, equal variances (testable) | Normally distributed differences | Normally distributed data |

Regardless of the type of t-test, all of the above comparison tests assess the significance of a difference between means. These tests tell the researcher whether the observed difference is likely due to random chance or reflects a true underlying difference in the populations.
