Creating Matrices in Mathematica (2015)

In this article, we will discuss creating matrices in Mathematica.

Matrices in Mathematica

A matrix is an array of numbers arranged in rows and columns. In Mathematica, a matrix is expressed as a list of rows, each of which is itself a list; that is, a matrix is a list of lists. If a matrix has $n$ rows and $m$ columns, we call it an $n$ by $m$ matrix. The value in the $i$th row and $j$th column is called the $(i,j)$ entry.

In Mathematica, matrices can be entered with the { } notation, constructed from a formula, or imported from a data file. There are also commands for creating diagonal matrices, constant matrices, and other special matrix types.

Creating Matrices in Mathematica

  • Create a matrix using { } notation
    mat={{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
    The output is shown as a nested list rather than in matrix form. To display it in matrix form, use
    mat//MatrixForm
  • Creating a matrix using the Table command
    mat1 = Table[Subscript[b, row, column],
      {row, 1, 4, 1}, {column, 1, 2, 1}];
    MatrixForm[mat1]
  • Creating symbolic matrices such as
    mat2 = Table[Subscript[x, i] + Subscript[x, j], {i, 1, 4}, {j, 1, 3}]
    mat2//MatrixForm
  • Creating a diagonal matrix with the given entries on its diagonal
    DiagonalMatrix[{1, 2, 3, r}]//MatrixForm
  • Creating a matrix with the same entries i.e. a constant matrix
    ConstantArray[3, {2, 4}]//MatrixForm
  • Creating an identity matrix of order $n\times n$
    IdentityMatrix[4]

Matrix Operations in Mathematica

In Mathematica, matrix operations can be performed on both numeric and symbolic matrices.

  • To find the determinant of a matrix
    Det[mat]
  • To find the transpose of a matrix
    Transpose[mat]
  • To find the inverse of a matrix, e.g. when solving a linear system (the matrix must be nonsingular)
    Inverse[mat]
  • To find the Trace of a matrix i.e. sum of diagonal elements in a matrix
    Tr[mat]
  • To find the Eigenvalues of a matrix
    Eigenvalues[mat]
  • To find the Eigenvectors of a matrix
    Eigenvectors[mat]
  • To find both Eigenvalues and Eigenvectors together
    Eigensystem[mat]
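
The matrix mat defined earlier, {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, has determinant 0, so Inverse[mat] cannot return an inverse for it. The following is a minimal sketch of the same commands on a small nonsingular matrix; the names m and rhs are chosen here for illustration only.

```mathematica
(* Hypothetical 2 x 2 nonsingular matrix and right-hand side *)
m = {{2, 1}, {1, 3}};
rhs = {1, 2};

Det[m]              (* 5 *)
Inverse[m]          (* {{3/5, -1/5}, {-1/5, 2/5}} *)

(* Solve m . x == rhs, either via the inverse or directly *)
Inverse[m] . rhs    (* {1/5, 3/5} *)
LinearSolve[m, rhs] (* {1/5, 3/5} *)
```

For solving a linear system, LinearSolve is usually preferable to forming the inverse explicitly.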

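Note that the +, *, and ^ operators all work element-wise on matrices. For the matrix product, use the dot operator (mat . mat), and for matrix powers, use MatrixPower.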
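
A short sketch of the difference between element-wise and matrix operations, using a small matrix a introduced here only for illustration:

```mathematica
a = {{1, 2}, {3, 4}};

(* Element-wise operations *)
a*a                (* {{1, 4}, {9, 16}} *)
a^2                (* {{1, 4}, {9, 16}} *)

(* Matrix product and matrix power *)
a . a              (* {{7, 10}, {15, 22}} *)
MatrixPower[a, 2]  (* {{7, 10}, {15, 22}} *)
```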

Displaying Matrix and its Elements

  • mat[[1]] returns the first row of the matrix mat created above
  • mat[[1, 2]] returns the element in the first row and second column, i.e. the $m_{12}$ element of the matrix
  • mat[[All, 2]] returns the 2nd column of the matrix

Interactive Input (Menu)

  1. Go to Insert > Table/Matrix > New…
  2. Select Matrix (List of lists).
  3. Define the number of rows and columns.
  4. Click OK.
  5. Use the provided interface to enter values in each cell.

Predefined Matrices

Mathematica provides functions to generate specific types of matrices:

  • IdentityMatrix: Creates an identity matrix.
  • DiagonalMatrix: Creates a diagonal matrix from a specified list.
  • HilbertMatrix: Generates a Hilbert matrix.
  • VandermondeMatrix: Creates a Vandermonde matrix.

Importing from Files

  • Use the Import function to read data from various file formats like CSV, TSV, or Excel spreadsheets and convert them into matrices.
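
As a rough sketch, importing a CSV file might look as follows. The file name data.csv is hypothetical, and the sketch assumes the file holds a rectangular block of numbers without a header row.

```mathematica
(* "data.csv" is a placeholder; replace it with the path to a real file *)
data = Import["data.csv"];

MatrixQ[data]       (* True if every row has the same length *)
Dimensions[data]    (* {number of rows, number of columns} *)
data // MatrixForm
```

If the first row of the file contains column names, Rest[data] drops it before further matrix operations.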

Student t-test Comparison Test (2015)

In 1908, William Sealy Gosset published his work under the pseudonym “Student” to solve problems of inference based on samples drawn from a normally distributed population when the population standard deviation is unknown. He developed the Student t-test and the t-distribution, which can be used to compare two small sets of quantitative data collected independently of one another. In this case, the test is called the independent samples t-test, also known as the unpaired samples t-test.

The Student t-test is the most commonly used statistical technique for testing hypotheses about the difference between sample means. It can be computed from the means, standard deviations, and number of data points of the two samples alone, using the following formula

\[t=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\]

where $s_p^2$ is the pooled (combined) variance and can be computed as

\[s_p^2=\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\]

Using this test statistic, we test the null hypothesis $H_0:\mu_1=\mu_2$, which means that both samples came from populations with the same mean, at the given “level of significance” or “level of risk”.

If the absolute value of the t-statistic computed from the above formula is greater than the critical value (the value from the t-table with $n_1+n_2-2$ degrees of freedom at the given level of significance, say $\alpha=0.05$), the null hypothesis is rejected; otherwise, the null hypothesis is not rejected.

Note that the t-distribution is a family of bell-shaped curves depending on the degrees of freedom (the number of independent observations in the sample minus the number of parameters estimated). As the sample size increases, the t-distribution approaches the normal distribution.
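
One way to see this convergence is to compare two-tailed 5% critical values of the t-distribution with the corresponding normal value. The following Mathematica sketch does so; the degrees of freedom shown are arbitrary choices for illustration.

```mathematica
(* Upper 2.5% points of the t-distribution for increasing degrees of freedom *)
Table[{df, Quantile[StudentTDistribution[df], 0.975]}, {df, {5, 12, 30, 1000}}]
(* approximately {{5, 2.571}, {12, 2.179}, {30, 2.042}, {1000, 1.962}} *)

(* Upper 2.5% point of the standard normal distribution *)
Quantile[NormalDistribution[0, 1], 0.975]
(* approximately 1.960 *)
```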

Student t-test Example

The production manager wants to compare the number of defective products produced on the day shift with the number produced on the afternoon shift. A sample of the production from 6 day shifts and 8 afternoon shifts revealed the following numbers of defects. At the 0.05 significance level, is there a significant difference in the mean number of defects per shift?

| Day shift | 5 | 8 | 7 | 6 | 9 | 7 | | |
| Afternoon shift | 8 | 10 | 7 | 11 | 9 | 12 | 14 | 9 |

Some required calculations for the Student t-test are:

The mean of samples:

$\overline{X}_1=7$, $\overline{X}_2=10$,

Standard Deviation of samples

$s_1=1.4142$, $s_2=2.2678$ and $s_p^2=\frac{(6-1) (1.4142)^2+(8-1)(2.2678)^2}{6+8-2}=3.8333$

Step 1: Null and alternative hypotheses are: $H_0:\mu_1=\mu_2$ vs $H_1:\mu_1 \ne \mu_2$

Step 2: Level of significance: $\alpha=0.05$

Step 3: Test statistic

$\begin{aligned}
t&=\frac{\overline{X}_1-\overline{X}_2 }{\sqrt{s_p^2 (\frac{1}{n_1}+\frac{1}{n_2})}}\\
&=\frac{7-10}{\sqrt{3.8333(\frac{1}{6}+\frac{1}{8})}}=-2.837
\end{aligned}$

Step 4: Critical value or rejection region: reject $H_0$ if the absolute value of the t-statistic calculated in step 3 is greater than or equal to the absolute table value, i.e. $|t_{calculated}|\ge |t_{tabulated}|$. In this example, the two-tailed tabulated value is $\pm 2.179$ with 12 degrees of freedom at a significance level of 5%.

Step 5: Conclusion: As the computed value satisfies $|-2.837| > 2.179$, we reject $H_0$ and conclude that the mean number of defects is not the same on the two shifts.
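
The calculations in this example can be checked with a few lines of Mathematica. This is only a sketch of the pooled-variance formula given above; the variable names are chosen here for illustration.

```mathematica
(* Defect counts from the example *)
day = {5, 8, 7, 6, 9, 7};
afternoon = {8, 10, 7, 11, 9, 12, 14, 9};
n1 = Length[day]; n2 = Length[afternoon];

(* Pooled variance; Variance uses the n - 1 divisor *)
sp2 = ((n1 - 1) Variance[day] + (n2 - 1) Variance[afternoon])/(n1 + n2 - 2);

(* Test statistic *)
t = (Mean[day] - Mean[afternoon])/Sqrt[sp2 (1/n1 + 1/n2)];
N[{Mean[day], Mean[afternoon], sp2, t}]
(* approximately {7., 10., 3.8333, -2.8372} *)

(* Two-tailed critical value at alpha = 0.05 with n1 + n2 - 2 degrees of freedom *)
tcrit = Quantile[StudentTDistribution[n1 + n2 - 2], 1 - 0.05/2]
(* approximately 2.179 *)

Abs[t] > tcrit   (* True, so H0 is rejected *)
```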

Different Types of Comparison Tests

  • Independent Samples t-test: This compares the means of two independent groups. For example, you might use this to see if a new fertilizer increases plant growth compared to a control group.
  • Paired Samples t-test: This compares the means from the same group at different times or under various conditions. Imagine testing the same group’s performance on a task before and after training.
  • One-Sample t-test: This compares the mean of a single group to a hypothesized value. For instance, you could use this to see if students’ average exam scores significantly differ from 75%.

A summary of the key differences between the comparison tests:

| | Independent Samples | Paired Samples | One-Sample |
|---|---|---|---|
| Groups | Independent | Same group at different times | Single group |
| Hypothesis | Means are different | Means are different | Mean is different from a hypothesized value |
| Assumptions | Normally distributed data, equal variances (testable) | Normally distributed differences | Normally distributed data |

Regardless of the type of t-test, all the above comparison tests assess the significance of a difference between means. These tests tell the researcher whether the observed difference is likely due to random chance or reflects a true underlying difference in the populations.


The Sum of Squared Deviations from the Mean (2015)

Introduction to the Sum of Squared Deviations

In statistics, the sum of squared deviations (also known as the sum of squares) is a measure of the total variability (spread or variation) within a data set. In other words, the sum of squares measures the deviation of the data from the mean (average) value of the data set.

Computation of Sum of Squared Deviations

A sum of squares is calculated by first computing the difference between each data point (observation) and the mean of the data set, i.e. $x=X-\overline{X}$. The computed $x$ is known as the deviation score for the given data set. Squaring each of these deviation scores and then adding the squared deviation scores gives the sum of squared deviations (SS), which is represented mathematically as

\[SS=\sum(x^2)=\sum(X-\overline{X})^2\]

Note that the small letter $x$ usually represents the deviation of each observation from the mean value, while the capital letter $X$ represents the variable of interest in statistics.

The Sum of Squared Deviations Example

Consider the data set {5, 6, 7, 10, 12}. To compute its sum of squares, follow these steps:

  • Calculate the average of the data by summing all the values and dividing by the number of observations. Mathematically, it is $\frac{\sum X_i}{n}=\frac{40}{5}=8$, where 40 is the sum $5+6+7+10+12$ and $n=5$ is the number of observations.
  • Calculate the difference between each observation and the average computed in step 1. The differences are
    $5 - 8 = -3$; $6 - 8 = -2$; $7 - 8 = -1$; $10 - 8 = 2$ and $12 - 8 = 4$
    Note that the sum of these differences should be zero: $(-3) + (-2) + (-1) + 2 + 4 = 0$.
  • Square each of the differences obtained in step 2. The squared differences are
    9, 4, 1, 4 and 16
  • Add the squared differences obtained in step 3. Their sum is $9 + 4 + 1 + 4 + 16 = 34$, which is the sum of squares of the given data set.
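
The steps above can be sketched in Mathematica as follows; the variable names are chosen here for illustration.

```mathematica
data = {5, 6, 7, 10, 12};

xbar = Mean[data]     (* 8 *)
dev = data - xbar     (* {-3, -2, -1, 2, 4} *)
Total[dev]            (* 0 *)
ss = Total[dev^2]     (* 34 *)

(* The same value follows from the sample variance, which divides SS by n - 1 *)
ss == (Length[data] - 1) Variance[data]   (* True *)
```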

Sums of Squares in Different Contexts

In statistics, the sum of squares occurs in different contexts such as

  • Partitioning of Variance (Partition of Sums of Squares)
  • The Sum of Squared Deviations (Least Squares)
  • The Sum of Squared Differences (Mean Squared Error)
  • The Sum of Squared Error (Residual Sum of Squares)
  • The Sum of Squares due to Lack of Fit (Lack of Fit Sum of Squares)
  • The Sum of Squares for Model Predictions (Explained Sum of Squares)
  • The Sum of Squares for Observations (Total Sum of Squares)
  • The Sum of Squared Deviation (Squared Deviations)
  • Modeling involving the Sum of Squares (Analysis of Variance)
  • Multivariate Generalization of Sum of Square (Multivariate Analysis of Variance)

As previously discussed, the Sum of Squares is a measure of the Total Variability of a set of scores around a specific number.

Summary

  • A higher sum of squares indicates that your data points are further away from the mean on average, signifying greater spread or variability in the data. Conversely, a lower sum of squares suggests the data points are clustered closer to the mean, indicating less variability.
  • The sum of squares plays a crucial role in calculating other important statistics like variance and standard deviation. These concepts help us understand the distribution of data and make comparisons between different datasets.
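
As a worked illustration of that link, take the data set {5, 6, 7, 10, 12} used earlier, for which $SS=34$ and $n=5$. Assuming the usual $n-1$ divisor for a sample, the variance and standard deviation are

\[s^2=\frac{SS}{n-1}=\frac{34}{4}=8.5, \qquad s=\sqrt{8.5}\approx 2.915\]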


Randomized Complete Block Design (RCBD)

The Randomized Complete Block Design may be defined as the design in which the experimental material is divided into blocks/groups of homogeneous experimental units (units that share the same characteristics), and each block/group contains a complete set of treatments, which are assigned at random to the experimental units.

In a Completely Randomized Design (CRD), there is no restriction on the allocation of the treatments to experimental units. In practice, however, there are situations where the experimental material shows relatively large variability, and it is possible to form blocks (in a simpler sense, groups) of relatively homogeneous experimental material or units. The design applied in such situations is called a Randomized Complete Block Design (RCBD).

Randomized Complete Block Design

RCBD is a one-restriction design, used to control a variable that influences the response variable. The main aim of the restriction is to control the variable causing variability in the response. Blocking aims to create homogeneity within each block, so that the blocks themselves become an identified source of variability. An example of a blocking factor is the gender of a patient: by blocking on gender, this source of variability is controlled for, leading to greater accuracy. RCBD is a mixed model in which one factor is fixed and the other is random. The main assumption of the design is that there is no interaction between the treatment and block effects.

A Randomized Complete Block Design is said to be a complete design because the number of experimental units in each block equals the number of treatments, so each treatment occurs in each block.

The general model is defined as

\[Y_{ij}=\mu+\eta_i+\xi_j+e_{ij}\]

where $i=1,2,3,\cdots, t$ and $j=1,2,\cdots, b$ with $t$ treatments and $b$ blocks. $\mu$ is the overall mean based on all observations, $\eta_i$ is the effect of the $i$th treatment, $\xi_j$ is the effect of the $j$th block, and $e_{ij}$ is the corresponding error term, which is assumed to be independent and normally distributed with mean zero and constant variance.
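
Under this model, the total variation is commonly partitioned into treatment, block, and error components. The identity below is a standard sketch of that partition; the dot notation ($\overline{Y}_{i.}$ for the $i$th treatment mean, $\overline{Y}_{.j}$ for the $j$th block mean, and $\overline{Y}_{..}$ for the overall mean) is introduced here and does not appear in the original text.

\[\sum_{i=1}^{t}\sum_{j=1}^{b}(Y_{ij}-\overline{Y}_{..})^2 = b\sum_{i=1}^{t}(\overline{Y}_{i.}-\overline{Y}_{..})^2 + t\sum_{j=1}^{b}(\overline{Y}_{.j}-\overline{Y}_{..})^2 + \sum_{i=1}^{t}\sum_{j=1}^{b}(Y_{ij}-\overline{Y}_{i.}-\overline{Y}_{.j}+\overline{Y}_{..})^2\]

The corresponding degrees of freedom add up in the same way: $tb-1=(t-1)+(b-1)+(t-1)(b-1)$.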

The main objective of blocking is to reduce the variability among experimental units within a block as much as possible and to maximize the variation among blocks. If the blocks do not differ from one another, the design does not contribute to improving the precision in detecting treatment differences.

Randomized Complete Block Design Experimental Layout

Suppose there are $t$ treatments and $b$ blocks in a randomized complete block design; then each block contains $t$ homogeneous plots, one for each treatment. An experimental layout for such a design using four treatments (A, B, C, D) in three blocks is as follows.

| Block 1 | Block 2 | Block 3 |
|---|---|---|
| A | B | C |
| B | C | D |
| C | D | A |
| D | A | B |

From the RCBD layout, we can see that

  • The treatments are assigned at random within blocks of adjacent subjects and each of the treatments appears once in a block.
  • The number of blocks represents the number of replications
  • Any treatment can be adjacent to any other treatment, but not to the same treatment within the block.
  • Variation in an experiment is controlled by accounting for spatial effects.
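
A randomized layout of this kind can be generated in Mathematica by shuffling the treatments independently within each block. The following is a minimal sketch for four treatments in three blocks; the seed value is arbitrary and only makes the result reproducible.

```mathematica
treatments = {"A", "B", "C", "D"};
blocks = 3;

SeedRandom[2015];  (* arbitrary seed, for reproducibility only *)

(* One independent random ordering of all treatments per block *)
layout = Table[RandomSample[treatments], {blocks}];

(* Display the blocks as columns *)
TableForm[Transpose[layout],
  TableHeadings -> {None, {"Block 1", "Block 2", "Block 3"}}]
```

Because RandomSample returns a permutation of the full treatment list, every treatment appears exactly once in each block.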
