Mode Measure of Central Tendency (2014)

The mode is the most frequent observation in a data set, i.e., the value that appears most often. A data set may have more than one mode, or it may have no mode at all. The mode is usually calculated for categorical data (data measured on a nominal or ordinal scale), but it is not restricted to such data.

It can also be used for interval and ratio scales, provided that some values in the data set are repeated or the data can be classified into groups. If no two data points have the same value (no repetition in the data values), then the mode of that data set does not exist or may not be meaningful. A data set having more than one mode is called multimodal.

Example 1: Consider the following data set showing the weights (in kg) of 15 children at the age of 10 years: 33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25. We can find the most repeated value by tabulating the data in a frequency distribution table, whose first column is the weight and whose second column is the number of times that weight appears in the data, i.e., the frequency of each weight.

Weight of 10-year-old child (kg)    Frequency
21                                  1
22                                  1
23                                  5
25                                  2
30                                  3
32                                  1
33                                  2
Total                               15

From the above frequency distribution table, we can easily find the most frequently occurring observation (data point), which is the mode of the data set. Here it is 23, meaning that 23 kg is the most common weight among these 10-year-old children (5 of the 15 children). Note that a frequency distribution table is not necessary for finding the mode, but it helps in finding the mode quickly, and the table can also be used in further calculations such as the percentage and cumulative percentage of each weight group.
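The tabulation described in Example 1 can be sketched in Matlab. This is a minimal illustration, assuming the weights are stored in a row vector; the variable names (w, vals, freq, pct, cumpct) are chosen here for illustration only.

```matlab
% Weights (kg) of the 15 children from Example 1
w = [33 30 23 23 32 21 23 30 30 22 25 33 23 23 25];

% Distinct weights (sorted) and, for each observation, the index of its distinct value
[vals, ~, idx] = unique(w);

% Frequency of each distinct weight
freq = accumarray(idx(:), 1);

% Percentage and cumulative percentage of each weight
pct    = 100 * freq / numel(w);
cumpct = cumsum(pct);

% Frequency table: weight, frequency, percentage, cumulative percentage
disp([vals(:) freq pct cumpct])

% The mode is the weight with the largest frequency
[~, k] = max(freq);
modeFromTable = vals(k);

% The built-in function mode gives the same value for these data
modeBuiltin = mode(w);
```

When several values share the highest frequency, the built-in mode function returns only the smallest of them, so the frequency table is the safer way to detect a multimodal data set.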

Example 2: Suppose we have recorded the gender of each person in a sample, where $M$ stands for male and $F$ stands for female. The recorded sequence is as follows: F, F, M, F, F, M, M, M, M, F, M, F, M, F, M, M, M, F, F, M. The frequency distribution table of gender is

Gender    Frequency
Male      11
Female    9
Total     20

The most repeated gender is male, so the mode is male: most of the people in this data set (11 of 20) are male.
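For nominal data such as gender, the mode can be found by counting categories. The following is a small sketch, assuming the sequence of Example 2 is stored as a character vector; the names g, cats, counts, and modeGender are illustrative.

```matlab
% Gender sequence from Example 2 as a character vector (M = male, F = female)
g = 'FFMFFMMMMFMFMFMMMFFM';

% Distinct categories (sorted: 'F' then 'M') and the category index of each observation
[cats, ~, idx] = unique(g);

% Frequency of each category
counts = accumarray(idx(:), 1);

% The mode is the category with the largest count
[maxCount, k] = max(counts);
modeGender = cats(k);

fprintf('F: %d, M: %d, mode: %s\n', counts(1), counts(2), modeGender);
```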

The mode can be found by sorting the data in ascending or descending order and then counting how often each value occurs. When the data contain only a small number of observations, the counting can also be done without sorting, though it may be difficult to keep track of how many times each observation occurs. Note that the mode is not affected by extreme values (outliers or influential observations).
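The insensitivity of the mode to extreme values can be checked with a short Matlab sketch. It reuses the weights of Example 1 and appends one hypothetical extreme value (95, invented here purely for illustration).

```matlab
% Weights (kg) from Example 1
w = [33 30 23 23 32 21 23 30 30 22 25 33 23 23 25];

% Same data with one hypothetical extreme value appended (for illustration only)
wOut = [w 95];

meanBefore = mean(w);      % 26.4
meanAfter  = mean(wOut);   % 30.6875, pulled upward by the extreme value

modeBefore = mode(w);      % 23
modeAfter  = mode(wOut);   % still 23

fprintf('mean: %.4f -> %.4f, mode: %d -> %d\n', ...
        meanBefore, meanAfter, modeBefore, modeAfter);
```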

The mode is also a measure of central tendency, but it may not reflect the center of the data very well. For example, the mean of the data set in Example 1 is 26.4 kg, while the mode is 23 kg. Therefore, the mode should be used when the data points are expected to repeat or to fall into classes. For this kind of data, one should use it as a measure of central tendency instead of the mean or median. For example,

  • In the production process, a product can be classified as a defective or non-defective product.
  • Student grades can be classified as A, B, C, D, etc.
  • Gender of respondents
  • Blood Group

Example 3: Consider the following data: 3, 4, 7, 11, 15, 20, 23, 22, 26, 33, 25, 13. There is no mode in this data, as each value occurs only once. By grouping the data in some useful and meaningful form, we can find the group with the most observations. For example, the grouped frequency table is

Group       Values                 Frequency
0 to 9      3, 4, 7                3
10 to 19    11, 13, 15             3
20 to 29    20, 22, 23, 25, 26     5
30 to 39    33                     1

We cannot find the most frequent value from this table, but we can say that “20 to 29” is the group in which most of the observations occur. This group is the modal group: it contains the mode, whose value can be estimated by using the formula for the mode of grouped data.
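The text does not state which grouped formula is meant. A common textbook choice is the interpolation formula $\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$, where $l$ is the lower limit of the modal group, $h$ is the group width, $f_m$ is the frequency of the modal group, and $f_1$ and $f_2$ are the frequencies of the groups before and after it. The sketch below applies it to Example 3, assuming groups of the form [0, 10), [10, 20), and so on, with lower limit 20 for the modal group (using the class boundary 19.5 instead would shift the result by 0.5). All variable names are illustrative.

```matlab
% Data from Example 3
x = [3 4 7 11 15 20 23 22 26 33 25 13];

h = 10;                     % group width (assumed)
lowerLimits = 0:h:30;       % lower limits of the groups 0-9, 10-19, 20-29, 30-39

% Frequency of each group: number of values in [lowerLimit, lowerLimit + h)
f = arrayfun(@(lo) sum(x >= lo & x < lo + h), lowerLimits);

% Modal group: the group with the largest frequency
[fm, k] = max(f);

% Pad with zeros so that a neighbouring frequency exists even at either end
fpad = [0 f 0];
f1 = fpad(k);               % frequency of the group before the modal group
f2 = fpad(k + 2);           % frequency of the group after the modal group

% Interpolated mode of the grouped data
groupedMode = lowerLimits(k) + (fm - f1) / ((fm - f1) + (fm - f2)) * h;

fprintf('Modal group starts at %d, grouped mode = %.2f\n', lowerLimits(k), groupedMode);
```

With these assumptions the frequencies are 3, 3, 5, and 1, and the estimate is 20 + (2/6)(10), roughly 23.33.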

Mode from Bar Graph

[Figure: bar graph illustrating the mode as a measure of central tendency]

Matrix in Matlab: Create and manipulate Matrices


A matrix (a two-dimensional, rectangular structure used to store multiple elements of data in an easily accessible format) is the most basic data structure in Matlab. The elements of a matrix can be numbers, characters, logical states of yes or no (true or false), or other Matlab structure types. Matlab also supports data structures with more than two dimensions, referred to as arrays in Matlab. Matlab is a matrix-based computing environment in which all of the data entered into Matlab is stored as a matrix.

The MATLAB environment uses the term matrix for a variable that contains real or complex numbers. These numbers are arranged in a two-dimensional grid. An array is, more generally, a vector, matrix, or higher dimensional grid of numbers. All variables in Matlab are multidimensional arrays, no matter what type of data they store. A matrix is a two-dimensional array often used for linear algebra.

It is assumed in this Matlab tutorial that you know some of the basics of how to define and manipulate vectors in Matlab. We will discuss the following:

  1. Defining Matrix in Matlab
  2. Matrix Operations in Matlab
  3. Matrix Functions in Matlab

1)  Define or Create a Matrix in Matlab

Defining a matrix in Matlab is similar to defining a vector in Matlab. To define a matrix, treat it as a column of row vectors.

>> A=[1 2 3; 4 5 6; 7 8 9]

Note that spaces between numbers separate the elements within a row, and a semicolon separates the rows of matrix A. Square brackets are used to construct matrices. Individual matrix and vector entries can be referenced with parentheses. For example, A(2,3) represents the element in the second row and third column of matrix A.
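The indexing rules can be tried on the matrix A defined in this section. The sketch below is only an illustration; the variable names other than A are chosen here.

```matlab
% Define the 3-by-3 matrix row by row
A = [1 2 3; 4 5 6; 7 8 9];

a23  = A(2, 3);        % element in row 2, column 3: 6
row2 = A(2, :);        % entire second row: [4 5 6]
col3 = A(:, 3);        % entire third column: [3; 6; 9]
sub  = A(1:2, 2:3);    % rows 1-2 and columns 2-3: [2 3; 5 6]

% Parentheses are also used to assign to an element
B = A;
B(2, 3) = 60;          % B now differs from A only in row 2, column 3

[nRows, nCols] = size(A);   % number of rows and columns: 3 and 3
```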

[Figure: Matrix in Matlab]

A matrix in Matlab is a type of variable that is used for mathematical and statistical computation. The following are some examples of creating a matrix in Matlab and extracting its elements.

>> A=rand(6, 6)   % 6-by-6 matrix of random numbers
>> B=rand(6, 4)   % 6-by-4 matrix of random numbers
>> A(1:4, 3)      % column vector consisting of the first four entries of the third column of A
>> A(:, 3)        % the third column of A
>> A(1:4, :)      % the first four rows of A (all columns)

Convenient matrix-building Functions

eye –> identity matrix
zeros –> matrix of zeros
ones –> matrix of ones
diag –> create or extract diagonal elements of a matrix
triu –> upper triangular part of a matrix
tril –> lower triangular part of a matrix
rand –> randomly generated matrix
hilb –> Hilbert matrix
magic –> magic square
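A short sketch of how these matrix-building functions are called is given below. The sizes and variable names are arbitrary choices made for illustration.

```matlab
A = [1 2 3; 4 5 6; 7 8 9];

I3 = eye(3);           % 3-by-3 identity matrix
Z  = zeros(2, 3);      % 2-by-3 matrix of zeros
O  = ones(2, 3);       % 2-by-3 matrix of ones

D  = diag([1 2 3]);    % diagonal matrix built from a vector
d  = diag(A);          % main diagonal of A as a column vector: [1; 5; 9]

U  = triu(A);          % upper triangular part: [1 2 3; 0 5 6; 0 0 9]
L  = tril(A);          % lower triangular part: [1 0 0; 4 5 0; 7 8 9]

R  = rand(2, 2);       % 2-by-2 matrix of uniformly distributed random numbers
H  = hilb(3);          % Hilbert matrix with entries H(i,j) = 1/(i + j - 1)
M  = magic(4);         % 4-by-4 magic square; every row and column sums to 34
```

Note that diag works in both directions: given a vector it builds a diagonal matrix, and given a matrix it extracts the diagonal.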

2)  Matrix Operations in Matlab

Many mathematical operations can be applied to matrices and vectors in Matlab such as addition, subtraction, multiplication, and division of matrices, etc.

Matrix or Vector Multiplication

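If $x$ and $y$ are both column vectors of the same length, then $x'*y$ is their inner (or dot) product, a scalar, and $x*y'$ is their outer product, a matrix. The outer product should not be confused with the vector cross product.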
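The two products can be compared on a small example. The vectors below are arbitrary illustrative values; note that Matlab's cross function computes the vector cross product, which is a different quantity from the outer product.

```matlab
x = [1; 2; 3];
y = [4; 5; 6];

innerXY = x' * y;       % inner (dot) product, a scalar: 1*4 + 2*5 + 3*6 = 32
outerXY = x * y';       % outer product, a 3-by-3 matrix with entries x(i)*y(j)

dotXY   = dot(x, y);    % built-in dot product, equal to innerXY
crossXY = cross(x, y);  % vector cross product, a 3-by-1 vector: [-3; 6; -3]
```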

Matrix division

Let $A$ be an invertible square matrix and $b$ be a compatible vector (a column vector in the first case and a row vector in the second), then

x = A\b is the solution of A * x = b
x = b/A is the solution of x * A = b

These are called the backslash (\) and slash (/) operators, also referred to as mldivide and mrdivide, respectively.
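A small sketch of both division operators follows. The matrix and right-hand sides are made-up values chosen so that the solutions are easy to verify by hand.

```matlab
A = [2 1; 4 3];          % invertible 2-by-2 matrix (determinant 2)

% Left division: solve A * x = b for a column vector b
b = [3; 7];
x = A \ b;               % x is [1; 1], since 2+1 = 3 and 4+3 = 7

% Right division: solve y * A = c for a row vector c
c = [6 4];
y = c / A;               % y is [1 1], since 2+4 = 6 and 1+3 = 4

% Residuals should be zero up to rounding error
residualLeft  = norm(A * x - b);
residualRight = norm(y * A - c);
```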

3)  Matrix Functions in Matlab

Matlab has many functions that operate on matrices, for example to compute factorizations, decompositions, and summary quantities. Some important matrix functions used in Matlab are

eig –> eigenvalues and eigenvectors
eigs –> like eig, for large sparse matrices
chol –> Cholesky factorization
svd –> singular value decomposition
svds –> like svd, for large sparse matrices
inv –> inverse of matrix
lu –> LU factorization
qr –> QR factorization
hess –> Hessenberg form
schur –> Schur decomposition
rref –> reduced row echelon form
expm –> matrix exponential
sqrtm –> matrix square root
poly –> characteristic polynomial
det –> determinant of matrix
size –> size of an array
length –> length of a vector
rank –> rank of matrix
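A few of these functions are sketched below on a small matrix whose results can be checked by hand. The matrix and variable names are illustrative choices.

```matlab
A = [4 1; 2 3];

[nRows, nCols] = size(A);   % dimensions: 2 and 2
dA   = det(A);              % determinant: 4*3 - 1*2 = 10
rA   = rank(A);             % rank: 2 (full rank, since det(A) is nonzero)
Ainv = inv(A);              % inverse, so that A * Ainv is the identity

p      = poly(A);           % characteristic polynomial coefficients: [1 -7 10]
lambda = eig(A);            % eigenvalues 2 and 5 (order may vary)
[V, D] = eig(A);            % eigenvectors in V and eigenvalues on diag(D): A*V = V*D

[L, U, P] = lu(A);          % LU factorization with row permutation: P*A = L*U
[Q, R]    = qr(A);          % QR factorization: A = Q*R
```

Each factorization can be verified by multiplying the factors back together and comparing with A up to rounding error.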

To learn more about the use of matrices in Matlab, see the Matlab Help.


Sufficient Estimators and Sufficient Statistics

Introduction to Sufficient Estimator and Sufficient Statistics

An estimator $\hat{\theta}$ is sufficient if it makes so much use of the information in the sample that no other estimator could extract additional information from the sample about the population parameter being estimated.

For example, for a normal population with known variance, the sample mean $\overline{X}$ utilizes all the values included in the sample, and it is a sufficient estimator of the population mean $\mu$.

Sufficient estimators are often used to develop the estimator that has minimum variance among all unbiased estimators (MVUE).

If a sufficient estimator exists, no other estimator from the sample can provide additional information about the population being estimated.

If there is a sufficient estimator, then there is no need to consider any of the non-sufficient estimators. A good estimator is a function of sufficient statistics.

Let $X_1, X_2,\cdots, X_n$ be a random sample from a probability distribution with unknown parameter $\theta$. Then the statistic (estimator) $U=g(X_1, X_2,\cdots, X_n)$ is sufficient for $\theta$ if the conditional distribution of the sample $X_1, X_2,\cdots, X_n$, given the observed value of $U$, does not depend upon the population parameter $\theta$.

Sufficient Statistics Example

The sample mean $\overline{X}$ is sufficient for the population mean $\mu$ of a normal distribution with known variance. Once the sample mean is known, no further information about the population mean $\mu$ can be obtained from the sample itself. The median, on the other hand, is not sufficient for the mean: even if the median of the sample is known, knowing the sample itself would provide further information about the population mean $\mu$.

Mathematical Definition of Sufficiency

Suppose that $X_1,X_2,\cdots,X_n \sim p(x;\theta)$. A statistic $T=T(X_1,X_2,\cdots,X_n)$ is sufficient for $\theta$ if the conditional distribution of $X_1,X_2,\cdots, X_n|T$ does not depend upon $\theta$. Thus
$$p(x_1,x_2,\cdots,x_n|t;\theta)=p(x_1,x_2,\cdots,x_n|t)$$
This means that we can replace $X_1,X_2,\cdots,X_n$ with $T(X_1,X_2,\cdots,X_n)$ without losing information.
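As a worked illustration of this definition (an example added here, not taken from the text above), suppose $X_1,X_2,\cdots,X_n$ are independent Bernoulli($\theta$) variables and let $T=\sum_{i=1}^{n}X_i$, so that $T$ follows a binomial distribution with parameters $n$ and $\theta$. For any sequence $x_1,x_2,\cdots,x_n$ of zeros and ones with $\sum x_i=t$,

$$P(X_1=x_1,\cdots,X_n=x_n|T=t)=\frac{P(X_1=x_1,\cdots,X_n=x_n)}{P(T=t)}=\frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}}=\frac{1}{\binom{n}{t}}$$

The result does not involve $\theta$, so $T$ (and equivalently the sample proportion $T/n$) is sufficient for $\theta$: once the number of successes is known, the particular order in which they occurred carries no further information about $\theta$.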


