Median Definition, Formula, and Example: Quick Guide (2014)

Median Definition

The median (a measure of central tendency) is the middle-most value in a data set when all of the values (observations) are arranged in either ascending or descending order of magnitude. The median divides the data set into two halves: 50% of the observations lie below the median value and 50% lie above it. If there is an odd number of observations (data points) in a data set, the median is the single middle value after sorting the data set.

After understanding the median definition, let us consider a few examples to calculate the median for a data set.

Median Example – 1

Question: For the following data set: 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, calculate the median.

Answer: To find the median of the given data set, first sort the data (either in ascending or descending order), that is
0, 1, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The middle-most value of the above data after sorting is 5, which is the median of the given data set.

When the number of observations in a data set is even, the median is the average of the two middle-most values in the sorted data.

Median Example – 2

Question: Consider the following data set, 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, 2. Compute the median.

Answer: To find the median, first sort the data and then locate the two middle-most values, that is,
0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The middle-most two values are 4 and 5. So the median will be the average of these two values, i.e. 4.5 in this case.

The median is less affected by extreme values in the data set, so the median is the preferred measure of central tendency when the data set is skewed or not symmetrical.

Median Formula for Odd Number of Observations

For large data sets, it is difficult to locate the median by inspecting the sorted data. It is easier to find the position of the median with a formula. For an odd number of observations $n$, the formula is given below and illustrated with the $n = 13$ observations of Example 1:
$\begin{aligned}
\text{Median} &=\left(\frac{n+1}{2}\right)\text{th value}\\
&=\left(\frac{13+1}{2}\right)\text{th value}\\
&=\left(\frac{14}{2}\right)\text{th value}=7\text{th value}
\end{aligned}$

The 7th value in the sorted data of Example 1 is 5, which is the median of the given data.

Median Formula for Even Number of Observations

The median formula for an even number of observations, illustrated with the $n = 14$ observations of Example 2, is
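$\begin{aligned}
\text{Median}&=\frac{1}{2}\left(\frac{n}{2}\text{th value} + \left(\frac{n}{2}+1\right)\text{th value}\right)\\
&=\frac{1}{2}\left(\frac{14}{2}\text{th value} + \left(\frac{14}{2}+1\right)\text{th value}\right)\\
&=\frac{1}{2}(7\text{th value} + 8\text{th value})\\
&=\frac{1}{2}(4 + 5)= 4.5
\end{aligned}$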
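
The two position rules can be checked with a short Python sketch. The function name `median_by_position` is only an illustrative choice, not part of any library; the result is compared against `statistics.median` from the Python standard library.

```python
import statistics

def median_by_position(values):
    """Return the median using the (n + 1)/2 rule (odd n) or the n/2, n/2 + 1 rule (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle value
        return ordered[position - 1]     # Python lists are 0-based
    lower = ordered[n // 2 - 1]          # the (n/2)th value
    upper = ordered[n // 2]              # the (n/2 + 1)th value
    return (lower + upper) / 2

example_1 = [5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3]   # n = 13 (odd)
example_2 = example_1 + [2]                            # n = 14 (even)

print(median_by_position(example_1))   # 5
print(median_by_position(example_2))   # 4.5
print(statistics.median(example_2))    # 4.5, same answer from the standard library
```

Sorting a copy with `sorted` leaves the original data untouched, which is convenient when the same data set is reused for other summaries.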


The computation of the median is a crucial step in exploratory data analysis (EDA). It helps identify potential outliers, assess skewness in the data distribution, and choose appropriate statistical methods for further analysis.

Applications of Median in Different Scenarios

1. Resisting Outliers: The median’s primary strength lies in its resistance to outliers. Unlike the mean (which can be swayed by extreme values), the median remains stable and is largely unaffected by a few very high or very low data points (extreme observations).

2. Analyzing Skewed Distributions: When dealing with data that is not symmetrical (has skewed distributions), the median provides a more accurate representation of the “center” of the data compared to the mean/average. The median reflects the value that divides the data into halves, whereas the mean gets pulled towards the tail of the skewed distribution.

3. Ease of Interpretation: The median is a simple concept – the middle (centermost) value when the data is arranged in order (either ascending or descending).
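
The resistance to outliers described in the list above can be seen in a small Python sketch. The income figures are hypothetical values invented purely for illustration.

```python
import statistics

# Hypothetical incomes (in thousands); not taken from any real data set.
incomes = [30, 32, 35, 36, 38, 40, 41]
with_outlier = incomes + [1000]          # add one extreme observation

print(statistics.mean(incomes), statistics.median(incomes))            # 36 36
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 156.5 37.0
```

A single extreme value pulls the mean from 36 to 156.5, while the median only moves from 36 to 37.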

Note that the median cannot be found for nominal (unordered categorical) data, because such values cannot be arranged in a meaningful order.

FAQs about Median

  1. What is the median?
  2. What is the advantage of the median over other measures of central tendency?
  3. On what kind/type of data can the median be computed?
  4. What is the benefit of using the median?
  5. Write the formula for the median when the number of observations is even and when the number of observations is odd.
  6. How is the median interpreted?
  7. Into how many groups does the median divide the data/sample/population?
https://itfeature.com

Online MCQs Test website

R Programming Language

Pseudo Random Numbers (2014)

A sequence of Pseudo Random Numbers is generated by a deterministic algorithm and should simulate a sequence of independent and uniformly distributed random variables on the interval [0, 1]. Every random experiment results in two or more outcomes.

A variable whose values depend upon the outcomes of a random experiment is called a random variable denoted by capital letters $X, Y$, or $Z$ and their values by the corresponding small letters $x, y$, or $z$.

Pseudo Random Numbers and their Generation

Random numbers are a sequence of digits from the set {0,1,2,⋯,9} so that, at each position in the sequence, each digit has the same probability 0.1 of being selected irrespective of the actual sequence, so far constructed.

The simplest ways of obtaining such numbers are games of chance such as dice, coins, and cards, or repeatedly drawing numbered slips out of a jar. These manual methods become very tedious for long sequences of digits. Fortunately, tables of random digits are widely available now; the digits in such tables are usually grouped purely for the convenience of reading.

Pseudo Random Numbers and Their Process

A pseudo random process is a process that appears to be random but actually is not. Pseudo random sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuinely random one and has the benefit that it can be rerun to produce exactly the same numbers, which is useful for testing and fixing software.

To provide a sequence of such digits easily and quickly on a computer, the most common methods are the so-called pseudo random techniques.

Here, the digits will eventually reappear in the same order (a cycle). For a good technique, the cycle might be tens of thousands of digits long. Of course, the pseudorandom numbers/digits are not truly random. They are completely deterministic, but they do exhibit most of the properties of random digits. Generally, these methods involve a recursive formula, e.g.

\[x_{n+1}= (a \, x_n + b)\, \bmod\, m; \quad n=0, 1, 2, \ldots\]

$a, b$, and $m$ are suitably chosen integer constants, and the seed $x_0$ (the starting number, i.e., $n = 0$) is an integer. (Note that mod $m$ means that if the result from the formula is greater than or equal to $m$, it is divided by $m$ and the remainder is kept as the random number.)

Use of this formula gives rise to a sequence of integers, each of which is in the range 0 to $m-1$.

Example (Pseudorandom Numbers Generation)

Let $a = 13$, $b = 5$, and $m = 1000$. Generate 500 random numbers.

Solution

\[x_{n+1}=(a \, x_n + b)\, \bmod\, 1000; \quad n=0,1,2,\ldots\]

Let the seed be $x_0=5$; then for $n=0$ and $n=1$ we have

\begin{align*}
x_{0+1}&=(13 \times 5 + 5)\, \bmod\, 1000=70\\
x_{1+1}&=(13 \times 70 + 5)\, \bmod\, 1000=915
\end{align*}
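
As a rough cross-check of the hand calculation, the recursion can be sketched in Python with the same constants ($a = 13$, $b = 5$, $m = 1000$, seed 5). The helper names `lcg` and `cycle_length` are illustrative choices, not functions from any library, and dividing by $m$ to rescale the integers to $[0, 1)$ is one common convention assumed here.

```python
def lcg(seed, a, b, m, count):
    """Return `count` values of x_{n+1} = (a * x_n + b) mod m, starting after the seed."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

def cycle_length(seed, a, b, m):
    """Step through the recursion until a state repeats; return the length of the repeating cycle."""
    seen = {}            # state -> step at which it first appeared
    x = seed
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + b) % m
        step += 1
    return step - seen[x]

sequence = lcg(seed=5, a=13, b=5, m=1000, count=500)
print(sequence[:2])                       # [70, 915], matching the hand calculation
uniform = [x / 1000 for x in sequence]    # rescaled to the interval [0, 1)
print(cycle_length(5, 13, 5, 1000))       # number of steps before the sequence repeats
```

Because every value lies between 0 and $m-1$, the cycle can never be longer than $m$ steps, which is why practical generators use a much larger modulus.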


Application of Random Variables

Random numbers have wide applicability in simulation techniques (also called Monte Carlo methods), which have been applied to many problems in the various sciences and are useful in situations where direct experimentation is not possible, the cost of experimenting is very high, or the experiment takes too much time.

R code to Generate Random Number

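# Linear congruential generator: x[n+1] = (a * x[n] + b) mod m
a <- 13      # multiplier
b <- 5       # increment
m <- 1000    # modulus
sim <- 500   # number of pseudo random numbers to generate
x <- numeric(sim + 1)   # one extra slot to hold the seed
x[1] <- 5               # seed x_0 (R vectors are indexed from 1, so x[0] cannot store it)
for (i in 1:sim) {
  x[i + 1] <- (a * x[i] + b) %% m
}
x[-1]   # drop the seed and show the 500 generated numbers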

Pseudo random numbers (PRNs) are a cornerstone of computer simulations and many other applications. Because a deterministic computer algorithm cannot generate true randomness, PRNs are used extensively in many fields, including:

  • Simulations: Modeling complex systems such as financial markets, weather patterns, or traffic flow often relies on PRNs.
  • Games: From card shuffles to enemy movement in video games, PRNs add an element of chance and keep things interesting.
  • Cryptography: Pseudo random generators are used to produce encryption keys that appear random, but only generators designed to be cryptographically secure are suitable; a simple linear congruential generator is predictable and should not be used for this purpose.

Read more about Pseudo Random Process | Random Number Generation and Linear Congruential Generator (LCG)

Read more on Wikipedia: Pseudo Random Number Generator


Mode Measure of Central Tendency (2014)

The mode is the most frequent observation in the data set, i.e., the value (number) that appears the most in the data set. A data set may have more than one mode, or it may have no mode at all. Usually, the mode is calculated for categorical data (data measured on a nominal or ordinal scale), but it is not restricted to such data.

It can also be used for interval and ratio scales, provided that some values in the data set repeat or the data can be grouped into classes. If no value occurs more than once (no repetition in data values), then the mode of that data set does not exist or may not be meaningful. A data set having more than one mode is called multimodal.

Example 1: Consider the following data set showing the weights (in kg) of 15 children at the age of 10 years: 33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25. We can find the most repeated value by tabulating the given data in the form of a frequency distribution table, whose first column is the weight of the child and whose second column is the number of times the weight appears in the data, i.e., the frequency of each weight in the first column.

Weight of 10-year-old child (kg)    Frequency
21                                  1
22                                  1
23                                  5
25                                  2
30                                  3
32                                  1
33                                  2
Total                               15

From the above frequency distribution table, we can easily find the most frequently occurring observation (data point), which is the mode of the data set. Here it is 23, meaning that 23 kg is the most common weight among these 10-year-old children (5 of the 15 children). Note that it is not necessary to make a frequency distribution table to find the mode, but the table helps in finding the mode quickly and can also be used in further calculations such as the percentage and cumulative percentage of each weight group.
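
The tabulation in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using `collections.Counter` and `statistics.multimode` from the standard library; the variable names are illustrative.

```python
from collections import Counter
import statistics

weights = [33, 30, 23, 23, 32, 21, 23, 30, 30, 22, 25, 33, 23, 23, 25]

frequency = Counter(weights)              # weight -> number of occurrences
for weight in sorted(frequency):
    print(weight, frequency[weight])      # one row of the frequency table per line

mode_value, mode_count = frequency.most_common(1)[0]
print(mode_value, mode_count)             # 23 5
print(statistics.multimode(weights))      # [23]; lists every mode if there are ties
```

`statistics.multimode` returns a list, so a multimodal data set yields several values instead of an arbitrary single one.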

Example 2: Suppose we have recorded the gender of 20 persons, where $M$ stands for male and $F$ stands for female. The sequence of genders noted is as follows: F, F, M, F, F, M, M, M, M, F, M, F, M, F, M, M, M, F, F, M. The frequency distribution table of gender is

Gender    Frequency
Male      11
Female    9
Total     20

The most repeated gender is male, showing that the majority of the people in this data set (11 out of 20) are male.

The mode can be found by simply sorting the data in ascending or descending order and then counting how often each value occurs. When the data contains a small number of observations, the counting can even be done without sorting, though it may be difficult to keep track of how many times each observation occurs. Note that the mode is not affected by extreme values (outliers or influential observations).

The mode is also a measure of central tendency, but it may not reflect the center of the data very well. For example, the mean of the data set in Example 1 is 26.4 kg, while the mode is 23 kg. Therefore, the mode should be used when data points are expected to repeat or to fall into categories. For such data, it should be used as the measure of central tendency instead of the mean or median. For example,

  • In the production process, a product can be classified as a defective or non-defective product.
  • Student grades can be classified as A, B, C, D, etc.
  • Gender of respondents
  • Blood Group

Example 3: Consider the following data: 3, 4, 7, 11, 15, 20, 23, 22, 26, 33, 25, 13. There is no mode in this data as each value occurs only once. By grouping this data in some useful and meaningful form, we can find the class in which most of the values fall. For example, the grouped frequency table is

Group       Values                 Frequency
0 to 9      3, 4, 7                3
10 to 19    11, 13, 15             3
20 to 29    20, 22, 23, 25, 26     5
30 to 39    33                     1
Total                              12

We cannot find the most frequent value from this table, but we can say that “20 to 29” is the group in which most of the observations occur. This group contains the mode, which can be estimated by using the formula for grouped data.
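
The text does not state the grouped formula, so the sketch below assumes the commonly taught interpolation formula, Mode = l + (f_m − f_1) / ((f_m − f_1) + (f_m − f_2)) × h, where l is the lower boundary of the modal class, h the class width, f_m the modal frequency, and f_1 and f_2 the frequencies of the classes before and after it. Taking the lower class boundary of “20 to 29” as 19.5 is also an assumption; other conventions (for example, 20) shift the estimate slightly.

```python
def grouped_mode(lower_boundary, class_width, f_modal, f_before, f_after):
    """Estimate the mode of grouped data by interpolating inside the modal class."""
    d1 = f_modal - f_before   # excess of the modal class over the preceding class
    d2 = f_modal - f_after    # excess of the modal class over the following class
    if d1 + d2 == 0:
        raise ValueError("neighbouring classes have the same frequency as the modal class")
    return lower_boundary + d1 / (d1 + d2) * class_width

# Modal class "20 to 29": frequency 5, neighbours 3 ("10 to 19") and 1 ("30 to 39").
estimate = grouped_mode(lower_boundary=19.5, class_width=10,
                        f_modal=5, f_before=3, f_after=1)
print(round(estimate, 2))   # 22.83
```

The estimate falls inside the modal class and leans towards the neighbouring class with the higher frequency.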

Mode from Bar Graph

Bar Graph: Mode Measure of Central Tendency


Matrix in Matlab: Create and manipulate Matrices

Matrix in Matlab can be created and manipulated

Matrix (a two-dimensional, rectangular shape used to store multiple elements of data in an easily accessible format) is the most basic data structure in Matlab. The elements of a matrix can be numbers, characters, logical states of yes or no (true or false), or other Matlab structure types. Matlab also supports more than two-dimensional data structures, referred to as arrays in Matlab. Matlab is a matrix-based computing environment in which all of the data entered into Matlab is stored as a matrix.

The MATLAB environment uses the term matrix for a variable that contains real or complex numbers. These numbers are arranged in a two-dimensional grid. An array is, more generally, a vector, matrix, or higher dimensional grid of numbers. All variables in Matlab are multidimensional arrays, no matter what type of data they store. A matrix is a two-dimensional array often used for linear algebra.

It is assumed in this Matlab tutorial that you know some of the basics of how to define and manipulate vectors in Matlab. We will discuss the following:

  1. Defining Matrix in Matlab
  2. Matrix Operations in Matlab
  3. Matrix Functions in Matlab

1)  Define or Create a Matrix in Matlab

Defining a matrix in Matlab is similar to defining a vector in Matlab. To define a matrix, treat it as a column of row vectors.

>> A=[1 2 3; 4 5 6; 7 8 9]

Note that spaces between numbers are used to define the elements of the matrix and semi-colon is used to separate the rows of matrix A. The square brackets are used to construct matrices. The individual matrix and vector entries can be referenced within parentheses. For example, A(2,3) represents an element in the second row and third column of matrix A.


A matrix in Matlab is a type of variable that is used for mathematical/statistical computation. Some examples of creating a matrix in Matlab and extracting its elements are:

>> A = rand(6, 6)    % 6-by-6 matrix of uniformly distributed random numbers
>> B = rand(6, 4)    % 6-by-4 matrix of uniformly distributed random numbers
>> A(1:4, 3)         % column vector consisting of the first four entries of the third column of A
>> A(:, 3)           % the third column of A
>> A(1:4, :)         % the first four rows of A

Convenient matrix-building Functions

eye –> identity
zeros –> matrix of zeros
ones –> matrix of ones
diag –> create or extract diagonal elements of a matrix
triu –> upper triangular part of a matrix
tril –> lower triangular part of a matrix
rand –> randomly generated matrix
hilb –> Hilbert matrix
magic –> magic square

2)  Matrix Operations in Matlab

Many mathematical operations can be applied to matrices and vectors in Matlab such as addition, subtraction, multiplication, and division of matrices, etc.

Matrix or Vector Multiplication

If $x$ and $y$ are both column vectors, then $x'*y$ is their inner (or dot) product, and $x*y'$ is their outer product.
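
To make the difference between the two products concrete, here is a small sketch in plain Python (lists stand in for the column vectors; this is not Matlab syntax). The inner product is a single number, whereas the outer product is a full matrix.

```python
x = [1, 2, 3]
y = [4, 5, 6]

# Inner (dot) product: sum of element-wise products, a scalar.
inner = sum(a * b for a, b in zip(x, y))

# Outer product: entry (i, j) is x[i] * y[j], a 3-by-3 matrix.
outer = [[a * b for b in y] for a in x]

print(inner)    # 32
for row in outer:
    print(row)  # [4, 5, 6], then [8, 10, 12], then [12, 15, 18]
```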

Matrix division

Let $A$ be an invertible square matrix and $b$ be a compatible vector (a column vector for left division and a row vector for right division), then

x = A\b is the solution of A * x = b
x = b/A is the solution of x * A = b

These are called the backslash (\) and slash (/) operators, also referred to as mldivide and mrdivide, respectively.
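
The two divisions solve different systems, which a tiny 2-by-2 sketch in Python can show. The helper names are illustrative, the matrix and vector are made-up values, and the explicit inverse is used only to keep the example short; Matlab's own operators solve the systems by factorization rather than by forming the inverse.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Return the inverse of a 2-by-2 matrix using the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    det = Fraction(det)
    return [[d / det, -b / det], [-c / det, a / det]]

def left_divide(A, b):
    """Solve A * x = b for a column vector x (the system solved by mldivide)."""
    inv = inverse_2x2(A)
    return [inv[0][0] * b[0] + inv[0][1] * b[1],
            inv[1][0] * b[0] + inv[1][1] * b[1]]

def right_divide(b, A):
    """Solve x * A = b for a row vector x (the system solved by mrdivide)."""
    inv = inverse_2x2(A)
    return [b[0] * inv[0][0] + b[1] * inv[1][0],
            b[0] * inv[0][1] + b[1] * inv[1][1]]

A = [[2, 1], [0, 3]]   # made-up invertible matrix
b = [3, 5]             # made-up right-hand side

x_left = left_divide(A, b)     # satisfies A * x = b
x_right = right_divide(b, A)   # satisfies x * A = b
print(x_left, x_right)
```

For a non-symmetric matrix such as this one, the two solutions differ, so the direction of the slash matters.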

3)  Matrix Functions in Matlab

Matlab has many functions for matrix computations and decompositions. Some important matrix functions used in Matlab are

eig –> eigenvalues and eigenvectors
eigs –> like eig, for large sparse matrices
chol –> Cholesky factorization
svd –> singular value decomposition
svds –> like svd, for large sparse matrices
inv –> inverse of matrix
lu –> LU factorization
qr –> QR factorization
hess –> Hessenberg form
schur –> Schur decomposition
rref –> reduced row echelon form
expm –> matrix exponential
sqrtm –> matrix square root
poly –> characteristic polynomial
det –> determinant of matrix
size –> size of an array
length –> length of a vector
rank –> rank of matrix

To learn more about the use of matrices in Matlab, see the Matlab Help.