Multivariate Analysis

The term multivariate analysis covers all statistical methods in which more than two variables are analyzed simultaneously.

Multivariate analysis is based upon an underlying probability model known as the Multivariate Normal Distribution (MND). The objectives of scientific investigations to which multivariate methods most naturally lend themselves include the following:

  • Data reduction or structural simplification
    The phenomenon being studied is represented as simply as possible without sacrificing valuable information. It is hoped that this will make interpretation easier.
  • Sorting and Grouping
    Groups of similar objects or variables are created, based upon measured characteristics. Alternatively, rules for classifying objects into well-defined groups may be required.
  • Investigation of the dependence among variables
    The nature of the relationships among variables is of interest. Are all the variables mutually independent, or are one or more variables dependent on the others?
  • Prediction
    Relationships between variables must be determined for the purpose of predicting the values of one or more variables on the basis of observations on the other variables.
  • Hypothesis Construction and Testing
    Specific statistical hypotheses, formulated in terms of the parameters of multivariate populations, are tested. This may be done to validate assumptions or to reinforce prior convictions.

The Organization of Multivariate Data Analysis

We are concerned with analyzing measurements made on several variables or characteristics. These measurements (data) must frequently be arranged and displayed in various ways (graphs, tabular form, etc.). The preliminary concepts underlying these first steps of data organization are described below.

Array

Multivariate data arise whenever an investigator, seeking to understand a social or physical phenomenon, selects a number $p$ of variables or characteristics to record. The values of these variables are all recorded for each distinct item, individual, or experimental unit.

The notation $x_{jk}$ indicates the particular value of the $k$th variable that is observed on the $j$th item or trial; that is, $x_{jk}$ is the measurement of the $k$th variable on the $j$th item. So, $n$ measurements on $p$ variables can be displayed as

\[\begin{array}{ccccccc}
 & V_1 & V_2 & \dots & V_k & \dots & V_p \\
\text{Item } 1 & x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1p} \\
\text{Item } 2 & x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } j & x_{j1} & x_{j2} & \dots & x_{jk} & \dots & x_{jp} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
\text{Item } n & x_{n1} & x_{n2} & \dots & x_{nk} & \dots & x_{np} \\
\end{array}\]

These data can be displayed as a rectangular array $X$ of $n$ rows and $p$ columns:

\[X=\begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \dots & x_{jk} & \dots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nk} & \dots & x_{np}
\end{pmatrix}\]

The array $X$ contains all of the observations on all of the variables.

Example: Suppose we have data on the number of books sold and the total amount of each sale, recorded for four sales.

Variable 1 (Sales in Dollars)
\[\begin{array}{ccccc}
\text{Data Values:} & 42 & 52 & 48 & 63 \\
\text{Notation:} & x_{11} & x_{21} & x_{31} & x_{41}
\end{array}\]

Variable 2 (Number of Books Sold)
\[\begin{array}{ccccc}
\text{Data Values:} & 4 & 2 & 8 & 3 \\
\text{Notation:} & x_{12} & x_{22} & x_{32} & x_{42}
\end{array}\]
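
As a rough illustration, the book-sales data can be stored as a $4 \times 2$ array, with one row per sale and one column per variable. The sketch below uses NumPy, which is my choice and not part of the original example. The helper `x(j, k)` is a hypothetical convenience that converts the 1-based subscripts of $x_{jk}$ into NumPy's 0-based indices.

```python
import numpy as np

# Rows are items (sales); columns are variables.
# Column 1: sales in dollars, column 2: number of books sold.
X = np.array([
    [42, 4],
    [52, 2],
    [48, 8],
    [63, 3],
])

n, p = X.shape  # n = 4 items, p = 2 variables


def x(j, k):
    """Return x_{jk} using the 1-based subscripts of the text (hypothetical helper)."""
    return X[j - 1, k - 1]


print(n, p)     # number of items and number of variables
print(x(3, 2))  # books sold in the third sale
print(X[:, 0])  # all observations on variable 1 (sales in dollars)
```

Keeping items in rows and variables in columns mirrors the layout of the array $X$, so a column slice such as `X[:, 0]` returns every observation on a single variable.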

The information available in the data can be assessed by calculating certain summary numbers, known as multivariate descriptive statistics. Examples are the arithmetic (sample) mean, which is a measure of location, and the average of the squared distances of all of the numbers from the mean, which is a measure of spread or variation.
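
A minimal sketch of these two summaries for the book-sales example is given below, assuming NumPy and the divisor $n$ (the plain average of squared distances described in the text). Some texts use the divisor $n-1$ instead, and the variable names here are illustrative only.

```python
import numpy as np

# Book-sales data: column 1 = sales in dollars, column 2 = number of books sold.
X = np.array([[42, 4], [52, 2], [48, 8], [63, 3]], dtype=float)

# Sample mean of each variable (column-wise): a measure of location.
means = X.mean(axis=0)

# Average squared distance from the mean (divisor n): a measure of spread.
variances = ((X - means) ** 2).mean(axis=0)

print(means)      # sales: 51.25, books: 4.25
print(variances)  # sales: 58.6875, books: 5.1875
```

The same variances are returned by `np.var(X, axis=0)`, whose default divisor is $n$. Passing `ddof=1` switches to the divisor $n-1$.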

Muhammad Imdad Ullah

Currently working as Assistant Professor of Statistics at Ghazi University, Dera Ghazi Khan. Completed my Ph.D. in Statistics at the Department of Statistics, Bahauddin Zakariya University, Multan, Pakistan. I like Applied Statistics, Mathematics, and Statistical Computing. Statistical and mathematical software used: SAS, STATA, Python, GRETL, EVIEWS, R, SPSS, and VBA in MS-Excel. I like to use LaTeX for typesetting articles, theses, etc.
