Properties of Correlation Coefficient (2024)

The coefficient of correlation is a statistic used to measure the strength and direction of the linear relationship between two quantitative variables.

Properties of Correlation Coefficient

The following are some important properties of the correlation coefficient.

  • The $r$ is symmetric with respect to $X$ and $Y$, i.e., $r_{XY} = r_{YX}$.
  • The $r$ ranges from $-1$ to $+1$, i.e., $-1\le r \le +1$.
  • There is no unit of $r$. The correlation coefficient $r$ is independent of the unit of measurement.
  • It is not affected by a change of origin and scale, i.e., $r_{XY}=r_{uv}$, where $u=\frac{X-a}{h}$ and $v=\frac{Y-b}{K}$ with $h, K > 0$. If a constant is added to each value of a variable, it is called a change of origin, and if each value of a variable is multiplied by a constant, it is called a change of scale.
  • The $r$ is the geometric mean of the two regression coefficients, i.e., $r=\pm\sqrt{b_{YX}\times b_{XY}}$, where $r$ takes the common sign of the two regression coefficients.
    In other words, if the two regression lines of $Y$ on $X$ and $X$ on $Y$ are written as $Y=a+bX$ and $X=c+dY$ respectively, then $bd=r^2$.
  • The signs of $r_{XY}$, $b_{YX}$, and $b_{XY}$ depend on the covariance, which is common to all three, as given below:
    $r_{XY}=\frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}},\,\, b_{YX} = \frac{Cov(X, Y)}{Var(X)}, \,\, b_{XY}=\frac{Cov(X, Y)}{Var(Y)}$

Hence, $r_{XY}$, $b_{YX}$, and $b_{XY}$ have the same sign.

  • If $r=-1$ the correlation is perfectly negative.
  • If $r=+1$ the correlation is perfectly positive.
  • If $r=0$ there is no correlation, i.e., there is no linear relationship between the variables. However, $r=0$ does not necessarily mean that the variables are independent, because a non-linear relationship may still exist.
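
The properties listed above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data values and the helper names `pearson_r` and `slope` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data, used only to illustrate the properties.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 3.0, 4.0, 8.0, 9.0])


def pearson_r(x, y):
    """Correlation coefficient computed from deviations about the means."""
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


def slope(response, predictor):
    """Least-squares slope of `response` on `predictor`: Cov / Var(predictor)."""
    dp, dr = predictor - predictor.mean(), response - response.mean()
    return (dp * dr).sum() / (dp ** 2).sum()


r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)
b_yx = slope(y, x)  # regression coefficient of Y on X
b_xy = slope(x, y)  # regression coefficient of X on Y

symmetric = np.isclose(r_xy, r_yx)                 # r_XY = r_YX
in_range = -1.0 <= r_xy <= 1.0                     # -1 <= r <= +1
geometric_mean = np.isclose(r_xy ** 2, b_yx * b_xy)  # r^2 = b_YX * b_XY
same_sign = np.sign(r_xy) == np.sign(b_yx) == np.sign(b_xy)

print(symmetric, in_range, geometric_mean, same_sign)
```

Each flag should evaluate to true for any data set in which both variables actually vary.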

Theorem (Correlation Is Independent of Origin and Scale): Show that the correlation coefficient is independent of origin and scale, i.e., $r_{XY}=r_{uv}$, where $u=\frac{X-a}{h}$ and $v=\frac{Y-b}{K}$ for constants $a$, $b$ and $h, K > 0$.

Proof: The formula for correlation coefficient is,

$$r_{XY}=\frac{\varSigma(X-\overline{X})(Y-\overline{Y}) }{\sqrt{[\varSigma(X-\overline{X})^2][\varSigma(Y-\overline{Y})^2]}}$$

\begin{align*}
\text{Let}\quad u&=\frac{X-a}{h}, \quad h>0\\
\Rightarrow X&=a+hu \Rightarrow \overline{X}=a+h\overline{u} \\
\text{and}\quad v&=\frac{Y-b}{K}, \quad K>0\\
\Rightarrow Y&=b+Kv \Rightarrow \overline{Y}=b+K\overline{v}\\
\text{Therefore}\\
r_{XY}&=\frac{\varSigma(X-\overline{X})(Y-\overline{Y}) }{\sqrt{[\varSigma(X-\overline{X})^2][\varSigma(Y-\overline{Y})^2]}}\\
&=\frac{\varSigma (a+hu-a-h\overline{u}) (b+Kv-b-K\overline{v})} {\sqrt{[\varSigma(a+hu-a-h\overline{u})^2][\varSigma(b+Kv-b-K\overline{v})^2]}}\\
&=\frac{\varSigma(hu-h\overline{u})(Kv-K\overline{v})}{\sqrt{[\varSigma(hu-h\overline{u})^2][\varSigma(Kv-K\overline{v})^2]}}\\
&=\frac{hK\varSigma(u-\overline{u})(v-\overline{v})}{\sqrt{h^2 K^2 [\varSigma(u-\overline{u})^2] [\varSigma(v-\overline{v})^2]}}\\
&=\frac{hK\varSigma(u-\overline{u})(v-\overline{v})}{hK\,\sqrt{[\varSigma(u-\overline{u})^2] [\varSigma(v-\overline{v})^2]}}\\
&=\frac{\varSigma(u-\overline{u})(v-\overline{v}) }{\sqrt{[\varSigma(u-\overline{u})^2][\varSigma(v-\overline{v})^2]}}=
r_{uv}
\end{align*}
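
The result of the proof can be illustrated numerically. This is a small sketch, assuming NumPy; the data and the constants `a`, `h`, `b`, `k` are hypothetical. It also shows what happens when one scale constant is negative, a case that the condition of positive scale constants excludes.

```python
import numpy as np

# Hypothetical data and constants, chosen only for illustration.
x = np.array([10.0, 12.0, 15.0, 19.0, 24.0])
y = np.array([40.0, 38.0, 45.0, 50.0, 58.0])

a, h = 15.0, 5.0  # change of origin and (positive) scale for X
b, k = 40.0, 2.0  # change of origin and (positive) scale for Y

u = (x - a) / h
v = (y - b) / k

r_xy = np.corrcoef(x, y)[0, 1]
r_uv = np.corrcoef(u, v)[0, 1]
unchanged = np.isclose(r_xy, r_uv)  # r_XY = r_uv when h, k > 0

# With a negative scale constant on one variable, the magnitude is kept
# but the sign of r is reversed.
u_neg = (x - a) / (-h)
r_flipped = np.corrcoef(u_neg, v)[0, 1]
sign_reversed = np.isclose(r_flipped, -r_xy)

print(unchanged, sign_reversed)
```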

Note that

  1. Non-causality: Correlation does not imply causation. If two variables are strongly correlated, it does not necessarily mean that changes in one variable cause changes in the other. This is because the correlation only measures the strength and direction of the linear relationship between two quantitative variables, not the underlying cause-and-effect relationship.
  2. Sensitive to Outliers: The correlation coefficient can be sensitive to outliers, as outliers can disproportionately influence the correlation calculation.
  3. Assumption of Linearity: The correlation coefficient measures the linear relationship between variables. It may not accurately capture non-linear relationships between variables.
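
Notes 2 and 3 can be made concrete with a short sketch, assuming NumPy. All data values below are hypothetical: the first pair has an exact quadratic relationship, and the second shows how one extreme point can change the coefficient.

```python
import numpy as np

# An exact non-linear (quadratic) relationship on symmetric x values:
# y is fully determined by x, yet the linear correlation is zero.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = x ** 2
r_parabola = np.corrcoef(x, y)[0, 1]

# A perfectly negative linear relationship ...
x_clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
r_clean = np.corrcoef(x_clean, y_clean)[0, 1]

# ... and the same data with a single added outlier at (20, 20).
x_out = np.append(x_clean, 20.0)
y_out = np.append(y_clean, 20.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(round(r_parabola, 3), round(r_clean, 3), round(r_outlier, 3))
```

In this illustration the single outlier turns a perfectly negative correlation into a strongly positive one, which is why a scatter plot is worth inspecting before interpreting $r$.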

Layout of the Factorial Design: Two Factor $2^2$ (2024)

The layout of a factorial design is typically organized in a table format. Each row of the table represents an experimental run, while each column represents a factor or the response variable. The levels of factors are indicated by symbols such as + and – for high and low levels, respectively. The response variable values corresponding to each experimental condition are recorded in the form of a sign table.

Consider a simple example layout for a two-factor factorial design with factors $A$ and $B$.

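| Run | Factor A | Factor B | Response |
|-----|----------|----------|----------|
| 1 | $-$ | $-$ | $Y_1$ |
| 2 | + | $-$ | $Y_2$ |
| 3 | $-$ | + | $Y_3$ |
| 4 | + | + | $Y_4$ |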
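
The run order of such a layout can be generated programmatically. The sketch below uses only the Python standard library; the names `factors`, `levels`, and `runs` are hypothetical, and the order is arranged so that Factor A alternates fastest (run 2 has $A$ high and $B$ low).

```python
from itertools import product

factors = ["A", "B"]
levels = ["-", "+"]  # low and high level of each factor

# itertools.product varies the last position fastest, so each combination
# is reversed to make the first factor (A) alternate fastest.
runs = [combo[::-1] for combo in product(levels, repeat=len(factors))]

print("Run", *factors, "Response")
for run, combo in enumerate(runs, start=1):
    print(run, *combo, f"Y_{run}")
```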

Layout of the Factorial Design: Two Factor in $n$ Replicates

Consider there are two factors and each factor has two levels in $n$ replicates. The layout of the factorial design will be as described below for $n$ replicates.

Layout for the factorial design Two Factor Two Level

$y_{111}$ is the response with the first factor at the low level, the second factor at the low level, in the first replicate of the trial. Similarly, $y_{112}$ is the response in the second replicate of the same trial, and so on up to the $n$th observation, which is the $n$th replicate at the same levels of $A$ and $B$.

Geometrical Structure of Two-Factor Factorial Design

The geometrical structure shows two factors ($A$ and $B$), each with two levels: low ($-$) and high (+). Response 1 is at the low level of $A$ and the low level of $B$; response 2 is at the high level of $A$ and the low level of $B$; response 3 is at the low level of $A$ and the high level of $B$; and response 4 is at the high level of $A$ and the high level of $B$.

Geometrical Structure of two Factor Layout of Factorial Experiment

Real Life Example

Consider an experiment in which the concentration of the reactant and the amount of the catalyst are the two factors affecting some response; the experiment has three replicates.

Layout of Two Factors Real Life Example

Geometrical Structure of the Example

Factor Effects

\begin{align}
A &=\frac{(a+ab)-((I) +b)}{2} = \frac{100+90-80-60}{2} = 25\\
B &= \frac{(b+ab) - ((I) +a) }{2} = \frac{60+90-80-100}{2} = -15\\
AB&=\frac{((I)+ab)-(a+b)}{2} = \frac{80+90-100-60}{2}=5
\end{align}

The effect of $B$ is $-15$: changing factor $B$ from the low level to the high level decreases the response by 15 units on average.
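
The three effects can be reproduced from a sign table. This is a minimal sketch in plain Python; the dictionary names and the `effect` helper are hypothetical. The divisor of 2 follows the formulas above and assumes the four values are the responses of the four treatment combinations; if they were totals over $n$ replicates, the divisor would be $2n$ instead.

```python
# Responses for the four treatment combinations, as used in the formulas above.
responses = {"(I)": 80, "a": 100, "b": 60, "ab": 90}

# Sign table: +1 where the combination enters the contrast positively,
# -1 where it enters negatively.
signs = {
    "A":  {"(I)": -1, "a": +1, "b": -1, "ab": +1},
    "B":  {"(I)": -1, "a": -1, "b": +1, "ab": +1},
    "AB": {"(I)": +1, "a": -1, "b": -1, "ab": +1},
}


def effect(name, divisor=2):
    """Contrast for the named effect divided by `divisor` (assumed to be 2 here)."""
    contrast = sum(signs[name][t] * responses[t] for t in responses)
    return contrast / divisor


effects = {name: effect(name) for name in signs}
print(effects)  # {'A': 25.0, 'B': -15.0, 'AB': 5.0}
```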

Reference

Montgomery, D. C. (2017). Design and Analysis of Experiments. 9th ed, John Wiley & Sons.

MCQ Level of Measurement 13 (2024)

The post is about the MCQ Level of measurement and covers the concepts related to statistical data and variables. The understanding of these important concepts helps in understanding the important aspects of data from different fields of study and their statistical analysis.

The quiz MCQ Level of Measurement is designed to test your knowledge of data and variables in statistics.

Online MCQs about Statistics Data and Variables with Answers.

In the subject of statistics, data is collected, organized, presented, and analyzed, and interpretation is made to make wise and intelligent decisions. The data is a collection of variables, whereas a variable is some kind of measure that can vary regarding time, person/object, place, etc. Let us start with the MCQ level of measurement quiz.

In statistics, data can be classified based on the level of measurement, which refers to the nature of the information captured by the data. There are four main levels of measurement: (i) nominal, (ii) ordinal, (iii) interval, and (iv) ratio.

Level of Measurement

Nominal Level

Characteristics: Categories or labels without any inherent order.
Examples: Gender (male, female), colors, types of fruits.

Ordinal Level

Characteristics: Categories with a meaningful order or rank, but the differences between the categories are not uniform.
Examples: Educational levels (high school, college, graduate), customer satisfaction ratings (poor, fair, good, excellent).

Interval Level

Characteristics: Categories with a meaningful order, and the differences between the categories are uniform, but there is no true zero point.
Examples: Temperature (measured in Celsius or Fahrenheit) and IQ scores.

Ratio Level

Characteristics: Categories with a meaningful order, uniform differences between categories, and a true zero point.
Examples: Height, weight, income, age.

Understanding the level of measurement is crucial because it determines the types of statistical analyses that can be performed on the data. Different statistical tests and methods are appropriate for each level, and using an inappropriate analysis may lead to incorrect conclusions.

In summary, nominal data involve categories without order, ordinal data have ordered categories with non-uniform differences, interval data have ordered categories with uniform differences but no true zero, and ratio data have ordered categories with uniform differences and a true zero point.

MCQ Level of Measurement

  • Temperature on a centigrade scale (no absolute zero point) is a measurement scale of:
  • Participants in an experiment are asked to wear headphones. Across a four-minute long interval, the experimenter presents audio clips of different instruments. Participants are asked to raise their hands every time they hear a new instrument. Their total score is the number of correct responses. The measurement scale is
  • Reporting the temperature of a summer day in the state of California in degrees Fahrenheit is a measurement scale of
  • Number of students in a stats class
  • Classifying elementary school children as nonreaders (0), starting readers (1), or advanced readers (2) to place each child in a reading group is
  • A researcher asks a random sample of freshmen to describe how they feel about their first year at a university. Research assistants use predetermined criteria to assign categories to each description given: Confident, Nervous, Fearful, and Insecure. What is the level of measurement used for the described phenomena?
  • Which of the following conditions are necessary to ensure data integrity?
  • ___________ is the process of changing data to make it more organized and easier to read.
  • As a data analyst, you are working for a national pizza restaurant chain. You have a dataset with monthly order totals for each branch over the past year. With only this data, what questions can you answer?
  • A data analyst is working on a project about the global supply chain. They have a dataset with lots of relevant data from Europe and Asia. However, they decided to generate new data that represents all continents. What type of insufficient data does this scenario describe?
  • If a data analyst is using data that has been _________, the data will lack integrity and the analysis will be faulty.
  • Which of the following conditions are necessary to ensure data integrity?
  • A financial analyst imports a dataset to their computer from a storage device. As it’s being imported, the connection is interrupted, which compromises the data. Which of the following processes caused the compromise?
  • What can jeopardize data integrity throughout its lifecycle?
  • The data in which we study Regions is called
  • A measurement scale in which values are categorized to represent qualitative differences and ranked in a meaningful manner is classified as
  • The measurement scale which allows the determination of differences in intervals is classified as
  • Data measurement which arises from a specific process of counting is classified as a
  • The collection of observations for all variables related to some research or findings is classified as
  • Examples of variables in statistical phenomena consist of

What is Factor Effects of $2^2$ Design (2024)

The smallest case of a $2^k$ factorial experiment is one in which two factors are of interest and each factor has two levels. This design is known as a $2^2$ factorial design. Here we are interested in the factor effects (the effects of the factors).

The levels of the factors (say $A$ and $B$) may be called low and high, or absence and presence.

In a factorial design, all possible combinations of the levels of the factors are investigated in each complete trial (or replicate) of the experiment. For example, if Factor $A$ has $a$ levels and Factor $B$ has $b$ levels, then each replicate contains all $ab$ treatment combinations. Two factors, each at 2 levels, are:

Factors Effects Factors at Level 2

Factor Effects (or Effect of Factors)

A change in the quantity of response due to the change in the level of a factor is called the effect of that factor. Here we mean average effect.

Main Effects

The main effect of a factor is defined as a measure of the average change in the response produced by changing the level of the factor. It is measured independently of the effects of the other factors, i.e., the main effect is the effect of that factor only. Main effects are sometimes regarded as interactions of zero order. Frequently, the main effects refer to the primary factors of interest in the experiment.

Interaction Effects

Factors are said to interact when they are not independent. Interaction in a factorial experiment is a measure of the extent to which the effect of changing the levels of one or more factors depends on the levels of the other factors. Interactions between two factors are referred to as first-order interactions, those concerning three factors, as second-order interactions, and so on.

Example: Consider a two-factor factorial experiment that investigates the effect of the concentration of the reactant (Factor $A$) and the presence of a catalyst (Factor $B$) on the reaction time of a chemical process.

Factor Effects

Solution of Example

Main Effects

\begin{align}
\text{Main effect of A} & = \text{Average response at high level of $A$} - \text{Average response at low level of $A$}\\
&=\frac{45+60}{2}-\frac{20+35}{2}=25
\end{align}

The result indicates that increasing Factor $A$ from the low level to the high level causes an average response increase of 25 units.

\begin{align}
\text{Main effect of B}&=\text{Average response at high level of $B$} -\text{ Average response at low level of $B$}\\
&=\frac{35+60}{2}-\frac{20+45}{2}=15
\end{align}

Increasing Factor B from the low level to the high level causes an average response increase of 15 units.
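
The two main effects can be computed in a few lines. This is a sketch in plain Python: the four responses are read off the two calculations above, while the dictionary layout (keys are the levels of $A$ and $B$) and the helper `main_effect` are hypothetical.

```python
# Response at each (level of A, level of B) combination, taken from the
# main-effect calculations above.
y = {("-", "-"): 20, ("+", "-"): 45, ("-", "+"): 35, ("+", "+"): 60}


def main_effect(y, position):
    """Average response at the high level minus average at the low level.

    `position` is 0 for Factor A and 1 for Factor B.
    """
    high = [value for key, value in y.items() if key[position] == "+"]
    low = [value for key, value in y.items() if key[position] == "-"]
    return sum(high) / len(high) - sum(low) / len(low)


effect_a = main_effect(y, 0)
effect_b = main_effect(y, 1)
print(effect_a, effect_b)  # 25.0 15.0
```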

Effect of AB Interaction

If the difference in response between the levels of one factor is not the same at all levels of the other factor(s), there is an interaction between the factors. Consider

Factor Effects with Interaction

\begin{align}
\text{The effect of Factor $A$ (at low level of Factor $B$)} &= 50 - 20 = 30\\
\text{The effect of Factor $A$ (at high level of Factor $B$)}&= 15 - 40 = -25\\
\text{The effect of Factor $B$ (at low level of Factor $A$)} &= 40 - 20 = 20\\
\text{The effect of Factor $B$ (at high level of Factor $A$)} &= 15 - 50 = -35
\end{align}

Because the effect of Factor $A$ depends on the level chosen for Factor $B$, there is an interaction between $A$ and $B$. One can compute the effect of the $AB$ interaction as described below:

Effect of $AB$ interaction = average difference between the effect of $A$ at the high level of $B$ and the effect of $A$ at the low level of $B$.

The magnitude of the interaction effect is the average difference in these two $A$ effects, or $AB=\frac{-25-30}{2}=\frac{-55}{2}=-27.5$.

OR

Effect of $AB$ interaction = average difference between the effect of $B$ at the high level of $A$ and the effect of $B$ at the low level of $A$.

The magnitude of the interaction effect is the average difference in these two $B$ effects, or $AB = \frac{-35-20}{2} = \frac{-55}{2}=-27.5$.

The interaction is large in this experiment.
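
Both routes to the interaction effect can be checked with a short sketch in plain Python. The four responses are read off the simple-effect calculations above; the dictionary layout (keys are the levels of $A$ and $B$) and the variable names are hypothetical.

```python
# Response at each (level of A, level of B) combination, taken from the
# simple-effect calculations above.
y = {("-", "-"): 20, ("+", "-"): 50, ("-", "+"): 40, ("+", "+"): 15}

# Simple effects of A at each level of B.
a_at_low_b = y[("+", "-")] - y[("-", "-")]   # 50 - 20
a_at_high_b = y[("+", "+")] - y[("-", "+")]  # 15 - 40

# Simple effects of B at each level of A.
b_at_low_a = y[("-", "+")] - y[("-", "-")]   # 40 - 20
b_at_high_a = y[("+", "+")] - y[("+", "-")]  # 15 - 50

# The interaction effect is the same whichever factor is used.
ab_from_a = (a_at_high_b - a_at_low_b) / 2
ab_from_b = (b_at_high_a - b_at_low_a) / 2
print(ab_from_a, ab_from_b)  # -27.5 -27.5
```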
