Multicollinearity in Regression Models

This post is about multicollinearity in regression models: what it means, why it matters, and how it can be detected.

The objective of multiple regression analysis is to estimate the relationship of the dependent variable to the individual regressors (a dependency), not the relationships among the regressors themselves (an interdependency). It is assumed that the dependent variable $y$ and the regressors $X$ are linearly related to each other (Graybill, 1980; Johnston, 1963; Malinvaud, 1968). Therefore, the inferences drawn from any regression model are to:

(i) identify the relative influence of the regressors,
(ii) predict and/or estimate the response, and
(iii) select an appropriate set of regressors for the model.


Among these, one purpose of the regression model is to ascertain to what extent the dependent variable can be predicted from the regressors in the model. To draw suitable inferences, however, the regressors should be orthogonal, i.e., there should be no linear dependencies among them. In most applications of regression analysis the regressors are not orthogonal, which leads to misleading and erroneous inferences, especially when the regressors are perfectly or nearly perfectly collinear with each other.

This condition of non-orthogonality is also referred to as the problem of multicollinearity, or of collinear data (see, for example, Gunst and Mason, 1977; Mason et al., 1975; Frisch, 1934). Multicollinearity is also synonymous with ill-conditioning of the $X'X$ matrix.

The presence of interdependence, or the lack of independence, is signified by high-order inter-correlations ($R = X'X$) within a set of regressors (Dorsett et al., 1983; Farrar and Glauber, 1967; Gunst and Mason, 1977; Mason et al., 1975). Perfect multicollinearity is a pathological extreme; it can easily be detected and resolved by dropping one of the regressors causing it (Belsley et al., 1980). In the case of perfect multicollinearity, the regression coefficients remain indeterminate and their standard errors are infinite. Similarly, perfectly collinear regressors destroy the uniqueness of the least squares estimators (Belsley et al., 1980; Belsley, 1991).
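This behaviour is easy to reproduce. Below is a minimal sketch in base R with simulated data (the variable names and the multiplier 2 are purely illustrative):

```r
# Perfect collinearity: x2 is an exact linear function of x1, so X'X is
# singular and the least squares coefficients are not uniquely determined.
set.seed(1)
x1 <- rnorm(20)
x2 <- 2 * x1                 # exact linear dependence
y  <- 1 + x1 + rnorm(20)

X <- cbind(1, x1, x2)
det(t(X) %*% X)              # (numerically) zero: X'X cannot be inverted

coef(lm(y ~ x1 + x2))        # lm() returns NA for the aliased regressor x2
```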

When many explanatory variables (regressors/predictors) are highly collinear, it becomes very difficult to infer the separate influence of each collinear regressor on the response variable ($y$); estimation of the regression coefficients becomes difficult because each coefficient measures the effect of the corresponding regressor while holding all other regressors constant. Near (not perfect) multicollinearity is extremely hard to detect (Chatterjee and Hadi, 2006), as it is not a specification or modeling error but a condition of deficient data (Hadi and Chatterjee, 1988). On the other hand, the existence of multicollinearity has no impact on the overall regression model and its associated statistics such as $R^2$, the $F$-ratio, and the $p$-value.

Multicollinearity also does not lessen the predictive power or reliability of the regression model as a whole; it only affects the estimates for the individual regressors (Koutsoyiannis, 1977). Note that multicollinearity refers only to linear relationships among the regressors; it does not rule out nonlinear relationships among them. To draw suitable inferences from the model, the existence of (multi)collinearity should always be tested as an initial step when examining a data set for multiple regression analysis. On the other hand, high collinearity is rare, but some degree of collinearity always exists.


A distinction between collinearity and multicollinearity should be made. Strictly speaking, multicollinearity usually refers to the existence of more than one exact linear relationship among regressors, while collinearity refers to the existence of a single linear relationship. Nowadays, however, multicollinearity is used to refer to both cases.

There are many methods for detecting/testing (multi)collinearity among regressors. However, remedies that simply drop variables can destroy the usefulness of the model, since relevant regressor(s) may be removed. Note that if there are only two regressors, the pairwise correlation is sufficient to detect collinearity. To check the severity of the collinearity problem, the VIF/TOL, eigenvalues, or other diagnostic measures can be used, as sketched below.
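As a concrete illustration, the following minimal sketch computes the VIF and TOL from their definitions in base R, using the built-in mtcars data (the choice of regressors is illustrative only):

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
# the j-th regressor on all the other regressors; TOL_j = 1 / VIF_j.
X <- mtcars[, c("disp", "hp", "wt")]

vif <- sapply(names(X), function(j) {
  r2 <- summary(lm(reformulate(setdiff(names(X), j), response = j),
                   data = X))$r.squared
  1 / (1 - r2)
})
vif        # values above about 10 are commonly read as severe collinearity
1 / vif    # the corresponding tolerances (TOL)

# Eigenvalues of the regressors' correlation matrix: values near zero
# signal near-linear dependencies among the regressors.
eigen(cor(X))$values
```

The mctest R package listed in the references collects many such individual and overall collinearity diagnostics in one place.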

For further details about “Multicollinearity in Regression Models” see:

  • Belsley, D. A. (1991). A Guide to Using the Collinearity Diagnostics. Computer Science in Economics and Management, 4(1), 33–50.
  • Belsley, D., Kuh, E., and Welsch, R. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York. Chap. 3.
  • Chatterjee, S. and Hadi, A. S. (2006). Regression Analysis by Example. John Wiley & Sons, 4th edition.
  • Dorsett, D., Gunst, R. F., and Gartland, E. C. J. (1983). Multicollinear Effects of Weighted Least Squares Regression. Statistics & Probability Letters, 1(4), 207–211.
  • Farrar, D. E. and Glauber, R. R. (1967). Multicollinearity in Regression Analysis: The Problem Revisited. The Review of Economics and Statistics, 49(1), 92–107.
  • Frisch, R. (1934). Statistical Confluence Analysis by Means of Complete Regression Systems. Universitetets Økonomiske Institutt, Publ. No. 5.
  • Graybill, F. (1980). An Introduction to Linear Statistical Models. McGraw-Hill.
  • Gunst, R. and Mason, R. (1977). Advantages of Examining Multicollinearities in Regression Analysis. Biometrics, 33, 249–260.
  • Hadi, A. and Chatterjee, S. (1988). Sensitivity Analysis in Linear Regression. John Wiley & Sons.
  • Imdadullah, M., Aslam, M., and Altaf, S. (2016). mctest: An R Package for Detection of Collinearity among Regressors. The R Journal, 8(2).
  • Johnston, J. (1963). Econometric Methods. McGraw-Hill, New York.
  • Koutsoyiannis, A. (1977). Theory of Econometrics. Macmillan Education Limited.
  • Malinvaud, E. (1968). Statistical Methods of Econometrics. North-Holland, Amsterdam. pp. 187–192.
  • Mason, R., Gunst, R., and Webster, J. (1975). Regression Analysis and Problems of Multicollinearity. Communications in Statistics, 4(3), 277–292.

Learn about Data analysis of Statistical Models in R

Introduction to Algebra (2021)

This post is an introduction to algebra.

The basics of algebra include numbers, variables, constants, expressions, equations, linear equations, and quadratic equations. Further, it involves the basic arithmetic operations of addition, subtraction, multiplication, and division within algebraic expressions.


In arithmetic we work with numbers, while in algebra we use numbers as well as letters of the alphabet, such as $A, B, C, a, b$, and $c$, to stand for any numerical values we choose. We can say that algebra is an extension of arithmetic. For example, the arithmetic sum $5+3=8$ means that the sum of the numbers 5 and 3 is 8. In algebra, the same sum can be written as $x+y=z$, a general form that can be used to add any two numbers. For example, if $x=5$ and $y=3$, then $x+y$ corresponds to the left-hand side ($5+3$) and its value corresponds to the right-hand side $z$, which is 8.
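The same idea carries over directly to programming. A minimal sketch in R:

```r
# A variable names a value, so one expression covers any choice of numbers.
x <- 5
y <- 3
z <- x + y
z   # 8, matching the arithmetic sum 5 + 3
```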

In algebra, all the arithmetic operators such as $+$, $-$, $\times$, $=$, and $\div$ can be used.

For example, $x-y=z$ means that the difference between two numbers is equal to the number represented by $z$. Many other notations used in algebra are the same as in arithmetic. For example,

$c=a\times b$ means that the product of two numbers represented by $a$ and $b$ is equal to the number $c$.

$x \times x \times x$ can be written as $x^3$.


From the above discussion, note that letters of the alphabet represent variables, and arithmetic operators ($+$, $-$, etc.) represent mathematical operations on variables. A combination of numbers and letters of the alphabet is called an algebraic expression. For example, $8x + 7y$, $x+y$, and $7x^2+2xy-5y^2$ are examples of algebraic expressions.

Some important points to remember:

  • Algebra is like a toolbox for solving mathematics problems with unknowns. Instead of using specific numbers, we use letters like $x$, $y$, and $z$ to represent unknown values. These letters are called variables.
  • A variable is a quantity (usually denoted by letters of the alphabet) in algebraic expressions and equations that changes from place to place, person to person, and/or time to time. A variable can take any one of a range of possible values.
  • A coefficient is a factor that multiplies a variable. For example, in $2x^3+3x=0$, $x$ is the variable, 2 is the coefficient of $x^3$, and 3 is the coefficient of $x$.

When learning algebra, the following concepts are very important to understand:

  • Variables are the building blocks, representing unknown numbers.
  • Expressions are combinations of variables, numbers, and mathematical operations (such as +, -, *, /) that do not necessarily have an equal sign (=).
  • Equations are statements with an equal sign showing that two expressions are equivalent. Equations are solved to find the value of the variable (a numerical sketch follows this list).
  • Inequalities are statements (or expressions) having “greater than” (>), “less than” (<), or “not equal to” (≠) symbols for making comparisons between expressions.
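As a small illustration of solving an equation, the following sketch applies base R's uniroot() to the invented equation $2x + 3 = 11$:

```r
# uniroot() finds where f(x) = 2x + 3 - 11 crosses zero, i.e., the value
# of x that satisfies the equation 2x + 3 = 11.
f <- function(x) 2 * x + 3 - 11
uniroot(f, interval = c(-10, 10))$root   # 4 (up to numerical tolerance)
```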

Real Life Applications of Algebra

  • Finances and Budgeting: Algebra helps you create formulas to track income, expenses, savings goals, and loan payments. You can set up equations to see how much extra money you’d have if you cut certain expenses or how long it’ll take to save for a down payment on a house.
  • Mixing and Ratios: Whether you’re baking a cake or mixing paint colors, algebra helps you determine the correct ratios of ingredients to achieve the desired outcome. You can set up proportions to find out how much water to add to a paint concentrate or how much flour you need to double a recipe.
  • Motion and Physics: From calculating travel time based on speed and distance to understanding the trajectory of a thrown ball, algebra forms the foundation for many physics concepts. You can use formulas to figure out how long it’ll take to drive somewhere at a certain speed or the angle needed to throw a basketball into the hoop.
  • DIY Projects and Home Improvement: From measuring lumber for a bookshelf to calculating the amount of paint needed for a room, algebra helps with planning and executing home improvement tasks. You can use formulas to find the area of a wall to determine how much paint to buy or calculate the volume of wood needed for a project.
  • Scientific Research and Data Analysis: Algebra is the backbone of many scientific formulas and equations used in research. It helps analyze data, identify trends, and make predictions.

Learn more about Mathematical Expressions

Learn R Programming Language

Levels of Measurement (2021): A Comprehensive Tutorial

Levels of Measurement (Scale of Measure)

The levels of measurement (scales of measure) are classified into four categories. It is important to understand these measurement levels since they play an important part in determining the arithmetic operations and the statistical tests that can be carried out on the data. A scale of measure is a classification that describes the nature of the information carried by the numbers assigned to a variable. In simple words, the level of measurement determines how data should be summarized and presented.

It also indicates the type of statistical analysis that can be performed. The four levels of measurement are described below:

Nominal Level of Measurement (Nominal Scale)

At the nominal level of measurement, the numbers are used to classify the data (unordered group) into mutually exclusive categories. In other words, for the nominal level of measurement, observations of a qualitative variable are measured and recorded as labels or names.

Ordinal Level of Measurement (Ordinal Scale)

In the ordinal level of measurement, the numbers are used to classify the data into mutually exclusive, ordered categories; however, the numbers do not convey the relative degree of difference between the categories. In other words, for the ordinal level of measurement, observations of a qualitative variable are either ranked or rated on a relative scale and recorded as labels or names.

Interval Level of Measurement (Interval Scale)

For data recorded at the interval level of measurement, the interval or distance between values is meaningful. The interval scale is based on a scale with a known unit of measurement, but zero on the scale does not indicate the absence of the quantity (for example, $0^{\circ}$C does not mean no temperature).

Ratio Level of Measurement (Ratio Scale)

Data recorded at the ratio level of measurement are based on a scale with a known unit of measurement and a meaningful interpretation of zero on the scale. Almost all quantitative variables are recorded on the ratio level of measurement.


Examples of Levels of Measurement

Examples of Nominal Level of Measurement

  • Religion (Muslim, Hindu, Christian, Buddhist)
  • Race (Hispanic, African, Asian)
  • Language (Urdu, English, French, Punjabi, Arabic)
  • Gender (Male, Female)
  • Marital Status (Married, Single, Divorced)
  • Number plates on Cars/ Models of Cars (Toyota, Mehran)
  • Parts of Speech (Noun, Verb, Article, Pronoun)

Examples of Ordinal Level of Measurement

  • Rankings (1st, 2nd, 3rd)
  • Marks Grades (A, B, C, D)
  • Evaluations such as High, Medium, and Low
  • Educational level (Elementary School, High School, College, University)
  • Movie Ratings (1 star, 2 stars, 3 stars, 4 stars, 5 stars)
  • Pain Ratings (more, less, no)
  • Cancer Stages (Stage 1, Stage 2, Stage 3)
  • Hypertension Categories (Mild, Moderate, Severe)

Examples of Interval Level of Measurement

  • Temperature with Celsius scale/ Fahrenheit scale
  • Level of happiness rated from 1 to 10
  • Education (in years)
  • Standardized tests of psychological, sociological, and educational discipline use interval scales.
  • SAT scores

Examples of Ratio Level of Measurement

  • Height
  • Weight
  • Age
  • Length
  • Volume
  • Number of home computers
  • Salary

In essence, levels of measurement act like a roadmap for statistical analysis. They guide us in selecting the most appropriate methods to extract valuable insights from the data under study. The levels of measurement are very important because they help us in:

  • Choosing the right statistical tools: Different statistical methods require different levels of measurement. For example, one can compute measures of central tendency such as the mean and median for data on income (interval/ratio level), but not for data on favorite color (nominal level), for which only the mode can be computed (see the sketch after this list).
  • Drawing valid conclusions: If a statistical test is misused because of a misunderstanding of the measurement level of the data, the conclusions may be misleading or even nonsensical. Measurement levels therefore help ensure that an analysis reflects the actual characteristics of the data.
  • Making meaningful comparisons: Levels of measurement also allow us to compare data points appropriately. For instance, one can say that someone is 2 years older than another person (age is ratio-level data, so differences are meaningful), but one cannot say that their preference for chocolate ice cream is twice as strong (nominal data).
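A minimal sketch in R with made-up data makes the first bullet concrete:

```r
# Which summaries are permitted depends on the level of measurement.
color  <- factor(c("red", "blue", "red", "green", "red"))           # nominal
rating <- ordered(c("low", "high", "medium", "high", "medium"),
                  levels = c("low", "medium", "high"))              # ordinal
income <- c(42000, 55000, 61000, 38000, 47000)                      # ratio

names(which.max(table(color)))               # mode: the only "average" for nominal data
levels(rating)[median(as.integer(rating))]   # median category for ordinal data
mean(income); median(income)                 # mean and median both meaningful here
```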

FAQs about Levels of Measurement

  1. What do you mean by measurement levels?
  2. What is the role of Levels of Measurement in Statistics?
  3. Compare nominal, ordinal, ratio, and interval scales.
  4. What measures of central tendency can be performed on which measurement level?
  5. What is the importance of measurement levels?
  6. Give at least five examples of each measurement level.


Contingency Tables

Introduction to Contingency Tables

Contingency tables, also called cross tables or two-way frequency tables, describe the relationship between two or more categorical (qualitative) variables. A bivariate relationship is defined by the joint distribution of the two associated random variables.


Let $X$ and $Y$ be two categorical response variables, with $X$ having $I$ levels and $Y$ having $J$ levels. There are $I\times J$ possible combinations of classifications for the two variables. The response $(X, Y)$ of a subject randomly chosen from some population has a probability distribution, which can be displayed in a rectangular table with $I$ rows (the categories of $X$) and $J$ columns (the categories of $Y$).

The cells of this rectangular table represent the $IJ$ possible outcomes. The probability $\pi_{ij}$ denotes the probability that $(X, Y)$ falls in the cell in row $i$ and column $j$. When the cells contain frequency counts of outcomes, the table is called a contingency (or cross-classification) table, and it is referred to as an $I$ by $J$ ($I \times J$) table.
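A minimal sketch in R with invented data shows how such a table is formed:

```r
# table() builds an I x J contingency table of frequency counts from
# two categorical variables (here I = 2 and J = 3).
smoker   <- factor(c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))
exercise <- factor(c("low", "high", "medium", "low", "high",
                     "medium", "low", "low"))

tab <- table(smoker, exercise)
tab
```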

Joint and Marginal Distribution

The probability distribution $\{\pi_{ij}\}$ is the joint distribution of $X$ and $Y$. The marginal distributions are the row and column totals obtained by summing the joint probabilities. For the row variable $X$ the marginal probability is denoted by $\pi_{i+}$, and for the column variable $Y$ it is denoted by $\pi_{+j}$, where the subscript “+” denotes the sum over the index it replaces; that is, $\pi_{i+}=\sum_j \pi_{ij}$ and $\pi_{+j}=\sum_i \pi_{ij}$, satisfying

$\sum_{i} \pi_{i+} = \sum_{j} \pi_{+j} = \sum_i \sum_j \pi_{ij} = 1$

Note that the marginal distributions are single-variable information, and do not pertain to association linkages between the variables.
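Continuing the sketch above, the joint and marginal distributions can be estimated directly from the table:

```r
# Joint relative frequencies estimate pi_ij; the margins estimate
# pi_i+ (rows) and pi_+j (columns).
prop.table(tab)          # entries sum to 1
margin.table(tab, 1)     # row totals: marginal counts for smoker
margin.table(tab, 2)     # column totals: marginal counts for exercise
```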


In many contingency tables, one variable (say, $Y$) is a response, and the other ($X$) is an explanatory variable. When $X$ is fixed rather than random, the notion of a joint distribution for $X$ and $Y$ is no longer meaningful. However, for a fixed level of $X$, the variable $Y$ has a probability distribution, and it is germane to study how this distribution of $Y$ changes as the level of $X$ changes.
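Continuing the same sketch, the conditional distributions of $Y$ at each fixed level of $X$ are obtained by normalizing each row of the table:

```r
# Row-wise proportions estimate the conditional distribution of the
# column variable at each fixed level of the row variable.
prop.table(tab, margin = 1)   # each row sums to 1
```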

Contingency Table Uses

  • Identify relationships between categorical variables.
  • See whether one variable is independent of the other (i.e., whether the distribution of one variable is the same regardless of the other variable’s category).
  • Calculate probabilities of specific combinations occurring.
  • Often used as a stepping stone for further statistical analysis, such as the chi-square test, to determine whether the observed relationship between the variables is statistically significant (see the sketch below).
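Continuing the same sketch, a chi-squared test of independence between the two classifications:

```r
# chisq.test() compares the observed counts with those expected under
# independence; with such small counts it warns that the approximation
# may be inaccurate, so this is shown only for illustration.
chisq.test(tab)
```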

Read More about Contingency Tables



R Programming Language