
Inverse Regression Analysis

In most regression problems we have to determine the value of Y corresponding to a given value of x. We will consider the inverse problem, which is called inverse regression or calibration.

Assume we have known values of x and their corresponding Y values, which together follow a simple linear regression model. We also have an unknown value of x, say x_0, which cannot be measured directly, but we can observe the corresponding value of Y, say Y_0. Then x_0 can be estimated, and a confidence interval for x_0 can be obtained.

In regression analysis we want to investigate the relationship between variables. Regression has applications in many fields: engineering, economics, the physical and chemical sciences, management, the biological sciences and the social sciences. We only consider the simple linear regression model, which is a model with one regressor X that has a linear relationship with a response Y. It is not always easy to measure the regressor X or the response Y.

We now consider a typical example of this problem. If X is the concentration of glucose in certain substances, then a spectrophotometric method is used to measure the absorbance. This absorbance depends on the concentration X. The response Y is easy to measure with the spectrophotometric method, but the concentration is not. If we have n known concentrations, then the absorbance can be measured for each of them. If there is a linear relation between X and Y, then a simple linear regression model can be fitted to these data. Suppose we have an unknown concentration, which is difficult to measure, but we can measure its absorbance. Is it possible to estimate this concentration from the measured absorbance? This is called the calibration problem.

Suppose we have a linear model Y = \beta_0+ \beta_1X + \epsilon and we have an observed value of the response Y, but we do not have the corresponding value of X. How can we estimate this value of X? The two most important methods to estimate x are the classical method and the inverse method.

The classical method is based on the simple linear regression model

Y = \beta_0+ \beta_1 X + \epsilon    where  \epsilon \sim N(0,\sigma^2)

where the parameters \beta_0 and \beta_1 are estimated by least squares as \hat{\beta}_0 and \hat{\beta}_1. At least two of the n values of X have to be distinct; otherwise the slope cannot be estimated and no regression line can be fitted. For an unknown value of X, say x_0, a value Y_0 (or a random sample of k values of Y) is observed at x_0. The problem is to estimate x_0. The classical method uses the observed Y_0 (or the mean of the k observed values) and solves the fitted line for x, giving \hat{x}_0=\frac{Y_0-\hat{\beta}_0}{\hat{\beta}_1}.
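The classical estimator can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names fit_line and classical_estimate and the calibration data are hypothetical and do not come from the text.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        # All x values identical: the slope is undefined.
        raise ValueError("at least two distinct x values are required")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def classical_estimate(x, y, y0_values):
    """Classical calibration: x0_hat = (mean of observed Y_0 - b0) / b1."""
    b0, b1 = fit_line(x, y)
    y0_bar = sum(y0_values) / len(y0_values)
    return (y0_bar - b0) / b1


# Hypothetical calibration data: known concentrations and measured absorbances.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbance = [0.12, 0.21, 0.29, 0.41, 0.50]

# Two hypothetical absorbance readings (k = 2) taken at the unknown concentration.
x0_hat = classical_estimate(concentration, absorbance, [0.33, 0.35])
print(round(x0_hat, 3))
```

Passing a single-element list for y0_values covers the case of one observed Y_0.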

The inverse estimator is the simple linear regression of X on Y. In this case, we have to fit the model

    \[X=\alpha_0+\alpha_1 Y + \epsilon\]

where

    \[\epsilon \sim N(0,\sigma^2)\]

to obtain the estimator. Then the inverse estimator of x_0 is

    \[\hat{x}_0=\hat{\alpha}_0+\hat{\alpha}_1 Y_0\]
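A minimal sketch of the inverse estimator, assuming plain Python and hypothetical data: X is regressed on Y by least squares, and the fitted line is then evaluated at the observed Y_0. The function name inverse_estimate is an illustrative choice, not something defined in the text.

```python
def inverse_estimate(x, y, y0):
    """Regress X on Y by least squares, then predict x at the observed y0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_yy == 0:
        # All y values identical: the slope of X on Y is undefined.
        raise ValueError("at least two distinct y values are required")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = s_xy / s_yy           # estimated slope of X on Y
    a0 = x_bar - a1 * y_bar    # estimated intercept of X on Y
    return a0 + a1 * y0


# Hypothetical calibration data: known concentrations and measured absorbances.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbance = [0.12, 0.21, 0.29, 0.41, 0.50]
print(round(inverse_estimate(concentration, absorbance, 0.34), 3))
```

When the data lie exactly on a line, the classical and inverse estimates coincide; with noisy data they generally differ.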


Coefficient of Determination R^2

R^2 is a useful statistic for checking the quality of a regression fit. R^2 measures the proportion of the total variation about the mean \bar{Y} explained by the regression. R is the correlation between Y and \hat{Y} and is usually called the multiple correlation coefficient. R^2 can take values as high as 1 (100%) when all the X values are different. When repeat runs exist in the data, the value of R^2 cannot attain 1, no matter how well the model fits, because no model can explain the variation in the data due to pure error.

    \begin{eqnarray*} R^2 &=& \frac{\text{SS due to regression given}\, b_0}{\text{Total SS corrected for mean}\, \bar{Y}} \\ &=& \frac{SS \, (b_1 | b_0)}{S_{YY}} \\ &=& \frac{\sum(\hat{Y}_i-\bar{Y})^2}{\sum(Y_i-\bar{Y})^2} \\ &=& \frac{S^2_{XY}}{(S_{XX})(S_{YY})} \end{eqnarray*}

where the summations are over i=1, 2, \cdots, n.
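For simple linear regression, the equality of the sum-of-squares form and the S_{XY}^2 / (S_{XX} S_{YY}) form can be checked numerically. The following is a small sketch in plain Python with hypothetical data; the function name r_squared_two_ways is chosen only for this illustration.

```python
def r_squared_two_ways(x, y):
    """Return R^2 computed as SS_regression / S_YY and as S_XY^2 / (S_XX * S_YY)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares fit and fitted values.
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # Sum of squares due to regression, corrected for the mean.
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    return ss_reg / s_yy, s_xy ** 2 / (s_xx * s_yy)


# Hypothetical data.
from_ss, from_s = r_squared_two_ways([1, 2, 3, 4], [2, 4, 5, 8])
print(round(from_ss, 4), round(from_s, 4))
```

Both returned values agree up to floating-point rounding.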

 


Multiple Regression Analysis

In multiple regression, the unstandardized regression coefficient is interpreted as the predicted change in Y (i.e., the DV) given a one-unit change in X (i.e., the IV) while controlling for the other independent variables included in the equation.

  • The regression coefficient in multiple regression is called the partial regression coefficient because the effects of the other independent variables have been statistically removed or taken out (“partialled out”) of the relationship.
  • If the standardized partial regression coefficient is being used, the coefficients can be compared as a rough indicator of the relative importance of the independent variables (i.e., the coefficient with the largest absolute value is the most important variable, the second largest is the second most important, and so on).
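As an illustration of standardized partial regression coefficients, the sketch below handles the special case of exactly two independent variables, where the coefficients can be written in terms of the pairwise correlations: beta_1 = (r_y1 - r_y2 r_12) / (1 - r_12^2), and symmetrically for beta_2. It uses plain Python, and the function names and data are hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / sqrt(s_aa * s_bb)


def standardized_betas(x1, x2, y):
    """Standardized partial regression coefficients for two predictors."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    if denom == 0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical data.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 7, 17, 18, 27]
beta1, beta2 = standardized_betas(x1, x2, y)
print(round(beta1, 3), round(beta2, 3))
```

The coefficient with the larger absolute value would be read as the more important predictor in the sense described above.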

Simple Regression Analysis

The basic or unstandardized regression coefficient is interpreted as the predicted change in Y (i.e., the DV) given a one-unit change in X (i.e., the IV). It is expressed in units of the dependent variable per unit of the independent variable.

  • Note that there is another form of the regression coefficient that is important: the standardized regression coefficient. In simple regression, the standardized coefficient varies from –1.00 to +1.00, just like a simple correlation coefficient.
  • If the regression coefficient is in standardized units, then in simple regression the regression coefficient is the same thing as the correlation coefficient.
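The equivalence between the standardized slope and the correlation coefficient in simple regression can be checked numerically. This is a minimal sketch in plain Python with hypothetical data; the standardized slope is obtained by rescaling the unstandardized slope by s_x / s_y.

```python
from math import sqrt


def slope_and_correlation(x, y):
    """Return (unstandardized slope, standardized slope, Pearson r)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = s_xy / s_xx                    # change in Y per one-unit change in X
    beta_std = b1 * sqrt(s_xx / s_yy)   # b1 * s_x / s_y
    r = s_xy / sqrt(s_xx * s_yy)        # Pearson correlation
    return b1, beta_std, r


# Hypothetical data.
b1, beta_std, r = slope_and_correlation([1, 2, 3, 4], [2, 4, 5, 8])
print(b1, round(beta_std, 4), round(r, 4))
```

The last two values agree up to floating-point rounding, while the unstandardized slope depends on the units of X and Y.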

Pearson’s Correlation or Correlation Coefficient

Pearson’s correlation, also called the correlation coefficient or simply correlation, is used to find the degree of linear relationship between two continuous variables. The value of a correlation coefficient lies between –1.00 (perfect negative correlation) and +1.00 (perfect positive correlation), with 0.00 indicating no linear correlation. Generally, correlations above 0.80 in absolute value are considered high.

Remember:

  1. Correlation is the interdependence of continuous variables; it does not refer to any cause and effect.
  2. Correlation is used to determine the linear relationship between variables.
  3. Draw a scatter plot before calculating the correlation (to check the assumption of linearity).

Procedure in SPSS

The command for correlation is found at Analyze --> Correlate --> Bivariate...

The Bivariate Correlations dialog box will open:

[Figure: Correlation dialog box in SPSS]

Select one of the variables that you want to correlate in the left-hand pane of the Bivariate Correlations dialog box and move it into the Variables pane on the right-hand side by clicking the arrow button. Now click on the other variable that you want to correlate in the left-hand pane and move it into the Variables pane by clicking the arrow button.

[Figure: Bivariate correlation box]

Output

[Figure: Output from the correlation test]

The Correlations table in the output gives the values of the specified correlation tests, such as Pearson’s correlation. Each row of the table corresponds to one of the variables; similarly, each column corresponds to one of the variables.

Interpretation

In the example, the cell in the bottom row of the right column represents the correlation of depression with depression, which equals 1.0. Likewise, the cell in the middle row of the middle column represents the correlation of anxiety with anxiety, which also equals 1.0. In both cases this simply shows that each variable is perfectly correlated with itself.

The cell in the middle row of the right column (or the cell in the bottom row of the middle column) is more interesting. This cell represents the correlation of anxiety with depression (or depression with anxiety). There are three numbers in these cells.

  1. The top number is the correlation coefficient, which is 0.310.
  2. The middle number is the significance of this correlation, which is 0.018.
  3. The bottom number, 46, is the number of observations that were used to calculate the correlation coefficient between the variables of the study.

Note that the significance tells us how likely it is that a correlation this large would arise purely by chance when there is no actual relation between the variables. In this case, with a significance of 0.018, it is improbable that we would get an r (correlation coefficient) this big if there were no relation between the variables.
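Outside SPSS, the correlation coefficient and the test statistic behind the significance value can be sketched in plain Python. The significance itself is read from a t distribution with n - 2 degrees of freedom, which this sketch does not compute. The function names and the small data set are hypothetical; only r = 0.310 and n = 46 are taken from the output described above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """Test statistic for H0: no correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical scores for two variables.
anxiety = [12, 15, 9, 20, 17, 11]
depression = [10, 14, 11, 18, 13, 12]
print(round(pearson_r(anxiety, depression), 3))

# Test statistic for the values reported in the SPSS output (r = 0.310, n = 46).
print(round(t_statistic(0.310, 46), 2))
```

A larger absolute t value corresponds to a smaller significance value for the same number of observations.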

 

© 2012 itfeature.com Suffusion theme by Sayontan Sinha