# Bias (Statistical Bias)

Bias is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. The bias is therefore a measure of the systematic error of an estimator: it indicates how far the estimator's average value lies from the true value of the parameter. For example, if we compute the mean of a large number of estimates from an unbiased estimator, each obtained from a different sample, the result will be close to the true value of the parameter.

Gauss (1821), in his work on the least squares method, introduced the concept of an unbiased estimator.

The bias of an estimator should not be confused with its precision: precision is a measure of the sampling error, that is, how much the estimates vary from sample to sample.
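To illustrate the difference between bias and precision, here is a minimal Python simulation sketch. Everything in it is an illustrative assumption rather than part of the text: a normal population with true mean θ = 10 and standard deviation 3, samples of size 10, and two estimators of θ, the ordinary sample mean and a hypothetical shrunk estimator `0.8 * sample mean`. The average of many estimates minus θ approximates the bias (systematic error); the standard deviation of the estimates reflects the precision (sampling error).

```python
import random
import statistics

def summarize(estimator, theta=10.0, sigma=3.0, n=10, reps=20_000, seed=42):
    """Return (empirical bias, standard deviation) of an estimator over repeated samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = statistics.fmean(estimates) - theta  # systematic error
    spread = statistics.stdev(estimates)        # sampling error (precision)
    return bias, spread

def sample_mean(xs):
    return statistics.fmean(xs)

def shrunk_mean(xs):
    # Hypothetical estimator: pulls the sample mean toward zero.
    return 0.8 * statistics.fmean(xs)

results = {
    name: summarize(est)
    for name, est in [("sample mean", sample_mean), ("0.8 * sample mean", shrunk_mean)]
}
for name, (bias, spread) in results.items():
    print(f"{name:>18}: bias = {bias:+.3f}, std. dev. = {spread:.3f}")
```

With these settings the sample mean shows a bias near zero, while the shrunk estimator shows a bias near -2 but a smaller spread: it is more precise, yet systematically off.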

There are several types of bias, which should not be considered mutually exclusive:

• Selection Bias (arises from systematic differences between the groups compared)
• Exclusion Bias (arises from the systematic exclusion of certain individuals from the study)
• Analytical Bias (arises from the way the results are evaluated)

Mathematically, bias can be defined as follows:

Let a statistic $T$ be used to estimate a parameter $\theta$. If $E(T) = \theta + b(\theta)$, then $b(\theta)$ is called the bias of the statistic $T$, where $E(T)$ is the expected value of $T$. Note that if $b(\theta) = 0$, then $E(T) = \theta$, so $T$ is an unbiased estimator of $\theta$.
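A standard textbook example of a biased estimator (not taken from the original text) is the sample variance computed with divisor $n$, $S_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Its expected value is $E(S_n^2) = \frac{n-1}{n}\sigma^2$, so $b(\sigma^2) = -\sigma^2/n$, whereas dividing by $n - 1$ gives an unbiased estimator. The Python sketch below checks this by simulation; the normal population, $n = 5$ and $\sigma = 2$ (so the theoretical bias is -0.8) are arbitrary assumptions.

```python
import random

def simulate_bias(n=5, sigma=2.0, reps=100_000, seed=1):
    """Estimate the bias of the divisor-n and divisor-(n-1) variance estimators by simulation."""
    rng = random.Random(seed)  # seeded for reproducibility
    true_var = sigma ** 2
    total_div_n = 0.0
    total_div_n1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        total_div_n += ss / n
        total_div_n1 += ss / (n - 1)
    # Average estimate minus the true value approximates b(sigma^2) = E(T) - sigma^2.
    return total_div_n / reps - true_var, total_div_n1 / reps - true_var

bias_n, bias_n1 = simulate_bias()
print(f"divisor n:   {bias_n:+.3f}  (theory: -sigma**2 / n = -0.8)")
print(f"divisor n-1: {bias_n1:+.3f}  (theory: 0)")
```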

Reference:
Gauss, C.F. (1821, 1823, 1826). Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and suppl. Werke 4, 1-108.

# Difference between an outlier and an influential observation

Cases that do not follow the same model as the rest of the data are called outliers. In regression, cases with large residuals are candidates for outliers. An outlier is thus a data point that diverges from the overall pattern in a sample. An outlier can therefore influence the relationship between the variables and may also exert an influence on the slope of the regression line.
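As a sketch of how large residuals flag outlier candidates in simple linear regression, the Python code below fits an ordinary least-squares line and reports the points whose residual exceeds twice the residual standard error. The data are hypothetical, and the cutoff of 2 is a common rule of thumb assumed here, not a rule from the text.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def outlier_candidates(xs, ys, cutoff=2.0):
    """Indices whose absolute residual exceeds `cutoff` residual standard errors."""
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom (two fitted coefficients).
    s = math.sqrt(sum(e ** 2 for e in residuals) / (len(xs) - 2))
    return [i for i, e in enumerate(residuals) if abs(e) / s > cutoff]

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 25.0, 15.1, 16.9]  # hypothetical; y at x = 6 breaks the pattern
print(outlier_candidates(xs, ys))
```

For this data only the point at x = 6 (index 5) is flagged. A flagged point is a candidate for closer inspection, not automatically an error.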

An outlier can be created by a shift in the location (mean) or in the scale (variability) of the process. Outliers may be due to recording errors (which may be correctable), or to the sample not being drawn entirely from the same population. They may also be values that do come from the same population, when that population is non-normal (heavy-tailed); that is, the outliers may reflect an incorrect specification based on the wrong distributional assumptions.

An influential observation is often an outlier in the x-direction. Influential observations may arise from:

1. observations that are unusually large or otherwise deviate in unusually extreme ways from the center of a reference distribution;
2. observations associated with a unit that has a low probability of selection, and thus a high probability weight;
3. observations with a very large weight (relative to the weights of other units in the specified subpopulation) due to problems with stratum jumping; sampling of birth units or highly seasonal units; large nonresponse adjustment factors arising from unusually low response rates within a given adjustment cell; unusual calibration-weighting effects; or other factors.
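To make "outlier in the x-direction" concrete for the regression case (the survey-weight mechanisms in items 2 and 3 are not modeled here), the Python sketch below computes the leverage of each point in simple linear regression, $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$, and how much the fitted slope changes when each point is deleted. The data are hypothetical: the last point sits far out at x = 20 and also falls below the trend of the others.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

def slope_change_on_deletion(xs, ys):
    """Change in the fitted slope when each point is left out in turn."""
    full = slope(xs, ys)
    return [slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) - full for i in range(len(xs))]

xs = [1, 2, 3, 4, 5, 6, 7, 20]  # hypothetical; x = 20 is far from the rest
ys = [3.0, 5.1, 6.9, 9.2, 10.8, 13.1, 15.0, 20.0]
for x, h, d in zip(xs, leverages(xs), slope_change_on_deletion(xs, ys)):
    print(f"x = {x:>2}: leverage {h:.3f}, slope change if deleted {d:+.3f}")
```

The point at x = 20 has by far the largest leverage (about 0.9), and deleting it raises the slope from roughly 0.83 to roughly 2.0, which is what makes it influential.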

# Difference between Common Log and Natural Log

The logarithm of a number is the exponent to which another fixed value, the base, must be raised to produce that number. For example, the logarithm of 1000 to base 10 is 3, as $1000 = 10^3$. Logarithms were introduced by John Napier in the early 17th century to simplify calculation and were widely adopted by scientists, engineers and others to perform computations more easily using logarithm tables. The logarithm to base $b = 10$ is called the common logarithm and has many applications in science and engineering, while the natural logarithm has the constant $e \approx 2.718281828$ as its base and is written as $\ln(x)$ or $\log_e(x)$.
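These definitions can be checked with Python's standard `math` module (used here only for illustration): `math.log10` gives the common logarithm, and the change-of-base rule $\log_b x = \ln x / \ln b$ gives a logarithm to any base. The helper `log_base` is a hypothetical name introduced for this sketch.

```python
import math

def log_base(x, b):
    """Logarithm of x to base b, using the change-of-base rule log_b(x) = ln(x) / ln(b)."""
    return math.log(x) / math.log(b)

print(math.log10(1000))        # common logarithm: 1000 = 10**3
print(log_base(1000, 10))      # same value via change of base, up to floating-point rounding
print(log_base(8, 2))          # base-2 example: 8 = 2**3
print(10 ** math.log10(1000))  # raising the base to the logarithm recovers the number
```

For base 10, `math.log10` is usually preferred over the change-of-base division, because the division can introduce a small rounding error in the last digit.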

The common logarithm is used in many logarithmic scales, such as the pH scale in chemistry (for measuring acidity and alkalinity) and the Richter scale in seismology (for measuring the magnitude of earthquakes). In many applied fields it is so common that when no base is written, log x can be assumed to be the common logarithm (in pure mathematics, however, log x often denotes the natural logarithm).
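As an illustration of a common-log scale, pH is approximately $\text{pH} = -\log_{10}[\text{H}^+]$ with the hydrogen-ion concentration in mol/L (strictly, pH is defined in terms of hydrogen-ion activity rather than concentration). The short Python sketch below uses that approximation; the function names are hypothetical.

```python
import math

def ph_from_concentration(h_plus):
    """Approximate pH from hydrogen-ion concentration in mol/L: pH = -log10([H+])."""
    return -math.log10(h_plus)

def concentration_from_ph(ph):
    """Inverse of the above: [H+] = 10 ** (-pH)."""
    return 10 ** (-ph)

print(ph_from_concentration(1e-7))  # neutral water at 25 °C: pH about 7
print(ph_from_concentration(1e-3))  # an acidic solution: pH about 3
# Each pH unit is a tenfold change: pH 3 has about 10**4 times the [H+] of pH 7.
print(concentration_from_ph(3) / concentration_from_ph(7))
```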

The natural logarithm is widely used in pure mathematics, especially calculus. The natural logarithm of a number x is the power to which e has to be raised to equal x. For example, ln(7.389…) is 2, because $e^2 \approx 7.389$. The natural log of e itself, ln(e), is 1 because $e^1 = e$, while the natural logarithm of 1, ln(1), is 0, since $e^0 = 1$.
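The definition "the power to which e has to be raised" can be turned directly into a search. The Python sketch below (an illustration only, with a hypothetical function name; `math.log` is the practical choice) finds ln(y) by bisection on $e^x = y$ and compares the result with `math.log` for the examples above.

```python
import math

def ln_by_bisection(y, tol=1e-12):
    """Find x such that e**x == y (y > 0) by bisection, i.e. ln as the inverse of exp.

    Intended for moderate y; math.exp overflows for arguments above about 709.
    """
    if y <= 0:
        raise ValueError("ln is only defined for positive numbers")
    lo, hi = -1.0, 1.0
    while math.exp(lo) > y:  # widen the bracket downward until e**lo <= y
        lo *= 2
    while math.exp(hi) < y:  # widen the bracket upward until e**hi >= y
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.exp(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for y in (math.exp(2), math.e, 1.0):
    print(y, ln_by_bisection(y), math.log(y))
```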

A natural question is: the reason for choosing 10 is obvious, but why e = 2.718…?

The answer goes back about 300 years, to Euler, who introduced the notation e for this constant (the letter is often associated with his name). Up to a constant multiple, $e^x$ is the only function that is its own derivative (and consequently its own integral): $\frac{d}{dx}e^x = e^x$. The number e can be computed by several numerical and analytical methods, most often infinite series. It plays an even more important role in complex analysis.
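Both properties can be checked numerically. The Python sketch below (an illustration, not the only method) sums the series $e = \sum_{k=0}^{\infty} \frac{1}{k!}$, one of the infinite series that define e, and uses a central-difference approximation to show that the slope of $e^x$ at a point equals $e^x$ itself. The number of terms and the step size h are arbitrary choices.

```python
import math

def e_from_series(terms=20):
    """Approximate e by the partial sum of the series sum_{k>=0} 1/k!."""
    total = 0.0
    term = 1.0  # 1/0!
    for k in range(terms):
        total += term
        term /= k + 1  # turns 1/k! into 1/(k+1)!
    return total

def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(e_from_series(), math.e)
for x in (0.0, 1.0, 2.5):
    # The numerical slope of exp at x should match exp(x) itself.
    print(x, derivative(math.exp, x), math.exp(x))
```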

Suppose you have a hundred rupees and the interest rate is 10% per period. After one period you will have Rs. 110; after the next, another 10% of Rs. 110 raises your amount to Rs. 121, and so on. What happens when the interest is compounded continuously (all the time)? You might think you would soon have an infinite amount of money, but in fact you have your initial deposit times e raised to the power of the interest rate times the amount of time:

$$P = P_0 e^{kt}$$

where $k$ is the growth (interest) rate, $t$ is the time period, $P$ is the value at time $t$, and $P_0$ is the value at time $t = 0$.
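The limit can be seen numerically. In the Python sketch below (function names are hypothetical), `compound` splits the rate k over n compounding periods per year, $P_0(1 + k/n)^{nt}$, and `continuous` applies $P_0 e^{kt}$; with Rs. 100 at 10%, the discrete values approach the continuous one as n grows.

```python
import math

def compound(p0, rate, years, periods_per_year):
    """Value after compounding `periods_per_year` times a year for `years` years."""
    return p0 * (1 + rate / periods_per_year) ** (periods_per_year * years)

def continuous(p0, rate, years):
    """Value under continuous compounding: P = P0 * e**(k*t)."""
    return p0 * math.exp(rate * years)

p0, k = 100.0, 0.10
print(compound(p0, k, 1, 1))  # one year, compounded once: Rs. 110
print(compound(p0, k, 2, 1))  # two years, compounded once a year: Rs. 121
for n in (1, 12, 365, 1_000_000):
    print(n, compound(p0, k, 1, n))
print("continuous", continuous(p0, k, 1))
```

After one year the continuous value is about Rs. 110.52: more frequent compounding helps, but the amount stays finite, as the formula says.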

Intuitively, $e^x$ gives the amount of continuous growth after a certain amount of time, while the natural logarithm gives the time needed to reach a certain level of continuous growth.
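Solving $P = P_0 e^{kt}$ for $t$ gives $t = \ln(P / P_0) / k$, which is exactly the "time needed" reading of the natural log. The Python sketch below (hypothetical function name) computes the doubling time of Rs. 100 at a 10% continuous rate and checks it by growing the deposit for that long.

```python
import math

def time_to_reach(p0, target, rate):
    """Time t solving target = p0 * e**(rate * t), i.e. t = ln(target / p0) / rate."""
    return math.log(target / p0) / rate

t_double = time_to_reach(100.0, 200.0, 0.10)
print(t_double)                           # about 6.93 periods to double at 10% continuous growth
print(100.0 * math.exp(0.10 * t_double))  # growing for that long recovers the target
```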