Sampling and Non-Sampling Errors

Before differentiating between sampling and non-sampling errors, let us first define the term error.

The difference between an estimated value and the population’s true value is called an error. A sample estimate is used to describe a characteristic of a population, but a sample, being only a part of the population, cannot provide a perfect representation of it, no matter how carefully the sample is selected. In general, an estimate is rarely exactly equal to the true value, so we may ask how close the sample estimate will be to the population’s true value.

Two Kinds of Errors: Sampling and Non-Sampling Errors

There are two kinds of errors, namely (I) sampling errors and (II) non-sampling errors:

  1. Sampling Errors (random errors)
  2. Non-Sampling Errors (non-random errors)

  1. Sampling Errors
    A sampling error is the difference between the value of a statistic obtained from an observed random sample and the value of the corresponding population parameter being estimated. Let $T$ be the sample statistic used to estimate the population parameter $\theta$; the sampling error, denoted by $E$, is $E = T - \theta$. The value of the sampling error reveals the precision of the estimate: the smaller the sampling error, the greater the precision of the estimate. The sampling error can be reduced (a small simulation after the following list illustrates point i):

    i)   By increasing the sample size
    ii)  By improving the sampling design
    iii) By using supplementary information
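
    As a quick illustration of point i), here is a minimal R sketch (the population, its mean, and the sample sizes are all assumed for illustration) showing how the sampling error $E = T - \theta$ of the sample mean tends to shrink as the sample size grows:

```r
# Sampling error E = T - theta for the sample mean (illustrative simulation)
set.seed(123)
theta <- 50                                  # assumed true population mean
population <- rnorm(1e5, mean = theta, sd = 10)

for (n in c(10, 100, 1000)) {
  T_stat <- mean(sample(population, n))      # sample statistic T
  E <- T_stat - theta                        # sampling error
  cat("n =", n, "  estimate =", round(T_stat, 3), "  error =", round(E, 3), "\n")
}
```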

  2. Non-Sampling Errors
    The errors that are caused by sampling the wrong population of interest and by response bias, as well as those made by an investigator in collecting, analyzing, and reporting the data, are all classified as non-sampling errors or non-random errors. These errors are present in a complete census as well as in a sample survey.


Common Log and Natural Log

Difference between Common Log and Natural Log

In this post, we will learn about the difference between Common Log and Natural Log.

The logarithm of a number is the exponent to which a fixed base must be raised to produce that number. For example, the logarithm of 1000 to base 10 is 3, since $1000 = 10^3$. Logarithms were introduced by John Napier in the early 17th century to simplify calculations and were widely adopted by scientists, engineers, and others to perform computations more easily using logarithm tables. The logarithm to base $b=10$ is called the common logarithm and has many applications in science and engineering, while the natural logarithm has the constant $e \approx 2.718281828$ as its base and is written as $\ln(x)$ or $\log_e(x)$.

The common log is used in many logarithmic scales in science, such as the pH scale (for measuring acidity and alkalinity) in chemistry and the Richter scale (for measuring the magnitude of earthquakes). It is so common that if no base is written, as in $\log\, x$, the common log is usually meant.


The natural logarithm is widely used in pure mathematics, especially calculus. The natural logarithm of a number $x$ is the power to which $e$ has to be raised to equal $x$. For example, $\ln(7.389\ldots) = 2$ because $e^2 = 7.389\ldots$. The natural log of $e$ itself is $\ln(e) = 1$ because $e^1 = e$, while the natural logarithm of $1$ is $\ln(1) = 0$, since $e^0 = 1$.
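
These relationships are easy to verify in R, where `log10()` gives the common log and `log()` the natural log:

```r
# Common log vs. natural log in R
log10(1000)   # 3, since 1000 = 10^3
log(exp(2))   # 2, the natural log of e^2 = 7.389056
log(exp(1))   # 1, since ln(e) = 1
log(1)        # 0, since e^0 = 1
```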

The question is: the reason for choosing base 10 is obvious, but why $e = 2.718\ldots$?

The answer goes back 300 years or more, to Euler (from whose name $e$ comes). Up to a constant factor, $e^x$ is the only function whose derivative (and consequently whose integral) is itself: $(e^x)' = e^x$; no other function has this characteristic. The number $e$ can be obtained by several numerical and analytical methods, most often infinite summations, and it also plays an important role in complex analysis.
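
One such infinite summation is the series $e = \sum_{n=0}^{\infty} \frac{1}{n!}$, and a few terms already give a good approximation; a quick R check:

```r
# Approximating e by the series sum of 1/n! for n = 0, 1, ..., 15
sum(1 / factorial(0:15))   # 2.718282
exp(1)                     # R's built-in value of e, for comparison
```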

Suppose you have a hundred rupees and the interest rate is 10% per period. After one period you will have Rs. 110; in the next period, another 10% of Rs. 110 raises your amount to Rs. 121, and so on. What happens when the interest is computed continuously (all the time)? You might think you would soon have an infinite amount of money, but actually you have your initial deposit times $e$ raised to the power of the interest rate times the amount of time:

$$P=P_0 e^{kt}$$

where $k$ is the growth rate or interest rate, $t$ is the time period, $P$ is the value at time $t$, and $P_0$ is the value at time $t=0$.

The intuitive explanation is this: $e^x$ is the amount of continuous growth after a certain amount of time, while the natural log is the amount of time needed to reach a certain level of continuous growth.
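
To make this concrete, here is a small R sketch (the deposit, rate, and time are assumed for illustration): discrete compounding approaches the continuous limit $P = P_0 e^{kt}$, and $\ln$ recovers the time needed for a given growth factor:

```r
# Continuous growth P = P0 * e^(k*t) as the limit of discrete compounding
P0 <- 100     # assumed initial deposit, Rs.
k  <- 0.10    # assumed interest rate per period
t  <- 1       # one time period

for (n in c(1, 12, 365, 1e6)) {              # compound n times per period
  cat("n =", n, " amount =", P0 * (1 + k/n)^(n * t), "\n")
}
P0 * exp(k * t)   # continuous limit, approx. Rs. 110.5171

# The natural log gives the time needed for a given level of growth:
log(2) / k        # approx. 6.93 periods to double the deposit
```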


Cumulative Frequency Distribution and Polygon (2012)

Introduction to Cumulative Frequency Distribution

A cumulative frequency distribution (cumulative frequency curve or ogive) and a cumulative frequency polygon require cumulative frequencies. The cumulative frequency, denoted by CF, is obtained for a class interval by adding the frequencies of all the preceding classes, including that class. It indicates the total number of values less than or equal to the upper limit of that class. For comparing two or more distributions, relative cumulative frequencies or percentage cumulative frequencies are computed.

The relative cumulative frequencies, denoted by CRF, are the cumulative frequencies expressed as proportions; they are obtained by dividing the cumulative frequency by the total frequency (the total number of observations). The CRF of a class can also be obtained by adding the relative frequencies (rf) of the preceding classes, including that class. Multiplying a relative cumulative frequency by 100 gives the corresponding percentage cumulative frequency of the class.

Method of Construction of Cumulative Frequencies

The method of construction of cumulative frequencies and cumulative relative frequencies is explained in the following table:

[Table: construction of cumulative frequencies and cumulative relative frequencies]
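
Since the table itself is not reproduced here, a minimal R sketch with assumed class frequencies shows the same construction:

```r
# Constructing CF and CRF from assumed class frequencies
classes <- c("10-19", "20-29", "30-39", "40-49", "50-59")  # assumed classes
f   <- c(3, 7, 12, 6, 2)      # assumed class frequencies
cf  <- cumsum(f)              # cumulative frequency (CF)
crf <- cf / sum(f)            # cumulative relative frequency (CRF)
data.frame(classes, f, cf, crf, percent = 100 * crf)
```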

Plot a Cumulative Frequency Distribution

To plot a cumulative frequency distribution, scale the class boundaries along the x-axis and the corresponding cumulative frequencies along the y-axis. For additional information, you can label the vertical axis on the left in units and the vertical axis on the right in percent. The cumulative frequencies are plotted against the upper class boundaries, and the plotted points are joined by straight line segments. A cumulative frequency polygon can be used to estimate the median, quartiles, deciles, percentiles, etc.
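
A minimal plotting sketch in base R, reusing the assumed frequencies from above (the class boundaries are likewise assumed):

```r
# Ogive: cumulative frequencies against upper class boundaries
f     <- c(3, 7, 12, 6, 2)                 # assumed class frequencies
cf    <- cumsum(f)
upper <- c(19.5, 29.5, 39.5, 49.5, 59.5)   # assumed upper class boundaries

plot(upper, cf, type = "b", pch = 16,
     xlab = "Upper class boundary", ylab = "Cumulative frequency",
     main = "Cumulative Frequency Polygon (Ogive)")
```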


[Figures: cumulative frequency distribution (ogive) and cumulative frequency polygon]

Inverse Regression Analysis or Calibration (2012)

In most regression problems, we have to determine the value of $Y$ corresponding to a given value of $X$. The inverse of this problem, determining the value of $X$ that corresponds to an observed value of $Y$, is called inverse regression analysis or calibration.

Inverse Regression Analysis

For inverse regression analysis, let the known values be represented by the matrix $X$ and their corresponding values by the vector $Y$, which together form a simple linear regression model. Suppose there is an unknown value of $X$, say $X_0$, which cannot be measured, and we observe the corresponding value of $Y$, say $Y_0$. Then $X_0$ can be estimated and a confidence interval for $X_0$ can be obtained.

In regression analysis, we want to investigate the relationship between variables. Regression has applications in many fields: engineering, economics, the physical and chemical sciences, management, the biological sciences, and the social sciences. We only consider the simple linear regression model, a model with one regressor $X$ that has a linear relationship with a response $Y$. It is not always easy to measure the regressor $X$ or the response $Y$.

Let us consider a typical example of this problem. If $X$ is the concentration of glucose in certain substances, then a spectrophotometric method is used to measure the absorbance. This absorbance depends on the concentration $X$. The response $Y$ is easy to measure with the spectrophotometric method, but the concentration itself is not. If we have $n$ known concentrations, then the corresponding absorbances can be measured.

If there is a linear relation between $Y$ and $X$, then a simple linear regression model can be fitted to these data. Suppose we have an unknown concentration that is difficult to measure, but we can measure the absorbance at this concentration. Is it possible to estimate this concentration from the measured absorbance? This is called the calibration problem or inverse regression analysis.

Suppose we have a linear model $Y=\beta_0+\beta_1X+e$ and an observed value of the response $Y$, but we do not have the corresponding value of $X$. How can we estimate this value of $X$? The two most important methods to estimate $X$ are the classical method and the inverse method.

The classical method of inverse regression analysis is based on the simple linear regression model

$Y=\beta_0+\beta_1X+\varepsilon,$   where $\varepsilon \sim N(0, \, \sigma^2)$

where the parameters $\beta_0$ and $\beta_1$ are estimated by least squares as $\hat{\beta}_0$ and $\hat{\beta}_1$. At least two of the $n$ values of $X$ have to be distinct; otherwise, we cannot fit a reliable regression line. For a given value of $X$, say $X_0$ (unknown), a $Y$ value, say $Y_0$ (or a random sample of $k$ values of $Y$), is observed at the $X_0$ value. For inverse regression analysis, the problem is to estimate $X_0$. The classical method uses the $Y_0$ value (or the mean of the $k$ observed values) to estimate $X_0$ as $\hat{X}_0=\frac{Y_0-\hat{\beta}_0}{\hat{\beta}_1}$.
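
A minimal R sketch of the classical estimator, using simulated calibration data (the intercept, slope, noise level, and the observed $Y_0$ are all assumed for illustration):

```r
# Classical calibration: estimate X0 from an observed Y0
set.seed(42)
x <- seq(1, 10, length.out = 20)          # known X values (e.g., concentrations)
y <- 2 + 0.5 * x + rnorm(20, sd = 0.2)    # observed responses (e.g., absorbances)

fit <- lm(y ~ x)                          # least-squares estimates of beta0, beta1
b0  <- coef(fit)[1]
b1  <- coef(fit)[2]

y0     <- 5.2                             # observed response at the unknown X0
x0_hat <- (y0 - b0) / b1                  # classical estimator of X0
x0_hat
```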

[Figure: scatter plot with fitted regression line, illustrating inverse regression analysis]

The inverse estimator is based on the simple linear regression of $X$ on $Y$. In this case, we have to fit the model

\[X=a_0+a_1Y+e, \qquad e \sim N(0, \sigma^2)\]

to obtain the estimates $\hat{a}_0$ and $\hat{a}_1$. The inverse estimator of $X_0$ is then

\[\hat{X}_0=\hat{a}_0+\hat{a}_1 Y_0\]
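
The corresponding sketch for the inverse estimator, with the same assumed simulated data as above (repeated here so the snippet runs on its own):

```r
# Inverse estimator: regress X on Y, then plug in the observed Y0
set.seed(42)
x <- seq(1, 10, length.out = 20)
y <- 2 + 0.5 * x + rnorm(20, sd = 0.2)

fit_inv <- lm(x ~ y)                      # simple linear regression of X on Y
a0 <- coef(fit_inv)[1]
a1 <- coef(fit_inv)[2]

y0     <- 5.2                             # observed response at the unknown X0
x0_inv <- a0 + a1 * y0                    # inverse estimator of X0
x0_inv
```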

Important Considerations when performing Inverse Regression

  • Inverse regression can be statistically challenging, especially when the errors are mainly in the independent variables (which become the dependent variables in the inverse model).
  • It is not a perfect replacement for traditional regression, and the assumptions underlying the analysis may differ.
  • In some cases, reverse regression, which treats both variables as having errors, might be a more suitable approach.

In summary, inverse regression is a statistical technique that flips the roles of the independent and dependent variables in a regression model.
