# Basic Statistics and Data Analysis

## Point Estimation of Parameters

The objective of point estimation of parameters is to obtain a single number from the sample which will represent the unknown value of the parameter.

In practice, population parameters such as the mean and standard deviation are unknown. Our goal is to estimate the mean and standard deviation of the population of interest from sample information, which saves time and cost. We do this by computing the sample mean and standard deviation and using them as a best guess for the true population mean and standard deviation. Such a “best guess” is called a “point estimate” because it summarizes the sample in a single number.

A point estimate is a statistic (a statistical measure computed from a sample) that gives a plausible estimate (possibly a best guess) for the value in question.

$\overline{x}$ is a point estimate for $\mu$ and $s$ is a point estimate for $\sigma$.
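As a small illustration, the two point estimates can be computed with Python's standard `statistics` module. The sample values below are hypothetical and serve only to show the calculation; `stdev` uses the $n-1$ divisor.

```python
import statistics

# Hypothetical sample of measurements (illustrative values, not from the text)
sample = [13.2, 14.8, 12.9, 15.1, 14.0, 13.7, 14.6, 13.5]

x_bar = statistics.mean(sample)   # point estimate of the population mean mu
s = statistics.stdev(sample)      # point estimate of sigma (n - 1 divisor)

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}")
```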

Or we can say that

A statistic used to estimate a parameter is called a point estimator or simply an estimator. The actual numerical value which we obtain for an estimator in a given problem is called an estimate.

Generally, the symbol $\theta$ (an unknown constant) is used to denote a population parameter, which may be a proportion, a mean or some measure of variability. The available information is in the form of a random sample $X_1,X_2,\cdots, X_n$ of size $n$ drawn from the population. We wish to formulate a function of the sample observations $X_1,X_2,\cdots,X_n$; that is, we look for a statistic whose value computed from the sample data reflects the value of the population parameter as closely as possible. The estimator of $\theta$ is commonly denoted by $\hat{\theta}$. Different random samples usually provide different values of the statistic $\hat{\theta}$, so $\hat{\theta}$ has its own sampling distribution.

Note that unbiasedness, efficiency, consistency and sufficiency are the criteria (statistical properties of an estimator) used to judge whether a statistic is a “good” estimator.
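The idea that $\hat{\theta}$ varies from sample to sample can be seen in a short simulation. This is only a sketch: the population (normal with mean 14 and standard deviation 2), the sample size and the number of repetitions are assumed values chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: normal with mu = 14 and sigma = 2 (illustrative values)
MU, SIGMA, N = 14.0, 2.0, 50

def sample_mean(n):
    """Draw one random sample of size n and return its mean (theta-hat)."""
    return statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))

# Repeat the sampling many times to approximate the sampling distribution
estimates = [sample_mean(N) for _ in range(2000)]

print("first five estimates:", [round(e, 3) for e in estimates[:5]])
print("mean of estimates:", round(statistics.mean(estimates), 3))
print("std. dev. of estimates:", round(statistics.stdev(estimates), 3))
```

Each repetition gives a slightly different estimate, and the collection of estimates approximates the sampling distribution of the sample mean.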

## Application of Point Estimators: Confidence Intervals

We are interested not only in finding the point estimate for the mean but also in determining how accurate that point estimate is, so we build an interval around it with a stated level of confidence. Here the Central Limit Theorem plays a very important role. We assume that the sample standard deviation $s$ is close to the population standard deviation $\sigma$ (which will almost always be true for large samples). The standard deviation of the sampling distribution of the estimator (here, the sample mean) is

$\sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$

Our interest is to find an interval around $\overline{x}$ such that there is a large probability that the actual (true) mean falls inside the computed interval.  This interval is called a confidence interval and the large probability is called the confidence level.

Example

Suppose that we check for clarity at 50 locations in a lake and discover that the average depth of clarity of the lake is 14 feet with a standard deviation of 2 feet. What can we conclude about the average clarity of the lake with a 95% confidence level?

Solution

The sample mean $\overline{x}$ (the average depth of clarity at the 50 locations) provides a point estimate for $\mu$, and $s$ provides a point estimate for $\sigma$. To answer how accurate $\overline{x}$ is as a point estimate, we can construct a 95% confidence interval for $\mu$ as follows.

Sketch the standard normal curve and use the standard normal table to find the z-score that leaves a probability of .025 in each tail (.025 to the left and .025 to the right, i.e. the two-tailed case).

The z-score for a 95% confidence level is about ±1.96.

\begin{align*}
Z&=\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\\
\pm 1.96&=\frac{14-\mu}{\frac{2}{\sqrt{50}}}\\
14-\mu&=\pm 0.5544
\end{align*}

Note that $Z\frac{\sigma}{\sqrt{n}}$ is called the margin of error.

The 95% confidence interval for the mean clarity will be (13.45, 14.55) feet.

In other words, we can be 95% confident that the mean clarity of the lake is between 13.45 and 14.55 feet.

In general, if $Z_{\alpha/2}$ is the standard normal table value associated with the given level of confidence, then a $(1-\alpha)100\%$ confidence interval for the mean is

$\overline{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
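The general formula can be sketched as a small Python function. The function name `z_confidence_interval` and its parameters are hypothetical, introduced here only for illustration; the critical value comes from `statistics.NormalDist` in the standard library rather than from a printed table. Applied to the lake example ($\overline{x}=14$, $s=2$, $n=50$), it reproduces the interval (13.45, 14.55).

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based interval for the mean."""
    # Two-tailed critical value: leave (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5   # margin of error
    return x_bar - margin, x_bar + margin, margin

# Lake example: s = 2 is used in place of sigma (large-sample assumption)
lower, upper, margin = z_confidence_interval(x_bar=14, sigma=2, n=50)
print(f"margin of error = {margin:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

Changing the `confidence` argument, for example to 0.99, widens the interval because a larger critical value is needed.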

# Estimation: Point and Interval Estimation

## Estimation

The procedure of making a judgement or decision about a population parameter is referred to as statistical estimation, or simply estimation. Statistical estimation procedures provide estimates of a population parameter with a desired degree of confidence. The degree of confidence can be controlled in part (i) by the size of the sample (a larger sample gives greater accuracy of the estimate) and (ii) by the type of estimate made. Population parameters are estimated from sample data because it is usually impracticable to examine the entire population in order to determine them exactly. The statistical estimation of a population parameter is further divided into two types: (i) point estimation and (ii) interval estimation.

## Point Estimation

The objective of point estimation is to obtain a single number from the sample which will represent the unknown value of the population parameter. Population parameters (population mean, variance, etc.) are estimated from the corresponding sample statistics (sample mean, variance, etc.).
A statistic used to estimate a parameter is called a point estimator, or simply an estimator; the actual numerical value obtained from an estimator is called an estimate.
A population parameter is denoted by $\theta$, which is an unknown constant. The available information is in the form of a random sample $x_1, x_2, \cdots, x_n$ of size $n$ drawn from the population. We formulate a function of the sample observations $x_1, x_2, \cdots, x_n$. The estimator of $\theta$ is denoted by $\hat{\theta}$. Different random samples provide different values of the statistic $\hat{\theta}$. Thus $\hat{\theta}$ is a random variable with its own sampling distribution.

## Interval Estimation

A point estimator (such as the sample mean) calculated from the sample data provides a single number as an estimate of the population parameter. This number cannot be expected to be exactly equal to the population parameter, because the mean of a sample taken from a population may assume different values for different samples. Therefore we estimate an interval (a range of values) within which the population parameter is expected to lie with a certain degree of confidence. This range of values used to estimate a population parameter is known as an interval estimate, or an estimate by confidence interval, and is defined by two numbers between which the population parameter is expected to lie. For example, $a<\mu<b$ is an interval estimate of the population mean $\mu$, indicating that the population mean is greater than $a$ but less than $b$. The purpose of an interval estimate is to provide information about how close the point estimate is to the true parameter.

Note that the information developed about the shape of the sampling distribution of the sample mean, i.e. the sampling distribution of $\bar{x}$, allows us to locate an interval that has some specified probability of containing the population mean $\mu$.
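What "some specified probability of containing the population mean" means in practice can be checked with a simulation. The sketch below assumes a normal population with $\mu = 14$ and $\sigma = 2$ and samples of size 50 (illustrative values only), builds a 95% interval from each sample, and counts how often the interval contains $\mu$.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 14.0, 2.0, 50        # assumed population and sample size
Z = NormalDist().inv_cdf(0.975)     # critical value for 95% confidence
MARGIN = Z * SIGMA / N ** 0.5       # margin of error with known sigma

def interval_covers_mu():
    """Draw one sample and report whether its interval contains MU."""
    x_bar = mean(random.gauss(MU, SIGMA) for _ in range(N))
    return x_bar - MARGIN < MU < x_bar + MARGIN

trials = 5000
coverage = sum(interval_covers_mu() for _ in range(trials)) / trials
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should come out close to 0.95: the confidence level describes how often the interval-building procedure captures $\mu$ over repeated samples.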

## Which of the two types of estimation do you like the most, and why?

• Point estimation is nice because it provides a single value as the estimate of the population parameter: the single best guess of its value.
• Interval estimation is nice because it allows you to make statements of confidence that an interval will include the true population value.