## Type I and Type II errors in Statistics

In hypothesis testing, two types of errors can be made: Type I and Type II errors.

#### Type I and Type II Errors

• A Type I error occurs when you reject a true null hypothesis (remember that when the null hypothesis is true, you hope to retain it). A Type I error is a false positive.
$\alpha = P(\text{Type I error}) = P(\text{rejecting the null hypothesis when it is true})$
A Type I error is conventionally regarded as more serious than a Type II error and is therefore more important to avoid.
• A Type II error occurs when you fail to reject a false null hypothesis (remember that when the null hypothesis is false, you hope to reject it). A Type II error is a false negative.
$\beta = P(\text{Type II error}) = P(\text{failing to reject the null hypothesis when the alternative hypothesis is true})$
• The best way to set a low alpha level (i.e., a small chance of making a Type I error) while still having a good chance of rejecting the null hypothesis when it is false (i.e., a small chance of making a Type II error) is to increase the sample size.
• The key to hypothesis testing, therefore, is to use a large sample in your research study rather than a small one!
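The effect of sample size can be illustrated with a small simulation. This is a sketch under assumed values (a one-sided z-test of $H_0: \mu = 10$ when the true mean is 10.5 and $\sigma = 2$, all chosen for illustration): holding $\alpha$ fixed at 0.05, the Type II error rate $\beta$ shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(42)
null_mean, true_mean, sigma = 10.0, 10.5, 2.0  # assumed values: H0 is actually false
z_crit = 1.645                                  # one-sided critical value for alpha = 0.05

powers = []
for n in (10, 50, 200):
    # Simulate many samples from the true (alternative) distribution
    samples = rng.normal(true_mean, sigma, size=(10_000, n))
    z = (samples.mean(axis=1) - null_mean) / (sigma / np.sqrt(n))
    powers.append(np.mean(z > z_crit))          # power = P(reject H0 | H0 false)

for n, power in zip((10, 50, 200), powers):
    print(f"n={n:3d}  power={power:.3f}  Type II rate (beta)={1 - power:.3f}")
```

With the alpha level fixed, larger samples give higher power, i.e., a smaller chance of a Type II error.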

If you do reject your null hypothesis, then it is also essential that you determine whether the size of the relationship is practically significant.
The hypothesis test procedure is therefore adjusted so that there is a guaranteed “low” probability of rejecting the null hypothesis wrongly; this probability is never zero.

Therefore, remember that falsely rejecting the null hypothesis results in a Type I error, while falsely retaining (failing to reject) the null hypothesis results in a Type II error.

## Significance level in Statistics: why do researchers use 5%?

### Significance Level

The significance level in statistics is the probability threshold at which it is agreed that the null hypothesis will be rejected. In academic research, a 0.05 level of significance is usually used. The level of significance is also called the level of risk.

### Significance Level in Statistics

The level of significance of an event (such as a statistical test result) is the probability that the event occurs by chance. If the level is quite low, then the probability of the event occurring by chance is quite small, and one can say the event is significant.

The significance level is the probability of rejecting the null hypothesis when it is true. In other words, the significance level is the probability of making a Type-I error, which is the error of incorrectly rejecting a true null hypothesis.

#### Why a 5% Level of Significance?

The 0.05 level has become part of the statistical hypothesis-testing culture:

• It is a longstanding convention.
• It reflects a concern over making type I errors (i.e., wanting to avoid the situation where you reject the null when it is true, that is, wanting to avoid “false positive” errors).
• If you set the level of significance at 0.05, then you will only reject a true null hypothesis 5% of the time (i.e., you will only make a Type I error 5% of the time) in the long run.
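This long-run interpretation can be checked by simulation. The sketch below (with assumed values: a known-variance z-test of $\mu = 0$ on normal data where the null hypothesis is true) counts how often a true null is rejected at the 5% level.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, z_crit = 0.05, 1.96        # two-sided 5% critical value
n, trials = 30, 10_000

# Draw many samples where H0 (mu = 0) is TRUE and count false rejections
x = rng.normal(0.0, 1.0, size=(trials, n))
z = x.mean(axis=1) / (1.0 / np.sqrt(n))     # sigma = 1 assumed known
type1_rate = np.mean(np.abs(z) > z_crit)
print(f"Empirical Type I error rate: {type1_rate:.3f}")   # close to 0.05
```

The empirical rejection rate settles near 0.05, matching the chosen significance level.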

## The Trade-Off between Type I and Type II Errors

The choice of significance level is a trade-off between Type I and Type II errors. A lower level of significance reduces the probability of a Type I error (false positive) but increases the probability of a Type II error (false negative). Conversely, a higher significance level increases the chance of a Type I error but decreases the chance of a Type II error.
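The trade-off can be computed exactly for a simple case. This is a sketch under assumed values (a one-sided test of $H_0: \mu = 0$ against $H_1: \mu = 0.5$ with $\sigma = 1$ and $n = 25$, chosen only for illustration): as $\alpha$ rises, $\beta$ falls.

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()
n, sigma, mu1 = 25, 1.0, 0.5          # assumed sample size, sd, and true mean under H1
betas = {}
for alpha in (0.01, 0.05, 0.10):
    z_crit = nd.inv_cdf(1 - alpha)                         # rejection cutoff under H0
    betas[alpha] = nd.cdf(z_crit - mu1 * sqrt(n) / sigma)  # miss probability under H1
    print(f"alpha={alpha:.2f}  beta={betas[alpha]:.3f}")
```

Each increase in the significance level lowers $\beta$, illustrating that the two error probabilities pull in opposite directions.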

In conclusion, the level of significance is a powerful tool that helps us navigate the uncertainties in data analysis. By understanding its role, one can make more informed decisions about the validity of research findings. In summary, the significance level is a crucial part of the hypothesis-testing procedure that helps researchers decide whether to reject or retain a null hypothesis based on the observed data.

## Estimation and Types of Estimation in Statistics

This post introduces estimation and the types of estimation in the subject of statistics.

The procedure of making a judgment or decision about a population parameter is referred to as statistical estimation, or simply estimation. Statistical estimation procedures provide estimates of population parameters with a desired degree of confidence. The degree of confidence can be controlled, in part, by the size of the sample (the larger the sample, the greater the accuracy of the estimate) and by the type of estimate made. Population parameters are estimated from sample data because it is not possible (it is impracticable) to examine the entire population to make an exact determination.

The types of estimation of a population parameter are divided into two groups: (i) Point Estimation and (ii) Interval Estimation.

### Point Estimation

The objective of point estimation is to obtain a single number from the sample that will represent the unknown value of the population parameter. Population parameters (population mean, variance, etc.) are estimated from the corresponding sample statistics (sample mean, variance, etc.).
A statistic used to estimate a parameter is called a point estimator, or simply an estimator; the actual numerical value obtained by an estimator is called an estimate.

A population parameter is denoted by $\theta$, an unknown constant. The available information is in the form of a random sample $x_1, x_2, \cdots, x_n$ of size $n$ drawn from the population. We formulate a function of the sample observations $x_1, x_2, \cdots, x_n$; the estimator of $\theta$ is denoted by $\hat{\theta}$. Different random samples provide different values of the statistic $\hat{\theta}$. Thus $\hat{\theta}$ is a random variable with its own sampling distribution.
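A minimal sketch of this idea, with assumed values (a normal population whose mean $\theta = 50$ plays the role of the unknown parameter): the sample mean serves as the estimator $\hat{\theta}$, and repeated sampling shows that it varies from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 50.0                                 # unknown parameter (here: a population mean)
sample = rng.normal(theta, 5.0, size=40)     # random sample x_1, ..., x_n

theta_hat = sample.mean()                    # the estimator: a function of the sample
print(f"Point estimate of theta: {theta_hat:.2f}")

# Different random samples give different values of theta_hat,
# so theta_hat is itself a random variable with a sampling distribution.
estimates = rng.normal(theta, 5.0, size=(5, 40)).mean(axis=1)
print("Estimates from 5 other samples:", np.round(estimates, 2))
```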

### Interval Estimation

A point estimator (such as the sample mean) calculated from sample data provides a single number as an estimate of the population parameter. This estimate cannot be expected to be exactly equal to the population parameter, because the mean of a sample taken from a population may assume different values for different samples. Therefore, we estimate an interval (a range of values) within which the population parameter is expected to lie with a certain degree of confidence. This range of values, used to estimate a population parameter, is known as an interval estimate, or an estimate by a confidence interval, and is defined by two numbers between which the population parameter is expected to lie.

For example, $a<\bar{x}<b$ is an interval estimate of the population mean $\mu$, indicating that the population mean is greater than $a$ but less than $b$. The purpose of an interval estimate is to provide information about how close the point estimate is to the true parameter.

Note that the information about the shape of the sampling distribution of the sample mean (i.e., the sampling distribution of $\bar{x}$) allows us to locate an interval that has some specified probability of containing the population mean $\mu$.

Interval Estimate formula when $n>30$ (or the population is normal) and $\sigma$ is known $$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$

Interval Estimate formula when $n<30$, the population is approximately normal, and $\sigma$ is unknown $$\bar{x} \pm t_{(n-1, \alpha)}\,\, \frac{s}{\sqrt{n}}$$
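The $Z$-based formula can be sketched as follows, under assumed values (a sample of $n = 50$ from a population with known $\sigma = 15$, chosen for illustration). For the small-sample case one would replace the $Z$ quantile with the $t_{n-1}$ quantile and $\sigma$ with the sample standard deviation $s$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
mu_true, sigma = 100.0, 15.0                 # assumed population values; sigma is "known"
sample = rng.normal(mu_true, sigma, size=50)
xbar, n = sample.mean(), len(sample)

z = NormalDist().inv_cdf(0.975)              # two-sided 95% confidence => Z ~ 1.96
margin = z * sigma / np.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% interval estimate for mu: ({lower:.2f}, {upper:.2f})")
```

The interval $\bar{x} \pm Z\sigma/\sqrt{n}$ is centered at the point estimate, and its width shrinks as $n$ grows.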

### Which of the two types of estimation in statistics do you like the most, and why?

The Types of Estimation in Statistics are as follows:

• Point estimation is nice because it provides a single number as the best guess of the value of the population parameter.
• Interval estimation is nice because it allows you to make statements of confidence that an interval will include the true population value.

## Rules for Skewed Data

### Introduction to Skewed Data: Lack of Symmetry

Skewness is the lack of symmetry in a probability distribution. Skewness is usually quantified by the index given below:

$$s = \frac{\mu_3}{\mu_2^{3/2}}$$

where $\mu_2$ and $\mu_3$ are the second and third moments about the mean.

This index takes the value zero for a symmetrical distribution. A distribution is positively skewed when it has a longer, thinner tail to the right, and negatively skewed when it has a longer, thinner tail to the left.

A distribution is said to be skewed when the data points cluster more toward one side of the scale than the other, creating a curve that is not symmetrical.

### Skewed Data

The two general rules for skewed data are:

1. If the mean is less than the median, the data are skewed to the left, and
2. If the mean is greater than the median, the data are skewed to the right.

Therefore, if the mean is much greater than the median the data are probably skewed to the right.
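The mean-versus-median rule can be checked on simulated data. This sketch uses an exponential sample as an assumed example of right skew, and its mirror image for left skew.

```python
import numpy as np

rng = np.random.default_rng(3)
right_skewed = rng.exponential(scale=2.0, size=10_000)   # long tail to the right
left_skewed = -right_skewed                              # mirror image: tail to the left

results = {}
for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean, median = data.mean(), np.median(data)
    results[name] = (mean, median)
    side = "right" if mean > median else "left"
    print(f"{name}: mean={mean:.2f}, median={median:.2f} -> skewed to the {side}")
```

The long tail pulls the mean toward it while the median stays near the bulk of the data, which is exactly what the two rules above describe.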

Misinterpretation of Mean and Median: The mean can be sensitive to outliers in skewed distributions and might not accurately represent the “typical” value. The median, which is the middle value when the data is ordered, can be a more robust measure of the central tendency for skewed data.

Statistical Tests: Some statistical tests assume normality (zero skewness). If the data is skewed, alternative tests or transformations might be necessary for reliable results.

### Identifying Skewed Data

There are a couple of ways to identify skewed data:

• Visual Inspection: Histograms and box plots are useful tools for visualizing the distribution of the data. Skewed distributions will show an asymmetry in the plots.
• Skewness Coefficient: This statistic measures the direction and magnitude of the skew in the distribution. A positive value indicates a positive skew, a negative value indicates a negative skew, and zero indicates a symmetrical distribution.
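The skewness coefficient can be computed directly from the moment formula $\mu_3 / \mu_2^{3/2}$ given earlier. A minimal sketch, using assumed example distributions (normal for symmetric, exponential for positive skew):

```python
import numpy as np

def skewness(x):
    """Moment-based skewness index: mu_3 / mu_2^(3/2)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    mu2 = np.mean(dev**2)   # second central moment
    mu3 = np.mean(dev**3)   # third central moment
    return mu3 / mu2**1.5

rng = np.random.default_rng(5)
sym = skewness(rng.normal(0.0, 1.0, 10_000))        # symmetric: near zero
pos = skewness(rng.exponential(1.0, 10_000))        # right tail: positive
neg = skewness(-rng.exponential(1.0, 10_000))       # left tail: negative
print(f"symmetric: {sym:+.3f}  positive skew: {pos:+.3f}  negative skew: {neg:+.3f}")
```

The sign of the coefficient matches the direction of the longer tail, as described in the bullet above.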
