Basic Statistics and Data Analysis

Lecture notes, MCQs of Statistics

Changing the data and creating a formula in MS-Excel

Changing the data

Before writing your formula, you need numeric data in the columns or rows of an Excel sheet. Suppose you want to enter a few numbers in a column. Before entering these numbers, you should first decide which cells the data will go in. Let's start by entering numbers in cells A1 and A2. To do this, follow the steps below:

  1. Click on cell A1
  2. Type 3 on the keyboard
  3. Press the ENTER or DOWN ARROW key on the keyboard. The active cell moves to A2
  4. Now type, say, 2 and press the ENTER key

Suppose you want the sum of these numbers to appear in cell C1. To do this, you need to write a formula in cell C1. Once a correct formula is entered, cell C1 immediately shows the sum of the two numbers typed in A1 and A2, whose cell references are used in the formula.

[Figure: Excel data and formula]

Creating a Formula in MS-Excel

In Excel, every formula begins with an equal sign (=); see the picture below.

[Figure: Excel formula beginning with an equal sign]

Therefore, when creating formulas in Excel, ALWAYS start by typing the equal sign. The equal sign is typed in the cell where you want the answer to appear. As in the image above, follow these steps:

  1. Select cell C1 using the ARROW keys on the keyboard or the mouse pointer.
  2. Type the equal sign in cell C1.

After typing the equal sign in step 2, you have two choices for adding cell references to the formula. Note that a cell reference is the name of a cell you want to use in the formula. A1 and A2 are the cell references of the numbers 3 and 2, respectively.

  1. You can type these references in directly, or
  2. You can use an Excel feature called Pointing

Pointing allows you to click with the mouse on the cell containing the data, or to move to that cell with the keyboard ARROW keys. Either way, the cell reference is added to the formula.

After typing an equal sign in cell C1 in step 2:

  1. Click on cell A1 with the mouse pointer to enter the cell reference into the formula
  2. Type a plus (+) sign. You can also use other operators: * for multiplication, / for division, - for subtraction, etc.
  3. Click on cell A2 with the mouse pointer to enter the cell reference into the formula
  4. Press the ENTER key on the keyboard

The answer 5 should appear in cell C1.
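As a rough illustration of what happens when the formula is entered, the hypothetical Python function below evaluates a simple formula such as =A1+A2. It looks up each cell reference in a dictionary of cell values. It supports only numbers, cell references and the +, -, * and / operators, which is far less than Excel itself handles.

```python
import ast
import operator

# Map Python AST operator nodes to the arithmetic Excel performs for +, -, * and /.
OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, cells):
    """Evaluate a simple Excel-style formula such as '=A1+A2'.

    `cells` maps cell references (e.g. 'A1') to numbers. Hypothetical helper.
    """
    if not formula.startswith("="):
        raise ValueError("an Excel formula must start with '='")
    tree = ast.parse(formula[1:].strip(), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value  # a typed number such as 3
        if isinstance(node, ast.Name):  # a cell reference such as A1
            return cells[node.id.upper()]
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            return OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported formula element")

    return walk(tree)


cells = {"A1": 3, "A2": 2}
print(evaluate_formula("=A1+A2", cells))  # 5
print(evaluate_formula("=A1*A2", cells))  # 6
```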

Note that if you have more than one row or column of data, you need to perform the calculation for each row or column. It is often possible to copy the first formula to the other cells. The easiest way to do this is to copy the formula with the fill handle.
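Copying with the fill handle works because ordinary (relative) cell references shift when the formula is copied. The hypothetical Python sketch below imitates that adjustment when copying down a column. It assumes a layout where each row's two values sit in columns A and B and their sum goes in column C. It handles only plain relative references, with no $ signs and no function names that contain digits.

```python
import re

# A plain cell reference: one or more column letters followed by a row number.
CELL_REF = re.compile(r"([A-Z]+)(\d+)")


def fill_down(formula, rows):
    """Return the formula as it would read `rows` rows further down.

    Every reference is treated as relative, so only its row number shifts.
    """
    def shift(match):
        column, row = match.group(1), int(match.group(2))
        return f"{column}{row + rows}"

    return CELL_REF.sub(shift, formula)


first = "=A1+B1"  # formula typed in C1
for offset in range(3):
    print(f"C{1 + offset}: {fill_down(first, offset)}")
# C1: =A1+B1
# C2: =A2+B2
# C3: =A3+B3
```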

 


 

Heteroscedasticity Tests and Remedies

There is a set of heteroscedasticity tests and remedies that require an assumption about the structure of the heteroscedasticity, if it exists. That is, to use these tests you must choose a specific functional form for the relationship between the error variance and the variables that you believe determine the error variance. The major difference between these tests is the functional form that each test assumes.

Breusch-Pagan Test

The Breusch-Pagan test assumes the error variance is a linear function of one or more variables.
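To make the linear variance assumption concrete, here is a minimal Python sketch of the n·R² form of the Breusch-Pagan statistic (Koenker's studentized version) with a single variable z. It regresses the squared OLS residuals on z and multiplies the R² by n. The original Breusch-Pagan statistic is computed slightly differently, as half the explained sum of squares from a regression of scaled squared residuals. The helper names, the simulated data and the choice z = x are illustrative assumptions. With one variable in z, the statistic is approximately chi-square with 1 degree of freedom under homoscedasticity, and the 5% critical value is 3.841.

```python
import random


def simple_ols(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def breusch_pagan_lm(x, y, z):
    """n * R^2 from regressing the squared OLS residuals (y on x) on z."""
    a, b = simple_ols(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    return len(y) * r_squared(z, e2)


random.seed(1)
x = [random.uniform(1, 10) for _ in range(500)]
# Simulated errors whose standard deviation grows with x (heteroscedastic by design).
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]
lm = breusch_pagan_lm(x, y, z=x)
print(round(lm, 2), "reject H0 at 5%" if lm > 3.841 else "do not reject H0 at 5%")
```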

Harvey-Godfrey Test

The Harvey-Godfrey test assumes the error variance is an exponential function of one or more variables. The variables are usually assumed to be one or more of the explanatory variables in the regression equation.
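A comparable sketch for the exponential form: if $\sigma_i^2 = \exp(\alpha_0 + \alpha_1 z_i)$, then $\ln \sigma_i^2$ is linear in $z_i$. The auxiliary regression therefore uses $\ln(e_i^2)$ as its dependent variable. This sketch computes n·R² from that regression and compares it with a chi-square critical value (3.841 for one variable at 5%). That is one common textbook form of the statistic, and it is taken here as an assumption. The helper names and simulated data are hypothetical.

```python
import math
import random


def simple_ols(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def harvey_godfrey_lm(x, y, z):
    """n * R^2 from regressing ln(e^2) on z, with e the OLS residuals of y on x."""
    a, b = simple_ols(x, y)
    log_e2 = [math.log((yi - a - b * xi) ** 2) for xi, yi in zip(x, y)]
    return len(y) * r_squared(z, log_e2)


random.seed(2)
x = [random.uniform(1, 10) for _ in range(500)]
# Error variance exp(0.4 * x), i.e. the exponential form this test assumes.
y = [1 + 0.5 * xi + random.gauss(0, math.exp(0.2 * xi)) for xi in x]
lm = harvey_godfrey_lm(x, y, z=x)
print(round(lm, 2), "reject H0 at 5%" if lm > 3.841 else "do not reject H0 at 5%")
```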

The White Test

The White test of heteroscedasticity is a general test for detecting the existence of heteroscedasticity in a data set. It has the following advantages:

  1. It does not require you to specify a model of the structure of the heteroscedasticity, if it exists.
  2. It does not depend on the assumption that the errors are normally distributed.
  3. It specifically tests if the presence of heteroscedasticity causes the OLS formula for the variances and the covariances of the estimates to be incorrect.
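With one explanatory variable, White's auxiliary regression uses x and x² as regressors. With more explanatory variables it would also include their cross-products. The sketch below is plain Python with a small Gaussian-elimination solver. It computes n·R² from that auxiliary regression, which is approximately chi-square with 2 degrees of freedom under homoscedasticity (5% critical value 5.991). All names and the simulated data are illustrative assumptions.

```python
import random


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def ols_fit(columns, y):
    """Least squares with an intercept; `columns` is a list of regressors. Returns fitted values."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]


def r_squared(y, fitted):
    my = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def white_lm(x, y):
    """Regress squared OLS residuals on x and x^2; return n * R^2."""
    fitted = ols_fit([x], y)
    e2 = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    aux = ols_fit([x, [xi ** 2 for xi in x]], e2)
    return len(y) * r_squared(e2, aux)


random.seed(7)
x = [random.uniform(1, 10) for _ in range(500)]
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]
lm = white_lm(x, y)
print(round(lm, 2), "reject H0 at 5%" if lm > 5.991 else "do not reject H0 at 5%")
```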

Remedies for Heteroscedasticity

Suppose that you find evidence of heteroscedasticity. If you use the OLS estimator, you will get unbiased but inefficient estimates of the parameters of the model. Also, the estimates of the variances and covariances of the parameter estimates will be biased and inconsistent, and as a result hypothesis tests will not be valid. When there is evidence of heteroscedasticity, econometricians do one of two things:

  • Use the OLS estimator to estimate the parameters of the model, and correct the estimates of the variances and covariances of the OLS estimates so that they are consistent.
  • Use an estimator other than the OLS estimator to estimate the parameters of the model.

Many econometricians choose the first alternative. This is because the most serious consequence of using the OLS estimator when there is heteroscedasticity is that the estimates of the variances and covariances of the parameter estimates are biased and inconsistent. If this problem is corrected, the only shortcoming of using OLS is that you lose some precision relative to some other estimator you could have used. However, to get more precise estimates with an alternative estimator, you must know the approximate structure of the heteroscedasticity. If you specify the wrong model of heteroscedasticity, this alternative estimator can yield estimates that are worse than the OLS estimates.
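The first remedy keeps the OLS coefficient estimates and replaces only their variance estimates. Below is a minimal sketch for simple regression using White's heteroscedasticity-consistent (HC0) formula, $\widehat{Var}(\hat\beta_1) = \sum (x_i-\bar x)^2 e_i^2 / [\sum (x_i-\bar x)^2]^2$. The function name and simulated data are hypothetical. Other variants (HC1 to HC3) rescale this estimate for small samples.

```python
import math
import random


def ols_with_white_se(x, y):
    """Simple regression y = a + b*x.

    Returns the slope b, its conventional standard error (which assumes a
    constant error variance) and its White (HC0) robust standard error.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]
    s2 = sum(ei * ei for ei in e) / (n - 2)  # homoscedastic error-variance estimate
    conventional_se = math.sqrt(s2 / sxx)
    robust_se = math.sqrt(sum(d * d * ei * ei for d, ei in zip(dx, e)) / sxx ** 2)
    return b, conventional_se, robust_se


random.seed(5)
x = [random.uniform(1, 10) for _ in range(500)]
# Error standard deviation grows with x, so the conventional SE is unreliable.
y = [2 + 0.5 * xi + random.gauss(0, 0.3 * xi) for xi in x]
b, conventional_se, robust_se = ols_with_white_se(x, y)
print(f"slope={b:.3f} conventional SE={conventional_se:.4f} White SE={robust_se:.4f}")
```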

Stationary Stochastic Process


A stochastic process is said to be stationary if its mean and variance are constant over time and the covariance between two time periods depends only on the distance, gap or lag between the two periods, not on the actual time at which the covariance is computed. Such a stochastic process is also known as weakly stationary, covariance stationary, second-order stationary or wide-sense stationary.

In other words, a sequence of random variables {$y_t$} is covariance stationary if there is no trend and if the covariance does not change over time.

Strictly Stationary

A time series is strictly stationary if all the moments of its probability distribution, not just the first two (mean and variance), are invariant over time.

Let $y_t$ be a stochastic time series with

\begin{align}
E(y_t) &= \mu && \text{(mean)}\\
Var(y_t) &= E(y_t-\mu)^2 = \sigma^2 && \text{(variance)}\\
\gamma_k &= E[(y_t-\mu)(y_{t+k}-\mu)] = Cov(y_t, y_{t+k}) && \text{(covariance)}
\end{align}

$\gamma_k$ is the covariance, or autocovariance, at lag $k$.

If $k=0$, then $\gamma_0 = Cov(y_t, y_t) = Var(y_t) = \sigma^2$.

If $k=1$, then $\gamma_1$ is the covariance between two adjacent values of $y$.
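In practice, $\gamma_k$ is estimated from a single observed series by replacing $\mu$ with the sample mean. Here is a short Python sketch that uses the common divisor $n$; some texts divide by $n-k$ instead. The data are made up for illustration.

```python
def sample_autocovariance(y, k):
    """Estimate gamma_k = E[(y_t - mu)(y_{t+k} - mu)] from one observed series.

    mu is replaced by the sample mean and the sum is divided by n.
    """
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    mean = sum(y) / n
    return sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k)) / n


y = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
print(sample_autocovariance(y, 0))  # 2.0, the lag-0 value is the (divisor-n) variance
print(sample_autocovariance(y, 1))  # 0.0
print(sample_autocovariance(y, 2))  # -1.5
```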

If $y_t$ is to be stationary, the mean, variance and autocovariances of $y_{t+m}$ (the series with its origin shifted by $m$) must be the same as those of $y_t$.

In other words, if a time series is stationary, its mean, variance and autocovariances remain the same no matter at what point we measure them; that is, they are time invariant.

Non-Stationary Time Series

A time series with a time-varying mean, a time-varying variance or both is called a non-stationary time series.
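A standard example of a non-stationary series is the random walk $y_t = y_{t-1} + u_t$ with $y_0 = 0$ and white-noise $u_t$. Its mean stays at zero, but $Var(y_t) = t\sigma^2$ grows with time. The simulation below uses hypothetical helper names and assumes $\sigma = 1$. It estimates $Var(y_t)$ across 2000 simulated paths, so the estimate at $t = 100$ should come out roughly ten times the one at $t = 10$.

```python
import random


def random_walk(n_steps, sigma=1.0):
    """y_t = y_{t-1} + u_t with y_0 = 0 and u_t ~ N(0, sigma^2); Var(y_t) = t * sigma^2."""
    y, path = 0.0, []
    for _ in range(n_steps):
        y += random.gauss(0.0, sigma)
        path.append(y)
    return path


def variance_across_paths(paths, index):
    """Variance of the values at one time position across many simulated paths."""
    values = [path[index] for path in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(42)
paths = [random_walk(100) for _ in range(2000)]
# path[t - 1] holds y_t, so index 9 is t = 10 and index 99 is t = 100.
print(round(variance_across_paths(paths, 9), 1))
print(round(variance_across_paths(paths, 99), 1))
```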

Purely Random/ White Noise Process

A stochastic process with zero mean, a constant variance ($\sigma^2$) and no serial correlation is called a purely random or white noise process.

If the observations are also independent, the process is called strictly white noise.

White noise is often denoted by $u_t$. Writing $u_t \sim IIDN(0, \sigma^2)$ means that $u_t$ is independently and identically distributed as a normal distribution with zero mean and constant variance.
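The three defining properties (zero mean, constant variance, no serial correlation) can be checked on a simulated Gaussian white noise series. This sketch assumes $\sigma = 2$ and a fixed seed, and the function names are hypothetical. The sample mean should be near 0, the variance near 4 and the lag-1 autocorrelation near 0.

```python
import random


def white_noise(n, sigma=1.0, seed=None):
    """Draw u_1, ..., u_n independently from N(0, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def summary(u):
    """Sample mean, variance (divisor n) and lag-1 autocorrelation of a series."""
    n = len(u)
    mean = sum(u) / n
    var = sum((x - mean) ** 2 for x in u) / n
    lag1 = sum((u[t] - mean) * (u[t + 1] - mean) for t in range(n - 1)) / n
    return mean, var, lag1 / var


mean, var, r1 = summary(white_noise(10_000, sigma=2.0, seed=3))
print(f"mean={mean:.3f} variance={var:.3f} lag-1 autocorrelation={r1:.3f}")
```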

Stationarity matters because if a time series is non-stationary, we can study its behaviour only for the time period under consideration. Each set of time series data will therefore be for a particular episode, and as a consequence it is not possible to generalize it to other time periods. Therefore, for the purpose of forecasting, such (non-stationary) time series may be of little practical value. Our interest is in stationary time series.

 


 

Binomial Probability Distribution

A statistical experiment is called a Binomial Experiment if it consists of successive independent trials that meet the following conditions. Each trial has two possible outcomes (such as success and failure; true and false; yes and no; right and wrong). The probability of success is the same on every trial. The experiment is repeated a fixed number of times (say $n$ times). Each trial of a Binomial Experiment is known as a Bernoulli trial (a single performance of the experiment). A Binomial Experiment has four properties:

  1. Each trial of a Binomial Experiment can be classified as a success or a failure.
  2. The probability of success is the same for each trial of the experiment.
  3. Successive trials are independent; that is, the outcome of one trial does not affect the outcome of any other trial.
  4. The experiment is repeated a fixed number of times.
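The four properties translate directly into a simulation. Each trial is a success with the same probability p, independently of the others, and the number of trials n is fixed. The minimal Python sketch below uses n = 5 and p = 0.3 purely for illustration. It repeats the experiment and tabulates the relative frequency of each number of successes.

```python
import random
from collections import Counter


def binomial_experiment(n, p, rng):
    """Run n independent Bernoulli trials with success probability p; count the successes."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(2024)
n, p, repetitions = 5, 0.3, 20_000
counts = Counter(binomial_experiment(n, p, rng) for _ in range(repetitions))
for x in range(n + 1):
    print(x, counts[x] / repetitions)  # relative frequency of exactly x successes
```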

Binomial Probability Distribution

Let $X$ be a discrete random variable that denotes the number of successes in a Binomial Experiment (we call it a binomial random variable). The random variable takes the isolated values $X=0,1,2,\cdots,n$. The probability distribution of a binomial random variable is termed the binomial probability distribution. It is a discrete probability distribution.

Binomial Probability Mass Function

The probability function of the binomial distribution is also called the binomial probability mass function and is denoted by $b(x, n, p)$. That is, it is the binomial distribution of the random variable $X$ with parameters $n$ (the given number of trials) and $p$ (the probability of success). If $p$ is the probability of success (so that $q=1-p$ is the probability of failure and $p+q=1$), then the probability of exactly $x$ successes is given by the following formula:

\begin{align}
b(x, n, p) &= P(X=x)\\
&=\binom{n}{x} p^x q^{n-x}, \quad x=0,1,2, \cdots, n
\end{align}

where $p$ is probability of success of a single trial, $q$ is probability of failure and $n$ is number of independent trials.

The formula gives the probability of each possible value of the binomial random variable $X$ for any combination of $n$ and $p$. Note that it does not give $P(X<0)$ or $P(X>n)$, because $X$ cannot take values outside $0,1,\cdots,n$. The binomial distribution is suitable when $n$ is small, and it applies when sampling is done with replacement.
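The mass function can be evaluated with Python's standard-library `math.comb` (Python 3.8+). The sketch below returns 0 outside $x = 0, 1, \cdots, n$, since the formula covers only those values. The function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """b(x, n, p) = C(n, x) p^x q^(n - x) with q = 1 - p; zero outside x = 0, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```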

\[b(x, n, p) = \binom{n}{x} p^x q^{n-x}, \quad x=0,1,2,\cdots,n,\]

is called the binomial distribution because its successive terms are exactly the same as those of the binomial expansion of

\begin{align}
(q+p)^n=\binom{n}{0} p^0 q^{n-0}+\binom{n}{1} p^1 q^{n-1}+\cdots+\binom{n}{n-1} p^{n-1} q^{n-(n-1)}+\binom{n}{n} p^n q^{n-n}
\end{align}

$\binom{n}{0}, \binom{n}{1}, \binom{n}{2},\cdots, \binom{n}{n-1}, \binom{n}{n}$ are called Binomial coefficients.
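Exact rational arithmetic makes the link to the binomial expansion easy to check. With the illustrative values $p = 1/4$ and $n = 6$, the terms $\binom{n}{x}p^x q^{n-x}$ computed with Python's `fractions.Fraction` add up to exactly $(q+p)^n = 1$.

```python
from fractions import Fraction
from math import comb

n = 6
p = Fraction(1, 4)
q = 1 - p

coefficients = [comb(n, x) for x in range(n + 1)]
terms = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

print(coefficients)                # [1, 6, 15, 20, 15, 6, 1]
print(sum(terms))                  # 1
print(sum(terms) == (q + p) ** n)  # True
```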

Note that it is necessary to state the range of values of the random variable; otherwise the formula is only a mathematical equation, not a probability distribution.

 

Writing Excel Formulas

Writing Excel formulas is a little different from the way it is done in mathematics class. All Excel formulas start with the equal sign (=), and the equal sign always goes in the cell where you want the answer of the formula to appear. The equal sign tells Excel that the entry is a formula, not just a name or a number. An Excel formula looks like

= 3 + 2

rather than

3+2 =
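The role of the leading equal sign can be illustrated with a deliberately simplified Python sketch of how a typed entry might be classified. Real Excel also recognizes dates, percentages, currency and more, so `classify_entry` is a hypothetical helper, not Excel's actual rule.

```python
def classify_entry(text):
    """Simplified classification of a typed cell entry."""
    entry = text.strip()
    if entry.startswith("="):
        return "formula"  # the equal sign marks a formula
    try:
        float(entry)
        return "number"
    except ValueError:
        return "text"


for entry in ["= 3 + 2", "3+2 =", "5", "Sales"]:
    print(repr(entry), "->", classify_entry(entry))
```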

Cell References in Formula

The formula above has one drawback. If you want to change the numbers being calculated (3 and 2), you need to edit or rewrite the formula. A better way is to write the formula so that you can change the numbers without changing or rewriting the formula itself. To do this, use cell references, which tell Excel the cells in which the data/numbers are located. A cell's location in the spreadsheet is referred to as its cell reference.

To find a cell reference, simply click the cell whose reference you need and read the text, such as F2, in the NAME BOX (shown in the figure below).

[Figure: Excel formula 1, the NAME BOX showing a cell reference]

F2 represents the cell in column F (horizontal position) and row 2 (vertical position). This means a cell reference can also be found by reading the column heading (at the top) and the row number (at the far left) of the cell. Therefore, a cell reference is a combination of a column letter and a row number, such as A1, B2, Z5 and A106. For the previous example, instead of writing = 3 + 2 in a cell (say C1), write the formula with cell references as follows:

In cell A1 write 3, and in cell A2 write 2. In cell C1 write the formula

= A1 + A2

[Figure: Excel formula 2]

Note that there is no space between A and 1 or between A and 2; the references are simply A1 and A2. See the diagram for a clearer picture.
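Since a cell reference is just a column letter (or letters) followed by a row number, it can be taken apart mechanically. The hypothetical Python helper below converts a reference such as F2 into a (column number, row number) pair. It reads the column letters as a base-26 number with A = 1, ..., Z = 26 and AA = 27.

```python
import re


def parse_cell_reference(ref):
    """Split a reference such as 'F2' into (column number, row number)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9]\d*)", ref.strip())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, row = match.group(1).upper(), int(match.group(2))
    column = 0
    for letter in letters:
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return column, row


for ref in ["A1", "B2", "Z5", "A106", "F2"]:
    print(ref, parse_cell_reference(ref))
```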

Updating Excel Formula

When cell references are used in an Excel formula, the result of the formula is updated automatically whenever the data value in a referenced cell changes. For example, if you want to change the data in cell A1 to 8 instead of 3, you only need to change the content of A1. The result of the formula in cell C1 will be updated automatically after the data value in A1 or A2 is changed.

Note that the formula itself does not change, because cell references are used instead of data values or numbers.
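The updating behaviour can be imitated with a toy Python model. `TinySheet` is hypothetical. It supports only numbers and formulas that add cell references with +, and it does not detect circular references. It recomputes a formula whenever its value is read, rather than tracking dependencies as Excel does. For this example the visible effect is the same.

```python
class TinySheet:
    """Toy spreadsheet: cells hold numbers or '=' formulas that add cell references."""

    def __init__(self):
        self.entries = {}

    def set(self, ref, entry):
        self.entries[ref] = entry

    def value(self, ref):
        entry = self.entries.get(ref, 0)  # empty cells count as 0
        if isinstance(entry, str) and entry.startswith("="):
            # Recompute from the referenced cells each time the value is read.
            terms = entry[1:].split("+")
            return sum(self.value(term.strip()) for term in terms)
        return entry


sheet = TinySheet()
sheet.set("A1", 3)
sheet.set("A2", 2)
sheet.set("C1", "=A1+A2")
print(sheet.value("C1"))  # 5
sheet.set("A1", 8)        # change only the data in A1
print(sheet.value("C1"))  # 10, while the formula text in C1 is unchanged
```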

 

 

Copy Right © 2011 ITFEATURE.COM