Objectives of Time Series Analysis

There are many objectives of time series analysis; these may be classified as

  1. Description
  2. Explanation
  3. Prediction
  4. Control

The descriptions of these objectives are as follows:

Description

The first step in the analysis is to plot the data and obtain simple descriptive measures of the main properties of the series, such as trends, seasonal fluctuations, and so on.

[Figure: Seasonal Effect]
In the figure above, there is a regular seasonal pattern of price changes, although this pattern is not entirely consistent.

The graph also enables us to look for "wild" observations or outliers (observations that do not appear to be consistent with the rest of the data). Graphing the time series also reveals the presence of turning points, where an upward trend suddenly changes to a downward trend. If there is a turning point, different models may have to be fitted to the two parts of the series.

Explanation

When observations are taken on two or more variables, it becomes possible to use the variation in one time series to explain the variation in another series, which may lead to a deeper understanding of the mechanism that generated the series. A multiple regression model may be helpful in this case, as in the sketch below.
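As a rough illustration, the following R sketch simulates two related series and regresses one on the other. The variable names (advertising, sales), the data-generating model, and the seed are assumptions made only for this example.

# Explaining the variation in one series by another (simulated data)
set.seed(123)                                            # arbitrary seed, for reproducibility
advertising <- 50 + arima.sim(list(ar = 0.7), n = 100)   # hypothetical explanatory series
sales <- 10 + 2*advertising + rnorm(100, sd = 3)         # hypothetical response series
fit <- lm(sales ~ advertising)                           # regression of one series on the other
summary(fit)    # the slope estimates how variation in one series explains the other

Note that regressions between trending time series can be spurious, so such models require care in practice.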

Prediction

Given an observed time series, one may want to predict the future values of the series. This is an important task in sales forecasting and in the analysis of economic and industrial time series. The terms prediction and forecasting are often used interchangeably.
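As a minimal illustration in base R, the sketch below fits an AR(1) model to a simulated series and predicts the next five values. The model order and the seed are assumptions made for the example; a real analysis would include model identification and diagnostic checking.

# Fit an AR(1) model and predict future values (simulated data)
set.seed(123)                            # arbitrary seed, for reproducibility
y <- arima.sim(list(ar = 0.8), n = 200)  # simulated AR(1) series
fit <- arima(y, order = c(1, 0, 0))      # AR(1) model with a mean term
predict(fit, n.ahead = 5)                # point forecasts and their standard errors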

Control

When a time series is generated to measure the quality of a manufacturing process, the aim may be to control the process. Control procedures are of several different kinds.

In statistical quality control, the observations are plotted on a control chart and the controller takes action as a result of studying the charts.

Alternatively, a stochastic model may be fitted to the series, future values of the series predicted, and the input process variables then adjusted so as to keep the process on target. A minimal sketch of the control-chart approach is given below.
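The base R code below plots simulated measurements with control limits at the mean plus or minus three standard deviations and flags any points outside the limits. Computing the limits from the same data is a simplification made for illustration.

# A simple control chart with 3-sigma limits (simulated data)
set.seed(123)                        # arbitrary seed, for reproducibility
x <- rnorm(50, mean = 100, sd = 2)   # simulated process measurements
cl <- mean(x)                        # center line
ucl <- cl + 3*sd(x)                  # upper control limit
lcl <- cl - 3*sd(x)                  # lower control limit
plot(x, type = "b", ylab = "Measurement", main = "Control chart")
abline(h = c(lcl, cl, ucl), lty = c(2, 1, 2))   # draw the limits and center line
which(x > ucl | x < lcl)             # indices of out-of-control points, if any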

 

Binomial Random Number Generation in R

Here we will learn how to generate Bernoulli and binomial random numbers in R, using the flip of a coin as an example. The broader topic of this tutorial is how to generate random numbers from different statistical distributions in R; our focus is on binomial random number generation.

We know that in a Bernoulli distribution, either something happens or it does not; for example, a coin flip has two outcomes, head or tail (either a head occurs, or it does not, i.e. a tail occurs). For an unbiased coin, there is a 50% chance of a head (or a tail) in the long run. To generate binomial random numbers in R, use the rbinom(n, size, prob) command.

The rbinom(n, size, prob) command has three parameters, where

n is the number of observations,
size is the number of trials (it may be zero or more), and
prob is the probability of success on each trial, for example 1/2.

Some Examples

  • One coin is tossed 10 times with probability of success = 0.5
    (a fair, unbiased coin, since p = 1/2)
    > rbinom(n=10, size=1, prob=1/2)
    OUTPUT: 1 1 0 0 1 1 1 1 0 1
  • Two coins are tossed 10 times with probability of success = 0.5
    > rbinom(n=10, size=2, prob=1/2)
    OUTPUT: 2 1 2 1 2 0 1 0 0 1
  • One coin is tossed one hundred thousand times with probability of success = 0.5 (note that R does not accept commas within numbers, so write 100000, not 100,000)
    > rbinom(n=100000, size=1, prob=1/2)
  • Store the simulation results (five coins tossed one hundred thousand times) in the vector x
    > n <- 100000
    > x <- rbinom(n=n, size=5, prob=1/2)
    count the total number of successes in x
    > sum(x)
    find the frequency distribution
    > table(x)
    create a relative frequency (percentage) table
    > t <- table(x)/n * 100
    plot the relative frequency distribution
    > plot(table(x)/n, ylab="Probability", main="size=5, prob=0.5")
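To check the simulation, the empirical relative frequencies can be compared with the theoretical binomial probabilities. Below is a minimal sketch using the base R function dbinom(); the seed is an arbitrary choice for reproducibility.

# Compare simulated relative frequencies with theoretical binomial probabilities
set.seed(123)                                      # arbitrary seed, for reproducibility
n <- 100000
x <- rbinom(n = n, size = 5, prob = 1/2)
empirical <- table(x)/n                            # simulated relative frequencies
theoretical <- dbinom(0:5, size = 5, prob = 1/2)   # exact binomial probabilities
round(rbind(empirical, theoretical), 4)            # the two rows should nearly match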



Primary and Secondary Data in Statistics

Data

The facts and figures which can be numerically measured are studied in statistics. A numerical measure of a characteristic is called an observation, and a collection of observations is termed data. Data are collected by individual researchers or by organizations through sample surveys or experiments, keeping in view the objectives of the study. The data collected may be:

  1. Primary Data
  2. Secondary Data

Primary and Secondary Data in Statistics

The difference between primary and secondary data in statistics is that primary data is collected firsthand by a researcher (an organization, person, authority, or agency, etc.) through experiments, surveys, questionnaires, focus groups, interviews, and the taking of (required) measurements, while secondary data has already been collected by someone else and is readily available to the public through publications, journals, and newspapers.

Primary Data

Primary data means raw data (data without fabrication; not tailored data) which has just been collected from the source and has not undergone any kind of statistical treatment such as sorting and tabulation. The term primary data may sometimes be used to refer to firsthand information.

Sources of Primary Data

The sources of primary data are primary units such as basic experimental units, individuals, and households. The following methods are usually used to collect data from primary units; the choice of method depends on the nature of the primary unit. (By contrast, published data and data collected in the past are called secondary data.)

  • Personal Investigation
    The researcher conducts the experiment or survey himself/herself and collects the data. Data collected this way are generally accurate and reliable. This method of collecting primary data is feasible only for small-scale laboratory or field experiments and pilot surveys; it is not practicable for large-scale experiments and surveys because it takes too much time.
  • Through Investigators
    Trained (experienced) investigators are employed to collect the required data. In the case of surveys, they contact the individuals and fill in the questionnaires after asking for the required information, where a questionnaire is an inquiry form comprising a number of questions designed to obtain information from the respondents. This method of collecting data is employed by most organizations and gives reasonably accurate information, but it is very costly and can be time-consuming.
  • Through Questionnaire
    The required information (data) is obtained by sending a questionnaire (in printed or soft form) to the selected individuals (respondents), for example by mail, who fill in the questionnaire and return it to the investigator. This method is relatively cheap compared with the "through investigators" method, but the non-response rate is very high, as most respondents do not bother to fill in the questionnaire and send it back to the investigator.
  • Through Local Sources
    Local representatives or agents are asked to send the requisite information, which they provide based on their own experience. This method is quick, but it gives only rough estimates.
  • Through Telephone
    The information may be obtained by contacting the individuals by telephone. It is quick and provides reasonably accurate information.
  • Through Internet
    With the spread of information technology, people may be contacted through the internet and asked to provide the pertinent information. Google surveys are now widely used as an online method of data collection, and there are many paid online survey services too.

It is important to go through the primary data and locate any inconsistent observations before giving the data any statistical treatment.

Secondary Data

Secondary data is data which has already been collected by someone else and may have been sorted, tabulated, and given some statistical treatment. In this sense it is processed ("tailored") data rather than raw data.

Sources of Secondary Data

The secondary data may be available from the following sources:

  • Government Organizations
    Federal and Provincial Bureaus of Statistics, the Crop Reporting Service (Agriculture Department), Census and Registration Organizations, etc.
  • Semi-Government Organizations
    Municipal committees, district councils, commercial and financial institutions such as banks, etc.
  • Teaching and Research Organizations
  • Research Journals and Newspapers
  • Internet

 

Markov Chain

A Markov chain, named after Andrey Markov, is a mathematical system that experiences transitions from one state to another, among a finite or countable number of possible states. A Markov chain is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of memorylessness is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

If the random variables $X_{n-1}$ and $X_n$ take the values $X_{n-1}=i$ and $X_n=j$, then the system has made a transition $S_i \rightarrow S_j$, that is, a transition from state $S_i$ to state $S_j$ at the $n$th trial. Note that $i$ can equal $j$, so transitions within the same state are possible. We need to assign probabilities to the transitions $S_i \rightarrow S_j$. In a general chain, the probability that $X_n=j$ may depend on the whole sequence of random variables starting with the initial value $X_0$. A Markov chain has the characteristic property that the probability that $X_n=j$ depends only on the immediately preceding state of the system. This means that at each step we need no further information than, for each $i$ and $j$, \[P\{X_n=j|X_{n-1}=i\},\]
the probability that $X_n=j$ given that $X_{n-1}=i$; this probability is independent of the values of $X_{n-2},X_{n-3},\cdots,X_0$.

Suppose we have a set of states $S=\{s_1,s_2,\cdots,s_n\}$. The process starts in one of these states and moves successively from one state to another; each move is called a step. If the chain is currently in state $s_i$, then it moves to state $s_j$ at the next step with a probability denoted by $p_{ij}$, and this probability does not depend on which states the chain was in before the current state. The probabilities $p_{ij}$ are called transition probabilities ($s_i \xrightarrow[]{p_{ij}} s_j$). The process can also remain in the state it is in, which occurs with probability $p_{ii}$.

An initial probability distribution, defined on $S$, specifies the starting state. Usually this is done by specifying a particular state as the starting state.

Formally, a Markov chain is a sequence of random variables $X_1,X_2,\cdots$ with the Markov property that, given the present state, the future and past states are independent. Thus
\[P(X_n=x|X_1=x_1,X_2=x_2,\cdots,X_{n-1}=x_{n-1})=P(X_n=x|X_{n-1}=x_{n-1}),\]
or, in the transition-probability notation used above,
\[p_{ij}=P(X_n=j|X_{n-1}=i).\]

Example: Markov Chain

A Markov chain $X$ on $S=\{0,1\}$ is determined by the initial distribution given by $p_0=P(X_0=0)$, $p_1=P(X_0=1)$ and the one-step transition probabilities given by $p_{00}=P(X_{n+1}=0|X_n=0)$, $p_{10}=P(X_{n+1}=0|X_n=1)$, $p_{01}=1-p_{00}$ and $p_{11}=1-p_{10}$, so the one-step transition probabilities in matrix form (with rows indexed by the current state) are $P=\begin{pmatrix}p_{00}&p_{01}\\p_{10}&p_{11}\end{pmatrix}$
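Such a chain is easy to simulate from its transition probabilities. The following R sketch is a minimal illustration; the numerical values chosen for $p_{00}$ and $p_{10}$ and the equal initial probabilities are assumptions made for the example.

# Simulate a two-state Markov chain on S = {0, 1}
set.seed(123)                 # arbitrary seed, for reproducibility
p00 <- 0.8                    # P(X_{n+1} = 0 | X_n = 0), assumed value
p10 <- 0.3                    # P(X_{n+1} = 0 | X_n = 1), assumed value
n_steps <- 20
x <- numeric(n_steps)
x[1] <- rbinom(1, 1, 0.5)     # initial distribution: p0 = p1 = 0.5
for (n in 2:n_steps) {
  prob0 <- if (x[n-1] == 0) p00 else p10   # transition probability to state 0
  x[n] <- ifelse(runif(1) < prob0, 0, 1)   # next state depends only on the current state
}
x                             # one realization of the chain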


 

Effect Size for Dependent Sample t test

Effect Size: Introduction

An effect size is a measure of the strength of a phenomenon, conveying the estimated magnitude of a relationship without making any statement about the true relationship. Effect size measures play an important role in meta-analysis and statistical power analysis. Reporting an effect size in theses, reports, or research papers can therefore be considered good practice, especially when presenting empirical results or findings, because it measures the practical importance of a significant finding. Put simply, effect size is a way of quantifying the size of the difference between two groups.

Effect size is usually computed after rejecting the null hypothesis in a statistical hypothesis testing procedure. If the null hypothesis is not rejected, the effect size has little meaning.

There are different formulas for different statistical tests to measure the effect size. In general, effect size can be computed in two ways.

  1. As the standardized difference between two means
  2. As the effect size correlation (the correlation between the independent variable classification and the individual scores on the dependent variable).

Effect size for dependent sample t test

The effect size for the paired sample t test (dependent sample t test), known as Cohen's d, ranges from $-\infty$ to $\infty$ and evaluates, in standard deviation units, the degree to which the mean of the difference scores departs from zero. If d equals 0, the mean of the difference scores is zero; the further d is from 0, the larger the effect size.

Effect size for dependent sample t test can be computed by using

\[d=\frac{\overline{D}-\mu_D}{SD_D}\]

Note that both the mean of the differences ($\overline{D}$) and their standard deviation ($SD_D$) are reported in the SPSS output under "Paired Differences".

Suppose the effect size is d = 2.56; this means that the sample mean difference and the population mean difference are 2.56 standard deviations apart. The sign has no bearing on the size of an effect, i.e. -2.56 and 2.56 are equivalent effect sizes.

The d statistic can also be computed from the obtained t value and the number of paired observations, following Ray and Shadish (1996), as

\[d=\frac{t}{\sqrt{N}}\]
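The R sketch below illustrates both computations on simulated pre/post scores; the data, sample size, and seed are assumptions made for the example. Because the paired t statistic equals $\overline{D}/(SD_D/\sqrt{N})$, the two computations agree exactly.

# Cohen's d for a dependent (paired) sample t test (simulated data)
set.seed(123)                               # arbitrary seed, for reproducibility
pre <- rnorm(30, mean = 50, sd = 10)        # hypothetical pre-test scores
post <- pre + rnorm(30, mean = 3, sd = 5)   # hypothetical post-test scores
D <- post - pre                             # difference scores
d_direct <- mean(D)/sd(D)                   # d = (Dbar - mu_D)/SD_D with mu_D = 0
tt <- t.test(post, pre, paired = TRUE)      # paired t test
d_from_t <- unname(tt$statistic)/sqrt(length(D))   # d = t/sqrt(N), Ray & Shadish (1996)
c(d_direct = d_direct, d_from_t = d_from_t)        # the two values are identical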

The value of d is usually categorized as small, medium, or large. With Cohen's d:

  • d = 0.2 to 0.5: small effect
  • d = 0.5 to 0.8: medium effect
  • d = 0.8 and higher: large effect

Computing Effect Size with $r^2$

Another method of computing the effect size is with r-squared ($r^2$), i.e.

\[r^2=\frac{t^2}{t^2+df}\]
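Using the same simulated paired data as in the sketch above, $r^2$ can be computed from the t statistic and its degrees of freedom as follows.

# r^2 from the paired t test's t statistic and degrees of freedom (simulated data)
set.seed(123)                               # arbitrary seed, for reproducibility
pre <- rnorm(30, mean = 50, sd = 10)        # hypothetical pre-test scores
post <- pre + rnorm(30, mean = 3, sd = 5)   # hypothetical post-test scores
tt <- t.test(post, pre, paired = TRUE)
t_val <- unname(tt$statistic)
df <- unname(tt$parameter)                  # degrees of freedom, N - 1
r2 <- t_val^2/(t_val^2 + df)
r2                                          # compare with the benchmarks below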

The $r^2$ value can be categorized as a small, medium, or large effect as follows:

  • $r^2=0.01$, small effect
  • $r^2=0.09$, medium effect
  • $r^2=0.25$, large effect.

A non-significant t test result indicates that we fail to reject the hypothesis that the two conditions have equal means in the population. A larger value of $r^2$ indicates a larger effect; a large effect size with a non-significant result suggests that the study should be replicated with a larger sample size.

In short, a large effect size computed by either method indicates that the means are likely very different.

References:

  • Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316-1325. (See also the correction to Ray and Shadish (1996), Journal of Consulting and Clinical Psychology, 66, 532, 1998.)
  • Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17(2), 137-152. doi:10.1037/a0028086
