# Basic Statistics and Data Analysis

## Time Series Analysis and Forecasting

A time series $\{Y_t\}$ or $\{y_1,y_2,\cdots,y_T\}$ is a discrete-time, continuous-state process, where the time index $t=1,2,\cdots,T$ runs over discrete time points spaced at uniform intervals.

Usually, time is recorded at more or less equally spaced intervals such as an hour, day, month, quarter, or year. More specifically, a time series is a set of data in which the observations are arranged in chronological order (a set of repeated observations of the same variable).

Time series are used in many fields of science, such as statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering.

Time series analysis is the analysis of a series of data points over time, allowing one to answer questions such as: what is the causal effect on a variable $Y$ of a change in a variable $X$ over time? An important difference between time series and cross-section data is that the ordering of cases matters in time series.

Definition: A sequence of random variables indexed by time is called a stochastic process (stochastic means random), or a time series for mere mortals. A data set is one possible outcome (realization) of the stochastic process. If history had been different, we would have observed a different outcome; thus we can think of a time series as the outcome of a random variable.

Rather than dealing with individuals as units, the unit of interest is time: the value of $Y$ at time $t$ is $Y_t$. The unit of time can be anything from days to election years. The value of $Y$ in the previous period is called the first lag value, $Y_{t-1}$. The $j$th lag is denoted $Y_{t-j}$. Similarly, $Y_{t+1}$ is the value of $Y$ in the next period. So a simple bivariate regression equation for time series data looks like $Y_t = \beta_0 + \beta X_t + u_t$, where $u_t$ is the error term.
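The lag notation and the bivariate regression can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the data are simulated, the "true" coefficients (1.0 and 2.0) are arbitrary, and the helper names `lag` and `ols_bivariate` are hypothetical, not part of any library.

```python
import random


def lag(series, j):
    """Return the j-th lag of a series (Y_{t-j}).

    The first j positions have no lagged value and are filled with None.
    """
    if j < 0:
        raise ValueError("j must be non-negative")
    return ([None] * j + list(series))[:len(series)]


def ols_bivariate(x, y):
    """Closed-form OLS estimates for Y_t = b0 + b1 * X_t + u_t."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Simulated data (assumption): Y_t = 1.0 + 2.0 * X_t + u_t with Gaussian noise.
random.seed(0)
T = 500
x = [random.gauss(0, 1) for _ in range(T)]
y = [1.0 + 2.0 * xt + random.gauss(0, 0.5) for xt in x]

b0_hat, b1_hat = ols_bivariate(x, y)
print("estimated intercept:", round(b0_hat, 3))
print("estimated slope:", round(b1_hat, 3))

# First lag of Y: position t holds Y_{t-1}; the first entry is None.
y_lag1 = lag(y, 1)
print("Y_2 =", y[1], "and its first lag Y_1 =", y_lag1[1])
```

With a reasonably long simulated sample, the estimates should land close to the coefficients used to generate the data; the exact values depend on the random seed.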

## Continuous Time Series

A time series is said to be continuous when observations are made continuously in time. The term continuous is used for series of this type even when the measured variable can only take a discrete set of values.

## Discrete Time Series

A time series is said to be discrete when observations are taken at specific times, usually equally spaced. The term discrete is used for series of this type even when the measured variable is a continuous variable.

Most macroeconomic and financial data come in the form of time series. GNP and stock returns are examples of time series data.

We can write a series as $\{x_1,x_2,x_3,\cdots,x_T\}$ or $\{x_t\}$, where $t=1,2,3,\cdots,T$. Each $x_t$ is treated as a random variable.

Time series analysis refers to the branch of statistics where observations are collected sequentially in time, usually but not necessarily at equally spaced time points. The arcane difference between time series and other variables is the use of the subscript $t$.

Time series analysis comprises methods for analyzing time series data in order to extract useful (meaningful) statistics and other characteristics of the data, while time series forecasting is the use of a model to predict future values based on previously observed values.

Given an observed time series, the first step in the analysis is to plot the series on a graph, taking time ($t$) along the X-axis (as the independent variable) and the observed value ($Y_t$) along the Y-axis (as the dependent variable). Such a graph will show various types of fluctuations and other points of interest.

Note

• $Y_t$ is treated as a random variable. If $Y_t$ is generated by some model (for example, the regression model for time series $Y_t=x_t\beta +\varepsilon_t$ with $E(\varepsilon_t|x_t)=0$), then ordinary least squares (OLS) provides a consistent estimate of $\beta$.
• The term time series is used interchangeably for the sample $\{x_t\}$ and for the probability model. A possible probability model for the joint distribution of a time series $\{x_t\}$ is $x_t=\varepsilon_t$, $\varepsilon_t\sim iid N(0,\sigma_\varepsilon^2)$.
• Time series are typically not iid (independent and identically distributed); e.g., if GNP today is unusually high, GNP tomorrow is also likely to be unusually high.
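The contrast between the iid model $x_t=\varepsilon_t$ and a persistent series can be sketched in Python. As an assumption made purely for illustration (it is not a model given in the text), persistence is generated here by a first-order autoregression $x_t = \phi x_{t-1} + \varepsilon_t$ with a hypothetical $\phi = 0.9$; the helper `autocorr_lag1` is likewise hypothetical.

```python
import random


def autocorr_lag1(x):
    """Sample lag-1 autocorrelation: how strongly x_t moves with x_{t-1}."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


random.seed(1)
T = 2000
eps = [random.gauss(0, 1) for _ in range(T)]

# iid model from the note above: x_t = eps_t.
white_noise = list(eps)

# Persistent series (assumed AR(1) for illustration): x_t = phi * x_{t-1} + eps_t.
phi = 0.9
ar1 = [eps[0]]
for t in range(1, T):
    ar1.append(phi * ar1[-1] + eps[t])

r_wn = autocorr_lag1(white_noise)
r_ar = autocorr_lag1(ar1)
print("lag-1 autocorrelation, iid series:", round(r_wn, 3))
print("lag-1 autocorrelation, persistent series:", round(r_ar, 3))
```

The iid series should show a lag-1 autocorrelation near zero, while the persistent series should show one near $\phi$: an unusually high value today tends to be followed by a high value tomorrow, as in the GNP example.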