Time Series Introduction (2020)

Here we will discuss Time Series Data and Time Series Analysis.

The sequence $y_1, y_2, \cdots, y_n$ of $n$ observations of a variable (say $Y$), recorded in accordance with their time of occurrence $t_1, t_2, \cdots, t_n$, is called a time series. Symbolically, the variable $Y$ can be expressed as a function of time $t$ as

$$y = f(t) + e,$$

where $f(t)$ is a completely determined (or specified) component that follows some systematic pattern of variation, and $e$ is a random error (probabilistic component) that follows an irregular pattern of variation.

Signal: The signal is the systematic component of variation in a time series.

Noise: The noise is the irregular component of variation in a time series.

Examples of time series include:

  • The hourly temperature recorded at a weather bureau,
  • The total annual yield of wheat over a number of years,
  • The monthly sales of fertilizer at a store,
  • The enrollment of students in various years in a college,
  • The daily sales at a departmental store, etc.
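The decomposition $y = f(t) + e$ can be illustrated with a short simulation (a minimal sketch; the linear trend and noise level are arbitrary choices for illustration):

```python
import random

random.seed(42)

T = 48                                              # number of time points
signal = [10 + 0.5 * t for t in range(1, T + 1)]    # systematic component f(t): a linear trend
noise = [random.gauss(0, 2) for _ in range(T)]      # irregular component e: Gaussian errors
y = [s + n for s, n in zip(signal, noise)]          # observed series y_t = f(t) + e_t

print(y[:3])
```

Separating the observed series into these two lists mirrors the signal/noise terminology above: `signal` follows a systematic pattern, while `noise` varies irregularly around zero.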

Time Series

A time series $\{Y_t\}$ or $\{y_1, y_2, \cdots, y_T\}$ is a discrete-time, continuous-state process, where $t = 1, 2, \cdots, T$ are discrete time points spaced at uniform time intervals.

A sequence of random variables indexed by time is called a stochastic process (stochastic means random). A data set is one possible outcome (realization) of the stochastic process. If history had been different, we would observe a different outcome; thus, we can think of a time series as one realization of a stochastic process.
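The idea that an observed series is just one possible realization can be made concrete by simulating the same stochastic process several times (a sketch; the autoregressive process and its parameters are illustrative assumptions, not taken from the text):

```python
import random

def realization(seed, T=20, phi=0.8):
    """Generate one realization of a simple autoregressive process
    y_t = phi * y_{t-1} + e_t, starting from y_0 = 0."""
    rng = random.Random(seed)
    y, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0, 1)
        y.append(prev)
    return y

# Three different "histories" of the same process:
paths = [realization(seed) for seed in (1, 2, 3)]
```

Each element of `paths` is a different outcome of the same underlying process; the data we actually observe corresponds to just one of them.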

Time Series Data Analysis

Usually, time is taken at more or less equally spaced intervals such as minutes, hours, days, months, quarters, years, etc. More specifically, it is a set of data in which observations are arranged in chronological order (A set of repeated observations of the same variable arranged according to time).

Time series analysis is performed in many fields of science, such as signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering, among many others.

Continuous Time Series

A time series is said to be continuous when observations are made continuously in time. The term continuous is used for a series of this type even when the measured variable can take only a discrete set of values.

Discrete Time Series

A time series is said to be discrete when observations are taken at specific times, usually equally spaced. The term discrete is used for a series of this type even when the measured variable is continuous.

We can write a series as $\{x_1, x_2, x_3, \cdots, x_T\}$ or $\{x_t\}$, where $t = 1, 2, 3, \cdots, T$, and $x_t$ is treated as a random variable. Notationally, the main difference between time-series variables and other variables is the use of the time subscript.

Time series analysis comprises methods for analyzing time-series data to extract some useful (meaningful) statistics and other characteristics of the data, while time-series forecasting is the use of a model to predict future values based on previously observed values.

The first step in analyzing time-series data is to plot the given series on a graph, taking time intervals ($t$) along the $X$-axis (as the independent variable) and the observed values ($Y_t$) on the $Y$-axis (as the dependent variable). Such a graph will show various types of fluctuations and other points of interest.
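As a sketch of this first step, any series can be plotted with time on the $X$-axis and the observed values on the $Y$-axis (the monthly figures below are made up for illustration; `matplotlib` is assumed to be available):

```python
import matplotlib
matplotlib.use("Agg")            # render without a display; drop this line for interactive use
import matplotlib.pyplot as plt

t = list(range(1, 13))           # time points: months 1..12
y = [12, 14, 13, 17, 19, 18, 22, 25, 23, 21, 18, 16]   # observed values (illustrative)

plt.plot(t, y, marker="o")
plt.xlabel("Time (month)")       # independent variable on the X-axis
plt.ylabel("$Y_t$")              # dependent variable on the Y-axis
plt.title("Time series plot")
plt.savefig("ts_plot.png")
```

Inspecting such a plot is usually enough to spot trends, seasonal swings, and unusual observations before any formal modeling.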

https://itfeature.com statistics help


Homoscedasticity: Constant Variance of a Random Variable (2020)

The term “Homoscedasticity” refers to the assumption about the random variable $u$ (the error term) that its probability distribution remains the same for all observations of $X$; in particular, the variance of each $u$ is the same for all values of the explanatory variables, i.e., the variance of the errors is the same across all levels of the independent variables (homoscedasticity: the assumption of constant variance of a random variable). Symbolically, it can be represented as

$$Var(u_i) = E\{u_i - E(u_i)\}^2 = E(u_i^2) = \sigma_u^2 = \mbox{constant}$$

This assumption is known as the assumption of homoscedasticity, or the assumption of constant variance of the error term $u$. It means that the variation of each $u_i$ around its zero mean does not depend on the values of $X$ (the independent variable). In practice, however, the error term absorbs influences on the dependent variable that can make its variance change with $X$, for example:

  • Errors in measurement
    Errors of measurement tend to be cumulative over time, and it is also difficult to collect the data and check its consistency and reliability; so the variance of $u_i$ tends to increase with increasing values of $X$.
  • Omitted variables
    Variables omitted from the function (regression model) tend to change in the same direction as $X$, causing an increase in the variance of the observations about the regression line.

Under homoscedasticity, the variance of each $u_i$ remains the same irrespective of small or large values of the explanatory variable, i.e., $\sigma_u^2$ is not a function of $X_i$: $\sigma_{u_i}^2 \ne f(X_i)$.


Consequences if Homoscedasticity is not Met

If the assumption of homoscedastic disturbance (Constant Variance) is not fulfilled, the following are the Heteroscedasticity consequences:

  1. We cannot apply the usual formulas for the variances of the coefficients to conduct tests of significance and construct confidence intervals; the tests are inapplicable: $Var(\hat{\beta}_0)=\sigma_u^2 \frac{\sum X_i^2}{n \sum x_i^2}$ and $Var(\hat{\beta}_1) = \frac{\sigma_u^2}{\sum x_i^2}$, where $x_i = X_i - \bar{X}$.
  2. If $u$ (the error term) is heteroscedastic, the OLS (Ordinary Least Squares) estimates do not have the minimum variance property in the class of unbiased estimators, i.e., they are inefficient in small samples. Furthermore, they are also inefficient in large samples (that is, asymptotically inefficient).
  3. The coefficient estimates would still be statistically unbiased even if the $u$’s are heteroscedastic. The $\hat{\beta}$’s will have no statistical bias, i.e., $E(\hat{\beta}_i)=\beta_i$ (the coefficients’ expected values will be equal to the true parameter values).
  4. The prediction would be inefficient because the variance of the prediction includes the variance of $u$ and of the parameter estimates, which are not minimal due to the incidence of heteroscedasticity; i.e., the prediction of $Y$ for a given value of $X$, based on the $\hat{\beta}$’s estimated from the original data, would have a high variance.

Tests for Homoscedasticity

Some tests commonly used for testing the assumption of homoscedasticity are:

  • The Breusch–Pagan test
  • White’s general heteroscedasticity test
  • The Goldfeld–Quandt test
  • The Park test
  • The Glejser test
  • The Spearman rank correlation test

Reference:
A. Koutsoyiannis (1972). “Theory of Econometrics”. 2nd Ed.


Important MCQs Statistics Online Test 10

This quiz contains an MCQs Statistics Online Test with answers covering variables and types of variables; measures of central tendency such as the mean, median, mode, and weighted mean; data and types of data; sources of data; and measures of dispersion/variation such as the standard deviation, variance, and range. Let us start the MCQs Statistics Online Test in preparation for the PPSC Statistics Lecturer Post.

1. If a constant value 5 is subtracted from each observation of a set, the variance is:

2. A set of values is said to be relatively uniform if it has:

3. The correct relationship between AM, GM, and HM is

4. Statistics results are:

5. The sum of absolute deviations about the median is

6. Which measure of dispersion ensures the highest degree of reliability?

7. Which measure of dispersion is the least affected by extreme values?

8. If each observation of a set is divided by 10, the standard deviation of the new observation is:

9. Measurements usually provide:

10. Commodities subject to considerable price variations could best be measured by:

11. The Harmonic mean gives more weightage to:

12. The extreme values in negatively skewed distribution lie in the:

13. Cumulative frequency is

14. The appropriate average for calculating the average percentage increase in population is

15. Data Classified by attributes are called:

16. Statistics are aggregates of

17. Which mean is most affected by extreme values?

18. When mean, median, and mode are identical, the distribution is:

19. The measures of dispersion are changed by the change of:

20. The sum of the square of the deviations about the mean is:



Introductory statistics deals with measures of central tendency (which include the mean (the arithmetic mean, also known as the average), median, mode, weighted mean, geometric mean, and harmonic mean) and measures of dispersion (such as the range, standard deviation, and variance).

Introductory statistical methods include planning and designing the study, collecting data, and arranging and summarizing the collected data numerically and graphically. Basic statistics are also used to perform different statistical analyses to draw meaningful inferences.

MCQs Statistics Online Test

A basic visual inspection of the data, using graphical displays together with numerical statistics, may reveal useful information hidden in the data. Graphical representations include the bar chart, pie chart, dot chart, box plot, etc.

Companies in finance, communication, and manufacturing, charity organizations, government institutes, and businesses from small to large are all examples of organizations with a massive interest in collecting data and computing different sorts of statistical findings. This helps them learn from the past, notice trends, and plan for the future.



The Z-Score Definition, Formula, Real Life Examples (2020)

Z-Score Definition: The Z-score, also referred to as the standardized raw score (or simply the standard score), is a useful statistic because it not only permits computation of the probability (chance or likelihood) of a raw score occurring within a normal distribution but also helps to compare two raw scores from different normal distributions. The Z-score is a dimensionless measure, since it is derived by subtracting the population mean from an individual raw score and then dividing this difference by the population standard deviation. This computational procedure is called standardizing the raw score, and it is often used in the Z-test of hypothesis testing.

Any raw score can be converted to a Z-score by the formula

$$Z = \frac{\text{raw score} - \text{mean}}{\sigma}$$

Z-Score Real Life Examples

Example 1: If the mean = 100 and the standard deviation = 10, what would be the Z-scores of the following raw scores?

| Raw Score | Z-Score |
|-----------|---------|
| 90 | $\frac{90-100}{10}=-1$ |
| 110 | $\frac{110-100}{10}=1$ |
| 70 | $\frac{70-100}{10}=-3$ |
| 100 | $\frac{100-100}{10}=0$ |

Note that: If Z-Score,

  • has a zero value then it means that the raw score is equal to the population mean.
  • has a positive value then it means that the raw score is above the population mean.
  • has a negative value then it means that the raw score is below the population mean.
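A tiny helper makes the formula and the sign convention above concrete (a minimal sketch; the function name is an arbitrary choice):

```python
def z_score(raw, mean, sigma):
    """Standardize a raw score: (raw - mean) / sigma."""
    return (raw - mean) / sigma

# The four raw scores from Example 1, with mean = 100 and sigma = 10:
for raw in (90, 110, 70, 100):
    print(raw, "->", z_score(raw, 100, 10))
```

A zero result means the raw score equals the mean; positive and negative results mean the raw score lies above or below the mean, matching the notes above.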

Example 2: Suppose you got 80 marks in one exam for a class and 70 marks in another exam for that class, and you are interested in finding out in which exam you performed better. Suppose also that the mean and standard deviation of exam 1 are 90 and 10, and those of exam 2 are 60 and 5, respectively. Converting both exam marks (raw scores) into standard scores, we get

$Z_1=\frac{80-90}{10} = -1$

The Z-score results ($Z_1=-1$) show that 80 marks are one standard deviation below the class mean.

$Z_2=\frac{70-60}{5}=2$

The Z-score results ($Z_2=2$) show that 70 marks are two standard deviations above the mean.

Comparing $Z_1$ and $Z_2$ shows that, relative to the class, you performed better in the second exam than in the first. Another way to interpret the Z-score of $-1$ is that about 34.13% of the students scored between your mark and the class average, with your mark below the average. Similarly, the Z-score of 2 implies that about 47.72% of the students scored between the class average and your mark.
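These percentages are areas under the standard normal curve and can be checked with the standard normal CDF, $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ (a sketch using only the Python standard library):

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area between z = -1 and z = 0 (proportion of the class between your mark and the mean):
area_minus1_to_0 = phi(0) - phi(-1)
# Area between z = 0 and z = 2:
area_0_to_2 = phi(2) - phi(0)

print(round(area_minus1_to_0, 4), round(area_0_to_2, 4))  # → 0.3413 0.4772
```

The same function can replace a printed standard normal table lookup for any Z-score.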

Applications of the Z-Score

  • Identifying Outliers: The standard score can help in identifying outliers in a dataset. By looking for data points with very large negative or positive z-scores, one can easily flag potential outliers that might warrant further investigation.
  • Comparing Data Points from Different Datasets: Z-scores allow us to compare data points from different datasets because these scores are expressed in standard-deviation units.
  • Standardizing Data for Statistical Tests: Some statistical tests require normally distributed data. The Z-score can be used to standardize data (transforming it to have a mean of 0 and a standard deviation of 1), making it suitable for such tests.

Limitations of Z-Scores

  • Assumes Normality: Z-scores are most interpretable when the data is normally distributed (a bell-shaped curve). If the data is significantly skewed, the scores may be less informative.
  • Sensitive to Outliers: The presence of extreme outliers can significantly affect the calculation of the mean and standard deviation, which in turn affects the standard scores of all data points.

In conclusion, z-scores are a valuable tool for understanding the relative position of a data point within its dataset. The standard score offers a standardized way to compare data points, identify outliers, and prepare data for statistical analysis. However, it is important to consider the assumptions of normality and the potential influence of outliers when interpreting the z-scores.

Read about Standard Normal Table
