Historigram (2020)

Here we will discuss the graphical representation of time series data called a historigram.

As discussed in the introduction to time series, the first step in analyzing an observed time series is to plot it on a graph, taking the time intervals ($t$) along the $X$-axis (as the independent variable) and the observed values ($Y_t$) along the $Y$-axis (as the dependent variable, a function of time). Such a graph reveals the various types of fluctuations and other points of interest.

A historigram is a graphical representation of a time series that reveals the changes that occurred at different time periods. Because the first step in predicting (forecasting) a time series is to examine the set of past observations, the historigram is a useful tool at this stage. It is constructed as follows:

  • Use an appropriate scale and take time $t$ along the $x$-axis as the independent variable.
  • Use an appropriate scale and plot the observed values of the variable $Y$, as the dependent variable, against the given points of time.
  • Join the plotted points by line segments to get the required graphical representation.

Historigram Example

Draw a graphical representation of the data to show the population of Pakistan in various census years.

Census Year               1951    1961    1972    1981    1998    2017
Population (Million)     33.44   42.88   65.31   83.78  130.58  200.17
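The three construction steps can be sketched in Python using the census figures from the table. This is a minimal sketch, assuming matplotlib is installed for the drawing step; the helper names `historigram_points`, `segment_changes`, and `draw_historigram` are hypothetical, chosen only for illustration.

```python
# Census data from the table above (population in millions).
CENSUS_YEARS = [1951, 1961, 1972, 1981, 1998, 2017]
POPULATION = [33.44, 42.88, 65.31, 83.78, 130.58, 200.17]


def historigram_points(times, values):
    """Steps 1 and 2: pair each time point t with its observed value Y_t, in time order."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return sorted(zip(times, values))


def segment_changes(points):
    """Step 3 joins consecutive points; report the change in Y along each segment."""
    return [(t0, t1, y1 - y0) for (t0, y0), (t1, y1) in zip(points, points[1:])]


def draw_historigram(points, xlabel="Census Year", ylabel="Population (Million)"):
    """Draw the historigram: time on the x-axis, observed values on the y-axis, joined by lines."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without matplotlib

    times, values = zip(*points)
    fig, ax = plt.subplots()
    ax.plot(times, values, marker="o")  # markers show the plotted points, lines join them
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title("Historigram")
    return fig


points = historigram_points(CENSUS_YEARS, POPULATION)
for start, end, change in segment_changes(points):
    print(f"{start}-{end}: {change:+.2f} million")
```

To display the chart, call `fig = draw_historigram(points)` and then `matplotlib.pyplot.show()`. The printed segment changes are the rises that the joined line segments make visible on the graph.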
Figure: Historigram of the population of Pakistan in various census years

https://itfeature.com statistics help

Time Series Introduction (2020)

Here we will discuss Time Series Data and Time Series Analysis.

The sequence $y_1, y_2, \cdots, y_n$ of $n$ observations of a variable (say $Y$), recorded in accordance with their times of occurrence $t_1, t_2, \cdots, t_n$, is called a time series. Symbolically, the variable $Y$ can be expressed as a function of time $t$ as

$$y = f(t) + e,$$

where $f(t)$ is a completely determined (specified) sequence that follows some systematic pattern of variation, and $e$ is a random error (probabilistic component) that follows an irregular pattern of variation.

Signal: The signal is the systematic component of variation in a time series (here $f(t)$).

Noise: The noise is the irregular component of variation in a time series (here $e$).

Examples of time series include:

  • The hourly temperature recorded at a weather bureau
  • The total annual yield of wheat over a number of years
  • The monthly sales of fertilizer at a store
  • The enrollment of students in various years in a college
  • The daily sales at a department store
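The decomposition $y = f(t) + e$ can be illustrated by simulation. The sketch below assumes, purely for illustration, a linear signal $f(t) = 10 + 0.5t$ and normally distributed noise; neither choice comes from the text, and the names `simulate_series` and `linear_signal` are hypothetical.

```python
import random


def simulate_series(n, signal, noise_sd, seed=0):
    """Generate y_t = f(t) + e_t for t = 1, ..., n.

    signal: the systematic component f(t), any function of t
    noise_sd: standard deviation of the irregular component e_t (assumed normal)
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    times = list(range(1, n + 1))
    f = [signal(t) for t in times]                 # signal
    e = [rng.gauss(0.0, noise_sd) for _ in times]  # noise
    y = [f_t + e_t for f_t, e_t in zip(f, e)]      # observed series
    return times, f, e, y


def linear_signal(t):
    """Hypothetical systematic pattern: a steady upward trend."""
    return 10 + 0.5 * t


times, f, e, y = simulate_series(12, linear_signal, noise_sd=1.0)
for t, y_t in zip(times, y):
    print(t, round(y_t, 2))
```

Setting `noise_sd=0` leaves only the signal, which is one way to see what each component contributes to the observed values.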

Time Series

A time series $\{Y_t\}$ or $\{y_1, y_2, \cdots, y_T\}$ is a discrete-time, continuous-state process, where the times $t = 1, 2, \cdots, T$ are discrete time points spaced at uniform intervals.

A sequence of random variables indexed by time is called a stochastic process (stochastic means random). A data set is one possible outcome (realization) of the stochastic process. If history had been different, we would have observed a different outcome; thus, we can think of a time series as the outcome of a random variable.

Usually, time is taken at more or less equally spaced intervals such as minutes, hours, days, months, quarters, years, etc. More specifically, a time series is a set of data in which observations are arranged in chronological order, that is, a set of repeated observations of the same variable arranged according to time.

Time series analysis is performed in many fields of science, such as signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering.

Continuous Time Series

A time series is said to be continuous when observations are made continuously in time. The term continuous is used for a series of this type even when the measured variable can take only a discrete set of values.

Discrete Time Series

A time series is said to be discrete when observations are taken at specific times, usually equally spaced. The term discrete is used for a series of this type even when the measured variable is continuous.

We can write a series as $\{x_1, x_2, x_3, \cdots, x_T\}$ or $\{x_t\}$, where $t = 1, 2, 3, \cdots, T$. Each $x_t$ is treated as a random variable. The notational difference between time-series variables and other variables is the use of subscripts.
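Arranging observations in chronological order and indexing them as $x_1, x_2, \cdots, x_T$ can be sketched as follows. The function name `to_discrete_series` and the daily readings are illustrative assumptions; the spacing check only reports whether all consecutive gaps are equal, which is the usual (not mandatory) case for a discrete time series.

```python
from datetime import date


def to_discrete_series(observations):
    """Return ({t: x_t}, equally_spaced) for a list of (time, value) pairs.

    Observations are sorted into chronological order and re-indexed t = 1, ..., T.
    equally_spaced is True when every gap between consecutive times is the same.
    """
    ordered = sorted(observations, key=lambda pair: pair[0])  # chronological order
    gaps = {later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])}
    equally_spaced = len(gaps) <= 1
    series = {t: value for t, (_, value) in enumerate(ordered, start=1)}
    return series, equally_spaced


readings = [
    (date(2020, 1, 3), 12.0),
    (date(2020, 1, 1), 10.0),
    (date(2020, 1, 2), 11.5),
]
series, regular = to_discrete_series(readings)
print(series, regular)
```

Adding a reading for 5 January (a two-day gap after 3 January) would make the series irregularly spaced, and the function would report `False`.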

Time series analysis comprises methods for analyzing time-series data to extract some useful (meaningful) statistics and other characteristics of the data, while time-series forecasting is the use of a model to predict future values based on previously observed values.
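As a minimal illustration of forecasting from previously observed values, the sketch below uses a simple moving average: the forecast for the next period is the mean of the last `window` observations. This is one of the simplest possible models, chosen here only as an illustrative assumption; the function name and the sales figures are hypothetical.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 1 <= window <= len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]  # the most recent observations
    return sum(recent) / window


monthly_sales = [120, 132, 128, 141, 150, 147]  # hypothetical monthly sales
print(moving_average_forecast(monthly_sales, window=3))
```

A larger `window` smooths out more of the irregular variation but reacts more slowly to changes in the systematic pattern.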

The first step in analyzing time-series data is to plot the given series on a graph, taking time intervals ($t$) along the $X$-axis (as the independent variable) and the observed values ($Y_t$) along the $Y$-axis (as the dependent variable). Such a graph will show various types of fluctuations and other points of interest.

Homoscedasticity: Constant Variance of a Random Variable (2020)

The term "homoscedasticity" refers to the assumption about the random variable $u$ (the error term) that its probability distribution remains the same for all observations of $X$; in particular, the variance of each $u_i$ is the same for all values of the explanatory variables, i.e., the variance of the errors is the same across all levels of the independent variables. Symbolically, it can be represented as

$$Var(u_i) = E\{u_i - E(u_i)\}^2 = E(u_i^2) = \sigma_u^2 = \mbox{constant}$$

This assumption is known as the assumption of homoscedasticity, or the assumption of constant variance of the error terms $u_i$. It means that the variation of each $u_i$ around its zero mean does not depend on the values of $X$ (the independent variable). The assumption may fail in practice, because the error term expresses the influence on the dependent variable of factors such as:

  • Errors in measurement
    Errors of measurement tend to be cumulative over time. It is also difficult to collect the data and check their consistency and reliability. As a result, the variance of $u_i$ increases as the values of $X$ increase.
  • Omitted variables
    Variables omitted from the function (regression model) tend to change in the same direction as $X$, causing the variance of the observations around the regression line to increase.

Under homoscedasticity, the variance of each $u_i$ remains the same irrespective of small or large values of the explanatory variable, i.e., $\sigma_u^2$ is not a function of $X_i$: $\sigma_{u_i}^2 \ne f(X_i)$.
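The difference between a constant $\sigma_{u_i}^2$ and one that depends on $X_i$ can be seen by simulation. This is a minimal sketch under assumed error models: a constant standard deviation of 2 for the homoscedastic case, and a standard deviation of $0.5X_i$ (growing with $X$, as in the measurement-error and omitted-variable cases above) for the heteroscedastic case. It compares the variance of $u$ in the lower and upper halves of $X$; the function names are hypothetical.

```python
import random
import statistics


def simulate_errors(xs, sd_of_x, seed=0):
    """Draw u_i ~ N(0, sd_of_x(X_i)^2) independently for each X_i (assumed normal errors)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd_of_x(x)) for x in xs]


def half_variances(xs, us):
    """Variance of u for the smaller half and for the larger half of the X values."""
    pairs = sorted(zip(xs, us))
    mid = len(pairs) // 2
    low = [u for _, u in pairs[:mid]]
    high = [u for _, u in pairs[mid:]]
    return statistics.pvariance(low), statistics.pvariance(high)


xs = [1 + 9 * i / 999 for i in range(1000)]      # X from 1 to 10
homo = simulate_errors(xs, lambda x: 2.0)        # sigma_u does not depend on X
hetero = simulate_errors(xs, lambda x: 0.5 * x)  # sigma_u grows with X

for label, us in [("homoscedastic", homo), ("heteroscedastic", hetero)]:
    low, high = half_variances(xs, us)
    print(f"{label}: Var(u | small X) = {low:.2f}, Var(u | large X) = {high:.2f}")
```

In the homoscedastic case the two halves give similar variances; in the heteroscedastic case the variance for large $X$ is several times that for small $X$.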

Consequences if Homoscedasticity is not met

If the assumption of homoscedastic disturbances (constant variance) is not fulfilled, the consequences of heteroscedasticity are as follows:

  1. We cannot apply the usual formulas for the variances of the coefficients, $Var(\hat{\beta}_0) = \sigma_u^2 \left\{\frac{\sum X_i^2}{n \sum x_i^2}\right\}$ and $Var(\hat{\beta}_1) = \sigma_u^2 \left\{\frac{1}{\sum x_i^2}\right\}$, where $x_i = X_i - \bar{X}$, to conduct tests of significance and construct confidence intervals; tests based on these formulas are inapplicable.
  2. If $u$ (the error term) is heteroscedastic, the OLS (Ordinary Least Squares) estimates do not have the minimum variance property in the class of unbiased estimators, i.e., they are inefficient in small samples. Furthermore, they are also inefficient in large samples (that is, asymptotically inefficient).
  3. The coefficient estimates would still be statistically unbiased even if the $u$'s are heteroscedastic. The $\hat{\beta}$'s will have no statistical bias, i.e., $E(\hat{\beta}_i) = \beta_i$ (the coefficients' expected values equal the true parameter values).
  4. Predictions would be inefficient, because the variance of a prediction includes the variance of $u$ and of the parameter estimates, which are not minimal in the presence of heteroscedasticity; i.e., the prediction of $Y$ for a given value of $X$, based on the estimates $\hat{\beta}$'s from the original data, would have a high variance.
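Consequences 1 and 3 can be checked with a small Monte Carlo experiment. This is a minimal sketch under assumed values ($\beta_0 = 2$, $\beta_1 = 0.5$, 50 equally spaced values of $X$ between 1 and 10, and normal errors with standard deviation $0.5X_i$); none of these come from the text, and the function names are hypothetical. With $x_i = X_i - \bar{X}$, the actual sampling variance of the OLS slope under heteroscedasticity is $\sum x_i^2\sigma_{u_i}^2 / (\sum x_i^2)^2$, which differs from the usual $\sigma_u^2/\sum x_i^2$ when a single average error variance is plugged in.

```python
import random
import statistics

BETA0, BETA1 = 2.0, 0.5                   # assumed true parameters
XS = [1 + 9 * i / 49 for i in range(50)]  # 50 equally spaced X values in [1, 10]


def error_sd(x):
    """Assumed heteroscedastic error model: the sd of u_i grows with X_i."""
    return 0.5 * x


def ols_slope(xs, ys):
    """OLS slope sum(x_i y_i) / sum(x_i^2), with x_i, y_i as deviations from the means."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def monte_carlo_slopes(n_reps=2000, seed=1):
    """Re-draw the errors n_reps times and re-estimate the slope each time."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        ys = [BETA0 + BETA1 * x + rng.gauss(0.0, error_sd(x)) for x in XS]
        slopes.append(ols_slope(XS, ys))
    return slopes


def slope_variances(xs):
    """Return (usual-formula variance, actual variance) of the OLS slope."""
    x_bar = statistics.fmean(xs)
    dev2 = [(x - x_bar) ** 2 for x in xs]  # x_i^2 with x_i = X_i - mean(X)
    sxx = sum(dev2)
    var_u = [error_sd(x) ** 2 for x in xs]
    usual = statistics.fmean(var_u) / sxx  # treats every u_i as having the average variance
    actual = sum(d * v for d, v in zip(dev2, var_u)) / sxx ** 2
    return usual, actual


slopes = monte_carlo_slopes()
usual, actual = slope_variances(XS)
print(f"mean of slope estimates: {statistics.fmean(slopes):.3f} (true value {BETA1})")
print(f"simulated Var(slope): {statistics.variance(slopes):.4f}")
print(f"usual formula: {usual:.4f}, heteroscedastic formula: {actual:.4f}")
```

The average slope stays close to $\beta_1$ (no bias), while the simulated variance tracks the heteroscedastic formula rather than the usual one, which is why tests built on the usual formula are unreliable here.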

Tests for Homoscedasticity

Some tests commonly used for testing the assumption of homoscedasticity are:

Reference:
A. Koutsoyiannis (1972). “Theory of Econometrics”. 2nd Ed.

MCQs Statistics Online Test 10

This quiz contains multiple-choice questions (MCQs) with answers covering variables and types of variables, measures of central tendency (mean, median, mode, weighted mean), data and types of data, sources of data, and measures of dispersion/variation (standard deviation, variance, range, etc.). Let us start the MCQs Statistics Online Test for the preparation of the PPSC Statistics Lecturer post.

1. If a constant value 5 is subtracted from each observation of a set, the variance is:

2. The measures of dispersion are changed by the change of:

3. Statistics are aggregates of

4. Statistical results are:

5. Which measure of dispersion is the least affected by extreme values?

6. Measurements usually provide:

7. Cumulative frequency is

8. The sum of absolute deviations about the median is

9. Which measure of dispersion ensures the highest degree of reliability?

10. A set of values is said to be relatively uniform if it has:

11. Data classified by attributes are called:

12. When mean, median, and mode are identical, the distribution is:

13. The correct relationship between AM, GM, and HM is

14. If each observation of a set is divided by 10, the standard deviation of the new observations is:

15. Commodities subject to considerable price variations could best be measured by:

16. The appropriate average for calculating the average percentage increase in population is

17. Which mean is most affected by extreme values?

18. The sum of the squares of the deviations about the mean is:

19. The harmonic mean gives more weight to:

20. The extreme values in a negatively skewed distribution lie in the:

Introductory statistics deals with measures of central tendency (the mean, also known as the arithmetic mean or average, median, mode, weighted mean, geometric mean, and harmonic mean) and measures of dispersion (such as the range, standard deviation, and variance).
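The measures listed above can be computed with Python's standard `statistics` module (Python 3.8 or later, for `geometric_mean`). The data set and weights below are made up for illustration, the helper name `summarize` is hypothetical, and the population forms of the variance and standard deviation (`pvariance`, `pstdev`) are used; `statistics.variance` and `statistics.stdev` give the sample forms with divisor $n-1$. The geometric and harmonic means require positive values.

```python
import statistics


def summarize(values, weights=None):
    """Measures of central tendency and dispersion for a list of positive numbers."""
    if weights is None:
        weights = [1] * len(values)  # equal weights reduce the weighted mean to the mean
    weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "weighted_mean": weighted_mean,
        "geometric_mean": statistics.geometric_mean(values),
        "harmonic_mean": statistics.harmonic_mean(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),  # population variance
        "std_dev": statistics.pstdev(values),      # population standard deviation
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]
stats = summarize(data)
for name, value in stats.items():
    print(f"{name:>15}: {value:.4f}")
```

For positive data such as this, the three means always satisfy $\text{AM} \ge \text{GM} \ge \text{HM}$.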

Introductory statistical methods include planning and designing the study, collecting data, and arranging and summarizing the collected data numerically and graphically. Basic statistics are also used to perform different statistical analyses to draw meaningful inferences.

A basic visual inspection of data using graphical summaries, together with numerical statistics, may reveal useful information hidden in the data. Graphical representations include bar charts, pie charts, dot charts, box plots, etc.

Companies in finance, communication, and manufacturing, charitable organizations, government institutes, and businesses from small to large all have a massive interest in collecting data and measuring different sorts of statistical findings. This helps them learn from the past, notice trends, and plan for the future.
