Coefficient of Determination: Model Selection (2012)

$R^2$, pronounced R-squared (the coefficient of determination), is a useful statistic for checking the quality of a regression fit. $R^2$ measures the proportion of the total variation about the mean $\bar{Y}$ that is explained by the regression. $R$ is the correlation between $Y$ and $\hat{Y}$ and is usually called the multiple correlation coefficient. The coefficient of determination satisfies $0\le R^2\le 1$ and can take values as high as 1 (100%) when all the $X$ values are different, that is, when there are no repeat runs.


When repeat runs exist in the data, the value of $R^2$ cannot attain 1, no matter how well the model fits, because no model can explain the variation in the data due to pure error. For a perfect fit to the data, in which $\hat{Y}_i=Y_i$ for every $i$, $R^2=1$. If $\hat{Y}_i=\bar{Y}$, that is, if $\beta_1=\beta_2=\cdots=\beta_{p-1}=0$ or if the model $Y=\beta_0 +\varepsilon$ alone has been fitted, then $R^2=0$. Therefore, $R^2$ is a measure of the usefulness of the terms other than $\beta_0$ in the model.

Note that an increase in $R^2$ obtained by adding a new term (variable) to the model under study should have some real significance and should not arise merely because the number of parameters in the model is getting close to the saturation point, that is, close to the number of observations. If there is no pure error, $R^2$ can be made unity.

\begin{align*}
R^2 &= \frac{\text {SS due to regression given}\, b_0}{\text{Total SS corrected for mean} \, \bar{Y}} \\
&= \frac{SS \, (b_1 | b_0)}{S_{YY}} \\
&= \frac{\sum(\hat{Y}_i-\bar{Y})^2} {\sum(Y_i-\bar{Y})^2} \\
&= \frac{S^2_{XY}}{(S_{XX})(S_{YY})}
\end{align*}

where the summations are over $i=1,2,\cdots, n$, and the last form applies to a straight-line fit with a single predictor $X$.
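The ratio above can be computed directly. The following is a minimal Python sketch for a straight-line fit, assuming a small made-up data set (the values of `x` and `y` are hypothetical and chosen only for illustration). It computes $R^2$ both as the regression sum of squares over the corrected total sum of squares and as $S^2_{XY}/(S_{XX}S_{YY})$, where $S_{XX}$, $S_{YY}$, and $S_{XY}$ denote the corrected sums of squares and cross-products; the two values should agree up to rounding error.

```python
# Hypothetical data, used only to illustrate the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def r_squared(x, y):
    """Return R^2 for a straight-line least-squares fit, computed two ways."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross-products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares estimates b1 (slope) and b0 (intercept).
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # Way 1: SS due to regression over total SS corrected for the mean.
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
    r2_from_ss = ss_regression / s_yy

    # Way 2: squared cross-product over the product of the sums of squares.
    r2_from_sxy = s_xy ** 2 / (s_xx * s_yy)
    return r2_from_ss, r2_from_sxy


r2_from_ss, r2_from_sxy = r_squared(x, y)
print(r2_from_ss, r2_from_sxy)
```

With more than one predictor, only the first form (the ratio of sums of squares) carries over; the second is specific to the single-predictor case.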


Interpreting R-Square

$R^2$ does not indicate whether:

  • the independent variables (explanatory variables) are a cause of the changes in the dependent variable;
  • omitted-variable bias exists;
  • the correct regression was used;
  • the most appropriate set of explanatory variables has been selected;
  • there is collinearity (or multicollinearity) present in the data;
  • the model might be improved using transformed versions of the existing explanatory variables.


What is Pseudo Random Process (2012)

Pseudo Random Process

A pseudo-random process is one that generates a sequence of numbers or events that appears random but is actually determined by a fixed set of rules. Pseudo-random sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuinely random one, and it has the benefit that it can be rerun to produce exactly the same numbers, which is useful for testing and fixing software.

The generation of random numbers has many uses, mostly in statistics (random sampling, simulation, computer modeling, Markov chains, and experimental design). Before modern computing, researchers requiring random numbers would either generate them by physical means such as coins, dice, roulette wheels, or card shuffling, or use existing random number tables.


A pseudo-random variable is a variable that is created by a deterministic procedure (often a computer program or subroutine is used) which (generally) takes random bits as input. The pseudo random string will typically be longer than the original random string, but less random (less entropic, in the information-theory sense). This can be useful for randomized algorithms.

Pseudo-random numbers are computer-generated numbers that are not truly random, because there is an inherent pattern in any sequence of pseudo-random numbers.

A question arises here: why use something that is not truly random? The reasons for using a pseudo-random process are:

  • Speed and Efficiency: Generating pseudo-random numbers is much faster and more efficient than using true random sources like physical processes.
  • Reproducibility: Using the same seed, one can reproduce the same sequence of pseudo-random numbers, which is useful for debugging or comparing simulations.
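The reproducibility point can be illustrated with a short Python sketch using the standard library's `random` module. The helper name `draw` and the seed values are hypothetical and chosen only for illustration.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random numbers in [0, 1) from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator with its own state
    return [rng.random() for _ in range(n)]


first = draw(2012)
second = draw(2012)  # same seed, so the same sequence is reproduced
other = draw(2013)   # different seed, so a different sequence is expected

print(first == second)
print(first == other)
```

Using a dedicated `random.Random` instance instead of the module-level functions keeps the seeded state local, so other code that draws random numbers does not disturb the sequence.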


Linear Congruential Generator (LCG)

A linear congruential generator (LCG) is an algorithm that produces a sequence of pseudo-random numbers. It is one of the oldest and best-known pseudo-random number generator methods.

The building block of a simulation study is the ability to generate random numbers, where a random number represents the value of a random variable uniformly distributed on (0,1).

The recurrence relation defines the generator:

\[X_{i+1}=(aX_i+c) \bmod m\]

where the multiplier $a$, the increment $c$, and the modulus $m$ are given integers, each $X_i$ is one of $0,1, \dots, m-1$, and the quantity $\frac{X_i}{m}$ is the pseudo-random number.
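The recurrence translates almost directly into code. Below is a minimal Python sketch; the function name `lcg` is hypothetical, and the parameter values in the usage lines ($a=1664525$, $c=1013904223$, $m=2^{32}$, seed 42) are an illustrative choice that does not come from this article.

```python
def lcg(a, c, m, seed):
    """Yield pseudo-random numbers X_i / m using X_{i+1} = (a * X_i + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m  # next state, always in 0, 1, ..., m - 1
        yield x / m          # scaled to the interval [0, 1)


# Illustrative parameters (an assumption, not taken from the article).
gen = lcg(a=1664525, c=1013904223, m=2**32, seed=42)
sample = [next(gen) for _ in range(5)]
print(sample)
```

Because the state is a single integer, restarting the generator with the same seed and parameters reproduces the same sequence.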

Conditions for Linear Congruential Generator

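The parameters must satisfy conditions 1 to 4. When $c\neq 0$, conditions 5 to 7 together ensure that the generator has the full period $m$ for every seed: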

  1. $m>0$;  $m$ is usually large
  2. $0<a<m$;  ($a$ is the multiplier)
  3. $0\le c<m$ ($c$ is the increment)
  4. $0\le X_0 <m$ ($X_0$ is seed value or starting value)
  5. $c$ and $m$ are relatively prime (there is no common factor between $c$ and $m$ other than 1).
  6. $a-1$ is a multiple of every prime factor of $m$
  7. $a-1$ is a multiple of 4 if $m$ is a multiple of 4
Figure: Linear Congruential Generator (image not reproduced). Source: https://en.wikipedia.org/wiki/Linear_congruential_generator

“Two modulo-9 LCGs show how different parameters lead to different cycle lengths. Each row shows the state evolving until it repeats. The top row shows a generator with $m=9, a=2, c=0$, and a seed of 1, which produces a cycle of length 6. The second row is the same generator with a seed of 3, which produces a cycle of length 2. Using $a=4$ and $c=1$ (bottom row) gives a cycle length of 9 with any seed in [0, 8].”
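The cycle lengths quoted above can be checked with a few lines of Python. This is a sketch under the assumption that a cycle is detected by iterating the recurrence until a state repeats; the function names `cycle_length`, `prime_factors`, and `has_full_period` are hypothetical. The last function tests the three full-period conditions listed earlier (the conditions on $c$ and on $a-1$).

```python
from math import gcd


def cycle_length(a, c, m, seed):
    """Length of the cycle the state eventually enters, starting from `seed`."""
    seen = {}  # state -> step at which it was first visited
    x = seed
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - seen[x]


def prime_factors(n):
    """Set of prime factors of n, found by trial division."""
    factors = set()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def has_full_period(a, c, m):
    """Check the three full-period conditions on c, a - 1, and m."""
    if gcd(c, m) != 1:  # c and m must be relatively prime
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True


print(cycle_length(2, 0, 9, 1))                        # top row of the figure
print(cycle_length(2, 0, 9, 3))                        # second row
print([cycle_length(4, 1, 9, s) for s in range(9)])    # bottom row, every seed
print(has_full_period(4, 1, 9), has_full_period(2, 0, 9))
```

Trial division is adequate for the small moduli used here; a realistic modulus such as a power of two has a single prime factor, so the check stays cheap.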

If $c=0$, the generator is often called a multiplicative congruential generator, or Lehmer RNG. If $c\neq0$, the generator is called a mixed congruential generator.

FAQs about linear congruential generator (LCG)

  1. What is meant by a linear congruential generator?
  2. How are random numbers generated?
  3. What are the conditions for linear congruential generators?
  4. What is meant by a multiplicative or mixed congruential generator?

Statistical Simulation: Introduction and Issues (2012)

This article introduces statistical simulation, its uses in various fields, and the issues that arise when running simulations.

Simulation is used before an existing system is altered or a new system is built, to reduce the chances of failure to meet specifications, eliminate unforeseen bottlenecks, prevent under- or over-utilization of resources, and optimize system performance. Simulation is used in many contexts, such as simulation of technology for performance optimization, safety engineering, training, testing, education, and video games. Often, computer experiments are used to study simulation models, which are simulated versions of the system under study.

Uses of Statistical Simulations

Statistical simulations are widely used in many fields:

  • Science: Scientists use statistical simulations to model complex systems, such as the climate or the spread of disease.
  • Business: Businesses use statistical simulations to forecast sales, evaluate the risks of new investments, and design logistics networks.
  • Government: Governments use simulations to model the effects of economic policies, assess the risks of natural disasters, and plan for future events.
  • Gambling: Casinos use simulations to design games that are fair and profitable.

A statistical simulation depends on parameters (or external factors) that are unknown in practice, and statistical tools depend on estimates of them. In statistics, simulation is used to assess the performance of a method, typically when there is a lack of theoretical background. With simulations, the statistician knows and controls the truth.


Statistical Assumptions about Simulated Data

In simulation, data is generated artificially to test out a hypothesis or statistical method. Whenever a new statistical method is developed (or used), some assumptions need to be tested and verified (or confirmed). Statisticians use simulated data to test these assumptions.

  • The simulation follows finite sample properties (have to specify $n$)
  • The reasoning of statistical simulation cannot be proved mathematically.
  • Simulation is used to illustrate things.
  • Simulation is used to check the validity of methods.
  • Simulation is a technique of representing the real world via a computer program.
  • A simulation is an act of imitating the behavior of some situation or process by means of something suitably analogous (especially for study or personal training).
  • A simulation is a representation of something (usually on a smaller scale).
  • Simulation is the act of giving a false/artificial appearance.

In summary, statistical simulation is a technique used to imitate the behavior of a system or process under various conditions. It involves creating a computer model of the system and running the model repeatedly with different inputs. The outputs of the model are then analyzed to learn about the behavior of the real system.
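As one illustration of how the statistician "knows and controls the truth", the following Python sketch estimates how often a 95% confidence interval for a mean covers the true mean. Everything specific here is an assumption made for the example: the function name `coverage`, normally distributed data with true mean 10 and known standard deviation 2, the sample size, the number of runs, and the seed.

```python
import random
import statistics


def coverage(n, runs, mu=10.0, sigma=2.0, seed=2012):
    """Estimate the coverage of a z-interval for the mean by simulation.

    mu and sigma are the true values, which are known because the data
    are generated artificially.
    """
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    z = 1.96                   # approximate 97.5th percentile of N(0, 1)
    half_width = z * sigma / n ** 0.5  # sigma is treated as known
    hits = 0
    for _ in range(runs):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / runs


estimate = coverage(n=30, runs=2000)
print(estimate)
```

The estimate should land near the nominal 0.95, with some run-to-run variation that shrinks as the number of runs grows. The same pattern (generate data from a known truth, apply the method, record whether it succeeded) can be reused to check other methods.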


Issues In Statistical Simulation

  • What distribution does the random variable have?
  • How do we generate these random variables for simulation?
  • How do we analyze the output of simulations?
  • How many simulation runs do we need?
  • How do we improve the efficiency of the simulation?
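As a rough guide to the question of how many runs are needed, one common approach (offered here as a sketch, with the symbols $N$, $\sigma$, and $E$ introduced only for this illustration) uses the Monte Carlo standard error. If the quantity recorded in each run has standard deviation $\sigma$ across independent runs, its average over $N$ runs has standard error $\sigma/\sqrt{N}$. Requiring a margin of error of at most $E$ at roughly 95% confidence, under a normal approximation, gives

\[N \ge \left(\frac{1.96\,\sigma}{E}\right)^2\]

For example, when estimating a proportion near $p=0.95$, the per-run variance is $\sigma^2=p(1-p)=0.0475$, so a margin of $E=0.01$ requires

\[N \ge \frac{(1.96)^2 \times 0.0475}{(0.01)^2} \approx 1825\]

runs. Halving the margin of error quadruples the required number of runs.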

FAQs about Statistical Simulations

  1. What is meant by simulation in statistics?
  2. What random data is generated using simulation?
  3. What are the uses of simulations?
  4. What are the issues in Statistical simulations?
  5. What are statistical assumptions about generated data?
