Markov Chain: An Introduction (2014)

A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another among a finite or countable number of possible states. A Markov chain is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of memorylessness is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

If the random variables $X_{n-1}$ and $X_n$ take the values $X_{n-1}=i$ and $X_n=j$, then the system has made a transition $S_i \rightarrow S_j$, that is, a transition from state $S_i$ to state $S_j$ at the $n$th trial. Note that $i$ can equal $j$, so a transition from a state to itself is possible. We need to assign probabilities to the transitions $S_i \rightarrow S_j$. In a general random process, the probability that $X_n=j$ may depend on the whole sequence of random variables starting with the initial value $X_0$.

The Markov chain has the characteristic property that the probability that $X_n=j$ depends only on the immediately previous state of the system. This means that, at each step, the only information we need is, for each $i$ and $j$, the conditional probability \[P\{X_n=j|X_{n-1}=i\}\]
that is, the probability that $X_n=j$ given that $X_{n-1}=i$. This probability is independent of the values of $X_{n-2},X_{n-3},\cdots, X_0$.

Consider a set of states $S=\{s_1,s_2,\cdots,s_n\}$. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state $s_i$, then it moves to state $s_j$ at the next step with probability $p_{ij}$, and this probability does not depend on which states the chain was in before the current state. The probabilities $p_{ij}$ are called transition probabilities ($s_i  \xrightarrow[]{p_{ij}} s_j$). The process can also remain in its current state, which occurs with probability $p_{ii}$.

https://itfeature.com

An initial probability distribution, defined on $S$, specifies the starting state. Usually, this is done by specifying a particular state as the starting state.

A Markov chain is a sequence of random variables $X_1, X_2,\cdots$ with the Markov property: given the present state, the future and past states are independent. Thus
\[P(X_n=x|X_1=x_1,X_2=x_2,\cdots, X_{n-1}=x_{n-1})\]
\[\quad=P(X_n=x|X_{n-1}=x_{n-1})\]
or, writing $i$ for the previous state and $j$ for the new state,
\[P(X_n=j|X_{n-1}=i)\]

Example: Markov Chain

A Markov chain $X$ on $S=\{0,1\}$ is determined by the initial distribution given by $p_0=P(X_0=0), \; p_1=P(X_0=1)$ and the one-step transition probabilities given by $p_{00}=P(X_{n+1}=0|X_n=0)$, $p_{10}=P(X_{n+1}=0|X_n=1)$, $p_{01}=1-p_{00}$ and $p_{11}=1-p_{10}$. With $p_{ij}$ as the entry in row $i$ and column $j$, the one-step transition probabilities in matrix form are $P=\begin{pmatrix}p_{00}&p_{01}\\p_{10}&p_{11}\end{pmatrix}$, where each row sums to 1.
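The two-state chain described above can be simulated in a few lines. The following is a minimal sketch in Python, not part of the original article: the function names `simulate_chain` and `transition_matrix`, the seed, and the numeric values of $p_{00}$, $p_{10}$, and $p_0$ are illustrative assumptions. The matrix is stored with row $i$ holding the probabilities of leaving state $i$.

```python
import random

def transition_matrix(p00, p10):
    """Row i holds the probabilities of moving from state i to states 0 and 1."""
    return [[p00, 1 - p00], [p10, 1 - p10]]

def simulate_chain(p00, p10, p0, n_steps, seed=0):
    """Simulate a two-state Markov chain on S = {0, 1}.

    p00 = P(X_{n+1}=0 | X_n=0), p10 = P(X_{n+1}=0 | X_n=1), p0 = P(X_0=0).
    Returns the list [X_0, X_1, ..., X_{n_steps}].
    """
    rng = random.Random(seed)
    # Draw the starting state from the initial distribution.
    state = 0 if rng.random() < p0 else 1
    path = [state]
    for _ in range(n_steps):
        # The chance of moving to state 0 depends only on the current state.
        prob_zero = p00 if state == 0 else p10
        state = 0 if rng.random() < prob_zero else 1
        path.append(state)
    return path

# Illustrative values only (assumed, not taken from the article).
P = transition_matrix(p00=0.9, p10=0.5)
path = simulate_chain(p00=0.9, p10=0.5, p0=1.0, n_steps=10)
print(P)
print(path)
```

Because the next state is drawn using only the current value of `state`, the simulated sequence has the Markov property by construction.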


Markov chains are a powerful tool for modeling various random processes. However, it’s important to remember that they assume the Markov property, which may not always hold true in real-world scenarios.

Applications of Markov Chains

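  • Information Theory: Used to model information sources in data compression, where the probability of the next symbol depends on the preceding symbol or symbols.
  • Search Algorithms: Applied in web search ranking (for example, PageRank models a random surfer as a Markov chain), recommender systems, and website navigation analysis.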
  • Queueing Theory: Helps model customer arrivals and service times in queues.
  • Financial Modeling: Financial analysts can use Markov chains to model stock prices or economic trends.
  • Game Design: Markov chains can be used to create video games with more realistic and interesting behavior for non-player characters.
  • Predictive Text: Smartphones that suggest the next word you are typing use a kind of Markov chain, where the probability of the next word depends on the current word.
  • Modeling weather: Markov chains can be used to represent the probabilities of transitioning between different weather states.



Effect Size Definition, Formula, Interpretation (2014)

Effect Size Definition

An effect size is a measure of the strength of a phenomenon. It conveys the estimated magnitude of a relationship without making any statement about whether the apparent relationship in the data reflects a true relationship in the population. Effect size measures play an important role in meta-analysis and statistical power analysis. Reporting effect sizes in theses, reports, or research papers is therefore considered good practice, especially when presenting empirical findings, because the effect size measures the practical importance of a significant finding. Put simply, effect size is a way of quantifying the size of the difference between two groups.

Effect size is usually computed after rejecting the null hypothesis in a statistical hypothesis testing procedure. If the null hypothesis is not rejected, the observed effect may be due to sampling variation alone, so the effect size should be interpreted with caution.

There are different formulas for different statistical tests to measure the effect size. In general, the effect size can be computed in two ways.

  1. As the standardized difference between two means
  2. As the effect size correlation (the correlation between the independent variable classification and the individual scores on the dependent variable).

Effect Size for the Dependent Sample T-test

The effect size for the paired sample t-test (dependent sample t-test) is known as Cohen’s d. It ranges from $-\infty$ to $\infty$ and measures, in standard deviation units, how far the mean of the difference scores is from zero. If $d$ equals 0, the mean of the difference scores is equal to zero. The farther $d$ is from 0, the larger the effect size.

Effect Size Formula for Dependent Sample T-test

The effect size for the dependent sample t-test can be computed by using

\[d=\frac{\overline{D}-\mu_D}{SD_D}\]

Note that both the mean of the paired differences ($\overline{D}$) and its standard deviation ($SD_D$) are reported in the SPSS output under paired differences.

Suppose the effect size is $d = 2.56$. This means that the sample mean difference and the hypothesized population mean difference are 2.56 standard deviations apart. The sign does not affect the magnitude of an effect, i.e., $-2.56$ and $2.56$ are equivalent effect sizes.

The $d$ statistic can also be computed from the obtained $t$ value and the number of paired observations $N$, following Ray and Shadish (1996):

\[d=\frac{t}{\sqrt{N}}\]
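To see that the two formulas for $d$ agree, the following minimal Python sketch computes $d$ directly from the difference scores and again from the paired $t$ statistic. The function names and the small `before`/`after` data set are made up for illustration, and $\mu_D$ is assumed to be 0 under the null hypothesis.

```python
import math
import statistics

def difference_scores(before, after):
    """Paired differences D = after - before."""
    return [a - b for a, b in zip(after, before)]

def cohens_d_paired(before, after, mu_d=0.0):
    """d = (mean(D) - mu_D) / SD_D, using the sample SD (n - 1 divisor)."""
    diffs = difference_scores(before, after)
    return (statistics.mean(diffs) - mu_d) / statistics.stdev(diffs)

def paired_t_statistic(before, after, mu_d=0.0):
    """t = (mean(D) - mu_D) / (SD_D / sqrt(N)) for N paired observations."""
    diffs = difference_scores(before, after)
    n = len(diffs)
    return (statistics.mean(diffs) - mu_d) / (statistics.stdev(diffs) / math.sqrt(n))

# Hypothetical scores for five subjects (illustrative only).
before = [10, 12, 9, 14, 11]
after = [12, 15, 10, 15, 14]

d_direct = cohens_d_paired(before, after)
t_value = paired_t_statistic(before, after)
d_from_t = t_value / math.sqrt(len(before))
print(d_direct, d_from_t)
```

For these data the difference scores are 2, 3, 1, 1, 3, with mean 2 and standard deviation 1, so both routes give $d = 2$ up to floating-point rounding.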

The value of $d$ is usually categorized as small, medium, and large. With Cohen’s $d$:

  • $d=0.2$ to $0.5$, small effect
  • $d=0.5$ to $0.8$, medium effect
  • $d=0.8$ and higher, large effect.

Calculating Effect Size from $r^2$

Another method of computing the effect size is with r-squared ($r^2$), i.e.

\[r^2=\frac{t^2}{t^2+df}\]

Effect size is categorized into small, medium, and large effects as

  • $r^2=0.01$, small effect
  • $r^2=0.09$, medium effect
  • $r^2=0.25$, large effect.
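The $r^2$ formula and the cut-offs above can be sketched in Python as follows. The function names are hypothetical, the $t$ value and degrees of freedom are made-up inputs, and treating each listed value as the lower bound of its category (with anything below 0.01 labelled "negligible") is an assumption of this sketch rather than something stated in the article.

```python
def r_squared_from_t(t, df):
    """Effect size r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

def label_r_squared(r2):
    """Label r^2 using 0.01, 0.09, 0.25 as lower bounds (an assumption)."""
    if r2 >= 0.25:
        return "large"
    if r2 >= 0.09:
        return "medium"
    if r2 >= 0.01:
        return "small"
    return "negligible"

# Illustrative inputs only: t = 2.0 with 20 degrees of freedom.
r2 = r_squared_from_t(t=2.0, df=20)
print(r2, label_r_squared(r2))
```

With these inputs, $r^2 = 4/24 \approx 0.167$, which falls between the medium and large cut-offs.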

A non-significant t-test result indicates that we failed to reject the hypothesis that the two conditions have equal means in the population. A larger value of $r^2$ indicates a larger effect, while a large effect size with a non-significant result suggests that the study should be replicated with a larger sample size.

In short, a larger effect size computed by either method indicates a larger effect, meaning that the means are likely to be further apart.

Choosing the Right Effect Size Measure

The appropriate effect size measure depends on the type of analysis being conducted (for example, correlation, group comparison, etc.) and the scale of measurement of the data (continuous, binary, nominal, ordinal, interval, ratio, etc.). It is always good practice to report both effect size and statistical significance (p-value) to provide a more complete picture of your findings.

In conclusion, effect size is a crucial concept in interpreting statistical results. By understanding and reporting effect size, one can gain a deeper understanding of the practical significance of the research findings and contribute to a more comprehensive understanding of the field of study.

References:

  • Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316-1325. (see also “Correction to Ray and Shadish (1996)”, Journal of Consulting and Clinical Psychology, 66, 532, 1998)
  • Kelley, Ken; Preacher, Kristopher J. (2012). “On Effect Size”. Psychological Methods 17 (2): 137–152. doi:10.1037/a0028086.


Consistent Estimator: Easy Learning

A statistic is a consistent estimator of a population parameter if “as the sample size increases, it becomes almost certain that the value of the statistic comes close (closer) to the value of the population parameter”. A consistent estimator (statistic) becomes more reliable as the sample grows large ($n \to \infty$). This means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, such that the probability of the estimator being arbitrarily close to $\theta$ converges to one (sure event).

Consistent Estimator

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive $\varepsilon$,
\[\lim_{n \rightarrow \infty} P[|\hat{\theta}_n-\theta| \le \varepsilon]=1\]
or
\[\lim_{n\rightarrow \infty} P[|\hat{\theta}_n-\theta|> \varepsilon]=0\]

Here $\hat{\theta}_n$ denotes the estimator of $\theta$ calculated from a sample of size $n$.
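The definition can be illustrated by simulation. The sketch below, in Python, estimates $P[|\hat{\theta}_n-\theta|>\varepsilon]$ for the sample mean of Uniform(0, 1) data, whose population mean is $\theta = 0.5$. The choice of distribution, $\varepsilon = 0.1$, the number of repetitions, the seed, and the function name `exceed_probability` are all assumptions made for illustration.

```python
import random

def exceed_probability(n, epsilon=0.1, theta=0.5, reps=2000, seed=1):
    """Monte Carlo estimate of P(|sample mean - theta| > epsilon).

    Each repetition draws n values from Uniform(0, 1), whose mean is 0.5.
    """
    rng = random.Random(seed)
    exceed_count = 0
    for _ in range(reps):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - theta) > epsilon:
            exceed_count += 1
    return exceed_count / reps

# The estimated probability should shrink toward 0 as n grows.
results = {n: exceed_probability(n) for n in (5, 50, 500)}
for n, prob in results.items():
    print(n, prob)
```

The printed proportions fall as $n$ increases, which is the behaviour the second limit in the definition describes for a consistent estimator.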

  • The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median would approach the population median, not the population mean.
  • The sample estimate of the standard deviation is biased but consistent, since the distribution of $\hat{\sigma}$ becomes more and more concentrated at $\sigma$ as the sample size increases.
  • A sample statistic can be an inconsistent estimator. A consistent estimator is typically unbiased in the limit (asymptotically unbiased), but an unbiased estimator may or may not be consistent.

Note that these two properties are not equivalent: (1) unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (2) consistency is a statement about “where the sampling distribution of the estimator is going” as the sample size increases.

The errors (variations) of a consistent estimator become negligible as the sample size increases indefinitely. More specifically, the probability that the error exceeds any given amount approaches zero as the sample size increases. In other words, the more data you collect, the closer the estimate is likely to be to the real population parameter you are trying to measure. The sample mean ($\overline{X}$) and sample variance ($S^2$) are two well-known consistent estimators.


Application of Regression in Medical: A Quick Guide (2024)

Regression is a powerful statistical tool that is widely used in medical research to understand the relationship between variables. It helps identify risk factors, predict outcomes, and optimize treatment strategies.

As an example of regression analysis in the medical sciences, Chan et al. (2006) used multiple linear regression to estimate standard liver weight in order to assess the adequacy of graft size in live donor liver transplantation and of the remnant liver in major hepatectomy for cancer. Standard liver weight (SLW) in grams, body weight (BW) in kilograms, gender (male=1, female=0), and other anthropometric data of 159 Chinese liver donors who underwent donor right hepatectomy were analyzed. The formula (fitted model)

 \[SLW = 218 + 12.3 \times BW + 51 \times gender\]

 was developed with a coefficient of determination $R^2=0.48$.


These results mean that in Chinese people, on average, for each 1-kg increase in BW, SLW increases by about 12.3 g, and, on average, men have a 51-g higher SLW than women of the same body weight. Unfortunately, standard errors (SEs) and confidence intervals (CIs) for the estimated regression coefficients were not reported. Using Formula 6 in their article, the SLW for Chinese liver donors can be estimated if BW and gender are known. About 48% (roughly half) of the variance of SLW is explained by BW and gender.
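As a quick illustration of how the fitted model is used, the following Python sketch evaluates the published equation $SLW = 218 + 12.3 \times BW + 51 \times gender$. The function name `standard_liver_weight` and the 60-kg example donor are hypothetical; only the coefficients come from the article.

```python
def standard_liver_weight(body_weight_kg, is_male):
    """Estimate standard liver weight (grams) from the fitted model
    SLW = 218 + 12.3 * BW + 51 * gender, with gender coded male=1, female=0."""
    gender = 1 if is_male else 0
    return 218 + 12.3 * body_weight_kg + 51 * gender

# Hypothetical donors weighing 60 kg (illustrative only).
slw_male = standard_liver_weight(60, is_male=True)
slw_female = standard_liver_weight(60, is_male=False)
print(slw_male, slw_female, slw_male - slw_female)
```

The difference between the two estimates is the gender coefficient (51 g), and raising the body weight by 1 kg raises either estimate by the BW coefficient (12.3 g), matching the interpretation of the coefficients given above.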

The regression analysis helps in:

  • Identifying risk factors: Determine which factors contribute to the development of a disease (for example, gender, age, smoking, and blood pressure for heart disease).
  • Predicting disease occurrence: Estimate the likelihood of a patient developing a disease based on specific risk factors. For example, logistic regression is used to predict the risk of diabetes based on factors like BMI, age, and family history.

The following types of regression models are widely used in medical sciences:

  • Linear regression: Used when the outcome variable is continuous (e.g., blood pressure, cholesterol levels).
  • Logistic regression: Used when the outcome variable is binary (e.g., disease present/absent, survival/death).
  • Cox proportional hazards regression: Used for survival analysis (time-to-event data).


Reference of Article

  • Chan SC, Liu CL, Lo CM, et al. (2006). Estimating liver weight of adults by body weight and gender. World J Gastroenterol 12, 2217–2222.
