A Markov chain, named after Andrey Markov, is a mathematical system that experiences transitions from one state to another among a finite or countable number of possible states. A Markov chain is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of memorylessness is called the Markov property. Markov chains have many applications as statistical models of real-world processes.

If the random variables $X_{n-1}$ and $X_n$ take the values $X_{n-1}=i$ and $X_n=j$, then the system has made a transition $S_i \rightarrow S_j$, that is, a transition from state $S_i$ to state $S_j$ at the $n$th trial. Note that $i$ can equal $j$, so transitions within the same state may be possible. We need to assign probabilities to the transitions $S_i \rightarrow S_j$. In a general chain, the probability that $X_n=j$ will depend on the whole sequence of random variables starting with the initial value $X_0$. A Markov chain has the characteristic property that the probability that $X_n=j$ depends only on the immediately previous state of the system. This means that we need no further information at each step other than, for each $i$ and $j$, $P\{X_n=j|X_{n-1}=i\}$,
the probability that $X_n=j$ given that $X_{n-1}=i$; this probability is independent of the values of $X_{n-2},X_{n-3},\cdots, X_0$.

Suppose we have a set of states $S=\{s_1,s_2,\cdots,s_n\}$. The process starts in one of these states and moves successively from one state to another. Each move is called a step. If the chain is currently in state $s_i$, then it moves to state $s_j$ at the next step with probability denoted by $p_{ij}$ (the transition probability), and this probability does not depend upon which states the chain was in before the current state. The probabilities $p_{ij}$ are called transition probabilities ($s_i \xrightarrow[]{p_{ij}} s_j$). The process can also remain in the state it is in, which occurs with probability $p_{ii}$.
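To make the transition mechanics concrete, here is a minimal Python sketch (the two-state matrix `P` below is an illustrative value, not from the text) of sampling the next state from the row of the transition matrix corresponding to the current state:

```python
import random

def next_state(P, i, rng=random):
    """Sample the next state s_j given the current state s_i.

    P is the transition matrix with P[i][j] = p_ij; each row sums to 1.
    """
    u = rng.random()
    cumulative = 0.0
    for j, p in enumerate(P[i]):
        cumulative += p
        if u < cumulative:
            return j
    return len(P[i]) - 1  # guard against floating-point round-off

# Illustrative two-state chain: p_00 = 0.9, p_01 = 0.1, p_10 = 0.5, p_11 = 0.5
P = [[0.9, 0.1],
     [0.5, 0.5]]
path = [0]
for _ in range(5):
    path.append(next_state(P, path[-1]))
```

Note that the sampler only ever looks at the current state `path[-1]`, which is exactly the Markov property.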

An initial probability distribution, defined on $S$, specifies the starting state. Usually this is done by specifying a particular state as the starting state.

Formally, a Markov chain is a sequence of random variables $X_1,X_2,\cdots$ with the Markov property that, given the present state, the future and past states are independent. Thus
$P(X_n=x|X_1=x_1,X_2=x_2\cdots X_{n-1}=x_{n-1})$
$\quad=P(X_n=x|X_{n-1}=x_{n-1})$
or, in terms of transition probabilities,
$p_{ij}=P(X_n=j|X_{n-1}=i)$

## Example: Markov Chain

A Markov chain $X$ on $S=\{0,1\}$ is determined by the initial distribution given by $p_0=P(X_0=0), \; p_1=P(X_0=1)$ and the one-step transition probabilities given by $p_{00}=P(X_{n+1}=0|X_n=0)$, $p_{10}=P(X_{n+1}=0|X_n=1)$, $p_{01}=1-p_{00}$ and $p_{11}=1-p_{10}$, so the one-step transition probabilities in matrix form are $P=\begin{pmatrix}p_{00}&p_{01}\\p_{10}&p_{11}\end{pmatrix}$
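As a sketch, the two-state chain just described can be simulated directly; the values $p_{00}=0.7$, $p_{10}=0.4$, and $p_0=0.5$ below are illustrative assumptions, not taken from the text. Over a long run, the fraction of time spent in state 0 approaches the stationary probability $p_{10}/(p_{10}+p_{01})$:

```python
import random

def simulate_chain(p00, p10, p0, n_steps, seed=0):
    """Simulate the two-state Markov chain on S = {0, 1}.

    p00 = P(X_{n+1}=0 | X_n=0), p10 = P(X_{n+1}=0 | X_n=1),
    p0  = P(X_0 = 0) from the initial distribution.
    """
    rng = random.Random(seed)
    state = 0 if rng.random() < p0 else 1
    path = [state]
    for _ in range(n_steps):
        prob_to_0 = p00 if state == 0 else p10
        state = 0 if rng.random() < prob_to_0 else 1
        path.append(state)
    return path

path = simulate_chain(p00=0.7, p10=0.4, p0=0.5, n_steps=100_000)
frac_in_0 = path.count(0) / len(path)
# Long-run fraction in state 0: p10 / (p10 + p01) = 0.4 / 0.7 ≈ 0.571
print(round(frac_in_0, 3))
```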


# Effect Size: Introduction

An effect size is a measure of the strength of a phenomenon, conveying the estimated magnitude of a relationship without making any statement about the true relationship. Effect size measures play an important role in meta-analysis and statistical power analysis. Reporting the effect size in theses, reports, or research papers can therefore be considered good practice, especially when presenting empirical results or findings, because it measures the practical importance of a significant finding. Put simply, effect size is a way of quantifying the size of the difference between two groups.

Effect size is usually computed after rejecting the null hypothesis in a statistical hypothesis testing procedure, so if the null hypothesis is not rejected, the effect size has little meaning.

There are different formulas for different statistical tests to measure the effect size. In general, effect size can be computed in two ways.

1. As the standardized difference between two means
2. As the effect size correlation (correlation between the independent variables classification and the individual scores on the dependent variable).

## Effect size for dependent sample t test

The effect size for a paired-sample t test (dependent-sample t test), known as Cohen's d, ranges from $-\infty$ to $\infty$ and measures, in standard-deviation units, the degree to which the mean of the difference scores departs from zero. If the value of d equals 0, the mean of the difference scores is zero; the farther d is from 0, the larger the effect size.

The effect size for a dependent-sample t test can be computed by using

$d=\frac{\overline{D}-\mu_D}{SD_D}$

Note that both the mean difference ($\overline{D}$) and the standard deviation of the differences ($SD_D$) are reported in the SPSS output under Paired Differences.

Suppose the effect size is d = 2.56, which means that the sample mean difference and the population mean difference are 2.56 standard deviations apart. The sign has no effect on the size of an effect, i.e. -2.56 and 2.56 are equivalent effect sizes.

The d statistic can also be computed from the obtained t value and the number of paired observations, following Ray and Shadish (1996), as

$d=\frac{t}{\sqrt{N}}$

The value of d is usually categorized as small, medium, or large. With Cohen's d:

• d = 0.2 to 0.5, small effect
• d = 0.5 to 0.8, medium effect
• d = 0.8 and higher, large effect.
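The two formulas above can be sketched in Python; the before/after scores are hypothetical data, not from the text, and the two routes to d agree because $d = t/\sqrt{N}$ is algebraically identical to $\overline{D}/SD_D$ when $\mu_D = 0$:

```python
import math

def cohens_d_paired(before, after):
    """Cohen's d for a dependent-sample t test: d = (mean(D) - mu_D) / sd(D),
    with the hypothesized mean difference mu_D taken as 0."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample standard deviation of the difference scores (n - 1 denominator).
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / sd_d

def d_from_t(t, n_pairs):
    """Ray and Shadish (1996): d = t / sqrt(N) for N paired observations."""
    return t / math.sqrt(n_pairs)

# Hypothetical before/after scores for eight subjects.
before = [12, 15, 11, 18, 14, 16, 13, 17]
after  = [10, 13, 11, 15, 12, 14, 12, 15]
d = cohens_d_paired(before, after)  # ≈ 1.97, a large effect by Cohen's cutoffs
```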

## Computing Effect Size from $r^2$

Another method of computing the effect size is with r-squared ($r^2$), i.e.

$r^2=\frac{t^2}{t^2+df}$

It can be categorized as a small, medium, or large effect as follows:

• $r^2=0.01$, small effect
• $r^2=0.09$, medium effect
• $r^2=0.25$, large effect.
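A minimal sketch of this computation (the t value and degrees of freedom below are illustrative, not from the text):

```python
def r_squared_from_t(t, df):
    """Effect size r^2 = t^2 / (t^2 + df) from a t statistic."""
    return t * t / (t * t + df)

# e.g. t = 2.5 with df = 24 degrees of freedom
r2 = r_squared_from_t(2.5, 24)  # 6.25 / 30.25 ≈ 0.207, a large effect
```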

A non-significant t-test result indicates that we fail to reject the hypothesis that the two conditions have equal means in the population. A larger value of $r^2$ indicates a larger effect (effect size), while a large effect size with a non-significant result suggests that the study should be replicated with a larger sample size.

In short, the larger the effect size computed by either method, the larger the effect, meaning that the means are likely very different.

### References:

• Ray, J. W., & Shadish, W. R. (1996). How interchangeable are different estimators of effect size? Journal of Consulting and Clinical Psychology, 64, 1316-1325. (see also “Correction to Ray and Shadish (1996)”, Journal of Consulting and Clinical Psychology, 66, 532, 1998)
• Kelley, Ken; Preacher, Kristopher J. (2012). “On Effect Size”. Psychological Methods 17 (2): 137–152. doi:10.1037/a0028086.

A statistic is a consistent estimator of a population parameter if, as the sample size increases, it becomes almost certain that the value of the statistic comes closer to the value of the population parameter. If an estimator is consistent, it becomes more reliable with a large sample. All this means that the distribution of the estimates becomes more and more concentrated near the value of the population parameter being estimated, such that the probability of the estimator being arbitrarily close to $\theta$ converges to one (a sure event).

The estimator $\hat{\theta}_n$ is said to be a consistent estimator of $\theta$ if, for any positive $\varepsilon$,
$\lim_{n \rightarrow \infty} P[|\hat{\theta}_n-\theta| \le \varepsilon]=1$
or
$\lim_{n\rightarrow \infty} P[|\hat{\theta}_n-\theta|> \varepsilon]=0$

Here $\hat{\theta}_n$ denotes the estimator of $\theta$, calculated by using a sample of size $n$.

The sample median is a consistent estimator of the population mean if the population distribution is symmetrical; otherwise, the sample median would approach the population median, not the population mean.

The sample estimate of the standard deviation is biased but consistent, as the distribution of $\hat{\sigma}^2$ becomes more and more concentrated at $\sigma^2$ as the sample size increases.

A sample statistic can be an inconsistent estimator. A consistent estimator is unbiased in the limit (asymptotically unbiased), but an unbiased estimator may or may not be a consistent estimator.

Note that these two are not equivalent: (1) unbiasedness is a statement about the expected value of the sampling distribution of the estimator, while (2) consistency is a statement about where the sampling distribution of the estimator is going as the sample size increases.
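Consistency can be illustrated by simulation; the sketch below (with assumed values $\mu=5$, $\sigma=2$) shows the sample mean concentrating near the population mean as $n$ grows:

```python
import random

def sample_mean(n, mu=5.0, sigma=2.0, seed=1):
    """Draw n observations from N(mu, sigma^2) and return the sample mean."""
    rng = random.Random(seed)
    return sum(rng.gauss(mu, sigma) for _ in range(n)) / n

# The estimates concentrate near mu = 5 as the sample size increases.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 3))
```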

Considering the application of regression analysis in the medical sciences, Chan et al. (2006) used multiple linear regression to estimate standard liver weight for assessing the adequacy of graft size in live-donor liver transplantation and of the remnant liver in major hepatectomy for cancer. Standard liver weight (SLW) in grams, body weight (BW) in kilograms, gender (male=1, female=0), and other anthropometric data of 159 Chinese liver donors who underwent donor right hepatectomy were analyzed. The formula (fitted model)

$SLW = 218 + 12.3 \times BW + 51 \times gender$

was developed with coefficient of determination $R^2=0.48$.

These results mean that in Chinese people, on average, for each 1-kg increase in BW, SLW increases by about 12.3 g, and, on average, men have a 51-g higher SLW than women. Unfortunately, SEs and CIs for the estimated regression coefficients were not reported. By means of formula 6 in their article, the SLW for Chinese liver donors can be estimated if BW and gender are known. About 50% of the variance of SLW is explained by BW and gender.
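The fitted model is easy to turn into a small calculator; the 60-kg male donor below is a hypothetical example, not a case from the paper:

```python
def standard_liver_weight(bw_kg, male):
    """Chan et al. (2006) fitted model:
    SLW (g) = 218 + 12.3 * BW (kg) + 51 * gender (male = 1, female = 0)."""
    return 218 + 12.3 * bw_kg + 51 * (1 if male else 0)

# A hypothetical 60-kg male donor: 218 + 12.3 * 60 + 51 = 1007 g
print(standard_liver_weight(60, male=True))
```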

#### Reference of Article

• Chan SC, Liu CL, Lo CM, et al. (2006). Estimating liver weight of adults by body weight and gender. World J Gastroenterol 12, 2217–2222.

There are thousands of built-in functions in Mathematica. Knowing a few dozen of the more important ones will help you do lots of neat calculations. Memorizing the names of most of the functions is not too hard, as almost all of the built-in functions in Mathematica follow a naming convention (i.e. the names of functions are related to their functionality); for example, the Abs function is for absolute value, the Cos function is for cosine, and Sqrt is for the square root of a number. More important than memorizing the function names is remembering the syntax needed to use a built-in function. Knowing many of the built-in Mathematica functions will not only make it easier to follow programs but also enhance your own programming skills.

### Some important and widely used built-in functions in Mathematica are

• Sqrt[ ]: used to find the square root of a number
• N[ ]: used for numerical evaluation of any mathematical expression, e.g. N[Sqrt[27]]
• Log[ ]: used to find the natural logarithm of a number (use Log[10, x] for the base-10 logarithm)
• Sin[ ]: used to find the trigonometric function sine
• Abs[ ]: used to find the absolute value of a number

Common built-in functions in Mathematica include

1. Trigonometric functions and their inverses
2. Hyperbolic functions and their inverses
3. Logarithmic and exponential functions

Every built-in function in Mathematica has two very important features:

• All built-in functions in Mathematica begin with a capital letter; for example, for the square root we use the Sqrt built-in function, and for the inverse cosine we use ArcCos.
• Square brackets are always used to surround the input or argument of a function.

To compute the absolute value of -12, enter Abs[-12] at the prompt rather than, for example, Abs(-12) or Abs{-12}; that is, Abs[-12] is the valid command for computing the absolute value of -12.

Note that:

In Mathematica, single square brackets [ and ] are used to enclose the input of a function, double square brackets [[ and ]] are used to extract parts of lists, parentheses ( and ) are used to group terms in an algebraic expression, and curly brackets { and } are used to delimit lists. Thus the three sets of delimiters [ ], ( ), and { } are used for function arguments, grouping in algebraic expressions, and lists, respectively.