# Basic Statistics and Data Analysis

## Matrix in Matlab: Creating and manipulating Matrices in Matlab

A matrix (a two-dimensional, rectangular array used to store multiple data elements in an easily accessible format) is the most basic data structure in Matlab. The elements of a matrix can be numbers, characters, logical values (true or false) or other Matlab data types. Matlab also supports arrays with more than two dimensions, called multidimensional arrays. Matlab is a matrix-based computing environment in which all data entered into Matlab is stored as a matrix (or, more generally, an array).

This Matlab tutorial assumes that you know the basics of defining and manipulating vectors in Matlab. Here we will discuss defining matrices, matrix operations and matrix functions.

## 1)  Defining/ Creating Matrices

Defining a matrix in Matlab is similar to defining a vector in Matlab. To define a matrix, treat it as a column of row vectors.
>> A=[1 2 3; 4 5 6; 7 8 9]

Note that spaces (or commas) separate the elements within a row, and semicolons separate the rows of matrix A. Square brackets are used to construct matrices, and individual matrix and vector entries are referenced with parentheses. For example, A(2,3) is the element in the second row and third column of matrix A.


Some examples of creating matrices and extracting their elements:
>> A=rand(6, 6)
>> B=rand(6, 4)

>> A(1:4, 3)   % column vector of the first four entries of the third column of A
>> A(:, 3)     % the third column of A
>> A(1:4, :)   % the first four rows of A (all columns)
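For comparison with a general-purpose language, here is a minimal Python sketch (plain nested lists, no NumPy) of the same indexing ideas. It assumes a 6×6 matrix filled with consecutive integers instead of `rand`, so the results are predictable, and it has to translate Matlab's 1-based, inclusive ranges into Python's 0-based, half-open slices.

```python
# Build a 6x6 matrix of consecutive integers (a predictable stand-in for rand(6, 6)).
A = [[6 * i + j + 1 for j in range(6)] for i in range(6)]

# Matlab A(2,3): row 2, column 3 -> Python indices [1][2] (0-based).
a23 = A[1][2]

# Matlab A(1:4, 3): the first four entries of column 3.
col3_first4 = [row[2] for row in A[0:4]]

# Matlab A(:, 3): the whole third column.
col3 = [row[2] for row in A]

# Matlab A(1:4, :): the first four rows, all columns (copied).
rows1to4 = [row[:] for row in A[0:4]]

print(a23)          # 9
print(col3_first4)  # [3, 9, 15, 21]
```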

Convenient matrix building Functions

eye –> identity
zeros –> matrix of zeros
ones –> matrix of ones
diag –> create or extract diagonal elements of matrix
triu –> upper triangular part of matrix
tril –> lower triangular part of matrix
rand –> randomly generated matrix
hilb –> Hilbert matrix
magic –> magic square
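The idea behind several of these builders is simple enough to sketch by hand. The Python functions below are hypothetical re-implementations on nested lists that reuse the Matlab names only for readability; they cover just the basic forms (for example, `diag` here only extracts a diagonal) and are not Matlab's code.

```python
from fractions import Fraction

def eye(n):
    """n-by-n identity matrix, like eye(n)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zeros(m, n):
    """m-by-n matrix of zeros, like zeros(m, n)."""
    return [[0] * n for _ in range(m)]

def diag(M):
    """Extract the main diagonal of a square matrix (one use of diag)."""
    return [M[i][i] for i in range(len(M))]

def triu(M):
    """Upper triangular part: zero out the entries below the diagonal."""
    return [[x if j >= i else 0 for j, x in enumerate(row)] for i, row in enumerate(M)]

def tril(M):
    """Lower triangular part: zero out the entries above the diagonal."""
    return [[x if j <= i else 0 for j, x in enumerate(row)] for i, row in enumerate(M)]

def hilb(n):
    """Hilbert matrix H(i,j) = 1/(i+j-1) for 1-based i, j, as exact fractions."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(diag(M))  # [1, 5, 9]
print(triu(M))  # [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
```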

## 2)  Matrix Operations

Many mathematical operations, such as addition, subtraction, multiplication and division, can be applied to matrices and vectors in Matlab.

Matrix or Vector Multiplication

If x and y are both column vectors, then `x'*y` is their inner (or dot) product and `x*y'` is their outer product.
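In `x'*y` a 1×n row times an n×1 column gives a 1×1 result (a scalar), while in `x*y'` an n×1 column times a 1×n row gives an n×n matrix. A plain-Python sketch of both, with vectors stored as lists:

```python
def inner(x, y):
    """x'*y for column vectors x, y: the sum of elementwise products (a scalar)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))

def outer(x, y):
    """x*y' for column vectors x, y: the matrix whose (i, j) entry is x[i]*y[j]."""
    return [[a * b for b in y] for a in x]

x = [1, 2, 3]
y = [4, 5, 6]
print(inner(x, y))  # 32
print(outer(x, y))  # [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
```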

Matrix division

Let A be an invertible square matrix and b a vector of compatible size (a column vector for the first form, a row vector for the second). Then
`x = A\b` is the solution of `A*x = b`
`x = b/A` is the solution of `x*A = b`

These are the backslash (`\`) and slash (`/`) operators, also referred to as mldivide and mrdivide.
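To make the backslash operator concrete, here is a minimal Python sketch that solves A*x = b by Gaussian elimination with partial pivoting. It illustrates the mathematics only: Matlab chooses among several solvers depending on the structure of A, and the exact-fraction arithmetic used here is an assumption made to keep the results readable.

```python
from fractions import Fraction

def solve(A, b):
    # Solve A*x = b for a square, invertible A (a sketch of what A\b computes,
    # not Matlab's implementation).
    n = len(A)
    # Augmented matrix [A | b], using exact fractions to avoid rounding.
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Partial pivoting: move the row with the largest entry in column k up.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate column k below the pivot.
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            M[r] = [vr - f * vk for vr, vk in zip(M[r], M[k])]
    # Back substitution on the upper triangular system.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

print(solve([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
```

The slash form needs no separate routine: x*A = b is equivalent to A'*x' = b', so `x = b/A` can be computed by solving with the transpose of A.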

## 3)  Matrix Functions

Matlab has many functions for analyzing, factorizing and transforming matrices. Some important ones are

eig –> eigenvalues and eigenvectors
eigs –> like eig, for large sparse matrices
chol –> Cholesky factorization
svd –> singular value decomposition
svds –> like svd, for large sparse matrices
inv –> inverse of matrix
lu –> LU factorization
qr –> QR factorization
hess –> Hessenberg form
schur –> Schur decomposition
rref –> reduced row echelon form
expm –> matrix exponential
sqrtm –> matrix square root
poly –> characteristic polynomial
det –> determinant of matrix
size –> size of an array
length –> length of a vector
rank –> rank of matrix
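Several of these functions are related: the eigenvalues returned by eig are the roots of the characteristic polynomial returned by poly, and their product equals det. For a 2×2 matrix this can be worked out by hand. The Python sketch below handles real eigenvalues only (an assumption that keeps it short) and is an illustration, not how eig works internally.

```python
import math

def char_poly_2x2(A):
    """Coefficients [1, -trace, det] of det(lambda*I - A) for a 2x2 matrix,
    highest power first (the same order poly uses)."""
    (a, b), (c, d) = A
    return [1, -(a + d), a * d - b * c]

def eig_2x2(A):
    """Real eigenvalues of a 2x2 matrix as roots of its characteristic
    polynomial, largest first."""
    _, p, q = char_poly_2x2(A)
    disc = p * p - 4 * q
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    r = math.sqrt(disc)
    return [(-p + r) / 2, (-p - r) / 2]

A = [[2, 1], [1, 2]]
print(char_poly_2x2(A))  # [1, -4, 3]
print(eig_2x2(A))        # [3.0, 1.0]
```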

## Binomial Random number Generation in R

Here we will learn how to generate Bernoulli and binomial random numbers in R, using coin flips as an example. This tutorial is about generating random numbers from statistical distributions in R, with the focus on binomial random number generation.

In a Bernoulli trial, an event either happens or it does not. For example, a coin flip has two outcomes, head or tail (either a head occurs or it does not, in which case a tail occurs). For an unbiased coin, heads and tails each occur 50% of the time in the long run. To generate binomial random numbers in R, use the rbinom(n, size, prob) command.

The rbinom(n, size, prob) command has three parameters:

n is the number of observations
size is the number of trials in each observation (zero or more)
prob is the probability of success on each trial, for example 1/2
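The meaning of the three parameters shows up clearly in a small simulation: one binomial observation with `size` trials is just the count of successes in `size` Bernoulli trials. The Python function below is a hypothetical illustration of that idea (the name `rbinom_sketch` is made up), not how R's rbinom is implemented internally.

```python
import random

def rbinom_sketch(n, size, prob, rng=random):
    """Hypothetical stand-in for R's rbinom(n, size, prob): each of the n
    observations counts the successes in `size` independent trials, each
    succeeding with probability `prob`."""
    return [sum(1 for _ in range(size) if rng.random() < prob) for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run is repeatable
print(rbinom_sketch(10, 1, 0.5, rng))  # ten 0/1 values (Bernoulli draws)
print(rbinom_sketch(10, 2, 0.5, rng))  # ten values, each 0, 1 or 2
```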

Some Examples

• One coin tossed 10 times with probability of success 0.5 (a fair, unbiased coin, since p = 1/2):
> rbinom(n=10, size=1, prob=1/2)
OUTPUT (differs from run to run): 1 1 0 0 1 1 1 1 0 1
• Two coins tossed 10 times with probability of success 0.5:
> rbinom(n=10, size=2, prob=1/2)
OUTPUT (differs from run to run): 2 1 2 1 2 0 1 0 0 1
• One coin tossed one hundred thousand times with probability of success 0.5:
> rbinom(n=100000, size=1, prob=1/2)
• Store the results of a simulation (here, five coins tossed 100,000 times) in the vector x:
> x <- rbinom(n=100000, size=5, prob=1/2)
• Total number of heads (successes) in x:
> sum(x)
• Frequency distribution:
> table(x)
• Relative frequency distribution, in percent:
> t <- table(x)/length(x)*100
• Plot the relative frequency distribution (estimated probabilities):
> plot(table(x)/length(x), ylab="Probability", main="size=5, prob=0.5")
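The same workflow (simulate, count, convert to percentages) can be sketched in Python, which also makes it easy to compare the simulated percentages with the exact binomial probabilities P(X = k) = C(size, k) p^k (1 - p)^(size - k). The seed and sample size are arbitrary choices; with 100,000 draws the simulated and exact columns should agree closely, but not exactly.

```python
import random
from collections import Counter
from math import comb

rng = random.Random(42)  # fixed seed for repeatability
n, size, prob = 100_000, 5, 0.5

# Simulate n binomial(size, prob) observations as sums of Bernoulli trials.
x = [sum(rng.random() < prob for _ in range(size)) for _ in range(n)]

freq = Counter(x)                                           # like table(x)
total_heads = sum(x)                                        # like sum(x)
percent = {k: 100 * freq[k] / n for k in range(size + 1)}   # like table(x)/length(x)*100

# Exact binomial probabilities for comparison.
exact = {k: comb(size, k) * prob**k * (1 - prob)**(size - k) for k in range(size + 1)}

for k in range(size + 1):
    print(k, round(percent[k], 2), round(100 * exact[k], 2))
```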


## Using Built-in Functions in Mathematica

Mathematica has thousands of built-in functions, and knowing a few dozen of the more important ones is enough for many useful calculations. Memorizing the names of most functions is not hard, because almost all built-in functions follow a naming convention in which the name describes what the function does: for example, Abs gives the absolute value, Cos the cosine and Sqrt the square root of a number. More important than memorizing the function names is remembering the syntax needed to use them. Knowing many of Mathematica's built-in functions not only makes programs easier to follow but also improves your own programming skills.

### Some important and widely used built-in functions in Mathematica are

• Sqrt[ ]: square root of a number
• N[ ]: numerical evaluation of any mathematical expression, e.g. N[Sqrt[27]]
• Log[ ]: natural logarithm (base e) of a number; use Log[10, x] or Log10[x] for base 10
• Sin[ ]: the trigonometric sine function (argument in radians)
• Abs[ ]: absolute value of a number

Common built-in functions in Mathematica include

1. Trigonometric functions and their inverses
2. Hyperbolic functions and their inverses
3. Logarithmic and exponential functions

Every built-in function in Mathematica has two very important features:

• All built-in functions in Mathematica begin with a capital letter, such as Sqrt for the square root and ArcCos for the inverse cosine.
• Square brackets always surround the input (argument) of a function.

For example, to compute the absolute value of -12, enter Abs[-12] at the input prompt; forms such as Abs(-12) or Abs{-12} are not function calls in Mathematica.

Note that:

In Mathematica, single square brackets [ and ] enclose the input of a function, double square brackets [[ and ]] extract parts (elements) of lists, parentheses ( and ) group terms in algebraic expressions, and curly brackets { and } delimit lists. In short, the three sets of delimiters [ ], ( ), { } are used for functions, algebraic expressions and lists respectively.

## Introduction to Mathematica

Mathematica was originally created by Stephen Wolfram and is a product of Wolfram Research, Inc. It is available for different operating systems, such as SGI, Sun, NeXT, Mac, DOS, and Windows. This introduction will help you understand its use as a mathematical and programming language for numerical, symbolic and graphical calculations.

## Mathematica can be used as:

1. A calculator for arithmetic, symbolic and algebraic calculations
2. A language for developing transformation rules, so that general mathematical relationships can be expressed
3. An interactive environment for exploration of numerical, symbolic and graphical calculations
4. A tool for preparing input to other programs, or to process output from other programs

## Getting Started

Starting Mathematica opens a new window, called a notebook, where we do all mathematical calculations and graphics. Initially the window's title is "Untitled-1"; it changes to the name you choose when you save the notebook.

(Figure: a Mathematica notebook with text, graphics, and Mathematica input and output.)

## Entering Expressions

Type 1+1 in the notebook and press SHIFT+ENTER (or the ENTER key on the numeric keypad). The answer appears on the next line of the work area; this is called evaluating or entering the expression. Mathematica labels the input 1+1 with In[1]:= and the output 2 with Out[1]= (without the quotation marks). You will also see a set of brackets (cell brackets) on the right side of the input and output: the innermost brackets enclose the input and the output individually, while the outer (larger) bracket groups the input and output together. Each bracket contains a cell. Each time you enter or change an input and evaluate it, the In and Out labels change as well.

## Basic Arithmetic

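Mathematica can perform the basic operations of addition (+), subtraction (-), multiplication (*), division (/), exponentiation (^), and so on. Multiplication can also be written as a space or by juxtaposition, as in 2(3+4), and ! denotes the factorial. For example, enter the following lines for basic arithmetic in Mathematica: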

2*3+4^2
5*6
2(3+4)
(2-3+1)(1+2/3)-5^(-1)
6!
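As a cross-check of what these inputs evaluate to, the Python sketch below computes the same five expressions. Python needs an explicit `*` where Mathematica accepts juxtaposition, `**` instead of `^`, and the `fractions` module to keep 2/3 and 5^(-1) exact the way Mathematica does; the values agree with the exact results Mathematica returns (22, 30, 14, -1/5 and 720).

```python
from fractions import Fraction
from math import factorial

results = [
    2 * 3 + 4**2,                                             # 2*3+4^2
    5 * 6,                                                    # 5*6
    2 * (3 + 4),                                              # 2(3+4): juxtaposition means multiplication
    (2 - 3 + 1) * (1 + Fraction(2, 3)) - Fraction(5) ** -1,   # (2-3+1)(1+2/3)-5^(-1), kept exact
    factorial(6),                                             # 6!
]
print(results)  # [22, 30, 14, Fraction(-1, 5), 720]
```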

## Using Previous Results in Mathematica

We often need the output of a previous calculation in the next computation. For this purpose the % symbol refers to the most recent output. For example,

2^5
% + 100

Here the result of 2^5 (that is, 32) is added to 100, giving 132.

%% refers to the result before the last one (the second-to-last result).
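The bookkeeping behind % and %% can be modelled with a list of past outputs: % is the last entry and %% the one before it. This Python sketch is only an analogy (the `out` list and `evaluate` helper are hypothetical names), not how Mathematica stores its history.

```python
# Hypothetical output history: out[0] plays the role of Out[1], and so on.
out = []

def evaluate(value):
    """Record a result, the way a notebook appends the next Out[n]."""
    out.append(value)
    return value

evaluate(2**5)           # Out[1] = 32
evaluate(out[-1] + 100)  # % + 100 uses Out[1]: Out[2] = 132
evaluate(out[-2] * 2)    # %% * 2 uses Out[1] again: Out[3] = 64
```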

## Exact vs Approximation

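By default, Mathematica gives exact results when the input is exact (integers and fractions), and gives approximate (decimal) results only when we ask for them. For example,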

3^20/2^21 produces $\frac{3486784401}{2097152}$

We can force Mathematica to give an approximate decimal result by including a decimal point in any number in the expression, such as

3.0^20/2^21

When a number in an expression contains a decimal point, Mathematica treats it as an approximation rather than an exact number. The N[ ] function, as in N[3^20/2^21], gives the same kind of approximate result.
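Python draws the same line between exact and approximate numbers: integer and `Fraction` arithmetic stays exact, while a single floating-point literal such as `3.0` turns the whole calculation into an approximation. A minimal sketch, using Python's `fractions` module as a stand-in for Mathematica's exact rationals:

```python
from fractions import Fraction

exact = Fraction(3**20, 2**21)   # like 3^20/2^21: an exact rational number
approx = 3.0**20 / 2**21          # like 3.0^20/2^21: one decimal makes it a float

print(exact)   # 3486784401/2097152
print(approx)
```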

## Reading, Creating and Importing Data in R Language

There are many ways to read data into R, including typing it in directly and importing it from files. We can also generate certain kinds of patterned data. Some of these ways are described below.

## Reading data from keyboard directly

For a small data set (a few observations), one can enter data in vector form directly at the R console, such as
x<-c(1,2,3,4,5)
y<-c('a', 'b', 'c')

In vector form, data can be spread over several lines by leaving off the closing parenthesis until the data are complete, such as
x<-c(1,2,
3,4)

Note that it is often more convenient to use the scan function, which prompts with the index of the next entry.

Using Scan Function

For a small data set it is better to read data from the console with the scan function. Values can be entered on separate lines, separated by spaces and/or tabs. After entering all the data, press ENTER on an empty line (that is, press ENTER twice) to end the input.

X<-scan()
3 4 5
4 5 6 7
2 3 4 5 6 6
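The behaviour of scan() described above (values split on white space, possibly over several lines, with input ended by an empty line) can be sketched in Python. The function name `scan_numbers` is hypothetical, and `io.StringIO` stands in for text typed at the console.

```python
import io

def scan_numbers(stream):
    """Hypothetical analogue of R's scan(): read whitespace-separated numbers
    line by line and stop at the first empty line."""
    values = []
    for line in stream:
        if not line.strip():      # an empty line ends the input, like pressing ENTER twice
            break
        values.extend(float(tok) for tok in line.split())
    return values

# Simulated console input matching the scan() example.
typed = io.StringIO("3 4 5\n4 5 6 7\n2 3 4 5 6 6\n\n")
X = scan_numbers(typed)
print(X)
print(len(X))  # 13
```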

Reading String Data using “what” Option

y<-scan(what=" ")
red green blue
white

The scan function can also be used to import data. However, scan returns a vector or a list, while the read.table function returns a data frame, so scan is less useful for inputting "rectangular" data.

## Reading data from an ASCII or plain text file into R as a data frame

The read.table function reads any delimited ASCII file, which may contain both numeric and character values. Reading data into R with read.table is the easiest and most reliable method. The default delimiter is white space (one or more spaces, tabs or newlines).

Note that read.table can also read data from a file on disk if you give the file's path, in quotation marks, as its first argument.

With white space as the delimiter, read.table cannot tell where an empty field is, so a file with missing values will not be read correctly and you will receive an error. The easiest way to fix this is to use a file with an explicit delimiter, such as a comma or a tab, and specify it with the sep argument.

Comma-delimited files can be read with the read.table function and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files. To display the contents of the resulting data frame, use the print function or simply type its name.
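Why an explicit delimiter helps with missing values is easy to see in a sketch: with commas, an empty field between two separators is unambiguous, whereas with blank-space separation it simply disappears. The Python example below (standard `csv` module, made-up data) reads a comma-delimited table in which one value is missing and maps it to `None`, roughly as read.csv treats a blank numeric field as NA.

```python
import csv
import io

# A small comma-delimited file with one missing value (second data row, column y).
text = "id,x,y\n1,2.5,3\n2,4.0,\n3,1.5,7\n"

rows = list(csv.DictReader(io.StringIO(text)))

# With an explicit delimiter the empty field is unambiguous: map it to None
# (playing the role of R's NA) and convert every other field to float.
data = [{k: (float(v) if v != "" else None) for k, v in row.items()} for row in rows]

for row in data:
    print(row)
```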

To read data in fixed-width format, use the read.fwf function; its widths argument gives the width (number of characters) of each variable. Files in this format usually have no variable names in the first line, so names must be added after the data are read. One way is the dimnames function with bracket notation, indicating that we are attaching names to the variables (columns) of the data. There are, however, several other ways to do this.

dimnames(data)[[2]]<-c("v1", "v2", "v3", "v4", "v5", "v6")
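The idea behind read.fwf (each field occupies a fixed number of characters given by widths, and names are attached afterwards) can be sketched in a few lines of Python. The helper name and sample lines are made up for illustration; unlike read.fwf, this sketch leaves every field as text rather than converting numbers.

```python
def read_fwf_sketch(lines, widths):
    """Hypothetical analogue of R's read.fwf(file, widths): cut each line into
    consecutive fields of the given widths and strip surrounding spaces."""
    records = []
    for line in lines:
        fields, start = [], 0
        for w in widths:
            fields.append(line[start:start + w].strip())
            start += w
        records.append(fields)
    return records

lines = ["12ab  3.5", " 7cd 10.0"]
rows = read_fwf_sketch(lines, [2, 3, 4])

# No header line in a fixed-width file, so attach column names afterwards,
# much as dimnames(data)[[2]] <- c(...) does in R.
names = ["v1", "v2", "v3"]
table = [dict(zip(names, row)) for row in rows]
print(table)
```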

## Importing Data

Importing data into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the Hmisc package is recommended for its ease of use and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data are given below.

From Excel
On Windows systems you can use the RODBC package to access Excel files. The first row of the Excel file should contain the variable/column names.

# Excel file name is myexcel and worksheet name is mysheet

library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)

From SPSS
# First save the SPSS dataset in transport format

get file='c:\data.sav'.
export outfile='c:\data.por'.

# in R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)
# the use.value.labels option converts value labels to R factors

From SAS
# save the SAS dataset in transport format
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

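From Stata
# input Stata file: load foreign, then read the file with read.dta()
library(foreign)

From Systat
# input Systat file: load foreign, then read the file with read.systat()
library(foreign)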

## Accessing Data in R Library

Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands at the R console

library(car)
data(Duncan)
attach(Duncan)

## Some Important Commands for dataframes

data    # displays the entire data set in the console
head(data)    # displays the first 6 rows of the data frame
tail(data)    # displays the last 6 rows of the data frame
str(data)    # displays the structure: variable names, their types and first values
names(data)    # shows the variable names only
rename(V1, Variable1, dataFrame=data)    # renames V1 to Variable1; the epicalc package must be installed and loaded
ls()    # shows a list of the objects available in the workspace
attach(data)    # attaches the data frame to the R search path, so its variables can be accessed by name