# Basic Statistics and Data Analysis

## Binomial Random Number Generation in R

Here we learn how to generate random numbers from the Bernoulli or binomial distribution in R, using the flip of a coin as the example. R can generate random numbers from many statistical distributions; our focus is on binomial random number generation.

In a Bernoulli trial an event either happens or it does not: a coin flip has two outcomes, head or tail (either a head occurs or it does not, in which case a tail occurs). For an unbiased coin, heads and tails each occur with a 50% chance in the long run. To generate binomial random numbers in R, use the rbinom(n, size, prob) command.

The rbinom(n, size, prob) command has three parameters, where
n is the number of observations
size is the number of trials per observation (zero or more)
prob is the probability of success on each trial, for example 1/2

Some Examples

The output is random, so it will differ from run to run.

• One coin is tossed 10 times with probability of success=0.5; the coin is fair (unbiased, as p=1/2)
> rbinom(n=10, size=1, prob=1/2)
OUTPUT: 1 1 0 0 1 1 1 1 0 1
• Two coins are tossed 10 times with probability of success=0.5
> rbinom(n=10, size=2, prob=1/2)
OUTPUT: 2 1 2 1 2 0 1 0 0 1
• One coin is tossed one hundred thousand times with probability of success=0.5 (write the number without a thousands separator)
> rbinom(n=100000, size=1, prob=1/2)
• Five coins are tossed one hundred thousand times and the simulation results are stored in the vector $x$
> n <- 100000
> x <- rbinom(n=n, size=5, prob=1/2)
total number of heads (successes) in the x vector
> sum(x)
find the frequency distribution
> table(x)
create a relative frequency table in percent
> pct <- table(x)/n * 100
plot the relative frequency distribution
> plot(table(x)/n, ylab="Probability", main="size=5, prob=0.5")
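
The steps above can be collected into one reproducible script. This is only a sketch: the seed value and the names n_obs, rel_freq and theory are assumptions chosen for illustration. It also compares the simulated relative frequencies with the theoretical binomial probabilities from dbinom.

```r
# Reproducible simulation: five fair coins tossed together 100,000 times.
set.seed(123)              # arbitrary seed, chosen only for reproducibility
n_obs <- 100000            # number of observations (no thousands separator in R)
x <- rbinom(n = n_obs, size = 5, prob = 0.5)

# Relative frequency of each possible number of heads (0 to 5).
rel_freq <- table(factor(x, levels = 0:5)) / n_obs

# Theoretical binomial probabilities for comparison.
theory <- dbinom(0:5, size = 5, prob = 0.5)

print(round(rbind(simulated = as.numeric(rel_freq), theoretical = theory), 4))

# Plot the simulated distribution.
plot(rel_freq, ylab = "Relative frequency", xlab = "Number of heads",
     main = "size = 5, prob = 0.5")
```

With this many observations the two rows of the printed table should agree closely, which is a quick check that the simulation behaves as expected.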


## Reading, Creating and Importing Data in R Language

There are many ways to read data into R. Here we also learn how to import data from other software. Certain kinds of patterned data can be generated as well. Some of these ways are

• ## Reading data from keyboard directly

For small data (a few observations) one can enter the data as a vector directly on the R Console, such as
x<-c(1,2,3,4,5)
y<-c('a', 'b', 'c')

A vector can be entered over several lines: as long as the right parenthesis is omitted, R keeps waiting for input until the data are complete, such as
x<-c(1,2,
3,4)

Note that it is more convenient to use the scan function, which prompts with the index of the next entry.

Using the scan Function

For a small data set it is convenient to read data from the console with the scan function. The data can be entered on separate lines, with values separated by spaces and/or tabs. After all the data have been entered, pressing the Enter key twice terminates the scanning.

X<-scan()
3 4 5
4 5 6 7
2 3 4 5 6 6

Reading String Data using “what” Option

y<-scan(what=" ")
red green blue
white

The scan function can also be used to import data. The scan function returns a list or a vector, while the read.table function returns a data frame. This means that the scan function is less useful for inputting "rectangular" data.
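
The console sessions above can be reproduced non-interactively. The following sketch assumes the text argument of scan, which reads from a character string instead of the keyboard; the values are the ones typed in the examples above.

```r
# scan() can read from a character string via its 'text' argument,
# which mimics typing the values at the console.
x <- scan(text = "3 4 5
4 5 6 7
2 3 4 5 6 6")
print(x)          # numeric vector of 13 values

# what = "" (or any character value) tells scan() to read strings.
y <- scan(text = "red green blue
white", what = "")
print(y)
```

Both calls return plain vectors, which illustrates why scan suits short lists of values better than rectangular tables.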

• ## Reading data from ASCII or plain text file into R as dataframe

The read.table function reads any type of delimited ASCII file, which may contain both numeric and character values. Reading data into R with read.table is usually the easiest and most reliable method. The default delimiter is white space (one or more spaces or tabs).

Note that the read.table command can also read data from a file on the computer disk if the appropriate path is provided in quotation marks.

When values are missing (blank fields) in a whitespace-delimited file, read.table cannot tell which column is empty, so it does not work and you receive an error. The easiest fix is to use a different delimiter in the file, such as a comma or a tab, and to specify it with the sep argument.

Comma-delimited files can be read with the read.table function and the sep argument, but also with the read.csv function, which is written specifically for comma-delimited files. To display the contents after reading, use the print function or type the name of the object.
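
A minimal sketch of reading a comma-delimited file follows. The file is written to a temporary location first so that the example is self-contained; the data values and the names tmp, d1 and d2 are made up for illustration.

```r
# Write a small comma-delimited file to a temporary location (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1,10.5,a",
             "2,,b",        # the score is missing in this row
             "3,7.2,a"), tmp)

# read.table() with an explicit delimiter; empty numeric fields become NA.
d1 <- read.table(tmp, header = TRUE, sep = ",")

# read.csv() uses header = TRUE and sep = "," by default.
d2 <- read.csv(tmp)

print(d1)
str(d2)
unlink(tmp)   # remove the temporary file
```

Because the delimiter is explicit, the empty field in the second data row is read as a missing value instead of causing an error.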

To read data in fixed format, use the read.fwf function; its widths argument gives the width (number of characters) of each variable. In this format the variable names are not in the first line, so they must be added after reading the data. Variable names can be added with the dimnames function and the bracket notation, which indicates that the names are attached to the variables (columns) of the data. There are several other ways to do this task.

dimnames(data)[[2]]<-c("v1", "v2", "v3", "v4", "v5", "v6")
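
As a sketch of the fixed-format case, the example below writes a tiny fixed-width file and reads it back. The file content, the column widths and the object name fw are assumptions made for illustration.

```r
# Hypothetical fixed-width file: columns of width 2, 3 and 1, no header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("12345a",
             "67890b"), tmp)

# 'widths' gives the number of characters occupied by each variable.
fw <- read.fwf(tmp, widths = c(2, 3, 1))

# Attach variable names after reading; names(fw) <- ... is an equivalent shortcut.
dimnames(fw)[[2]] <- c("v1", "v2", "v3")
print(fw)
unlink(tmp)
```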

• ## Importing Data

Importing data into R is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the Hmisc package is recommended for ease and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data are provided below.

From Excel
On Windows systems you can use the RODBC package to access Excel files. The first row of the Excel file should contain the variable/column names.

# Excel file name is myexcel and worksheet name is mysheet

library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel)

From SPSS
# First save the SPSS dataset in transport format (SPSS syntax)

get file='c:\data.sav'.
export outfile='c:\data.por'.

# in R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)
# the "use.value.labels" option converts value labels to R factors.

From SAS
# save the SAS dataset in transport format (SAS code)
libname out xport 'c:/mydata.xpt';
data out.data;
set sasuser.data;
run;

# in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors

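From Stata
# input Stata file with the read.dta function of the foreign package
library(foreign)

From Systat
# input Systat file with the read.systat function of the foreign package
library(foreign)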
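
The Stata case can be sketched with the read.dta function of the foreign package. So that the example runs without an existing Stata file, it first writes a small made-up data frame with write.dta; the data, the temporary path and the object names are assumptions.

```r
library(foreign)   # ships with standard R installations as a recommended package

# Create a small Stata file so that the example is self-contained
# (the data and the temporary path are made up for illustration).
tmp <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), tmp)

# Read the Stata file back into an R data frame.
mydata <- read.dta(tmp)
print(mydata)
unlink(tmp)

# A Systat file would be read analogously with read.systat("path/to/file"),
# where the path is a placeholder for a real Systat file.
```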

• ## Accessing Data in R Library

Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, type the following commands on the R Console

library(car)
data(Duncan)
attach(Duncan)

## Some Important Commands for dataframes

data    #displays the entire data set in the console
head(data)    #displays the first 6 rows of the data frame
tail(data)    #displays the last 6 rows of the data frame
str(data)    #displays the names of the variables and their types
names(data)    #shows the variable names only
rename(V1,Variable1, dataFrame=data)    #renames V1 to Variable1; note that the epicalc package must be installed.
ls()    #shows a list of the objects that are available
attach(data)    #attaches the data frame to the R search path, which makes it easy to access variables by name.
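
The commands above can be tried on a data frame that ships with R. In this sketch the built-in mtcars data set stands in for data, and the new column name miles_per_gallon is an arbitrary choice; renaming is done with base R here instead of the epicalc function.

```r
# 'mtcars' is a data frame shipped with R; it stands in for 'data' here.
data <- mtcars

head(data)          # first 6 rows
tail(data, n = 3)   # last 3 rows (n overrides the default of 6)
str(data)           # variable names, types and a preview of the values
names(data)         # variable names only

# Renaming a column with base R (no extra package needed).
names(data)[names(data) == "mpg"] <- "miles_per_gallon"

# with() gives access to columns by name without attaching the data frame.
avg <- with(data, mean(miles_per_gallon))
print(avg)
```

Using with() instead of attach() avoids leaving the data frame on the search path, which can otherwise mask objects of the same name.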

## Using R as Calculator

On the Windows operating system, the R installer creates an icon for R on the desktop and a Start Menu item. Double click the R icon to start the R program; R opens the console, where R commands are typed.

The greater than sign (>) in the console is the prompt symbol. In this tutorial we use R as a calculator by typing some simple mathematical expressions at the prompt (>). Anything that can be computed on a pocket calculator can also be computed at the R prompt. After entering an expression at the prompt, press the Enter key to execute the command. Some examples of using R as a calculator are as follows

> 1 + 2   #adds two or more numbers
> 1 - 2   #subtracts two or more numbers
> 1 * 2   #multiplies two or more numbers
> 1 / 2   #divides two or more numbers
> 1 %/% 2   #gives the integer part of the quotient
> 2 ^ 1   #gives exponentiation
> 31 %% 7   #gives the remainder after division

The arithmetic operators +, -, *, / and ^ also work for complex numbers.

Upon pressing the Enter key, the result of the expression appears, prefixed by a number in square brackets:
> 1 + 2
[1] 3

The [1] indicates that the line of output starts with the first element of the result.

Some advanced calculations that are available on scientific calculators can also be done easily in R, for example

> sqrt(5)   #Square root of a number
> log(10)   #Natural log of a number
> sin(45)   #Trigonometric function (sine; the angle is in radians)
> pi   #pi value 3.141593
> exp(2)   #Antilog, e raised to a power
> log10(5)   #Log of a number to base 10
> factorial(5)   #Factorial of a number, e.g. 5!
> abs(1/-2)   #Absolute value of a number
> 2*pi/360   #Number of radians in one degree of a circle
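
Because the trigonometric functions work in radians, an angle given in degrees has to be converted first. The sketch below shows the conversion and two identities that relate the logarithm functions listed above; the names deg and rad are arbitrary.

```r
# R's trigonometric functions expect angles in radians.
deg <- 45
rad <- deg * pi / 180      # convert degrees to radians

sin(rad)                   # sine of 45 degrees, equal to sqrt(2)/2
sin(45)                    # sine of 45 radians, a different number

# log() accepts a 'base' argument; log10(x) is the same as log(x, base = 10).
all.equal(log10(5), log(5, base = 10))

# exp() is the inverse of the natural logarithm.
all.equal(exp(log(10)), 10)
```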

Remember R prints all very large or very small numbers in scientific notation.

R also uses parentheses to group operations, which overrides the usual rules for the order of operations. For example

> 1-2/3   #It first computes 2/3 and then subtracts it from 1
> (1-2)/3   #It first computes (1-2) and then divides it by 3

R recognizes certain mistakes and special cases, such as trying to divide by zero or missing values in data.

> 1/0   #Undefined, R reports it as infinity (Inf)
> 0/0   #Not a number (NaN)
> "one"/2   #A character string divided by a number gives an error
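
These special values can also be tested for in code. The following is a sketch with arbitrary object names (pos_inf, not_num, msg); it uses tryCatch so that the error from the character division does not stop the script.

```r
pos_inf <- 1 / 0       # Inf
not_num <- 0 / 0       # NaN

is.finite(pos_inf)     # FALSE
is.nan(not_num)        # TRUE
is.na(not_num)         # TRUE, because NaN also counts as missing

# Dividing a character string by a number is an error; tryCatch() lets a
# script capture the message instead of stopping.
msg <- tryCatch("one" / 2, error = function(e) conditionMessage(e))
print(msg)

# Missing values (NA) propagate through arithmetic unless removed explicitly.
mean(c(1, NA, 3))                 # NA
mean(c(1, NA, 3), na.rm = TRUE)   # 2
```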

## Getting help in R Language

R has a very useful and advanced help system, which helps users understand the language and learn how programming is done in R.

To get help in R, click the Help button on the toolbar of the RGui (R Graphical User Interface) window. If you have internet access on your PC, you can search for CRAN in Google and look for the help you need on CRAN.

## Use of ? for Help

On the other hand, if you know the name of the function you want help on, type a question mark (?) followed by the name of the function at the R command prompt. For example, to get help on the "lm" function, type ?lm and then press the Enter key.
help(lm) and ?lm give the same result in R.

## help.start()

To get general help in R, write the following command at the R command prompt
help.start()

## Use of help.search()

Sometimes it is difficult to remember the precise name of a function, but you know the subject on which you need help, for example data input. Use the help.search function (without a question mark) with your query in double quotes, like this:
help.search("data input")
Press the Enter key and you will see the names of the R functions associated with the query. You can then use ? followed by the name of one of these functions to get its help page.

## Use of find(“”)

The find and apropos functions are also useful for getting help in R. The find function tells you which package something is in, for example

find("cor") shows that cor is in the stats package.

## Use of apropos()

The apropos function returns a character vector giving the names of all objects in the search list that match your (potentially partial) query, i.e. this command lists all functions containing your string. For example
apropos("lm")
gives the list of all functions whose names contain the string lm
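
A short sketch of both functions follows. The pattern "^read\\." and the object name readers are assumptions for illustration; apropos treats its argument as a regular expression, which allows the search to be narrowed.

```r
# find() reports where an object lives on the search path.
find("cor")                       # "package:stats"

# apropos() takes a regular expression; "^read\\." matches names that
# start with "read." such as read.table and read.csv.
readers <- apropos("^read\\.")
print(readers)

# help.search() (shortcut: ??) searches the installed documentation;
# the call is shown as a comment because it opens a results viewer.
# help.search("data input")
```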

## Use of example()

example(lm) runs the examples from the help page of the required function, such as lm

There is a huge amount of information about R on the web. On CRAN you will find a variety of help files and manuals. There are also answers to FAQs (Frequently Asked Questions) and R News (which contains interesting articles, book reviews and news of forthcoming releases). The search facility of the site allows you to investigate the contents of the R documents, functions and searchable mail archives.

## Help Manuals and Archived Mailing lists {RSiteSearch()}

You can search for a function or string in the help manuals and archived mailing lists by using the RSiteSearch function with your query in double quotes.

## get vignettes

Vignette is R jargon for a piece of documentation written in the spirit of sharing knowledge and assisting new users in learning the purpose and use of a package. To get some help on vignettes, try ?vignette. Vignettes are optional supplemental documentation, which is why not all packages come with them.
vignette()             shows the available vignettes
vignette("foo")    shows a specific vignette

Now that you have learned about getting help in R, you can continue with the other R tutorials. It is possible that you will not understand something discussed in the coming tutorials. If this happens, use the built-in help system before going to the internet. In most cases the help system of R gives enough information about the function you searched for.

## Some Sources of R Help, Manuals and Documentation

http://cran.r-project.org/manuals.html

http://manuals.bioinformatics.ucr.edu/home/programming-in-r

http://rwiki.sciviews.org/doku.php

http://cran.r-project.org/bin/windows/base/rw-FAQ.html

## Introduction to R Language

What is R (Language)
R is an open-source (GPL) programming language for statistical computing and graphics, modelled on the S and S-PLUS languages. The S language was developed at AT&T Bell Laboratories, beginning in the 1970s. Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland started R as a research project in the early 1990s, and it was released as free software in 1995.

R is currently maintained by the R core development team (an international team of volunteer developers). The R Project website is the main site for information about R. From this site, information about obtaining the software, accompanying packages and many other sources of documentation (help files) can be obtained.

R provides a wide variety of statistical and graphical techniques, such as linear and non-linear modeling, classical statistical tests, time series analysis, classification and multivariate analysis. It is an integrated suite of software with facilities for data manipulation, calculation and graphical display. It includes

• Effective data handling and storage facilities
• A suite of operators for calculations on arrays, particularly matrices
• A large, coherent, integrated collection of intermediate tools for data analysis
• Graphical facilities for data analysis
• Conditions, loops, user-defined recursive functions and input/output facilities.

Obtaining R Software
R can be downloaded from the R Project site as ready-to-run (binary) files for several operating systems, such as Windows, Mac OS X, Linux and Solaris. The source code for R is also available for download and can be compiled for other platforms. R simplifies many statistical computations, as it is a very powerful statistical language with many statistical routines (programming code) developed by people from all over the world and freely available from the R Project website (www.r-project.org) as "Packages". The basic installation of R contains a powerful set of tools and includes some basic packages required for data handling and data analysis.

Many users of R think of R as a statistical system, but it is an environment within which statistical techniques are implemented. R can also be extended via packages.

Installing R
For the Windows operating system, the binary version is available from http://cran.r-project.org/bin/windows/base/. At the time of writing, the latest version is R-3.0.0 (file R-3.0.0-win.exe), released on 03 April 2013, by Duncan Murdoch.
After downloading the binary file, double click it and an almost automatic installation of the R system will start, although a customized installation option is also available. Follow the instructions during the installation procedure. Once the installation is complete, there is an R icon on your computer desktop.

The R Console
When R starts, you see the R console window, where you type commands to get the required results. Note that commands are typed at the command prompt of the R Console. You can also edit previously typed commands by using the left, right, up and down arrow keys and the Home, End, Backspace, Insert and Delete keys. The command history can be recalled by scrolling through recent commands with the up and down arrow keys. It is also possible to type commands in a file and then execute the file with the source function in the R console.
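
Executing commands from a file can be sketched as follows. The script content and the temporary file name are made up for illustration; in practice the file would be written in a text editor.

```r
# Write two R commands to a temporary script file (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines(c("a <- 1 + 2",
             "b <- sqrt(a * 12)"), script)

# source() runs the file; by default the objects it creates appear
# in the global environment (the user's workspace).
source(script)
print(a)   # 3
print(b)   # 6
unlink(script)
```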

Books
The following books can be useful for learning the R and S languages.

• "Psychologie statistique avec R" by Yvonnick Noël. Pratique R. Springer, 2013.
• "Instant R: An Introduction to R for Statistical Analysis" by Sarah Stowell. Jotunheim Publishing, 2012.
• "Financial Risk Modelling and Portfolio Optimization with R" by Bernhard Pfaff. Wiley, Chichester, UK, 2012.
• "An R Companion to Applied Regression" by John Fox and Sanford Weisberg. Sage Publications, Thousand Oaks, CA, USA, 2nd Edition, 2011.
• "R Graphs Cookbook" by Hrishi Mittal. Packt Publishing, 2011.
• "R in Action" by Robert Kabacoff. Manning, 2010.
• "Statistical Analysis with R: Beginner's Guide" by John M. Quick. Packt Publishing, 2010.
• "Introducing Monte Carlo Methods with R" by Christian Robert and George Casella. Use R! Springer, 2010.
• "R for SAS and SPSS Users" by Robert A. Muenchen. Springer Series in Statistics and Computing. Springer, 2009.

Web Sources
The following are some useful web sources for learning R