There are many ways to read data into the R language. This section covers importing data into R from the keyboard, from text files, and from other statistical packages. R can also generate certain kinds of patterned data. Some of the ways to read data are
Reading data from keyboard directly
For small data sets (a few observations), one can enter the data as a vector directly at the R console, such as
y <- c('a', 'b', 'c')
In vector form, the data can span several lines: leave off the closing parenthesis until all the data have been entered.
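As a brief sketch of the multi-line form (the values are made up for illustration), R keeps showing its continuation prompt `+` at the console until the closing parenthesis is typed:

```r
# Entering a vector across several lines; the call is complete
# only when the closing parenthesis is supplied.
x <- c(10, 20, 30,
       40, 50, 60,
       70, 80)
length(x)  # number of values entered
```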
Note that it is often more convenient to use the scan function, which prompts with the index of the next entry.
Using Scan Function
For a small data set, it is better to read data from the console with the scan function. The data can be entered on separate lines, with values separated by a single space and/or a tab. Once all the required data have been entered, pressing the Enter key twice terminates the scan.
3 4 5
4 5 6 7
2 3 4 5 6 6
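The same input can be reproduced without typing at the prompt, which is handy for checking the result. This sketch assumes the three lines above are the values entered at the keyboard, and feeds them to scan through its `text` argument. The variable names `input` and `x` are illustrative.

```r
# Interactive use would be:  x <- scan()
# followed by the values and a blank line to finish.
input <- "3 4 5
4 5 6 7
2 3 4 5 6 6"

# Spaces and line breaks both separate values; quiet = TRUE
# suppresses the "Read N items" message.
x <- scan(text = input, quiet = TRUE)
x
length(x)
```

All values end up in one numeric vector. The line breaks in the input carry no meaning for scan.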
Reading String Data using “what” Option
red green blue
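A minimal sketch of reading strings, again using the `text` argument in place of keyboard input (an assumption made so the example runs as a script). Passing an empty string to `what` tells scan to expect character data rather than numbers:

```r
# Interactive use would be:  colours <- scan(what = "")
colours <- scan(text = "red green blue", what = "", quiet = TRUE)
colours
class(colours)
```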
The scan function can also be used to import data. It returns a list or a vector, while the read.table function returns a data frame. This makes scan less useful for inputting “rectangular” data.
Reading data from ASCII or plain text file into R as dataframe
The read.table function reads any type of delimited ASCII file, which may contain both numeric and character values. Reading data into R with read.table is usually the easiest and most reliable method. The default delimiter is white space (one or more spaces or tabs).
data<-read.table(file=file.choose()) #select from dialog box
data<-read.table("http://itfeature.com/test.txt", header=TRUE) # read from web site
Note that read.table can also read data from the computer's disk if the appropriate path is given in quotation marks, such as
data<-read.table("D:/data.txt", header=TRUE) # read from your computer
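Since the paths above depend on a particular machine or web site, here is a self-contained sketch that writes a small white-space-delimited file to a temporary location and reads it back. The file contents and the column names `id`, `height`, and `weight` are invented for illustration.

```r
# Create a small whitespace-delimited file with a header line
path <- tempfile(fileext = ".txt")
writeLines(c("id height weight",
             "1 170 65.5",
             "2 182 80.1",
             "3 165 58.0"), path)

# header = TRUE takes the variable names from the first line
dat <- read.table(path, header = TRUE)
str(dat)

unlink(path)  # remove the temporary file
```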
If the file has missing values (blank fields), read.table with the default white-space delimiter will not work and you will receive an error, because some lines then appear to have fewer fields than others. The easiest fix is to use a file with an explicit delimiter and to name that delimiter with the sep argument.
data<-read.table("http://itfeature.com/missing_comma.txt", header=TRUE, sep=",")
Comma-delimited files can be read with read.table and the sep argument, but they can also be read with the read.csv function, which is written specifically for comma-delimited files. To display the contents of the resulting data frame, use the print function or simply type the object's name.
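The following sketch illustrates both routes on a temporary comma-delimited file with two blank fields. The file contents and the names `d1` and `d2` are made up for the example. Blank numeric fields are read as NA.

```r
# A small comma-delimited file with two missing values
path <- tempfile(fileext = ".csv")
writeLines(c("name,age,score",
             "Ali,23,88",
             "Sara,,91",
             "Omar,31,"), path)

d1 <- read.table(path, header = TRUE, sep = ",")  # explicit delimiter
d2 <- read.csv(path)                              # header and sep are preset

print(d1)            # display the contents
colSums(is.na(d1))   # count of missing values per column

unlink(path)
```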
Reading in fixed formatted files
To read data in fixed format, use the read.fwf function. Its widths argument gives the width (number of character columns) of each variable. In this format the variable names are not present in the first line, so they must be added after the data have been read in. One way to add them is with the dimnames function, where the bracket notation [[2]] indicates that the names are being attached to the variables (columns) of the data frame. There are several other ways to do this task.
dimnames(data)[[2]]<-c("v1", "v2", "v3", "v4", "v5", "v6")
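A minimal sketch of the whole fixed-format workflow, assuming a hypothetical file in which each line holds a 2-character id, a 5-character code, and a 1-character group label with no separators between them:

```r
# Create a small fixed-width file (layout invented for illustration)
path <- tempfile(fileext = ".txt")
writeLines(c("0112345A",
             "0223456B",
             "0334567A"), path)

# widths gives the number of characters occupied by each variable
dat <- read.fwf(path, widths = c(2, 5, 1))

# Attach variable names to the columns (second dimension)
dimnames(dat)[[2]] <- c("id", "code", "grp")
dat

unlink(path)
```

Writing `names(dat) <- c("id", "code", "grp")` is an equivalent and more common way to set the column names.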
Importing data into R from other statistical packages is fairly simple. For Stata and Systat, use the foreign package. For SPSS and SAS, the recommended package is Hmisc, for ease and functionality. See the Quick-R section on packages for information on obtaining and installing these packages. Examples of importing data are provided below.
On Windows systems you can use the RODBC package to access Excel files. The first row of the Excel file should contain the variable/column names.
# Excel file name is myexcel and worksheet name is mysheet
library(RODBC)
channel <- odbcConnectExcel("c:/myexcel.xls")
mydata <- sqlFetch(channel, "mysheet")
odbcClose(channel) # close the connection when done
# First save the SPSS dataset in transport format (in SPSS)
# in R
library(Hmisc)
mydata <- spss.get("c:/data.por", use.value.labels=TRUE)
# the "use.value.labels" option converts value labels to R factors
# save the SAS dataset in transport format (in SAS)
libname out xport 'c:/mydata.xpt';
# in R
library(Hmisc)
mydata <- sasxport.get("c:/mydata.xpt")
# character variables are converted to R factors
# input Stata file
library(foreign)
mydata <- read.dta("c:/data.dta")
# input Systat file
mydata <- read.systat("c:/mydata.dta")
Accessing Data in R Library
Many R packages, including the car package, contain data sets. For example, to access the Duncan data frame from the car package, load the package with library(car) and then run data(Duncan) at the R console.
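As a hedged sketch of the same idea: the first part uses mtcars, which ships with every R installation in the datasets package, so it runs anywhere. The second part assumes the carData package (where recent versions of car keep the Duncan data) and is skipped if that package is not installed.

```r
# A data set bundled with base R
data(mtcars)
head(mtcars)

# Duncan, only if the carData package is available
if (requireNamespace("carData", quietly = TRUE)) {
  data("Duncan", package = "carData")
  print(head(Duncan))
}
```

Calling `data()` with no arguments lists the data sets available in the currently loaded packages.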
Some Important Commands for dataframes
data #displays the entire data set on the console
head(data) #displays the first 6 rows of the dataframe
tail(data) #displays the last 6 rows of the dataframe
str(data) #displays the names of the variables and their types
names(data) #shows the variable names only
rename(V1, Variable1, dataFrame=data) # renames V1 to Variable1; note that the epicalc package must be installed.
ls() #shows a list of the objects that are available
attach(data) #attaches the dataframe to the R search path, which makes it easy to access variables by name.
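To try these commands without any external file, the sketch below builds a small data frame (its contents are invented) and renames a column using base R only. This is offered as an alternative for cases where the epicalc package is not available, not as a replacement for the approach above.

```r
# A small illustrative data frame
data <- data.frame(V1 = c(2.5, 3.1, 4.8),
                   V2 = c("a", "b", "c"))

head(data)    # first rows (all three here)
str(data)     # variable names and types
names(data)   # variable names only

# Base-R rename: replace the name "V1" with "Variable1"
names(data)[names(data) == "V1"] <- "Variable1"
names(data)
```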