echo "deb http://cran.csiro.au/bin/linux/ubuntu precise/" | sudo tee -a /etc/apt/sources.list
sudo apt-get update
sudo apt-get install r-base
Then, to enter the R command line interface, run: $ R
For starters, we will run through an intro from UCLA: http://www.ats.ucla.edu/stat/r/seminars/intro.htm
Within the R command line interface, a package must be installed before it can be used:
install.packages()
- foreign – package to read data files from other stats packages
- xlsx – package to read and write Excel files (requires Java of the same architecture as your R version, plus the rJava and xlsxjars packages)
- reshape2 – package to easily melt data to long form
- ggplot2 – package for elegant data visualization using the Grammar of Graphics
- GGally – package for scatter plot matrices
- vcd – package for visualizing and analyzing categorical data
install.packages("xlsx")
install.packages("reshape2")
install.packages("ggplot2")
install.packages("GGally")
install.packages("vcd")
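To avoid reinstalling packages that are already present, the list above can be checked against what is installed. This is a minimal sketch; the variable names wanted and missing_pkgs are just for illustration, and the install call is left commented out so nothing is downloaded by accident.

```r
# Packages this walkthrough relies on (names taken from the list above).
wanted <- c("foreign", "xlsx", "reshape2", "ggplot2", "GGally", "vcd")

# installed.packages() returns a matrix whose row names are package names.
missing_pkgs <- setdiff(wanted, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```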
Pre-requisites for the xlsx package (Java must be set up before install.packages("xlsx") will succeed):
sudo apt-get install openjdk-7-*
sudo ln -s /usr/lib/jvm/java-7-openjdk-amd64/bin/java /etc/alternatives/java
sudo R CMD javareconf
Preparing session:
Installed packages are not loaded automatically. Each package needed in the current session must be attached first:
require(foreign)
require(xlsx)
Once the required packages are attached, the session's loaded packages and versions can be confirmed via:
sessionInfo()
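One detail worth knowing: require() reports failure by returning FALSE, while library() stops with an error. The sketch below illustrates the difference; "notARealPackage" is a made-up name used only to trigger the failure path.

```r
# require() returns TRUE or FALSE, so the result can be tested.
loaded_ok <- suppressWarnings(
  require("stats", character.only = TRUE, quietly = TRUE)
)
loaded_bad <- suppressWarnings(
  require("notARealPackage", character.only = TRUE, quietly = TRUE)
)

# library() raises an error instead, which tryCatch() can intercept.
result <- tryCatch(
  {
    library("notARealPackage", character.only = TRUE)
    "attached"
  },
  error = function(e) "error"
)

print(c(loaded_ok, loaded_bad))
print(result)
```

In a script that cannot continue without a package, library() is usually the safer choice because it fails immediately.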
R code can be entered into the command line directly or saved to a script which can be run inside a session using the ‘source’ function.
Help can be obtained by preceding a function name with ?, for example ?read.csv.
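As a small illustration of running a saved script, the sketch below writes two lines of R to a temporary file and runs them with source(). The file and variable names are made up for the example.

```r
# Write a tiny script to a temporary file.
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script)

# source() runs the file as if its lines had been typed at the prompt,
# so the objects it creates are available afterwards.
source(script)
print(total)

# help("source") is the function form of ?source
```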
Entering Data:
R works most easily with datasets stored as text files, e.g. CSV.
Base R provides the functions read.table and read.csv. See their help files (?read.table) for the many available options.
# comma separated values
dat.csv <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
# tab separated values
dat.tab <- read.table("http://www.ats.ucla.edu/stat/data/hsb2.txt", header=TRUE, sep = "\t")
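If the URLs above are unreachable, the same functions can be tried offline by passing the file contents through the text argument. This is only a sketch: the column names and values below are made up and are not the hsb2 data.

```r
# Comma separated values supplied as a string instead of a file.
csv_text <- "id,group,score
1,a,57
2,b,68
3,a,44"
dat <- read.csv(text = csv_text)
str(dat)

# Tab separated values; header and sep must be given to read.table().
tab_text <- "id\tgroup\tscore\n1\ta\t57\n2\tb\t68"
dat2 <- read.table(text = tab_text, header = TRUE, sep = "\t")
str(dat2)
```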
Datasets from other statistical analysis software can be imported using the foreign package:
require(foreign)
# SPSS files
dat.spss <- read.spss("http://www.ats.ucla.edu/stat/data/hsb2.sav", to.data.frame=TRUE)
# Stata files
dat.dta <- read.dta("http://www.ats.ucla.edu/stat/data/hsb2.dta")
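The foreign package can also be exercised without a network connection by writing a Stata file locally and reading it back. A minimal sketch, assuming foreign is installed; the data frame is made up for the example.

```r
library(foreign)

# A small made-up data frame to round-trip through the Stata format.
df <- data.frame(id = 1:3, score = c(57, 61, 45))

dta_path <- tempfile(fileext = ".dta")
write.dta(df, dta_path)

# read.dta() returns a data frame.
back <- read.dta(dta_path)
print(back)
```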
If converting Excel spreadsheets to CSV is too much of a hassle, the xlsx package we loaded will do the job:
# these two steps only needed to read excel files from the internet
f <- tempfile("hsb2", fileext=".xls")
download.file("http://www.ats.ucla.edu/stat/data/hsb2.xls", f, mode="wb")
dat.xls <- read.xlsx(f, sheetIndex=1)
Viewing Data:
# first few rows
head(dat.csv)
# last few rows
tail(dat.csv)
# variable names
colnames(dat.csv)
# pop-up view of entire data set (uncomment to run)
# View(dat.csv)
Datasets that have been read in are stored as data frames, which have a matrix-like structure of rows and columns. The most common method of indexing is object[row, column], but many others are available.
# single cell value
dat.csv[2, 3]
# omitting row value implies all rows; here all rows in column 3
dat.csv[, 3]
# omitting column values implies all columns; here all columns in row 2
dat.csv[2, ]
# can also use ranges - rows 2 and 3, columns 2 and 3
dat.csv[2:3, 2:3]
Variables can also be accessed via their names:
# get first 10 rows of variable female using two methods
dat.csv[1:10, "female"]
dat.csv$female[1:10]
The c function is used to combine values of common type together to form a vector:
# get column 1 for rows 1, 3 and 5
dat.csv[c(1, 3, 5), 1]
## [1] 70 86 172
# get row 1 values for variables female, prog and socst
dat.csv[1, c("female", "prog", "socst")]
##   female prog socst
## 1      0    1    57
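Two of the other indexing methods are logical and negative indexing. The sketch below uses a small made-up data frame (not hsb2) so it can be run without downloading anything.

```r
# Made-up example data.
df <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "b", "a"),
  score = c(57, 68, 44, 63, 47)
)

# Logical indexing: keep rows where score exceeds 50.
high <- df[df$score > 50, ]

# Negative indexing: drop the first column.
no_id <- df[, -1]

# subset() expresses a filter and a column selection in one call.
high_a <- subset(df, score > 50 & group == "a", select = c(id, score))

print(high)
print(no_id)
print(high_a)
```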
Setting column names:
colnames(dat.csv) <- c("ID", "Sex", "Ethnicity", "SES", "SchoolType", "Program",
"Reading", "Writing", "Math", "Science", "SocialStudies")
# to change one variable name, just use indexing
colnames(dat.csv)[1] <- "ID2"
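Renaming by position breaks if the column order changes, so it can be safer to look the column up by its current name. A minimal sketch on a made-up data frame:

```r
# Made-up example data.
df <- data.frame(id = 1:3, female = c(0, 1, 0), socst = c(57, 61, 31))

# Find the column called "female" and rename it, wherever it sits.
colnames(df)[colnames(df) == "female"] <- "Sex"

print(colnames(df))
```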
Saving data:
#write.csv(dat.csv, file = "path/to/save/filename.csv")
#write.table(dat.csv, file = "path/to/save/filename.txt", sep = "\t", na=".")
#write.dta(dat.csv, file = "path/to/save/filename.dta")
#write.xlsx(dat.csv, file = "path/to/save/filename.xlsx", sheetName="hsb2")
# save to binary R format (can save multiple datasets and R objects)
#save(dat.csv, dat.dta, dat.spss, dat.tab, file = "path/to/save/filename.RData")
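The round trip can be tried safely with temporary files. This is a sketch on made-up data; row.names = FALSE is an addition here that stops write.csv from writing an extra column of row numbers.

```r
# Made-up example data, including a missing value.
df <- data.frame(id = 1:3, score = c(57.5, NA, 44))

# Text format: write to CSV and read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, file = csv_path, row.names = FALSE)
back <- read.csv(csv_path)

# Binary R format: save several objects, remove them, then restore.
rdata_path <- tempfile(fileext = ".RData")
other <- c(a = 1, b = 2)
save(df, other, file = rdata_path)
rm(df, other)

# load() recreates the objects and returns their names.
restored <- load(rdata_path)
print(restored)
```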
# change the working directory
setwd("/home/a/Desktop/R/testspace1")
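Relative file paths are resolved against the working directory, so it helps to record the old one before changing it. A minimal sketch that uses the session's temporary directory in place of a real project folder:

```r
# Remember where we started.
old_wd <- getwd()

# Switch to a directory that is guaranteed to exist.
setwd(tempdir())

# A relative path now refers to a file inside the new working directory.
writeLines("hello", "example.txt")
found <- file.exists(file.path(tempdir(), "example.txt"))
print(found)

# Restore the original working directory.
setwd(old_wd)
```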