---
title: "An R Tutorial"
author: "Jo Hardin, Pomona College"
date: "Tuesday, September 02, 2014"
output: pdf_document
---

# 1. Starting Out

R is an interactive environment for statistical computing and graphics. This tutorial assumes R 2.15 on a PC. However, except in rare situations, these commands will work in R on UNIX and Macintosh machines as well as in S-Plus on any platform. R can be freely downloaded at \url{http://www.r-project.org/}.

You also need to download RStudio. After installing R (see above) on your computer, install RStudio from \url{http://rstudio.org/}.

Your assignments will be created using R Markdown. After writing the R code and relevant text, click the **Knit** button. A document will be generated that includes both the content and the output of any embedded R code chunks within the document.

# 2. Basics

## Preliminaries

```{r}
# A # sign starts a comment; the rest of the line will not be processed by R.
# You can press the up arrow key to recall the previous command.
# For most functions, R has useful help information. For example, try help(cos)
# to see help on the cosine function. From this help, you can read that the
# angle is in radians and that the function for arccos is called acos() in R.
```

## Arithmetic

```{r}
5                # you type in a 5 at the prompt and press Enter
4^ 3-2*7 + 9/2   # computes 4^3-(2*7)+(9/ 2). R follows the rules for the order of
                 # operations and ignores spaces between numbers (or objects)
```

## Objects

```{r}
pi             # gives pi (rounded, of course)
temp = 11/2    # Create an object called temp which will be stored in your workspace
temp           # now temp is an object you can use like pi (note that caps are important)
ls()           # To list the objects you have created in your workspace
remove(temp)   # To remove an object from your workspace
```

## Vectors

```{r}
x = c(3,4,7)   # c is concatenate.
x
3*x-7          # use arithmetic on each number in the vector x
x[2]           # To view the second element of x.
               # In general, square brackets [ ] are used to get an element
               # from a vector or matrix
x[c(1,3)]      # To view the first and third elements of x.
# x[1,3]       # Note that x[1,3] fails (so I can't type it in my markdown file!)
               # In general, round brackets ( ) are used in functions like c()
x[-1]          # To view all but the 1st element of x
```

## Matrices

The last two arguments of the matrix function are the number of rows and columns, respectively. If you would like the numbers to be entered by row instead of the default by column, add an additional argument, byrow=TRUE. The matrix function is:

matrix(vector of data, numrows, numcols, byrow=TRUE)

```{r}
# An example of how to create a 2x3 matrix:
y = matrix(c(1,2,3,4,3,4),2,3)
y             # To view the matrix, y
y[1,]         # To view the first row of y
y[,c(2,3)]    # To view the second and third columns of y
```

## Modes

Most of the objects in your workspace will be of one of four forms: vector, matrix (a two-dimensional array), data.frame, or list. A vector is one-dimensional. A list can contain all sorts of things (table, vector, array, ...; we'll see more later).

The following functions will help you figure out what type of object you are working with: *length, dim, mode, names*

If you have a list or a data.frame, *names* will tell you the names of the parts of the list. *names* will also tell you the names of the columns in a data.frame, if the columns have been assigned names. If you have a numeric object, using *length* and *dim* will tell you whether you have a vector or a matrix.

```{r}
length(x)
dim(x)      # NULL, because x doesn't have at least 2 dimensions
length(y)   # the total number of elements in y
dim(y)      # two rows by three columns
```

# 3. Plots

R has some very nice graphics capabilities. We'll only cover a minimal amount of information here.

## Scatterplots

```{r}
# Let's create a mock simple linear regression data matrix:
regdata = matrix(c(1:6,c(10,15,19,26,32,37)),6,2)
regdata
```

Here, we will assume that the first column is the explanatory variable and the second column is the response variable.

```{r}
# To plot the data points:
plot(regdata[,1], regdata[,2])   # the x-axis variable is the first argument
# To plot the data points and have the points connected by lines:
plot(regdata[,1], regdata[,2], type='b')
# To plot only a line (the argument is the letter l, not the number 1)
plot(regdata[,1], regdata[,2], type='l')
# To add meaningful axis labels and a title to the scatterplot:
plot(regdata[,1], regdata[,2], xlab='expl var', ylab='response',
     main='Regression data scatterplot')
# To set your own limits for the y-axis:
plot(regdata[,1], regdata[,2], ylim=c(0,40))
```

## Histograms

```{r}
# To plot a histogram of uniform(0,1) random data:
hist(runif(1000, min=0, max=1))
# Or standard normal data:
hist(rnorm(1000, mean=0, sd=1))
# The function hist allows for several useful options. Type help(hist) for details.
```

## Barplots

```{r}
popcorn = matrix(c(6,52,15,43),ncol=2)
help(barplot)
barplot(popcorn)                             # gives a simple barplot
barplot(popcorn,names.arg=c("low","high"))   # adds labels to the x-axis
pop.bp<-barplot(popcorn)
# To add text to the plot: the first argument is the barplot (its bar positions),
# the second is the y-axis values, and the third argument is the text you want
# to place on the barplot
text(pop.bp,c(popcorn[1,]-1,popcorn[2,]+popcorn[1,]-1) ,t(popcorn))
# Try removing the "-1" to see what you get:
barplot(popcorn)
text(pop.bp,c(popcorn[1,],popcorn[2,]+popcorn[1,]) ,t(popcorn))
# Or try not transposing the last argument, the "text" you want to put on the plot:
barplot(popcorn)
text(pop.bp,c(popcorn[1,]-1,popcorn[2,]+popcorn[1,]-1),popcorn)
```

# 4. Some Useful Features and Functions

```{r}
x = c(3,4,7)   # create our original object vector, x:
x >= 5         # indicates which elements in x are greater than or equal to 5
x[x >=5]       # produces all numbers in x that are greater than or equal to 5
round(x/7,2)   # view only the first two decimal places of x/7
```

## Other Functions to Create Simple Vectors

```{r}
# For most of these functions, use help(function) for additional information
seq(3,1,by=-.5)   # Creates a sequence from 3 to 1 counting by -.5
                  # look at help(seq) to see how else you can make a sequence
1:3               # A shortcut for creating a sequence counting by 1
                  # note that 3:1 will count 3,2,1
rep(1,3)          # repeat 1 three times
rep(1:3,2)        # repeat a sequence from 1 to 3 two times
                  # functions follow normal order of operations,
                  # i.e., stuff in ()s happens first
sort(rep(seq(1,2,length.out=3),2))   # the length.out argument in seq() sets the
                                     # length of the sequence
                  # sort() arranges a vector small to large (by default)
rev(c(1,4,3))     # reverses the order of the elements in the vector
c(sort(c(seq(1,2,by=.2),1.7))[-c(2,3)],.7)   # make sure you can understand!
                  # remember that [-x] removes the x^th element
```

## An Example (on cars)

```{r}
library(MASS)            # MASS is a library that contains datasets and functions
data(Cars93)             # We're going to use the Cars93 dataset
dim(Cars93)              # How big is the dataset?
length(Cars93)           # What is the difference between dim and length?
names(Cars93)            # What are the variables in the Cars93 dataset?
table(Cars93$Type)       # One way to see what types of cars are in the dataset
table(Cars93[,3])        # "Type" happens to be the 3rd column of the dataset
table(Cars93$AirBags)    # But what if we want to know which types of cars have airbags?
Cars93[1:10,c(3,9)]      # If our interest is in type vs. airbag
table(Cars93[,c(3,9)])   # Which type of car seems most likely to have an airbag (in 1993)?
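# A possible follow-up (a sketch, not part of the original example): row proportions
# can make the comparison across car types easier to read. prop.table() with
# margin = 1 divides each row of a table by its row total. The object name
# airbag.tab is only illustrative.
airbag.tab = table(Cars93$Type, Cars93$AirBags)
round(prop.table(airbag.tab, margin = 1), 2)   # each row sums to (about) 1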
```

## Functions to Analyze Vectors

```{r}
z = c(1,4,3)
length(z)                       # Determine the length of the vector z:
c(max(z), sort(z)[length(z)])   # Find the maximum value of z in two ways:
                # note that the answer is a vector because I concatenated the
                # solutions of two functions
rank(z)         # Identify the rank of the elements of z:
                # i.e., the 1st element is the min and the 2nd element is the max
z[rank(z)==2]   # Which is the second smallest element in z
                # or you can also say sort(z)[2]
sample(z,10,replace=TRUE)       # To randomly sample from an existing vector:
                # This is a random sample with replacement of z
sample(1:500,4,replace=FALSE)   # Randomly sample from a sequence from 1 to 500:
                # replace=FALSE means the same number won't be sampled twice
x = c(3,4,7)
rbind(x,z)      # rbind binds by rows to make a matrix
cbind(x,z)      # cbind binds by columns (treat x and z as columns)
```

## Dealing with Data and Data Files

```{r}
# To read in an external dataset called data.txt:
# tempfile <- scan('data.txt')
#
# Notice that scan brings the data into R in a vector format regardless of the
# original format. To read in an external dataset in a more flexible format use
# read.table(). Check help(read.table). Also, note that both read.table() and scan()
# have several options which are useful for datasets with interesting features,
# such as the column names on the first row, spacing by special symbols, etc.
#
# Writing data to a file, tempout1, by column:
# write(u, file='tempout1')
# Writing data to a file, tempout2, by row:
# write(t(u), file='tempout2')
# Writing another vector to the same file by row:
# write(t(x), file='tempout2', append=TRUE)
```

## More Useful Functions

For additional information on any of the functions listed in this section, use help(function), e.g., *help(log)* (note that the default of the *log* function is base *e*!).

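The base-*e* default noted above can be checked directly. This is a minimal sketch rather than part of the original tutorial; the particular inputs are arbitrary and were chosen only so that the answers are easy to verify by hand.

```{r}
log(exp(1))        # log() with no base argument is the natural log
log10(1000)        # log10() is the base-10 log
log(8, base = 2)   # the base argument selects any other base
```
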
* sin, cos, tan, asin, acos, atan
* log, log10, exp
* min, median, max, quantile
* sum, prod
* var, sd, cov, cor
* union, intersect
* t (for transposing a matrix)
* solve (for the inverse of a matrix)
* diag (for the diagonal of a matrix)

## Getting Help

Most importantly, please email me if you are stuck!! Additionally, my website has links to 3 R manuals that you may find helpful. The R site has some useful information, but it isn't always easy to navigate. Try: \url{www.r-project.org} or \url{http://cran.r-project.org/doc/contrib/refcard.pdf}

* Minimal R for Intro Stats \url{http://cran.r-project.org/web/packages/mosaic/vignettes/V1MinimalR.pdf}
* Commands for Intro Stats \url{http://cran.r-project.org/web/packages/mosaic/vignettes/V3Commands.pdf}
* Venables, Smith, & R Team (2009). *An Introduction to R (2ed)*. Network Theory Limited.
* Everitt, B., Hothorn T. (2006). *A Handbook of Statistical Analyses Using R.* Chapman & Hall.
* Dalgaard P. (2002). *Introductory Statistics with R.* New York: Springer-Verlag.
* Ripley B.D., Venables W.N. (2002). *Modern Applied Statistics with S (4ed).* New York: Springer-Verlag.

## Using Help

As mentioned above, help can be found on any function by using *help(function)*. The help files are all formatted in the same way:

+ Description: a few words describing what the function does
+ Usage: gives the actual function call and what the inputs (i.e., arguments) to the function are
+ Arguments: specifies the format of the arguments. This section also gives the default values (if applicable) for the arguments
+ Details: gives any additional relevant information about the function
+ Value: specifies the output of the function. Sometimes the output is a plot, sometimes a matrix, sometimes a list of information, sometimes either a plot or a matrix of information (e.g., *hist*)
+ References: where to get more information about the function
+ See Also: other R functions that are related
+ Examples: examples to try. If you don't understand what the function is doing, what you should input, or what the format of the output is, you should walk through the examples. All the examples should use data and functions that are internal to R, so you can simply copy and paste an example into the R session.
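
Some of the help-file sections described above can also be explored from code. The following is a minimal sketch, not part of the original tutorial: *args* and *formals* report the argument information found in the Usage and Arguments sections, and *example* runs the code from the Examples section of a help page. The choice of *sample* and *rev* as topics and the object name *sample.defaults* are arbitrary, and `local = TRUE` is used here only on the assumption that you do not want the example code to overwrite objects such as *x* in your workspace.

```{r}
# args() prints a function's argument list, much like the Usage section
args(sample)

# formals() returns the arguments (with any default values) as a named list
sample.defaults = formals(sample)
names(sample.defaults)     # the argument names
sample.defaults$replace    # the default value of the replace argument

# example() runs the code from the Examples section of the help page;
# local = TRUE evaluates it without touching objects in the workspace
example(rev, local = TRUE)
```

If the output of *example* is hard to follow, reading the Examples section with *help(rev)* alongside it usually helps.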