load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)
library(oilabs)

This week’s lab will walk through running a linear model in R. Additionally, you will plot the variables using a series of correlation plots.

The Data

The data set contains information from the Ames Assessor’s Office used to compute assessed values for individual residential properties sold in Ames, IA from 2006 to 2010. See http://www.amstat.org/publications/jse/v19n3/decock/datadocumentation.txt for detailed variable descriptions.

For specifics on units and variables:

library(oilabs)
?ames

To Turn In

  1. Create a scatterplot with the explanatory variable (you choose!) on the x-axis, and the response variable on the y-axis. Be sure to have your axes labeled.
ggplot(data = ames, aes(x = explanatory, y = response)) +
        geom_point() +
        labs(x = "explanatory (units)", y = "response (units)")  # label both axes
  2. Superimpose the regression line onto your scatterplot. What happens when you make se=TRUE? What happens if you add color to the aesthetics (add color using a categorical variable)? Describe the plot.
ggplot(data = ames, aes(x=explanatory, y=response)) +
        geom_point() +
        geom_smooth(method="lm", se=FALSE)
  3. Calculate and interpret the correlation coefficient (sign, strength, linearity).
ames %>%
        select(explanatory, response) %>%
        cor()

Or if you prefer the code with $:

cor(ames$explanatory, ames$response)
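
The two versions above return different shapes: the piped version gives a 2x2 correlation matrix, while the $ version gives a single number. A minimal sketch of the difference, using R's built-in mtcars data as a stand-in for ames (an assumption for illustration only; wt and mpg are not ames variables). It also shows the use = "complete.obs" argument of cor(), which may be needed if the variables you choose contain missing values.

```r
# Stand-in data: built-in mtcars (illustration only, not the ames variables)
cor_matrix <- cor(mtcars[, c("wt", "mpg")])   # 2x2 matrix, 1s on the diagonal
cor_single <- cor(mtcars$wt, mtcars$mpg)      # a single number

# The off-diagonal entry of the matrix is the same correlation
cor_matrix["wt", "mpg"]
cor_single

# With a missing value, cor() returns NA unless incomplete pairs are dropped
wt_with_na <- mtcars$wt
wt_with_na[1] <- NA                           # hypothetical missing observation
cor_na <- cor(wt_with_na, mtcars$mpg)
cor_complete <- cor(wt_with_na, mtcars$mpg, use = "complete.obs")
cor_na
cor_complete
```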
  4. Square the correlation coefficient to obtain \(R^2\). Interpret the coefficient of determination in context (e.g., using words like “price” and “number of bathrooms”).
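
As a quick check on this step, the sketch below squares a correlation and compares it with the R-squared that lm() reports. It uses R's built-in mtcars data (mpg as the response, wt as the explanatory variable) purely as a stand-in; that choice is an assumption for illustration, so substitute your own ames variables.

```r
# Stand-in data: built-in mtcars, not ames (illustration only)
r <- cor(mtcars$wt, mtcars$mpg)   # correlation coefficient
r_squared <- r^2                  # coefficient of determination

# In simple linear regression, r^2 matches the R-squared reported by lm()
fit <- lm(mpg ~ wt, data = mtcars)
model_r_squared <- summary(fit)$r.squared

r_squared
model_r_squared
```

The match holds only with a single explanatory variable; with several predictors, R-squared is no longer the square of any one pairwise correlation.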

  5. Determine and interpret the slope of the least squares line in context. The interpretation should be of the form: for every additional ____ we ESTIMATE that the AVERAGE ____ changes by ____.

fit <- lm(response ~ explanatory, data = ames)  # fit the least squares line once
fit                                             # prints the intercept and slope
summary(fit)                                    # adds standard errors, p-values, R-squared
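
If you want the estimates as numbers rather than printed output, coef() extracts them from a fitted model. The sketch below again uses the built-in mtcars data as a stand-in for ames (an assumption for illustration only) and checks the slope against the identity slope = r * sd(y) / sd(x).

```r
# Stand-in data: built-in mtcars, not ames (illustration only)
fit <- lm(mpg ~ wt, data = mtcars)

coefs <- coef(fit)                     # named vector of estimates
intercept <- coefs[["(Intercept)"]]    # predicted mpg when wt = 0
slope <- coefs[["wt"]]                 # estimated change in average mpg per unit of wt

# The least squares slope can also be computed from the correlation
slope_from_r <- cor(mtcars$wt, mtcars$mpg) * sd(mtcars$mpg) / sd(mtcars$wt)

# Predicted average response at a chosen value of the explanatory variable
predicted <- predict(fit, newdata = data.frame(wt = 3))

slope
slope_from_r
predicted
```

In the stand-in data the slope reads: for every additional 1000 lbs of weight, we estimate that the average fuel economy changes by about -5.3 miles per gallon.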
  6. Determine and interpret the intercept of the least squares regression line. Explain what this value might signify in this context. Is the interpretation meaningful within the context? Explain.

  7. Is your model significant? That is, do you believe that the explanatory variable is a good predictor of average response in the population? Justify with the appropriate number calculated above.
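
The number this question asks for appears in the coefficient table of summary(). A minimal sketch of pulling out the slope's test statistic and p-value, once more using built-in mtcars as a stand-in for ames (an assumption for illustration only):

```r
# Stand-in data: built-in mtcars, not ames (illustration only)
fit <- lm(mpg ~ wt, data = mtcars)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients

slope_t <- coef_table["wt", "t value"]     # test statistic for H0: slope = 0
slope_p <- coef_table["wt", "Pr(>|t|)"]    # two-sided p-value for the slope

# The t statistic is the estimate divided by its standard error
t_by_hand <- coef_table["wt", "Estimate"] / coef_table["wt", "Std. Error"]

slope_t
slope_p
```

A small p-value for the slope is evidence that the explanatory variable is linearly related to the average response in the population.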

  8. Choose 4 to 8 variables from the ames dataset. Make sure to have at least two quantitative variables and at least two categorical variables. Run ggpairs on the observations. You should see 4 different types of plots. Explain at least one of each type in the words of the problem (describe just the trends; you will not be able to assess significance from the plots):

data(ames)
ames2 <- ames %>%
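  # Example choice only: replace with your own 4 to 8 variables.
  # Keep Central.Air if you want to run the colored plot below.
  select(price, area, Bldg.Type, Central.Air)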

library(GGally)
ggpairs(ames2)

# with a least squares regression line:
ggpairs(ames2, lower = list(continuous = wrap("smooth", method = "lm")))

# with another variable to color the relationships:
ggpairs(ames2, aes(color=Central.Air), lower=list(continuous=wrap("smooth", method="lm")))