load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)
library(oilabs)

This week’s lab investigates bootstrapping as a method for creating a confidence interval for a difference in two population means. Recall that bootstrapping provides an estimate of the standard error (SE) of a statistic. In class, we learned a formula for the SE of a difference in means, but not all statistics have a known formula for their SE.

This lab will find the bootstrap SE and compare it to the formula SE. We will also repeat the entire analysis for a statistic whose SE formula you do not know!

The Data

There is a reasonable amount of scientific evidence that student evaluation of teaching does not measure teaching effectiveness. Indeed, the evaluations are often noted to be gender biased (as well as biased on other non-teaching-related characteristics). The following article discusses bias issues at length, concluding “SET are biased against female instructors by an amount that is large and statistically significant.”

Boring, Ottoboni, Stark (2016) “Student evaluations of teaching (mostly) do not measure teaching effectiveness”, ScienceOpen Research, https://www.scienceopen.com/document_file/25ff22be-8a1b-4c97-9d88-084c8d98187a/ScienceOpen/3507_XE6680747344554310733.pdf

Consider the dataset (not used in the paper above) gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin (variables beginning with cls). In addition, six students rated the professors’ physical appearance (variables beginning with bty). This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007).

data(evals)

# keeping only 2 variables, just to make it easier to understand R output later
evals <- evals %>%
  select(score, gender)

glimpse(evals)

Research Question:

The goal is to estimate the difference in average evaluation score between men and women.

  1. Make histograms to visualize the difference in evaluation scores between men and women. How do their centers, shapes, and spreads compare? (You might want to calculate some summary statistics!)

ggplot(data = evals, aes(x = score)) + geom_histogram(bins = 30) + facet_grid(gender ~ .)

  2. What is the statistic of interest in this study? What is the parameter of interest?

  3. Find the theoretical standard error for the statistic of interest. (Use R as below and then as a calculator.)

evals %>%
  group_by(gender) %>%
  summarize(scoremean = mean(score), scoresd = sd(score), scoresn = n())
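
The formula-based SE for a difference in two means, sqrt(s1^2/n1 + s2^2/n2), uses exactly the three summary columns computed above. The following is a minimal base-R sketch of that arithmetic. The helper name se_diff_means is hypothetical (it is not part of dplyr or oilabs), and the toy numbers are made up for illustration; they are not the evals data.

```r
# Hypothetical helper: formula-based SE for a difference in two group means,
# sqrt(s1^2/n1 + s2^2/n2). Assumes g labels exactly two groups.
se_diff_means <- function(x, g) {
  stopifnot(length(unique(g)) == 2)
  s2 <- tapply(x, g, var)     # sample variance within each group
  n  <- tapply(x, g, length)  # sample size within each group
  sqrt(sum(s2 / n))
}

# Toy illustration with made-up numbers (NOT the evals data)
toy_score  <- c(1, 2, 3, 4, 5, 6)
toy_gender <- c("a", "a", "a", "b", "b", "b")
toy_se <- se_diff_means(toy_score, toy_gender)
toy_se
```

With the lab data, the corresponding call would be se_diff_means(evals$score, evals$gender), which should agree with the value you get by plugging scoresd and scoresn into the formula by hand.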

The Analysis

The goal of the analysis is to find an interval estimate for the difference in average evaluation scores for men versus women. Bootstrapping will be used to find a confidence interval for the true difference in student evaluations of teaching for the population of teachers at the University of Texas at Austin from which the data were selected.

  1. Collect one bootstrap sample and create histograms to show the differences in evaluation scores between men and women. Does the bootstrap sample seem substantially different from the original sample?
evals_bs <- evals %>%
  rep_sample_n(size=463, replace = TRUE, reps = 1)
  2. Generate 1000 bootstrap resamples (with replacement). Look at the new data frame. Notice the new variable, called replicate, which records the resample each row belongs to.
set.seed(4747)  # choose your own seed number!
evals_all_bs <- evals %>%
  rep_sample_n(size=463, replace = TRUE, reps = 1000)
glimpse(evals_all_bs)
  3. For each of the 1000 bootstrap resamples, find the mean evaluation score for each gender and subtract the means, giving one statistic per bootstrap resample. Plot the 1000 resulting statistics.
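diffmeans_bs <- evals_all_bs %>%
  group_by(replicate, gender) %>%
  # mean score for each gender within each resample
  summarize(scoremean = mean(score)) %>%
  # the result is still grouped by replicate; diff() gives male minus female
  summarize(diff_mean = diff(scoremean))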
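
To see what the pipeline above is doing, here is a rough base-R sketch of the same idea: resample rows with replacement, recompute the difference in group means, and repeat. The helper name boot_diff_means and the simulated scores are assumptions made for illustration only; the lab itself uses rep_sample_n on evals.

```r
# Hypothetical helper: bootstrap the difference in group means
# (second group minus first, in sorted label order, matching diff()).
boot_diff_means <- function(x, g, reps = 1000) {
  g <- factor(g)
  n <- length(x)
  replicate(reps, {
    idx <- sample(n, size = n, replace = TRUE)      # one bootstrap resample of rows
    as.numeric(diff(tapply(x[idx], g[idx], mean)))  # difference in group means
  })
}

set.seed(4747)
# Simulated stand-in data (NOT the evals data): two groups of 100 scores
sim_score  <- c(rnorm(100, mean = 4.1, sd = 0.5),
                rnorm(100, mean = 4.2, sd = 0.5))
sim_gender <- rep(c("female", "male"), each = 100)

sim_diffs <- boot_diff_means(sim_score, sim_gender, reps = 1000)
summary(sim_diffs)
```

A quick look at the bootstrap distribution is then available with hist(sim_diffs); for the lab data, the analogous plot is a histogram of diffmeans_bs$diff_mean.
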
  4. Calculate the SE of the sample statistics (use the sd function in R applied to the appropriate variable in the dataset, diffmeans_bs, above). With the bootstrap SE, find an approximate CI for the true difference in average evaluation score for men versus women.

  5. Find the theory-based CI for the difference in means using the t.test function in R.

  6. Interpret the intervals from above in words of the problem. (Are your intervals close to one another?)
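
The two intervals asked for above can be sketched as follows. This is a minimal illustration on simulated data (not the evals data), so the numbers it prints say nothing about the real evaluations. The multiplier 2 is the usual rough 95% rule, and the bootstrap statistics are generated in base R rather than with rep_sample_n so that the block runs by itself.

```r
set.seed(4747)
# Simulated stand-in data (NOT the evals data)
sim <- data.frame(
  score  = c(rnorm(100, mean = 4.1, sd = 0.5),
             rnorm(100, mean = 4.2, sd = 0.5)),
  gender = factor(rep(c("female", "male"), each = 100))
)

# Observed statistic: mean(male) - mean(female), matching diff() on sorted groups
obs_diff <- as.numeric(diff(tapply(sim$score, sim$gender, mean)))

# Bootstrap distribution of the statistic
boot_diffs <- replicate(1000, {
  idx <- sample(nrow(sim), replace = TRUE)
  as.numeric(diff(tapply(sim$score[idx], sim$gender[idx], mean)))
})

boot_se <- sd(boot_diffs)                  # bootstrap SE
boot_ci <- obs_diff + c(-2, 2) * boot_se   # approximate 95% CI: statistic +/- 2 SE

# Theory-based (Welch) interval; note that t.test reports female - male,
# the opposite sign convention from diff() above
theory_ci <- t.test(score ~ gender, data = sim)$conf.int

boot_ci
theory_ci
```

When comparing the two intervals, keep the sign convention in mind: the formula interface of t.test subtracts the second group from the first, so its interval is the mirror image of one built from diff().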


To Turn In

  1. Find an approximate bootstrap CI for the difference in median evaluation scores instead of mean evaluation scores. Some things to note:
  2. Interpret the interval in words of the problem.