Lab 9 - Math 58b: Bootstrapping the Difference in Two Means

load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)
library(oilabs)

This week’s lab will investigate bootstrapping as a method for creating a confidence interval for a difference in two population means. Recall that bootstrapping provides an estimate of the standard error (SE) of a statistic. In class, we learned a formula for the SE of a difference in means, but not all statistics have a known formula for their SE.

This lab will find the bootstrap SE and compare it to the formula-based SE. We will then repeat the entire analysis for a statistic whose SE formula you do not know!

The Data

There is a reasonable amount of scientific evidence that student evaluations of teaching (SET) do not measure teaching effectiveness. Indeed, the evaluations are often noted to be gender biased (as well as biased by other characteristics unrelated to teaching). The following article discusses bias issues at length, concluding “SET are biased against female instructors by an amount that is large and statistically significant.”

Boring, Ottoboni, Stark (2016) “Student evaluations of teaching (mostly) do not measure teaching effectiveness”, ScienceOpen Research, https://www.scienceopen.com/document_file/25ff22be-8a1b-4c97-9d88-084c8d98187a/ScienceOpen/3507_XE6680747344554310733.pdf

Consider the dataset (not used in the paper above) gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin (variables beginning with cls). In addition, six students rated the professors’ physical appearance (variables beginning with bty). This is a slightly modified version of the original data set that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007).

data(evals)

# keeping only 2 variables, just to make it easier to understand R output later
evals <- evals %>%
  select(score, gender)

glimpse(evals)

Research Question:

The goal is to estimate the difference in average evaluation score between male and female professors.

Make histograms to visualize the difference in evaluation scores between men and women.
How do their centers, shapes, and spreads compare? (You might want to calculate some summary statistics!)

ggplot(data=evals, aes(x=score)) + geom_histogram(bins=30) + facet_grid(gender~.)

What is the statistic of interest in this study? What is the parameter of interest?
Find the theoretical standard error for the statistic of interest. (Use R as below and then as a calculator.)

evals %>%
  group_by(gender) %>%
  summarize(scoremean = mean(score), scoresd = sd(score), scoresn = n())
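The formula from class, SE = sqrt(s1^2/n1 + s2^2/n2), uses only the group standard deviations and sample sizes reported in the summary above. The following is a minimal sketch of that calculation in base R. It runs on a made-up stand-in data frame called toy (an assumption for illustration, not the evals data), so the number it prints is not the answer to this question.

```r
# Hypothetical stand-in data; the real lab uses the score and gender columns of evals.
set.seed(1)
toy <- data.frame(
  score  = c(rnorm(40, mean = 4.1, sd = 0.5), rnorm(60, mean = 4.2, sd = 0.5)),
  gender = rep(c("female", "male"), times = c(40, 60))
)

s <- tapply(toy$score, toy$gender, sd)      # standard deviation within each group
n <- tapply(toy$score, toy$gender, length)  # sample size within each group

# SE of the difference in two sample means: sqrt(s1^2/n1 + s2^2/n2)
se_formula <- sqrt(sum(s^2 / n))
se_formula
```

With the real data, the same arithmetic can be done by hand from the scoresd and scoresn columns.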

The Analysis

The goal of the analysis is to find an interval estimate for the difference in average evaluation scores for men versus women. Bootstrapping will be used to find a confidence interval for the true difference in average student evaluation of teaching for the population of teachers at the University of Texas at Austin from which the data were sampled.

Collect one bootstrap sample and create histograms to show the differences in evaluation scores between men and women. Does the bootstrap sample seem substantially different from the original sample?

evals_bs <- evals %>%
  rep_sample_n(size=463, replace = TRUE, reps = 1)

Generate 1000 bootstrap resamples (with replacement). Look at the new dataframe. Notice that there is a new variable corresponding to which resample you have. The new variable is called replicate.

set.seed(4747)  # choose your own seed number!
evals_all_bs <- evals %>%
  rep_sample_n(size=463, replace = TRUE, reps = 1000)
glimpse(evals_all_bs)

For each of the 1000 bootstrap resamples, find the mean evaluation score for each gender and subtract the two means, giving one statistic per bootstrap resample. Plot the 1000 resulting statistics.

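# One row per replicate: the difference between the two group means.
# diff() subtracts in group order; assuming gender's levels are ordered
# female, male, diff_mean is the male mean minus the female mean.
diffmeans_bs <- evals_all_bs %>%
  group_by(replicate, gender) %>%
  summarize(scoremean = mean(score)) %>%
  summarize(diff_mean = diff(scoremean))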

Calculate the SE of the sample statistics (use the sd function in R applied to the appropriate variable in the dataset, diffmeans_bs, above). With the bootstrap SE, find an approximate CI for the true difference in average evaluation score for males versus females.
Find the theory-based CI for the difference in means using the t.test function in R.
Interpret the intervals from above in the words of the problem. (Are your intervals close to one another?)
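The steps above can be sketched end to end in base R. This is only an illustration under stated assumptions: it uses a made-up stand-in data frame (toy) instead of evals, resamples with sample() rather than rep_sample_n, and uses a multiplier of 2 for an approximate 95% interval (your class may prefer a t or z critical value).

```r
# Hypothetical stand-in data; the real lab uses evals.
set.seed(4747)
toy <- data.frame(
  score  = c(rnorm(40, mean = 4.1, sd = 0.5), rnorm(60, mean = 4.2, sd = 0.5)),
  gender = rep(c("female", "male"), times = c(40, 60))
)

# Statistic of interest: male mean minus female mean
diff_means <- function(d) {
  m <- tapply(d$score, d$gender, mean)
  unname(m["male"] - m["female"])
}
obs_diff <- diff_means(toy)

# Resample whole rows with replacement, keeping the sample size fixed,
# and recompute the statistic on each of the 1000 resamples
boot_diffs <- replicate(1000, {
  idx <- sample(nrow(toy), replace = TRUE)
  diff_means(toy[idx, ])
})

se_boot <- sd(boot_diffs)                     # bootstrap SE
ci_boot <- obs_diff + c(-1, 1) * 2 * se_boot  # statistic +/- 2 SE

# Theory-based (Welch) interval. t.test reports female minus male here
# (alphabetical group order), so flip it to match male minus female.
ci_theory <- -rev(as.numeric(t.test(score ~ gender, data = toy)$conf.int))

ci_boot
ci_theory
```

The two intervals are centered at the same observed difference, so any disagreement comes from the SE and the multiplier.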

To Turn In

Find an approximate bootstrap CI for the difference in median evaluation scores instead of mean evaluation scores. Some things to note:

Your parameter and statistic both change. What are they now?
You will not be able to compare your bootstrap results to a theory-based CI. Why not?
Which interval is larger, the interval for the difference in means or the interval for the difference in medians?

Interpret the interval in the words of the problem.

April 21, 2017