load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)
library(oilabs)
This week’s lab investigates bootstrapping as a method for creating a confidence interval for a difference in two population means. Recall that bootstrapping provides an estimate of the standard error (SE) of a statistic. In class, we learned a formula for the SE of a difference in means, but not all statistics have a known formula for their SE.
This lab finds the bootstrap SE and compares it to the formula-based SE. We will also repeat the entire analysis for a statistic whose SE has no formula that you know!
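As a minimal sketch of the idea in base R, the code below compares the bootstrap SE of a single sample mean with the formula SE, s/√n. It assumes simulated data, not the evals data. The sample size, seed, and number of resamples (2000) are arbitrary choices made for illustration.

```r
# Hypothetical data: 50 simulated values (NOT the evals data)
set.seed(58)
x <- rnorm(50, mean = 4, sd = 0.5)

# Formula SE of the sample mean: s / sqrt(n)
se_formula <- sd(x) / sqrt(length(x))

# Bootstrap SE: resample x with replacement many times,
# recompute the mean each time, then take the SD of those means
boot_means <- replicate(2000, mean(sample(x, size = length(x), replace = TRUE)))
se_boot <- sd(boot_means)

c(formula = se_formula, bootstrap = se_boot)
```

The two values should be close for a sample mean. The payoff of the bootstrap is that the same recipe works for statistics that have no SE formula.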
There is a reasonable amount of scientific evidence that student evaluations of teaching (SET) do not measure teaching effectiveness. Indeed, the evaluations are often noted to be gender biased (as well as biased on other characteristics unrelated to teaching). The following article discusses these bias issues at length, concluding that “SET are biased against female instructors by an amount that is large and statistically significant.”
Boring, Ottoboni, and Stark (2016). “Student evaluations of teaching (mostly) do not measure teaching effectiveness.” ScienceOpen Research. https://www.scienceopen.com/document_file/25ff22be-8a1b-4c97-9d88-084c8d98187a/ScienceOpen/3507_XE6680747344554310733.pdf
Consider a dataset (not used in the paper above) gathered from end-of-semester student evaluations for a large sample of professors from the University of Texas at Austin (variables beginning with cls). In addition, six students rated the professors’ physical appearance (variables beginning with bty). This is a slightly modified version of the original dataset that was released as part of the replication data for Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill, 2007).
data(evals)
# keeping only 2 variables, just to make it easier to understand R output later
evals <- evals %>%
  select(score, gender)
glimpse(evals)
The research question is to estimate the difference in average evaluation score between male and female professors.
ggplot(data=evals, aes(x=score)) + geom_histogram(bins=30) + facet_grid(gender~.)
What is the statistic of interest in this study? What is the parameter of interest?
Find the theoretical (formula-based) standard error for the statistic of interest. (Use the R code below to get the group summaries, then use R as a calculator.)
evals %>%
  group_by(gender) %>%
  summarize(scoremean = mean(score), scoresd = sd(score), scoresn = n())
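The formula SE for a difference in two independent sample means is √(s₁²/n₁ + s₂²/n₂). A small helper, hypothetically named se_diff_means, turns that formula into an R calculator. The inputs shown are made-up numbers chosen so the arithmetic is easy to check by hand; they are not the evals summaries.

```r
# Formula SE for a difference in two independent sample means:
# SE = sqrt(s1^2 / n1 + s2^2 / n2)
se_diff_means <- function(s1, n1, s2, n2) {
  sqrt(s1^2 / n1 + s2^2 / n2)
}

# Illustrative (made-up) inputs, NOT the evals summary:
# 3^2/9 + 4^2/16 = 1 + 1 = 2, so the result is sqrt(2), about 1.414
se_diff_means(s1 = 3, n1 = 9, s2 = 4, n2 = 16)
```

To apply it to this lab, plug in the scoresd and scoresn values for each gender from the summarize() output above.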
The goal of the analysis is to find an interval estimate for the difference in average evaluation score between male and female professors. Bootstrapping will be used to find a confidence interval for the true difference in average evaluation score for the population of teachers at the University of Texas at Austin, from which the data were sampled.
# one bootstrap resample: as many rows as evals (463), drawn with replacement
evals_bs <- evals %>%
  rep_sample_n(size = nrow(evals), replace = TRUE, reps = 1)
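For intuition about what rep_sample_n() does with replace = TRUE, here is a base R sketch of one bootstrap resample on a hypothetical toy data frame (not evals). Rows of the whole data frame are drawn with replacement, so the resample has the same number of rows, some rows repeat, and the number of men and women can change from one resample to the next.

```r
# Hypothetical toy data frame standing in for evals (score, gender)
set.seed(1)
toy <- data.frame(
  score  = round(runif(20, min = 2.5, max = 5), 1),
  gender = factor(rep(c("female", "male"), times = c(8, 12)))
)

# One bootstrap resample: draw row indices with replacement,
# keeping the same number of rows as the original data
idx <- sample(nrow(toy), size = nrow(toy), replace = TRUE)
toy_bs <- toy[idx, ]

nrow(toy_bs)          # same size as toy (20)
table(toy_bs$gender)  # group sizes usually differ from the original 8 / 12
sum(duplicated(idx))  # number of repeated draws of the same row
```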
set.seed(4747) # choose your own seed number!
# 1000 bootstrap resamples, stacked and labeled by the replicate column
evals_all_bs <- evals %>%
  rep_sample_n(size = nrow(evals), replace = TRUE, reps = 1000)
glimpse(evals_all_bs)
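# For each bootstrap sample: mean score by gender, then the difference in means.
# After the first summarize() the data are still grouped by replicate, so
# diff() returns the male mean minus the female mean (levels: female, male).
diffmeans_bs <- evals_all_bs %>%
  group_by(replicate, gender) %>%
  summarize(scoremean = mean(score)) %>%
  summarize(diff_mean = diff(scoremean))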
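The same bootstrap distribution can be sketched in base R without dplyr or oilabs. This assumes a simulated toy data frame with the same two columns as evals; boot_diff is a hypothetical helper that resamples all rows and returns the male mean minus the female mean, the same order that diff() produces above.

```r
# Hypothetical toy data (NOT evals); swap in the real data frame to use it
set.seed(4747)
toy <- data.frame(
  score  = c(rnorm(40, mean = 4.1, sd = 0.5), rnorm(60, mean = 4.2, sd = 0.55)),
  gender = factor(rep(c("female", "male"), times = c(40, 60)))
)

# One bootstrap statistic: resample all rows with replacement,
# then compute mean(male) - mean(female)
boot_diff <- function(df) {
  bs <- df[sample(nrow(df), replace = TRUE), ]
  means <- tapply(bs$score, bs$gender, mean)
  unname(means["male"] - means["female"])
}

# 1000 bootstrap differences, analogous to diffmeans_bs$diff_mean
diff_mean_bs <- replicate(1000, boot_diff(toy))
summary(diff_mean_bs)
```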
Calculate the bootstrap SE of the statistic (apply the sd function in R to the diff_mean variable in diffmeans_bs, above). Using the bootstrap SE, find an approximate CI for the true difference in average evaluation score for males versus females.
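One common way to turn a bootstrap SE into an approximate 95% CI is statistic ± 2 × SE, centered at the statistic from the original sample (not at the mean of the bootstrap statistics). The helper boot_se_ci below is a hypothetical sketch of that rule; the numbers in the usage line are made up so the result can be checked by hand.

```r
# Approximate CI from a bootstrap SE: observed statistic +/- mult * SE.
# mult = 2 is the usual rough 95% multiplier (1.96 is the normal-theory value).
boot_se_ci <- function(stat_obs, boot_stats, mult = 2) {
  se_bs <- sd(boot_stats)
  c(lower = stat_obs - mult * se_bs, upper = stat_obs + mult * se_bs)
}

# Usage with made-up numbers (NOT evals results):
# sd(c(0, 0.1, 0.2)) = 0.1, so the interval is 0.1 +/- 0.2, i.e. (-0.1, 0.3)
boot_se_ci(stat_obs = 0.1, boot_stats = c(0, 0.1, 0.2))
```

With the lab objects it would be called as boot_se_ci(obs_diff, diffmeans_bs$diff_mean), where obs_diff (a hypothetical name) holds the observed male minus female difference computed from evals itself.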
Find the theory-based CI for the difference in means using the t.test function in R.
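Here is a sketch of the theory-based interval on simulated toy data (not evals), assuming t.test's default Welch test (var.equal = FALSE). It also rebuilds the interval by hand as difference ± t* × √(s₁²/n₁ + s₂²/n₂) to show that t.test uses the same formula SE as above. With the formula score ~ gender, t.test reports mean(female) minus mean(male), the opposite sign from diff_mean in the bootstrap code.

```r
# Hypothetical toy data (NOT evals)
set.seed(47)
toy <- data.frame(
  score  = c(rnorm(40, mean = 4.1, sd = 0.5), rnorm(60, mean = 4.2, sd = 0.55)),
  gender = factor(rep(c("female", "male"), times = c(40, 60)))
)

# Welch two-sample t interval (t.test default: var.equal = FALSE)
res <- t.test(score ~ gender, data = toy, conf.level = 0.95)
res$conf.int   # CI for mean(female) - mean(male): first level minus second

# Same interval by hand: difference +/- t* times the formula SE
f <- toy$score[toy$gender == "female"]
m <- toy$score[toy$gender == "male"]
se <- sqrt(var(f) / length(f) + var(m) / length(m))
d  <- mean(f) - mean(m)
t_star <- qt(0.975, df = res$parameter)   # Welch degrees of freedom from t.test
c(d - t_star * se, d + t_star * se)
```

With the real data, t.test(score ~ gender, data = evals) gives the interval; negate and swap its endpoints before comparing it with the bootstrap interval for male minus female.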
Interpret the intervals above in the context of the problem. (Are your intervals close to one another?)