Homework 11

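# Load the ISCAM R workspace that accompanies the text
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)    # data manipulation
library(ggplot2)  # plotting
library(readr)    # read_delim(), used for the HypoAnova data below
# Homework is available at: http://www.rossmanchance.com/iscam3/instructors.html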

Note that the ANOVA p-value can be obtained with the following R code (page 333 of your text):

summary(aov(response~explanatory)) 
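As a minimal sketch, the p-value (and F statistic) can also be pulled out of the `summary(aov(...))` output programmatically, which is handy when an ANOVA is repeated many times. The data below are simulated, and the names `y` and `g` are hypothetical stand-ins for `response` and `explanatory`.

```r
set.seed(1)
# Hypothetical data: three groups of 20 drawn from the same normal population
g <- factor(rep(c("A", "B", "C"), each = 20))
y <- rnorm(60, mean = 0, sd = 6)

fit <- aov(y ~ g)
summary(fit)   # prints the ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)

# summary() returns a list whose first element is the ANOVA table;
# the group effect is the first row, the residuals the second
f_stat  <- summary(fit)[[1]][["F value"]][1]
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
c(F = f_stat, p = p_value)
```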

Use the following applet for questions 1 through 6:

Robustness of ANOVA: http://shiny.stat.calpoly.edu/ANOVA_robust/
  1. Simulate samples of size 20 from populations with equal means and standard deviations 6, 6, and 6. What is your Type I error rate?

  2. Simulate samples of size 20 from populations with equal means and standard deviations 4, 6, and 8. Now what is your Type I error rate?

  3. Simulate samples of size 20 from populations with equal means and standard deviations 1, 6, and 11. Now what is your Type I error rate?

  4. Do the error rates you found in questions 1, 2, and 3 vary by sample size (when sample sizes are equal)? (That is, if the sample sizes are still equal but much larger or much smaller, do you see the same variability in error rates?)

  5. Next, repeat your simulation studies with unequal sample sizes of 10, 20, and 30. How do the results differ?

  6. Finally, repeat the above simulation studies, but specify population means to be -3, 0, and 3, so that you study the power of the test under different conditions. How does the power change with equal/unequal standard deviations and/or equal/unequal sample sizes?
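The applet's simulation can be roughly mimicked in R. This sketch assumes normal populations and a 0.05 significance level (the applet's internal settings may differ), and `sim_anova()` is a hypothetical helper, not part of the ISCAM workspace. It records the proportion of simulated data sets whose ANOVA p-value falls below alpha: with equal population means that proportion estimates the Type I error rate, and with unequal means it estimates power.

```r
# Estimate the rejection rate of the one-way ANOVA F test by simulation.
# means, sds, ns: population means, population SDs, and sample sizes per group
sim_anova <- function(means, sds, ns, reps = 1000, alpha = 0.05) {
  k <- length(means)
  g <- factor(rep(seq_len(k), times = ns))   # group labels, in the same order as y
  rejections <- replicate(reps, {
    # draw each group's sample from its own normal population (an assumption)
    y <- unlist(lapply(seq_len(k), function(i) rnorm(ns[i], means[i], sds[i])))
    p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
    p < alpha
  })
  mean(rejections)   # proportion of simulated samples that reject H0
}

set.seed(2024)
type1_equal   <- sim_anova(c(0, 0, 0),  c(6, 6, 6),  c(20, 20, 20))  # equal SDs, equal ns
type1_unequal <- sim_anova(c(0, 0, 0),  c(1, 6, 11), c(10, 20, 30))  # unequal SDs and ns
power_equal   <- sim_anova(c(-3, 0, 3), c(6, 6, 6),  c(20, 20, 20))  # means differ: power
c(type1_equal = type1_equal, type1_unequal = type1_unequal, power_equal = power_equal)
```

Changing the `sds` and `ns` arguments reproduces each of the scenarios in questions 1 through 6.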

Chapter 4 HW 21

Crash Tests
The National Highway Traffic Safety Administration conducts crash tests on automobiles. The file crash.txt contains data on automobile crash tests in which stock automobiles are crashed into a wall at 35 miles per hour with dummies in the driver and front passenger seat (as reported by the Data and Story Library (DASL) web site, http://lib.stat.cmu.edu/DASL/Datafiles/Crash.html ). Response variables are measurements of injury extent on head (column 5), chest (column 6), left leg (column 7), and right leg (column 8). Explanatory variables include whether the dummy was on the driver or passenger side (column 9), protective devices in the car (column 10), number of doors on the car (column 11: 2, 4, or other), year of make (column 12), and size of car (column 14).

crashdata <- read.csv("http://pages.pomona.edu/~jsh04747/courses/math58/Math58Data/crashdata.csv", header=TRUE)
  1. Produce boxplots of head injury measurements by number of doors. Write a paragraph comparing and contrasting the distributions of these measurements among the three groups. (Comment on shape, center, spread, unusual observations, and any other features of interest. Pay particular attention to the question of whether the extent of head injury seems to differ among the three groups.)
  2. In addition to the boxplots, produce histograms of the head injury measurements for each of the “number of doors” categories. Do the data suggest that each of the population distributions of head injury measurements is normally distributed? Explain.
  3. Apply the log transformation to the head injury measurements. Then examine boxplots or histograms of this transformed variable by number of doors. Do these distributions appear to be roughly normally distributed?
  4. Does the technical condition about equal population standard deviations appear to be satisfied on the transformed data? Explain.
  5. Conduct an ANOVA on these transformed data. Report the hypotheses along with the value of the F statistic and p-value. Summarize your conclusions about whether the data provide evidence that the extent of head injury varies among vehicles with different numbers of doors.
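One way to check the equal-standard-deviation condition numerically is to compare the group standard deviations, for example with the common rule of thumb that the largest sample SD should be less than twice the smallest. This sketch is illustrative only: `check_equal_sd()` is a hypothetical helper, the demonstration uses simulated right-skewed (lognormal) data rather than the crash data, and the final commented call assumes the CSV keeps the column order described above.

```r
# Compare group SDs and apply the "largest SD < 2 x smallest SD" rule of thumb
check_equal_sd <- function(response, group) {
  sds <- tapply(response, group, sd)
  ratio <- max(sds) / min(sds)
  list(group_sds = sds, ratio = ratio, rule_ok = ratio < 2)
}

set.seed(11)
# Simulated right-skewed data: lognormal groups whose spread grows with the center
doors  <- factor(rep(c("2", "4", "other"), each = 50))
injury <- rlnorm(150, meanlog = rep(c(0, 1, 2), each = 50), sdlog = 0.5)

raw_check <- check_equal_sd(injury, doors)        # SDs differ a lot on the raw scale
log_check <- check_equal_sd(log(injury), doors)   # roughly equal after the log transform
raw_check$ratio
log_check$ratio

# One-way ANOVA on the log scale
summary(aov(log(injury) ~ doors))

# With the real data (assumes head injury is column 5 and doors is column 11):
# check_equal_sd(log(crashdata[[5]]), factor(crashdata[[11]]))
```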

Chapter 4 HW 24

Comparing Means
Suppose that instructors A, B, and C are each teaching three large sections of a course, and each instructor wants to study whether the mean exam scores differ significantly across his/her three sections. Suppose that each instructor takes a random sample of ten students from each of his/her sections and calculates the following descriptive statistics:

Summary statistics for each section (see the online HW for the original layout of this table):

                  A1  A2  A3  B1  B2  B3  C1  C2  C3
Sample size       10  10  10  10  10  10  10  10  10
Sample mean       50  60  70  50  60  70  57  60  63
Sample std. dev.  24  24  24   5   5   5   5   5   5

  1. Based on these statistics, which instructor has the strongest evidence that the mean scores differ significantly across his/her three sections? Which has the least evidence? Explain your answers.
  2. Hypothetical data matching these statistics can be found in HypoAnova.txt. Perform an ANOVA for each instructor, and report the test statistics and p-values.
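# Read the tab-delimited hypothetical data (read_delim() is from readr)
hypoanova <- read_delim("http://www.rossmanchance.com/iscam2/data/HypoAnova.txt", "\t")

# Split the rows by instructor (assumes rows 1-30 belong to instructor A,
# rows 31-60 to instructor B, and rows 61-90 to instructor C)
hypoanovaA <- hypoanova[1:30,]
hypoanovaB <- hypoanova[31:60,]
hypoanovaC <- hypoanova[61:90,]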
  3. Do the results in question 2 confirm your answers to question 1? If so, explain. If not, explain how you would change your answers to question 1 and why.
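Because the one-way ANOVA F statistic depends on the data only through each group's size, mean, and standard deviation, it can be computed directly from a table of summary statistics like the one above: MSG = sum of n_i * (xbar_i - xbar)^2 / (k - 1), MSE = sum of (n_i - 1) * s_i^2 / (N - k), and F = MSG / MSE. The helper `anova_from_summary()` below is a hypothetical sketch of that calculation, offered as a cross-check on the ANOVA output rather than a replacement for it.

```r
# One-way ANOVA F test computed from group summary statistics
# n: sample sizes, xbar: sample means, s: sample standard deviations
anova_from_summary <- function(n, xbar, s) {
  k <- length(n)
  N <- sum(n)
  grand_mean <- sum(n * xbar) / N                  # overall (weighted) mean
  msg <- sum(n * (xbar - grand_mean)^2) / (k - 1)  # between-group mean square
  mse <- sum((n - 1) * s^2) / (N - k)              # within-group mean square
  f <- msg / mse
  c(F = f, p_value = pf(f, k - 1, N - k, lower.tail = FALSE))
}

n <- c(10, 10, 10)
res_A <- anova_from_summary(n, c(50, 60, 70), c(24, 24, 24))
res_B <- anova_from_summary(n, c(50, 60, 70), c(5, 5, 5))
res_C <- anova_from_summary(n, c(57, 60, 63), c(5, 5, 5))
rbind(A = res_A, B = res_B, C = res_C)
```

The same function works for any number of groups with unequal sample sizes, since it weights each group mean and variance by its own n.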