Homework 5

load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)

Note:
Homework (HW) problems are at: http://www.rossmanchance.com/iscam3/instructors.html
Practice Problems (PP) are in the textbook at the end of the investigations

Note on the lab write-up: if you want to use an iscam function but don’t want the plot to print (in the knitted file), type the following at the beginning of the chunk:

  to start R chunk: ```{r fig.keep="none"}

Chapter 0 HW 2

Consider the following four histograms of (hypothetical) quiz scores in four classes (scores are integers ranging from 1 to 9):

(a) Between class A and class B, which has more variability in quiz scores? [Hint: Think about which will have the larger standard deviation and/or larger interquartile range. Explain your reasoning.]
(b) Between classes C and D, which has more variability in quiz scores? Explain your reasoning.

Chapter 2 HW 17

The following histogram displays the (hypothetical) quiz scores for a class of n = 29 students.

Suppose we were to give every student 5 bonus points.

(a) How would the mean change? The median?
(b) How would the standard deviation change? The interquartile range?

Note: You should explain your answers to (a) and (b) without carrying out the calculations to find these new values.

Chapter 2 PP 2.4A

(a) Use the Central Limit Theorem to estimate the probability that the sample mean weight of 20 randomly selected passengers exceeds 159.57 lbs, assuming a normal population with mean \(\mu = 167\) lbs and standard deviation \(\sigma = 35\) lbs.
(b) Is the probability you found in (a) larger or smaller than the probability you found (read: the investigation found) for 47 passengers? Explain why your answer makes intuitive sense.
(c) Repeat (a) assuming a uniformly distributed population of weights. (Use population 3 from the applet. You can include a screenshot as below, or just staple the screenshot onto your HW.) How do these two probabilities compare? [Hint: Think about whether it is more appropriate to use the Sampling from Finite Population applet or the CLT to answer this question.]

How to include a screenshot (as .png) in a markdown file:
![](MyScreenShotThatLivesInTheSameDirectoryAsThisFile.png)

(d) Explain why the calculation in (a) does not estimate the probability of the Ethan Allen sinking with 20 passengers. [Hint: this problem is not about the shape of the distribution.]
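For the normal-population calculation in (a), one possible way to set it up in R is sketched below. It uses only the numbers stated in the problem; the variable names are illustrative, and the sketch assumes the sample mean is modeled as normal with standard deviation \(\sigma/\sqrt{n}\).

```r
# Values taken from the problem statement
mu     <- 167     # population mean weight (lbs)
sigma  <- 35      # population standard deviation (lbs)
n      <- 20      # number of passengers sampled
cutoff <- 159.57  # threshold for the sample mean (lbs)

# Standard deviation of the sample mean
se <- sigma / sqrt(n)

# P(sample mean > cutoff) under a normal model for the sample mean
p_exceeds <- pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)
p_exceeds
```

Changing n to 47 gives the corresponding value for comparison in (b).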

Chapter 0 HW 3

The following dotplots display the distribution of sleeping times (per day, in hours) of three college students (Amber, Katherine, Sarah) for a nine-week period in the fall of 2004.

(a) One of these students developed mononucleosis during the term and so was told to get as much rest as possible for several weeks. Which student do you think this is? Explain your reasoning.
(b) One of these students is the mother of two small children. Which student do you think this is? Explain your reasoning.
(c) Which student recorded her sleeping times only to the nearest hour? Explain.
(d) Which student generally got the most sleep? Which generally got the least?

Chapter 2 HW 9

Reconsider the students’ sleeping times from the Chapter 0 Exercises (SleepStudents.txt).

(a) Determine the five-number summary of sleeping times for each student.
(b) For each student, determine which (if any) of their sleeping times qualify as outliers by the 1.5 × IQR rule.
(c) Create boxplots of these students’ sleeping times on the same scale. Comment on what these boxplots reveal.
(d) What does the dotplot reveal about Amber’s sleeping times that the boxplot does not?

Some code you might find useful:

# Read the tab-delimited data; "*" marks missing values
studentSleep = read.table("http://www.rossmanchance.com/iscam2/data/SleepStudents.txt",
                          sep="\t", header=TRUE, na.strings="*")


# Summary statistics of sleeping time, computed separately for each student
studentSleep %>%
  select(sleep.stacked., student) %>%
  group_by(student) %>%
  summarize(meanslp = mean(sleep.stacked.), medslp = median(sleep.stacked.),
            sdslp = sd(sleep.stacked.), iqrslp = IQR(sleep.stacked.),
            slp25 = quantile(sleep.stacked., 0.25))


# Count how many of Amber's sleeping times are below 10 hours
studentSleep %>%
  filter(student == "Amber") %>%
  filter(sleep.stacked. < 10) %>%
  summarize(badsleep = n())
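
The summary code above does not report the minimum, third quartile, or maximum, which the five-number summary needs. A minimal base R sketch of one way to get all five values per group follows. The data frame toySleep is hypothetical (made-up values that only reuse the column names of studentSleep). Note that quantile() uses R's default interpolation (type 7), which may differ slightly from the textbook's quartile convention.

```r
# Hypothetical data with the same column names as studentSleep
toySleep <- data.frame(
  student = rep(c("A", "B"), each = 5),
  sleep.stacked. = c(6, 7, 7.5, 8, 9, 5, 6, 6.5, 7, 8)
)

# Min, Q1, median, Q3, max for each student (one column per student)
five_num <- sapply(split(toySleep$sleep.stacked., toySleep$student),
                   quantile, probs = c(0, 0.25, 0.5, 0.75, 1))
five_num
```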

Reconsider the students’ sleeping times from the Chapter 0 Exercises (SleepStudents.txt).

(a) Determine the five-number summary of sleeping times for each student.
(b) For each student, determine which (if any) of their sleeping times qualify as outliers by the 1.5 × IQR rule.

group_by the student,
then create new variables for the lower and upper bounds (using the quartiles and IQR from above); report the bounds.

Then filter for each student one at a time,
then filter for the points that are below the lower bound or above the upper bound:  filter(value < # | value > #) %>%
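
The steps above can be sketched with dplyr roughly as follows. This is one possible arrangement, not the required solution: toySleep is a hypothetical data frame with made-up values, and lower, upper, flagged, and outliers are illustrative names. Because the bounds are attached to every row within each group, a single filter handles all students at once rather than one at a time.

```r
library(dplyr)

# Hypothetical data with the same column names as studentSleep
toySleep <- data.frame(
  student = rep(c("A", "B"), each = 8),
  sleep.stacked. = c(6, 7, 7, 7.5, 8, 8, 8.5, 14,
                     5, 5.5, 6, 6, 6.5, 7, 7, 7.5)
)

# Attach each student's own 1.5 x IQR bounds to every one of her rows
flagged <- toySleep %>%
  group_by(student) %>%
  mutate(lower = quantile(sleep.stacked., 0.25, names = FALSE) - 1.5 * IQR(sleep.stacked.),
         upper = quantile(sleep.stacked., 0.75, names = FALSE) + 1.5 * IQR(sleep.stacked.)) %>%
  ungroup()

# Keep only the observations outside their own student's bounds
outliers <- flagged %>%
  filter(sleep.stacked. < lower | sleep.stacked. > upper)
outliers
```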

(c) Create boxplots of these students’ sleeping times on the same scale. Comment on what these boxplots reveal.

You'll want to use qplot with geom="boxplot".
The x-axis should have student and the y-axis should have sleep.stacked. (the trailing period is part of the variable name).
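
qplot() is marked as deprecated in recent ggplot2 releases, so the sketch below shows the same plot written with ggplot() and geom_boxplot() as an alternative. The data frame toySleep is hypothetical (made-up values reusing the column names of studentSleep), and the axis labels are only suggestions.

```r
library(ggplot2)

# Hypothetical data with the same column names as studentSleep
toySleep <- data.frame(
  student = rep(c("A", "B"), each = 5),
  sleep.stacked. = c(6, 7, 7.5, 8, 9, 5, 6, 6.5, 7, 8)
)

# Side-by-side boxplots: one box per student on a shared y-axis
p <- ggplot(toySleep, aes(x = student, y = sleep.stacked.)) +
  geom_boxplot() +
  labs(x = "Student", y = "Sleep (hours)")
p
```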

(d) What does the dotplot reveal about Amber’s sleeping times that the boxplot does not?

Chapter 2 HW 10

Sleeping Students (cont.)
Reconsider the students’ sleeping times from Exercise 9 (SleepStudents.txt).

(a) Calculate the mean and standard deviation of sleeping times for each student.
(b) For each student, determine the proportion of the 63 sleeping times that fall within one standard deviation of the mean.
(c) For which student does the empirical rule (see part (b)) appear to hold most closely? For that student, determine the proportion of sleeping times that fall within two standard deviations of the mean.
(d) Suppose that Katherine gets 10 hours of sleep on a particular night. How many hours more than her mean is this? Also calculate the z-score for this value.
(e) Suppose that Amber gets 13 hours of sleep on a particular night. How many hours more than her mean is this? Also calculate the z-score for this value.
(f) Which of these (10 hours for Katherine or 13 for Amber) is higher above that student’s mean? Which has the higher z-score? Explain why your answers are not the same.
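
A minimal sketch of the two calculations this exercise relies on, the proportion of values within k standard deviations of the mean and a z-score, is given below. The vector x is hypothetical (it is not taken from SleepStudents.txt) and prop_within is an illustrative helper name. To apply it to the real data, first filter the sleeping times for one student.

```r
# Hypothetical sleeping times in hours (not the SleepStudents.txt data)
x <- c(6, 6.5, 7, 7, 7.5, 8, 8, 8.5, 9, 12)

xbar <- mean(x)
s <- sd(x)

# Proportion of observations within k standard deviations of the mean
prop_within <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x))
prop_within(x, 1)
prop_within(x, 2)

# z-score: how many standard deviations a value lies above the mean
value <- 10
z <- (value - xbar) / s
z
```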

Chapter 2 HW 5

Guess the Instructor’s Age
The file AgeGuesses.txt contains guesses of an instructor’s age by her current students. Let \(\mu\) represent the average guess of her age by all current students at the university and suppose the sample constitutes a representative sample of all students at this school on this issue.

AgeGuesses = read.table("http://www.rossmanchance.com/iscam2/data/AgeGuesses.txt",
                        sep="\t", header=TRUE, na.strings="*")
head(AgeGuesses)

##   guesses
## 1      26
## 2      28
## 3      30
## 4      31
## 5      33
## 6      35

(a) Produce numerical and graphical summaries of the distribution and describe what you learn (in context).
(b) Use a histogram to decide whether the data show strong deviations from the pattern of a normal distribution.
(c) Use technology to determine a 90% one-sample t-interval for these data. Include your output and comment on the validity of this procedure. Provide a one-sentence interpretation of this interval.
(d) Count how many of the class guesses are inside the 90% confidence interval. Compute the percentage of the class guesses that are inside the interval. Is this close to 90%? Should it be?
(e) Calculate and interpret a 90% prediction interval. Include the details of your calculation and comment on the validity of this procedure. How does the prediction interval compare (midpoint, length) to the confidence interval?
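
One possible way to compute both intervals in base R is sketched below. The vector guesses is hypothetical (it is not the AgeGuesses.txt data) and the names tstar, ci, and pi90 are illustrative. The prediction interval uses the usual one-sample form \(\bar{x} \pm t^* s \sqrt{1 + 1/n}\), which assumes an approximately normal population.

```r
# Hypothetical age guesses (not the AgeGuesses.txt data)
guesses <- c(26, 28, 30, 31, 33, 35, 36, 38, 40, 42)

n <- length(guesses)
xbar <- mean(guesses)
s <- sd(guesses)
tstar <- qt(0.95, df = n - 1)  # critical value for 90% confidence

# 90% confidence interval for the mean guess
ci <- t.test(guesses, conf.level = 0.90)$conf.int

# 90% prediction interval for a single new guess
pi90 <- xbar + c(-1, 1) * tstar * s * sqrt(1 + 1 / n)

ci
pi90

# Share of the individual guesses that fall inside the confidence interval
mean(guesses >= ci[1] & guesses <= ci[2])
```

With the real data, replace guesses with AgeGuesses$guesses.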

Math 58B - Introduction to Biostatistics

Jo Hardin

Spring 2017