Homework 12 – Last Homework!! (not due)

load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
homework at: http://www.rossmanchance.com/iscam3/instructors.html

Chapter 5 HW 32 (no technology)

Correlation Properties
Suppose that you take a random sample of classes on campus and record the number of students enrolled in the class and the average evaluation score assigned by students to that teacher’s effectiveness (on an A = 4, B = 3, … scale).
(a) If the correlation coefficient turned out to be very close to zero, would you conclude that larger classes tend to have lower teaching evaluation averages? Explain.
(b) Suppose that the correlation coefficient turned out to be r = –0.5. Would you expect this to be more statistically significant if the sample size were n = 5 or if the sample size were n = 50, or would you expect sample size not to matter? Explain.
(c) Suppose that you also record whether the teacher was male or female. Would it make sense to calculate the correlation coefficient between class size and sex? Explain.
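
For part (b), one standard way to see the role of sample size is the t-statistic for testing whether a population correlation is zero, \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n - 2\) degrees of freedom. The R sketch below is only an illustration, since the exercise itself asks for reasoning without technology. The helper name `cor_t_test` is hypothetical and not part of the ISCAM workspace.

```r
# Hypothetical helper: t-statistic and two-sided p-value for testing
# whether the population correlation is zero, given sample r and size n.
cor_t_test <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p_value <- 2 * pt(-abs(t_stat), df = n - 2)
  c(t = t_stat, p = p_value)
}

cor_t_test(-0.5, 5)   # t is about -1 on 3 degrees of freedom
cor_t_test(-0.5, 50)  # t is about -4 on 48 degrees of freedom
```

Comparing the two p-values shows how the same r can be weak or strong evidence depending on n.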

Chapter 5 HW 33 (no technology)

Least Squares Coefficients
Reconsider the expressions for calculating least squares coefficients for the slope (\(b_1 = r s_y / s_x\)) and intercept (\(b_0 = \overline{y} - b_1 \overline{x}\)) of a regression line. Use these formulas to explain what happens to the least squares line in the following situations. [Hint: You may find it helpful to draw sketches as well as analyze these situations algebraically.]
(a) The mean value of the response variable increases, and all else remains the same. [Hint: Report what happens to the slope and to the intercept.]
(b) The mean value of the explanatory variable increases, and all else remains the same.
(c) The standard deviation of the values of the response variable increases, and all else remains the same.
(d) The standard deviation of the values of the explanatory variable increases, and all else remains the same.
(e) The correlation coefficient between the two variables moves closer to zero, and all else remains the same.
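
If you want to check your algebra afterward, the two formulas can be coded directly and one input changed at a time. The sketch below is a minimal illustration. The helper name `ls_coefs` is hypothetical, and the numeric inputs are made up and do not come from any data set in the course.

```r
# Hypothetical helper: least squares coefficients from summary statistics,
# using b1 = r * s_y / s_x and b0 = ybar - b1 * xbar.
ls_coefs <- function(xbar, ybar, sx, sy, r) {
  b1 <- r * sy / sx
  b0 <- ybar - b1 * xbar
  c(intercept = b0, slope = b1)
}

# Made-up baseline values, chosen only for illustration
base <- ls_coefs(xbar = 10, ybar = 50, sx = 2, sy = 4, r = 0.6)

# Scenario (a): raise the mean of the response, keep everything else fixed
shifted <- ls_coefs(xbar = 10, ybar = 55, sx = 2, sy = 4, r = 0.6)

rbind(base, shifted)
```

Changing `xbar`, `sy`, `sx`, or `r` in the second call, one at a time, gives the analogous comparison for parts (b) through (e).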

Chapter 5 HW 35 (calculator)

Height and Foot Size
Reconsider the height and foot length data of Investigation 5.8 (HeightFoot.txt). In that investigation you used foot length (in centimeters) to predict height (in inches). Recall that some summary statistics are:
              Mean       Std dev    Correlation
Height        67.75 in   5.004 in   0.711
Foot length   28.50 cm   3.445 cm

Now suppose that you want to use these data to construct a model for predicting a student’s foot length from his/her height. (After all, most people can tell you their height off the top of their heads, but few can tell you the length of their right foot in centimeters.)
(a) Use these summary statistics to determine the least squares regression line for predicting foot length from height.
(b) Interpret the slope coefficient of this line in context.
(c) Use the line to predict the foot length of a student who is 66 inches tall.
(d) Identify the units of measurement (e.g., inches, centimeters, no units) on the slope coefficient, the intercept coefficient, and the correlation coefficient.
(e) The least squares line for predicting height from foot length is \(\widehat{\text{height}} = 38.3 + 1.03 \times \text{foot length}\). Solve this equation for foot length as a linear function of height.
(f) Is the line in (e) the same as the line in (a)? Explain.
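
Although this problem is meant for a calculator, the same arithmetic can be checked in R. The sketch below applies \(b_1 = r s_y / s_x\) and \(b_0 = \overline{y} - b_1 \overline{x}\) to the summary statistics above, once with foot length as the response and once with height as the response, and then solves the second line for foot length. All variable names are illustrative choices, not objects from the ISCAM workspace.

```r
# Summary statistics from the table above
mean_height <- 67.75; sd_height <- 5.004   # inches
mean_foot   <- 28.50; sd_foot   <- 3.445   # centimeters
r <- 0.711

# Line for predicting foot length from height (response = foot length)
slope_foot <- r * sd_foot / sd_height
int_foot   <- mean_foot - slope_foot * mean_height

# Line for predicting height from foot length (response = height),
# then solved algebraically for foot length
slope_height <- r * sd_height / sd_foot
int_height   <- mean_height - slope_height * mean_foot
inv_slope <- 1 / slope_height
inv_int   <- -int_height / slope_height

# Compare the two candidate lines for foot length as a function of height
rbind(foot_on_height          = c(intercept = int_foot, slope = slope_foot),
      inverted_height_on_foot = c(intercept = inv_int,  slope = inv_slope))

# Predicted foot length (cm) for a student who is 66 inches tall
int_foot + slope_foot * 66
```

The product of the two regression slopes equals \(r^2\), which is a useful check on part (f).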

Chapter 5 HW 39 (R: summary(lm(response ~ explanatory)) )

Breaking Ice
Nenana is a small, interior Alaskan town that holds a famous competition to predict the exact moment that “spring arrives” every year. The arrival of spring is defined to be the moment when the ice on the Tanana River breaks, which is measured by a tripod erected on the ice with a trigger to an official clock. The minute at which the ice breaks has been recorded in every year since 1917. For example, the dates and times for the years 2000-2004 were:
2000             2001            2002            2003               2004
May 1, 10:47am   May 8, 1:00pm   May 7, 9:27pm   April 29, 6:22pm   April 24, 2:16pm

The data file NenanaIceBreak.txt contains all of the data since 1917. Scientists have examined these data for evidence of global warming, which would suggest that the ice break day should be tending to occur earlier as time goes on.
(a) Examine a scatterplot of the day on which the ice broke (coded in column 7 with April 1 = 1) vs. year. Does it reveal any association between the two variables? In other words, is there any indication that the day on which spring begins is changing over time? Explain.
(b) Determine and report the regression line for predicting ice break day from year. Also calculate the correlation coefficient and the value of \(R^2\). Comment on what these reveal, including an interpretation of the slope coefficient.
(c) Conduct a test for whether there is a linear association between ice break day and year. State the hypotheses, and report the test statistic and p-value. Check the technical conditions, and summarize your conclusions.
(d) Would you say that the p-value reveals evidence of a strong association, or strong evidence of an association? Explain.
(e) Do the data suggest that one can make better predictions by taking year into account, rather than simply using the average of the ice break days? Explain.
(f) What date would the regression model predict for the ice break-up in the year 2005? What about 2020? Explain why you should regard these predictions cautiously.
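
A possible starting point for the R work, following the `summary(lm(response ~ explanatory))` hint, is sketched below. It rests on several assumptions that you should verify against the actual file: that NenanaIceBreak.txt has been saved in your working directory, that it has a header row, and that year is stored in column 1 (only the day being in column 7 is stated above). The helper name `fit_ice_break` is hypothetical.

```r
# Hypothetical helper: fit the regression of ice break day on year.
# `dat` is a data frame; `year_col` and `day_col` are column names or positions.
fit_ice_break <- function(dat, year_col, day_col) {
  d <- data.frame(year = dat[[year_col]], day = dat[[day_col]])
  lm(day ~ year, data = d)
}

# Usage, under the assumptions stated above (uncomment once the file is in place):
# ice <- read.table("NenanaIceBreak.txt", header = TRUE)
# fit <- fit_ice_break(ice, year_col = 1, day_col = 7)   # column 1 = year is an assumption
# plot(ice[[1]], ice[[7]], xlab = "Year", ylab = "Ice break day (April 1 = 1)")
# abline(fit)                  # add the fitted line to the scatterplot
# summary(fit)                 # slope, t-statistic, p-value, and R^2
# plot(fitted(fit), resid(fit))                      # residual plot for the technical conditions
# predict(fit, newdata = data.frame(year = c(2005, 2020)))
```

The predicted values are on the coded scale (April 1 = 1), so they need to be converted back to calendar dates for part (f).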