Homework 6

load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(dplyr)
library(ggplot2)
Note:
homework (HW) at: http://www.rossmanchance.com/iscam3/instructors.html
Practice Problems (PP) in the textbook at the end of the investigations

Note on the lab write-up: if you want to use an iscam function but you don’t want the plot to print (in the knitted file), then type the following at the beginning of the chunk:

  to start R chunk: ```{r fig.keep="none"}

To suppress warnings and messages (can be put together with the figure info above!):

  to start R chunk: ```{r warning=FALSE, message=FALSE}

Practice problem 3.2A

For each of the following statements: (i) identify which variable is being considered the explanatory variable and which the response variable; (ii) suggest a potential confounding variable that explains the observed association between the explanatory and response variables; (iii) explain how the suggested (confounding) variable is related both to the response and to differences between the explanatory variable groups.

  1. Children with larger feet tend to have higher reading scores than children with smaller feet.
  2. Days with a higher number of ice cream sales also tend to have more drownings.
  3. Cities with higher teacher salaries also tend to have higher sales volumes in alcohol.
  4. People who eat apples regularly tend to have fewer cavities.
  5. Some professional sports teams have better winning percentages when their home games are not sold out than when their home games are sold out.

Chapter 3 HW 3

(you’ll first need a simulation like the one in Inv 3.1; then you can use iscamnormprob or iscamtwopropztest)

Born in California?

As a transplant to California, author A wondered whether California residents were more or less likely to have been born in California back in 1950 or more recently, say in 2000. To investigate this question, he took a random sample of 500 CA residents from the 1950 Census and an independent random sample of 500 CA residents from the 2000 Census. The results are shown in the table below:

Birth             1950   2000
Born in CA         219    258
Not born in CA     281    242
Total              500    500

  1. For each year, calculate the proportion of California residents who were born in California. Use appropriate symbols to represent them. Also calculate the difference between these proportions.
  2. Produce a segmented bar graph to display the conditional proportions who were born in California in these two years. Comment on what the graph, along with your calculations from part 1, reveals.
  3. State the appropriate null and alternative hypotheses, in words and in symbols, to address the research question of whether California residents were more or less likely to have been born in California back in 1950 or in 2000.
  4. Use technology (see the code in Inv 3.1 for how to do the simulation!) to conduct a simulation analysis to approximate a p-value for this significance test. Be sure to report the appropriate parameter values (n and \(\pi\)) for the binomial distribution that you simulate from. Also submit a well-labeled histogram of your simulation results. Finally, explain how you calculate the approximate p-value and report its value.
  5. Check the conditions for whether the normal approximation is appropriate for this significance test.
  6. Calculate the z-test statistic and p-value based on the normal distribution.
  7. Summarize your conclusion from these analyses, with regard to the research question. Also explain the reasoning process that leads to your conclusion.
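
The calculations in parts 1, 2, 4 and 6 can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the Inv 3.1 code itself: it assumes the null model draws each year's count from a binomial distribution with n = 500 and \(\pi\) set to the pooled sample proportion, and the seed, the number of repetitions, and all variable names are illustrative choices. prop.test (without continuity correction) is included only as a cross-check of the hand-computed z-test.

```r
# Observed counts from the table above
n1950 <- 500; n2000 <- 500
born1950 <- 219; born2000 <- 258

# Part 1: conditional proportions born in CA and their difference
phat1950 <- born1950 / n1950
phat2000 <- born2000 / n2000
obs_diff <- phat1950 - phat2000

# Part 2: segmented bar graph of the conditional proportions
counts <- matrix(c(born1950, n1950 - born1950, born2000, n2000 - born2000),
                 nrow = 2,
                 dimnames = list(Birth = c("Born in CA", "Not born in CA"),
                                 Year = c("1950", "2000")))
barplot(prop.table(counts, margin = 2), legend.text = TRUE,
        xlab = "Census year", ylab = "Proportion of sampled residents")

# Part 4 (assumed null model): both years share one probability of being
# born in CA, estimated by pooling the two samples
phat_pooled <- (born1950 + born2000) / (n1950 + n2000)

set.seed(1)      # illustrative seed, only for reproducibility
reps <- 10000    # illustrative number of simulated pairs of samples
sim_diff <- rbinom(reps, size = n1950, prob = phat_pooled) / n1950 -
            rbinom(reps, size = n2000, prob = phat_pooled) / n2000

hist(sim_diff, breaks = 30,
     main = "Simulated differences under the null hypothesis",
     xlab = "Difference in sample proportions (1950 minus 2000)")
abline(v = c(-1, 1) * abs(obs_diff), lty = 2)  # observed difference, both tails

# Two-sided approximate p-value: share of simulated differences at least as
# extreme as the observed one (small tolerance guards against rounding ties)
sim_pvalue <- mean(abs(sim_diff) >= abs(obs_diff) - 1e-12)

# Part 6: z-test statistic and p-value from the normal approximation
se_null <- sqrt(phat_pooled * (1 - phat_pooled) * (1 / n1950 + 1 / n2000))
z <- obs_diff / se_null
norm_pvalue <- 2 * pnorm(-abs(z))

# Cross-check with base R's two-sample test of proportions
check_pvalue <- prop.test(x = c(born1950, born2000), n = c(n1950, n2000),
                          correct = FALSE)$p.value
```

The simulated and normal-based p-values should be close to each other if the conditions checked in part 5 hold; the simulated value will vary slightly with the seed and the number of repetitions.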

Chapter 3 HW 10

Pulling All-Nighters

A study published in the January 2008 issue of the journal Behavioral Sleep Medicine involved a survey of 120 students at St. Lawrence University, a small liberal arts college in upstate New York. Researchers found that students who claimed to have never pulled an all-nighter have average GPAs of 3.1, compared to 2.9 for those students who do claim to have pulled all-nighters.

  1. Identify the explanatory and response variables in this study. Classify each as categorical or quantitative.

  2. Is this an observational study or a randomized experiment? Explain how you know.

  3. Suppose that the difference between these two averages is shown to be statistically significant. Can you legitimately conclude that pulling all-nighters causes a student’s GPA to decrease? If so, explain why. If not, identify a potential confounding variable, and explain how it provides an alternative explanation for why the all-nighter group has a significantly lower average GPA.

Chapter 3 HW 76

(the summary on p. 188 of your text should be helpful; you may also need to find a z critical value using either iscamnormprob or pnorm)
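
One way to get the critical value, assuming a two-sided test at the 0.05 level, is base R's qnorm, which is the inverse of pnorm. The hint above names only iscamnormprob and pnorm, so using qnorm here is an assumption about what is acceptable; pnorm then confirms that the central area is what was intended.

```r
alpha <- 0.05                      # significance level for a two-sided test
z_star <- qnorm(1 - alpha / 2)     # upper critical value of the standard normal

# Check with pnorm: area between -z_star and z_star should be 1 - alpha
central_area <- pnorm(z_star) - pnorm(-z_star)
```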

U.S. Volunteerism (cont.)

From 3.75: In the September 2003 study of volunteerism in the U.S. conducted by the Bureau of Labor Statistics, 25.1% of men and 32.2% of women surveyed said that they had done volunteer work for or through an organization in the previous year.

Reconsider the previous question and the study about volunteerism.

  1. Suppose that the sample sizes had been the same for men and for women. Determine the smallest sample size so that the difference between the observed sample proportions of 0.251 and 0.322 would be significant at the 0.05 level (with a two-sided test).

  2. For the sample sizes that you found in part 1, would the same difference in sample proportions (0.071) have been significant if the sample proportions had been 0.451 and 0.522? Report the test statistic and p-value in this case.

  3. Repeat part 2 with the same difference in sample proportions (0.071), but assuming that the sample proportions had been 0.051 and 0.122.

  4. Summarize your findings from this analysis.
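
A minimal sketch of how parts 1 through 3 could be approached in base R is given below. It assumes a pooled two-proportion z-test with equal group sizes, treats the sample proportions as fixed no matter what n is (so the implied counts need not be whole numbers), and reads "significant at the 0.05 level" as a two-sided p-value no larger than 0.05. The helper name two_prop_z is hypothetical and is not an iscam function.

```r
# Pooled two-proportion z-test for two groups of equal size n
# (hypothetical helper; proportions are treated as fixed for any n)
two_prop_z <- function(p1, p2, n) {
  pooled <- (p1 + p2) / 2                       # equal n, so a simple average
  se <- sqrt(pooled * (1 - pooled) * (2 / n))   # standard error under the null
  z <- (p1 - p2) / se
  c(z = z, p_value = 2 * pnorm(-abs(z)))        # two-sided p-value
}

alpha <- 0.05

# Part 1: increase n (per group) until 0.251 vs 0.322 becomes significant
n_min <- 1
while (two_prop_z(0.251, 0.322, n_min)["p_value"] > alpha) {
  n_min <- n_min + 1
}

# Parts 2 and 3: same difference (0.071) and same n, different proportions
part2 <- two_prop_z(0.451, 0.522, n_min)
part3 <- two_prop_z(0.051, 0.122, n_min)
```

Comparing part2 and part3 with the result at 0.251 and 0.322 shows how the same difference in proportions leads to different test statistics, because the standard error depends on how close the pooled proportion is to 0.5.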