Homework 3

# note that you can load ISCAM.RData to get the iscambinomprob function
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
Note:
Homework (HW) problems are at: http://www.rossmanchance.com/iscam3/instructors.html
Practice Problems (PP) are in the textbook at the end of the investigations.

1. Practice Problem 1.14

  1. Which of the following are advantages of studies with a larger sample size? (Check all that apply)
  • Better represent the population (reduce sample bias)
  • To more precisely estimate the parameter
  • To decrease sampling variability
  • To make simulation results more precise
  • Other?
  2. In conducting a simulation analysis, why might we take a larger number of samples? (Check all that apply)
  • Better represent the population (reduce sample bias)
  • To more precisely estimate the parameter
  • To decrease sampling variability
  • To make simulation results more precise
  • Other?
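
The distinction between the two questions above (sample size versus number of simulated samples) can be explored with a small simulation. This is only a sketch: the true proportion `pi_true = 0.5`, the helper name `sd_phat`, and the particular sample sizes are assumptions chosen for illustration, not values from the textbook.

```r
# Hypothetical setup: a true proportion of 0.5, chosen only for illustration.
set.seed(1)
pi_true <- 0.5

# Simulate `reps` samples of size `n` and return the standard deviation
# of the resulting sample proportions.
sd_phat <- function(n, reps) {
  phat <- rbinom(reps, size = n, prob = pi_true) / n
  sd(phat)
}

sd_small_n   <- sd_phat(n = 25,  reps = 1000)    # spread for a small sample size
sd_large_n   <- sd_phat(n = 100, reps = 1000)    # spread for a larger sample size
sd_more_reps <- sd_phat(n = 25,  reps = 100000)  # same n, many more simulated samples

print(c(sd_small_n = sd_small_n, sd_large_n = sd_large_n, sd_more_reps = sd_more_reps))
```

Comparing the three printed values shows which change affects the spread of the sample proportions and which only stabilizes the simulated estimate of that spread.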

2. Practice Problem 1.7A

For the research study on the mortality rate at St. George’s hospital (Inv 1.3), the goal was to compare the mortality rate of that hospital to the national benchmark of 0.15. Suppose you plan to monitor the next 20 operations, using a level of significance of 0.05. Also suppose the actual death rate at this hospital equals 0.25.

  1. If you were to conclude that the hospital’s death rate exceeds the national benchmark when it really does not, what type of error would you be committing?
  2. If you were to conclude that the hospital’s death rate does not exceed the national benchmark when it really does, what type of error would you be committing?
  3. Which error, Type I or Type II, would you consider more critical here? Explain.
  4. Use the Power Simulation applet to determine the rejection region for a sample of 20 patients. What is the rejection region?
  5. What is the approximate power for this rejection region? Write a one-sentence interpretation of what we mean by “power” here.
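
As a cross-check on the applet, the rejection region and power can also be computed with base R's binomial functions. This is a sketch that swaps in `pbinom` for the Power Simulation applet and assumes a one-sided alternative (death rate greater than 0.15); the variable names are illustrative.

```r
# Values from the problem statement.
n      <- 20     # number of operations monitored
pi0    <- 0.15   # national benchmark (null value)
pi_alt <- 0.25   # assumed actual death rate
alpha  <- 0.05   # level of significance

# Assumption: one-sided test, H0: pi = 0.15 vs Ha: pi > 0.15.
k <- 0:n
upper_tail_null <- pbinom(k - 1, n, pi0, lower.tail = FALSE)  # P(X >= k | pi0)

# Smallest count whose upper-tail probability under H0 is at most alpha.
cutoff <- min(k[upper_tail_null <= alpha])

# Power: probability of landing in the rejection region when pi = pi_alt.
power <- pbinom(cutoff - 1, n, pi_alt, lower.tail = FALSE)

cat("Reject H0 when the number of deaths is at least", cutoff, "\n")
cat("Power against pi =", pi_alt, "is", round(power, 4), "\n")
```

The exact values should be close to, but need not match, the simulated values from the applet.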

3. Chapter 1 HW 38

Suppose that you conduct 10 independent tests of significance, using the \(\alpha\) = 0.05 significance level for each test. Also suppose that, unknown to you, the null hypothesis is actually true for every test. Let the random variable R be the number of tests for which you (mistakenly) reject the null hypothesis.

  1. Does rejecting the null hypothesis constitute a Type I or a Type II error in this situation?

  2. Explain why the random variable R can be considered as having a binomial distribution, and specify its values of n and \(\pi\).

  3. Determine the expected value of R, the number of tests for which you (mistakenly) reject the null hypothesis.

  4. Determine the probability that you mistakenly reject the null hypothesis for at least one of these tests (i.e., P(R > 0)).

  5. Repeat (c) and (d) if you conduct 20 independent tests of significance, using the \(\alpha\) = 0.05 significance level for each test, where the null hypothesis is actually true for every test.

  6. Repeat (e), supposing that you use the \(\alpha\) = 0.10 significance level for each test.
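
The repeated calculations in this problem follow one pattern, so a small helper can be reused across the three scenarios. This is a sketch using base R's `dbinom`; the function name `family_error` is hypothetical, and it assumes R is binomial with the number of tests as n and the significance level as the success probability.

```r
# Hypothetical helper: expected number of false rejections and the
# probability of at least one, when every null hypothesis is true.
family_error <- function(n_tests, alpha) {
  c(expected_rejections = n_tests * alpha,                   # E(R) = n * pi
    p_at_least_one      = 1 - dbinom(0, n_tests, alpha))     # P(R > 0) = 1 - P(R = 0)
}

print(family_error(10, 0.05))
print(family_error(20, 0.05))
print(family_error(20, 0.10))
```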

4. Chapter 1 HW 40 (no part (f))

Reconsider Exercise 4. Suppose that we obtain a list of all songs with a color in the title, and suppose that we take a random sample of 75 songs from that list and determine the sample proportion, \(\hat{p}\), of songs that are about the color blue. Suppose for now that the claim (30% of all songs with a color in the title are about the color blue) is true.

  1. Verify that the CLT for a sample proportion applies here.

  2. What does the CLT say about the sampling distribution of \(\hat{p}\)? (Mention shape, center, and spread, and also draw a well-labeled sketch of the sampling distribution.)

  3. Use the CLT to approximate the probability that between 24% and 36% of the songs in the sample will be “blue.” Include a one-sentence summary of what the calculated probability signifies.

  4. Use the binomial distribution to calculate the probability in (c) exactly. [Hint: First use the sample size of 75 to convert the 24% and 36% into counts.]

  5. How close did the normal approximation in (c) come to the exact binomial probability in (d)?

  6. Don’t do the continuity correction.

  7. Repeat (c)-(e) for finding the probability that between 20% and 40% of the songs in the sample will be “blue.”
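
The comparison between the CLT approximation and the exact binomial probability can be set up in base R as follows. This is a sketch, not the textbook's method: the helper name `compare_probs` is hypothetical, "between" is assumed to include both endpoints, and no continuity correction is applied.

```r
n   <- 75     # sample size from the problem
pi0 <- 0.30   # claimed proportion of "blue" songs

# Sample-size check commonly used for the CLT (both should be at least 10).
print(c(n * pi0, n * (1 - pi0)))

# Standard deviation of the sample proportion under the claim.
se <- sqrt(pi0 * (1 - pi0) / n)

# Hypothetical helper: normal approximation vs exact binomial probability
# that the sample proportion falls between lo and hi (endpoints included).
compare_probs <- function(lo, hi) {
  normal <- pnorm(hi, mean = pi0, sd = se) - pnorm(lo, mean = pi0, sd = se)
  # Convert proportions to counts; the small tolerance guards against
  # floating-point error in products such as 0.24 * 75.
  lo_count <- ceiling(lo * n - 1e-9)
  hi_count <- floor(hi * n + 1e-9)
  exact <- pbinom(hi_count, n, pi0) - pbinom(lo_count - 1, n, pi0)
  c(normal = normal, exact = exact, difference = normal - exact)
}

print(compare_probs(0.24, 0.36))
print(compare_probs(0.20, 0.40))
```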

5. Chapter 1 HW 41

Reconsider again the claim that 30% of all songs with a color in the title are about the color blue. Again suppose that we obtain a list of all songs with a color in the title, and suppose we take a random sample of 75 songs from that list and determine the sample proportion, \(\hat{p}\), of songs that are about the color blue.

  1. Using the CLT (i.e., the normal distribution, not the binomial), determine the values of \(\hat{p}\) that would lead to rejecting the null hypothesis that 30% of all songs with a color in the title are about the color blue at the 5% level of significance.

  2. Suppose that actually 40% of the population of all songs with a color in the title are about the color blue. Sketch a graph of the resulting sampling distribution, using 40% as the value for \(\pi\). Then, using the values you determined in (a), determine the probability that we would fail to reject the null hypothesis of 30% at the 5% level of significance. Is this a Type I or a Type II error?
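
Both parts of this problem can be checked numerically with `qnorm` and `pnorm`. This is a sketch under one loudly stated assumption: the alternative is taken to be two-sided (the proportion differs from 30%), which the problem statement does not specify. Variable names are illustrative.

```r
n      <- 75     # sample size
pi0    <- 0.30   # null value
pi_alt <- 0.40   # assumed actual population proportion
alpha  <- 0.05   # level of significance

# Assumption: two-sided alternative, so alpha is split between the two tails.
se0 <- sqrt(pi0 * (1 - pi0) / n)        # SD of p-hat under the null
z   <- qnorm(1 - alpha / 2)
lower <- pi0 - z * se0                  # reject if p-hat is at or below this
upper <- pi0 + z * se0                  # reject if p-hat is at or above this

# Sampling distribution of p-hat when pi = 0.40.
se_alt <- sqrt(pi_alt * (1 - pi_alt) / n)

# Probability that p-hat lands between the cutoffs, i.e. we fail to reject.
p_fail_to_reject <- pnorm(upper, pi_alt, se_alt) - pnorm(lower, pi_alt, se_alt)

print(c(lower = lower, upper = upper, p_fail_to_reject = p_fail_to_reject))
```

With a one-sided alternative instead, replace `qnorm(1 - alpha / 2)` with `qnorm(1 - alpha)` and keep only the relevant cutoff.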