---
title: "Math 58 / 58B - Introduction to (Bio)Statistics"
author: "your name here"
date: "due Feb 13, 2020"
output: pdf_document
---

Homework 3
========================================================

```{r global_options, include=FALSE}
knitr::opts_chunk$set(message=FALSE, warning=FALSE, fig.height=2.5, fig.width=5, fig.align = "center", cache=TRUE)
library(tidyverse)
library(infer)
```

### Assignment Summary (Goals)

* create CIs
* interpret CIs

1. **Chronic illness, Part I** (Problem 2.37, ISRS). In 2013, the Pew Research Foundation reported that "45% of U.S. adults report that they live with one or more chronic conditions". However, this value was based on a sample, so it may not be a perfect estimate for the population parameter of interest on its own. The study reported a standard error of about 1.2%, and a normal model may reasonably be used in this setting. Create a 95% confidence interval for the proportion of U.S. adults who live with one or more chronic conditions. Also interpret the confidence interval in the context of the study.

2. **Chronic illness, Part II** (Problem 2.39, ISRS). In 2013, the Pew Research Foundation reported that "45% of U.S. adults report that they live with one or more chronic conditions", and the standard error for this estimate is 1.2%. Identify each of the following statements as true or false. Provide an explanation to justify each of your answers.

    (a) We can say with certainty that the confidence interval from Exercise 2.37 contains the true percentage of U.S. adults who suffer from a chronic illness.
    (b) If we repeated this study 1,000 times and constructed a 95% confidence interval for each study, then approximately 950 of those confidence intervals would contain the true fraction of U.S. adults who suffer from chronic illnesses.
    (c) The poll provides statistically significant evidence (at the $\alpha = 0.05$ level) that the percentage of U.S. adults who suffer from chronic illnesses is below 50%.
    (d) Since the standard error is 1.2%, only 1.2% of people in the study communicated uncertainty about their answer.

3. **Competitive Advantage of Uniform Color?** (Chp 1 #10, ISCAM) Does uniform color give athletes an advantage over their competitors? To investigate this question, Hill and Barton (Nature, 2005, https://www.nature.com/articles/435293a) examined the records in the 2004 Olympic Games for four combat sports: boxing, tae kwon do, Greco-Roman wrestling, and freestyle wrestling. Competitors in these sports were randomly assigned to wear either a red or a blue uniform. The researchers investigated whether competitors wearing one color won significantly more often than those wearing the other color. They analyzed results for a total of 457 matches. Of these, red won the match 248 times, while blue won 209 times.

    (a) Identify the observational units and variable of interest. Indicate which outcome you will consider "success."
    (b) Explain how and why randomness was used in this study (that is, why does it matter to the results?).
    (c) Do the technical conditions given on page 124 (section 3.1.1) hold in this setting?
    (d) Compute a 95% confidence interval for the parameter. Write a sentence interpreting what this interval says, including how you are defining the parameter.
    (e) Now determine a 90% confidence interval for the parameter. Comment on how it differs from the 95% interval. [Hint: Refer to both the midpoints of the intervals and their widths.]
    (f) Summarize your results as if to an athletic director at a university. Include discussion about how you are willing to "generalize" these results beyond these 457 matches.

4. Answer the following questions in one or two sentences.

    (a) If we can’t know for sure whether the confidence interval contains the value of the population parameter, on what grounds can we be confident about this?
    (b) Survey researchers typically select only one random sample from a population, and then they produce a confidence interval based on that sample. How do we know whether the resulting confidence interval is successful in capturing the unknown value of the population parameter?
    (c) Why don't we always create 99.99% intervals?
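
The interval calculations in Problems 1 and 3 share one normal-theory pattern: estimate $\pm\ z^* \times SE$. The chunk below is a minimal sketch of that pattern in R, not a solution to any problem above. The counts are made up for illustration, and the helper name `normal_ci()` is hypothetical (it is not a function from `infer` or any other package).

```{r ci_sketch}
# Hypothetical helper (an assumption, not part of the assignment):
# normal-theory confidence interval from a point estimate and its SE.
normal_ci <- function(estimate, se, conf_level = 0.95) {
  # Critical value z*: the quantile leaving (1 - conf_level) / 2 in each tail
  z_star <- qnorm(1 - (1 - conf_level) / 2)
  c(lower = estimate - z_star * se, upper = estimate + z_star * se)
}

# Made-up data, chosen only to illustrate the calculation
successes <- 120
n <- 200

p_hat <- successes / n                    # sample proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat

ci_95 <- normal_ci(p_hat, se_hat, conf_level = 0.95)
ci_90 <- normal_ci(p_hat, se_hat, conf_level = 0.90)

ci_95
ci_90
```

When a study reports the standard error directly, the same helper can be called with that value in place of `se_hat`. Both intervals are centered at `p_hat`; only the critical value, and therefore the width, changes with the confidence level.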