Math 152, Fall 2013 Jo Hardin
install.packages("bootstrap", repos = "http://cran.us.r-project.org")
## Installing package into '\\wells/fac-staff$/jsh04747/My
## Documents/R/win-library/3.0' (as 'lib' is unspecified)
## package 'bootstrap' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\jsh04747\AppData\Local\Temp\RtmpcbAWa1\downloaded_packages
library(bootstrap)
.1. The formal medical term for vitamin A in the blood is serum retinol. Serum retinol has various beneficial effects, such as protecting against fractures. Medical researchers working with children in Papua New Guinea asked whether recent infections reduce the level of serum retinol. They classified children as recently infected or not on the basis of other blood tests, then measured serum retinol. Of the 90 children in the sample, 55 had been recently infected. The observations, in micromoles per liter, is at . One conclusion from the data is “The mean serum retinol level in uninfected children was 1.255 times the mean level in the infected children. A 95% confience interval for the ratio of means in the population of all children in Papua New Guinea is…”
(a) Find and interpret a two sided CI using a percentile interval.
(b) Find and interpret a two sided CI using a t-CI.
(c ) Find and interpret a two sided CI using a Bootstrap-t CI.
(d) Why might a one-sided interval be better here?
(e) Choose one of your intervals above to find and interpret a one sided interval.
Note: Make sure to be clear about the parameter of interest here (that is, what parameter are you finding a confidence interval for?)
.2. Why is the percentile interval range respecting but the BS-t CI is not? Why would that matter?
retinol = read.table("http://pages.pomona.edu/~jsh04747/courses/math152/retinol.csv",
header = T, sep = ",")
attach(retinol)
names(retinol)
## [1] "notinfect" "infect"
notinfect # note that you don't really want all those NA values!!
## [1] 0.59 1.44 0.35 0.99 0.83 1.17 1.08 1.04 0.99 0.35 0.35 0.35 0.88 0.67
## [15] 1.22 0.94 0.67 0.23 0.62 0.86 1.15 1.00 0.31 0.34 0.46 0.90 1.13 1.02
## [29] 0.58 0.49 0.39 0.70 0.67 1.11 1.36 NA NA NA NA NA NA NA
## [43] NA NA NA NA NA NA NA NA NA NA NA NA NA
notinfect2 = notinfect[1:35]
# check:
mean(notinfect2)/mean(infect)
## [1] 1.255
Hesketh and Everitt (2000) report on a study by Caplehorn and Bell (1991) that investigated the times (in days) that heroin addicts remained in a clinic for methadone maintenance treatment. The data in heroin.txt include the amount of time that the subjects stayed in the facility until treatment was terminated (column 4). For about 37% of the subjects, the study ended while they were still the in clinic (status=0). Thus, their survival time has been truncated. For this reason we might not want to estimate the mean survival time, but rather some other measure of typical survival time. Below we explore using the median as well as the 25% trimmed mean. We treat the group of 238 patients as representative of the population. [From ISCAM Chance & Rossman, Investigation 4.5.3]
heroin = read.table("http://pages.pomona.edu/~jsh04747/courses/math58/Math58Data/HEROIN.TXT",
header = T)
obs.stat <- median(heroin[, 4])
obs.stat
## [1] 367.5
obs.stat2 <- mean(heroin[, 4], trim = 0.25)
obs.stat2
## [1] 378.3
heroin.rs <- sample(heroin[, 4], 238, replace = T)
median(heroin.rs)
## [1] 315.5
mean(heroin.rs, trim = 0.25)
## [1] 350.6
test.stat <- c()
test.stat2 <- c()
sd.test.stat <- c()
sd.test.stat2 <- c()
for (i in 1:1000) {
heroin.rs <- sample(heroin[, 4], 238, replace = T)
test.stat <- c(test.stat, median(heroin.rs))
test.stat2 <- c(test.stat2, mean(heroin.rs, trim = 0.25))
test.stat.rs <- c()
test.stat2.rs <- c()
for (j in 1:1000) {
heroin.rsrs <- sample(heroin.rs, 238, replace = T)
test.stat.rs <- c(test.stat.rs, median(heroin.rsrs))
test.stat2.rs <- c(test.stat2.rs, mean(heroin.rsrs, trim = 0.25))
}
sd.test.stat <- c(sd.test.stat, sd(test.stat.rs))
sd.test.stat2 <- c(sd.test.stat2, sd(test.stat2.rs))
}
par(mfcol = c(2, 3))
hist(heroin[, 4])
hist(heroin.rs)
hist(test.stat)
qqnorm(test.stat)
qqline(test.stat)
hist(test.stat2)
qqnorm(test.stat2)
qqline(test.stat2)
# median
mean(test.stat)
## [1] 374.9
sd(test.stat)
## [1] 32.76
# 25% trimmed mean
mean(test.stat2)
## [1] 378.7
sd(test.stat2)
## [1] 23.05
# OR - using R's built in functions
bs.proc <- bootstrap(heroin[, 4], nboot = 1000, theta = median)
bs.proc2 <- bootstrap(heroin[, 4], nboot = 1000, theta = mean, trim = 0.25)
c(sort(test.stat)[25], sort(test.stat)[975])
## [1] 319.5 450.0
c(sort(test.stat2)[25], sort(test.stat2)[975])
## [1] 333.7 422.6
# OR - using R's built in functions
c(sort(bs.proc$thetastar)[25], sort(bs.proc$thetastar)[975])
## [1] 319.5 450.0
c(sort(bs.proc2$thetastar)[25], sort(bs.proc2$thetastar)[975])
## [1] 335.4 426.9
c(obs.stat - qt(0.975, 237) * sd(test.stat), obs.stat + qt(0.975, 237) * sd(test.stat))
## [1] 303 432
c(obs.stat2 - qt(0.975, 237) * sd(test.stat2), obs.stat2 + qt(0.975, 237) *
sd(test.stat2))
## [1] 332.9 423.7
# OR - using R's built in functions
se.bs <- sd(bs.proc$thetastar)
se.bs2 <- sd(bs.proc2$thetastar)
c(obs.stat - qt(0.975, 237) * se.bs, obs.stat + qt(0.975, 237) * se.bs)
## [1] 304.4 430.6
c(obs.stat2 - qt(0.975, 237) * se.bs2, obs.stat2 + qt(0.975, 237) * se.bs2)
## [1] 333.1 423.5
t.hat <- (test.stat - obs.stat)/sd.test.stat
t.hat2 <- (test.stat2 - obs.stat2)/sd.test.stat2
c(obs.stat - sort(t.hat)[975] * sd(test.stat), obs.stat - sort(t.hat)[25] *
sd(test.stat))
## [1] 292.8 424.4
c(obs.stat2 - sort(t.hat2)[975] * sd(test.stat2), obs.stat2 - sort(t.hat2)[25] *
sd(test.stat2))
## [1] 334.0 425.6
# OR - using R's built in functions
boott(heroin[, 4], 1000, theta = median)
## $confpoints
## 0.001 0.01 0.025 0.05 0.1 0.5 0.9 0.95 0.975 0.99 0.999
## [1,] 285.1 303.4 310.3 325.9 335 367.5 388 394.1 400.1 406.1 417.4
##
## $theta
## NULL
##
## $g
## NULL
##
## $call
## boott(x = heroin[, 4], theta = median, 1000)
boott(heroin[, 4], 1000, theta = mean, trim = 0.25)
## $confpoints
## 0.001 0.01 0.025 0.05 0.1 0.5 0.9 0.95 0.975 0.99 0.999
## [1,] 311.9 316.4 333.1 338.2 346.6 379.3 410.4 422.5 431.6 445.3 468.3
##
## $theta
## NULL
##
## $g
## NULL
##
## $call
## boott(x = heroin[, 4], theta = mean, 1000, trim = 0.25)
test.stat.jk <- c()
test.stat2.jk <- c()
for (i in 1:length(heroin[, 4])) {
test.stat.jk <- c(test.stat.jk, median(heroin[-i, 4]))
test.stat2.jk <- c(test.stat2.jk, mean(heroin[-i, 4], trim = 0.25))
}
zo.hat <- qnorm(sum(test.stat < obs.stat)/1000, 0, 1)
a.hat <- sum((mean(test.stat.jk) - test.stat.jk)^3)/(6 * (sum((mean(test.stat.jk) -
test.stat.jk)^2)^1.5))
zo.hat2 <- qnorm(sum(test.stat2 < obs.stat2)/1000, 0, 1)
a.hat2 <- sum((mean(test.stat2.jk) - test.stat2.jk)^3)/(6 * (sum((mean(test.stat2.jk) -
test.stat2.jk)^2)^1.5))
alpha1.bca <- pnorm(zo.hat + (zo.hat + qnorm(0.975))/(1 - a.hat * (zo.hat +
qnorm(0.975))))
alpha2.bca <- pnorm(zo.hat + (zo.hat + qnorm(0.025))/(1 - a.hat * (zo.hat +
qnorm(0.025))))
alpha1.bca2 <- pnorm(zo.hat2 + (zo.hat2 + qnorm(0.975))/(1 - a.hat2 * (zo.hat2 +
qnorm(0.975))))
alpha2.bca2 <- pnorm(zo.hat2 + (zo.hat2 + qnorm(0.025))/(1 - a.hat2 * (zo.hat2 +
qnorm(0.025))))
c(sort(test.stat)[ceiling(1000 * alpha2.bca)], sort(test.stat)[ceiling(1000 *
alpha1.bca)])
## [1] 317.0 444.5
c(sort(test.stat2)[ceiling(1000 * alpha2.bca2)], sort(test.stat2)[ceiling(1000 *
alpha1.bca2)])
## [1] 332.2 421.8
# OR - using R's built in functions
bcanon(heroin[, 4], 1000, theta = median)
## $confpoints
## alpha bca point
## [1,] 0.025 308.0
## [2,] 0.050 321.0
## [3,] 0.100 337.0
## [4,] 0.160 342.0
## [5,] 0.840 396.5
## [6,] 0.900 406.5
## [7,] 0.950 434.0
## [8,] 0.975 447.5
##
## $z0
## [1] -0.07024
##
## $acc
## [1] 0
##
## $u
## [1] 367 368 368 368 368 367 367 367 367 367 368 367 367 367 368 367 367
## [18] 367 367 367 368 368 368 367 368 367 367 368 367 367 368 367 367 367
## [35] 367 367 367 367 367 367 367 367 367 367 367 367 368 367 367 367 368
## [52] 367 367 367 367 368 367 367 367 368 368 367 367 367 368 367 368 367
## [69] 367 367 368 367 367 367 368 368 368 368 367 368 368 367 367 367 367
## [86] 367 368 368 367 367 368 367 368 367 367 368 367 367 367 368 367 367
## [103] 367 368 368 368 367 367 367 367 367 367 367 367 367 368 368 367 368
## [120] 367 367 368 368 367 367 367 367 367 368 368 367 368 368 368 368 367
## [137] 368 367 368 367 368 367 368 367 367 368 367 367 367 368 368 368 367
## [154] 367 367 368 367 368 368 367 367 367 368 368 368 367 368 367 368 368
## [171] 368 368 367 368 368 368 368 368 368 368 368 368 367 368 368 368 367
## [188] 368 368 367 368 368 368 368 368 368 367 368 368 367 368 368 368 368
## [205] 368 368 368 368 368 368 367 368 368 368 368 368 368 368 368 368 368
## [222] 367 368 368 368 368 367 368 368 368 368 367 368 368 367 367 368 368
##
## $call
## bcanon(x = heroin[, 4], nboot = 1000, theta = median)
bcanon(heroin[, 4], 1000, theta = mean, trim = 0.25)
## $confpoints
## alpha bca point
## [1,] 0.025 332.9
## [2,] 0.050 340.1
## [3,] 0.100 348.7
## [4,] 0.160 355.2
## [5,] 0.840 398.4
## [6,] 0.900 404.8
## [7,] 0.950 416.3
## [8,] 0.975 421.6
##
## $z0
## [1] -0.05015
##
## $acc
## [1] -2.176e-05
##
## $u
## [1] 377.9 379.2 379.3 379.9 379.3 376.5 377.8 376.5 376.5 378.2 380.1
## [12] 376.5 377.1 376.5 379.7 378.1 376.5 377.2 377.2 376.5 379.7 378.6
## [23] 379.0 376.5 379.3 376.7 378.4 378.9 376.5 376.5 379.0 376.7 378.2
## [34] 376.5 376.5 376.5 376.5 376.9 376.5 376.5 376.6 377.1 377.2 376.5
## [45] 376.5 376.8 380.1 377.4 377.1 376.5 380.1 376.7 376.5 376.5 376.5
## [56] 380.0 377.7 376.5 377.3 379.3 380.0 378.2 377.8 376.7 378.6 376.5
## [67] 379.9 378.1 376.5 376.5 379.4 376.5 376.5 377.1 378.6 379.0 379.4
## [78] 380.1 378.3 379.7 380.1 377.0 377.1 376.5 378.1 376.5 380.1 380.1
## [89] 377.4 376.9 379.8 377.5 379.5 377.1 376.5 380.1 377.6 376.5 376.5
## [100] 380.1 377.7 376.8 377.6 380.1 380.1 380.1 377.0 376.5 376.5 376.5
## [111] 376.5 376.5 376.5 376.5 376.5 380.1 380.1 376.5 378.4 376.5 376.5
## [122] 379.5 380.1 376.7 376.5 376.5 376.5 376.5 379.9 380.1 376.5 380.1
## [133] 379.1 378.5 378.7 376.5 380.1 376.7 379.2 376.5 378.8 376.5 380.1
## [144] 376.5 376.6 380.1 376.5 376.5 377.6 379.5 380.1 380.1 376.5 376.5
## [155] 376.5 379.7 376.5 379.2 379.6 376.5 377.3 378.2 380.1 380.1 378.5
## [166] 377.0 378.8 377.6 380.1 380.1 378.5 380.1 377.6 380.1 380.1 380.1
## [177] 380.1 380.1 379.6 380.1 379.8 380.1 376.6 380.0 380.1 380.1 376.9
## [188] 379.7 379.9 377.8 378.4 378.6 380.1 378.6 380.0 380.1 376.9 380.1
## [199] 379.1 377.0 379.7 379.7 380.1 380.1 380.1 380.1 379.3 380.1 378.6
## [210] 380.1 377.0 380.1 380.1 380.1 380.1 380.1 380.1 380.1 380.1 380.1
## [221] 380.1 378.2 380.1 380.0 380.0 378.8 377.4 378.7 379.1 379.8 378.4
## [232] 377.0 380.1 380.1 376.9 376.8 380.1 380.1
##
## $call
## bcanon(x = heroin[, 4], nboot = 1000, theta = mean, trim = 0.25)
Note: Retinol data provided by Francisco Rosales of the Department of Nutritional Sciences, Penn State University. See Rosales et al., “Relation of serum retinol to acute phase proteins and malarial morbidity in Papua New Guinea children,” American Journal of Clinical Nutrition, 71 (2000), pp. 1580-1588. Problem taken from “Bootstrap Methods and Permutation Tests” by Hesterberg et al., Chapter 16, 6e, IPS.