R code discussed in class

This page summarizes R code discussed in class.

The first example is a simulation study to assess how well the bootstrap estimates the variance of the sample median (and mean). The first part defines some basic functions and uses them to decide on a suitable choice of B (the number of bootstrap replications used to estimate the variance).

## Simulation study of bootstrap estimation of variance

bs.sample <- function(x) sample(x, replace = TRUE)

bs.estimate <- function(n, rdist, B = 20, statistic = median, aggregate = var, nrep = 1, ...)
{
    x <- rdist(n, ...)
    ## repeat this 'nrep' times to assess bootstrap variability (effect of B)
    replicate(nrep,
              aggregate(replicate(B, 
                                  statistic(bs.sample(x)))))
}

## Good choice of B?

var.est <- as.vector(replicate(10, bs.estimate(10, rdist = runif, B = 20, nrep = 4)))
plot(var.est, pch = 16, col = rep(c("blue", "red"), c(4, 4)))

var.est <- as.vector(replicate(10, bs.estimate(10, rdist = runif, B = 100, nrep = 4)))
plot(var.est, pch = 16, col = rep(c("blue", "red"), c(4, 4)))

var.est <- as.vector(replicate(10, bs.estimate(1000, rdist = runif, B = 200, nrep = 4)))
plot(var.est, pch = 16, col = rep(c("blue", "red"), c(4, 4)))

var.est <- as.vector(replicate(10, bs.estimate(1000, rdist = runif, B = 500, nrep = 4)))
plot(var.est, pch = 16, col = rep(c("blue", "red"), c(4, 4)))

The next part computes bootstrap variance for various sample sizes and plots them. This could be repeated using different underlying distributions, sample sizes, and B. Instead of median, we could also use the mean as the statistic of interest.

Nseq <- seq(11, 10001, by = 200)

pb <- txtProgressBar(min = Nseq[1], max = Nseq[length(Nseq)], style = 3)
var.est <- 
    sapply(Nseq,
           function(n) {
               setTxtProgressBar(pb, n)
               n * bs.estimate(n, rdist = rlnorm, B = 500, statistic = median)
           })
close(pb)
plot(Nseq, var.est)