\documentclass{amsart}
\usepackage[text={170mm,200mm},centering]{geometry}
\usepackage{hyperref}
\input{defs}
\SweaveOpts{prefix.string=figs-ass3,keep.source=TRUE,eps=FALSE}
\setkeys{Gin}{width=0.98\textwidth}

<<echo=FALSE,results=hide>>=
library(lattice)
lattice.options(default.theme = standard.theme(color = FALSE))
@

\title{Linear Models and GLM: Assignment 3 Partial Sample Solutions}

\begin{document}

\maketitle
\raggedright

\setcounter{qnum}{2}

\begin{exercise}
  Implement an \R{} function to perform Tukey's one degree of freedom
  test for non-additivity. Your function should accept a 2-dimensional
  table (matrix) as its argument. How would you use your function if
  the data is supplied as a data frame? [Hint: \code{?xtabs}]
\end{exercise}

\subsection*{Solution:}

Here is one implementation that returns a named vector containing the
test statistic and the corresponding $p$-value. Writing
$\bar{y}_{i\cdot}$, $\bar{y}_{\cdot j}$, and $\bar{y}_{\cdot\cdot}$
for the row, column, and grand means of an $r \times c$ table, the one
degree of freedom sum of squares for non-additivity is
\[
SS_{\mathrm{nonadd}} = \frac{C^2}{D_1 D_2},
\qquad
C = \sum_{i,j} y_{ij}\,
    (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})
    (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}),
\]
with $D_1 = \sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$ and
$D_2 = \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2$. Under
additivity, the ratio of this (mean) sum of squares to the residual
mean square has an $F_{1,\,rc - r - c}$ distribution.

<<>>=
tukey.1df <- function(y) # y: matrix of observations
{
    ## denominator (error) degrees of freedom: (r-1)(c-1) - 1
    df.denom <- nrow(y) * ncol(y) - nrow(y) - ncol(y)
    y_.. <- mean(y)
    y_i. <- rowMeans(y)
    y_.j <- colMeans(y)
    C <- sum(y * ((y_i. - y_..) %*% t(y_.j - y_..)))
    D1 <- sum((y_i. - y_..)^2)
    D2 <- sum((y_.j - y_..)^2)
    MSI <- SSI <- C^2 / (D1 * D2)  # 1 df, so MS = SS
    SSE <- sum((y - outer(y_i., y_.j, "+") + y_..)^2) - SSI
    MSE <- SSE / df.denom
    pvalue <- pf(MSI/MSE, 1, df.denom, lower.tail = FALSE)
    round(c(MSI = MSI, MSE = MSE, F = MSI/MSE, pvalue = pvalue), 4)
}
@
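As for using the function when the data are supplied as a data frame:
\Rfunction{xtabs} can rebuild the two-way table that \code{tukey.1df}
expects from a data frame in ``long'' format. A minimal sketch (not
run; the data frame \code{d} and the column names \code{y},
\code{rowf}, and \code{colf} are hypothetical placeholders):

<<eval=FALSE>>=
## 'd' is a hypothetical data frame with one observation per cell,
## response 'y', and factors 'rowf' and 'colf'; xtabs() tabulates
## y into the two-way table that tukey.1df() expects.
tukey.1df(xtabs(y ~ rowf + colf, data = d))
@
%
The same reshaping is used in the opposite direction later, when the
\code{VADeaths} table is converted to a data frame.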
\begin{exercise}
  Use your function to perform the test for non-additivity in the
  \code{VADeaths} dataset.
\end{exercise}

\subsection*{Solution:}

<<>>=
tukey.1df(VADeaths)
@

\begin{exercise}
  Implement an \R{} function (or a suitable collection of \R{}
  functions) to obtain an optimum Box-Cox transformation, including a
  profiled log-likelihood plot and a confidence interval.
  \vspace{2mm}
\end{exercise}

\subsection*{Solution:}

It is best to implement a solution as a series of small functions. The
main challenge is to manipulate the response in a formula-based
implementation, because we need to refit the model multiple times,
each time replacing the original response by its transformation. A
simple solution is to coerce the formula into a character vector; the
response is then the second component:

<<>>=
as.character(rvar ~ a + b * c)
@
%
We first define helper functions for transforming the data and
computing the profiled log-likelihood.

<<>>=
bctransform <- function(y, lambda)
{
    ## Box-Cox transformation; lambda = 0 is the limiting case log(y)
    if (lambda == 0) log(y)
    else (y^lambda - 1) / lambda
}
compute.loglik <- function(formula, data, lambda, sum.log.y)
{
    ## profiled log-likelihood at lambda, given that 'data' already
    ## contains the transformed response
    fm <- lm(formula, data)
    R02 <- sum(residuals(fm)^2)
    n <- nrow(data)
    -0.5 * n * (log(R02/n) + log(2 * pi) + 1) + (lambda - 1) * sum.log.y
}
@
%
Next, we define the main function that computes the profiled
log-likelihood values over a grid of $\lambda$ values.

<<>>=
boxcox <- function(formula, data, from = -2, to = 2, cuts = 101,
                   conf.level = 0.99)
{
    resp.var <- as.character(formula)[2]
    y.values <- data[[ resp.var ]]
    if (any(y.values <= 0))
        stop("The response variable must be strictly positive")
    sum.log.y <- sum(log(y.values))
    lambda.grid <- seq(from = from, to = to, length = cuts)
    ploglik <- rep(NA, cuts)
    for (i in seq(along = lambda.grid))
    {
        data[[ resp.var ]] <- bctransform(y.values, lambda = lambda.grid[i])
        ploglik[i] <- compute.loglik(formula, data,
                                     lambda = lambda.grid[i],
                                     sum.log.y = sum.log.y)
    }
    ## lambda values not rejected by the likelihood ratio test
    ## form the confidence interval
    ci.include <- (2 * (max(ploglik) - ploglik) < qchisq(conf.level, df = 1))
    ans <- list(lambda = lambda.grid, ploglik = ploglik,
                conf.int = ci.include, conf.level = conf.level)
    class(ans) <- "lboxcox"
    ans
}
@
%
This function returns an object of class \textsf{``lboxcox''}. We next
want to define \Rfunction{plot} and \Rfunction{print} methods for such
objects. The \Rfunction{print} method simply prints the point estimate
and the confidence interval.

<<>>=
print.lboxcox <- function(x, ...)
{
    cat(sprintf("Optimal Box-Cox parameter (%g confidence interval): ",
                x$conf.level))
    lambda.hat <- with(x, lambda[ which.max(ploglik) ])
    ci.start <- with(x, min(lambda[conf.int]))
    ci.end <- with(x, max(lambda[conf.int]))
    cat(sprintf("%g (%g, %g)\n", lambda.hat, ci.start, ci.end))
}
@
%
Similarly, the \Rfunction{plot} method plots the profiled
log-likelihood curve, adding vertical lines to indicate the confidence
interval.

<<>>=
plot.lboxcox <- function(x, ...)
{
    with(x, {
        plot(lambda, ploglik, type = "l",
             main = sprintf("Box-Cox transformation: %g confidence interval",
                            conf.level),
             xlab = expression(lambda),
             ylab = "Profiled log-likelihood")
        ci.start <- min(lambda[conf.int])
        ci.end <- max(lambda[conf.int])
        abline(v = c(ci.start, ci.end), col = "grey")
    })
}
@

\begin{exercise}
  Use your function to find an optimal power transformation in the
  \code{VADeaths} dataset. Plot the transformed values as before (see
  Assignment 2). Does the plot ``look'' more additive? Perform Tukey's
  test for non-additivity on the transformed data.
\end{exercise}

\subsection*{Solution:}

To use the formula interface, we first need to convert the
\code{VADeaths} table into a data frame.

<<>>=
names(dimnames(VADeaths)) <- c("AgeGroup", "PopGroup")
VADeaths.df <- as.data.frame(as.table(VADeaths),
                             responseName = "DeathRate")
str(VADeaths.df)
@

Next, we compute the profiled log-likelihoods by calling
\Rfunction{boxcox}, and then print and plot the result.

<<fig=TRUE>>=
bc.curve <- boxcox(DeathRate ~ AgeGroup + PopGroup, data = VADeaths.df,
                   conf.level = 0.95, cuts = 201)
bc.curve
plot(bc.curve)
@
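%
As an optional sanity check (not run here; this assumes the
recommended \code{MASS} package is installed), the curve can be
compared with the one produced by \code{MASS::boxcox}, which plots a
profiled log-likelihood for the same model, possibly differing from
ours by an additive constant that does not affect the location of the
maximum or the confidence interval:

<<eval=FALSE>>=
## Optional cross-check, assuming the MASS package is available:
## MASS::boxcox() plots the profiled log-likelihood over the same grid.
MASS::boxcox(lm(DeathRate ~ AgeGroup + PopGroup, data = VADeaths.df),
             lambda = seq(-2, 2, length.out = 201))
@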
%
The optimal $\lambda$ is 0.28. The interaction plot of the transformed
response is more consistent with additivity:

<<>>=
library(lattice)
dotplot(bctransform(DeathRate, 0.28) ~ reorder(PopGroup, DeathRate),
        data = VADeaths.df, groups = AgeGroup,
        type = c("p", "a"), auto.key = list(columns = 5))
@

<<fig=TRUE,echo=FALSE>>=
plot(trellis.last.object())
@

\newpage

Applying Tukey's test of non-additivity to the transformed data gives

<<>>=
tukey.1df(bctransform(VADeaths, 0.28))
@

\begin{exercise}
  Write an \R{} function to fit a linear model minimising the sum of
  absolute errors, using the technique of iteratively reweighted least
  squares (IRLS). You can use the \Rfunction{lm} function with weights.
  \vspace{2mm}

  You will need to choose a convergence criterion. What criterion did
  you choose?
\end{exercise}

\subsection*{Solution:}

The idea behind IRLS here is that the sum of absolute errors can be
written as a weighted sum of squares,
\[
\sum_i |r_i| \;=\; \sum_i \frac{r_i^2}{|r_i|} \;=\; \sum_i w_i\, r_i^2,
\qquad w_i = \frac{1}{|r_i|},
\]
so we can repeatedly fit a weighted least squares model, with weights
computed from the residuals of the previous fit. Here is one
implementation. Note that it is not clear what one should do for
observations where the current residual is zero. The original paper of
Schlossmacher (\url{http://www.jstor.org/stable/2284512}) suggests
setting the corresponding weights to zero (i.e., ignoring these
observations in the next least squares fit), but the justification for
doing so is weak. In fact, doing so causes problems for our artificial
datasets, and I have instead chosen to put an upper bound on the
weights. As the convergence criterion, I use the relative (squared)
change in the coefficient vector between successive iterations.

%% Interesting paper: http://www.springerlink.com/content/u7516010k0508257/

<<>>=
ladlm <- function(formula, data, conv.cutoff = 1e-10, max.it = 200,
                  eps = 1e-10, plot = FALSE)
{
    fm <- lm(formula, data)
    beta.ols <- beta.new <- beta.old <- coef(fm)
    for (i in seq_len(max.it))
    {
        r <- abs(resid(fm))
        ## cap the weights at 1/eps to guard against zero residuals
        data$lad.weights <- ifelse(r > eps, 1 / r, 1 / eps)
        fm <- lm(formula, data, weights = lad.weights)
        beta.new <- coef(fm)
        ## stop when the relative change in coefficients is small
        if (sum((beta.new - beta.old)^2) / sum(beta.old^2) < conv.cutoff)
            break
        beta.old <- beta.new
    }
    if (plot)
    {
        plot(formula, data)
        abline(beta.ols, col = "red")
        abline(beta.new, col = "black")
    }
    invisible(beta.new)
}
@

\begin{exercise}
  Use the function written in the previous question to fit regression
  lines to Anscombe's four regression datasets (\code{?anscombe} in
  \R{}). Produce a plot similar to what you get from
  \code{example(anscombe)}, but with regression lines fitted using
  your code instead of the usual least squares regression lines.
\end{exercise}

\subsection*{Solution:}

<<fig=TRUE>>=
par(mfrow = c(2, 2))
ladlm(y1 ~ x1, data = anscombe, plot = TRUE)
ladlm(y2 ~ x2, data = anscombe, plot = TRUE)
ladlm(y3 ~ x3, data = anscombe, plot = TRUE)
ladlm(y4 ~ x4, data = anscombe, plot = TRUE)
@

\end{document}