\documentclass{amsart}
\usepackage[text={170mm,200mm},centering]{geometry}
\usepackage{hyperref}

\input{defs}

\SweaveOpts{prefix.string=figs-ass2,keep.source=TRUE}
\setkeys{Gin}{width=0.98\textwidth}

<<echo=FALSE>>=
library(lattice)
lattice.options(default.theme = standard.theme(color = FALSE))
options(show.signif.stars = FALSE)
@ 

\title{Linear Models and GLM: Quiz 2 Solutions}

\newcommand{\vtheta}{\ensuremath{\vect{\theta}}}

\setlength{\parskip}{2mm}

\begin{document}
\setcounter{qnum}{1}

\maketitle
\raggedright

\begin{question}
  For the \texttt{UCBAdmissions} dataset in R, convert it into a data
  frame using \Rfunction{as.data.frame.table}, and 
  \begin{itemize}
  \item[(a)] Fit a suitable Poisson GLM using the \R{} function
    \Rfunction{glm}.
  \item[(b)] Fit a suitable log-linear model with the
    \Rfunction{loglm} function in the \Rpackage{MASS} package, which
    is a formula-based interface (similar to that of \Rfunction{glm})
    to \Rfunction{loglin}.
  \end{itemize}
  In your answer, you only need to provide the \R{} code you used
  (please use a fixed-width font).
\end{question}


\subsection*{Solution}

Both fits below are saturated models, that is, they include all main
effects and all interactions among \code{Dept}, \code{Admit}, and
\code{Gender}.
<<>>=

library(MASS)
UCBdf <- as.data.frame(as.table(UCBAdmissions))
fm.poisson <- glm(Freq ~ Dept * Admit * Gender, data = UCBdf, family = poisson())
fm.loglm <- loglm(Freq ~ Dept * Admit * Gender, data = UCBdf)

@ 
%
In hindsight, I should have asked you to also fit a Binomial model, as
product Binomial is the more natural sampling scheme for these data.
Specifically, we can treat each Department-Gender combination as a
population that we condition on, with Admission as success and
Rejection as failure.  This requires the data in a slightly different
format, with one row per Department-Gender combination and separate
columns for the admitted and rejected counts.
<<>>=

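## Admitted and rejected counts as separate data frames
UCBadmit <- as.data.frame(as.table(UCBAdmissions["Admitted",,,drop = FALSE]))
UCBreject <- as.data.frame(as.table(UCBAdmissions["Rejected",,,drop = FALSE]))
## Rename Freq after the outcome it counts; drop the now-constant Admit column
UCBadmit$Admitted <- UCBadmit$Freq; UCBadmit$Freq <- NULL; UCBadmit$Admit <- NULL
UCBreject$Rejected <- UCBreject$Freq; UCBreject$Freq <- NULL; UCBreject$Admit <- NULL
## Merge on the shared Gender and Dept columns: one row per combination
UCBmerged <- merge(UCBadmit, UCBreject)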
str(UCBmerged)

fm.bin <- glm(cbind(Admitted, Rejected) ~ Dept * Gender, 
              data = UCBmerged, family = binomial())

@ 
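%
For reference, the model fitted above can be written out as follows.
The notation is introduced here for illustration only: $Y_{dg}$ is the
number admitted among the $n_{dg}$ applicants of gender $g$ to
department $d$, and $\pi_{dg}$ is the corresponding probability of
admission.
\[
  Y_{dg} \sim \mathrm{Binomial}(n_{dg}, \pi_{dg}) \quad
  \text{independently}, \qquad
  \operatorname{logit} \pi_{dg} = \beta_0 + \beta^{D}_{d} +
  \beta^{G}_{g} + \beta^{DG}_{dg},
\]
where the totals $n_{dg}$ are treated as fixed.  In the call to
\Rfunction{glm}, the two columns of \code{cbind(Admitted, Rejected)}
supply $Y_{dg}$ and $n_{dg} - Y_{dg}$ respectively.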

\begin{question}
  \begin{itemize}
  \item[(a)] For both approaches above, formulate and test the hypothesis
    that there is no gender bias in the rates of admission.  Compare
    the results for the two approaches.
  \item[(b)] If there is evidence for a gender bias, what is the direction of
    the bias?
  \end{itemize}
\end{question}


\subsection*{Solution}

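In the Binomial GLM, the hypothesis of no gender bias corresponds to
dropping every term involving \code{Gender}, which leaves \code{Dept}
as the only predictor.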
<<>>=
fm0.bin <- glm(cbind(Admitted, Rejected) ~ Dept, data = UCBmerged, family = binomial())
anova(fm0.bin, fm.bin, test = "Chisq")
@ 
%
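For the Poisson and log-linear model cases, the null model is not as
trivial, but a little thought should make it clear that the terms to
be dropped are exactly those involving both \code{Admit} and
\code{Gender}, namely \code{Admit:Gender} and
\code{Dept:Admit:Gender}.  The \code{Dept:Gender} term must stay, as
it only describes how applicants of each gender are distributed across
departments.  This leads to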
<<>>=

fm0.poisson <- glm(Freq ~ Dept * Admit + Dept * Gender, data = UCBdf, family = poisson())
fm0.loglm <- loglm(Freq ~ Dept * Admit + Dept * Gender, data = UCBdf)
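anova(fm0.poisson, fm.poisson, test = "Chisq")
## Single-model call: reports the fit statistics of fm0.loglm,
## which are measured against the saturated model
anova(fm0.loglm)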
@ 
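%
To see why these are the right terms to drop, it may help to write the
saturated log-linear model explicitly.  The notation is introduced
here for illustration only: $\mu_{dag}$ is the expected count for
department $d$, admission status $a$ ($a = 1$ for admitted, $a = 2$
for rejected), and gender $g$.
\[
  \log \mu_{dag} = \lambda + \lambda^{D}_{d} + \lambda^{A}_{a} +
  \lambda^{G}_{g} + \lambda^{DA}_{da} + \lambda^{DG}_{dg} +
  \lambda^{AG}_{ag} + \lambda^{DAG}_{dag}.
\]
All terms that do not involve $a$ cancel in the log odds of admission,
\[
  \log \frac{\mu_{d1g}}{\mu_{d2g}} =
  (\lambda^{A}_{1} - \lambda^{A}_{2}) +
  (\lambda^{DA}_{d1} - \lambda^{DA}_{d2}) +
  (\lambda^{AG}_{1g} - \lambda^{AG}_{2g}) +
  (\lambda^{DAG}_{d1g} - \lambda^{DAG}_{d2g}),
\]
and only the last two differences depend on $g$.  The odds of
admission are therefore free of gender within every department exactly
when the $\lambda^{AG}$ and $\lambda^{DAG}$ terms vanish.
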
%
As expected, all three approaches give the same likelihood ratio
statistic, on 6 degrees of freedom.
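
The agreement can also be checked numerically.  The following is a
minimal, self-contained sketch rather than part of the original
solution; all object names beginning with \code{chk} are introduced
here purely for illustration, and the models are refitted so that the
chunk does not depend on earlier objects.
<<>>=
library(MASS)
## Long format for the Poisson and log-linear fits
chk.long <- as.data.frame(as.table(UCBAdmissions))
## Wide format (one row per Dept-Gender cell) for the Binomial fits
chk.wide <- as.data.frame(as.table(UCBAdmissions["Admitted", , ]))
names(chk.wide)[names(chk.wide) == "Freq"] <- "Admitted"
chk.wide$Rejected <-
    as.data.frame(as.table(UCBAdmissions["Rejected", , ]))$Freq

## Likelihood ratio statistic = deviance(null) - deviance(full)
chk.bin <-
    deviance(glm(cbind(Admitted, Rejected) ~ Dept,
                 data = chk.wide, family = binomial())) -
    deviance(glm(cbind(Admitted, Rejected) ~ Dept * Gender,
                 data = chk.wide, family = binomial()))
chk.poisson <-
    deviance(glm(Freq ~ Dept * Admit + Dept * Gender,
                 data = chk.long, family = poisson())) -
    deviance(glm(Freq ~ Dept * Admit * Gender,
                 data = chk.long, family = poisson()))
## For loglm, the lrt component is already relative to the saturated model
chk.loglm <- loglm(Freq ~ Dept * Admit + Dept * Gender,
                   data = chk.long)$lrt

c(binomial = chk.bin, poisson = chk.poisson, loglm = chk.loglm)
@ 
%
The three entries should agree up to the convergence tolerance of the
fitting algorithms.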

There is some evidence for gender bias.  To isolate the direction of
the bias, we can look at the signs of the extra terms in the full
model.  These are most conveniently extracted from the
\Rfunction{loglm} fit.
<<>>=
coef.fm <- coef(fm.loglm)
names(coef.fm)
coef.fm[c("Admit.Gender", "Dept.Admit.Gender")]
@ 
%
Averaged over departments, the \code{Admit.Gender} term indicates that
females have a slightly higher rate of admission.  The
\code{Dept.Admit.Gender} terms adjust this department by department:
they strengthen it markedly in department A, weaken it in D, reverse
it in C and E, and are negligible for B and F.  Combining the two
effects, females have the higher admission rate in departments A, B, D
and F, with a large difference only in A, while males have the higher
rate in C and E.  In other words, the direction of the gender bias
within departments is, on the whole, the opposite of what we would
have expected from the data from all departments combined, where males
are admitted at the higher rate.
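
The direction of the effect in each department can also be checked
directly from the observed proportions, since the saturated model
reproduces the observed counts exactly.  A minimal sketch using only
base \R{} follows; the object names \code{rate.by.dept} and
\code{rate.pooled} are introduced here for illustration.
<<>>=
## Proportion admitted within each Gender-by-Dept cell
## (the array dimensions are Admit, Gender, Dept)
rate.by.dept <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]
round(rate.by.dept, 2)

## Proportion admitted by gender, pooled over all departments
pooled <- margin.table(UCBAdmissions, margin = c(1, 2))
rate.pooled <- prop.table(pooled, margin = 2)["Admitted", ]
round(rate.pooled, 2)
@ 
%
The first table gives the within-department comparison; the second
gives the pooled comparison on which the naive expectation is based.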

\end{document}