\documentclass{amsart}
\usepackage[text={170mm,200mm},centering]{geometry}
\usepackage{hyperref}
\input{defs}
\SweaveOpts{prefix.string=figs-ass2,keep.source=TRUE}
\setkeys{Gin}{width=0.98\textwidth}
<<echo=FALSE,results=hide>>=
library(lattice)
lattice.options(default.theme = standard.theme(color = FALSE))
options(show.signif.stars = FALSE)
@
\title{Linear Models and GLM: Quiz 2 Solutions}
\newcommand{\vtheta}{\ensuremath{\vect{\theta}}}
\setlength{\parskip}{2mm}

\begin{document}
\setcounter{qnum}{1}
\maketitle
\raggedright

\begin{question}
  For the \texttt{UCBAdmissions} dataset in \R{}, convert it into a data frame using \Rfunction{as.data.frame.table}, and
  \begin{itemize}
  \item[(a)] Fit a suitable Poisson GLM using the \R{} function \Rfunction{glm}.
  \item[(b)] Fit a suitable log-linear model with the \Rfunction{loglm} function in the \Rpackage{MASS} package, which is a formula-based interface (similar to that of \Rfunction{glm}) to \Rfunction{loglin}.
  \end{itemize}
  In your answer, you only need to provide the \R{} code you used (please use a fixed-width font).
\end{question}

\subsection*{Solution}
The models fitted are the full-interaction (saturated) models.
<<>>=
library(MASS)  # provides loglm()
## UCBAdmissions is already a table, so as.data.frame() dispatches to as.data.frame.table()
UCBdf <- as.data.frame(as.table(UCBAdmissions))
fm.poisson <- glm(Freq ~ Dept * Admit * Gender, data = UCBdf,
                  family = poisson())
fm.loglm <- loglm(Freq ~ Dept * Admit * Gender, data = UCBdf)
@ %
In hindsight, I should have asked you to also fit a Binomial model, as product-Binomial sampling is the more natural scheme for these data. Specifically, we can treat each Department-Gender combination as a population whose size (the number of applicants) we condition on, with admission as success and rejection as failure. Fitting this model requires the data in a slightly different format, with the admitted and rejected counts for each Department-Gender combination in separate columns.
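Under this product-Binomial view, the quantity being modelled is the proportion admitted within each Department-Gender combination. As a quick illustrative aside (not part of the required answer), these proportions can be read straight off the original table with \Rfunction{prop.table}. The chunk also computes the rates obtained by first summing over departments. The object names \code{cond.rates} and \code{marg.rates} are our own and do not come from the quiz.
<<>>=
## Illustrative aside: cond.rates and marg.rates are hypothetical names, not from the quiz
## Proportion admitted within each Gender-Dept cell (dims: 1 = Admit, 2 = Gender, 3 = Dept)
cond.rates <- prop.table(UCBAdmissions, margin = c(2, 3))["Admitted", , ]
round(cond.rates, 3)
## Proportion admitted by gender after summing over departments
marg.rates <- prop.table(margin.table(UCBAdmissions, c(1, 2)), margin = 2)["Admitted", ]
round(marg.rates, 3)
@ %
Comparing the two sets of rates shows why conditioning on department matters when assessing gender bias. The Binomial model itself needs the admitted and rejected counts side by side, and the reshaping below produces that layout.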
<<>>=
## One data frame each for the admitted and rejected counts, one row per Gender-Dept cell
UCBadmit  <- as.data.frame(as.table(UCBAdmissions["Admitted", , , drop = FALSE]))
UCBreject <- as.data.frame(as.table(UCBAdmissions["Rejected", , , drop = FALSE]))
## Rename the count column and drop the now-constant Admit factor
UCBadmit$Admitted  <- UCBadmit$Freq;  UCBadmit$Freq  <- NULL; UCBadmit$Admit  <- NULL
UCBreject$Rejected <- UCBreject$Freq; UCBreject$Freq <- NULL; UCBreject$Admit <- NULL
## merge() joins on the common columns, Gender and Dept
UCBmerged <- merge(UCBadmit, UCBreject)
str(UCBmerged)
fm.bin <- glm(cbind(Admitted, Rejected) ~ Dept * Gender, data = UCBmerged,
              family = binomial())
@

\begin{question}
  \begin{itemize}
  \item[(a)] For both approaches above, formulate and test the hypothesis that there is no gender bias in the rates of admission. Compare the results for the two approaches.
  \item[(b)] If there is evidence of a gender bias, what is the direction of the bias?
  \end{itemize}
\end{question}

\subsection*{Solution}
The null model is obvious in the Binomial GLM case. No gender bias means that, within each department, males and females have the same probability of admission, so all terms involving \code{Gender} are dropped.
<<>>=
fm0.bin <- glm(cbind(Admitted, Rejected) ~ Dept, data = UCBmerged,
               family = binomial())
anova(fm0.bin, fm.bin, test = "Chisq")
@ %
For the Poisson and log-linear models the null model is less obvious. A little thought should make it clear that the only terms to drop are those involving the \code{Admit:Gender} interaction, that is, \code{Admit:Gender} and \code{Dept:Admit:Gender}. The \code{Dept:Gender} term must stay, because it reproduces the number of applicants in each Department-Gender combination, which the Binomial model conditions on. This leads to
<<>>=
fm0.poisson <- glm(Freq ~ Dept * Admit + Dept * Gender, data = UCBdf,
                   family = poisson())
fm0.loglm <- loglm(Freq ~ Dept * Admit + Dept * Gender, data = UCBdf)
anova(fm0.poisson, fm.poisson, test = "Chisq")
## For a single loglm fit this reports the test against the saturated model
anova(fm0.loglm)
@ %
Naturally, all three approaches give identical results: the same likelihood-ratio statistic on 6 degrees of freedom. There is evidence of gender bias, that is, of an association between gender and admission within departments. To find the direction of the bias, we can look at the signs of the extra terms in the full model. These are most conveniently extracted from the \Rfunction{loglm} fit, whose parameters are constrained to sum to zero over each factor.
<<>>=
coef.fm <- coef(fm.loglm)
names(coef.fm)
coef.fm[c("Admit.Gender", "Dept.Admit.Gender")]
@ %
The overall effect of gender on admission rates, averaged over departments, is that females have a slightly higher rate of admission.
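Broken up by department, the additional (three-way) effect of gender is in the same direction only for departments A and B, and in the opposite direction for departments C--F. Combining the two effects, males have the higher admission rate only in departments C and E. In department A, by contrast, females have a much higher rate: 89 of 108 female applicants were admitted (about 82\%) against 512 of 825 males (about 62\%). Because the model is saturated, these combined effects simply reproduce the observed admission rates. In other words, within departments the bias, where present, tends to benefit females. This is the opposite of what we would have expected from the data for all departments combined, where males have much the higher admission rate: 1198 of 2691 male applicants were admitted (about 45\%) against 557 of 1835 females (about 30\%).
\end{document}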