Theoretical Statistics and Mathematics Unit, ISI Delhi

December 23, 2014 (Tuesday),
3:30 PM at Webinar

Speaker:
Bhramar Mukherjee,
University of Michigan School of Public Health, USA

Title:
Shrinkage Methods Utilizing Auxiliary Information to Improve Prediction Models with Many Covariates

Abstract of Talk

We consider predicting an outcome $Y$ using a large number of covariates $X$. However, most of the data available for fitting the model contain only $Y$ and $W$, a noisy surrogate for $X$; only on a small number of observations do we observe $Y$, $X$, and $W$ together. We develop ridge-type shrinkage methods that trade off bias and variance in a data-adaptive way to yield smaller prediction error. We also demonstrate how the problem can be treated in a fully Bayesian context with different forms of adaptive shrinkage.
Our work is motivated by the rapid development of genomic assay technologies. In our application, mRNA expression of 91 genes is measured by both quantitative real-time polymerase chain reaction (qRT-PCR, $X$) and microarray technology ($W$) on 47 lung cancer patients, with microarray measurements available on an additional 392 patients. For future patients, the goal is to predict survival time ($Y$) using qRT-PCR measurements. The methods are evaluated on an independent test sample of 100 patients.
The high dimensionality of the problem, the large fraction of missing covariate information, the matched structure of $X$ and $W$, and the fact that we are interested in a prediction model for $Y \mid X$ (rather than $Y \mid W$) make this an intriguing but non-standard problem. This is joint work with Philip S. Boonstra and Jeremy Taylor from the University of Michigan.
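
For readers unfamiliar with ridge-type shrinkage toward a target, the following is a minimal sketch of the general idea on simulated data. It is not the speakers' estimator: the function names (targeted_ridge, simulate), the sample sizes, the Gaussian noise model for $W$, and the choice of a least-squares fit of $Y$ on $W$ as the shrinkage target are all assumptions made for illustration. The small-sample fit of $Y$ on $X$ is pulled toward a target estimated from the larger sample in which only the surrogate $W$ is available.

```python
import numpy as np


def targeted_ridge(X, y, target, lam):
    """Closed-form minimizer of ||y - X b||^2 + lam * ||b - target||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * target)


def simulate(seed=0, n_small=50, n_large=400, p=20, noise_w=0.5):
    """Simulated data only (hypothetical sizes): W is X plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = n_small + n_large
    beta = rng.normal(scale=0.3, size=p)
    X = rng.normal(size=(n, p))
    W = X + noise_w * rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    # X is treated as observed only on the first n_small rows;
    # the remaining n_large rows contribute (Y, W) only.
    return X[:n_small], y[:n_small], W[n_small:], y[n_small:], beta


if __name__ == "__main__":
    X_s, y_s, W_l, y_l, beta = simulate()
    # Assumed target: least squares of Y on the surrogate W in the large
    # sample. It has low variance but is biased for the Y|X coefficients.
    target = np.linalg.lstsq(W_l, y_l, rcond=None)[0]
    for lam in (0.0, 1.0, 10.0, 100.0):
        b = targeted_ridge(X_s, y_s, target, lam)
        print(lam, np.linalg.norm(b - beta))
```

Here lam = 0 reproduces ordinary least squares on the small sample, while a very large lam returns the surrogate-based target, so lam governs the bias-variance trade-off. In practice lam would have to be chosen from the data, for example by cross-validation; that step is omitted from this sketch.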