A two-step Empirical Likelihood based approach for combining sample and population data in Generalised Linear Models

Abstract: In addition to the sample, information on the relationship between the explanatory variables and the dependent variable is sometimes available from population-level data. It has been shown that constrained maximum likelihood estimation can incorporate such population-level information and achieve a large reduction in the bias and variance of the estimated regression coefficients. We propose here an alternative two-step empirical likelihood based approach. We first compute optimal weights for the sample, which maximise the empirical likelihood subject to the population constraints. These weights are then used to produce the parameter and standard error estimates. As with the constrained MLE, the use of population constraints leads to substantially lower standard errors. However, this two-step approach is computationally much less intensive and easily allows for estimation with multiple population constraints and multiple covariates. We shall also discuss other applications of the methodology and indicate directions for future research.

(Joint work with Mark Handcock, Department of Statistics, University of Washington, Seattle, and Michael Rendall, RAND Corporation.)
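
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the GLM is a logistic regression, the single population constraint is a known population mean of the response (the hypothetical value mu_pop), the data are simulated, and the function names el_weights and weighted_logit are invented here. Step 1 finds weights w_i that maximise sum(log w_i) subject to sum(w_i) = 1 and sum(w_i * g_i) = 0, using the standard empirical likelihood form w_i = 1 / (n * (1 + lambda' g_i)). Step 2 fits the regression with those weights. Standard errors are omitted.

```python
import numpy as np

def el_weights(g, max_iter=50, tol=1e-10):
    """Step 1: empirical likelihood weights under the constraint sum(w_i * g_i) = 0.

    g is an (n, q) array of constraint functions evaluated at each sample point.
    The Lagrange multiplier lam is found by damped Newton iterations.
    """
    g = np.asarray(g, dtype=float)
    if g.ndim == 1:
        g = g[:, None]
    n, q = g.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        denom = 1.0 + g @ lam
        scaled = g / denom[:, None]
        grad = scaled.sum(axis=0)            # equals zero at the solution
        if np.linalg.norm(grad) < tol:
            break
        step = np.linalg.solve(scaled.T @ scaled, grad)
        t = 1.0
        # Halve the step until every implied weight stays positive.
        while np.any(1.0 + g @ (lam + t * step) <= 0.0) and t > 1e-12:
            t *= 0.5
        lam = lam + t * step
    return 1.0 / (n * (1.0 + g @ lam))

def weighted_logit(X, y, w, max_iter=100, tol=1e-10):
    """Step 2: weighted logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated sample (illustrative only).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + x)))).astype(float)

mu_pop = 0.4                      # hypothetical known population mean of y
w = el_weights(y - mu_pop)        # constraint function g_i = y_i - mu_pop
beta = weighted_logit(X, y, w)    # constrained two-step estimate
beta_unweighted = weighted_logit(X, y, np.full(n, 1.0 / n))
```

Because the weights are computed once and then passed to an ordinary weighted fit, adding further constraints only adds columns to g; the regression step itself is unchanged.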