Data Analysis with Missing Values

Abstract: Missing observations are a frequent occurrence in data sets, especially large and/or multivariate ones. Ignoring missing cases in univariate situations, discarding any multivariate case that has one or more missing values (casewise deletion, also called listwise deletion), or computing each quantity from whatever values are available (pairwise deletion) are not good solutions to the problem. Often, 'missingness' does not occur at random and is related to the missing and available observations. Ideally, a solution consists of building a model for 'missingness' into the statistical formulation of the problem; this is not easy, however.

In this talk, we first present results of analyses with casewise deletion and pairwise deletion and point out the flaws in such analyses. We then discuss concepts for dealing with missing data, such as Missing at Random (MAR), Missing Completely at Random (MCAR), and ignorability of missingness, and present some of the available methods, such as Multiple Imputation, Regression Imputation, and EM Imputation.
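
As a rough illustration of the two deletion strategies mentioned above, the following minimal Python sketch computes a covariance once from the complete cases only and once from all cases in which both variables are observed. The data set, the variable layout (x, y, z), and the helper names are hypothetical and chosen only for this example; they are not taken from the talk.

```python
from statistics import mean

# Hypothetical toy data: each row is one case (x, y, z); None marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 6.0),
    (3.0, 6.0, None),
    (4.0, 8.0, 12.0),
    (None, 10.0, 15.0),
]

def casewise_deletion(data):
    """Keep only the cases that have no missing values at all."""
    return [r for r in data if all(v is not None for v in r)]

def available_pairs(data, i, j):
    """Keep every case in which both variable i and variable j are observed."""
    return [(r[i], r[j]) for r in data if r[i] is not None and r[j] is not None]

def covariance(pairs):
    """Sample covariance (n - 1 denominator) of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = mean(a for a, _ in pairs)
    mean_b = mean(b for _, b in pairs)
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)

# Casewise deletion: only the fully observed cases contribute.
complete = casewise_deletion(rows)
n_complete = len(complete)
cov_xy_casewise = covariance([(r[0], r[1]) for r in complete])

# Using the available values: every case with both x and y observed contributes.
pairs_xy = available_pairs(rows, 0, 1)
cov_xy_pairwise = covariance(pairs_xy)

print(n_complete, cov_xy_casewise)
print(len(pairs_xy), cov_xy_pairwise)
```

In this toy example the two approaches use different subsets of the cases and return different covariance estimates, which hints at why neither can be relied on without considering how the values came to be missing.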