Re: Missing mixed continuous and categorical data
From: "Nick Holford" n.holford@auckland.ac.nz
Subject: Re: [NMusers] Missing mixed continuous and categorical data
Date: Tue, May 3, 2005 7:30 am
Hi,
There are several approaches to your problem.
1. The simplest is to impute the missing value with the median of the non-missing
values. I suspect this is the most widely used method.
2. You can use multiple imputation to generate, say, 6 separate data sets, each with
imputed values drawn from either a theoretical or empirical distribution of the
covariates. Then fit each of the 6 data sets and use the mean of the 6 estimates for
each parameter as the final model estimate. The choice of 6 is "Rubin's Rule",
suggested by Don Rubin (one of the originators of the multiple imputation concept).
You can try larger numbers of imputed data sets until the mean converges, but 6 is
often enough.
3. You can construct a joint model for the covariate distribution and the PKPD
model. This means including all the covariates as DV values and constructing a model
(usually quite simple) to predict each covariate.
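As a rough illustration of methods 1 and 2 (this is not NONMEM code), here is a
minimal Python sketch. The data values are made up, and fit_slope is only a
hypothetical stand-in for the real model fit; all function names are my own
assumptions. It imputes missing weights with the median, then builds 6 imputed data
sets by drawing from the empirical distribution of the observed weights and averages
the 6 estimates.

```python
import random
import statistics

def impute_median(values):
    """Method 1: replace each missing value (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def impute_empirical(values, rng):
    """Method 2, one data set: draw each missing value from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]

def fit_slope(x, y):
    """Hypothetical stand-in for the real model fit: least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def multiple_imputation_estimate(x_with_missing, y, n_imputations=6, seed=1):
    """Fit each imputed data set and return the mean estimate plus the individual estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_imputations):
        x_imputed = impute_empirical(x_with_missing, rng)
        estimates.append(fit_slope(x_imputed, y))
    return statistics.fmean(estimates), estimates

# Illustrative (invented) data: weight in kg with two missing values, and a response.
weight = [70.0, None, 55.0, 82.0, None, 64.0, 91.0, 60.0]
clearance = [5.1, 4.8, 4.0, 6.0, 5.5, 4.6, 6.7, 4.3]

median_imputed = impute_median(weight)
pooled, per_set = multiple_imputation_estimate(weight, clearance)
print(median_imputed)
print(pooled, per_set)
```

Increasing n_imputations and watching whether the pooled value settles is one way
to check if 6 is enough for your data.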
The first method is simple but ignores correlation between covariates. The second
and third methods allow you to account for the covariance of covariates. It can be
tricky to know what to do when you have both categorical and continuous variables.
If you have lots of patients in each category (e.g. the category is sex and about
half of the sample is male) then you can construct two multivariate normal
distributions for the continuous covariates (one for males and one for females). If
you have many categories then you can try treating the categorical covariate as if
it were a continuous value. Stacey Tannenbaum (stacey.tannenbaum@pharma.novartis.com)
and Ivan Matthews (Ivan.Matthews@postgrad.manchester.ac.uk) have both worked on this
method and may be able to help you.
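To make the "one distribution per sex" idea concrete, here is a small Python sketch
under simplifying assumptions: only two continuous covariates (age and weight), only
weight is ever missing, and the numbers are invented. For each sex it estimates a
bivariate normal from the complete records and then draws a missing weight from the
conditional normal distribution given that patient's age, so the age-weight
correlation is respected. The names fit_bivariate_normal, draw_weight_given_age and
impute_by_sex are hypothetical.

```python
import math
import random
import statistics

def fit_bivariate_normal(pairs):
    """Estimate means, SDs and correlation from complete (age, weight) pairs."""
    ages = [a for a, _ in pairs]
    weights = [w for _, w in pairs]
    m_age, m_wt = statistics.fmean(ages), statistics.fmean(weights)
    sd_age, sd_wt = statistics.stdev(ages), statistics.stdev(weights)
    cov = sum((a - m_age) * (w - m_wt) for a, w in pairs) / (len(pairs) - 1)
    return m_age, m_wt, sd_age, sd_wt, cov / (sd_age * sd_wt)

def draw_weight_given_age(age, params, rng):
    """Draw weight from the conditional normal distribution given age."""
    m_age, m_wt, sd_age, sd_wt, rho = params
    cond_mean = m_wt + rho * sd_wt / sd_age * (age - m_age)
    cond_sd = sd_wt * math.sqrt(max(0.0, 1.0 - rho ** 2))
    return rng.gauss(cond_mean, cond_sd)

def impute_by_sex(records, seed=1):
    """Return copies of the records with missing weights drawn from the sex-specific model."""
    rng = random.Random(seed)
    params = {}
    for sex in {r["sex"] for r in records}:
        complete = [(r["age"], r["weight"]) for r in records
                    if r["sex"] == sex and r["weight"] is not None]
        params[sex] = fit_bivariate_normal(complete)
    imputed = []
    for r in records:
        new = dict(r)  # copy so the original record is left untouched
        if new["weight"] is None:
            new["weight"] = draw_weight_given_age(new["age"], params[new["sex"]], rng)
        imputed.append(new)
    return imputed

# Illustrative (invented) records; weight is missing for one male and one female.
records = [
    {"sex": "M", "age": 30, "weight": 80.0},
    {"sex": "M", "age": 45, "weight": 85.0},
    {"sex": "M", "age": 60, "weight": 78.0},
    {"sex": "M", "age": 52, "weight": 90.0},
    {"sex": "M", "age": 40, "weight": None},
    {"sex": "F", "age": 28, "weight": 60.0},
    {"sex": "F", "age": 39, "weight": 66.0},
    {"sex": "F", "age": 55, "weight": 70.0},
    {"sex": "F", "age": 47, "weight": 63.0},
    {"sex": "F", "age": 50, "weight": None},
]

imputed = impute_by_sex(records)
for r in imputed:
    print(r)
```

Calling impute_by_sex with several different seeds gives the separate imputed data
sets needed for multiple imputation.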
If you use method 3 then you will need to be aware of the isolated eta bug
(see http://www.metrumrg.com/publications/LG_ETAbug_full.pdf for details) and use the 'zeta transform' of ETAs
in order to capture the correlation between covariates when predicting missing
values.
Nick