Re: Missing mixed continuous and categorical data

From: Nick Holford Date: May 03, 2005 technical Source: cognigencorp.com
From: "Nick Holford" n.holford@auckland.ac.nz
Subject: Re: [NMusers] Missing mixed continuous and categorical data
Date: Tue, May 3, 2005 7:30 am

Hi,

There are several approaches to your problem.

1. The simplest is to impute the missing value with the median of the non-missing values. I suspect this is the most widely used method.

2. You can use multiple imputation to generate, say, 6 separate data sets, each with imputed values drawn from either a theoretical or an empirical distribution of the covariates. Then fit each of the 6 data sets and use the mean of the 6 estimates for each parameter as the final model estimate. The choice of 6 is "Rubin's Rule", suggested by Don Rubin (one of the originators of the multiple imputation concept). You can try larger numbers of imputed data sets until you find the mean converges, but 6 is often enough.

3. You can construct a joint model for the covariate distribution and the PKPD model. This means including all the covariates as DV values and constructing a model (usually quite simple) to predict each covariate.

The first method is simple but ignores correlation between covariates. The second and third methods allow you to account for the covariance of covariates.

It can be tricky to know what to do when you have both categorical and continuous variables. If you have lots of patients in each category, e.g. the category is sex and about half of the sample is male, then you can construct two multivariate normal distributions for the continuous covariates (one for males and one for females). If you have many categories then you can try treating the categorical covariate as if it were a continuous value.

Stacey Tannenbaum (stacey.tannenbaum@pharma.novartis.com) and Ivan Matthews (Ivan.Matthews@postgrad.manchester.ac.uk) have both worked on this method and may be able to help you.
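As a rough illustration of methods 1 and 2, here is a minimal Python sketch. It is not NONMEM code and it is not taken from the original post. The covariates (sex, weight, age), the function names, and the choice of a bivariate normal fitted to the complete cases within each sex are all assumptions made for illustration. The sketch also assumes that at most one of the two continuous covariates is missing per subject. The model fit itself is not shown; pool() only averages parameter estimates that would come from fitting each imputed data set.

```python
import math
import random
import statistics


def impute_median(values):
    """Method 1: replace None with the median of the non-missing values."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]


def draw_conditional(x_obs, xs, ys, rng):
    """Draw y given x from a bivariate normal fitted to the pairs (xs, ys)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample Pearson correlation between the two covariates.
    rho = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (
        (len(xs) - 1) * sx * sy
    )
    cond_mean = my + rho * sy / sx * (x_obs - mx)
    cond_sd = sy * math.sqrt(max(0.0, 1.0 - rho * rho))
    return rng.gauss(cond_mean, cond_sd)


def multiple_imputation(rows, n_sets=6, seed=0):
    """Method 2: rows are (sex, wt, age) tuples, None marks a missing value.

    Returns n_sets completed copies of the data. Missing values are drawn
    from a separate bivariate normal for each sex (assumed illustration).
    """
    rng = random.Random(seed)
    # Complete cases per sex, used to fit one distribution per category.
    complete = {}
    for sex, wt, age in rows:
        if wt is not None and age is not None:
            complete.setdefault(sex, []).append((wt, age))

    data_sets = []
    for _ in range(n_sets):
        filled = []
        for sex, wt, age in rows:
            wts, ages = zip(*complete[sex])
            if wt is None:
                wt = draw_conditional(age, ages, wts, rng)
            elif age is None:
                age = draw_conditional(wt, wts, ages, rng)
            filled.append((sex, wt, age))
        data_sets.append(filled)
    return data_sets


def pool(estimates):
    """Final estimate per parameter: mean over the fits to each data set."""
    return [statistics.mean(col) for col in zip(*estimates)]
```

Each of the data sets returned by multiple_imputation() would then be fitted separately, and the resulting parameter estimates (one list per fit) passed to pool(). Fitting the distribution to complete cases only is a simplification; it is used here just to keep the sketch short.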
If you use method 3 then you will need to be aware of the isolated eta bug (see http://www.metrumrg.com/publications/LG_ETAbug_full.pdf for details) and use the 'zeta transform' of ETAs in order to capture the correlation between covariates when predicting missing values.

Nick