Re: An approach for imputing missing independent variable (covariate)

From: Lewis B. Sheiner Date: September 20, 2000 technical Source: cognigencorp.com
From: LSheiner <lewis@c255.ucsf.edu>
Subject: Re: An approach for imputing missing independent variable (covariate)
Date: Wed, 20 Sep 2000 09:45:11 -0700

First, let me ask Vladimir why he says his method operates "without assuming any explicit model for a covariate". The inverted model for the DV is an explicit model for the covariate, is it not?

More importantly, however, Vladimir's approach has at least two problems:

(i) It is non-convergent: each data imputation at step 4 generates a different data set, which will yield a different estimate at step 5. This will never stop.

(ii) Even if it converges "well enough" to a "region", it will not yield correct standard errors.

To see (ii), imagine the (absurd) situation that all but two data points from one individual were missing: the algorithm would wind up filling in all missing data points from the line defined by the two actual observations (without any error) and would eventually report perfect precision for the estimate of the slope and intercept defined by the two observations. This is not to say that anyone would try such an analysis; it merely points out that the method fails as it approaches a limit, which should make one suspect that it will have problems, perhaps of lesser severity, away from that limit. The reason for the problem is that uncertainty in the (post-hoc) parameter estimates is ignored (more on this below).

The more difficult issue, though, is how to compute standard errors. The standard errors from the last step of Vladimir's last iteration can't be right, as these are conditional on the imputed data, treating them as known, when in fact they are unknown.

A simpler method, which doesn't require an invertible function such as Vladimir's, and which is theoretically sound (i.e., gives unbiased estimates and correct standard errors), is multiple imputation. This method requires the ability to draw samples of the missing data from their posterior distribution.
Fitting a population model with the IDV as the DV and then proceeding to get post-hoc parameters and simulating, as Vladimir does in his steps 3 and 4, is close, although, as noted, this procedure ignores parameter uncertainty. But again, perhaps doing so won't do too much harm to the eventual standard errors (this depends on the magnitude of posterior parameter uncertainty relative to residual error).

Multiple imputation is so simple it can be described easily:

1. Estimate a distribution from which the missing data can be drawn. This can be entirely empirical and should use all the observed data (DV as well as IDV in Vladimir's example).
2. For m = 1, ..., 5:
     Impute the missing data using the distribution in (1).
     Analyze the completed data as usual to estimate parameters Pm and the covariance matrix of the estimate, Cm.
   End loop
3. P-hat = average(Pm)
4. Covariance(P-hat) = Covariance(Pm) + average(Cm)

This area is one that has received a great deal of attention from some first-rate statisticians, and we REALLY should follow their lead, or present a very compelling reason not to do so ...

For anyone wishing to pursue these matters further, I strongly recommend starting with:

1. Rubin "Multiple Imputation for Non-response in Surveys", Wiley, NY, 1987.
2. Tanner "Tools for Statistical Inference", Springer-Verlag, NY, 1993.

Some more recent references to multiple imputation are:

1. Barnard J, Meng XL. Applications of multiple imputation in medical studies: from AIDS to NHANES. Stat Methods Med Res. 1999 Mar;8(1):17-36.
2. Schafer JL. Multiple imputation: a primer. Stat Methods Med Res. 1999 Mar;8(1):3-15. Review.

--
Lewis B Sheiner, MD (lewis@c255.ucsf.edu)
Professor: Lab. Med., Bioph. Sci., Med.
Box 0626, UCSF, SF, CA, 94143-0626
415-476-1965 (v), 415-476-2796 (fax)
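
As an illustration of the four multiple-imputation steps in the message above, here is a minimal Python sketch. It is not from the original post. The function names (pool_estimates, ols), the simulated data, and the simple linear imputation model are all assumptions chosen to keep the example short. For brevity the imputation step draws only residual noise and holds the imputation-model parameters fixed, which is the same simplification the message warns about; a fuller version would also draw those parameters from their posterior. The pooling function uses the sum given in step 4 by default; Rubin's combining rule additionally multiplies the between-imputation part by (1 + 1/m), which is available here as an option.

```python
import numpy as np

def pool_estimates(estimates, covariances, finite_m_correction=False):
    """Combine m completed-data analyses (steps 3 and 4 of the message).

    estimates:   sequence of m parameter vectors Pm, each of length p
    covariances: sequence of m covariance matrices Cm, each p x p
    """
    P = np.asarray(estimates, dtype=float)            # shape (m, p)
    C = np.asarray(covariances, dtype=float)          # shape (m, p, p)
    m = P.shape[0]
    p_hat = P.mean(axis=0)                            # step 3: average(Pm)
    between = np.atleast_2d(np.cov(P, rowvar=False))  # Covariance(Pm)
    within = C.mean(axis=0)                           # average(Cm)
    # Rubin's rule inflates the between part by (1 + 1/m) for finite m;
    # the message states the plain sum, so that is the default here.
    factor = 1.0 + 1.0 / m if finite_m_correction else 1.0
    return p_hat, within + factor * between

def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (params, covariance)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    return beta, sigma2 * np.linalg.inv(X.T @ X)

# Hypothetical data: DV y depends linearly on a covariate x,
# and about 30% of the covariate values are missing at random.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(50.0, 10.0, n)
y = 2.0 + 0.1 * x + rng.normal(0.0, 1.0, n)
missing = rng.random(n) < 0.3

# Step 1: estimate a distribution for x given y from the complete cases.
g, _ = ols(y[~missing], x[~missing])
sd = np.std(x[~missing] - (g[0] + g[1] * y[~missing]), ddof=2)

# Step 2: impute, analyze, and store Pm and Cm for m = 1, ..., 5.
fits = []
for _ in range(5):
    x_imp = x.copy()
    noise = rng.normal(0.0, sd, missing.sum())
    x_imp[missing] = g[0] + g[1] * y[missing] + noise
    fits.append(ols(x_imp, y))

# Steps 3 and 4: pool the five analyses.
p_hat, cov_hat = pool_estimates([f[0] for f in fits], [f[1] for f in fits])
print("pooled estimates:", p_hat)
print("pooled standard errors:", np.sqrt(np.diag(cov_hat)))
```

The pooled covariance is never smaller than the average completed-data covariance, which is how the extra uncertainty due to the missing values enters the reported standard errors.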
Sep 11, 2000 Paul S. Collier missing data items
Sep 11, 2000 Lewis B. Sheiner Re: missing data items
Sep 11, 2000 Mats Karlsson Re: missing data items
Sep 11, 2000 Nick Holford Missing data values
Sep 11, 2000 Lewis B. Sheiner Re: missing data items
Sep 20, 2000 Vladimir Piotrovskij An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Leonid Gibiansky RE: An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 22, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)