Re: An approach for imputing missing independent variable (covariate)

From: Lewis B. Sheiner Date: September 20, 2000 technical Source: cognigencorp.com
From: LSheiner <lewis@c255.ucsf.edu>
Subject: Re: An approach for imputing missing independent variable (covariate)
Date: Wed, 20 Sep 2000 10:00:19 -0700

"Gibiansky, Leonid" wrote:
>
> Vladimir,
>
> Essentially, you allow your covariate to "float" so that the imputed missing
> value would not "disturb" your model. My impression is that it is the same
> as use NEWCOV instead of COV where
>
> NEWCOV = COV (if COV is not missing)
> NEWCOV = THETA(10)+ETA(10)
>
> $THETA
> ...
> 0 ; or any reasonable initial value and range
>
> $OMEGA
> ....
> HUGE FIXED; to allow any value that is convenient for the model
>
> Is there any difference with your approach ?

Yes, there is; see my note of earlier today. What Leonid has here is one of the methods I discussed in my first response on this issue: it effectively integrates the likelihood across the missing data, and is formally correct. And again the caution: if the data are missing non-ignorably, bias can result (see: Little & Rubin, Statistical analysis with missing data, NY Wiley, 1987).

> POSTHOC value for NEWCOV should
> be equal to the result of your iteration scheme. Alternatively, you may
> first model the covariate distribution independently (approximate it by the
> normal distribution, if possible, and find mean and variance), and then fix
> thata(10) and omega(10) at those values. In this case, you place some
> restrictions on the missing covariate value by using distribution of the
> not-missing covariate values.

Strictly speaking, the model for the missing data should be (as my other note of today indicates) based on ALL the observed data, not just the non-missing values of the covariate. That is, it should use the information in DV as well. To see this, consider the case of modeling y based on the single covariate x. Imagine that, unknown to the analyst, y = x exactly.
If some x's are missing, and the model that missing x = (mean of observed x's) +/- (std dev of observed x's) is used to marginalize the likelihood, then the apparent correlation between x and y will not be perfect. Again, what's wrong in the limit is likely wrong away from the limit.

--
Lewis B Sheiner, MD (lewis@c255.ucsf.edu)
Professor: Lab. Med., Bioph. Sci., Med.
Box 0626, UCSF, SF, CA, 94143-0626
415-476-1965 (v), 415-476-2796 (fax)
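
The y = x thought experiment in the message above can be checked numerically. What follows is a minimal Python sketch and not anything posted in the original thread. It makes several assumptions of its own: x is standard normal, 30% of the x values are missing completely at random, and a single random draw per missing value stands in for marginalizing the likelihood. The function names `pearson` and `simulate` are hypothetical. It compares filling missing x from the distribution of the observed x's alone (ignoring y) with filling it from the distribution of x given y, which under y = x collapses to the value y itself.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, frac_missing=0.3, seed=1):
    """Return (r_marginal, r_conditional) for the y = x example.

    Assumptions (not from the thread): x ~ N(0, 1), values are missing
    completely at random, and one random draw replaces each missing x.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = list(x)  # unknown to the analyst, y = x exactly
    missing = [rng.random() < frac_missing for _ in range(n)]

    # Model for missing x built from the observed x's only (ignores y).
    observed = [xi for xi, m in zip(x, missing) if not m]
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    x_marginal = [rng.gauss(mu, sd) if m else xi for xi, m in zip(x, missing)]

    # Model for missing x that also uses y: given y = x exactly, the
    # distribution of x given y is concentrated at y.
    x_conditional = [yi if m else xi for xi, yi, m in zip(x, y, missing)]

    return pearson(x_marginal, y), pearson(x_conditional, y)


r_marginal, r_conditional = simulate()
print(f"correlation, missing x filled from observed x only: {r_marginal:.3f}")
print(f"correlation, missing x filled using y as well:      {r_conditional:.3f}")
```

Under these assumptions the first correlation falls clearly below 1 while the second stays at 1, which is the attenuation the message describes. The sketch says nothing about non-ignorable missingness, which the message flags as a separate source of bias.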
Sep 11, 2000 Paul S. Collier missing data items
Sep 11, 2000 Lewis B. Sheiner Re: missing data items
Sep 11, 2000 Mats Karlsson Re: missing data items
Sep 11, 2000 Nick Holford Missing data values
Sep 11, 2000 Lewis B. Sheiner Re: missing data items
Sep 20, 2000 Vladimir Piotrovskij An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Leonid Gibiansky RE: An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 20, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)
Sep 21, 2000 Lewis B. Sheiner Re: An approach for imputing missing independent variable (covariate)
Sep 22, 2000 Vladimir Piotrovskij RE: An approach for imputing missing independent variable (covariate)