RE: An approach for imputing missing independent variable (covariate)
From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com>
Subject: RE: An approach for imputing missing independent variable (covariate)
Date: Thu, 21 Sep 2000 14:34:03 +0200
>First, let me ask Vladimir why he says his method operates "without assuming any
>explicit model for a covariate" The inverted model for the DV is an explicit model
>for the covariate, is it not?
Sorry, my phrasing was indeed ambiguous. By "explicit model" I meant a model like THETA(.) + ETA(..), where we explicitly assume a normal distribution for the covariate.
>More importantly, however, Vladimir's approach has
>at least two problems: (i) it is non-convergent: each data imputation
>at step 4 generates a different data set, which will yield
>a different estimate at step 5. This will never stop. (ii) Even
>if it converges "well enough" to a "region",
>it will not yield correct standard errors.
I believe the algorithm will converge; however, I don't think I will have time to check this, or to assess the magnitude of the bias (unless I have to model data with missing covariate values myself; currently I do not have such a problem). The data set remains essentially unchanged, except that the missing IDVs are replaced by the estimates obtained at the previous iteration. I presume this will work nicely if the proportion of missing values is small (20% as in my example, or less). I believe "correct standard errors" are a kind of unachievable ideal even when there are no missing predictors at all.
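The iteration described here can be sketched on a toy problem. This is only an illustration under loud assumptions: an ordinary least-squares line y = a + b*x stands in for the mixed-effects model, the data are simulated, and the names fit_line and iterate_imputation are hypothetical. In this simplified setting the imputed points lie exactly on the previously fitted line, so the refit should reproduce the previous estimates and the iteration settles at once; it says nothing definite about the nonlinear mixed-effects case.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)
    return a, b

def iterate_imputation(x, y, missing, n_iter=10):
    """Repeatedly replace missing x by the inverted-model value (y - a) / b."""
    x = x.copy()
    a, b = fit_line(x[~missing], y[~missing])    # start from the complete cases
    history = [(a, b)]
    for _ in range(n_iter):
        x[missing] = (y[missing] - a) / b        # impute from the previous estimates
        a, b = fit_line(x, y)                    # refit on the filled-in data set
        history.append((a, b))
    return np.array(history)

# Simulated data (assumption): 50 records, 20 % of covariate values missing.
rng = np.random.default_rng(0)
n = 50
x_true = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x_true + rng.normal(0.0, 0.3, n)
missing = np.zeros(n, dtype=bool)
missing[:10] = True
x_obs = np.where(missing, np.nan, x_true)        # missing covariates are unknown

history = iterate_imputation(x_obs, y, missing)
print(history[:3])                               # (a, b) after each of the first passes
```

Each row of history holds the intercept and slope after one pass, so comparing successive rows shows how quickly the estimates stop changing.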
>To see (ii), imagine the (absurd) situation that all but two data
>points from one individual were missing: the algorithm would wind up filling in
>all missing data points from the line defined by the two actual observations
>(without any error) and would eventually
>report perfect precision for the estimate of the slope
>and intercept defined by the two observations.
>This is not to say that anyone would try such an analysis; it merely
>points out that the method fails as it approaches a limit, which
>should make one suspect that it will have problems,
>perhaps of lesser severity, away from that limit. The reason for
>the problem is that uncertainty in the (posthoc) parameter
>estimates is ignored (more on this below).
In this absurd situation no imputation can be made at all. Multiple imputation would probably fail as well.
>The more difficult issue, though,
>is how to compute standard errors? The standard errors
>from the last step of Vladimir's last iteration
>can't be right, as these are conditional on the imputed data,
>treating them as known, when in fact they are unknown.
Missing values are unknown by definition, and I am not sure multiple imputation can change this.
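The quoted concern about standard errors that are conditional on imputed values can be made concrete with a small numerical check. The sketch below is an assumption-laden toy: simulated data, an ordinary least-squares line instead of a population model, and a hypothetical helper slope_and_se. It compares the conventional slope standard error from the complete cases with the one reported after the missing covariates have been filled in from the inverted fitted line and treated as known.

```python
import numpy as np

def slope_and_se(x, y):
    """OLS slope of y = a + b*x and its conventional standard error."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    rss = np.sum((y - (a + b * x)) ** 2)         # residual sum of squares
    sxx = np.sum((x - x.mean()) ** 2)            # spread of the covariate
    return b, np.sqrt(rss / (n - 2) / sxx)

# Simulated data (assumption): 50 records, 20 % of covariate values missing.
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3, n)
missing = np.zeros(n, dtype=bool)
missing[:10] = True

# Complete-case analysis: drop the records with a missing covariate.
b_cc, se_cc = slope_and_se(x[~missing], y[~missing])
a_cc = np.mean(y[~missing]) - b_cc * np.mean(x[~missing])   # OLS intercept

# Single deterministic imputation from the inverted fitted line.
x_filled = x.copy()
x_filled[missing] = (y[missing] - a_cc) / b_cc
b_imp, se_imp = slope_and_se(x_filled, y)

print(f"complete cases: slope={b_cc:.4f}, SE={se_cc:.4f}")
print(f"after imputing: slope={b_imp:.4f}, SE={se_imp:.4f}")
```

In this toy case the slope is unchanged while the reported standard error gets smaller, because the imputed records add degrees of freedom without adding residual variation. How large the effect is with a small proportion of missing values in a real population analysis is not something this sketch can answer.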
>A simpler method, which doesn't require an invertible
>function such as Vladimir's, and which is
>theoretically sound (i.e. gives unbiased estimates and
>correct standard errors) is multiple imputation.
>This method requires the ability to draw samples of the missing
>data from their posterior distribution.
This is what I wanted to avoid: sampling covariates from an (unknown) distribution.
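For readers unfamiliar with the multiple imputation procedure mentioned in the quoted text, a minimal sketch follows. It assumes exactly the kind of distribution discussed above: the covariate given the response is taken to be normal around a line fitted to the complete cases. The data are simulated, the function names are hypothetical, and for brevity the imputation-model parameters are held fixed rather than drawn from their own posterior, so this is a simplified variant and not a fully proper implementation. The estimates from the m completed data sets are pooled with Rubin's rules.

```python
import numpy as np

def slope_and_var(x, y):
    """OLS slope of y = a + b*x and the conventional variance of that slope."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)
    rss = np.sum((y - (a + b * x)) ** 2)
    sxx = np.sum((x - x.mean()) ** 2)
    return b, rss / (n - 2) / sxx

def multiple_imputation(x, y, missing, m=20, seed=0):
    """Pool m imputed-data fits with Rubin's rules; returns (slope, SE, within, between)."""
    rng = np.random.default_rng(seed)
    # Assumed imputation model: x | y is normal around a line fitted to complete cases.
    d, c = np.polyfit(y[~missing], x[~missing], 1)
    resid = x[~missing] - (c + d * y[~missing])
    sigma = np.sqrt(np.sum(resid ** 2) / (resid.size - 2))
    slopes, variances = [], []
    for _ in range(m):
        x_fill = x.copy()
        # Draw each missing covariate with noise instead of using the point prediction.
        x_fill[missing] = c + d * y[missing] + rng.normal(0.0, sigma, missing.sum())
        b, v = slope_and_var(x_fill, y)
        slopes.append(b)
        variances.append(v)
    within = np.mean(variances)                  # average within-imputation variance
    between = np.var(slopes, ddof=1)             # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between   # Rubin's total variance
    return np.mean(slopes), np.sqrt(total), within, between

# Simulated data (assumption): 50 records, 20 % of covariate values missing.
rng = np.random.default_rng(2)
n = 50
x_true = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x_true + rng.normal(0.0, 0.3, n)
missing = np.zeros(n, dtype=bool)
missing[:10] = True
x_obs = np.where(missing, np.nan, x_true)

slope, se, within, between = multiple_imputation(x_obs, y, missing)
print(f"pooled slope={slope:.4f}, pooled SE={se:.4f}")
```

The between-imputation term is what carries the uncertainty about the missing values into the pooled standard error; the price is the distributional assumption for the covariate.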
Best regards,
Vladimir