Re: Missing covariates
From: SMITH_BRIAN_P@Lilly.com
Subject: Re: Missing covariates
Date: Thu, 05 Jul 2001 12:38:52 -0500
I have used Leonid's method many times in my analyses. It is a pragmatic way of dealing with missing covariates. Let me describe his method in a different way, and I think you will see that it is quite usable and makes sense.
Another way to think of this, if you had a linear model, is to set all of the missing weight values to zero and create a dummy variable, miss, which is 1 if the value is missing and 0 otherwise. Then consider the model Cl = a + b*wt + c*miss
When the value is missing you get
Cl = a + c
When the value is not missing you get
Cl = a + b*wt
If you set a + c = a + b*wt and solve for wt, you get wt = c/b. In essence, you are letting your model estimate the average weight of the individuals whose weight is missing, which is c/b.
This is superior to simply imputing the median or mean weight. First, the individuals with missing values may have systematically smaller or larger weights than those without. Second, this method uses up a degree of freedom to estimate c, so you pay a penalty for having missing values.
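A minimal sketch of the linear case, assuming ordinary least squares on noise-free toy data (the numbers and the helper name fit_missing_indicator are made up for illustration; a real analysis would fit this in your modeling software with its own error model). The missing group is deliberately lighter than the observed group, so c/b lands on the missing group's mean weight while mean imputation would use the observed mean.

```python
def fit_missing_indicator(cl, wt, miss):
    """Least-squares fit of Cl = a + b*wt + c*miss, where wt is set to 0 when miss == 1.

    Because wt is 0 on every missing row and miss is 0 on every observed row,
    the sum of squares splits into two independent parts: (a, b) come from a
    simple linear regression on the observed rows, and a + c is the mean Cl
    of the missing rows.
    """
    obs = [(w, y) for w, y, m in zip(wt, cl, miss) if m == 0]
    mis = [y for y, m in zip(cl, miss) if m == 1]
    n = len(obs)
    mean_w = sum(w for w, _ in obs) / n
    mean_y = sum(y for _, y in obs) / n
    sxy = sum((w - mean_w) * (y - mean_y) for w, y in obs)
    sxx = sum((w - mean_w) ** 2 for w, _ in obs)
    b = sxy / sxx
    a = mean_y - b * mean_w
    c = sum(mis) / len(mis) - a
    return a, b, c


# Hypothetical toy data: Cl = 2 + 0.1*wt exactly; the missing group is lighter.
observed_wt = [70.0, 80.0, 90.0, 100.0]
true_missing_wt = [50.0, 55.0, 60.0]               # never shown to the fit
cl = [2.0 + 0.1 * w for w in observed_wt + true_missing_wt]
wt = observed_wt + [0.0] * len(true_missing_wt)    # missing weights set to 0
miss = [0] * len(observed_wt) + [1] * len(true_missing_wt)

a, b, c = fit_missing_indicator(cl, wt, miss)
estimated_missing_wt = c / b                            # about 55, the missing group's mean
mean_imputed_wt = sum(observed_wt) / len(observed_wt)   # 85, what mean imputation would use
print(f"c/b = {estimated_missing_wt:.2f}, observed mean = {mean_imputed_wt:.2f}")
```

With noisy data c/b only approximates the missing group's mean weight, but the contrast with mean imputation is the same.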
Now, Leonid uses a power model in his example. He also uses IF-THEN code, which I seldom use. But in his example, when the value is missing you get
Cl = theta(1)
When the value is not missing you get
Cl = theta(2)*wt**(theta(3))
Set them equal and solve for wt. Thus, your estimate of the average weight of the individuals with missing weights is (theta(1)/theta(2))**(1/theta(3)).
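The closed-form solution above can be written as a small Python function. The theta values below are hypothetical, chosen so the answer (64) is known in advance; the function name is also illustrative. Plugging the result back into the observed-weight branch reproduces the missing-weight branch, which is exactly the condition that was solved.

```python
def implied_missing_weight(theta1, theta2, theta3):
    """Weight at which the observed-weight branch, theta2 * wt**theta3,
    equals the missing-weight branch, theta1."""
    return (theta1 / theta2) ** (1.0 / theta3)


# Hypothetical estimates, constructed so the implied weight is 64.
theta2, theta3 = 0.5, 0.75
theta1 = theta2 * 64.0 ** theta3      # typical Cl of the missing group
wt_missing = implied_missing_weight(theta1, theta2, theta3)

# Check: both branches of the model give the same Cl at this weight.
print(wt_missing, theta2 * wt_missing ** theta3, theta1)
```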
What I would do, which is equivalent to Leonid's model, is fit
Cl = exp(theta(1) + theta(2)*miss + theta(3)*lnwt)
Let lnwt = 0 when the weight is missing, and create a dummy variable as described above. Then theta(2)/theta(3) becomes the estimate of the average lnwt among the individuals with missing weights. The exponential of this quantity is the estimate of the typical weight (a geometric mean, since it is the exponential of an average log weight) for an individual with a missing weight.
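The equivalence with Leonid's model can be checked numerically. Matching the two branches gives thetaL1 = exp(theta(1) + theta(2)), thetaL2 = exp(theta(1)) and thetaL3 = theta(3), where thetaL is hypothetical notation for Leonid's parameters. The sketch below assumes least squares on log Cl with noise-free toy data (an illustration only, not how a population model would be estimated). The missing weights 40 and 90 have geometric mean 60 and arithmetic mean 65, so the result also shows which average exp(theta(2)/theta(3)) recovers.

```python
import math


def fit_log_missing_indicator(cl, wt, miss):
    """Least-squares fit of log(Cl) = t1 + t2*miss + t3*lnwt, with lnwt = 0 when miss == 1.

    As in the linear case, the fit splits: (t1, t3) come from regressing
    log(Cl) on log(wt) over the observed rows, and t1 + t2 is the mean
    log(Cl) of the missing rows.
    """
    obs = [(math.log(w), math.log(y)) for w, y, m in zip(wt, cl, miss) if m == 0]
    mis = [math.log(y) for y, m in zip(cl, miss) if m == 1]
    n = len(obs)
    mx = sum(x for x, _ in obs) / n
    my = sum(y for _, y in obs) / n
    t3 = sum((x - mx) * (y - my) for x, y in obs) / sum((x - mx) ** 2 for x, _ in obs)
    t1 = my - t3 * mx
    t2 = sum(mis) / len(mis) - t1
    return t1, t2, t3


# Hypothetical toy data: Cl = 3 * wt**0.75 exactly.
observed_wt = [60.0, 70.0, 80.0, 90.0]
true_missing_wt = [40.0, 90.0]                      # geometric mean 60, arithmetic mean 65
cl = [3.0 * w ** 0.75 for w in observed_wt + true_missing_wt]
wt = observed_wt + [None] * len(true_missing_wt)    # None marks a missing weight
miss = [0] * len(observed_wt) + [1] * len(true_missing_wt)

t1, t2, t3 = fit_log_missing_indicator(cl, wt, miss)
wt_from_exp_model = math.exp(t2 / t3)               # exp of the estimated mean lnwt

# Map to the power-model parameters and apply the closed-form solution.
theta1, theta2, theta3 = math.exp(t1 + t2), math.exp(t1), t3
wt_from_power_model = (theta1 / theta2) ** (1.0 / theta3)
print(wt_from_exp_model, wt_from_power_model)
```

Both routes give 60 here, the geometric mean of the missing weights rather than their arithmetic mean.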
The same method applies equally to categorical covariates. If you had only 2 classes (like gender) coded as a 0/1 dummy, the same ratio would estimate the proportion of the individuals with missing values who fall in the class coded 1 (say, male), and hence also the proportion who are female.
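A minimal sketch of the two-class case, assuming a male dummy that is set to 0 when sex is missing, the model Cl = a + b*male + c*miss, and noise-free toy data (names and numbers are hypothetical). Because each group (observed female, observed male, missing) gets its own free mean, least squares reduces to the three group means, and c/b is the proportion male among the missing.

```python
def proportion_coded_one_in_missing(cl, male, miss):
    """Fit Cl = a + b*male + c*miss (male set to 0 when miss == 1) and return c/b.

    The model has one free mean per group (observed female: a, observed male:
    a + b, missing: a + c), so least squares reduces to the three group means.
    """
    female = [y for y, s, m in zip(cl, male, miss) if m == 0 and s == 0]
    males = [y for y, s, m in zip(cl, male, miss) if m == 0 and s == 1]
    missing = [y for y, m in zip(cl, miss) if m == 1]
    a = sum(female) / len(female)
    b = sum(males) / len(males) - a
    c = sum(missing) / len(missing) - a
    return c / b


# Hypothetical toy data: Cl = 10 for females and 12 for males, exactly.
# The four individuals with missing sex are, unknown to the fit, 3 males and 1 female.
cl = [10.0, 10.0, 12.0, 12.0] + [12.0, 12.0, 12.0, 10.0]
male = [0, 0, 1, 1] + [0, 0, 0, 0]      # sex dummy set to 0 when missing
miss = [0, 0, 0, 0] + [1, 1, 1, 1]

p_male = proportion_coded_one_in_missing(cl, male, miss)
p_female = 1.0 - p_male
print(p_male, p_female)
```

With noisy data the ratio is not constrained to lie between 0 and 1, so it is an estimate rather than a guaranteed proportion.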
Sincerely,
Brian Smith