Re: Missing Gender (Categorical values)
From: Alan Xiao
Subject: Re: [NMusers] Missing Gender (Categorical values)
Date: Sun, 04 Aug 2002 20:58:07 -0400
Dear Diane,
Sorry for the late reply to your email; I left for vacation 20 minutes after I sent my last email.
About the imputation of missing data: I'm not against Lewis' summary at all. On the contrary, I agree with it.
However, as I said in my last email, my concern is the potential effect of the imputation model (or
algorithm) on the evaluation of the significance of the imputed covariate and of the correlated covariates (those used
in the imputation model, or joint model in your paper) for the parameters in a PK model, and/or on the justification of the
imputation model and the PK model - this is not about the likelihood method itself. Here, by justification of the PK
model, I mean the type of function used in the PK model for the covariate effect (when the covariate is something other than
SEX) rather than the whole PK model. To make this easier to understand, let's take your paper as an example:
1). What if you replaced Equation 4 in your paper with simpler functions, such as WEIGHT as a function of BSA
and HEIGHT, or as a function of AGE, CLCR and SEX? As you know, BSA is usually calculated from HEIGHT and
WEIGHT, while CLCR is calculated from AGE, WEIGHT and SEX. These functions are fully determined, and no
modeling/simulation is needed at all if your BSA data were indeed calculated from HEIGHT and WEIGHT, or CLCR was
indeed calculated from AGE, WEIGHT and SEX, and the HEIGHT and/or SEX data were not missing. From your Table I,
AGE was not missing at all. BSA, CLCR and SEX were also available (1 missing in CLCR and 3 in SEX, as compared to
AGE). Or were BSA and SEX also partly imputed in Table I? In other words, did patients with missing WEIGHT
actually have all other covariate values missing as well, including AGE, BSA, SEX and CLCR? - I don't think the paper
is clear about this. Or were your BSA and CLCR directly measured, so that you did not have a fixed
function for them to simply connect WEIGHT with BSA and HEIGHT, or with CLCR, AGE and SEX? If so, can you tell us
how they were measured? (The note under Table I says that CLCR was calculated from the Cockcroft and Gault formula,
which is a function of AGE, WEIGHT and SEX - why couldn't you simply invert the calculation to get WEIGHT
from CLCR, AGE and SEX?)
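To make the inversion concrete, here is a minimal sketch. It assumes the commonly quoted form of the Cockcroft and Gault formula, which also requires serum creatinine (SCR, in mg/dL); whether SCR was recorded in the dataset is an assumption, and the function names are hypothetical, not taken from the paper.

```python
def cockcroft_gault_clcr(age, weight_kg, scr_mg_dl, female):
    """Creatinine clearance (mL/min), commonly quoted Cockcroft-Gault form."""
    sex_factor = 0.85 if female else 1.0  # usual 15% reduction for females
    return (140.0 - age) * weight_kg / (72.0 * scr_mg_dl) * sex_factor


def weight_from_clcr(clcr, age, scr_mg_dl, female):
    """Algebraic inversion: recover WEIGHT (kg) from CLCR, AGE, SCR and SEX.

    Only meaningful if CLCR was really computed from this formula
    and SCR is available (both are assumptions here).
    """
    sex_factor = 0.85 if female else 1.0
    return clcr * 72.0 * scr_mg_dl / ((140.0 - age) * sex_factor)


if __name__ == "__main__":
    # Round trip for a hypothetical 50-year-old male, 70 kg, SCR 1.0 mg/dL
    clcr = cockcroft_gault_clcr(50.0, 70.0, 1.0, female=False)
    print(clcr, weight_from_clcr(clcr, 50.0, 1.0, female=False))
```

If SEX is also missing, the same CLCR corresponds to two candidate weights (one per sex factor), so the inversion is exact only when SEX is known.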
Actually, whether they were measured or calculated does not change our discussion. The question is, how can you be sure
your joint model is the best imputation model? Did you try other imputation models? If you added a function, for
example WEIGHT**THETA(), to the volume of distribution, or another one such as THETA()*SEX to clearance in the
PK model (assuming they are significant, so that your PK model and imputation model are correlated), and simultaneously fit
the PK model and the imputation model to the data using the likelihood method to control the minimization, would you get the
same results? Or close enough?
2). We talked about SEX previously only because SEX was the missing covariate in the email sent by Atul. If the missing
covariate is continuous, e.g. WEIGHT in your paper, it becomes a little more complicated, because the
potential function could be a power function acting additively or multiplicatively on some parameters of the PK model. I'm
afraid this function will also influence the imputation results. Or, just as a test, how about replacing (WT/70)**0.75 in
Equation 3 with (WT/70)? Would the results be the same? (I am trying to figure out the conditions for a model and
generalize them.)
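To illustrate why the two forms need not give the same results, here is a small sketch comparing the multipliers (WT/70)**0.75 and (WT/70). It assumes the term enters as a multiplicative factor on a PK parameter; the exact form of Equation 3 is not reproduced here, and the weights are hypothetical.

```python
def weight_scale(wt_kg, exponent, ref_kg=70.0):
    """Multiplicative covariate factor (WT/ref)**exponent."""
    return (wt_kg / ref_kg) ** exponent


if __name__ == "__main__":
    # Hypothetical weights; at the 70 kg reference both forms give exactly 1
    for wt in (40.0, 70.0, 100.0):
        allometric = weight_scale(wt, 0.75)
        linear = weight_scale(wt, 1.0)
        print(f"WT={wt:5.1f}  (WT/70)**0.75={allometric:.3f}  (WT/70)={linear:.3f}")
```

The two factors agree only at the reference weight and diverge away from it, so the same imputed WEIGHT implies a different predicted parameter value depending on which function is used.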
3). Back to covariate effects. If the missing covariate is not significant for any parameter of a PK/PD model, then
whatever value you impute does not matter - the imputation is not really important. However, if the missing covariate
(e.g. WEIGHT in your paper) is significant in the PK/PD model, then the type of function used in the PK/PD model to express
the covariate effect will be correlated with the imputation model, as discussed in (1) and (2) above. Furthermore, if an
imputation model-predicting covariate (or "joint model-predicting covariate", such as SEX in your paper, right below
Equation 4) is significant or marginally significant, its significance could be neutralized (I'm not sure that's the right word)
or largely weakened by the inclusion of the imputed covariate (WEIGHT here) in the model (i.e., when both the imputation
model-predicting covariate, e.g. SEX, and the imputed covariate, e.g. WEIGHT, are significant and correlated). If you
have tested that the imputation model-predicting covariate (SEX here) is not significant in a PK/PD model that does
not include the imputed covariate (WEIGHT here), then we might be able to ignore the influence of the imputed
covariate on the identification of the imputation model-predicting covariate (SEX here). When you say that you "did
not have to estimate sex based on imputation", can you explain in a little more detail? Did you mean that you had
tested, or knew from other data, that SEX is not significant (whether or not WEIGHT is significant)? Or that SEX is
not significant based on the PK model after imputation?
4). How strong is this potential influence of the type of imputation model on the type of function relating the imputed
covariate to parameters in a PK model? And how strong is the potential influence of the imputed covariate on the
identification of other significant covariates for parameters in a PK model? I have no idea. This is why I asked
whether anyone has done this before. But I think this should be case dependent.
5). Do I have a more reliable imputation model for SEX? No. I don't think I can develop one without detailed
information about the dataset. Actually, the specific model itself is not the most important thing. It is the methodology used to
develop the model and the interpretation of the model that matter most. After all, science is just science; it can
be questioned and it can be defended.
6). Another minor thing. In my experience, in a combined dataset (from many studies), when the proportion of missing values is
high, the missing pattern is usually not random with respect to the combined dataset - for example, if the covariate is missing
for all subjects in one or more studies, even if the missingness is random within one or more of the other studies. In this case,
simple random imputation may not be appropriate at all. This could easily be overlooked if you did not merge the sub-datasets
yourself. I assume this is not the case in your paper (20% missing) or in Atul's data (70% missing).
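A quick way to check for this is to tabulate the missing fraction per study before choosing an imputation method. The sketch below assumes each record carries a STUDY identifier and that missing covariates are stored as None; the column names and data are hypothetical.

```python
from collections import defaultdict


def missing_fraction_by_study(records, covariate, study_key="STUDY"):
    """Return {study: fraction of records with the covariate missing}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        study = rec[study_key]
        total[study] += 1
        if rec.get(covariate) is None:  # None (or absent key) counts as missing
            missing[study] += 1
    return {study: missing[study] / total[study] for study in total}


if __name__ == "__main__":
    # Hypothetical pooled dataset: SEX was never collected in study 2
    pooled = [
        {"STUDY": 1, "SEX": 0}, {"STUDY": 1, "SEX": 1},
        {"STUDY": 1, "SEX": 1}, {"STUDY": 1, "SEX": 0},
        {"STUDY": 2, "SEX": None}, {"STUDY": 2, "SEX": None},
    ]
    print(missing_fraction_by_study(pooled, "SEX"))
```

In this made-up example one third of SEX values are missing overall, but all of them come from a single study, which is the pattern that simple random imputation would handle poorly.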
I have to admit that I do not have this paper yet:
Nick Holford wrote:
Karlsson M, Jonsson E, Wiltse C, Wade J. Assumption testing in
population pharmacokinetic models: illustrated with an analysis of moxonidine data from congestive heart failure
patients. J Pharmacokinet Biopharm 1998;26(2):207-46.
If some or all of the above questions/concerns have already been addressed in this paper, please simply skip them and flag them as such.
Thanks.
Best regards,
Alan.