RE: likelihood model for missing SEX data

From: Jakob Ribbing Date: July 11, 2006 technical Source: cognigencorp.com
From: Jakob Ribbing jakob.ribbing@farmbio.uu.se Subject: RE: [NMusers] likelihood model for missing SEX data Date: Tue, 11 Jul 2006 17:13:11 +0200 Dear all, Nick's example was enough to convince me that estimating mixture and covariate effect simultaneously does not cause bias as I thought (and I now understand why). In most cases, I would use his approach rather than the one I suggested. The latter, to impute covariates outside of NONMEM may still be useful, basically with the same arguments for and against as for using the GAM in Xpose instead of stepwise covariate modelling within NONMEM, but with the additional benefit that you do not have to worry about modelling the relation between covariates (or the NONMEM-limit in #mixtures). I went ahead and did a quick try, but implemented so that we do not have to obtain from NONMEM the individual probability of a certain SEX, and instead use the Empirical-Bayes Estimates (EBE) of CL to estimate the probability of SEX. This approach would allow imputation of several missing covariates using covariates and EBE:s, but in this example only use the EBE of CL: 1. Change the covariate model for SEX as if it was continuous between 0-1 and impute missing values with the mean. Fit the model in NONMEM and output the EBE of CL (CL) in a table file 2. Re-impute the missing SEX so that the relation between CL and SEX is as strong in the "missing subjects" as in the non-missing. Since this step involves an element of randomness (sampling from the residuals that are obtained from fitting CL to SEX on the subjects where the covariate is not missing), this could be repeated several times (multiple imputation), each time creating a dataset with imputed SEX. 3. Fit a NONMEM model to each dataset created in step 2. These have a more accurate imputation of SEX (than the nave imputation in step 1). If the effect of SEX is large and the information on CL is sparse, step 2-3 may have to be re-iterated, using the updated CL from this step. Whether that is necessary or not will be seen already in step 1: The effect of SEX will not change much since in this case we are not using any independent variables (covariates) in the imputation. The information on CL can be measured as the shrinkage in the eta on CL. 4. If several datasets have been imputed in step 2 and the parameters differ they are averaged (simple arithmetic mean). The variability between datasets is only the variability that is due to the imputation and the SE:s of the SEX-effect and IIV-CL is difficult to obtain (whereas any of Nick's models could be bootstrapped). In case anyone is interested, Ill go through the different steps to explain what I did in practice for Nick's dataset-example (apologies for any typos and miss-formattings but I am in a hurry): STEP 1: For a real example, I would impute with the mean SEX in the included patient population. Since this was unknown, I used the mean in the subjects with non-missing SEX (same as Nick). Dropping a few columns from Nick's/Anthe's datafile (that has missing sex), this model file (run0.mod) could be used: ######################################################### $PROB RUN# 115 (PHENOBARBITAL POPULATION PK MODEL) $INPUT ID TIME AMT WT APGR DV EVID SEX $DATA cp_missingsex.csv IGNORE=I $SUBROUTINES ADVAN1 $PK ISEX = SEX IF (ISEX.LT.0) ISEX=0.5517 ;Mean in non-missing DUMMY = THETA(1);NOT ESTIMATED IN NONMEM ANYMORE TVCL = THETA(2)*(1+THETA(4)*ISEX) CL = TVCL*EXP(ETA(1)) TVV = THETA(3) V = TVV*EXP(ETA(2)) K = CL/V S1 = V $ERROR IPRED = F W = 1 Y = IPRED + THETA(5)*EPS(1) IRES = DV - IPRED IWRES = IRES/W $THETA (0, 0.424,1) FIX ; TH1: NOT ESTIMATED IN NONMEM (0, 0.005) ; TH2: POPCL FOR MALES (male set as reference) (0, 1.5) ; POPV (-1,0.4) ; CLSEX (Female Fractional change in CL) (0, 2.5) ; SD $OMEGA 0.16 ; PPV_CL 0.16 ; PPV_V $SIGMA 1 FIX ; EPS1 $EST MAXEVAL=9999 PRINT=20 METHOD=COND $COV PRINT=E ;-------Table as the input-datafile but with columns for EBE:s added:------- $TABLE ID TIME AMT WT APGR DV EVID SEX CL ONEHEADER NOPRINT NOAPPEND FILE="imptab1" ######################################################### There is a moderate (29%) shrinkage in the eta on CL which may be a problem since the empirical-Bayes estimates of this parameter are used for the imputation. Shrinkage (1-std(ETA1i)/sqrt(OMEGA1)) is easily calculated using the PsN-tool[1] execute when running NONMEM: $ execute -shrinkage run0.mod Compared to estimating parameters on only non-missing subjects, the imprecise estimates of IIV on CL and V change substantially (-9% and 23% on std-scale, respectively) to the better (closer to the values obtained when SEX is available in all 59 subjects). The typical value of CL for male and female increase with 5% and 4%, respectively. This is probably due to the fact that the missing population has (by random chance) a larger difference between sex and a higher fraction of females (than in the non-missing). STEP 2: Run this Splus function (code at the end of this e-mail) to get ten datasets with imputation of sex: impute.sex(num.imp=10) STEP 3: Create run1.mod run10.mod as run0.mod but using the ten new datafiles (ebtab1-10, respectively). Estimate the ten models in NONMEM using PsN: $ execute *.mod -shrinkage -threads=5 -dir MultipleImputed STEP 4: Since I used PsN, the parameter estimates from the ten models are neatly tabled in: "MultipleImputed/raw_results.csv" and can easily be averaged. I used a simple arithmetic mean of all parameters. Some parameters are hardly affected by the multiple imputation, where as others are more variable. Looking at the SE due to imputation, the only parameter which changes more than one SE from run0.mod (mean imputation) is IIV on CL which is slightly decreased. This is because the imputed SEX is better than the mean-imputed, which leaves less variability in CL as random, but the difference was small in this case (IIV 38.2% -> 37.6%). The effect of SEX on CL is highly variable between imputations (mean 14%, SE 4%) but the mean is in the same ball park as when estimating the effect on only the non-missing (11.6%). At this stage one could consider to reiterate steps 2 and 3, due to the problem of shrinkage. In this case, since the estimated effect of SEX is small and shrinkage did not decrease in the step3-models (average 29%, as for run0.mod) I stop here. In conclusion, compared to Nick's solutions this could be a powerful approach if many important covariates are missing and the relation between covariates is difficult to model. I am sure that someone else could come up with better code than the one I used for the multiple imputation in S-plus (below). Have a nice summer! Jakob "impute.sex" <- function (nm.tab="imptab1", num.imp=1, new.datafile="ebtab", cat.imp=F, condMean=F, shrink.imputed=F, seed.imputation=1) { #If many covariates/EBE:s are used for imputation, #use: shrink.imputed=T #If no EBE:s are used in the imputation, #use: condMean=T and num.imp=1 library("Hmisc") set.seed(i=seed.imputation) coeff.bias <- NULL org.data <- read.table(nm.tab, skip=1, header=T) #Set missing/imputed values to NA before re-imputing org.data$SEX[org.data$SEX != 0 & org.data$SEX != 1] <- NA data <- org.data data <- data[!duplicated(data$ID),]#Assuming no IOV if(!any(names(data)=="LNCL")) data$LNCL <- log(data$CL) missing <- is.na(data$SEX) CL <- data$CL LNCL <- data$LNCL SEX <- as.numeric(data$SEX) NASEX <- SEX #----Impute one dataset at a time (if multiple imputation)------------ for (imp.no in 1:num.imp) { if (condMean) {#Will produce biased result if DV:s are used!!!! print("WARNING: Using a mean that is conditional on the DV") print("without a random component will bias the importance") print("of imputed covariates!!!") xtrans <- transcan(~LNCL+SEX, imputed=T, shrink=shrink.imputed, data=data) data$ISEX <- impute(xtrans, SEX, data=data) } else {#Better to use the aregImpute(). -Approx Bayesian bootstrap xtrans <- transcan(~LNCL+SEX, imputed=T, shrink=shrink.imputed, data=data, n.impute=1) data$ISEX <- data$SEX data$ISEX <- round(impute(xtrans, SEX, data=data, imputation=1), digits=4) data$ISEX[missing & data$ISEX==1] <- 0.9999 data$ISEX[missing & data$ISEX==0] <- 0.0001 } #For imputation to categorical instead of continuous SEX: if (cat.imp) { data$ISEX <- ifelse(unclass(data$ISEX)[1:n.subj]<0.5,0,1) } if (imp.no<=10) { #Produce diagnostics print(wo.imputation <- lm(data$LNCL~NASEX, na.action=na.exclude)) print(with.imputation <- lm(data$LNCL~data$ISEX)) print(plot(jitter(as.numeric(data$ISEX[data$ISEX==0 | data$ISEX==1]), factor=0.25), data$LNCL[data$ISEX==0 | data$ISEX==1], xlab="Observed(o) or imputed(I) SEX", ylab="Empirical-Bayes Estimate of LN(CL)")) text(as.numeric(data$ISEX[data$ISEX!=0 & data$ISEX!=1]), data$LNCL[data$ISEX!=0 & data$ISEX!=1], labels="I", col=4, adj=0.5) print(plot(as.numeric(data$ISEX),data$LNCL, xlab="Observed (0/1) or imputed SEX", ylab="Empirical Bayes Estimate of LN(CL)")) } coeff.bias <- c(coeff.bias, 100*(exp(with.imputation$coefficients[2]) - exp(wo.imputation$coefficients[2]))/ (exp(wo.imputation$coefficients[2])-1)) #----Write imputed data to file---------------- imp.file <- paste(new.datafile, imp.no, sep="") imp.dataset <- merge(org.data, data[,is.element(el=names(data), set=c("ID", "ISEX"))], all=T, by=c("ID")) imp.dataset$SEX <- imp.dataset$ISEX write.table(imp.dataset[,names(imp.dataset)!="ISEX"], file = imp.file, sep = ",", dimnames.write="colnames") print(paste("Imputed datafile created:", imp.file)) }#----End of multiple-imputation loop--------- print (coeff.bias) if (num.imp >=10) {#Just a check, don't trust too much... print ("The estimated bias due to random imputation (%):") print (signif(mean(coeff.bias),digits=2)) } invisible() } 1. Lindbom L, Pihlgren P, Jonsson EN. PsN-Toolkit--a collection of computer intensive statistical methods for non-linear mixed effect modeling using NONMEM. Comput Methods Programs Biomed. Sep 2005;79(3):241-257. _______________________________________________________
Jul 04, 2006 Nick Holford Re: likelihood model for missing SEX data
Jul 05, 2006 Jakob Ribbing RE: likelihood model for missing SEX data
Jul 05, 2006 Nick Holford Re: likelihood model for missing SEX data
Jul 07, 2006 Anthe Zandvliet Re: likelihood model for missing SEX data
Jul 07, 2006 Nick Holford Re: likelihood model for missing SEX data
Jul 09, 2006 Stephen Duffull Re: likelihood model for missing SEX data
Jul 10, 2006 Anthe Zandvliet Re: likelihood model for missing SEX data
Jul 11, 2006 Jakob Ribbing RE: likelihood model for missing SEX data