likelihood model for missing SEX data

8 messages 4 people Latest: Jul 11, 2006

Re: likelihood model for missing SEX data

From: Nick Holford Date: July 04, 2006 technical
From: Nick Holford n.holford@auckland.ac.nz Subject: Re: [NMusers] likelihood model for missing SEX data Date: den 4 juli 2006 08:09 Anthe, A simple way to predict the missing SEX values in your model is to use a mixture model (see below). This avoids the need to estimate from continuous and non-continuous data. You need to code SEX with a negative number in the data when it is missing. If you want to use the continuous and non-continuous joint likelihood method then I recommend you dont try to code -2LL in $ERROR. In my experience this is less robust and more biased than using a user defined CCONTR and CONTR (see the contodd example in ftp://ftp.globomaxnm.com/Public/nonmem/continuous&categorical/). Andy Hooker recently pointed out to me that this may be because the $ERROR method does not recognize the covariance among observations that is included by NONMEM when it computes -2LL. Best wishes, Nick ;Model Desc: base model run 1 additive residual error ;Project Name: example1 ;Project ID: GM00-001 $PROB RUN# 115 (PHENOBARBITAL POPULATION PK MODEL) $DATA 001SEX.CSV IGNORE=C ; only CONC DV values $INPUT C ID TIME AMT WT APGR DV=Z EVID MDV SEX FLAG $EST MAXEVAL=9999 PRINT=20 NOABORT METHOD=COND LAPLACE MSFO=mixedsex.MSF ;$COV $THETA (0,0.4,1) ; PFEM (0, 0.005) ; POP_CL (0, 1.5) ; POP_V (0,1.4) ; FFEMCL (0,2.5) ; SD $OMEGA 0.16 ; PPV_CL 0.16 ; PPV_V $SIGMA 1 FIX ; EPS1 $SUBROUTINES ADVAN1 $MIX NSPOP=2 P(1)=THETA(1) P(2)=1-THETA(1) $PK IF (SEX.GE.0) THEN ; not missing ISEX=SEX ELSE IF (MIXNUM.EQ.1) THEN ; female ISEX=0 ELSE ISEX=1 ENDIF ENDIF TVCL=THETA(2)*THETA(4)**ISEX CL=TVCL*EXP(ETA(1)) TVV=THETA(3) V=TVV*EXP(ETA(2)) K=CL/V S1=V TAD=TIME ; for a single dose only SID=ID $ERROR IPRED=F SD=THETA(5) Y=IPRED + SD*ERR(1) $TABLE ID SID TIME TAD IPRED CL ETA(1) SEX FLAG V ETA(2) WT APGR NOPRINT ONEHEADER FILE=115.TAB pharmacology/staff/nholford/

RE: likelihood model for missing SEX data

From: Jakob Ribbing Date: July 05, 2006 technical
From: Jakob Ribbing jakob.ribbing@farmbio.uu.se Subject: RE: [NMusers] likelihood model for missing SEX data Date: Wed, 5 Jul 2006 11:11:39 +0200 Nick, Anthe, I could be wrong, but would not the suggested approach (estimating the probability of SEX=0 at the same time as estimating the effect of SEX on CL) cause an upward bias in the magnitude of the estimated-SEX effect? The individual probabilities [1] would have to be distorted so that the subjects with missing SEX do not change the estimated SEX effect (when the imputation is based only on the DV). More importantly, if there are other covariates available that could predict SEX (e.g. WT), these should be used. The latter could change (improve) the estimated SEX effect. Unless the effect of SEX is substantial, the imputation could be based only on the other covariates since DV does not add much information. Imputing best-guess SEX based only on other covariates is done without the random distortion and before estimating any covariate model in NONMEM. If best-guess SEX is not predicted well by the other covariates this approach will lead to an estimated sex-effect which is biased low (just as when imputing with the typical value), but I guess the expected SEX could be used if SEX is treated as a continuous covariate and using a linear covariate model? Best regards, Jakob 1. Carlsson, Kristin C., Radojka M. Savic, Andrew Hooker, Mats O. Karlsson. Mixture models in NONMEM - how to find the individual probability of belonging to a specific mixture, and why this can be useful information. PAGE 15 (2006) Abstr 956 [www.page-meeting.org/?abstract=956]

Re: likelihood model for missing SEX data

From: Nick Holford Date: July 05, 2006 technical
From: Nick Holford n.holford@auckland.ac.nz Subject: Re: [NMusers] likelihood model for missing SEX data Date: Thu, 06 Jul 2006 00:33:35 +1200 Jakob, I agree with your comments (more or less). The simple mixture model approach does however work quite well (N=1 example) provided the mixing fraction is FIXED to the a priori best guess obtained from the non-missing SEX ratio. More complex predictions of SEX e.g. based on WT are of course very sensible but for the purposes of method illustration the model I suggested was kept simple. I will shortly post a further comment to the list with an illustration of how to use the joint likelihood method that Anthe asked about. This can be combined with the mixture model method as well. I have not yet figured out how to do this with the individual mixing probability approach that you mention. You can show us how to do this when I have posted the codes (and data) for the other methods :-) Nick

Re: likelihood model for missing SEX data

From: Anthe Zandvliet Date: July 07, 2006 technical
From: Anthe Zandvliet Apaza@slz.nl Subject: Re: [NMusers] likelihood model for missing SEX data Date: Fri, 07 Jul 2006 17:23:52 +0200 Jakob, You might be right about the upward bias. I am planning to perform simulations to evaluate this. I also agree that any other covariates should be taken into account if possible. Yet I wonder what method would be best if no other covariates are available and if the effect of SEX on CL is substantial. I am interested in your suggestion to use individual probabilities and a linear covariate model. Kristin, Is your script to calculate IOFVs available online? Best regards, Anthe

Re: likelihood model for missing SEX data

From: Nick Holford Date: July 07, 2006 technical
From: Nick Holford n.holford@auckland.ac.nz Subject: Re: [NMusers] likelihood model for missing SEX data Date: Sat, 08 Jul 2006 04:58:57 +1200 Anthe, Jakob, Kristin, I've put some code and data sets showing some approaches to estimating missing SEX using mixture models +/- joint likelihood estimation. http://www.health.auckland.ac.nz/pharmacology/staff/nholford/pkpd/missingsex/ If Jakob or Kristin has time would you be able to take one of these examples and show how to include individual mixing probabilities? Nick

Re: likelihood model for missing SEX data

From: Stephen Duffull Date: July 09, 2006 technical
From: "Stephen Duffull" stephen.duffull@stonebow.otago.ac.nz Subject: Re: [NMusers] likelihood model for missing SEX data Date: 9 July 2006 23:14 Hi Kristin wrote: > possible. Yet I wonder what method would be best if no other > covariates > are available and if the effect of SEX on CL is substantial. I am > interested in your suggestion to use individual probabilities and a > linear covariate model. I realise the thread is really relevant to any binary covariate, but biologically why would SEX be expected to be an important covariate? Once size has been accounted for then females are essentially in a pharmacokinetic sense smaller males. Reviews have found only minimal differences between males and females once body size (and more specifically body composition) has been taken into account. Steve ========================== Professor Stephen Duffull Chair of Clinical Pharmacy School of Pharmacy University of Otago PO Box 913 Dunedin New Zealand E: stephen.duffull@stonebow.otago.ac.nz P: +64 3 479 5044 F: +64 3 479 7034

Re: likelihood model for missing SEX data

From: Anthe Zandvliet Date: July 10, 2006 technical
From: Anthe Zandvliet Apaza@slz.nl Subject: Re: [NMusers] likelihood model for missing SEX data Date: Mon, 10 Jul 2006 10:07:42 +0200 Steve, I agree. SEX and CL were just used as an example. I am planning to perform an extensive covariate analysis with a substantial number of missing values, but I am currently still working on the analysis plan.. Best regards, Anthe

RE: likelihood model for missing SEX data

From: Jakob Ribbing Date: July 11, 2006 technical
From: Jakob Ribbing jakob.ribbing@farmbio.uu.se Subject: RE: [NMusers] likelihood model for missing SEX data Date: Tue, 11 Jul 2006 17:13:11 +0200 Dear all, Nick's example was enough to convince me that estimating mixture and covariate effect simultaneously does not cause bias as I thought (and I now understand why). In most cases, I would use his approach rather than the one I suggested. The latter, to impute covariates outside of NONMEM may still be useful, basically with the same arguments for and against as for using the GAM in Xpose instead of stepwise covariate modelling within NONMEM, but with the additional benefit that you do not have to worry about modelling the relation between covariates (or the NONMEM-limit in #mixtures). I went ahead and did a quick try, but implemented so that we do not have to obtain from NONMEM the individual probability of a certain SEX, and instead use the Empirical-Bayes Estimates (EBE) of CL to estimate the probability of SEX. This approach would allow imputation of several missing covariates using covariates and EBE:s, but in this example only use the EBE of CL: 1. Change the covariate model for SEX as if it was continuous between 0-1 and impute missing values with the mean. Fit the model in NONMEM and output the EBE of CL (CL) in a table file 2. Re-impute the missing SEX so that the relation between CL and SEX is as strong in the "missing subjects" as in the non-missing. Since this step involves an element of randomness (sampling from the residuals that are obtained from fitting CL to SEX on the subjects where the covariate is not missing), this could be repeated several times (multiple imputation), each time creating a dataset with imputed SEX. 3. Fit a NONMEM model to each dataset created in step 2. These have a more accurate imputation of SEX (than the nave imputation in step 1). If the effect of SEX is large and the information on CL is sparse, step 2-3 may have to be re-iterated, using the updated CL from this step. Whether that is necessary or not will be seen already in step 1: The effect of SEX will not change much since in this case we are not using any independent variables (covariates) in the imputation. The information on CL can be measured as the shrinkage in the eta on CL. 4. If several datasets have been imputed in step 2 and the parameters differ they are averaged (simple arithmetic mean). The variability between datasets is only the variability that is due to the imputation and the SE:s of the SEX-effect and IIV-CL is difficult to obtain (whereas any of Nick's models could be bootstrapped). In case anyone is interested, Ill go through the different steps to explain what I did in practice for Nick's dataset-example (apologies for any typos and miss-formattings but I am in a hurry): STEP 1: For a real example, I would impute with the mean SEX in the included patient population. Since this was unknown, I used the mean in the subjects with non-missing SEX (same as Nick). Dropping a few columns from Nick's/Anthe's datafile (that has missing sex), this model file (run0.mod) could be used: ######################################################### $PROB RUN# 115 (PHENOBARBITAL POPULATION PK MODEL) $INPUT ID TIME AMT WT APGR DV EVID SEX $DATA cp_missingsex.csv IGNORE=I $SUBROUTINES ADVAN1 $PK ISEX = SEX IF (ISEX.LT.0) ISEX=0.5517 ;Mean in non-missing DUMMY = THETA(1);NOT ESTIMATED IN NONMEM ANYMORE TVCL = THETA(2)*(1+THETA(4)*ISEX) CL = TVCL*EXP(ETA(1)) TVV = THETA(3) V = TVV*EXP(ETA(2)) K = CL/V S1 = V $ERROR IPRED = F W = 1 Y = IPRED + THETA(5)*EPS(1) IRES = DV - IPRED IWRES = IRES/W $THETA (0, 0.424,1) FIX ; TH1: NOT ESTIMATED IN NONMEM (0, 0.005) ; TH2: POPCL FOR MALES (male set as reference) (0, 1.5) ; POPV (-1,0.4) ; CLSEX (Female Fractional change in CL) (0, 2.5) ; SD $OMEGA 0.16 ; PPV_CL 0.16 ; PPV_V $SIGMA 1 FIX ; EPS1 $EST MAXEVAL=9999 PRINT=20 METHOD=COND $COV PRINT=E ;-------Table as the input-datafile but with columns for EBE:s added:------- $TABLE ID TIME AMT WT APGR DV EVID SEX CL ONEHEADER NOPRINT NOAPPEND FILE="imptab1" ######################################################### There is a moderate (29%) shrinkage in the eta on CL which may be a problem since the empirical-Bayes estimates of this parameter are used for the imputation. Shrinkage (1-std(ETA1i)/sqrt(OMEGA1)) is easily calculated using the PsN-tool[1] execute when running NONMEM: $ execute -shrinkage run0.mod Compared to estimating parameters on only non-missing subjects, the imprecise estimates of IIV on CL and V change substantially (-9% and 23% on std-scale, respectively) to the better (closer to the values obtained when SEX is available in all 59 subjects). The typical value of CL for male and female increase with 5% and 4%, respectively. This is probably due to the fact that the missing population has (by random chance) a larger difference between sex and a higher fraction of females (than in the non-missing). STEP 2: Run this Splus function (code at the end of this e-mail) to get ten datasets with imputation of sex: impute.sex(num.imp=10) STEP 3: Create run1.mod run10.mod as run0.mod but using the ten new datafiles (ebtab1-10, respectively). Estimate the ten models in NONMEM using PsN: $ execute *.mod -shrinkage -threads=5 -dir MultipleImputed STEP 4: Since I used PsN, the parameter estimates from the ten models are neatly tabled in: "MultipleImputed/raw_results.csv" and can easily be averaged. I used a simple arithmetic mean of all parameters. Some parameters are hardly affected by the multiple imputation, where as others are more variable. Looking at the SE due to imputation, the only parameter which changes more than one SE from run0.mod (mean imputation) is IIV on CL which is slightly decreased. This is because the imputed SEX is better than the mean-imputed, which leaves less variability in CL as random, but the difference was small in this case (IIV 38.2% -> 37.6%). The effect of SEX on CL is highly variable between imputations (mean 14%, SE 4%) but the mean is in the same ball park as when estimating the effect on only the non-missing (11.6%). At this stage one could consider to reiterate steps 2 and 3, due to the problem of shrinkage. In this case, since the estimated effect of SEX is small and shrinkage did not decrease in the step3-models (average 29%, as for run0.mod) I stop here. In conclusion, compared to Nick's solutions this could be a powerful approach if many important covariates are missing and the relation between covariates is difficult to model. I am sure that someone else could come up with better code than the one I used for the multiple imputation in S-plus (below). Have a nice summer! Jakob "impute.sex" <- function (nm.tab="imptab1", num.imp=1, new.datafile="ebtab", cat.imp=F, condMean=F, shrink.imputed=F, seed.imputation=1) { #If many covariates/EBE:s are used for imputation, #use: shrink.imputed=T #If no EBE:s are used in the imputation, #use: condMean=T and num.imp=1 library("Hmisc") set.seed(i=seed.imputation) coeff.bias <- NULL org.data <- read.table(nm.tab, skip=1, header=T) #Set missing/imputed values to NA before re-imputing org.data$SEX[org.data$SEX != 0 & org.data$SEX != 1] <- NA data <- org.data data <- data[!duplicated(data$ID),]#Assuming no IOV if(!any(names(data)=="LNCL")) data$LNCL <- log(data$CL) missing <- is.na(data$SEX) CL <- data$CL LNCL <- data$LNCL SEX <- as.numeric(data$SEX) NASEX <- SEX #----Impute one dataset at a time (if multiple imputation)------------ for (imp.no in 1:num.imp) { if (condMean) {#Will produce biased result if DV:s are used!!!! print("WARNING: Using a mean that is conditional on the DV") print("without a random component will bias the importance") print("of imputed covariates!!!") xtrans <- transcan(~LNCL+SEX, imputed=T, shrink=shrink.imputed, data=data) data$ISEX <- impute(xtrans, SEX, data=data) } else {#Better to use the aregImpute(). -Approx Bayesian bootstrap xtrans <- transcan(~LNCL+SEX, imputed=T, shrink=shrink.imputed, data=data, n.impute=1) data$ISEX <- data$SEX data$ISEX <- round(impute(xtrans, SEX, data=data, imputation=1), digits=4) data$ISEX[missing & data$ISEX==1] <- 0.9999 data$ISEX[missing & data$ISEX==0] <- 0.0001 } #For imputation to categorical instead of continuous SEX: if (cat.imp) { data$ISEX <- ifelse(unclass(data$ISEX)[1:n.subj]<0.5,0,1) } if (imp.no<=10) { #Produce diagnostics print(wo.imputation <- lm(data$LNCL~NASEX, na.action=na.exclude)) print(with.imputation <- lm(data$LNCL~data$ISEX)) print(plot(jitter(as.numeric(data$ISEX[data$ISEX==0 | data$ISEX==1]), factor=0.25), data$LNCL[data$ISEX==0 | data$ISEX==1], xlab="Observed(o) or imputed(I) SEX", ylab="Empirical-Bayes Estimate of LN(CL)")) text(as.numeric(data$ISEX[data$ISEX!=0 & data$ISEX!=1]), data$LNCL[data$ISEX!=0 & data$ISEX!=1], labels="I", col=4, adj=0.5) print(plot(as.numeric(data$ISEX),data$LNCL, xlab="Observed (0/1) or imputed SEX", ylab="Empirical Bayes Estimate of LN(CL)")) } coeff.bias <- c(coeff.bias, 100*(exp(with.imputation$coefficients[2]) - exp(wo.imputation$coefficients[2]))/ (exp(wo.imputation$coefficients[2])-1)) #----Write imputed data to file---------------- imp.file <- paste(new.datafile, imp.no, sep="") imp.dataset <- merge(org.data, data[,is.element(el=names(data), set=c("ID", "ISEX"))], all=T, by=c("ID")) imp.dataset$SEX <- imp.dataset$ISEX write.table(imp.dataset[,names(imp.dataset)!="ISEX"], file = imp.file, sep = ",", dimnames.write="colnames") print(paste("Imputed datafile created:", imp.file)) }#----End of multiple-imputation loop--------- print (coeff.bias) if (num.imp >=10) {#Just a check, don't trust too much... print ("The estimated bias due to random imputation (%):") print (signif(mean(coeff.bias),digits=2)) } invisible() } 1. Lindbom L, Pihlgren P, Jonsson EN. PsN-Toolkit--a collection of computer intensive statistical methods for non-linear mixed effect modeling using NONMEM. Comput Methods Programs Biomed. Sep 2005;79(3):241-257. _______________________________________________________