From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: den 4 juli 2006 08:09
Anthe,
A simple way to predict the missing SEX values in your model is to use a mixture model (see below).
This avoids the need to estimate from continuous and non-continuous data. You need to code
SEX with a negative number in the data when it is missing.
If you want to use the continuous and non-continuous joint likelihood method then I recommend
you dont try to code -2LL in $ERROR. In my experience this is less robust and more biased than
using a user defined CCONTR and CONTR (see the contodd example in
ftp://ftp.globomaxnm.com/Public/nonmem/continuous&categorical/).
Andy Hooker recently pointed out to me that this may be because the $ERROR method does not
recognize the covariance among observations that is included by NONMEM when it computes -2LL.
Best wishes,
Nick
;Model Desc: base model run 1 additive residual error
;Project Name: example1
;Project ID: GM00-001
$PROB RUN# 115 (PHENOBARBITAL POPULATION PK MODEL)
$DATA 001SEX.CSV IGNORE=C ; only CONC DV values
$INPUT C ID TIME AMT WT APGR DV=Z EVID MDV SEX FLAG
$EST MAXEVAL=9999 PRINT=20 NOABORT
METHOD=COND LAPLACE
MSFO=mixedsex.MSF
;$COV
$THETA
(0,0.4,1) ; PFEM
(0, 0.005) ; POP_CL
(0, 1.5) ; POP_V
(0,1.4) ; FFEMCL
(0,2.5) ; SD
$OMEGA
0.16 ; PPV_CL
0.16 ; PPV_V
$SIGMA
1 FIX ; EPS1
$SUBROUTINES ADVAN1
$MIX
NSPOP=2
P(1)=THETA(1)
P(2)=1-THETA(1)
$PK
IF (SEX.GE.0) THEN ; not missing
ISEX=SEX
ELSE
IF (MIXNUM.EQ.1) THEN ; female
ISEX=0
ELSE
ISEX=1
ENDIF
ENDIF
TVCL=THETA(2)*THETA(4)**ISEX
CL=TVCL*EXP(ETA(1))
TVV=THETA(3)
V=TVV*EXP(ETA(2))
K=CL/V
S1=V
TAD=TIME ; for a single dose only
SID=ID
$ERROR
IPRED=F
SD=THETA(5)
Y=IPRED + SD*ERR(1)
$TABLE ID SID TIME TAD IPRED CL ETA(1) SEX FLAG
V ETA(2) WT APGR NOPRINT ONEHEADER FILE=115.TAB
pharmacology/staff/nholford/
likelihood model for missing SEX data
8 messages
4 people
Latest: Jul 11, 2006
From: Jakob Ribbing jakob.ribbing@farmbio.uu.se
Subject: RE: [NMusers] likelihood model for missing SEX data
Date: Wed, 5 Jul 2006 11:11:39 +0200
Nick, Anthe,
I could be wrong, but would not the suggested approach (estimating the probability of SEX=0 at
the same time as estimating the effect of SEX on CL) cause an upward bias in the magnitude
of the estimated-SEX effect?
The individual probabilities [1] would have to be distorted so that the subjects with missing
SEX do not change the estimated SEX effect (when the imputation is based only on the DV). More
importantly, if there are other covariates available that could predict SEX (e.g. WT), these
should be used. The latter could change (improve) the estimated SEX effect. Unless the effect
of SEX is substantial, the imputation could be based only on the other covariates since DV does
not add much information. Imputing best-guess SEX based only on other covariates is done
without the random distortion and before estimating any covariate model in NONMEM. If best-guess
SEX is not predicted well by the other covariates this approach will lead to an estimated
sex-effect which is biased low (just as when imputing with the typical value), but I guess
the expected SEX could be used if SEX is treated as a continuous covariate and using a linear
covariate model?
Best regards,
Jakob
1. Carlsson, Kristin C., Radojka M. Savic, Andrew Hooker, Mats O. Karlsson.
Mixture models in NONMEM - how to find the individual probability of belonging to a
specific mixture, and why this can be useful information. PAGE 15 (2006) Abstr 956
[www.page-meeting.org/?abstract=956]
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: Thu, 06 Jul 2006 00:33:35 +1200
Jakob,
I agree with your comments (more or less). The simple mixture model approach does however work
quite well (N=1 example) provided the mixing fraction is FIXED to the a priori best guess
obtained from the non-missing SEX ratio.
More complex predictions of SEX e.g. based on WT are of course very sensible but for the purposes
of method illustration the model I suggested was kept simple.
I will shortly post a further comment to the list with an illustration of how to use the joint
likelihood method that Anthe asked about. This can be combined with the mixture model method as well.
I have not yet figured out how to do this with the individual mixing probability approach that
you mention. You can show us how to do this when I have posted the codes (and data) for the other methods :-)
Nick
From: Anthe Zandvliet Apaza@slz.nl
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: Fri, 07 Jul 2006 17:23:52 +0200
Jakob,
You might be right about the upward bias. I am planning to perform
simulations to evaluate this.
I also agree that any other covariates should be taken into account if
possible. Yet I wonder what method would be best if no other covariates
are available and if the effect of SEX on CL is substantial. I am
interested in your suggestion to use individual probabilities and a
linear covariate model.
Kristin,
Is your script to calculate IOFVs available online?
Best regards,
Anthe
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: Sat, 08 Jul 2006 04:58:57 +1200
Anthe, Jakob, Kristin,
I've put some code and data sets showing some approaches to estimating missing SEX
using mixture models +/- joint likelihood estimation.
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/pkpd/missingsex/
If Jakob or Kristin has time would you be able to take one of these examples and show
how to include individual mixing probabilities?
Nick
From: "Stephen Duffull" stephen.duffull@stonebow.otago.ac.nz
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: 9 July 2006 23:14
Hi
Kristin wrote:
> possible. Yet I wonder what method would be best if no other
> covariates
> are available and if the effect of SEX on CL is substantial. I am
> interested in your suggestion to use individual probabilities and a
> linear covariate model.
I realise the thread is really relevant to any binary covariate, but
biologically why would SEX be expected to be an important covariate?
Once
size has been accounted for then females are essentially in a
pharmacokinetic sense smaller males. Reviews have found only minimal
differences between males and females once body size (and more
specifically
body composition) has been taken into account.
Steve
==========================
Professor Stephen Duffull
Chair of Clinical Pharmacy
School of Pharmacy
University of Otago
PO Box 913 Dunedin
New Zealand
E: stephen.duffull@stonebow.otago.ac.nz
P: +64 3 479 5044
F: +64 3 479 7034
From: Anthe Zandvliet Apaza@slz.nl
Subject: Re: [NMusers] likelihood model for missing SEX data
Date: Mon, 10 Jul 2006 10:07:42 +0200
Steve,
I agree. SEX and CL were just used as an example. I am planning to
perform an extensive covariate analysis with a substantial number of
missing values, but I am currently still working on the analysis plan..
Best regards,
Anthe
From: Jakob Ribbing jakob.ribbing@farmbio.uu.se
Subject: RE: [NMusers] likelihood model for missing SEX data
Date: Tue, 11 Jul 2006 17:13:11 +0200
Dear all,
Nick's example was enough to convince me that estimating mixture and
covariate effect simultaneously does not cause bias as I thought (and I now
understand why). In most cases, I would use his approach rather than the one
I suggested.
The latter, to impute covariates outside of NONMEM may still be useful,
basically with the same arguments for and against as for using the GAM in
Xpose instead of stepwise covariate modelling within NONMEM, but with the
additional benefit that you do not have to worry about modelling the
relation between covariates (or the NONMEM-limit in #mixtures). I went ahead
and did a quick try, but implemented so that we do not have to obtain from
NONMEM the individual probability of a certain SEX, and instead use the
Empirical-Bayes Estimates (EBE) of CL to estimate the probability of SEX.
This approach would allow imputation of several missing covariates using
covariates and EBE:s, but in this example only use the EBE of CL:
1. Change the covariate model for SEX as if it was continuous between 0-1
and impute missing values with the mean. Fit the model in NONMEM and output
the EBE of CL (CL) in a table file
2. Re-impute the missing SEX so that the relation between CL and SEX is as
strong in the "missing subjects" as in the non-missing. Since this step
involves an element of randomness (sampling from the residuals that are
obtained from fitting CL to SEX on the subjects where the covariate is not
missing), this could be repeated several times (multiple imputation), each
time creating a dataset with imputed SEX.
3. Fit a NONMEM model to each dataset created in step 2. These have a more
accurate imputation of SEX (than the nave imputation in step 1). If the
effect of SEX is large and the information on CL is sparse, step 2-3 may
have to be re-iterated, using the updated CL from this step. Whether that is
necessary or not will be seen already in step 1: The effect of SEX will not
change much since in this case we are not using any independent variables
(covariates) in the imputation. The information on CL can be measured as the
shrinkage in the eta on CL.
4. If several datasets have been imputed in step 2 and the parameters differ
they are averaged (simple arithmetic mean). The variability between datasets
is only the variability that is due to the imputation and the SE:s of the
SEX-effect and IIV-CL is difficult to obtain (whereas any of Nick's models
could be bootstrapped).
In case anyone is interested, Ill go through the different steps to explain
what I did in practice for Nick's dataset-example (apologies for any typos
and miss-formattings but I am in a hurry):
STEP 1: For a real example, I would impute with the mean SEX in the included
patient population. Since this was unknown, I used the mean in the subjects
with non-missing SEX (same as Nick). Dropping a few columns from
Nick's/Anthe's datafile (that has missing sex), this model file (run0.mod)
could be used:
#########################################################
$PROB RUN# 115 (PHENOBARBITAL POPULATION PK MODEL)
$INPUT ID TIME AMT WT APGR DV EVID SEX
$DATA cp_missingsex.csv IGNORE=I
$SUBROUTINES ADVAN1
$PK
ISEX = SEX
IF (ISEX.LT.0) ISEX=0.5517 ;Mean in non-missing
DUMMY = THETA(1);NOT ESTIMATED IN NONMEM ANYMORE
TVCL = THETA(2)*(1+THETA(4)*ISEX)
CL = TVCL*EXP(ETA(1))
TVV = THETA(3)
V = TVV*EXP(ETA(2))
K = CL/V
S1 = V
$ERROR
IPRED = F
W = 1
Y = IPRED + THETA(5)*EPS(1)
IRES = DV - IPRED
IWRES = IRES/W
$THETA
(0, 0.424,1) FIX ; TH1: NOT ESTIMATED IN NONMEM
(0, 0.005) ; TH2: POPCL FOR MALES (male set as reference)
(0, 1.5) ; POPV
(-1,0.4) ; CLSEX (Female Fractional change in CL)
(0, 2.5) ; SD
$OMEGA
0.16 ; PPV_CL
0.16 ; PPV_V
$SIGMA
1 FIX ; EPS1
$EST MAXEVAL=9999 PRINT=20 METHOD=COND
$COV PRINT=E
;-------Table as the input-datafile but with columns for EBE:s added:-------
$TABLE ID TIME AMT WT APGR DV EVID SEX CL
ONEHEADER NOPRINT NOAPPEND FILE="imptab1"
#########################################################
There is a moderate (29%) shrinkage in the eta on CL which may be a problem
since the empirical-Bayes estimates of this parameter are used for the
imputation. Shrinkage (1-std(ETA1i)/sqrt(OMEGA1)) is easily calculated using
the PsN-tool[1] execute when running NONMEM:
$ execute -shrinkage run0.mod
Compared to estimating parameters on only non-missing subjects, the
imprecise estimates of IIV on CL and V change substantially (-9% and 23% on
std-scale, respectively) to the better (closer to the values obtained when
SEX is available in all 59 subjects). The typical value of CL for male and
female increase with 5% and 4%, respectively. This is probably due to the
fact that the missing population has (by random chance) a larger difference
between sex and a higher fraction of females (than in the non-missing).
STEP 2:
Run this Splus function (code at the end of this e-mail) to get ten datasets
with imputation of sex:
impute.sex(num.imp=10)
STEP 3:
Create run1.mod run10.mod as run0.mod but using the ten new datafiles
(ebtab1-10, respectively). Estimate the ten models in NONMEM using PsN:
$ execute *.mod -shrinkage -threads=5 -dir MultipleImputed
STEP 4:
Since I used PsN, the parameter estimates from the ten models are neatly
tabled in: "MultipleImputed/raw_results.csv" and can easily be averaged. I
used a simple arithmetic mean of all parameters.
Some parameters are hardly affected by the multiple imputation, where as
others are more variable. Looking at the SE due to imputation, the only
parameter which changes more than one SE from run0.mod (mean imputation) is
IIV on CL which is slightly decreased. This is because the imputed SEX is
better than the mean-imputed, which leaves less variability in CL as random,
but the difference was small in this case (IIV 38.2% -> 37.6%). The effect
of SEX on CL is highly variable between imputations (mean 14%, SE 4%) but
the mean is in the same ball park as when estimating the effect on only the
non-missing (11.6%).
At this stage one could consider to reiterate steps 2 and 3, due to the
problem of shrinkage. In this case, since the estimated effect of SEX is
small and shrinkage did not decrease in the step3-models (average 29%, as
for run0.mod) I stop here.
In conclusion, compared to Nick's solutions this could be a powerful
approach if many important covariates are missing and the relation between
covariates is difficult to model. I am sure that someone else could come up
with better code than the one I used for the multiple imputation in S-plus
(below).
Have a nice summer!
Jakob
"impute.sex" <- function (nm.tab="imptab1", num.imp=1, new.datafile="ebtab",
cat.imp=F, condMean=F, shrink.imputed=F, seed.imputation=1) {
#If many covariates/EBE:s are used for imputation,
#use: shrink.imputed=T
#If no EBE:s are used in the imputation,
#use: condMean=T and num.imp=1
library("Hmisc")
set.seed(i=seed.imputation)
coeff.bias <- NULL
org.data <- read.table(nm.tab, skip=1, header=T)
#Set missing/imputed values to NA before re-imputing
org.data$SEX[org.data$SEX != 0 &
org.data$SEX != 1] <- NA
data <- org.data
data <- data[!duplicated(data$ID),]#Assuming no IOV
if(!any(names(data)=="LNCL")) data$LNCL <- log(data$CL)
missing <- is.na(data$SEX)
CL <- data$CL
LNCL <- data$LNCL
SEX <- as.numeric(data$SEX)
NASEX <- SEX
#----Impute one dataset at a time (if multiple imputation)------------
for (imp.no in 1:num.imp) {
if (condMean) {#Will produce biased result if DV:s are used!!!!
print("WARNING: Using a mean that is conditional on the DV")
print("without a random component will bias the importance")
print("of imputed covariates!!!")
xtrans <- transcan(~LNCL+SEX, imputed=T,
shrink=shrink.imputed,
data=data)
data$ISEX <- impute(xtrans, SEX, data=data)
}
else {#Better to use the aregImpute(). -Approx Bayesian bootstrap
xtrans <- transcan(~LNCL+SEX, imputed=T,
shrink=shrink.imputed,
data=data, n.impute=1)
data$ISEX <- data$SEX
data$ISEX <- round(impute(xtrans, SEX, data=data,
imputation=1),
digits=4)
data$ISEX[missing & data$ISEX==1] <- 0.9999
data$ISEX[missing & data$ISEX==0] <- 0.0001
}
#For imputation to categorical instead of continuous SEX:
if (cat.imp) {
data$ISEX <- ifelse(unclass(data$ISEX)[1:n.subj]<0.5,0,1)
}
if (imp.no<=10) { #Produce diagnostics
print(wo.imputation <- lm(data$LNCL~NASEX,
na.action=na.exclude))
print(with.imputation <- lm(data$LNCL~data$ISEX))
print(plot(jitter(as.numeric(data$ISEX[data$ISEX==0 |
data$ISEX==1]), factor=0.25),
data$LNCL[data$ISEX==0 | data$ISEX==1],
xlab="Observed(o) or imputed(I) SEX",
ylab="Empirical-Bayes Estimate of LN(CL)"))
text(as.numeric(data$ISEX[data$ISEX!=0 & data$ISEX!=1]),
data$LNCL[data$ISEX!=0 & data$ISEX!=1],
labels="I", col=4, adj=0.5)
print(plot(as.numeric(data$ISEX),data$LNCL,
xlab="Observed (0/1) or imputed SEX",
ylab="Empirical Bayes Estimate of LN(CL)"))
}
coeff.bias <- c(coeff.bias,
100*(exp(with.imputation$coefficients[2]) -
exp(wo.imputation$coefficients[2]))/
(exp(wo.imputation$coefficients[2])-1))
#----Write imputed data to file----------------
imp.file <- paste(new.datafile, imp.no, sep="")
imp.dataset <- merge(org.data,
data[,is.element(el=names(data),
set=c("ID", "ISEX"))],
all=T, by=c("ID"))
imp.dataset$SEX <- imp.dataset$ISEX
write.table(imp.dataset[,names(imp.dataset)!="ISEX"],
file = imp.file, sep = ",",
dimnames.write="colnames")
print(paste("Imputed datafile created:", imp.file))
}#----End of multiple-imputation loop---------
print (coeff.bias)
if (num.imp >=10) {#Just a check, don't trust too much...
print ("The estimated bias due to random imputation (%):")
print (signif(mean(coeff.bias),digits=2))
}
invisible()
}
1. Lindbom L, Pihlgren P, Jonsson EN. PsN-Toolkit--a collection of
computer intensive statistical methods for non-linear mixed effect modeling
using NONMEM. Comput Methods Programs Biomed. Sep 2005;79(3):241-257.
_______________________________________________________