RE: SAEM and IMP
Dear Emmanuel and Pavel,
Further to Bob's answer, recall also that delta OFV in the likelihood ratio
test is only asymptotically chi-squared distributed, and this is not the only
reason why you should not get too hung up on OFV to help choose your models.
For example, Lavielle 2010 in Biometrics showed nicely how the SAEM algorithm
can estimate parameters of complex differential equation models for joint HIV
viral load and CD4 counts. Using OFV-based metrics a latent model was chosen
whereby the majority of circulating T-cells were infected with virus - this
model also gave nice fits to the data. When immunologists look at CD4 cells of
HIV infected patients however, they find that much less than 1% (closer to
0.01%) of circulating T-cells contain virus (most of the virus making up the
latent reservoir is stuck to follicular cells in the periphery), so one would
have to question the meaning of the parameters identified as the best fit by
SAEM. By all means use SAEM to fit ODE models that don't run/converge with
other algorithms (I do), but choose models with parameters that make
mechanistic sense rather than relying too heavily on OFV-based metrics. A nice
VPC always goes down well too.
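
As a rough illustration of the likelihood ratio test point, the sketch below computes the nominal chi-squared cut-off for one extra parameter and compares it with the +/- 6 point Monte Carlo OFV variation reported in the quoted message. The function names are hypothetical, and the cut-off is only the asymptotic value, so treat the comparison as indicative.

```python
from scipy.stats import chi2

def lrt_cutoff(extra_params, alpha=0.05):
    """Nominal delta-OFV cut-off for nested models (asymptotic chi-squared)."""
    return chi2.ppf(1.0 - alpha, df=extra_params)

def lrt_p_value(delta_ofv, extra_params):
    """Nominal p-value for an observed drop in OFV between nested models."""
    return chi2.sf(delta_ofv, df=extra_params)

cutoff = lrt_cutoff(1)   # roughly 3.84 for one extra parameter at alpha = 0.05
mc_noise = 6.0           # OFV variation quoted in the thread (assumed value)

# If run-to-run OFV noise exceeds the cut-off, a nominally "significant"
# drop in OFV cannot be told apart from Monte Carlo error.
print(f"cut-off = {cutoff:.2f}, noise exceeds cut-off: {mc_noise > cutoff}")
```

With noise of that size, an apparent improvement of 4 to 6 points between two runs says little about the covariate being tested.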
Joe
Joseph F Standing
MRC Fellow, UCL Institute of Child Health
Antimicrobial Pharmacist, Great Ormond Street Hospital
Tel: +44(0)207 905 2370
Mobile: +44(0)7970 572435
Quoted reply history
________________________________________
From: [email protected] [[email protected]] On Behalf Of
Bob Leary [[email protected]]
Sent: 15 May 2014 19:22
To: Emmanuel Chigutsa; Pavel Belo; [email protected]
Subject: RE: [NMusers] SAEM and IMP
Hi Emmanuel,
While I am a strong advocate of using quasi-random rather than pseudo-random
sequences for importance sampling in EM methods like IMP, there is a
theoretical (and very real) problem with their use in the context you
suggested in your message, namely with a multivariate t distribution as the
importance sampling distribution. The 3S2 option implies you are using a Sobol
quasi-random sequence, while the DF=7 implies the use of a multivariate
t-distribution with 7 degrees of freedom. The standard way of generating
a p-dimensional multivariate t random variable with DF degrees of freedom is
to generate a p-dimensional multivariate normal and then divide by an
additional independent random variable which is basically the square root of a
1-d chi square random variable with DF degrees of freedom. Thus to generate a
p-dimensional importance sample, you actually need to use p+1 independent
random variables. If you simply use a p+1 dimensional Sobol vector as the
base quasi-random draw, the nonlinear mapping from p+1 dimensions to the final
p dimensional result destroys the low discrepancy property of the final
sequence in the p-dimensional space and in fact introduces a significant
amount of bias in the final result. The problem arises directly from the p+1
vs p dimensional mismatch.
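
The construction described above can be sketched as follows. This is a minimal illustration using NumPy pseudo-random draws, not the code used inside NONMEM or Phoenix NLME; the function name and arguments are hypothetical. The chi-square variate is divided by DF before taking the square root, which is the usual textbook scaling.

```python
import numpy as np

def multivariate_t_draws(n, mean, cov, df, rng):
    """Draw n samples from a p-dimensional multivariate t with df degrees of freedom."""
    mean = np.asarray(mean, dtype=float)
    p = mean.size
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    z = rng.standard_normal((n, p))   # p independent normal variates per draw
    w = rng.chisquare(df, size=n)     # one additional chi-square variate per draw
    scale = np.sqrt(w / df)           # the extra (p+1)-th random input
    return mean + (z @ chol.T) / scale[:, None]

rng = np.random.default_rng(123456)
draws = multivariate_t_draws(20000, [0.0, 0.0], np.eye(2), df=7, rng=rng)

# Each p-dimensional draw consumed p + 1 random inputs; the marginal
# variance should be close to df / (df - 2) = 1.4 for df = 7.
print(draws.shape, draws.var(axis=0))
```

The point to notice is that every row of `draws` depends on p + 1 underlying random numbers, which is the dimensional mismatch discussed above.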
There is no problem if the final p-dimensional result can be generated from a
p-dimensional quasi-random sequence, which is the case for multivariate normal
importance samples. So quasi-random sequences should really only be used for
the DF=0 multivariate normal importance sampling distribution case, not the
DF>0 multivariate t case.
I ran across this effect in testing the Sobol-based importance sampling EM
algorithm QRPEM in Phoenix NLME. It is very real and the net effect is to
introduce a significant bias. There is a partial fix that works but gives up
some of the benefit of using low-discrepancy sequences: namely, use a
p-dimensional quasi-random vector to generate the p-dimensional multivariate
normal, but then use a 1-d pseudo-random sequence to generate the chi-square
random variable.
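
A minimal sketch of that partial fix, assuming SciPy's `scipy.stats.qmc.Sobol` generator as a stand-in for the Sobol sequence; it is not the QRPEM or NONMEM implementation, and the function name, seed handling and clipping step are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, qmc

def hybrid_t_draws(n, p, df, seed=0):
    """Standard multivariate t draws: Sobol for the normal part, pseudo-random chi-square."""
    sobol = qmc.Sobol(d=p, scramble=True, seed=seed)
    u = sobol.random(n)                    # n points in the p-dimensional unit cube
    u = np.clip(u, 1e-12, 1.0 - 1e-12)     # guard against infinite quantiles
    z = norm.ppf(u)                        # quasi-random standard normals (p dimensions)
    rng = np.random.default_rng(seed)
    w = rng.chisquare(df, size=n)          # 1-d pseudo-random chi-square variates
    return z / np.sqrt(w / df)[:, None]

# A power of two keeps the Sobol points balanced (4096 = 2**12).
draws = hybrid_t_draws(4096, p=3, df=7)
print(draws.shape, draws.var(axis=0))
```

Only the p-dimensional normal part uses the low-discrepancy sequence, so the quasi-random points are never pushed through a mapping from p + 1 to p dimensions.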
From: [email protected] [mailto:[email protected]] On
Behalf Of Emmanuel Chigutsa
Sent: Thursday, May 15, 2014 1:03 PM
To: Pavel Belo; [email protected]
Subject: Re: [NMusers] SAEM and IMP
Hi Pavel
I have experienced a similar problem. In my case, the following code for IMP
after SAEM (using NM7.3) greatly reduced the Monte Carlo OFV noise from
variations of about +/- 60 points to variations of +/- 6 points (though still
not good enough for covariate testing):
$EST METHOD=IMP LAPLACE INTER NITER=15 ISAMPLE=3000 EONLY=1 DF=7 IACCEPT=0.3
ISAMPEND=10000 STDOBJ=2 MAPITER=0 PRINT=1 SEED=123456 RANMETHOD=3S2
The settings are explained in the NM7.3 guide. If you are using NM7.3, you can
also try IACCEPT=0.0 whereupon "NONMEM will determine the most appropriate
IACCEPT level for each subject". Of course the settings for DF and IACCEPT in
the above code will depend on the type of data you have. Which brings me to my
own question. If I have both continuous and categorical DVs in the dataset
(which would mean different optimal settings) and I am using F_FLAG
accordingly, what would the 'right' values of DF and IACCEPT be? I have noticed
that the DF automatically chosen by NONMEM for individuals in the dataset can
vary from 0-8 and this appears to be random.