RE: Linear VS LTBS
Sorry to back up a few days on this thread, but I could not
resist piling yet another comment on the traditional covariance
success/convergence debate.
>From a purely algorithmic perspective, the relationship between convergence
>behavior
and success of the covariance step is extremely tenuous.
NONMEM uses a version of a BFGS quasi Newton method to drive the ELS obective
function
optimization. At any given stage, the descent direction is of the form
-(H_BFGS)**-1 * g, where g is the gradient of the objective function
and H_BFGS is a positive definite matrix
that captures accumulated curvature information recovered from the entire
sequence
of previously evaluated gradients at various points. Note that the 'true'
Newton direction is -H**-1 * g, where H is the Hessian evaluated at the current
point.
H also captures localized (to the current point) curvature information,
but is difficult and expensive to evaluate, so BFGS methods use the much more
easily
computed quasi-Newton direction. With accurate gradients, the quasi-Newton
direction
is necessarily a descent direction (since H_BFGS is guaranteed to be positive
definite),
while the true Newton direction need not be. Note that H and H_BFGS
do not bear any necessary relation to each other, although H_BFGS is often
thought
of as a surrogate or approximant for H.
At convergence (usually recognized by the algorithm as the gradient becoming
'sufficiently' small in magnitude), the standard errors in principle are
computed from the square roots of the diagonal elements of H**(-1), assuming H
is positive definite (which is almost always the case if at least a
local minimum has been found - there are some degenerate exceptions, called
non-Morse points where the Hessian is only positive semidefinite at
the local optimum, but these are rarely encountered).
But in practice the true Hessian H is hard to compute,
and NM uses a numerical approximation to the Hessian
(I believe forward difference, meaning for example in one dimension
the second derivative
at x is approximated by [f(x+2*eps) - 2f(x+eps) + f(x )}/eps^2. Even if an
optimal
step size eps is used and f can be computed to
full 15 digit double precision accuracy (which is way too optimistic if the
model
is defined by ODEs which are solved numerically), the best you can do
numerically
with forward differences is about 5 significant
digits of accuracy of agreement of the numerical hessian with the
true Hessian . Central differences, which are much more expensive, do a little
bit better.
But for the hessian inverse, if the numerical hessian or the true Hessian has
a condition
number greater than 10^5 (a condition number of 10^5 is actually quite benign
for most
matrix computations), then there
are no siginificant digits of agreement of the std errors computed from
the numerical hessian and the standard errors computed from the true
hessian. Moreover, in this case a perfectly respectable
positive definite matrix H can easily become indefinite when approximated
numerically,
which will cause the covariance step to fail.
Thus failure of the covariance step means only that the numerical Hessian at
the converged point is not positive definite, but very little can be concluded
from this regarding the true H due to the inherent numerical difficulties
and imprecison in numerical
Hessian computations. Moreover, the numerical Hessian plays no role whatever
in the convergence of the algorithm. Conversely, success of the covariance step
simply means the numerical Hessian is positive definite.
What is well known is that most gradient based (such as simple gradient descent,
newton, quasi-Newton, and conjugate gradient) unconstrained
numerical optimization methods typically work best
(speed ,accuracy, and reliability) when the true Hessian H in the vicinity of
the optimum
has a fairly narrow range of eigenvalues - e.g. when the condition number
max eigenvalue/ min eigenvalue is relatively small. (In the case of conjugate
gradient
methods, it is possible to obtain theoretical bounds on rates of convergence
in terms of condition number with the rates being inversely related to
the condition number).
Of course, a good condition number near the optimum
but not near the starting point does not
preclude convergence problems occuring away from the neighborhood
of the optimum.
But all other things being equal, well conditioned true Hessians (small
condition numbers) in
the neighborhood of the optimum are generally favorable for convergence
behavior. In terms of parameterization,
each additional parameter usually increases the condition number at the optimum
(this can be shown to be strictly true in the case of ordinary linear
regression). So in this sense, overparameterization 'typically'
adversely affects convergence behavior. But unfortunately it is not clear how
to turn this into a practical numerical overparametization criterion,
particularly
given the inherent unreliability of information derived from numerical
Hessians.
Robert H. Leary, PhD
Fellow
Pharsight - A Certara(tm) Company
5625 Dillard Dr., Suite 205
Cary, NC 27511
Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871
Email: [email protected]
> This email message (including any attachments) is for the sole use of the
> intended recipient and may contain confidential and proprietary information.
> Any disclosure or distribution to third parties that is not specifically
> authorized by the sender is prohibited. If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies of
> the original message.
Quoted reply history
-----Original Message-----
From: [email protected]
[mailto:[email protected]]on Behalf Of Nick Holford
Sent: Friday, August 21, 2009 16:44 PM
To: nmusers
Subject: Re: [NMusers] Linear VS LTBS
Leonid,
You are once again ignoring the actual evidence that NONMEM VI will fail
to converge or not complete the covariance step more or less at random.
If you bootstrap simulated data in which the model is known and not
overparameterised it has been shown repeatedly that NONMEM VI will
sometimes converge and do the covariance step and sometimes fail to
converge.
Of course, I agree that overparameterisation could be a cause of
convergence problems but I would not agree that this is often the reason.
Bob Bauer has made efforts in NONMEM 7 to try to fix the random
termination behaviour and covariance step problems by providing
additional control over numerical tolerances. It remains to be seen by
direct experiment if NONMEM 7 is indeed less random than NONMEM VI.
BTW in this discussion about LTBS I think it is important to point out
that the only systematic study I know of comparing LTBS with
untransformed models was the one you reported at the 2008 PAGE meeting
(www.page-meeting.org/?abstract=1268). My understanding of your results
was that there was no clear advantage of LTBS if INTER was used with
non-transformed data:
"Models with exponential residual error presented in the log-transformed
variables
performed similar to the ones fitted in original variables with INTER
option. For problems with
residual variability exceeding 40%, use of INTER option or
log-transformation was necessary to
obtain unbiased estimates of inter- and intra-subject variability."
Do you know of any other systematic studies comparing LTBS with no
transformation?
Nick
Leonid Gibiansky wrote:
> Neil
> Large RSE, inability to converge, failure of the covariance step are
> often caused by the over-parametrization of the model. If you already
> have bootstrap, look at the scatter-plot matrix of parameters versus
> parameters (THATA1 vs THETA2, .., THETA1 vs OMEGA1, ...), these are
> very informative plots. If you have over-parametrization on the
> population level, it will be seen in these plots as strong
> correlations of the parameter estimates.
>
> Also, look on plots of ETAs vs ETAs. If you see strong correlation
> (close to 1) there, it may indicate over-parametrization on the
> individual level (too many ETAs in the model).
>
> For random effect with a very large RSE on the variance, I would try
> to remove it and see what happens with the model: often, this (high
> RSE) is the indication that the error effect is not needed.
>
> Also, try combined error model (on log-transformed variables):
>
> W1=SQRT(THETA(...)/IPRED**2+THETA(...))
> Y = LOG(IPRED) + W1*EPS(1)
>
>
> $SIGMA
> 1 FIXED
>
>
> Why concentrations were on LOQ? Was it because BQLs were inserted as
> LOQ? Then this is not a good idea.
> Thanks
> Leonid
>
>
> --------------------------------------
> Leonid Gibiansky, Ph.D.
> President, QuantPharm LLC
> web: www.quantpharm.com
> e-mail: LGibiansky at quantpharm.com
> tel: (301) 767 5566
>
>
>
>
> Indranil Bhattacharya wrote:
>> Hi Joachim, thanks for your suggestions/comments.
>>
>> When using LTBS I had used a different error model and the error
>> block is shown below
>> $ERROR
>> IPRED = -5
>> IF (F.GT.0) IPRED = LOG(F) ;log transforming predicition
>> IRES=DV-IPRED
>> W=1
>> IWRES=IRES/W ;Uniform Weighting
>> Y = IPRED + ERR(1)
>>
>> I also performed bootsrap on both LTBS and non-LTBS models and the
>> non-LTBS CI were much more tighter and the precision was greater than
>> non-LTBS.
>> I think the problem plausibly is with the fact that when fitting the
>> non-transformed data I have used the proportional + additive model
>> while using LTBS the exponential model (which converts to additional
>> model due to LTBS) was used. The extra additive component also may be
>> more important in the non-LTBS model as for some subjects the
>> concentrations were right on LOQ.
>>
>> I tried the dual error model for LTBS but does not provide a CV%. So
>> I am currently running a bootstrap to get the CI when using the dual
>> error model with LTBS.
>>
>> Neil
>>
>> On Fri, Aug 21, 2009 at 3:01 AM, Grevel, Joachim
>> <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>> Hi Neil,
>> 1. When data are log-transformed the $ERROR block has to
>> change:
>> additive error becomes true exponential error which cannot be
>> achieved without log-transformation (Nick, correct me if I am
>> wrong).
>> 2. Error cannot "go away". You claim your structural model
>> (THs)
>> remained unchanged. Therefore the "amount" of error will remain the
>> same as well. If you reduce BSV you may have to "pay" for it with
>> increased residual variability.
>> 3. Confidence intervals of ETAs based on standard errors
>> produced
>> during the covariance step are unreliable (many threads in NMusers).
>> Do bootstrap to obtain more reliable C.I..
>> These are my five cents worth of thought in the early morning,
>> Good luck,
>> Joachim
>>
>>
>> ------------------------------------------------------------------------
>>
>> AstraZeneca UK Limited is a company incorporated in England and
>> Wales with registered number: 03674842 and a registered office at 15
>> Stanhope Gate, London W1K 1LN.
>>
>> *Confidentiality Notice: *This message is private and may contain
>> confidential, proprietary and legally privileged information. If you
>> have received this message in error, please notify us and remove it
>> from your system and note that you must not copy, distribute or take
>> any action in reliance on it. Any unauthorised use or disclosure of
>> the contents of this message is not permitted and may be unlawful.
>>
>> *Disclaimer:* Email messages may be subject to delays, interception,
>> non-delivery and unauthorised alterations. Therefore, information
>> expressed in this message is not given or endorsed by AstraZeneca UK
>> Limited unless otherwise notified by an authorised representative
>> independent of this message. No contractual relationship is created
>> by this message by any person unless specifically indicated by
>> agreement in writing other than email.
>>
>> *Monitoring: *AstraZeneca UK Limited may monitor email traffic data
>> and content for the purposes of the prevention and detection of
>> crime, ensuring the security of our computer systems and checking
>> compliance with our Code of Conduct and policies.
>>
>> -----Original Message-----
>>
>>
>> *From:* [email protected]
>> <mailto:[email protected]>
>> [mailto:[email protected]
>> <mailto:[email protected]>]*On Behalf Of *Indranil
>> Bhattacharya
>> *Sent:* 20 August 2009 17:07
>> *To:* [email protected] <mailto:[email protected]>
>> *Subject:* [NMusers] Linear VS LTBS
>>
>> Hi, while data fitting using NONMEM on a regular PK data set
>> and its log transformed version I made the following
>> observations
>> - PK parameters (thetas) were generally similar
>> between
>> regular and when using LTBS.
>> -ETA on CL was similar
>> -ETA on Vc was different between the two runs.
>> - Sigma was higher in LTBS (51%) than linear (33%)
>> Now using LTBS, I would have expected to see the
>> ETAs unchanged
>> or actually decrease and accordingly I observed that the eta
>> values decreased showing less BSV. However the %RSE for ETA on
>> VC changed from 40% (linear) to 350% (LTBS) and further the
>> lower 95% CI bound has a negative number for ETA on Vc (-0.087).
>> What would be the explanation behind the above
>> observations
>> regarding increased %RSE using LTBS and a negative lower bound
>> for ETA on Vc? Can a negative lower bound in ETA be considered
>> as zero?
>> Also why would the residual vriability increase when using LTBS?
>> Please note that the PK is multiexponential (may be
>> this is
>> responsible).
>> Thanks.
>> Neil
>>
>> -- Indranil Bhattacharya
>>
>>
>>
>>
>> --
>> Indranil Bhattacharya
>>
--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
[email protected] tel:+64(9)923-6730 fax:+64(9)373-7090
mobile: +64 21 46 23 53
http://www.fmhs.auckland.ac.nz/sms/pharmacology/holford