RE: Linear VS LTBS

From: Bob Leary Date: August 25, 2009 technical Source: mail-archive.com
Sorry to back up a few days on this thread, but I could not resist piling yet another comment on the traditional covariance success/convergence debate. >From a purely algorithmic perspective, the relationship between convergence >behavior and success of the covariance step is extremely tenuous. NONMEM uses a version of a BFGS quasi Newton method to drive the ELS obective function optimization. At any given stage, the descent direction is of the form -(H_BFGS)**-1 * g, where g is the gradient of the objective function and H_BFGS is a positive definite matrix that captures accumulated curvature information recovered from the entire sequence of previously evaluated gradients at various points. Note that the 'true' Newton direction is -H**-1 * g, where H is the Hessian evaluated at the current point. H also captures localized (to the current point) curvature information, but is difficult and expensive to evaluate, so BFGS methods use the much more easily computed quasi-Newton direction. With accurate gradients, the quasi-Newton direction is necessarily a descent direction (since H_BFGS is guaranteed to be positive definite), while the true Newton direction need not be. Note that H and H_BFGS do not bear any necessary relation to each other, although H_BFGS is often thought of as a surrogate or approximant for H. At convergence (usually recognized by the algorithm as the gradient becoming 'sufficiently' small in magnitude), the standard errors in principle are computed from the square roots of the diagonal elements of H**(-1), assuming H is positive definite (which is almost always the case if at least a local minimum has been found - there are some degenerate exceptions, called non-Morse points where the Hessian is only positive semidefinite at the local optimum, but these are rarely encountered). But in practice the true Hessian H is hard to compute, and NM uses a numerical approximation to the Hessian (I believe forward difference, meaning for example in one dimension the second derivative at x is approximated by [f(x+2*eps) - 2f(x+eps) + f(x )}/eps^2. Even if an optimal step size eps is used and f can be computed to full 15 digit double precision accuracy (which is way too optimistic if the model is defined by ODEs which are solved numerically), the best you can do numerically with forward differences is about 5 significant digits of accuracy of agreement of the numerical hessian with the true Hessian . Central differences, which are much more expensive, do a little bit better. But for the hessian inverse, if the numerical hessian or the true Hessian has a condition number greater than 10^5 (a condition number of 10^5 is actually quite benign for most matrix computations), then there are no siginificant digits of agreement of the std errors computed from the numerical hessian and the standard errors computed from the true hessian. Moreover, in this case a perfectly respectable positive definite matrix H can easily become indefinite when approximated numerically, which will cause the covariance step to fail. Thus failure of the covariance step means only that the numerical Hessian at the converged point is not positive definite, but very little can be concluded from this regarding the true H due to the inherent numerical difficulties and imprecison in numerical Hessian computations. Moreover, the numerical Hessian plays no role whatever in the convergence of the algorithm. Conversely, success of the covariance step simply means the numerical Hessian is positive definite. What is well known is that most gradient based (such as simple gradient descent, newton, quasi-Newton, and conjugate gradient) unconstrained numerical optimization methods typically work best (speed ,accuracy, and reliability) when the true Hessian H in the vicinity of the optimum has a fairly narrow range of eigenvalues - e.g. when the condition number max eigenvalue/ min eigenvalue is relatively small. (In the case of conjugate gradient methods, it is possible to obtain theoretical bounds on rates of convergence in terms of condition number with the rates being inversely related to the condition number). Of course, a good condition number near the optimum but not near the starting point does not preclude convergence problems occuring away from the neighborhood of the optimum. But all other things being equal, well conditioned true Hessians (small condition numbers) in the neighborhood of the optimum are generally favorable for convergence behavior. In terms of parameterization, each additional parameter usually increases the condition number at the optimum (this can be shown to be strictly true in the case of ordinary linear regression). So in this sense, overparameterization 'typically' adversely affects convergence behavior. But unfortunately it is not clear how to turn this into a practical numerical overparametization criterion, particularly given the inherent unreliability of information derived from numerical Hessians. Robert H. Leary, PhD Fellow Pharsight - A Certara(tm) Company 5625 Dillard Dr., Suite 205 Cary, NC 27511 Phone/Voice Mail: (919) 852-4625, Fax: (919) 859-6871 Email: [email protected] > This email message (including any attachments) is for the sole use of the > intended recipient and may contain confidential and proprietary information. > Any disclosure or distribution to third parties that is not specifically > authorized by the sender is prohibited. If you are not the intended > recipient, please contact the sender by reply email and destroy all copies of > the original message.
Quoted reply history
-----Original Message----- From: [email protected] [mailto:[email protected]]on Behalf Of Nick Holford Sent: Friday, August 21, 2009 16:44 PM To: nmusers Subject: Re: [NMusers] Linear VS LTBS Leonid, You are once again ignoring the actual evidence that NONMEM VI will fail to converge or not complete the covariance step more or less at random. If you bootstrap simulated data in which the model is known and not overparameterised it has been shown repeatedly that NONMEM VI will sometimes converge and do the covariance step and sometimes fail to converge. Of course, I agree that overparameterisation could be a cause of convergence problems but I would not agree that this is often the reason. Bob Bauer has made efforts in NONMEM 7 to try to fix the random termination behaviour and covariance step problems by providing additional control over numerical tolerances. It remains to be seen by direct experiment if NONMEM 7 is indeed less random than NONMEM VI. BTW in this discussion about LTBS I think it is important to point out that the only systematic study I know of comparing LTBS with untransformed models was the one you reported at the 2008 PAGE meeting (www.page-meeting.org/?abstract=1268). My understanding of your results was that there was no clear advantage of LTBS if INTER was used with non-transformed data: "Models with exponential residual error presented in the log-transformed variables performed similar to the ones fitted in original variables with INTER option. For problems with residual variability exceeding 40%, use of INTER option or log-transformation was necessary to obtain unbiased estimates of inter- and intra-subject variability." Do you know of any other systematic studies comparing LTBS with no transformation? Nick Leonid Gibiansky wrote: > Neil > Large RSE, inability to converge, failure of the covariance step are > often caused by the over-parametrization of the model. If you already > have bootstrap, look at the scatter-plot matrix of parameters versus > parameters (THATA1 vs THETA2, .., THETA1 vs OMEGA1, ...), these are > very informative plots. If you have over-parametrization on the > population level, it will be seen in these plots as strong > correlations of the parameter estimates. > > Also, look on plots of ETAs vs ETAs. If you see strong correlation > (close to 1) there, it may indicate over-parametrization on the > individual level (too many ETAs in the model). > > For random effect with a very large RSE on the variance, I would try > to remove it and see what happens with the model: often, this (high > RSE) is the indication that the error effect is not needed. > > Also, try combined error model (on log-transformed variables): > > W1=SQRT(THETA(...)/IPRED**2+THETA(...)) > Y = LOG(IPRED) + W1*EPS(1) > > > $SIGMA > 1 FIXED > > > Why concentrations were on LOQ? Was it because BQLs were inserted as > LOQ? Then this is not a good idea. > Thanks > Leonid > > > -------------------------------------- > Leonid Gibiansky, Ph.D. > President, QuantPharm LLC > web: www.quantpharm.com > e-mail: LGibiansky at quantpharm.com > tel: (301) 767 5566 > > > > > Indranil Bhattacharya wrote: >> Hi Joachim, thanks for your suggestions/comments. >> >> When using LTBS I had used a different error model and the error >> block is shown below >> $ERROR >> IPRED = -5 >> IF (F.GT.0) IPRED = LOG(F) ;log transforming predicition >> IRES=DV-IPRED >> W=1 >> IWRES=IRES/W ;Uniform Weighting >> Y = IPRED + ERR(1) >> >> I also performed bootsrap on both LTBS and non-LTBS models and the >> non-LTBS CI were much more tighter and the precision was greater than >> non-LTBS. >> I think the problem plausibly is with the fact that when fitting the >> non-transformed data I have used the proportional + additive model >> while using LTBS the exponential model (which converts to additional >> model due to LTBS) was used. The extra additive component also may be >> more important in the non-LTBS model as for some subjects the >> concentrations were right on LOQ. >> >> I tried the dual error model for LTBS but does not provide a CV%. So >> I am currently running a bootstrap to get the CI when using the dual >> error model with LTBS. >> >> Neil >> >> On Fri, Aug 21, 2009 at 3:01 AM, Grevel, Joachim >> <[email protected] >> <mailto:[email protected]>> wrote: >> >> Hi Neil, >> 1. When data are log-transformed the $ERROR block has to >> change: >> additive error becomes true exponential error which cannot be >> achieved without log-transformation (Nick, correct me if I am >> wrong). >> 2. Error cannot "go away". You claim your structural model >> (THs) >> remained unchanged. Therefore the "amount" of error will remain the >> same as well. If you reduce BSV you may have to "pay" for it with >> increased residual variability. >> 3. Confidence intervals of ETAs based on standard errors >> produced >> during the covariance step are unreliable (many threads in NMusers). >> Do bootstrap to obtain more reliable C.I.. >> These are my five cents worth of thought in the early morning, >> Good luck, >> Joachim >> >> >> ------------------------------------------------------------------------ >> >> AstraZeneca UK Limited is a company incorporated in England and >> Wales with registered number: 03674842 and a registered office at 15 >> Stanhope Gate, London W1K 1LN. >> >> *Confidentiality Notice: *This message is private and may contain >> confidential, proprietary and legally privileged information. If you >> have received this message in error, please notify us and remove it >> from your system and note that you must not copy, distribute or take >> any action in reliance on it. Any unauthorised use or disclosure of >> the contents of this message is not permitted and may be unlawful. >> >> *Disclaimer:* Email messages may be subject to delays, interception, >> non-delivery and unauthorised alterations. Therefore, information >> expressed in this message is not given or endorsed by AstraZeneca UK >> Limited unless otherwise notified by an authorised representative >> independent of this message. No contractual relationship is created >> by this message by any person unless specifically indicated by >> agreement in writing other than email. >> >> *Monitoring: *AstraZeneca UK Limited may monitor email traffic data >> and content for the purposes of the prevention and detection of >> crime, ensuring the security of our computer systems and checking >> compliance with our Code of Conduct and policies. >> >> -----Original Message----- >> >> >> *From:* [email protected] >> <mailto:[email protected]> >> [mailto:[email protected] >> <mailto:[email protected]>]*On Behalf Of *Indranil >> Bhattacharya >> *Sent:* 20 August 2009 17:07 >> *To:* [email protected] <mailto:[email protected]> >> *Subject:* [NMusers] Linear VS LTBS >> >> Hi, while data fitting using NONMEM on a regular PK data set >> and its log transformed version I made the following >> observations >> - PK parameters (thetas) were generally similar >> between >> regular and when using LTBS. >> -ETA on CL was similar >> -ETA on Vc was different between the two runs. >> - Sigma was higher in LTBS (51%) than linear (33%) >> Now using LTBS, I would have expected to see the >> ETAs unchanged >> or actually decrease and accordingly I observed that the eta >> values decreased showing less BSV. However the %RSE for ETA on >> VC changed from 40% (linear) to 350% (LTBS) and further the >> lower 95% CI bound has a negative number for ETA on Vc (-0.087). >> What would be the explanation behind the above >> observations >> regarding increased %RSE using LTBS and a negative lower bound >> for ETA on Vc? Can a negative lower bound in ETA be considered >> as zero? >> Also why would the residual vriability increase when using LTBS? >> Please note that the PK is multiexponential (may be >> this is >> responsible). >> Thanks. >> Neil >> >> -- Indranil Bhattacharya >> >> >> >> >> -- >> Indranil Bhattacharya >> -- Nick Holford, Professor Clinical Pharmacology Dept Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand [email protected] tel:+64(9)923-6730 fax:+64(9)373-7090 mobile: +64 21 46 23 53 http://www.fmhs.auckland.ac.nz/sms/pharmacology/holford
Aug 20, 2009 Indranil Bhattacharya Linear VS LTBS
Aug 21, 2009 Joachim Grevel RE: Linear VS LTBS
Aug 21, 2009 Indranil Bhattacharya Re: Linear VS LTBS
Aug 21, 2009 Doug J. Eleveld RE: Linear VS LTBS
Aug 21, 2009 Leonid Gibiansky Re: Linear VS LTBS
Aug 21, 2009 Nick Holford Re: Linear VS LTBS
Aug 21, 2009 Ekaterina Gibiansky RE: Linear VS LTBS
Aug 23, 2009 Stephen Duffull RE: Linear VS LTBS
Aug 23, 2009 Nick Holford Re: Linear VS LTBS
Aug 23, 2009 Mats Karlsson RE: Linear VS LTBS
Aug 24, 2009 Nick Holford Re: Linear VS LTBS
Aug 24, 2009 Stephen Duffull RE: Linear VS LTBS
Aug 24, 2009 Mats Karlsson RE: Linear VS LTBS
Aug 24, 2009 Indranil Bhattacharya Re: Linear VS LTBS
Aug 25, 2009 Bob Leary RE: Linear VS LTBS