Centering
From: peter.bonate@quintiles.com
Subject: Centering
Date: Thu, 5 Jul 2001 08:53:00 -0500
I would just like to add my two cents regarding centering. The whole
issue of centering revolves around scale and matrix inversion. As part of
the optimization process that NONMEM or any other nonlinear regression
program uses, the Hessian, or its Gauss-Newton approximation J'J, must be
inverted, where J is the Jacobian matrix, the matrix of partial first
derivatives of the model with respect to the parameters. If the columns of
J are of different magnitudes (which happens when you have covariates of
different magnitudes) this leads to matrix instability during the inversion
process. However, if the columns of J are approximately the same
magnitude, as happens when the predictor variables are centered, the
resulting matrix inversion is more stable. Hence the reason to center.
And Nick is right - always center.
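The effect is easy to see numerically. A minimal sketch in Python/numpy
(the weights and the simple intercept-plus-covariate design are
hypothetical, just to show what centering does to the condition number
of the cross-product matrix):

```python
import numpy as np

# Hypothetical intercept-plus-weight design, for illustration only:
# the two columns of X differ considerably in magnitude.
wt = np.array([52.0, 61.0, 68.0, 75.0, 83.0, 94.0, 110.0])
X_raw = np.column_stack([np.ones_like(wt), wt])              # uncentered
X_ctr = np.column_stack([np.ones_like(wt), wt - wt.mean()])  # centered

def condition_number(X):
    """sqrt(lambda_max/lambda_min) of X'X, i.e. the condition number of X."""
    eig = np.linalg.eigvalsh(X.T @ X)
    return np.sqrt(eig.max() / eig.min())

print(condition_number(X_raw))  # large (several hundred): columns on different scales
print(condition_number(X_ctr))  # much smaller: centering equalizes the columns
```

The same computation applies to the Jacobian of a nonlinear model; the
linear case just makes the column-scale effect obvious.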
As regards what happens to the parameters:
1.) There should be no change in OFV or MSE.
2.) The standard errors will change, but the relative standard error of
the estimates (SE/theta) will not. Hence statistical inference will not change.
To see whether a model is unstable and centering would be useful, print
out the eigenvalues using the PRINT=E option with $COV. Take the square
root of the ratio of the largest to the smallest eigenvalue. This number
is called the condition number and is a measure of the instability of the
matrix being inverted, with large numbers indicating instability. Now the
million dollar question.
What is large? In a paper I wrote related to Pop PK (Pharm Res 16,
709-717, 1999) when the condition number was greater than a few hundred,
significant matrix instability was present. In regards to nonlinear
regression, condition numbers greater than 100,000 are considered high.
Note that WinNonlin reports the log-condition number.
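For what it's worth, the arithmetic on the printed eigenvalues is
trivial; a sketch (the eigenvalues below are made-up illustrative values,
not from any real run):

```python
import math

# Illustrative eigenvalues as $COV PRINT=E might list them (made-up values).
eigenvalues = [0.012, 0.38, 1.9, 44.0]

# Condition number = sqrt(largest eigenvalue / smallest eigenvalue).
condition_number = math.sqrt(max(eigenvalues) / min(eigenvalues))
print(condition_number)              # sqrt(44.0/0.012), about 60.6
print(math.log10(condition_number))  # the log-condition number, about 1.78
```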
I wrote the following a while back for a paper. Although it is written
in terms of nonlinear regression, I think some may find it useful. It is
easy to extend the concept to multivariate predictor problems such as pop
PK.
****************
The last example of where ill-conditioning may arise is when the
columns of J are of differing magnitudes or scale. At the simplest level,
for a pharmacokinetic model where time is the independent variable, simply
changing the units of time can have a dramatic effect on the condition
number and ill-conditioning. For example, suppose samples for
pharmacokinetic analysis are collected at 0, 0.5, 1, 1.5, 2, 3, 4, 6, 8,
12, 18, and 24 h after intravenous drug administration with values of 39.4,
33.3, 29.2, 29.8, 24.3, 20.7, 17.8, 10.7, 6.8, 3.4, 1.0, 0.3, respectively.
Assume a 1-compartment model is appropriate to model the data,

    C = A*exp(-alpha*t),

where C is concentration, A and alpha are the estimable model parameters,
and t is time. The first derivatives of the model with respect to the
parameters are

    dC/dA = exp(-alpha*t)
    dC/dalpha = -A*t*exp(-alpha*t),

and the Jacobian matrix J has these derivatives, evaluated at each
sampling time, as its two columns.
Only 9 iterations were required for convergence using a Gauss-Newton
algorithm for optimization. The model parameter estimates (std error) are
A = 38.11 (0.72) and alpha = 0.2066 (0.009449) per h. The matrix J'J has
eigenvalues of 2.33 and 23,390 and a corresponding condition number of
sqrt(23,390/2.33), about 100. At one end of the scale is when time is
transformed from hours to
seconds. Under this transformation, the model parameter estimates are
A = 38.12 (0.72) and alpha = 5.738E-5 (2.625E-6) per s. The matrix J'J
has eigenvalues of 2.33 and 303,088,318,808 and a corresponding condition
number of 360,741. At the other end of the scale is when time is
transformed from hours to days. Under this transformation, the model
parameter estimates are A = 38.12 (0.72) and alpha = 4.96 (0.227) per day.
The matrix J'J has eigenvalues of 2.23 and 42.44 and a corresponding
condition number of about 4. It is clear that inverting J'J when time is
in seconds results in an unstable matrix, whereas inverting J'J when time
is in days results in a more stable matrix. But note that in all cases
the parameter estimate for A remains
the same, the mean square error remains the same (1.198), as does the CV of
the parameter estimates (4.57%). Changing the scale does not affect the
model precision or parameter precision and hence, any statistical inference
on the model parameters. The only thing that changes is the estimate of
alpha, and the change is exactly proportional to the transformation. So
why all the fuss? Some optimization algorithms are sensitive to scale, whereas
others are not. The algorithm used in the example above to estimate the
model parameters was the Gauss-Newton algorithm in the NLIN procedure in
SAS, which is relatively insensitive to scaling. However, using the
GRADIENT method in SAS, which uses the method of Steepest Descent, took
more than 13,000 iterations before convergence was achieved and then the
parameter estimates were quite poor. For example, when time is scaled in
seconds, the parameter estimates are A = 35.00 (1.20) and
alpha = 5.123E-5 (4.40E-6) with a mean square error of 3.58. Note that one
of the estimates did not even change from its starting value. Obviously,
algorithms that are not
sensitive to scale are preferable to algorithms that are. But by forcing
the parameter estimates to be of approximately the same magnitude, less
ill-conditioning results, thereby easing the convergence process.
**********************
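The scaling effect in the excerpt can be checked numerically. A sketch in
numpy, assuming the monoexponential model C = A*exp(-alpha*t) and simply
plugging in the reported estimates rather than refitting:

```python
import numpy as np

# Reported estimates (A = 38.11, alpha = 0.2066 per h) and the sampling
# times from the excerpt; J is evaluated at these values, no fit is done.
t_h = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 18, 24], dtype=float)
A, alpha = 38.11, 0.2066  # alpha in 1/h

def cond_JtJ(t, A, alpha):
    """Condition number sqrt(lambda_max/lambda_min) of J'J."""
    J = np.column_stack([np.exp(-alpha * t),            # dC/dA
                         -A * t * np.exp(-alpha * t)])  # dC/dalpha
    eig = np.linalg.eigvalsh(J.T @ J)
    return np.sqrt(eig.max() / eig.min())

print(cond_JtJ(t_h, A, alpha))                # time in hours:   ~100
print(cond_JtJ(t_h * 3600, A, alpha / 3600))  # time in seconds: ~360,000
print(cond_JtJ(t_h / 24, A, alpha * 24))      # time in days:    ~4.4
```

Note how the small eigenvalue of J'J stays near 2.3 in every case; it is
the large eigenvalue, driven by the magnitude of the time column, that
moves the condition number around.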
I am pretty certain that NONMEM uses a Newton-Raphson algorithm for
optimization. With regard to convergence, it should be fairly robust in
situations where centering is not done.
I hope this helps, although I am fairly certain this topic is not quite
dead yet.
Peter Bonate