Re: Condition number
Dear Pete, Ken, Nick, Bob and all,
I truly appreciate the spirit of discussion and helpful background information.
It is nostalgic to see all on this thread. Thank you.
Regards,
Ayyappa
> On Nov 30, 2022, at 7:43 PM, Ken Kowalski <[email protected]> wrote:
>
> Hi Pete,
>
> I'm really not trying to conflate these two different concepts. What you are
> describing is a desire to have a diagnostic that relates to the numeric
> instability of matrix inversion. If that is your desire, then yes the CN of
> the correlation matrix is not what you want. As I suggested previously, a CN
> derived from the Hessian (R-matrix in NONMEM parlance) or from the covariance
> matrix (inverse R if MATRIX=R option is employed) is probably what you want
> because this is the matrix that is actually being inverted and the CN will be
> larger because both differences in scales of the variances as well as
> collinearity issues will contribute to the potential numerical instability in
> the inversion process. However, my focus is more purely on the impact of
> collinearity where the choice of model and the limitations with the data to
> support the choice of model can have a big impact on model stability.
>
> From Bob Bauer's response earlier today it sounds like NONMEM behind the
> scenes is performing the eigenvalue analysis of the Hessian (R-matrix) as the
> first step and if that is successful (all eigenvalues positive such that the
> R matrix is positive definite) and hence invertible then the COV step
> runs and the covariance matrix, correlation matrix and eigenvalues (PRINT=E
> option of $COV step) from the correlation matrix will be reported. Note when
> the COV step fails we often get the warning message that the R-matrix is
> non-positive semi-definite (NPSD), which implies one or more of the
> eigenvalues of the R matrix are 0 (singular) or negative. So clearly,
> NONMEM is calculating these eigenvalues behind the scenes and it sounds like
> you would like NONMEM to report these even if NONMEM determines that they are
> all positive and the $COV step can run successfully. I see no reason why
> NONMEM could not make this an option so that you can assess for yourself how
> much loss in accuracy there might have been in inverting the R matrix. Maybe
> broach this with Bob?
>
> Best,
>
> Ken
>
> -----Original Message-----
> From: Bonate, Peter [mailto:[email protected]]
> Sent: Wednesday, November 30, 2022 7:52 PM
> To: Ken Kowalski <[email protected]>; 'Leonid Gibiansky'
> <[email protected]>
> Cc: 'Matthew Fidler' <[email protected]>; 'Kyun-Seop Bae'
> <[email protected]>; [email protected]; 'Jeroen Elassaiss-Schaap
> (PD-value B.V.)' <[email protected]>; Alan Maloney
> ([email protected]) <[email protected]>
> Subject: RE: [NMusers] Condition number
>
> Thanks Ken and Al. I miss these discussions, while others in NMusers are
> probably thinking “how can there be that many emails on this”.
>
> I think we are conflating different things. The reason we look at the CN is
> that during the optimization process NONMEM has to invert a matrix (its
> either gradient or the Hessian, I am not sure), and also again in the
> calculation of the standard errors. The log10 CN is how many digits are lost
> in that inversion process. You can have matrix instability for many reasons,
> collinearity being one of them, but that is not the only reason. As you
> said, you can have parameters that are widely different in scale; that can
> also cause instability in that inversion process. Thus, a high CN does not
> always imply you have collinearity.
>
> So if NONMEM is reporting the eigenvalues of the correlation matrix, then
> this has a couple of consequences:
> • The CN no longer means how many digits are lost during inversion
> • The CN no longer indicates how stable that matrix inversion is
> • The only thing it is good for now is detecting collinearity.
> • We use a cutoff value of 1000 because that implies we lose 3 digits of
> accuracy during the inversion. This value may not be applicable to a CN
> computed from the eigenvalues of the correlation matrix.
>
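The digit-loss interpretation in the bullets above can be illustrated in R with the classic ill-conditioned Hilbert matrix (an illustrative sketch only; this is not what NONMEM computes internally):

```r
# Illustration: log10 of the condition number approximates how many decimal
# digits of accuracy are lost when solving/inverting a matrix in double
# precision (~16 significant digits to start with).
hilbert <- function(n) outer(1:n, 1:n, function(i, j) 1 / (i + j - 1))
H <- hilbert(8)                 # classic ill-conditioned matrix
cn <- kappa(H, exact = TRUE)    # 2-norm condition number, ~1.5e10
log10(cn)                       # ~10 digits expected to be lost
x_true <- rep(1, 8)
b <- H %*% x_true
x_hat <- solve(H, b)            # try to recover x_true from b
max(abs(x_hat - x_true))        # error far above machine epsilon (~2.2e-16)
```

With log10(CN) near 10, only about 6 of the 16 available digits survive the solve, which is exactly the accounting behind the CN>1000 (3 digits lost) yardstick.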
> This is why using the correlation matrix makes no sense to me. Still doesn’t.
>
> Ayyappa – see what a can of worms you opened. Lol.
>
> pete
>
>
>
> Peter Bonate, PhD
> Executive Director
> Pharmacokinetics, Modeling, and Simulation (PKMS) Clinical Pharmacology and
> Exploratory Development (CPED) Astellas
> 1 Astellas Way
> Northbrook, IL 60062
> [email protected]
> (224) 619-4901
>
>
> Quote of the week –
> “Dancing with the Stars” is not owned by Astellas.
>
> -----Original Message-----
> From: Bonate, Peter
> Sent: Wednesday, November 30, 2022 9:13 AM
> To: Ken Kowalski <[email protected]>; 'Leonid Gibiansky'
> <[email protected]>
> Cc: 'Matthew Fidler' <[email protected]>; 'Kyun-Seop Bae'
> <[email protected]>; [email protected]; 'Jeroen Elassaiss-Schaap
> (PD-value B.V.)' <[email protected]>
> Subject: RE: [NMusers] Condition number
>
> Just wanted to follow up on a few things.
>
> First, Nick, glad to hear from you again.
>
> I gave up trying to understand how NONMEM works years ago. I don't need to
> know how the engine works in my car to drive it or make me a better driver.
> But I did look at CN years ago. One of my first publications, The Effect of
> Collinearity on Parameter Estimates in Nonlinear Mixed Effect Models (1999),
> showed that when you put correlated covariates into a model (with
> correlations greater than 0.75) the standard errors of the estimates become
> inflated, and the estimates of the parameters themselves become biased.
> This is why we don't put weight and BSA on the same parameter, for example.
> You can spot this problem easily from the CN. So, although I would never
> choose a model based solely on its condition number, I always look at it as
> part of the totality of evidence for how good a model is. But maybe that's
> just me.
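That SE inflation is easy to reproduce in a small simulation (a linear-model sketch of my own, not the mixed-effects setup from the 1999 paper):

```r
# Sketch: a covariate that is highly correlated with another has its
# standard error inflated by roughly sqrt(VIF) = 1/sqrt(1 - r^2).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)    # corr(x1, x2) ~ 0.95
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)
se_collinear <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]

x2o <- residuals(lm(x2 ~ x1))                    # same x2, orthogonalized
y2  <- 1 + 0.5 * x1 + 0.5 * x2o + rnorm(n)
se_orthogonal <- summary(lm(y2 ~ x1 + x2o))$coefficients["x1", "Std. Error"]

se_collinear / se_orthogonal    # ~3, close to 1/sqrt(1 - 0.95^2) = 3.2
```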
>
> And to follow up with this statement from Ken:
> That is, a high CN in any one of the three matrices (Hessian, covariance
> matrix, correlation matrix) will result in a high CN in the others.
> I would think that the correlation matrix will give you the smallest
> condition number because it's scaled. I needed to see this for myself in R.
> I made a covariance matrix and computed the eigenvalues then transformed it
> to a correlation matrix. The condition number of the correlation matrix is
> lower than the covariance matrix condition number.
>
>> cov <- c(10, 2, 1, 2, 4, 3, 1, 3, 6)
>> cov <- matrix(cov, nrow=3, byrow=TRUE)
>> cov
>      [,1] [,2] [,3]
> [1,]   10    2    1
> [2,]    2    4    3
> [3,]    1    3    6
>> p <- cov2cor(cov)
>> p
>           [,1]      [,2]      [,3]
> [1,] 1.0000000 0.3162278 0.1290994
> [2,] 0.3162278 1.0000000 0.6123724
> [3,] 0.1290994 0.6123724 1.0000000
>> eig.cov <- eigen(cov)
>> eig.p <- eigen(p)
>> CN.cov <- eig.cov$values[1]/eig.cov$values[3]
>> CN.p <- eig.p$values[1]/eig.p$values[3]
>> CN.cov
> [1] 6.68266
>> CN.p
> [1] 4.899988
>
> So I guess we need Bob Bauer to chime in on this latter issue.
>
> pete
>
>
> Peter Bonate, PhD
> Executive Director
> Pharmacokinetics, Modeling, and Simulation (PKMS) Clinical Pharmacology and
> Exploratory Development (CPED) Astellas
> 1 Astellas Way
> Northbrook, IL 60062
> [email protected]
> (224) 619-4901
>
>
> Quote of the week –
> “Dancing with the Stars” is not owned by Astellas.
>
> -----Original Message-----
> From: Ken Kowalski <[email protected]>
> Sent: Tuesday, November 29, 2022 8:29 PM
> To: Bonate, Peter <[email protected]>; 'Leonid Gibiansky'
> <[email protected]>
> Cc: 'Matthew Fidler' <[email protected]>; 'Kyun-Seop Bae'
> <[email protected]>; [email protected]; 'Jeroen Elassaiss-Schaap
> (PD-value B.V.)' <[email protected]>
> Subject: RE: [NMusers] Condition number
>
> Hi Pete,
>
> I would say the Hessian would be the more appropriate matrix rather than the
> Jacobian since the covariance matrix of the parameter estimates is typically
> estimated as the inverse of the Hessian for most nonlinear regression
> packages and what NONMEM does if you use the MATRIX=R option on the $COV step
> instead of NONMEM's default sandwich estimator. Looking at the eigenvalues
> of the Hessian, or the eigenvalues of the covariance matrix of the parameter
> estimates or the eigenvalues of the correlation matrix of the parameter
> estimates are all going to be related. That is, a high CN in any one of the
> three matrices (Hessian, covariance matrix, correlation matrix) will result
> in a high CN in the others.
>
> I have encountered NONMEM reporting a negative eigenvalue too. I assume this
> is the result of a numerical precision issue because if it was truly
> negative, then the Hessian would not be positive semi-definite and hence the
> COV step should fail. I am not a numerical analyst so this is another issue
> that I would be interested in hearing from Bob Bauer on how NONMEM can report
> a negative eigenvalue.
>
> Best,
>
> Ken
>
> Kenneth G. Kowalski
> Kowalski PMetrics Consulting, LLC
> Email: [email protected]
> Cell: 248-207-5082
>
>
>
> -----Original Message-----
> From: Bonate, Peter [mailto:[email protected]]
> Sent: Tuesday, November 29, 2022 8:27 PM
> To: Leonid Gibiansky <[email protected]>
> Cc: Ken Kowalski <[email protected]>; Matthew Fidler
> <[email protected]>; Kyun-Seop Bae <[email protected]>;
> [email protected]; Jeroen Elassaiss-Schaap (PD-value B.V.)
> <[email protected]>
> Subject: Re: [NMusers] Condition number
>
> This is great. Just like the glory days of NMusers. Any moment now Nick
> Holford is going to chime in.
>
> I’m not an expert in matrix algebra but is the correlation matrix the right
> one to be using? We are concerned about inversion of the hessian. That
> instability is what affects our parameter estimates and standard errors.
> Doesn’t that depend on the Jacobian? Shouldn’t we be looking at the
> eigenvalues of the Jacobian matrix instead?
>
> And to echo what was already said. Never use the condition number as an
> absolute. It’s a yardstick. FYI- one time I got a negative eigenvalue from
> nonmem and would not have known how unstable the model was unless I looked at
> the eigenvalue.
>
> Pete.
>
>> On Nov 29, 2022, at 7:17 PM, Leonid Gibiansky <[email protected]>
>> wrote:
>>
>> from the manual:
>>
>> Iteration -1000000003 indicates that this line contains the condition
>> number, lowest, highest Eigen values of the correlation matrix of the
>> variances of the final parameters.
>>
>>
>>
>>>> On 11/29/2022 7:59 PM, Ken Kowalski wrote:
>>> Hi Matt,
>>> I’m pretty sure Stu Beal told me many years ago that NONMEM calculates the
>>> eigenvalues from the correlation matrix. Maybe Bob Bauer can chime in here?
>>> Ken
>>> *From:*Matthew Fidler [mailto:[email protected]]
>>> *Sent:* Tuesday, November 29, 2022 7:56 PM
>>> *To:* Ken Kowalski <[email protected]>
>>> *Cc:* Kyun-Seop Bae <[email protected]>; [email protected];
>>> Jeroen Elassaiss-Schaap (PD-value B.V.) <[email protected]>
>>> *Subject:* Re: [NMusers] Condition number
>>> Hi Ken, I am unsure, since
>>> I don't have my NONMEM manual handy.
>>> I based my understanding on reading about condition numbers in numerical
>>> analysis, which seemed to use the parameter estimates:
>>> https://en.wikipedia.org/wiki/Condition_number
>>> If it uses the correlation matrix, it could be less sensitive.
>>> Matt
>>>> On Tue, Nov 29, 2022 at 6:11 PM Ken Kowalski <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>> Hi Matt,
>>> Correct me if I’m wrong but I thought NONMEM calculates the
>>> condition number based on the correlation matrix of the parameter
>>> estimates so it is scaled based on the standard errors of the estimates.
>>> Ken
>>> *From:*Matthew Fidler [mailto:[email protected]
>>> <mailto:[email protected]>]
>>> *Sent:* Tuesday, November 29, 2022 7:04 PM
>>> *To:* Ken Kowalski <[email protected]
>>> <mailto:[email protected]>>
>>> *Cc:* Kyun-Seop Bae <[email protected]
>>> <mailto:[email protected]>>; [email protected]
>>> <mailto:[email protected]>; Jeroen Elassaiss-Schaap (PD-value
>>> B.V.) <[email protected] <mailto:[email protected]>>
>>> *Subject:* Re: [NMusers] Condition number
>>> Hi Ken & Kyun-Seop,
>>> I agree it should be taught, since it is prevalent in the industry,
>>> and should be looked at as something to investigate further, but no
>>> hard and fast rule should be applied to if the model is reasonable
>>> and fit for purpose. That should be done in conjunction with other
>>> diagnostic plots.
>>> One thing that has always bothered me about the condition number is
>>> that it is calculated based on the final parameter estimates, but
>>> not the scaled parameter estimates. The scaling is, after all, supposed
>>> to put the gradient on a comparable scale and fix many
>>> numerical problems. Hence, if the scaling works as it is
>>> supposed to, small changes may not affect the collinearity as
>>> strongly as the calculated condition number suggests.
>>> This is mainly why I see it as a number to keep in mind instead of a
>>> hard and fast rule.
>>> Matt
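Matt's scaling point can be seen in a two-parameter toy example (my own sketch, not NONMEM output): the same 0.5 correlation gives wildly different condition numbers depending on whether the matrix carries the raw variance scales.

```r
# Sketch: rescaling to unit diagonal (covariance -> correlation form)
# collapses a condition number driven by scale differences, leaving only
# the contribution of the correlation itself.
A <- matrix(c(1e6, 500,
              500,   1), nrow = 2)   # corr = 500/sqrt(1e6 * 1) = 0.5
kappa(A, exact = TRUE)               # ~1.3e6, dominated by the scale gap
D  <- diag(1 / sqrt(diag(A)))
As <- D %*% A %*% D                  # unit-diagonal (correlation) matrix
kappa(As, exact = TRUE)              # 3 = (1 + 0.5)/(1 - 0.5)
```

For a 2x2 correlation matrix the CN is exactly (1 + |r|)/(1 - |r|), so in the scaled form only collinearity remains.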
>>> On Tue, Nov 29, 2022 at 5:09 PM Ken Kowalski <[email protected]
>>> <mailto:[email protected]>> wrote:
>>> Hi Kyun-Seop,
>>> I would state things a little differently rather than say
>>> “devalue condition number and multi-collinearity” we should
>>> treat CN as a diagnostic and rules such as CN>1000 should NOT be
>>> used as a hard and fast rule to reject a model. I agree with
>>> Jeroen that we should understand the implications of a high CN
>>> and the impact multi-collinearity may have on the model
>>> estimation and that there are other diagnostics such as
>>> correlations, variance inflation factors (VIF), standard errors,
>>> CIs, etc. that can also help with our understanding of the
>>> effects of multi-collinearity and its implications for model
>>> development.
>>> That being said, if you have a model with a high CN and the
>>> model converges with realistic point estimates and reasonable
>>> standard errors then it may still be reasonable to accept that
>>> model. However, in this setting I would probably still want to
>>> re-run the model with different starting values and make sure it
>>> converges to the same OFV and set of point estimates.
>>> As the smallest eigenvalue goes to 0 and the CN goes to infinity
>>> we end up with a singular Hessian matrix (R matrix) so we know
>>> that at some point a high enough CN will result in convergence
>>> and COV step failures. Thus, you shouldn’t simply dismiss CN as
>>> not having any diagnostic value, just don’t apply it in a rule
>>> such as CN>1000 to blindly reject a model. The CN>1000 rule
>>> should only be used to call your attention to the potential for
>>> an issue that warrants further investigation before accepting
>>> the model or deciding how to alter the model to improve
>>> stability in the estimation.
>>> Best,
>>> Ken
>>> Kenneth G. Kowalski
>>> Kowalski PMetrics Consulting, LLC
>>> Email: [email protected] <mailto:[email protected]>
>>> Cell: 248-207-5082
>>> *From:*[email protected]
>>> <mailto:[email protected]>
>>> [mailto:[email protected]
>>> <mailto:[email protected]>] *On Behalf Of *Kyun-Seop Bae
>>> *Sent:* Tuesday, November 29, 2022 5:10 PM
>>> *To:* [email protected] <mailto:[email protected]>
>>> *Subject:* Fwd: [NMusers] Condition number
>>> Dear All,
>>> I would like to devalue condition number and multi-collinearity
>>> in nonlinear regression.
>>> The reason we consider condition number (or multi-collinearity)
>>> is that this may cause the following fitting (estimation) problems;
>>> 1. Fitting failure (fail to converge, fail to minimize)
>>> 2. Unrealistic point estimates
>>> 3. Too wide standard errors
>>> If you do not see the above problems (i.e., no estimation
>>> problem with modest standard error), you do not need to give
>>> attention to the condition number.
>>> I think I saw 10^(n – parameters) criterion in an old version of
>>> Gabrielsson’s book many years ago (but not in the latest version).
>>> Best regards,
>>> Kyun-Seop Bae
>>> On Tue, 29 Nov 2022 at 22:59, Ayyappa Chaturvedula
>>> <[email protected] <mailto:[email protected]>> wrote:
>>> Dear all,
>>> I am wondering if someone can provide references for the
>>> condition number thresholds we are seeing (<1000) etc. Also,
>>> another rule I saw when I was in graduate school is that a
>>> condition number <10^n (n = number of parameters) is OK.
>>> Personally, I am depending on correlation matrix rather than
>>> condition number and have seen cases where condition number
>>> is large (according to 1000 rule but less than 10^n rule)
>>> but correlation matrix is fine.
>>> I want to provide these for my teaching purposes and any
>>> help is greatly appreciated.
>>> Regards,
>>> Ayyappa
>>>
>>
>
>