RE: Condition number
This is also discussed in my book on page 70.
The first definition is simply the ratio of the largest to
smallest eigenvalue
K = L1/Lp (51)
where L1 and Lp are the largest and smallest eigenvalues of
the correlation matrix (Jackson 1991). The second way is to
define K as
K = sqrt(L1/Lp) (52)
The latter method is often used simply because the
condition numbers are smaller. The user should be aware
how a software package computes a condition number. For
instance, SAS uses (52). For this book (51) will be used as
the definition of the condition number. Condition numbers
range from 1, which indicates perfect stability, to infinity,
which indicates perfect instability. As a rule of thumb,
log10(K) using (51) indicates the number of decimal digits
lost to round-off error during matrix inversion. Most
computers carry about 16 decimal digits in double precision,
so if the condition number is 10^4, the result will be
accurate to at most 12 (calculated as 16 - 4) decimal
digits.
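
To make the two definitions and the rule of thumb concrete, here is a minimal Python sketch using NumPy. The function name `condition_numbers` and the simulated columns are hypothetical and only for illustration; it simply takes the eigenvalues of the correlation matrix and applies (51), (52), and log10 of (51).

```python
import numpy as np


def condition_numbers(X):
    """Return (K by (51), K by (52), log10 of K by (51)) for the columns of X."""
    R = np.corrcoef(X, rowvar=False)   # p x p correlation matrix of the columns
    eig = np.linalg.eigvalsh(R)        # eigenvalues in ascending order (R is symmetric)
    k_ratio = eig[-1] / eig[0]         # definition (51): L1 / Lp
    k_sqrt = np.sqrt(k_ratio)          # definition (52): sqrt(L1 / Lp)
    digits_lost = np.log10(k_ratio)    # rule of thumb: decimal digits lost
    return k_ratio, k_sqrt, digits_lost


# Simulated data (an assumption for illustration): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 1e-2 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

k_ratio, k_sqrt, digits_lost = condition_numbers(X)
print(f"K by (51): {k_ratio:.3g}")
print(f"K by (52): {k_sqrt:.3g}")
print(f"Approximate decimal digits lost: {digits_lost:.1f}")
```

Note that a value reported under (52) has to be squared before it is compared with cut-offs stated for (51).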
It is difficult to find useful yardsticks in the literature
about what constitutes a large condition number because
many books have drastically different cut-offs. For this
book, the following guidelines will be used. For a linear
model, when the condition number is less than 10^4, no
serious collinearity is present. When the condition number
is between 10^4 and 10^6, moderate collinearity is present,
and when the condition number exceeds 10^6, severe
collinearity is present and the values of the parameter
estimates are not to be trusted. The difficulty with the use
of the condition number is that it fails to identify which
columns are collinear and simply indicates that collinearity
is present. If multicollinearity is present wherein a function
of one or more columns is collinear with a function of one
or more other columns, then the condition number will fail
to identify that collinearity. See Belsley et al. (1980) for
details on how to detect collinearity among sets of
covariates.
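
The point that the condition number flags collinearity without saying where it comes from can be illustrated with a small simulation. This is only a sketch under assumed data: a hypothetical fourth column is built as the sum of three independent columns plus a little noise, so no single pairwise correlation is extreme. Looking at the eigenvector of the smallest eigenvalue is one simple way to see which columns are involved; it is not the full procedure in Belsley et al. (1980).

```python
import numpy as np

# Simulated data (an assumption for illustration): x4 is almost exactly x1 + x2 + x3.
rng = np.random.default_rng(1)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
x4 = x1 + x2 + x3 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])

R = np.corrcoef(X, rowvar=False)          # correlation matrix of the columns
eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues and their eigenvectors
K = eigvals[-1] / eigvals[0]              # condition number by definition (51)

# Largest absolute off-diagonal (pairwise) correlation.
off_diag = np.abs(R[~np.eye(R.shape[0], dtype=bool)])
max_pairwise = off_diag.max()

# Eigenvector of the smallest eigenvalue: large loadings mark the columns
# taking part in the near-linear dependency.
weakest = eigvecs[:, 0]

print(f"Condition number (51): {K:.3g}")
print(f"Largest |pairwise correlation|: {max_pairwise:.2f}")
print("Loadings on the smallest eigenvalue:", np.round(weakest, 2))
```

In this setup the condition number is very large even though every pairwise correlation is only moderate, and all four columns load on the smallest eigenvalue.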
I also found this on Stack Exchange:
https://math.stackexchange.com/questions/2392992/matrix-condition-number-and-loss-of-accuracy
pete
Peter Bonate, PhD
Executive Director
Pharmacokinetics, Modeling, and Simulation (PKMS)
Clinical Pharmacology and Exploratory Development (CPED)
Astellas
1 Astellas Way
Northbrook, IL 60062
[email protected]
(224) 619-4901
Quote of the week –
“Dancing with the Stars” is not owned by Astellas.
-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of
Ayyappa Chaturvedula
Sent: Tuesday, November 29, 2022 9:20 AM
To: Ken Kowalski <[email protected]>
Cc: [email protected]
Subject: Re: [NMusers] Condition number
Thank you, Ken. It is very reassuring.
I have also seen discussions on other forums that treat the condition number
threshold as a function of the dimension of the problem (n). I see a
contradiction between the 10^n rule and a static >1000 threshold. Could someone
also comment on this and the 10^n rule?
Regards,
Ayyappa
> On Nov 29, 2022, at 9:04 AM, Ken Kowalski <[email protected]> wrote:
>
> Hi Ayyappa,
>
> I think the condition number was first proposed as a statistic to
> diagnose multicollinearity in multiple linear regression analyses
> based on an eigenvalue analysis of the X'X matrix. You can probably
> search the statistical literature and multiple linear regression
> textbooks to find various rules for the condition number as well as
> other statistics related to the eigenvalue analysis. For the CN<1000
> rule I typically reference the following textbook:
>
> Montgomery and Peck (1982). Introduction to Linear Regression Analysis.
> Wiley, NY (pp. 301-302).
>
> The condition number is good at detecting model instability but it is
> not very good for identifying the source. Inspecting the correlation
> matrix for extreme pairwise correlations is better suited for
> identifying the source of the instability when it only involves a
> couple of parameters. It becomes more challenging to identify the
> source of the instability (multicollinearity) when the CN>1000 but
> none of the pairwise correlations are extreme (|corr|>0.95). When
> CN>1000 we will often find several pairwise correlations that are
> moderately high (|corr|>0.7), but it may be hard to uncover a pattern
> or source of the instability without trying alternative models that
> eliminate one or more of the parameters associated with these
> moderate to high correlations.
>
> Best,
>
> Ken
>
> Kenneth G. Kowalski
> Kowalski PMetrics Consulting, LLC
> Email: [email protected]
> Cell: 248-207-5082
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ayyappa
> Chaturvedula
> Sent: Tuesday, November 29, 2022 8:52 AM
> To: [email protected]
> Subject: [NMusers] Condition number
>
> Dear all,
> I am wondering if someone can provide references for the condition
> number thresholds we commonly see (e.g., <1000). Another rule I saw
> in graduate school is that a condition number <10^n (n = number of
> parameters) is OK. Personally, I rely on the correlation matrix
> rather than the condition number, and I have seen cases where the
> condition number is large (by the 1000 rule, but less than the 10^n
> rule) while the correlation matrix is fine.
>
> I want to provide these for my teaching purposes and any help is
> greatly appreciated.
>
> Regards,
> Ayyappa
>
>