RE: Describing variability
From: "Kowalski, Ken"
Subject: RE: [NMusers] Describing variability
Date: Wed, 2 Apr 2003 12:43:34 -0500
Leonid,
You wrote:
>> 3. Even if $COV step converged, this is not a guarantee that the model is
>> correct, since it may be ill-conditioned anyway.
With real data there is no way to know if a model is correctly specified
(hence, the famous statement from Box: "All models are wrong but some are
useful"). Please note, however, that an ill-conditioned model does not imply
that the model is wrong. In a previous message I gave the example of a
simple Emax model to describe a dose-response relationship. Assuming that
the Emax model is correct, for a given set of data the Emax model may still
be ill-conditioned if we study too narrow a dose range such that we can't
get reliable estimates of the Emax and ED50. In this case, although the
model is correctly specified we need to be cautious in interpreting the
estimates of Emax and ED50 from an ill-conditioned model fit. In so doing,
if we can make the assessment that the estimates we obtained appear
reasonable, then certainly we might use them. This is the practical aspect
that most of you are willing to rely on when you accept such
over-parameterized models...which is fine provided that you are willing to
make that assessment that the estimates you obtained are indeed reasonable.
I just wonder if we will always know whether our estimates are reasonable.
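
As a rough illustration of the Emax point above, here is a minimal Python
sketch. It is not taken from the original message: the parameter values
(Emax = 100, ED50 = 50), the two dose designs, and the use of the linearized
covariance inv(J'J) as a stand-in for the $COV output are all assumptions
made for the example. It compares the approximate correlation between the
Emax and ED50 estimates for a narrow dose range and for a wide one.

```python
import numpy as np

def emax_param_correlation(doses, emax=100.0, ed50=50.0):
    """Approximate correlation between the Emax and ED50 estimates.

    Assumes the model E = emax * d / (ed50 + d) with additive error and
    uses the linearized covariance, which is proportional to inv(J'J).
    The parameter values are hypothetical.
    """
    d = np.asarray(doses, dtype=float)
    d_emax = d / (ed50 + d)                  # partial derivative w.r.t. Emax
    d_ed50 = -emax * d / (ed50 + d) ** 2     # partial derivative w.r.t. ED50
    J = np.column_stack([d_emax, d_ed50])
    cov = np.linalg.inv(J.T @ J)             # covariance up to a sigma^2 factor
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical designs: all doses far below ED50 vs. doses spanning ED50.
narrow = emax_param_correlation([1, 2, 5, 10])
wide = emax_param_correlation([5, 25, 50, 200, 1000])

print(f"narrow dose range: correlation = {narrow:.4f}")
print(f"wide dose range:   correlation = {wide:.4f}")
```

Under these assumptions the narrow design gives a correlation much closer
to 1 than the wide design, even though the same (correct) model is used in
both cases.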
>> 3. Do not accept the model if the relative standard error of estimate or
>> variability is, say, more than 100%.
This is certainly a diagnostic one could look at, but there are others that
can help diagnose the degree and nature of the ill-conditioning. For a
successful $COV step, the PRINT=E option will report the eigenvalues of
the correlation matrix sorted from smallest to largest. The ratio of the
largest to the smallest eigenvalue is often referred to as the condition
number and is a measure of the degree of ill-conditioning. Montgomery & Peck,
Introduction to Linear Regression Analysis, Wiley, 1982, pp. 277-278,
suggest that a condition number exceeding 1000 is an indication of severe
ill-conditioning. Inspection of the correlation matrix of the estimates can
help diagnose the nature of the ill-conditioning. In the Emax example I
gave above, the ill-conditioning would show up as a pairwise correlation
between the estimates of Emax and ED50 very close to 1. Bates & Watts,
Nonlinear Regression Analysis and its Applications, Wiley, 1988, pp. 90-91,
suggest that correlations exceeding 0.99 (in absolute value) should be a
cause for concern regarding ill-conditioning.
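
The two checks described above can be sketched in Python as follows. This
is only an illustration: the 2x2 correlation matrix is a made-up example
rather than NONMEM output, and the function names are hypothetical. The
thresholds of 1000 and 0.99 are the ones quoted from the references above.

```python
import numpy as np

def condition_number(corr):
    """Ratio of the largest to the smallest eigenvalue of a correlation matrix."""
    eig = np.linalg.eigvalsh(np.asarray(corr, dtype=float))  # ascending order
    return eig[-1] / eig[0]

def max_abs_correlation(corr):
    """Largest off-diagonal correlation in absolute value."""
    c = np.asarray(corr, dtype=float)
    off_diagonal = c[~np.eye(c.shape[0], dtype=bool)]
    return np.max(np.abs(off_diagonal))

# Hypothetical correlation matrix of two parameter estimates (e.g. Emax, ED50).
corr = np.array([[1.0, 0.999],
                 [0.999, 1.0]])

cn = condition_number(corr)
r_max = max_abs_correlation(corr)

print(f"condition number = {cn:.0f}, exceeds 1000: {cn > 1000}")
print(f"max |correlation| = {r_max:.3f}, exceeds 0.99: {r_max > 0.99}")
```

For a 2x2 correlation matrix with correlation r the eigenvalues are 1 - r
and 1 + r, so the condition number here is 1.999/0.001, or about 1999.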
Ken