Re: OMEGA HAS A NONZERO BLOCK
From: "Lewis B. Sheiner"
Subject: Re: [NMusers] OMEGA HAS A NONZERO BLOCK
Date: Tue, 08 Oct 2002 12:11:23 -0700
Ken points out that my numbering was messed up ... Here is a corrected message.
LBS.
==================================
... Which is why you have to be Bayesian and incorporate an honest estimate not
only of what the 'science' says is the best guess, but of how sure that guess
is. And yes, if the science contradicts the data, you may well prefer to act on
the science, not the data, as presumably the science is also based on data, and
apparently enough of it to make the current data appear questionable. This is
why you cannot in principle fix parameters (or models): the strength of the
scientific knowledge (on which such choices are based) is thereby asserted to be
infinite: no amount of data can change your mind about a parameter that is
fixed. To Serge, I ask the following: What is so sacrosanct about the current
data/analysis that you are inclined to accept its point estimates even when they
are very uncertain, and fly in the face of valid past experience?
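To illustrate why fixing a parameter asserts infinite certainty, here is a minimal sketch of the textbook normal-prior, normal-data update. The function name and all numbers are hypothetical and chosen only for illustration; a real population analysis would involve far more than one scalar. As the prior spread shrinks, the posterior moves from the data estimate toward the prior guess, and at zero spread (a fixed parameter) the data have no effect at all.

```python
def posterior_mean_var(prior_mean, prior_sd, data_mean, data_se):
    """Combine a normal prior with a normal data estimate (variances known)."""
    if prior_sd == 0.0:
        # A fixed parameter is a prior with zero spread: the data are ignored.
        return prior_mean, 0.0
    w_prior = 1.0 / prior_sd ** 2  # precision of the prior ("science")
    w_data = 1.0 / data_se ** 2    # precision of the current data
    var = 1.0 / (w_prior + w_data)
    mean = var * (w_prior * prior_mean + w_data * data_mean)
    return mean, var


# Hypothetical numbers: prior guess 0.5, data estimate 0.9 with standard error 0.2.
for prior_sd in (1.0, 0.1, 0.0):
    mean, var = posterior_mean_var(0.5, prior_sd, 0.9, 0.2)
    print(f"prior_sd={prior_sd}: posterior mean={mean:.3f}, sd={var ** 0.5:.3f}")
```

The posterior mean is a precision-weighted average, so an honest (nonzero) prior spread always leaves room for enough data to change the answer.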
Now, all this is a bit too theoretical. When data appear to contradict accepted
science, we look for explanations. We do not usually decide we will accept one
or the other without a good reason to do so (the fact that these data are mine,
and those contradictory data are yours is NOT a good reason). Similarly, if we
fix a parameter in our analysis, and then, by examining residuals, etc.,
conclude that the data contradict that choice, we will change it. So we are
not all so far apart as it may seem.
However, let's get back to the real problem we have been discussing. It is the
case that the data are NOT definitive; indeed, not even suggestive about
parameters we consider 'important'. We have had the following suggestions for
what to do (until we can do another experiment that does address those
parameters):
1. Hope the problem is not really there. This is exemplified by Leonid's
remarks, which point out that with further careful investigation, we may find
that the data do indeed have something to say about the problem parameters--that
the fault was not that the problem was ill-posed--but that we were using
inadequate methods of analysis. Unfortunately, many ill-posed problems really
are fundamentally ill-posed and no analysis method will reveal what is not
there.
2. Use the estimate from the data regardless (exemplified by the choice of corr
= 0 or corr = 1 if the estimate is close to that value). This method is clearly
the easiest: It solves the problem without any additional work (such as
consulting experts, or doing additional analyses). However, it has two terrible
problems, as I have discussed: (i) It is well known that for certain simple
cases of extremely ill-posed problems (and presumably what I am about to say
generalizes to more complex cases) the actual value of the parameters estimated
depends exclusively on the realization of the noise in the particular data at
hand and NOT AT ALL on the 'true' parameter values (you can convince yourself of
this by considering the regression y = a.x1 + b.x2 + error, where, unbeknownst
to the analyst x1 = x2 -- And please don't answer me that of course the analyst
would notice that x1=x2; I sacrificed realism to make the concept clear). Not
only may the estimates be nonsense (which is harmful, we recall, not to the
current analysis, which is insensitive to the values of these parameters--which
insensitivity is causing the problem in the first place--but for extrapolation
to new conditions), but by fixing on these meaningless estimates, (ii) we are
asserting that not only are they sensible, they are also known perfectly! It
seems to me these two problems effectively rule out this choice, despite its
attractive simplicity and seeming objectivity. Again, and I stress this, the
method is ruled out only when prediction under new circumstances is the goal; it
is perfectly reasonable if only the current data are to be interpreted (but in
that case, any approach to the under-determined parameters is rational since
they should have no influence on any inferences). I have seen nothing in this
long thread of correspondence that suggests to me that either my analysis of
this choice is wrong, or that there is some advantage to it that I have
overlooked.
3. Eliminate the ill-conditioning by fixing the under-determined parameters to
reasonable values based on external evidence (science). This necessarily
involves consulting the experts, and, indeed, trusting them. This approach
dominates #2, since it eliminates problem (i). Problem (ii) persists, however,
but at least the estimates have some justification, even if we are asserting
them too strongly.
4. Proceed as in #3, but elicit from the experts an estimate of spread
(uncertainty) as well as location, and correctly incorporate this into the
analysis. The end result is the best possible description of the current state
of knowledge: past experience (from the experts) is properly balanced against
current data to yield a rational synthesis. Other than technical difficulties
(which can be formidable) this method is the most satisfying. The major
drawback with it has not yet been mentioned: it requires a statement of the most
complete possible model from the outset. However, science is all about
modifying our view of the model structure--not only of its parameters--as we
learn more. Without incredible contortions, the Bayesian approach cannot do
this, and even when it can be twisted into doing so, the technical difficulties
quickly become insurmountable, except for the simplest of problems.
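The regression example in point 2 (y = a.x1 + b.x2 + error with x1 = x2) can be tried numerically. The following is a rough sketch with simulated data; the true values, noise level, and sample size are all assumptions made for illustration. It shows that the data determine only the sum a + b, and that every split of that sum into a and b gives exactly the same fit, so whichever pair an estimation routine happens to report says nothing about the individual true values.

```python
import random

random.seed(0)
true_a, true_b = 1.0, 3.0            # hypothetical "true" values
x1 = [i / 10 for i in range(1, 21)]
x2 = list(x1)                        # unbeknownst to the analyst, x2 equals x1
y = [true_a * u + true_b * v + random.gauss(0.0, 0.5) for u, v in zip(x1, x2)]


def rss(a, b):
    """Residual sum of squares for the model y = a*x1 + b*x2."""
    return sum((yi - a * u - b * v) ** 2 for yi, u, v in zip(y, x1, x2))


# Least squares can determine only the sum s = a + b.
s_hat = sum(yi * u for yi, u in zip(y, x1)) / sum(u * u for u in x1)

# Very different (a, b) pairs with the same sum fit equally well.
candidates = [(s_hat, 0.0), (0.0, s_hat), (s_hat / 2, s_hat / 2), (s_hat + 100.0, -100.0)]
fits = [rss(a, b) for a, b in candidates]
for (a, b), fit in zip(candidates, fits):
    print(f"a={a:9.3f}  b={b:9.3f}  RSS={fit:.6f}")
```

The pair (s_hat + 100, -100) fits as well as any other, yet would be disastrous if a and b mattered separately under new conditions.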
So, we know what we should do if only it were possible (#4), and we know what we
should never do (#2). The art, as I see it, is a judicious blend of #3 and #4
-- that is, limit oneself to tractable and moderately sized models (equivalent
to fixing the parameters of larger models to boundary values yielding
scientifically reasonable approximate models, suitable to the scale of the
available data and the uses to which the model is to be put), and use
informative but non-degenerate priors for all remaining free parameters of those
models. This can be seen as using #3 'globally' and Bayesian methods (#4)
'locally'.
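As a sketch of what informative but non-degenerate priors do in the degenerate regression y = a.x1 + b.x2 + error with x1 = x2, the code below computes the posterior mode under independent normal priors on a and b. The expert guesses, their spread, and the assumption of a known residual SD are all hypothetical. With equal prior spreads, the data end up setting the sum a + b while the prior alone sets the difference a - b.

```python
import random

random.seed(1)
x = [i / 10 for i in range(1, 21)]                    # x1 = x2 = x
y = [4.0 * xi + random.gauss(0.0, 0.5) for xi in x]   # simulated, true a + b = 4

sigma = 0.5                   # residual SD, assumed known for simplicity
prior_a, prior_b = 1.0, 2.5   # hypothetical expert guesses for a and b
tau = 1.0                     # hypothetical expert SD, same for both parameters

sxx = sum(xi * xi for xi in x) / sigma ** 2
sxy = sum(xi * yi for xi, yi in zip(x, y)) / sigma ** 2
p = 1.0 / tau ** 2            # prior precision

# Normal equations for the posterior mode:
#   (sxx + p) * a + sxx * b       = sxy + p * prior_a
#   sxx * a       + (sxx + p) * b = sxy + p * prior_b
r1 = sxy + p * prior_a
r2 = sxy + p * prior_b
det = (sxx + p) ** 2 - sxx ** 2   # nonzero whenever p > 0
a_hat = ((sxx + p) * r1 - sxx * r2) / det
b_hat = ((sxx + p) * r2 - sxx * r1) / det

print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
print(f"sum={a_hat + b_hat:.3f}, difference={a_hat - b_hat:.3f}")
```

Letting tau shrink toward zero recovers approach #3 (the parameters are pinned at the expert guesses), while letting it grow without bound makes det approach zero and returns the problem to its ill-posed form.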
LBS.
--
Lewis B Sheiner, MD (lewis@c255.ucsf.edu)
Professor: Lab. Med., Biophmct. Sci.
Mail: Box 0626, UCSF, SF,CA,94143
Courier: Rm C255, 521 Parnassus,SF,CA,94122
415-476-1965 (v), 415-476-2796 (fax)