RE: covariate selection question
From: mark.e.sale@gsk.com
Subject: RE: [NMusers] covariate selection question
Date: Wed, 18 Jan 2006 15:45:40 -0500
Joga, the rant continues.
Thanks for your insight; the view you describe is consistent with my personal experience
with the FDA. But I think it is important to point out the risk associated with that view.
Not that I disagree (I entirely agree), but the risk of this approach needs to be
pointed out. The risk is a high degree of inertia in our understanding. If we only ask
questions that are based on what we already believe, it will greatly impede progress. I
certainly agree (as I believe you and Mats are saying) that "data dredging" can only yield
hypotheses, not conclusions. But it is reasonable to ask the questions, even questions that seem
silly based on our current understanding of biology. May I point out:
1. H. pylori and ulcers (silly hypothesis, turned out to be true)
2. PVCs and sudden cardiac death (everyone knew that preventing PVCs would reduce sudden death;
turned out not to be true)
3. Beta carotene, vitamin E, and cancer (lots of retrospective controlled data, a good biological
explanation; turned out not to be true)
The list of hypotheses that were inconsistent with the current understanding of biology but turned
out to be true is very long.
A good Bayesian, I think, never accepts a hypothesis, but only assigns a probability that it is true,
while assigning some non-zero probability to many other hypotheses, even the silly ones. In this way,
as data accumulate, we could, in theory, eventually come to favor hypotheses that are currently viewed as
silly but are in fact correct. Unfortunately, human beings have a remarkably limited ability to
entertain multiple hypotheses; in fact, we can rarely entertain more than one at a time (this
has actually been studied, and no one can entertain more than about three at once). We adopt one
hypothesis and decide whether it is true (invariably we decide that it is; otherwise we wouldn't have a
grant to write). Only if that hypothesis turns out not to be true do we look for another. Importantly,
we also have a remarkable ability to dismiss data that are inconsistent with our current view of the
world, which is also documented (e.g., events over the past few years in certain countries in the Middle East).
It has long been suspected that Gregor Mendel discarded lots of data that were inconsistent with his hypothesis
about genetics: his statistics were far too perfect to be random, and every experiment sorted nearly exactly
as it should. The result of these two effects is a high degree of persistence of hypotheses and conclusions,
regardless of whether they are correct.
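To make the Bayesian point concrete, here is a minimal Python sketch (not part of the original discussion; the hypothesis names, priors, assumed response rates, and data are all hypothetical). Each hypothesis is treated as an assumed response rate for a binomial outcome, and the posterior is prior times likelihood, renormalized. A "silly" hypothesis that starts with a small but non-zero prior can end up dominant once the data favor it, whereas one given a prior of exactly zero can never recover, no matter what the data say.

```python
from math import comb


def posterior(hypotheses, successes, trials):
    """Bayesian update over competing hypotheses for binomial data.

    hypotheses maps a name to (prior probability, assumed success rate).
    Returns a dict mapping each name to its posterior probability.
    """
    weights = {}
    for name, (prior, rate) in hypotheses.items():
        # Binomial likelihood of the observed data under this hypothesis.
        likelihood = comb(trials, successes) * rate**successes * (1 - rate) ** (trials - successes)
        weights[name] = prior * likelihood
    total = sum(weights.values())
    # Normalize so the posterior probabilities sum to 1.
    return {name: w / total for name, w in weights.items()}


# Hypothetical competing hypotheses about a response rate: (prior, assumed rate).
hypotheses = {
    "favored": (0.90, 0.6),      # what "everyone knows"
    "alternative": (0.09, 0.5),
    "silly": (0.01, 0.2),        # small but non-zero prior
    "dismissed": (0.00, 0.2),    # same rate as "silly", but ruled out a priori
}

# Made-up data: 4 responders out of 20 subjects.
post = posterior(hypotheses, successes=4, trials=20)
for name, prob in post.items():
    print(f"{name:12s} prior={hypotheses[name][0]:.2f}  posterior={prob:.3f}")
```

With these made-up numbers, the "silly" hypothesis rises from a 1% prior to roughly three quarters of the posterior probability, while "dismissed", which predicts exactly the same response rate, stays at zero because it was never given any prior weight.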
I don't have a solution. Building models without a basis in an understanding of biology is silly, and will
without question lead to many wrong conclusions. But not asking questions just because our current view
of biology would reject them as silly is a problem as well. As usual, Bayesians have the answer, if only
we were mentally capable of objectively entertaining 10 competing hypotheses at the same time. In the
US at least, the NIH funding system insists on one hypothesis, forcing researchers to decide what they
believe and then defend it to the death, rather than keeping an open mind.
Mark Sale M.D.
Global Director, Research Modeling and Simulation
GlaxoSmithKline
919-483-1808
Mobile: 919-522-6668