From: VPIOTROV@PRDBE.jnj.com
Subject: [NMusers] FW: [S] The effect of default values on statistical results
Date: Wed, 19 Jun 2002 08:58:14 +0200
Dear NONMEM users,
FYI, I am forwarding a few messages I received through the S-PLUS user forum. They are primarily about GLM and GAM,
but my understanding is that the topic is relevant for us, too.
Best regards,
Vladimir
-----------------------------------------------------------------
Vladimir Piotrovsky, Ph.D.
Global Clinical Pharmacokinetics and Clinical Pharmacology
Johnson & Johnson Pharmaceutical Research & Development
B-2340 Beerse
Belgium
=======================================================
I got an interesting call from a reporter at the New York Times yesterday, who alerted me to some
research by Francesca Dominici and colleagues at Johns Hopkins
(see http://biosun01.biostat.jhsph.edu/~fdominic/research.html, although the actual
paper's link is no longer available). They have discovered that when the effect to be estimated with a GAM is
very small, the default convergence setting in the GAM routines of many statistical packages (including S-PLUS's) can
lead to biased estimates. This resulted in a downward revision of the estimates from an air pollution study when the
data were re-analysed with a stricter convergence criterion.
You can read the full story in today's New York Times, or on their website (registration required) at
http://www.nytimes.com/2002/06/05/science/05PART.html
The reporter would like to do a follow-up story focusing on other statistical studies that may have had to revise
results after relying on defaults in statistical software. If you have any similar tales or cautionary notes and
would like to send them on, I'll pass them to the reporter.
BTW, I'll add a warning to gam()'s help page about this issue. I'd also welcome any discussion
about whether you think the default convergence criterion in gam() should be reduced in general.
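To illustrate the general phenomenon (in Python rather than S-PLUS, with made-up data, and not the Dominici analysis): below is a minimal iteratively reweighted least squares fit of a Poisson GLM with a deliberately tiny true effect, run under a lax and a strict deviance-based stopping rule. The `poisson_irls` function and the tolerance values are illustrative, not any package's actual defaults.

```python
import numpy as np

def poisson_irls(X, y, rtol, max_iter=100):
    """Fit a Poisson GLM (log link) by iteratively reweighted least
    squares, stopping when the change in deviance falls below
    rtol times the deviance."""
    beta = np.zeros(X.shape[1])
    dev_old = np.inf
    for it in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response for the log link
        w = mu                           # working weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        mu = np.exp(X @ beta)
        with np.errstate(divide="ignore", invalid="ignore"):
            term = np.where(y > 0, y * np.log(y / mu), 0.0)
        dev = 2.0 * np.sum(term - (y - mu))
        if abs(dev_old - dev) < rtol * (abs(dev) + 0.1):
            return beta, it + 1
        dev_old = dev
    return beta, max_iter

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(1.0 + 0.01 * x))   # true slope is tiny

b_loose, it_loose = poisson_irls(X, y, rtol=1e-1)    # lax criterion
b_tight, it_tight = poisson_irls(X, y, rtol=1e-10)   # strict criterion
print("loose:", it_loose, "iterations, slope", b_loose[1])
print("tight:", it_tight, "iterations, slope", b_tight[1])
```

The lax rule stops after fewer iterations; in this toy case the difference in the slope is small, but the point is that only the strict run tells you the iteration has actually settled.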
# David
--
David M Smith
Product Manager, Insightful Corp, Seattle WA
Tel: +1 (206) 802 2360
Fax: +1 (206) 283 6310
======================================================================================
Thanks to everyone who responded to my query about yesterday's New York Times article: Tom Filloon, Jim Pratt, Peter England,
Brian Ripley, Rich Calaway and Bert Gunter. I summarize the responses below.
Thanks also to Trevor Hastie for his contribution, and we'll update S-PLUS in the next release to tighten the default
convergence criteria for GAM and GLM as he suggests. Thanks too to Francesca Dominici, the author of the paper
cited in the Times article, for filling me in on the background. She tells me that she has also heard from the R and
Stata developers, who are looking into this as well, and she suggests that SAS should do the same.
Since I didn't get explicit permission to post names (except in one case) and given the media interest,
I post these summaries without the traditional attribution.
-----
My opinion, for what it is worth, is that many software packages have
weak convergence criteria, since they (obviously) want to appear fast.
Statisticians, as a matter of course, should check that their procedures
have converged adequately. Whenever I do anything important using
GAMs/GLMs, I adopt stricter convergence criteria than standard, and have
been doing so for years. Unfortunately, I do not have examples where
using the standard convergence criteria would have altered a conclusion.
From memory, I think Stata continues iterating until the parameter
estimates (individually) change less than a certain tolerance, whereas
most packages rely on an overall goodness-of-fit measure such as
deviance.
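The distinction between the two stopping rules matters because an objective-based criterion can fire while the parameter estimates are still moving. A hypothetical Python sketch (gradient descent on a quartic with a very flat minimum; this has nothing to do with Stata's actual implementation):

```python
# Minimize f(b) = (b - 1)^4 by gradient descent.  Near the minimum the
# objective is extremely flat, so an objective-change rule fires long
# before a parameter-change rule does, leaving a visibly worse answer.
def descend(stop_on, tol=1e-8, b=0.0, lr=0.05, max_iter=1_000_000):
    f = lambda b: (b - 1.0) ** 4
    grad = lambda b: 4.0 * (b - 1.0) ** 3
    for it in range(1, max_iter + 1):
        b_new = b - lr * grad(b)
        if stop_on == "objective" and abs(f(b) - f(b_new)) < tol:
            return b_new, it
        if stop_on == "parameter" and abs(b - b_new) < tol:
            return b_new, it
        b = b_new
    return b, max_iter

b_obj, it_obj = descend("objective")    # deviance-style criterion
b_par, it_par = descend("parameter")    # Stata-style criterion
print(f"objective rule: b = {b_obj:.4f} after {it_obj} iterations")
print(f"parameter rule: b = {b_par:.4f} after {it_par} iterations")
```

With the same tolerance, the objective-change rule stops well short of the minimum at b = 1, while the parameter-change rule continues until the estimate itself has stabilized.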
-----
I've always thought that one should check robustness of results using
different convergence criteria. If the default criteria are set too
strict, the routine becomes unnecessarily slow in most situations. But
when the estimates or effect sizes are small, one needs the ability
to tighten the convergence criteria. This is a general cautionary note
for any iterative routine, not just GAM. I do not see it as a criticism
of any particular software (unless the current default convergence
criteria were chosen without much thought), but as a general caution
about iterative routines in all software packages.
I have no example to provide where default settings biased results.
-----
[Should default convergence criteria in gam be stricter?]
My answer is yes, and more importantly in glm too.
-----
[Finally, this from Bert Gunter summarizes things well:]
Translation 1:
Data analysis is a tricky business -- a trickier business than even tricky
data analysts sometimes think.
Translation 2:
There's no free lunch even when lunch is free.
--
David M Smith
=======================================================================================
To follow up on my previous summary, I got one further reply on
this issue from Bruce McCullough, which I include in its entirety
below. He also provides some relevant references to his papers on reliability of statistical software.
He replied:
The idea that nonlinear results are dependent upon
default options is nothing new. I made this point in
my review of S-PLUS, SAS and SPSS. I also made
the point in other reviews, where I showed that
reliance on default values is not a good idea. Of
course, I am hardly the first person to do this.
Anyhow, this strikes me as a user problem, not a software
problem. Tightening up the tolerances may be
somewhat useful, but there is no one set of criteria
that works for all problems. Hence, I think warnings
should be attached to the documentation: the user
should vary the options, switch algorithms, check the
gradient, and so on; that is, do all the things one usually does
to ensure that a local extremum has been found, and to make
sure that the solver has not simply stopped at a convenient
point that is not an extremum.
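One of those checks, verifying the gradient at the reported solution, can be sketched in a few lines of Python. The example uses Rosenbrock's function as a stand-in test problem, and the stalled point is chosen by hand for illustration:

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return g

# Rosenbrock's function: a standard test problem with a long, curved
# valley in which solvers often stall well short of the optimum.
f = lambda x: (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

x_stalled = np.array([0.8, 0.64])   # on the valley floor, not the optimum
x_optimum = np.array([1.0, 1.0])

g_stalled = np.linalg.norm(num_grad(f, x_stalled))
g_optimum = np.linalg.norm(num_grad(f, x_optimum))
print("gradient norm at stalled point:", g_stalled)
print("gradient norm at true optimum:", g_optimum)
```

A clearly nonzero gradient norm at the reported solution is a warning that the solver stopped at a convenient point rather than at an extremum.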
Better yet: supply no defaults for nonlinear procedures, so that
the user must choose all the options! :)
Bruce
"Assessing the Reliability of Statistical Software: Part I,"
The American Statistician 52(4), 358-366, 1998
"Assessing the Reliability of Statistical Software: Part II,"
The American Statistician 53(2), 149-159, 1999
"The Numerical Reliability of Econometric Software"
(with H.D. Vinod),
Journal of Economic Literature 37(2), 633-665, 1999
"Econometric Software Reliability: E-Views, LIMDEP,
SHAZAM, and TSP,"
Journal of Applied Econometrics, 14(2), 191-202, 1999
B. D. McCullough, Associate Professor
Department of Decision Sciences, Drexel University
Philadelphia, PA 19104-2875
w: 215-895-2134 f: 215-895-2907
bdmccullough@drexel.edu www.pages.drexel.edu/~bdm25
--
David M Smith