FW: [S] The effect of default values on statistical results

From: Vladimir Piotrovskij Date: June 19, 2002 technical Source: cognigencorp.com
From: VPIOTROV@PRDBE.jnj.com
Subject: [NMusers] FW: [S] The effect of default values on statistical results
Date: Wed, 19 Jun 2002 08:58:14 +020

Dear NONMEM users,

FYI, I am forwarding a few mails I got through the S+ user forum. They are primarily about GLM and GAM, but my understanding is that the topic is relevant for us, too.

Best regards,
Vladimir
-----------------------------------------------------------------
Vladimir Piotrovsky, Ph.D.
Global Clinical Pharmacokinetics and Clinical Pharmacology
Johnson & Johnson Pharmaceutical Research & Development
B-2340 Beerse
Belgium

=======================================================

I got an interesting call from a reporter at the New York Times yesterday, who alerted me to some research by Francesca Dominici and colleagues at Johns Hopkins (see http://biosun01.biostat.jhsph.edu/~fdominic/research.html, although the actual paper's link is no longer available). They have discovered that when the effect to be estimated with a GAM model is very small, the default convergence setting in the GAM routines of many statistical packages (including S-PLUS's) can lead to biased estimates. This resulted in a downwards revision in the estimates of an air pollution study when the data were re-analysed using a stricter convergence criterion. You can read the full story in today's New York Times, or on their website (registration required) at http://www.nytimes.com/2002/06/05/science/05PART.html

The reporter would like to do a follow-up story focusing on other statistical studies that may have had to revise results after relying on defaults in statistical software. If you have any similar tales or cautionary notes and would like to send them on, I'll pass them to the reporter.

BTW, I'll add a warning to gam()'s help page about this issue. I'd also welcome any discussion about whether you think the default convergence criterion in gam() should be reduced in general.
# David

--
David M Smith
Product Manager, Insightful Corp, Seattle WA
Tel: +1 (206) 802 2360 Fax: +1 (206) 283 6310

======================================================================================

Thanks to everyone who responded to my query about yesterday's New York Times article: Tom Filloon, Jim Pratt, Peter England, Brian Ripley, Rich Calaway and Bert Gunter. I summarize the responses below.

Thanks also to Trevor Hastie for his contribution, and we'll update S-PLUS in the next release to tighten the default convergence criteria for GAM and GLM as he suggests.

Thanks too to Francesca Dominici, the author of the paper cited by the Times article, for filling me in on the background. She tells me that she has also heard from R and Stata, who are also looking into this. She mentions that SAS should do so as well.

Since I didn't get explicit permission to post names (except in one case) and given the media interest, I post these summaries without the traditional attribution.

-----

My opinion, for what it is worth, is that many software packages have weak convergence criteria, since they (obviously) want to appear fast. Statisticians, as a matter of course, should check that their procedures have converged adequately. Whenever I do anything important using GAMs/GLMs, I adopt stricter convergence criteria than standard, and have been doing so for years. Unfortunately, I do not have examples where using the standard convergence criteria would have altered a conclusion. From memory, I think Stata continues iterating until the parameter estimates (individually) change less than a certain tolerance, whereas most packages rely on an overall goodness-of-fit measure such as deviance.

-----

I've always thought that one should check robustness of results using different convergence criteria. If one sets default criteria too small, it greatly decreases the performance of the routine for most situations unnecessarily.
When estimates or effect sizes are small, one needs the ability to tighten the convergence criteria. This would be a general cautionary note for any iterative routine, not just GAM. I do not see this as a criticism of any particular software (unless the current default convergence criteria were not given considerable thought), but as a general caution for any iterative routine in all software packages. I have no example to provide where default settings biased results.

-----

[Should default convergence criteria in gam be stricter?] My answer is yes, and more importantly in glm too.

-----

[Finally, this from Bert Gunter summarizes things well:]

Translation 1: Data analysis is a tricky business -- a trickier business than even tricky data analysts sometimes think.

Translation 2: There's no free lunch even when lunch is free.

--
David M Smith
Product Manager, Insightful Corp, Seattle WA
Tel: +1 (206) 802 2360 Fax: +1 (206) 283 6310

=======================================================================================

To follow up on my previous summary, I got one further reply on this issue from Bruce McCullough, which I include in its entirety below. He also provides some relevant references to his papers on reliability of statistical software.

He replied:

The idea that nonlinear results are dependent upon default options is nothing new. I made this point in my review of S-PLUS, SAS and SPSS. I also made the point in other reviews, where I showed that reliance on default values is not a good idea. Of course, I am hardly the first person to do this.

Anyhow, this strikes me as a user problem, not a software problem. Tightening up the tolerances may be somewhat useful, but there is no one set of criteria that works for all problems. Hence, I think warnings should be attached to the documentation: The user should vary the options, switch algorithms, check the gradient, etc.
i.e., do all the things that one usually does to ensure that one has found a local extremum, to make sure that the solver has not just stopped at a convenient point that is not an extremum.

Better yet: supply no defaults for nonlinear, so that the user must choose all the options! :)

Bruce

"Assessing the Reliability of Statistical Software: Part I," The American Statistician 52(4), 358-366, 1998

"Assessing the Reliability of Statistical Software: Part II," The American Statistician 53(2), 149-159, 1999

"The Numerical Reliability of Econometric Software" (with H.D. Vinod), Journal of Economic Literature 37(2), 633-665, 1999

"Econometric Software Reliability: E-Views, LIMDEP, SHAZAM, and TSP," Journal of Applied Econometrics, 14(2), 191-202, 1999

B. D. McCullough, Associate Professor
Department of Decision Sciences, Drexel University
Philadelphia, PA 19104-2875
w: 215-895-2134 f: 215-895-2907
bdmccullough@drexel.edu
www.pages.drexel.edu/~bdm25

--
David M Smith
Product Manager, Insightful Corp, Seattle WA
Tel: +1 (206) 802 2360 Fax: +1 (206) 283 6310
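
The point made throughout this thread, that a loose deviance-based stopping rule can halt an iterative fit long before a small coefficient has settled, can be illustrated with a toy example. The sketch below is not the S-PLUS gam() algorithm and does not use the air pollution data. It is a minimal backfitting loop for a two-predictor linear model, written in Python, with made-up data: a "confounder", a nearly collinear "exposure" with a small true coefficient, and a deterministic stand-in for noise chosen orthogonal to both predictors so that the least-squares answer equals the true coefficients. All names, tolerances and values are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def backfit(x1, x2, y, tol, max_iter=100_000):
    """Fit y ~ b1*x1 + b2*x2 by backfitting, one coefficient at a time.

    Stops when the relative change in the residual sum of squares (the
    deviance of a Gaussian model) is at most `tol`.
    Returns (b1, b2, number_of_cycles).
    """
    b1 = b2 = 0.0
    rss_old = dot(y, y)  # deviance of the starting fit b1 = b2 = 0
    for cycle in range(1, max_iter + 1):
        # Update each coefficient against the partial residual of the other.
        b1 = dot(x1, [yi - b2 * v for yi, v in zip(y, x2)]) / dot(x1, x1)
        b2 = dot(x2, [yi - b1 * u for yi, u in zip(y, x1)]) / dot(x2, x2)
        resid = [yi - b1 * u - b2 * v for yi, u, v in zip(y, x1, x2)]
        rss = dot(resid, resid)
        if abs(rss_old - rss) <= tol * rss_old:
            return b1, b2, cycle
        rss_old = rss
    return b1, b2, max_iter


# Hypothetical data: two nearly collinear predictors and a small effect.
n = 100
angles = [2 * math.pi * i / n for i in range(n)]
confounder = [math.cos(t) for t in angles]
exposure = [math.cos(t) + 0.1 * math.sin(t) for t in angles]
noise = [math.cos(2 * t) for t in angles]  # orthogonal to both predictors

TRUE_B1, TRUE_B2 = 1.0, 0.05
y = [TRUE_B1 * u + TRUE_B2 * v + e
     for u, v, e in zip(confounder, exposure, noise)]

loose = backfit(confounder, exposure, y, tol=1e-3)    # assumed loose tolerance
strict = backfit(confounder, exposure, y, tol=1e-12)  # assumed strict tolerance

print("loose  (b1, b2, cycles):", loose)
print("strict (b1, b2, cycles):", strict)
```

With these assumed values the loose run should stop after a couple of cycles with an exposure coefficient near 0.001 instead of 0.05, because the deviance barely moves while the two correlated coefficients are still trading off against each other. The strict run needs several hundred cycles and ends up close to 0.05, though still slightly short of it, which is in line with the remark above that a goodness-of-fit criterion is a weaker check than one on the individual parameter estimates. The direction of the error here comes from starting at zero and is not meant to mirror the air pollution re-analysis.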