RE: order of covariate inclusion -> avoiding stepwise approaches
From: marc.gastonguay@snet.net
Subject: RE: [NMusers] order of covariate inclusion -> avoiding stepwise approaches
Date: 9/25/2003 9:10 PM
Dear Pete, Leonid, Ken & others,
At the risk of complicating this already controversial topic, I'll throw my
two cents in...
For all the reasons that Ken mentioned and more, I think we should be very
careful with stepwise approaches (see this website for more reasons "Why
stepwise regression is dumb":
http://cvu.strath.ac.uk/HyperNews/get/guss-fprt/9.html). Anyway, I've been
thinking about this for a while and I'm not sure that we should even bother
with stepwise covariate model building.
We typically go through this exercise to identify "statistically
significant" covariates and the most parsimonious model. When interpreting
the modeling results, we usually examine the "significant" covariate effects
and make some judgement about the clinical relevance of those covariates,
usually explaining away those statistically significant effects that do not
produce a clinically relevant change in the parameter(s). We also often
state that covariate effects that were not statistically significant have no
effect on the parameter in question (which is not entirely accurate).
Shouldn't we just focus on clinical relevance and forget about significance
of covariate effects? After all, the methods we use to assess significance
with the likelihood ratio test are usually wrong (1 - 3) and inappropriate
(due to the multiple comparisons and retrospective nature of the analysis).
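
To put a rough number on the multiple comparisons concern, here is a small sketch. It assumes k independent tests, each run at a nominal level alpha, which the tests in a stepwise search are not, so treat it only as an illustration of the direction of the problem. The function name is made up for this example.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


# Screening more candidate covariates raises the chance of a spurious "hit".
for k in (1, 5, 10, 20):
    print(f"{k:2d} tests at alpha=0.05 -> {familywise_error(0.05, k):.3f}")
```

Under that independence assumption, screening ten covariates at a nominal 0.05 level already gives roughly a 40% chance of at least one false positive.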
Let's consider an approach where one builds a full covariate model based on
prior scientific knowledge, or particular interest in a set of covariates.
This full model must be carefully & thoughtfully constructed to avoid highly
correlated/collinear covariates, but it is quite possible to create such a
model that will still converge.
Inference based on this full model is conducted not via stepwise regression
and the likelihood ratio test, but by estimating model parameters and a
measure of their uncertainty (bootstrap 95% confidence intervals, for
example). The expected clinical impact of covariate effects is then
evaluated given the parameter estimates and the uncertainties around these
estimates. In addition, conclusions about covariates that had relatively
little impact on model parameters can be made with some understanding of how
precisely these small ("insignificant") effects were estimated. So instead
of saying that a covariate has no effect on a model parameter, one can
assess if the lack of effect is actually due to the lack of a relationship,
or if the finding is due to insufficient data. I've also read that this full
model approach leads to standard errors that are more accurate than those
from a stepwise regression approach, which results in overly optimistic
standard errors (4).
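
A minimal sketch of the full model/bootstrap idea follows. It is not a NONMEM mixed-effects analysis: it assumes simulated individual log-clearance values and fits the full model by ordinary least squares, purely to show the inference step. The covariates, the simulated effect sizes, and the 0.8 to 1.25 "no-effect" range for the clearance ratio are all hypothetical choices made for this example.

```python
import numpy as np

NO_EFFECT_RANGE = (0.8, 1.25)  # hypothetical bounds on the clearance ratio


def fit_full_model(X, y):
    """Least-squares fit of the full covariate model; returns the coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bootstrap_ci(X, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence limits, resampling subjects with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        estimates[b] = fit_full_model(X[idx], y[idx])
    alpha = 1.0 - level
    lower = np.percentile(estimates, 100 * alpha / 2, axis=0)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


def classify(ratio_lo, ratio_hi, bounds=NO_EFFECT_RANGE):
    """Judge a covariate effect from the CI of the clearance ratio, not a p-value."""
    if bounds[0] <= ratio_lo and ratio_hi <= bounds[1]:
        return "no clinically relevant effect"
    if ratio_hi < bounds[0] or ratio_lo > bounds[1]:
        return "clinically relevant effect"
    return "inconclusive: data insufficient"


def simulate(n=200, seed=1):
    """Simulated data: log(CL) depends on weight only; age and sex have no effect."""
    rng = np.random.default_rng(seed)
    log_wt = np.log(rng.uniform(50, 110, n) / 70.0)
    log_age = np.log(rng.uniform(20, 80, n) / 40.0)
    sex = rng.integers(0, 2, n).astype(float)
    X = np.column_stack([np.ones(n), log_wt, log_age, sex])
    true_coef = np.array([np.log(10.0), 0.75, 0.0, 0.0])
    y = X @ true_coef + rng.normal(0.0, 0.3, n)
    return X, y


X, y = simulate()
lower, upper = bootstrap_ci(X, y)

# Each contrast: (column index, positive multiplier on the log scale).
contrasts = {
    "WT 110 vs 70 kg": (1, np.log(110.0 / 70.0)),
    "AGE 80 vs 40 y": (2, np.log(2.0)),
    "SEX 1 vs 0": (3, 1.0),
}
for label, (j, scale) in contrasts.items():
    ratio_lo, ratio_hi = np.exp(lower[j] * scale), np.exp(upper[j] * scale)
    print(f"{label}: CL ratio 95% CI [{ratio_lo:.2f}, {ratio_hi:.2f}] -> "
          f"{classify(ratio_lo, ratio_hi)}")
```

The point of the three-way classification is that a small estimated effect with a wide interval is reported as inconclusive rather than as "no effect", which is the distinction a stepwise p-value cannot make.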
The other benefit of this approach is that once the full model has been
developed, computations are spent on getting estimates of parameter
precision (bootstrap) rather than a lengthy stepwise regression process.
Of course there are some practical challenges with this idea, and I have to
admit that I still routinely use stepwise backward elimination from a full
model as the primary covariate model building tool. I'm working on building
a set of case studies to convince myself that the full model/bootstrap
approach is sound.
Thanks in advance for your thoughts.
Marc Gastonguay
References:
(1) Wahlby U, Jonsson EN, Karlsson MO. Assessment of actual significance
levels for covariate effects in NONMEM. J Pharmacokinet Pharmacodyn 2001;
28(3):231-252.
(2) Wahlby U, Bouw MR, Jonsson EN, Karlsson MO. Assessment of type I error
rates for the statistical sub-model in NONMEM. J Pharmacokinet Pharmacodyn
2002; 29(3):251-269.
(3) Gobburu JV, Lawrence J. Application of resampling techniques to
estimate exact significance levels for covariate selection during nonlinear
mixed effects model building: some inferences. Pharm Res 2002; 19(1):92-98.
(4) Altman DG, Andersen PK. Bootstrap investigation of the stability of a
Cox regression model. Statistics in Medicine 1989; 8:771-783.