RE: order of covariate inclusion -> avoiding stepwise approaches -> abandoning exploratory analysis?
From: marc.gastonguay@snet.net
Subject: RE: [NMusers] order of covariate inclusion -> avoiding stepwise approaches -> abandoning exploratory analysis?
Date: 9/26/2003 10:52 AM
Ken, Bill, Jakob, Chuanpu and Mark,
Thanks for your feedback. As I indicated, the full model/bootstrap approach
is still an idea and there are issues to be worked out. Perhaps we can find
a compromise that addresses all of the issues you've raised. Let me try to
address the main ones.
First of all, I did not mean to denigrate statistics as a discipline, and I
should have said stepwise regression. Thanks for pointing this out, Chuanpu.
On exploratory analysis and "Knowing" the model:
Of course, we never really know the model, and I do think that we should use
the usual goodness of fit diagnostics to compare possible alternatives and
guide the development of the structural model, while keeping prior
information in mind. In building the full covariate model, I suggested that
covariates should be included based on prior scientific knowledge AND your
interest in exploring a particular covariate effect. This does not assume
that you know the model ahead of time. If you are interested enough to do an
exploratory analysis on a particular covariate, you should include it in the
full model. I don't think we should proceed with the "kitchen sink"
approach, though. As has been mentioned before, you've got to be careful
about how you construct the full model so that you avoid problems with
correlated/collinear covariates (especially when the data set is small). You
may even need a few alternative full models to assess the form of the
covariate-parameter relationships (perhaps comparing linear and nonlinear
covariate relationships) in order to arrive at a stable full model. This is
where graphical exploration of the form of the covariate-parameter
relationship can be useful. We don't need stepwise regression to do any of
this.
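As a rough illustration of the caution about correlated covariates, here is a
minimal Python sketch that flags highly correlated covariate pairs before a
full model is constructed. The function name, the covariate names (WT, BSA,
AGE) and the 0.7 cutoff are all assumptions made for the example, not a
recommended rule.

```python
import numpy as np

def correlated_pairs(covariates, names, threshold=0.7):
    """Return covariate pairs whose absolute Pearson correlation is at or
    above the threshold. The threshold value is an illustrative assumption."""
    # Columns are covariates, rows are subjects.
    corr = np.corrcoef(covariates, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged

if __name__ == "__main__":
    # Hypothetical data: BSA is built from WT, so the two are collinear.
    wt = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
    bsa = 0.02 * wt + 0.5
    age = np.array([30.0, 60.0, 25.0, 70.0, 40.0])
    data = np.column_stack([wt, bsa, age])
    for a, b, r in correlated_pairs(data, ["WT", "BSA", "AGE"]):
        print(f"{a} and {b} are highly correlated (r = {r:.2f})")
```

A flagged pair would be a reason to keep only one of the two covariates in a
given full model, or to pose alternative full models, rather than a reason to
fall back on stepwise selection.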
On parsimony:
I agree that there are certainly advantages to arriving at a parsimonious
model. One of the things that is overlooked in a parsimonious model,
however, is why a particular covariate was excluded. Was it because the
covariate truly has no effect on the parameter of interest or was it
excluded because the data are not informative about this potential covariate
effect? A full model with point and interval estimates does address this
issue.
You could envision an approach where the full model is developed and
confidence intervals for all parameters are obtained. Then, decisions about
moving to a more parsimonious model are made based on the clinical relevance
of the estimated covariate effects: covariates having little or no impact
are dropped from the model. This preserves the assessment of why a
covariate is "insignificant", while allowing a more parsimonious model.
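To make the idea concrete, here is a minimal Python sketch, not a definitive
implementation. It assumes a simple linear model on the log scale fitted by
least squares as a stand-in for a real population model, a nonparametric
bootstrap with percentile intervals, and a hypothetical "no clinically
relevant effect" range of 0.8 to 1.25 for the effect ratio. The function
names, the simulated data and that range are all assumptions for the example.

```python
import numpy as np

def bootstrap_ci(X, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for least-squares coefficients."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        estimates[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    lower = np.percentile(estimates, 100 * alpha / 2, axis=0)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

def classify_effect(ratio_lo, ratio_hi, no_effect=(0.8, 1.25)):
    """Compare an interval for an effect ratio with an assumed range of
    clinically irrelevant effects (the default range is hypothetical)."""
    lo_bound, hi_bound = no_effect
    if lo_bound <= ratio_lo and ratio_hi <= hi_bound:
        return "no clinically relevant effect"
    if ratio_hi < lo_bound or ratio_lo > hi_bound:
        return "clinically relevant effect"
    return "data not informative"

if __name__ == "__main__":
    # Simulated example: log(CL) depends on weight but not on age.
    rng = np.random.default_rng(1)
    n = 100
    log_wt = np.log(rng.uniform(50, 110, n) / 70)
    age = (rng.uniform(20, 80, n) - 50) / 10
    X = np.column_stack([np.ones(n), log_wt, age])
    y = 1.0 + 0.75 * log_wt + 0.0 * age + rng.normal(0, 0.2, n)
    lower, upper = bootstrap_ci(X, y)
    # Effect ratios for a chosen covariate contrast (column index, contrast).
    contrasts = {"WT (105 vs 70 kg)": (1, np.log(1.5)), "AGE (+10 years)": (2, 1.0)}
    for label, (col, delta) in contrasts.items():
        lo, hi = np.exp(lower[col] * delta), np.exp(upper[col] * delta)
        print(label, round(lo, 2), round(hi, 2), classify_effect(lo, hi))
```

The three-way classification is the point of the sketch: an interval that
lies inside the no-effect range supports dropping the covariate, while an
interval that straddles the range says only that the data are not
informative about that effect.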
I would also suggest that you investigate any remaining trends in covariates
that were not included in the full model as part of the model evaluation
step. If the model performs poorly with respect to a particular covariate,
you may need to go back and pose a new full model.
Marc