RE: covariate selection question
From: "A.J. Rossini" <blindglobe@gmail.com>
Subject: RE: [NMusers] covariate selection question
Date: Fri, 20 Jan 2006 21:13:32 +0100
This isn't quite true; it's heavily context-dependent. To clarify (and
I'm nitpicking here):
Any finite dataset (datasets!) could be reasonably generated by a
number of models, not necessarily the ones you used. The larger
the individual dataset (and the more independent datasets taken), the
better the chance that you actually rediscover the model that you
originally used for generation. Of course, in a sense you are
cheating, since you have a good idea of how to restrict the space of
candidate models in order to "rediscover" it.
While we like to simulate, we have to remember that just as the same
model can generate many realized datasets, the same dataset can
originate from a number of models, and this has implications.
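To make that concrete, here's a toy numpy sketch (the dataset, model
forms, and all numbers are invented for illustration): one small
dataset is generated from an exponential decay model, and then both
the "true" exponential and a structurally wrong quadratic are fit to
it. Both fit well, so the data alone can't tell you which model
generated them.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small dataset actually generated by an exponential decay model.
x = np.linspace(0.1, 2.0, 8)
y = 5.0 * np.exp(-1.2 * x) + rng.normal(0.0, 0.05, x.size)

# Candidate model A: the exponential family that generated the data,
# fit by log-linear least squares (valid here since all y > 0).
A = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
fit_exp = np.exp(b[0] + b[1] * x)

# Candidate model B: a quadratic polynomial -- structurally wrong,
# never used to generate anything.
fit_poly = np.polyval(np.polyfit(x, y, 2), x)

rss_exp = float(np.sum((y - fit_exp) ** 2))
rss_poly = float(np.sum((y - fit_poly) ** 2))
print(f"RSS, exponential fit: {rss_exp:.4f}")
print(f"RSS, quadratic fit:   {rss_poly:.4f}")
```

With only eight noisy points, both residual sums of squares come out
small; a goodness-of-fit criterion alone won't separate the candidate
models, which is exactly the non-identifiability point above.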
And back to the original point: stepwise procedures are notoriously
awful, failing to preserve type I error in the final model, i.e. they
don't lead to sensible decisions based on the model unless you are
lucky. Regularization methods for variable selection (where you
slowly increase the amount that covariates are allowed to contribute
and examine the selection paths) seem to perform reasonably well for
automatic variable selection in linear and generalized linear
(categorical-data) regression. I thought I'd seen a recent paper on
this for nonlinear regression, but not yet for mixed effects; I'm not
sure how you'd balance fixed and random effects in this case.
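For anyone unfamiliar with the path idea, here is a minimal sketch in
numpy (a naive coordinate-descent LASSO on a made-up linear-regression
example; the data, penalty values, and function names are all
invented, and a real analysis would use a proper solver): you walk the
penalty from heavy to light and watch which covariates become active.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Naive coordinate-descent LASSO (illustrative only, not optimized)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding covariate j, then soft-threshold.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
# Only covariates 0 and 1 truly matter; 2..5 are noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.5, n)
y = y - y.mean()

# Walk the regularization path from heavy to light penalty and watch
# which covariates are selected (nonzero) at each step.
for lam in [400.0, 200.0, 100.0, 25.0, 1.0]:
    beta = lasso_cd(X, y, lam)
    active = np.flatnonzero(np.abs(beta) > 1e-8)
    print(f"lambda={lam:6.1f}  active covariates: {active.tolist()}")
```

The informative covariates enter the active set while the penalty is
still heavy; the noise covariates only creep in as the penalty
vanishes, which is what makes the selection path readable. How to set
up an analogous penalty over both fixed and random effects in a mixed
model is the open question above.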