RE: order of covariate inclusion -> avoiding stepwise approaches -> abandoning exploratory analysis?
From: Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] order of covariate inclusion -> avoiding stepwise approaches -> abandoning exploratory analysis?
Date: 9/29/2003 9:58 AM
Marc, Bill, et al.,
Nice summary, but to me the key point of this discussion, going back to Pete
Bonate's original message in this thread, is that we need to be cognizant of
the effects of collinearity when we build covariate models, regardless of the
model building procedure we use. The problem with the 'kitchen sink'
approach is that we are putting too much trust in the statistical algorithm
to sort out the true covariates from the nuisance covariates that may be
highly correlated with them. The more things we toss into the 'kitchen
sink' the more likely we will pick up some of these nuisance covariates in
our model. Why is it that we are willing to make very strong mechanistic
assumptions regarding the structural model but when it comes to a list of
covariates to be investigated we are unwilling to prune the list based on
this same mechanistic reasoning? Of course we do try to make the list
somewhat plausible (e.g., we don't typically include shoe size as a
covariate, even though it's likely to be correlated with weight and may
actually explain some of the interindividual variation); we just need to be
a little more discriminating (rather than just pruning out the obvious, such
as shoe size).
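
To make the shoe size point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration (the sample size, the 0.75 exponent on weight, the noise levels, and the variable names); none of it comes from a real data set. Clearance is generated from weight alone, yet shoe size still shows a clear association with clearance simply because it tracks weight.

```python
import numpy as np

def simulate(n=500, seed=0):
    """Simulate a true covariate (weight) and a nuisance covariate (shoe size)."""
    rng = np.random.default_rng(seed)
    weight = rng.normal(70.0, 12.0, n)                 # true covariate, kg (assumed)
    shoe = 0.1 * weight + rng.normal(0.0, 0.5, n)      # nuisance covariate, driven by weight
    # log clearance depends on weight only; shoe size has no causal role here
    log_cl = 0.75 * np.log(weight / 70.0) + rng.normal(0.0, 0.2, n)
    return weight, shoe, log_cl

weight, shoe, log_cl = simulate()

# Pairwise correlations: shoe size looks "predictive" only through weight
r_weight_shoe = np.corrcoef(weight, shoe)[0, 1]
r_weight_cl = np.corrcoef(weight, log_cl)[0, 1]
r_shoe_cl = np.corrcoef(shoe, log_cl)[0, 1]

print(f"corr(weight, shoe size)   = {r_weight_shoe:.2f}")
print(f"corr(weight, log CL)      = {r_weight_cl:.2f}")
print(f"corr(shoe size, log CL)   = {r_shoe_cl:.2f}")
```

A selection algorithm that sees only these associations has no way of knowing which of the two covariates is the mechanistic one; that judgment has to come from the analyst.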
With regard to building a full model, it is a bit of a 'straw man' argument
to say full models are difficult to develop when there are 20-30 covariates.
The difficulty is not the number of covariates but the amount of independent
information contained in these 20-30 covariates. The full model approach
requires the data analyst to deal head-on with the collinearity issue. The
issue is not one of computational speed. Having procedures that are
computationally faster than what we had, say, 10 years ago doesn't mean
that we should blindly (i.e., ignoring collinearity) investigate more
covariates just because we can.
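
One way to gauge how much independent information a covariate set carries is to look at the correlation matrix before fitting anything. The sketch below is only an illustration under assumed inputs: the covariates (weight, a body-surface-area-like variable built from weight, and age) are simulated, and the helper name collinearity_diagnostics is hypothetical. It uses two standard diagnostics: the condition number of the correlation matrix and the variance inflation factors (VIFs), which are the diagonal elements of the inverse correlation matrix.

```python
import numpy as np

def collinearity_diagnostics(X):
    """Return (condition number, VIFs) for the columns of covariate matrix X."""
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of the covariates
    eigvals = np.linalg.eigvalsh(R)           # eigenvalues of a symmetric matrix
    cond = float(np.sqrt(eigvals.max() / eigvals.min()))
    vif = np.diag(np.linalg.inv(R))           # VIF_j = j-th diagonal element of R^-1
    return cond, vif

# Simulated covariates (all values are assumptions for illustration)
rng = np.random.default_rng(1)
n = 300
weight = rng.normal(70.0, 12.0, n)
bsa = 0.02 * weight + rng.normal(0.0, 0.05, n)   # nearly redundant with weight
age = rng.normal(45.0, 10.0, n)                  # generated independently

X = np.column_stack([weight, bsa, age])
cond, vif = collinearity_diagnostics(X)

print(f"condition number: {cond:.1f}")
for name, v in zip(["weight", "bsa", "age"], vif):
    print(f"VIF({name}) = {v:.1f}")
```

Large VIFs for weight and the BSA-like variable flag that the two carry essentially the same information, so the list holds fewer independent covariates than its length suggests. Which of the pair to keep is a mechanistic decision, not a statistical one.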
Ken