RE: order of covariate inclusion -> avoiding stepwise approaches
From: Jakob.Ribbing@farmbio.uu.se
Subject: RE: [NMusers] order of covariate inclusion -> avoiding stepwise approaches
Date: 9/26/2003 9:33 AM
Dear all,
A few comments on the recent discussion on stepwise covariate modelling.
We have just submitted a paper (Jakob Ribbing and E. Niclas Jonsson, Power, Selection Bias
and Predictive Performance of the Population Pharmacokinetic Covariate Model) on a simulation
study that investigates the effects of stepwise covariate modelling. In short, the conclusions
relevant to what has been discussed on NMusers are:
1. Stepwise comparison should NOT be performed on a SMALL DATASET (≤ 50 subjects)
if the purpose is predictive modelling:
   1. Weak covariates are heavily biased when selected based on a statistical criterion.
      Selection bias is caused by the selection procedure used and is not due to
      the estimation method used.
   2. Because of the heavy selection bias, a weak covariate could be expected to
      worsen the predictive performance if selected.
   3. A weak and clinically insignificant covariate cannot be separated from a clinically
      significant covariate because of this selection bias. Thus, the covariates which are
      statistically significant will also most often appear clinically significant even if they aren't!
   4. Bias correction, or selection criteria other than the p-value, may allow stepwise
      regression even on small datasets.
2. Testing correlated covariates for inclusion in the model does not harm the predictive performance
of the final model. However, a large dataset is required in order to select, with enough certainty,
the better of two highly correlated covariates.
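
The selection bias described under point 1 can be illustrated with a toy simulation. This is only a minimal sketch and not the design used in the paper: it assumes an ordinary linear regression with a single covariate instead of a population PK model, and the true effect size (0.2), the number of subjects (30), the number of simulated datasets (2000) and the selection rule (two-sided t-test at p < 0.05) are all hypothetical choices made for illustration. The estimate is unbiased when averaged over all datasets, but biased away from zero when averaged only over the datasets in which the covariate was selected.

```python
import math
import random
import statistics


def fit_slope(x, y):
    """Ordinary least squares with intercept; returns (slope, t statistic)."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


def simulate(true_slope=0.2, n_subjects=30, n_datasets=2000,
             t_crit=2.048, seed=1):
    """Simulate many small datasets with a weak covariate effect.

    t_crit is the two-sided 5% critical value of the t distribution
    with n_subjects - 2 = 28 degrees of freedom (assumed settings).
    Returns the estimates from all datasets and from the subset in
    which the covariate was "selected" (|t| > t_crit).
    """
    rng = random.Random(seed)
    all_estimates, selected_estimates = [], []
    for _ in range(n_datasets):
        x = [rng.gauss(0, 1) for _ in range(n_subjects)]
        y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
        slope, t_stat = fit_slope(x, y)
        all_estimates.append(slope)
        if abs(t_stat) > t_crit:
            selected_estimates.append(slope)
    return all_estimates, selected_estimates


if __name__ == "__main__":
    all_est, sel_est = simulate()
    print("true effect:               0.2")
    print("mean estimate, all runs:  ", round(statistics.fmean(all_est), 3))
    print("fraction selected (power):", round(len(sel_est) / len(all_est), 3))
    print("mean estimate if selected:", round(statistics.fmean(sel_est), 3))
```

With a weak effect and few subjects, only datasets that happen to overestimate the effect pass the significance threshold, so the conditional mean must exceed the threshold itself. Increasing n_subjects or true_slope in this sketch raises the power and shrinks the gap between the two means.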
To connect to what Marc said on this topic, I agree that requiring statistical significance
of covariates can SOMETIMES be harmful and counterproductive if the purpose is predictive modelling. However,
even in these cases stepwise regression could be useful for hypothesis generation. Marc suggested selecting
the covariate model based purely on prior knowledge, regardless of statistical significance in the dataset
analyzed, to estimate the covariate-model parameters. On the other hand, this prior knowledge can be partly
elicited from stepwise covariate modelling on a prior dataset. This is an appealing strategy that we will
compare to others in a current simulation study, but no results from this are available yet.
Best regards,
Jakob
Jakob Ribbing, MSc
Division of Pharmacokinetics and Drug Therapy
Department of Pharmaceutical Biosciences
Uppsala University
Box 591
SE-751 24 Uppsala
SWEDEN
Phone: +46 18 471 44 37
Mobile phone: +46 70 450 33 77
Fax: +46 18 471 40 03
Email: jakob.ribbing@farmbio.uu.se