RE: covariate selection question
From: "Jakob Ribbing" Jakob.Ribbing@farmbio.uu.se
Subject: RE: [NMusers] covariate selection question
Date: Tue, 17 Jan 2006 15:47:48 +0100
Dear Joern, Mike and others,
I would agree with Mike. To answer Joern's question on how to interpret the results of the
stepwise selection: as far as the p-value/LRT can guide you in selecting the covariate
model, you should keep this particular covariate in the model. Just be sure to use a
p-value/likelihood-ratio cutoff that is adjusted for the number of parameter-covariate
relations that you have tested (or otherwise explored).
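As a rough illustration of such an adjustment, the sketch below converts a drop in OFV into a nominal p-value and computes the OFV drop needed for inclusion. It assumes that the OFV difference between nested models is approximately chi-square distributed with 1 degree of freedom per added covariate parameter, and it uses a Bonferroni correction (alpha divided by the number of relations explored). Bonferroni is only one possible way to adjust; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def lrt_p_value(delta_ofv: float) -> float:
    """Nominal p-value for an OFV drop when one parameter is added.

    Assumes delta_ofv ~ chi-square(1), whose survival function is
    P(X > x) = erfc(sqrt(x / 2)).
    """
    if delta_ofv <= 0:
        return 1.0
    return math.erfc(math.sqrt(delta_ofv / 2.0))


def critical_delta_ofv(alpha: float, n_tested: int = 1) -> float:
    """OFV drop required at a Bonferroni-adjusted level alpha / n_tested (1 df).

    A chi-square(1) variable is a squared standard normal, so the cutoff is
    z**2 with z the upper (alpha_adj / 2) normal quantile.
    """
    alpha_adj = alpha / n_tested
    z = NormalDist().inv_cdf(1.0 - alpha_adj / 2.0)
    return z * z


if __name__ == "__main__":
    print(critical_delta_ofv(0.05))               # about 3.84, single test
    print(critical_delta_ofv(0.05, n_tested=10))  # about 7.88, 10 relations explored
```

The more relations you explore, the larger the OFV drop a single covariate needs before it is accepted.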
To judge whether the covariate relation makes biological sense, it may help to understand
why the covariate was not significant at first but became so later. There could be a number
of reasons for covariate selection to behave this way:
1. Including a very influential covariate relation may make the picture clearer, so that
other, weaker relations emerge from the mist because the random noise is reduced. For
example, including CRCL on CL for a drug eliminated mainly by renal filtration would reduce
the (random) variability in CL, so that less important covariate relations could be found.
2. One covariate relation could be masking another. Once the first relation is included
in the model, the other becomes statistically significant. This behaviour is due to
correlation between covariates that both influence the same structural-model parameter
(or to correlation between the estimates of two structural-model parameters). An example
could be a drug with higher CL in females (compared to males of the same size). This
relation may be masked by males generally being larger than females (and size is often
an important covariate). Including the one covariate would make inclusion of the other
statistically significant. Another example would be model misspecification: including a
linear covariate relation where another relation would have been more appropriate could
cause a second covariate to compensate. For example, if WT is included instead of lean
body weight, BMI may become statistically significant to make up for the difference.
3. Random chance. If the LRT gave almost the same result when adding the covariate to the
base model and to the later model (e.g. the nominal p-value changed from 0.011 to 0.009),
this could be seen as just a random change. If the p-value required for inclusion were
0.01, the covariate would be significant in the later test but not in the first. This is a
problem with all selection methods that either include a covariate fully (according to the
maximum-likelihood estimate) or not at all. On the other hand, getting rid of all the
"maybe" covariates may provide the best big picture of what is important. Further, using
the LRT often translates into a p-value, whatever that will tell you :>)
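The masking mechanism in point 2 (a sex effect on CL hidden by body size) can be illustrated with a small simulation. Everything here is hypothetical: the weights, the allometric exponent of 0.75, the log-scale female effect of 0.2 (about 22% higher CL) and the noise level are made up, and ordinary least squares on log CL stands in for a full NONMEM fit. It is a sketch of the idea, not the covariate-selection procedure itself.

```python
import math
import random


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(p):
            if k != col:
                f = a[k][col] / a[col][col]
                a[k] = [ak - f * ac for ak, ac in zip(a[k], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def simulate(n=2000, seed=1):
    """Return (sex effect without WT, sex effect with log WT) on log CL."""
    rng = random.Random(seed)
    x_sex, x_both, log_cl = [], [], []
    for _ in range(n):
        female = 1.0 if rng.random() < 0.5 else 0.0
        # hypothetical: females weigh 60 kg and males 80 kg on average
        wt = rng.gauss(60.0 if female else 80.0, 8.0)
        lwt = math.log(wt / 70.0)
        # hypothetical true model: allometric WT effect plus +0.2 (log scale) for females
        y = math.log(5.0) + 0.75 * lwt + 0.2 * female + rng.gauss(0.0, 0.2)
        x_sex.append([1.0, female])
        x_both.append([1.0, lwt, female])
        log_cl.append(y)
    return ols(x_sex, log_cl)[1], ols(x_both, log_cl)[2]


sex_alone, sex_adjusted = simulate()
print(f"sex effect, WT not in model: {sex_alone:+.3f}")
print(f"sex effect, WT in model:     {sex_adjusted:+.3f}")
```

With these settings the sex-only estimate should come out close to zero, because the higher CL of females is roughly cancelled by their lower weight, while the estimate adjusted for WT should be close to the simulated 0.2.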
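To see how small a difference separates nominal p-values just either side of a 0.01 cutoff, as discussed in point 3, one can convert them back into OFV drops. This sketch assumes, as is usual for the LRT, that the OFV drop for one added parameter is approximately chi-square distributed with 1 degree of freedom; the helper name is hypothetical.

```python
from statistics import NormalDist


def delta_ofv_for_p(p: float) -> float:
    """OFV drop that gives nominal p-value p for one added parameter.

    Assumes delta OFV ~ chi-square(1), i.e. a squared standard normal.
    """
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


for p in (0.011, 0.010, 0.009):
    print(f"p = {p:.3f}  ->  delta OFV = {delta_ofv_for_p(p):.2f}")
```

The three OFV drops come out at roughly 6.5, 6.6 and 6.8, so whether the covariate clears the cutoff can hinge on a change of well under one OFV unit.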
Jakob