Re: Centering (was Re: Missing covariates)
From: Alan Xiao <Alan.Xiao@cognigencorp.com>
Subject: Re: Centering (was Re: Missing covariates)
Date: Mon, 30 Jul 2001 17:25:53 -0400
Dear Juan and All,
What we have discussed so far is really about how to handle the question
mentioned in your last sentence: "if the male to female ratio is not
equal to 1:1, it's necessary to modify appropriately the values for
codification with deviations from mean coding, otherwise the intercept
will be affected and won't represent the weigh average of male and
female."
Leonid proposed using -1 and 1 for gender. That looks as if it gets
around the problem (of defining the mean value for a dichotomous
covariate). But the problem is actually still there: only in this
particular case (equal numbers of males and females) is the mean value
exactly zero, so that it drops out of the equation automatically.
Now, back to the question in your last sentence: if we want the
intercept to "represent the weight average of male and female", should
we use a fractional value as the mean of gender when the numbers of
males and females differ (e.g., 30 males and 70 females)? Or is there
another method?
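To make the 30/70 question concrete, here is a minimal sketch. It uses
ordinary least squares on simulated data rather than a NONMEM model, and
the names (fit_line, sex, y), the 0 = male / 1 = female coding, and the
simulated group values are all assumptions made for illustration. It
compares the intercept from uncentered -1/+1 coding with the intercept
from 0/1 coding centered at its fractional mean of 0.7.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)

# Hypothetical data: 30 males (coded 0) and 70 females (coded 1).
sex = np.array([0.0] * 30 + [1.0] * 70)
# Hypothetical response (e.g. a clearance-like quantity) that differs by sex.
y = np.where(sex == 1.0, 8.0, 10.0) + rng.normal(0.0, 1.0, size=sex.size)

mean_m = y[sex == 0.0].mean()
mean_f = y[sex == 1.0].mean()

# Uncentered -1/+1 coding: the intercept is the UNWEIGHTED average
# of the two group means, whatever the male:female ratio is.
a_pm, b_pm = fit_line(2.0 * sex - 1.0, y)

# 0/1 coding centered at its mean (0.7 here): the intercept is the
# WEIGHTED average of the group means, i.e. the overall mean of y.
a_c, b_c = fit_line(sex - sex.mean(), y)

print("unweighted average:", (mean_m + mean_f) / 2.0, "intercept:", a_pm)
print("weighted average:  ", 0.3 * mean_m + 0.7 * mean_f, "intercept:", a_c)
```

In this simple linear setting, centering at the fractional mean is what
makes the intercept a weighted average; whether the same holds in a
nonlinear mixed-effects model would have to be checked separately.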
Note that we know whether a covariate is dichotomous or continuous only
because we defined it with a set of (arbitrary) values in an arbitrary
scaling system. The computer does not know this at all; it treats all
covariates in the same way, according to the same statistical rules.
With this in mind, if we can center continuous covariates at their means
(in our minds they are continuous, but to the computer they are just
ordinary variables), I don't see why we cannot do the same for
dichotomous covariates.
Of course, after doing that, interpretation becomes a problem. The
interpretation is tied to the values and the scaling system we defined
for the dichotomous (or categorical) covariate. Even for continuous
covariates, we understand MEAN AGE = 40 years only because we chose, and
are used to, the decimal scaling system and values expressed in units of
years. If we chose a totally different scaling system (e.g.,
hexadecimal) and expressed AGE in different units (e.g., months), the
computer could still give the same statistical inference, but the
interpretation might look bizarre. It would look bizarre not because
the statistical inference has changed, but only because we are not used
to that scaling system.
The same thing happens here. For a dichotomous covariate, we simply use
a different scaling system and assign values based on different units.
This does not seem to influence the statistical inference at all,
because the computer cannot distinguish a dichotomous covariate from a
continuous one, nor can it distinguish the scaling systems they use; it
converts all scaling systems (where applicable) to binary for
calculation and to decimal for output.
Therefore, to make interpretations sound more reasonable (to people who
are used to the decimal scaling system), a scaling system that closely
matches the decimal one would be preferred. That is, choose the
codification whose results sound most reasonable (to the decimal
world). The existence of many possible codifications is not a reason
why we cannot center a dichotomous covariate in a statistically
meaningful way.
Yes, as Juan said, slopes depend on the codification (actually, on the
choice of scaling system). However, this also happens with continuous
covariates. If we change the scaling system for a continuous covariate,
e.g., from decimal to hexadecimal, the slope will change too. The
decimal scaling system for continuous covariates is just one of many
possible scaling systems; we chose it and we are used to it.
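A small sketch of this point, again with ordinary least squares on
simulated data rather than a NONMEM run. The names (fit_line, age_years,
sex01) and the simulated values are assumptions made for illustration,
and the change of scale shown is a change of units (years to months) and
of coding (0/1 to -1/+1). In both cases the slope changes by a known
factor while the fitted values stay the same.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns the slope b and the fitted values a + b*x.
    """
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1], X @ coef

rng = np.random.default_rng(1)

# Hypothetical covariates: age in years, and sex coded 0/1.
age_years = rng.uniform(20.0, 60.0, size=100)
sex01 = np.array([0.0] * 30 + [1.0] * 70)
# Hypothetical response depending on both covariates.
y = 5.0 + 0.1 * age_years - 2.0 * sex01 + rng.normal(0.0, 1.0, size=100)

# Continuous covariate: the same ages expressed in years or in months.
b_years, pred_years = fit_line(age_years, y)
b_months, pred_months = fit_line(12.0 * age_years, y)

# Dichotomous covariate: the same subjects coded 0/1 or -1/+1.
b_01, pred_01 = fit_line(sex01, y)
b_pm, pred_pm = fit_line(2.0 * sex01 - 1.0, y)

print("slope per year / slope per month:", b_years / b_months)
print("slope for 0/1 / slope for -1/+1: ", b_01 / b_pm)
print("same fitted values (age):", np.allclose(pred_years, pred_months))
print("same fitted values (sex):", np.allclose(pred_01, pred_pm))
```

The slope per year is 12 times the slope per month, and the 0/1 slope is
twice the -1/+1 slope, so only the interpretation of the coefficient
moves with the chosen scale.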
My point is that, for dichotomous covariates, we can do centering with
the same statistical meaning and in the same statistical way as for
continuous covariates (as run by the computer). The interpretation (by
us) is tied to the scaling system and units we choose for the
dichotomous covariate. I agree that with nominal covariates we should
be careful. But we can define or choose a scaling system (metric) to
make it work. (Which one is not defined by us?) Of course, for each
scaling system we define or choose, the unit scaling distance should be
uniform (or is that not necessary?).
The following article might be helpful. In it, the authors discuss the
centering issue for dichotomous covariates, in a slightly different way
from how we have discussed it here.
Jonsson EN and Karlsson MO. Automated covariate model building within
NONMEM. Pharm Res 1998; 15(9): 1463-8.
Any further discussion on this topic is welcome, and any input will be
appreciated.
Best regards,
Alan.