Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 30, 2001 technical Source: cognigencorp.com
From: Alan Xiao <Alan.Xiao@cognigencorp.com>
Subject: Re: Centering (was Re: Missing covariates)
Date: Mon, 30 Jul 2001 17:25:53 -0400

Dear Juan and All,

What we have discussed so far is really about how to handle the question raised in your last sentence: "if the male to female ratio is not equal to 1:1, it's necessary to modify appropriately the values for codification with deviations from mean coding, otherwise the intercept will be affected and won't represent the weighted average of male and female."

Leonid proposed using -1 and 1 for gender. That looks as if it gets around the problem (of defining the mean value for a dichotomous covariate), but the problem is actually still there: in that particular case (equal numbers of males and females) the mean value is simply zero, so it drops out of the equation automatically.

Now, back to the question in your last sentence: if we want the intercept to "represent the weighted average of male and female", shall we use a fractional value as the mean of gender when the numbers of males and females differ (e.g., 30 males and 70 females)? Or is there another method?

Note that we know whether a covariate is dichotomous or continuous only because we defined it with a set of (arbitrary) values in an arbitrary scaling system. The computer knows nothing of this. It treats all covariates in the same way, following the same statistical rules. With this in mind, if we can center continuous covariates at their means (in our minds they are continuous, but to the computer they are just ordinary variables), I don't see why we cannot do the same for dichotomous covariates. Of course, once we do that, interpretation becomes a problem. Here the interpretation is tied to the values and the scaling system we defined for the dichotomous (or categorical) covariate.
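To make the 30 males / 70 females case concrete, here is a minimal sketch in plain Python. The data, the response values, and the helper `ols` are all made up for illustration (nothing here comes from the thread or from NONMEM); it only compares how the intercept of a simple linear fit behaves under 0/1 coding, coding centered at the mean of gender (0.3), and -1/+1 coding.

```python
from statistics import mean


def ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data (assumption): 30 males coded 1, 70 females coded 0.
sex = [1] * 30 + [0] * 70
# Made-up response: male mean 12, female mean 10, with an alternating +/-1 offset.
y = [12.0 + (-1) ** i for i in range(30)] + [10.0 + (-1) ** i for i in range(70)]

a_raw, b_raw = ols(sex, y)                            # 0/1 coding
a_ctr, b_ctr = ols([s - mean(sex) for s in sex], y)   # centered at mean(sex) = 0.3
a_pm, b_pm = ols([2 * s - 1 for s in sex], y)         # -1/+1 coding

print(f"0/1 coding:    intercept={a_raw:.2f}  slope={b_raw:.2f}")
print(f"mean-centered: intercept={a_ctr:.2f}  slope={b_ctr:.2f}")
print(f"-1/+1 coding:  intercept={a_pm:.2f}  slope={b_pm:.2f}")
```

With these made-up numbers, the 0/1 intercept is the female mean (10.0), the mean-centered intercept is the weighted average of the two groups (10.6), and the -1/+1 intercept is the unweighted midpoint (11.0), which coincides with the weighted average only when the ratio is 1:1. The fitted values are the same under all three codings.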
Even for continuous covariates, we understand MEAN AGE = 40 years only because we chose, and are used to, the decimal scaling system and values expressed in units of years. If we chose a totally different scaling system (e.g., hexadecimal) and expressed AGE in different units (e.g., months), the machine (the computer) could still give the same statistical inference, but the interpretation might look bizarre. It looks bizarre not because the statistical inference has changed but only because we are not used to that scaling system.

The same thing happens here. For a dichotomous covariate, we simply use a different scaling system and assign values based on different units. This does not seem to influence the statistical inference at all, because the computer cannot distinguish a dichotomous covariate from a continuous one, nor can it distinguish the scaling systems they use. It simply converts every scaling system (where applicable) to binary for calculation and to decimal for output. Therefore, to make interpretations sound more reasonable (to people who are used to the decimal scaling system), a scaling system that closely matches the decimal one is preferable. That is, choose the codification whose results sound most reasonable (to the decimal world).

Having many possible codifications is not a reason why we cannot center a dichotomous covariate in a statistically meaningful way. Yes, as Juan said, slopes depend on the codification (actually, on the choice of scaling system). However, this is also true of continuous covariates. If we change the scaling system for a continuous covariate, e.g., from decimal to hexadecimal, the slope will change too. The decimal scaling system for continuous covariates is just one of many scaling systems; we chose it and we are used to it.
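The point about units can be sketched the same way. The example below covers only the change of units mentioned above (AGE in years versus months) and uses made-up numbers and a hypothetical helper `ols`; it shows the slope changing by the unit conversion factor while the fitted values stay the same, and the intercept of the mean-centered fit equalling the mean response.

```python
from statistics import mean


def ols(x, y):
    """Least-squares fit of y = intercept + slope * x for a single covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data (assumption): age in years and an arbitrary response.
age_years = [20, 30, 40, 50, 60]
resp = [10.1, 9.2, 7.9, 7.1, 5.8]
age_months = [12 * a for a in age_years]   # same ages, different units

a_yr, b_yr = ols(age_years, resp)
a_mo, b_mo = ols(age_months, resp)

# Fitted values do not depend on the units chosen for the covariate.
pred_yr = [a_yr + b_yr * a for a in age_years]
pred_mo = [a_mo + b_mo * a for a in age_months]

# Centering at the mean age moves the intercept to the mean response.
a_ctr, b_ctr = ols([a - mean(age_years) for a in age_years], resp)

print(f"slope per year  = {b_yr:.4f}")
print(f"slope per month = {b_mo:.4f}  (slope per year / 12)")
print(f"centered intercept = {a_ctr:.2f}, mean response = {mean(resp):.2f}")
```

Only the number attached to the slope changes with the units; the fit itself, and therefore the inference drawn from it, does not.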
My point is that, for dichotomous covariates, we can do centering with the same statistical meaning and in the same statistical way as for continuous covariates (as far as the computer is concerned). The interpretation (by us) is tied to the scaling system and units we choose for the dichotomous covariate.

I agree that with nominal covariates we should be careful. But we can define or choose a scaling system (metric) to make it work. (Which scaling system is not defined by us, after all?) Of course, for each scaling system we define or choose, the unit distance on the scale should be uniform (or is that not necessary?).

The following article might be helpful. In it the authors discuss the centering issue for dichotomous covariates, in a slightly different way from the one discussed here.

Jonsson EN and Karlsson MO. Automated covariate model building within NONMEM. Pharm Res 1998; 15(9): 1463-8.

Any further discussion on this topic is welcome, and any input will be appreciated.

Best regards,
Alan.
Jul 02, 2001 Nick Holford Centering (was Re: Missing covariates)
Jul 02, 2001 William Bachman RE: Centering (was Re: Missing covariates)
Jul 02, 2001 Kenneth G. Kowalski RE: Centering (was Re: Missing covariates)
Jul 02, 2001 Lewis B. Sheiner Centering (was Re: Missing covariates)
Jul 03, 2001 Jogarao Gobburu Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Nick Holford Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Lewis B. Sheiner Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 03, 2001 Diane Mould Re: Centering (was Re: Missing covariates)
Jul 04, 2001 Nick Holford Re: Centering (was Re: Missing covariates)
Jul 04, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 04, 2001 Diane Mould Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Nick Holford Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Stephen Duffull RE: Centering (was Re: Missing covariates)
Jul 05, 2001 Nick Holford Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Leon Aarons 70kg neonates
Jul 05, 2001 Nick Holford Re: 70kg neonates
Jul 05, 2001 Peter Bonate Centering
Jul 05, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Leonid Gibiansky RE: Centering (was Re: Missing covariates)
Jul 05, 2001 Kenneth G. Kowalski RE: Centering (was Re: Missing covariates)
Jul 05, 2001 William Bachman RE: Centering (was Re: Missing covariates)
Jul 05, 2001 Diane Mould Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 05, 2001 Alan Xiao Question 2 about prediction and covariates
Jul 06, 2001 Matt Hutmacher RE: Centering (was Re: Missing covariates)
Jul 09, 2001 Vladimir Piotrovskij RE: Centering (Impact on SE)
Jul 09, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 09, 2001 Kenneth G. Kowalski RE: Centering (Impact on SE)
Jul 09, 2001 Vladimir Piotrovskij RE: Centering (Impact on SE)
Jul 09, 2001 Smith Brian P RE: Centering (Impact on SE)
Jul 09, 2001 Matt Hutmacher RE: Centering (was Re: Missing covariates)
Jul 12, 2001 Juan Jose Perez Ruixo RE: Centering (was Re: Missing covariates)
Jul 12, 2001 Juan Jose Perez Ruixo RE: Centering (was Re: Missing covariates)
Jul 12, 2001 Matt Hutmacher RE: Centering (was Re: Missing covariates)
Jul 12, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 30, 2001 Juan Jose Perez Ruixo Re: Centering (was Re: Missing covariates)
Jul 30, 2001 Alan Xiao Re: Centering (was Re: Missing covariates)
Jul 30, 2001 Leonid Gibiansky RE: Centering (was Re: Missing covariates)