RE: Centering (was Re: Missing covariates)
From: "Perez Ruixo, Juan Jose [JanBe]" <JPEREZRU@janbe.jnj.com>
Subject: RE: Centering (was Re: Missing covariates)
Date: Thu, 12 Jul 2001 10:13:07 +0200
Dear all,
We must be careful with the centering approach for categorical data.
Following Mat's example (TVCL = THETA(1) + Xi * THETA(2)), the
parametrization Xi = 1 for females and Xi = -1 for males makes THETA(1) the
population average (the value at Xi = 0) when the proportions of males and
females are equal. In this case the correlation between intercept and
slope is zero, but the intercept SE is the same as the slope SE, and both
equal the residual standard error divided by SQRT(N). In other words, you do
not directly obtain the male population parameter or its difference from
females, because THETA(2) depends on the coding of Xi. Here THETA(2)
is half of the real difference between males and females, and its absolute
standard error (not its relative standard error) is halved as well. This
means the t-test is the same, but building a confidence interval for the real
difference between males and females requires first transforming the estimate
and its SE (multiplying both by 2).
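The zero correlation and the equal standard errors follow from the design matrix alone. The lines below are only a sketch in R (not part of the original analysis), assuming a balanced design with 50 subjects per group. The helper cov_unscaled is a name made up for this illustration.

```r
# Assumption: balanced design, 50 subjects per group (N = 100).
n <- 50
x01 <- c(rep(1, n), rep(0, n))    # 0/1 coding
xpm <- c(rep(1, n), rep(-1, n))   # +1/-1 coding

# Hypothetical helper: unscaled covariance of (intercept, slope), (X'X)^-1.
cov_unscaled <- function(x) solve(crossprod(cbind(1, x)))

v01 <- cov_unscaled(x01)
vpm <- cov_unscaled(xpm)

# Correlation between the intercept and slope estimates.
cor01 <- cov2cor(v01)[1, 2]   # -1/sqrt(2), about -0.7071
corpm <- cov2cor(vpm)[1, 2]   # 0

# SE multipliers: multiply by the residual standard error to get the SEs.
se01 <- sqrt(diag(v01))       # sqrt(1/50) and sqrt(1/25)
sepm <- sqrt(diag(vpm))       # both 1/sqrt(N) = 0.1
```

With the +1/-1 coding both multipliers equal 1/SQRT(N), which is why the intercept and slope share the same SE. With the 0/1 coding the slope multiplier is exactly twice that value.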
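Here is a simple example (with S+ code):
# Simulated weights: 50 subjects per group, with different means and SDs
weight <- c(rnorm(50, mean = 60, sd = 6), rnorm(50, mean = 50, sd = 5))
# Same grouping coded two ways: 0/1 and +1/-1
gender0 <- c(rep(1, 50), rep(0, 50))
gender1 <- c(rep(1, 50), rep(-1, 50))
# Fit the same linear model under each coding
G0 <- lm(weight ~ gender0)
G1 <- lm(weight ~ gender1)
summary(G0)
summary(G1)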
.......
Coefficients G0:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  50.7279      0.7918  64.0675    0.0000
gender0       7.9107      1.1198   7.0647    0.0000
Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: -0.7071
Coefficients G1:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  54.6832      0.5599  97.6698    0.0000
gender1       3.9554      0.5599   7.0647    0.0000
Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: 0
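As a rough check, the G1 estimates can be transformed back to the scale of G0. This is only a sketch in R using the rounded values printed above. The variable names are made up for the illustration, and the 95% level is an assumption.

```r
# Values copied from the G1 summary above (rounded as printed).
b0_g1 <- 54.6832   # intercept under +1/-1 coding
b1_g1 <- 3.9554    # slope under +1/-1 coding
se_g1 <- 0.5599    # SE of the slope under +1/-1 coding
df <- 98           # residual degrees of freedom

# The real difference between groups is twice the +1/-1 slope,
# and its SE is twice the slope SE.
diff_hat <- 2 * b1_g1        # 7.9108 (G0 reports 7.9107)
diff_se <- 2 * se_g1         # 1.1198 (same as G0)

# Mean of the group coded -1, i.e. the G0 intercept.
ref_mean <- b0_g1 - b1_g1    # 50.7278 (G0 reports 50.7279)

# Assumed 95% confidence interval for the real difference.
tcrit <- qt(0.975, df)
ci <- diff_hat + c(-1, 1) * tcrit * diff_se
```

The ratio diff_hat / diff_se is unchanged by the doubling, which is why both codings give the same t value.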
For these reasons, I suggest not using the centering approach for categorical
data (I am not including ordinal data here). With the first approach (0/1
coding) the intercept has a useful meaning, and there is no need to center the
variable.
Thanks,
Juan Jose Perez Ruixo
Global Pharmacokinetics and Clinical Pharmacology Division.
Janssen Research Foundation
Turnhoutseweg, 30
B-2340 Beerse
Belgium
Tel: (+32) 14 60 75 08
Email: jperezru@janbe.jnj.com