RE: Centering (was Re: Missing covariates)
From: "Perez Ruixo, Juan Jose [JanBe]" <JPEREZRU@janbe.jnj.com>
Subject: RE: Centering (was Re: Missing covariates)
Date: Thu, 12 Jul 2001 10:17:54 +0200
Dear all,
Regarding standard errors when the centering approach is used, I
would like to add some comments.
For a simple linear model without centering the independent variable
(y = a + bx), the variance of the fitted value of y at a given x is:

    S**2 * {1/N + (x - X)**2 / Sx}        (eq. 1)

where S is the residual standard error, N is the number of (x, y) pairs,
X is the mean of x, and Sx is the sum of squares of x about its mean,
i.e. the sum of (x - X)**2.
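Eq. 1 can be checked numerically. The following is a minimal Python sketch that assumes ordinary least squares and a small made-up data set. The helper names `fit_simple_ols` and `var_fitted` are hypothetical and do not come from any package. It evaluates eq. 1 at several x values and shows that the variance is smallest at x = X.

```python
import math

def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b*x (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)          # Sx in eq. 1
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                     # residual standard error, n - 2 df
    return a, b, s, xbar, sxx, n

def var_fitted(x0, s, n, xbar, sxx):
    """Eq. 1: variance of the fitted mean of y at x = x0."""
    return s ** 2 * (1.0 / n + (x0 - xbar) ** 2 / sxx)

# Made-up illustrative data (not from the post)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
a, b, s, xbar, sxx, n = fit_simple_ols(x, y)

# At x0 = 0 eq. 1 gives the variance of the uncentered intercept a;
# at x0 = xbar it reduces to S**2 / N, the minimum.
for x0 in (0.0, xbar, 6.0):
    print(x0, var_fitted(x0, s, n, xbar, sxx))
```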
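When x = X, the variance of y reaches its minimum, S**2 / N. This is
also the variance of the intercept in the centered model,
y = a' + b (x - X), where a' = a + bX: applying eq. 1 to that model at
x - X = 0 gives exactly S**2 / N. In the uncentered model the intercept a
is the prediction at x = 0, and eq. 1 gives its variance as
S**2 * {1/N + X**2 / Sx}. In both parameterizations, therefore, the
variance at the mean of the independent variable is lower than the
variance of the uncentered intercept. The two are equal only when the mean
of x is 0. For these reasons, centering does not change the slope, its
standard error, or the standard error of any prediction. The intercept
standard errors differ only because the two intercepts represent different
quantities (the response at x = 0 versus the response at x = X).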
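The next sketch fits the same made-up data with and without centering. It is a Python example that assumes plain OLS, and `ols_with_se` is a hypothetical helper. It shows that the slope and its SE are unchanged, that a' = a + bX, and that only the intercept SE differs.

```python
import math

def ols_with_se(x, y):
    """Simple OLS y = a + b*x; returns (a, b, se_a, se_b). Hypothetical helper."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))   # eq. 1 evaluated at x = 0
    se_b = math.sqrt(s2 / sxx)
    return a, b, se_a, se_b

x = [40.0, 55.0, 62.0, 70.0, 81.0, 95.0]   # hypothetical covariate values
y = [3.1, 4.0, 4.4, 5.1, 5.6, 6.9]          # hypothetical responses
xbar = sum(x) / len(x)
xc = [xi - xbar for xi in x]                # centered covariate, x - X

a, b, se_a, se_b = ols_with_se(x, y)        # y = a + b*x
ac, bc, se_ac, se_bc = ols_with_se(xc, y)   # y = a' + b*(x - X)

print("slope:", b, bc)                      # identical
print("slope SE:", se_b, se_bc)             # identical
print("a' and a + b*X:", ac, a + b * xbar)  # identical
print("intercept SE:", se_a, se_ac)         # se_ac equals S/sqrt(N), smaller than se_a
```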
We must be careful with the centering approach for categorical data,
because the slope (b) depends on the coding used. Following Matt's example
(TVCL = THETA(1) + Xi * THETA(2)), coding Xi = 1 for females and Xi = -1
for males makes THETA(1) the population average (Xi = 0) when there are
equal numbers of males and females. In that case the intercept and slope
are uncorrelated, but the intercept SE equals the slope SE, both being the
residual standard error divided by SQRT(N). In other words, the model no
longer gives you directly the population parameter for males and its
difference from females. THETA(2) is half the actual difference between
males and females, so its absolute standard error (though not its relative
standard error) is also halved. The t-test is therefore the same. However,
to obtain the precision of the actual male-female difference, the estimate
and its confidence interval must first be transformed (multiplied by 2).
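The coding argument can be reproduced in Python by mirroring the S+ simulation. This sketch assumes OLS, a hypothetical seed (so the numbers will not match the S+ run), and a normal-approximation multiplier of 1.96 for the interval. A t quantile with 98 df would be slightly larger. `ols_with_se` is again a hypothetical helper.

```python
import math
import random

def ols_with_se(x, y):
    """Simple OLS y = a + b*x; returns (a, b, se_a, se_b). Hypothetical helper."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx)), math.sqrt(s2 / sxx)

rng = random.Random(2001)  # hypothetical seed; values differ from the S+ output
weight = [rng.gauss(60, 6) for _ in range(50)] + [rng.gauss(50, 5) for _ in range(50)]
gender0 = [1] * 50 + [0] * 50    # 1 = female, 0 = male
gender1 = [1] * 50 + [-1] * 50   # 1 = female, -1 = male

a0, b0, se_a0, se_b0 = ols_with_se(gender0, weight)
a1, b1, se_a1, se_b1 = ols_with_se(gender1, weight)

print("t values:", b0 / se_b0, b1 / se_b1)               # same t-test
print("difference:", b0, 2 * b1, "SE:", se_b0, 2 * se_b1)

# CI for the actual female-male difference from the +/-1 fit:
# transform the THETA(2) interval by multiplying both ends by 2.
z = 1.96  # normal approximation (assumption)
ci_diff = (2 * (b1 - z * se_b1), 2 * (b1 + z * se_b1))
print("CI for difference:", ci_diff)
```

In the +/-1 fit the intercept SE equals the slope SE (`se_a1 == se_b1`) because the design is balanced.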
Here is a simple example (S+ code):
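# simulated weights: first 50 subjects female, last 50 male
# (rnorm draws random values, so results differ from run to run)
weight <- c(rnorm(50, mean = 60, sd = 6), rnorm(50, mean = 50, sd = 5))
gender0 <- c(rep(1, 50), rep(0, 50))    # 0/1 coding
gender1 <- c(rep(1, 50), rep(-1, 50))   # +1/-1 coding
G0 <- lm(weight ~ gender0)
G1 <- lm(weight ~ gender1)
summary(G0)
summary(G1)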
.......
Coefficients G0:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept) 50.7279      0.7918  64.0675    0.0000
    gender0  7.9107      1.1198   7.0647    0.0000
Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: -0.7071
Coefficients G1:
              Value  Std. Error  t value  Pr(>|t|)
(Intercept) 54.6832      0.5599  97.6698    0.0000
    gender1  3.9554      0.5599   7.0647    0.0000
Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: 0
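The printed estimates are internally consistent, and this can be checked with plain arithmetic. The Python sketch below uses only the values shown in the S+ output and assumes the balanced design (50 per group). In that design, the G0 intercept is one group mean (variance S**2/50), the G0 slope is a difference of two group means (variance 2*S**2/50), and their correlation is -1/sqrt(2).

```python
import math

# Values as printed in the S+ output
s, n, n_per_group = 5.599, 100, 50
a0, b0, se_b0 = 50.7279, 7.9107, 1.1198   # G0 (0/1 coding)
a1, b1, se_b1 = 54.6832, 3.9554, 0.5599   # G1 (+1/-1 coding)

# G1 intercept = average of the two group means = G0 intercept + half the G0 slope
g1_intercept = a0 + b0 / 2
# G1 slope = half the female-male difference
g1_slope = b0 / 2
# Standard errors implied by the residual standard error and the balanced design
se_g0_intercept = s * math.sqrt(1 / n_per_group)   # one group mean
se_g0_slope = s * math.sqrt(2 / n_per_group)       # difference of two group means
se_g1 = s / math.sqrt(n)                           # both G1 coefficients
# Correlation of the G0 coefficients for balanced 0/1 coding
corr_g0 = -1 / math.sqrt(2)

print(g1_intercept, g1_slope, se_g0_intercept, se_g0_slope, se_g1, corr_g0)
# t values agree with the printed 7.0647 up to rounding of the printed inputs
print(b0 / se_b0, b1 / se_b1)
```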
For these reasons, I suggest not using the centering approach for
categorical data (ordinal data excluded). Without centering, the intercept
already has a useful meaning (the typical value for the category coded 0),
which makes centering unnecessary.
> Thanks,
>
> Juan Jose Perez Ruixo
> Global Pharmacokinetics and Clinical Pharmacology Division.
> Janssen Research Foundation
> Turnhoutseweg, 30
> B-2340 Beerse
> Belgium
> Tel: (+32) 14 60 75 08
> Email: jperezru@janbe.jnj.com
>