Centering (was Re: Missing covariates)

41 messages 15 people Latest: Jul 30, 2001

Centering (was Re: Missing covariates)

From: Nick Holford Date: July 02, 2001 technical
From: Nick Holford <n.holford@auckland.ac.nz> Subject: Centering (was Re: Missing covariates) Date: Tue, 03 Jul 2001 08:21:39 +1200

Joga,

Let me pick up on your remarks about centering. Imagine this simple model using AGE to explain variability in CL:

CL=POPCL*(1+THETA(age0)*(AGE-0)) ; centered on 0
CL=POPCL*(1+THETA(age40)*(AGE-40)) ; centered on 40

Algebraically these are identical given appropriate values for THETA(age0) and THETA(age40) but I believe that the robustness of the estimation procedure can be affected by whether or not centering is used. I am afraid I cannot give any reference for this (apart from saying that I think I heard Lewis Sheiner make a similar remark on some occasion). I wonder if Lewis or somebody else would like to offer some support for centering as a means of obtaining "better" estimates. If centering is "better" does this mean improved precision of the estimate of THETA(age40) or less bias or what? If the parameter estimates for THETA(age0) and THETA(age40) have any difference in precision or bias then won't this alter statistical inferences based on these parameters?

The second issue relates to the convenience of using a suitable centered value. I am an enthusiastic advocate of doing this. When parameter estimates are reported and a covariate such as age is in the model, the reported parameter value refers to the centering value. In the first case POPCL estimated using THETA(age0) will be for someone of age 0 and using THETA(age40) will be for someone of age 40. I would say it is much more convenient to be able to talk of the clearance for someone of age 40 if the original data was obtained in adults. I would go one step further and say that if one can choose a centering value that can be considered a standard, e.g. 40 years for age, 70 kg for weight, 6 L/h for creatinine clearance, then it becomes possible to easily compare population clearances across different studies and different drugs.

No matter what centering value is used for the estimation of the parameter estimates, one should consider reporting them using a standard value. The convenience comes from using a centering value that is the same as the standard value.

Holford NHG. A size standard for pharmacokinetics. Clin. Pharmacokin. 1996; 30:329-332

Jogarao Gobburu 301-594-5354 FAX 301-480-3212 wrote:
>
> Centering is done for convenience, it does not alter statistical
> inference.
>
> Regards,
> Joga Gobburu
> Pharmacometrics,
> CDER, FDA.

bvatul@ufl.edu wrote:
> >Hello All
> >Could somebody please clarify this:
> >
> >I am analysing a data set in which I have covariates (wge, ht, wt, crcl)
> >for 70% of the patients. Is it a good idea to substitute the median
> >values of these covariates for the missing covariates ie., in patients
> >in whom I dont have the covariates?In case we are substituting the
> >median values and analysing the covariates is it still necessary to
> >center the covariates? Are there any reported papers where the addition
> >of missing covariates has led to misinterpretation of data? When should
> >we substitute the missing covariates with the median values and when
> >should we not?
> >Thanks
> >Atul
> >

--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm
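
[Editorial illustration, not part of the original message.] The claim that the two AGE models are algebraically identical can be checked numerically. The sketch below is a minimal Python illustration, not NONMEM code; the function names and the parameter values (POPCL = 10 L/h at age 40, THETA = -0.01 per year) are hypothetical and chosen only for the example.

```python
def cl_centered(popcl, theta, age, center):
    """CL = POPCL * (1 + THETA * (AGE - center)), as in the message above."""
    return popcl * (1.0 + theta * (age - center))


def recenter(popcl, theta, old_center, new_center):
    """Map (POPCL, THETA) from one centering value to another.

    The mapping keeps the predicted CL identical at every age:
    POPCL*(1 + THETA*(AGE - c_old))
        = POPCL*s * (1 + (THETA/s)*(AGE - c_new)),
    where s = 1 + THETA*(c_new - c_old).
    """
    scale = 1.0 + theta * (new_center - old_center)
    return popcl * scale, theta / scale


# Hypothetical estimates for the model centered on 40 years.
popcl40, theta40 = 10.0, -0.01

# Equivalent parameters for the model centered on 0 years.
popcl0, theta0 = recenter(popcl40, theta40, old_center=40.0, new_center=0.0)

for age in (20.0, 40.0, 60.0):
    a = cl_centered(popcl40, theta40, age, 40.0)
    b = cl_centered(popcl0, theta0, age, 0.0)
    print(f"age={age:4.0f}  CL(centered on 40)={a:.4f}  CL(centered on 0)={b:.4f}")
```

Both parameterizations give the same predictions; only the meaning of POPCL (clearance at age 40 versus clearance at age 0) and the value of THETA change.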

RE: Centering (was Re: Missing covariates)

From: William Bachman Date: July 02, 2001 technical
From: "Bachman, William" <bachmanw@globomax.com> Subject: RE: Centering (was Re: Missing covariates) Date: Mon, 2 Jul 2001 16:45:18 -0400 Nick, Centering won't (or shouldn't) give better estimates. In theory, the fit of the model to the data should be identical. As I understand it, the advantage of centering (in addition to the interpretive ones you mention) is "numerical stability". Using centered parameters is less of a numerical perturbation to the system and somewhat eases the convergence process. Your initial estimates for the new centered model (identical to the final estimates from a prior "base model" without covariates) are likely to be better than those for a new non-centered model. You are therefore less likely to end up in a local minimum rather than the global minimum with a complex problem. Bill
From: "KOWALSKI, KENNETH G. [PHR/1825]" <kenneth.g.kowalski@pharmacia.com> Subject: RE: Centering (was Re: Missing covariates) Date: Mon, 2 Jul 2001 15:47:51 -0500 Nick, The advantages of centering are numerical not statistical. That is, convergence to the minimum ELS should be faster using centering than without. However, both should lead to the same minimum value of the ELS and the same parameter estimates after adjusting for the centered value. When there is numerical instability centering can often help. It is good to do some sort of centering although it may not be necessary to do it at the precise center of your data. I agree with you that using standards such as wt=70 kg, age=40 yrs, etc. is convenient. Ken

Centering (was Re: Missing covariates)

From: Lewis B. Sheiner Date: July 02, 2001 technical
Date: Mon, 02 Jul 2001 16:47:27 -0700 From: LSheiner <lewis@c255.ucsf.edu> Subject: Centering (was Re: Missing covariates)

> I think the discussion at
> http://www.cognigencorp.com/nonmem/nm/99mar021999.html
> is what you recall ...
>
> L.
> --
> Lewis B Sheiner, MD (lewis@c255.ucsf.edu)
> Professor: Lab. Med., Bioph. Sci., Med.
> Box 0626, UCSF, SF, CA, 94143-0626
> 415-476-1965 (v), 415-476-2796 (fax)

--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

Re: Centering (was Re: Missing covariates)

From: Jogarao Gobburu Date: July 03, 2001 technical
Date: Tue, 03 Jul 2001 09:36:31 -0400 (EDT) From: "Jogarao Gobburu 301-594-5354 FAX 301-480-3212" <GOBBURUJ@cder.fda.gov> Subject: Re: Centering (was Re: Missing covariates)

Dear Nick and others,

A lot of action happened while I was off duty! :=) To summarize the input from all the contributors:

1. Centering offers a convenient means of achieving readily interpretable parameter estimates.
2. Centering offers better numerical stability during estimation.
3. Centering should not affect the statistical inference.

Regards,
Joga Gobburu
Pharmacometrics, CDER, FDA

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 03, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates) Date: Tue, 03 Jul 2001 11:10:25 -0400 Well, centering does also influence standard errors for some PK parameter estimates. Alan. -- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******

Re: Centering (was Re: Missing covariates)

From: Nick Holford Date: July 03, 2001 technical
From: Nick Holford <n.holford@auckland.ac.nz> Subject: Re: Centering (was Re: Missing covariates) Date: Wed, 04 Jul 2001 07:33:22 +1200 Alan, Two other very well known authorities in this area have said that centering does not affect the SE of the parameter estimate provided a local minimum is avoided. What is the basis for your assertion? Nick -- Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 03, 2001 technical
Date: Tue, 03 Jul 2001 16:26:16 -0400 From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates)

More specifically, the intercept is split into many terms:

INTERCEPT_no_centering = INTERCEPT_centering - SUM of THETAs*MEAN_COVARIATEs.

Do you think the standard errors for the intercepts on the two sides are the same? Note that THETAs and MEAN_COVARIATEs might have different levels of standard errors. By the way, I said that from experience, and it did not come from local minima.

Alan.

-- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******

Re: Centering (was Re: Missing covariates)

From: Lewis B. Sheiner Date: July 03, 2001 technical
From: LSheiner <lewis@c255.ucsf.edu> Subject: Re: Centering (was Re: Missing covariates) Date: Tue, 03 Jul 2001 13:27:09 -0700

Tsk, tsk, Nick. Consider the following experiment:

Data = (x=1, y=1)

1. Fit to Y = a + bX: SE(a) = inf.
2. Fit to Y = a + b(X-1): SE(a) = 0*

* (If you insist on using N-1 to compute sigma, then imagine the data are N>1 replications of the (1,1) point)

The point is that centering IS a reparameterization (in the ideal case, to parameters that are both more meaningful and less correlated than before).

LBS.

--
Lewis B Sheiner, MD (lewis@c255.ucsf.edu)
Professor: Lab. Med., Bioph. Sci., Med.
Box 0626, UCSF, SF, CA, 94143-0626
415-476-1965 (v), 415-476-2796 (fax)
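
[Editorial illustration, not part of the original message.] The limiting case above cannot be fitted by ordinary software because the design is singular, but the same effect shows up in a less extreme form whenever the covariate values lie far from zero. The sketch below is a minimal ordinary least squares example in plain Python, using made-up ages and clearances and the textbook standard error formulas for a straight line. It is not a NONMEM analysis, and the data and function names are assumptions for illustration only.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x with textbook standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # residual variance
    se_a = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    se_b = math.sqrt(s2 / sxx)
    return {"a": a, "b": b, "se_a": se_a, "se_b": se_b, "sse": sse}


# Hypothetical data: ages clustered around 40 years, clearance in L/h.
age = [36.0, 38.0, 40.0, 42.0, 44.0]
cl = [5.3, 5.0, 5.1, 4.8, 4.7]

raw = fit_line(age, cl)                            # Y = a + b*AGE
centered = fit_line([a - 40.0 for a in age], cl)   # Y = a + b*(AGE-40)

print("slope      :", raw["b"], centered["b"])
print("SE(slope)  :", raw["se_b"], centered["se_b"])
print("SSE        :", raw["sse"], centered["sse"])
print("SE(a) raw  :", raw["se_a"])
print("SE(a) cent.:", centered["se_a"])
```

In this sketch the slope, its standard error and the residual sum of squares are the same for both fits, while the standard error of the intercept is much larger for the uncentered model, because that intercept is an extrapolation to age 0.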

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 03, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates) Date: Tue, 03 Jul 2001 17:12:31 -0400 Hi, I have a question, MEAN and MEDIAN are alternatively used as CENTERING in this news group, in journal articles, as well as in books. Which one is more (generally) reasonable? Or just a kind of flipping-a-coin thing. Thanks, Alan. -- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******

Re: Centering (was Re: Missing covariates)

From: Diane Mould Date: July 03, 2001 technical
From: "diane r mould" <drmould@attglobal.net> Subject: Re: Centering (was Re: Missing covariates) Date: Tue, 3 Jul 2001 18:58:27 -0400 Alan, I think that the value used for centering should be a clinically relevant value rather than a mean or median. If, for example, the mean age in a data set is 50 years, but the drug is intended for use in patients who are actually older (say, 65 years), then you may wish to consider centering parameters based on the expected use of the drug rather than on a statistic from a contrived study. So in a nutshell, I believe that the choice of value for centering or normalizing a parameter depends on the clinical situation. However, I do agree with the previous comments about the numerical improvements expected from centering or normalization. Hope that this helps, Diane

Re: Centering (was Re: Missing covariates)

From: Nick Holford Date: July 04, 2001 technical
From: Nick Holford <n.holford@auckland.ac.nz> Subject: Re: Centering (was Re: Missing covariates) Date: Wed, 04 Jul 2001 16:52:05 +1200

Lewis,

I asked earlier in this thread if centering was expected to change precision or bias. The answer from you (http://www.cognigencorp.com/nonmem/nm/99mar021999.html) and from Ken Kowalski was that centering would not affect the estimates (i.e. no bias) if a local minimum was avoided. I am afraid I mistakenly inferred that you also meant that precision (standard error) was unaffected, because that is the issue behind Joga Gobburu's assertion that statistical inference is not altered by centering, which prompted my question.

Your reductio ad absurdum example does illustrate that the SE of the estimate will depend on centering. So now that means that Joga's assertion that "Centering is done for convenience, it does not alter statistical inference" needs to be qualified. I would accept that if the same objective function minimum is reached then model building inferences would not be affected, but inferences about the parameters, e.g. credible intervals, would be.

There is also the issue that I was hoping that Alan Xiao might throw some light on, i.e. the empirical evidence (e.g. using NONMEM) for differences in the standard errors depending on the use of centering or not. There is clearly room for a difference between theory and experiment when one has to rely on NONMEM to obtain estimates. It would be of interest to know the size of the standard error as well as the magnitude of parameter bias and/or the false acceptance rate of one parameterization being superior to another with a practical example.

Finally, let me repeat your previous advice contained in the link above: "Bottom line: There is nothing to lose with centering, and much to gain. Hence, ALWAYS do it!"

Nick

--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 04, 2001 technical
Date: Wed, 04 Jul 2001 11:29:44 -0400 From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates)

Well, .....

About model development and application:

1). Whatever way you use to express a model should (generally) genuinely reflect the assumptions that the model employs, the background that the model is based on and the limits (if any) within which the model is applicable. If you use a centering value which is not consistent with the data from which the model was developed, the model will be confusing and sometimes misleading. Different centering will change the interpretability of the model although it might not change the applicability of the model.

2). Application of a model is different from development of a model, so the two should not be mixed up, although the eventual objective of model development is its application. If a drug was designed for people with a mean age of 65 but the clinical trial was performed on patients with a mean age of 50, and if this age range difference can result in a significant difference in either the PK model or the PD response, then this clinical trial design was bad, and this bad situation probably cannot be sufficiently corrected just by shifting the centering from 50 to 65. If the age range difference is not important (cannot produce a significant difference in PK and PD), centering at 50 should be better (I think), since it tells what the model is based on. After all, application of a model is fundamentally just a complicated interpolation or extrapolation of the data (used to develop the model). The quality of this interpolation or extrapolation depends on the quality of the data samples, including the sampling quality, representativeness, etc.

3). Technically, I don't think this centering shift will simplify the application of the model to any extent.

Now back to my original question. I think I should rephrase it as follows:

A). Does the (structural) model correspond to the mean values or median values of concentration data?
B). Does the structural model correspond to mean or median values of covariates?

They are tricky STATISTICAL questions, especially considering the realistic situation that most data distributions (of concentration, time, or covariates) are not ideally normal (lognormal).

Thanks,

To those American folks, Happy National Independence Day!

Alan.

Re: Centering (was Re: Missing covariates)

From: Diane Mould Date: July 04, 2001 technical
From: "diane r mould" <drmould@attglobal.net> Subject: Re: Centering (was Re: Missing covariates) Date: Wed, 4 Jul 2001 18:05:23 -0400

Alan,

I rather doubt that a minor shift in a value used for centering or normalization would be as destructive as you indicate. I have never seen that happen, but would be quite interested to see examples. However, I do believe strongly in the applicability of a model. If changing the value used for centering a variable makes it more reflective of the intended use of that model, I don't see that this is necessarily problematic. As far as I can tell, it's not uncommon for a clinical trial to enroll patients with different distributions of demographics than the intended patient population.

The purpose of centering is two-fold. One purpose is to help out numerically, and if the value chosen for centering is far from the median or mean value for that demographic covariate, then you will certainly lose that benefit of centering. So if your study is in pediatric patients and you chose 65 yrs as the value to center age with, that could cause problems such as parameter values becoming negative, or convergence becoming difficult to achieve. If the latter occurred and could not be overcome by appropriate parameterization, one might be inclined to disregard potential covariates. So to that extent, I would agree that if you pick a value for centering that is greatly different from your present study you could have problems with the modeling exercise.

However, the second purpose is to get out parameter values for the typical patient that the drug is supposed to be used to treat. Therefore, if the mean age of a particular study is 50.938 and you chose this for your centering value, I doubt that the physicians who might have to use your formula will appreciate it, especially if most of their patients are 55 years old :-) Nick Holford's earlier comments to NMUSERS about routinely using a value of 70 to center creatinine clearance is just such an example of picking a user-friendly value.

I don't agree that model utility and model development should be kept entirely separate: the end use of the model is the reason that the modeling work is done in the first place. We should always keep the final application in mind when doing this sort of work. The parameterization, covariate selection criteria and even the tests that one might use to qualify a model are all highly dependent on what you plan on doing with it.

So a median value and a mean value serve equally well, as the primary purpose of centering is not statistical, it's numerical. If the covariate model is appropriate to describe the effect of that covariate on a parameter, it doesn't really matter what value is used to help out numerically. As long as the number selected is reasonable, it makes little difference to NONMEM, only to the customer who must use the results.

As to the questions that you asked:

A). Does the (structural) model correspond to the mean values or median values of concentration data?

I am really not sure what is meant here. The structural model should reflect the data vs time profiles of all individuals, not the mean or the median of the data. The choice of the structural model depends on the individual data, and not the mean or the median values. The parameterization of the selected model depends to some extent on what the model is intended for.

B). Does the structural model correspond to mean or median values of covariates?

Again, I am not sure what you are asking. The structural model does not depend on the median or the mean values of the covariates. The values of the parameters of that model may reflect these values, however.

Diane

Re: Centering (was Re: Missing covariates)

From: Nick Holford Date: July 05, 2001 technical
Date: Thu, 05 Jul 2001 14:07:11 +1200 From: Nick Holford <n.holford@auckland.ac.nz> Subject: Re: Centering (was Re: Missing covariates) Diane Mould wrote: > Nick Holford's earlier comments to NMUSERS about routinely using a value of 70 to > center creatinine clearance is just such an example of picking a user-friendly value. In fact I suggested standard values of 70 kg, 40 years and 6 L/h creatinine clearance. While 70 mL/min (4.2 L/h) might be a common median in a patient population the idea of centering on 6 L/h (100 mL/min) is because this would be considered "normal" for a healthy 70 kg, 40 year old. The other standard covariate to consider is sex. I agree with Diane that sex should be user-friendly but picking its value depends on the gender covariate which is hard to quantitate. Kim JS, Nafziger AN. Is it sex or is it gender? Clin Pharmacol Ther 2000;68(1):1-3 -- Nick Holford, Divn Pharmacology & Clinical Pharmacology University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556 http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm

RE: Centering (was Re: Missing covariates)

From: Stephen Duffull Date: July 05, 2001 technical
From: "Stephen Duffull" <sduffull@pharmacy.uq.edu.au> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 5 Jul 2001 13:09:34 +1000

Hi

I have a question about "centering". As I understand centering (or what seems to be commonly referred to as "centring" in UK statistical texts), it is a method for reducing the influence of collinearity that arises due to computational processes (rather than some intrinsic feature of the variables). The act of centring (e.g. Xi-mean(X)) helps to eliminate this computational collinearity problem (a typical example of computational collinearity occurs when polynomials are used as models, hence X and X^2 are likely to be correlated). Centring will not help for intrinsic collinearity (e.g. weight and age in newborns).

Anyway, my question relates to the use of the standardisation recommended by Nick and Diane versus centring for computational purposes. Nick's model standardises parameters as follows:

> In fact I suggested standard values of 70 kg, 40 years and 6
> L/h creatinine clearance.

etc.

However, if the standardising value is quite different from the mean (or some other descriptor of the central tendency of the distribution) then is it possible that the beneficial computational effects of centring will be lost? E.g. standardised creatinine clearance is 6 L/h but the sample average is 4.2 L/h... How far away can the standardised value be from the centre value to retain centring benefits?

Any loss of beneficial effects of centring will of course bring all the usual problems associated with collinearity.

Any thoughts?

Steve

=================
Stephen Duffull
School of Pharmacy
University of Queensland
Brisbane, QLD 4072 Australia
Ph +61 7 3365 8808 Fax +61 7 3365 1688
http://www.uq.edu.au/pharmacy/duffull.htm
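
[Editorial illustration, not part of the original message.] The polynomial example of computational collinearity can be made concrete with a few lines of plain Python. The weights below are hypothetical and deliberately symmetric about their mean, which is why mean-centring removes the correlation between X and X^2 completely here; with skewed data it would usually only be reduced. The third case, centring on a value well outside the data (120 kg, an arbitrary choice for this sketch), speaks to the question of what happens when the centring value is far from the centre of the sample.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def corr_x_x2(x, center):
    """Correlation between (X - center) and (X - center)^2."""
    xc = [xi - center for xi in x]
    return pearson(xc, [xi ** 2 for xi in xc])


wt = [50.0, 60.0, 70.0, 80.0, 90.0]  # hypothetical body weights, kg
mean_wt = sum(wt) / len(wt)

r_raw = corr_x_x2(wt, 0.0)           # no centring
r_centered = corr_x_x2(wt, mean_wt)  # centred on the sample mean (70 kg)
r_far = corr_x_x2(wt, 120.0)         # centred far outside the data

print("uncentred         :", r_raw)
print("centred on mean   :", r_centered)
print("centred on 120 kg :", r_far)
```

In this toy case the collinearity between the linear and quadratic terms is nearly perfect without centring, vanishes with mean-centring, and returns (with the opposite sign) when the centring value lies well outside the observed range.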

Re: Centering (was Re: Missing covariates)

From: Nick Holford Date: July 05, 2001 technical
Date: Thu, 05 Jul 2001 16:15:34 +1200 From: Nick Holford <n.holford@auckland.ac.nz> Subject: Re: Centering (was Re: Missing covariates)

Steve,

Steve Duffull wrote:
> However if the standardising value is quite different from the mean (or
> some other descriptor of the central tendency of the distribution) then
> is it possible that the beneficial computational effects of centring
> will be lost? eg standardised creatinine clearance is 6 L/h but the
> sample average is 4.2 L/h... How far away can the standardised value be
> from the centre value to retain centring benefits?
>
> Any loss of beneficial effects of centring will of course bring all the
> usual problems associated with collinearity.

I took particular care to write 2 days ago:

> No matter what centering value is used for the
> estimation [of] the parameter estimates one should consider reporting them using a
> standard value. The convenience comes from using a centering value that is the same
> as the standard value.

i.e. if you think it will improve the numerical aspects of your modelling to use a median weight of, say, 10 kg for a paediatric group as the centring value rather than a standard value of 70 kg, then indeed you should feel free to do so. But when you report the values I suggested that you use the covariate model you have developed to report the values for a standard individual, e.g. if you use a simple per kg model for volume of distribution:

V=Vpop x WT/10

where 10 kg is the median WT and the estimate for Vpop is 2 L/10kg, then you might report Vpop as 14 L/70 kg.

Brian Anderson has worked extensively on the weight scaling issue for paediatric pharmacokinetics. He *centred* all his size models using 70 kg even though most if not all subjects in the neonate to adolescent age group had weights less than 70 kg. Parameter estimates were reported per 70 kg and this did not cause any problems for journal reviewers/editors. We did check a few times to see if centring on the median weight made any difference but did not find that it did. This is offered only as anecdotal evidence, but it does suggest that taking advantage of the convenience of centring on a standard value is not reliably harmful and that it is scientifically acceptable to report standard parameters derived from non-standard subjects.

Anderson BJ, McKee D, Holford NHG. Size, myths and the clinical pharmacokinetics of analgesia in paediatric patients. Clinical Pharmacokinetics 1997; 33:313-327
Anderson BJ, Holford NHG, Armishaw JC, Aicken R. Predicting concentrations in children presenting with acetaminophen overdose. J Pediatrics 1999; 135:290-5
Anderson BJ, Woolard G, Holford NHG. A model for size and age changes in the pharmacokinetics of paracetamol in neonates, infants and children. Br J Clin Pharmacol. 2000; 50:125-134

--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm
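
[Editorial illustration, not part of the original message.] The reporting step described above, estimating with a 10 kg centring value and reporting per 70 kg, is simple arithmetic under the linear per kg model V = Vpop x WT/WTstd. A minimal Python sketch, with hypothetical function names, reproduces the 2 L/10 kg to 14 L/70 kg conversion and confirms that predictions for any individual are unchanged:

```python
def volume(vpop, wt, wt_std):
    """Linear per kg model from the message: V = Vpop * WT / WT_std."""
    return vpop * wt / wt_std


def restandardize_linear(vpop, old_std, new_std):
    """Re-express Vpop for a different standard weight under the linear model."""
    return vpop * new_std / old_std


vpop_10kg = 2.0  # L per 10 kg, the estimate quoted in the message
vpop_70kg = restandardize_linear(vpop_10kg, old_std=10.0, new_std=70.0)
print("Vpop per 70 kg:", vpop_70kg)  # 14 L/70 kg, as stated in the message

# Predictions for an individual do not depend on the standard used.
wt = 25.0  # hypothetical child, kg
print(volume(vpop_10kg, wt, 10.0), volume(vpop_70kg, wt, 70.0))
```

The conversion factor depends on the covariate model. With a different size model, such as the allometric model discussed later in the thread, the same idea applies but the scaling is no longer a simple ratio of weights.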

70kg neonates

From: Leon Aarons Date: July 05, 2001 technical
From: "Leon Aarons" <laarons@fs1.pa.man.ac.uk> Subject: 70kg neonates Date: Thu, 5 Jul 2001 11:32:33 GMT

Nick encouraged me to post an issue about normalising pk data for neonates. He said he would then reply (why do I think I am being set up?)

Anyway in Brian Anderson's paper, which Nick refers to, several pk parameters are adjusted to 70kg. I hesitate to use the word normalisation (and particularly not centring/centering, see below for my other concerns about this) since I am not sure what is normal here. My concern about Brian's use of 70kg (and it is not a major one) is that although it allows quick comparison of neonate pk to adult pk, it does have the potential to confuse some readers when they want to compare values between studies. Per kg is widely used and achieves the same aim as per 70kg. I simply feel it is more "legible". That is all.

Now to my real concern: what is the spelling of centering/centring. As usual I consulted the Oxford English Dictionary and as usual I am more confused after doing that than before. In the extracts below you will find three different spellings of it. I particularly like centreing. Take your pick. Note that there is no ambiguity about the spelling of centre.

Leon

1. The action of the verb CENTRE; placing in the centre, convergence to the centre.
1667 MILTON P.L. IX. 109 As God in Heav'n Is Center, yet extends to all, so thou [Earth] Centring receav'st from all those Orbs.
a1732 ATTERBURY (J.) The visible centring of all the old prophecies in the person of Christ.

2. A placing in the centre or making central; the bringing of two or more centres into coincidence; spec. the setting of lenses so that their axes are in the same straight line.
1768 E. BUYS Dict. Terms of Art, Centering of an Optick-glass, is the grinding it so that the thickest part is exactly in the Middle.
1831 BREWSTER Optics xliii. 358 The..risk of imperfect centering, or of the axes of the three lenses not being in the same straight line.
1881 Edin. Rev. Oct. 537 Mr. Carter recommends that people should look to the centreing of their spectacles for themselves.
1883 Daily News 10 Sept. 2/1 When the ring rotates at high speed, any slight error of centring tends to injure the ring.

3. Arch. The temporary woodwork or framing, whereon any vaulted work is constructed (Gwilt).
a1766 Parentalia in Entick London (1766) IV. 206 Both centering and scaffolding.
1861 SMILES Engineers II. 182 The centering upon which the arches of the bridge were built.
1879 SIR G. SCOTT Lect. Archit. II. 194 The use of continuous timber centering.
1885 RUSKIN Præterita iii, Well-made centreings..made this model..attractive.

4. attrib. and Comb., as centring motion, punch (sense 2), stone (sense 3).
1855 I. TAYLOR Restor. Belief 138 A centering-stone of that structure which in the age of the Antonines had arched over the Roman world.
1883 Knowledge 27 Apr. i, Secondary stage with centering motion [in a microscope].
1884 F. J. BRITTEN Watch & Clockm. 148 Another spring..carrying a fine centreing punch.

__________________________________________________
Leon Aarons
School of Pharmacy and Pharmaceutical Sciences
University of Manchester
Manchester, M13 9PL, U.K.
tel +44-161-275-2357 fax +44-161-275-2396
email l.aarons@man.ac.uk

Re: 70kg neonates

From: Nick Holford Date: July 05, 2001 technical
From: Nick Holford <n.holford@auckland.ac.nz> Subject: Re: 70kg neonates Date: Fri, 06 Jul 2001 00:01:31 +1200

Leon,

Leon Aarons wrote:
>
> Nick encouraged me to post an issue about normalising pk data for
> neonates. He said he would then reply (why do I think I am being set
> up?)

<grin>

> Anyway in Brian Anderson's paper, which Nick refers to, several pk
> parameters are adjusted to 70kg. I hesitate to use the word
> normalisation (and particularly not centring/centering see below for
> my other concerns about this) since I am not sure what is normal
> here. My concern about Brian's use of 70kg (and it is not a major
> one) is that although it allows quick comparison of neonate pk to
> adult pk, it does have the potential to confuse some readers when
> they want to compare values between studies. Per kg is widely used
> and achieves the same aim as per 70kg. I simply feel it is more
> "legible". That is all.

Prediction of CL from weight using a value reported per kg depends on the model. If you use the simple per kg scaling model then this flies in the face of a large amount of empirical biological observation and some quite elegant theory (see the Br J Clin Pharmacol paper for references). While I accept it might be easier to multiply by weight instead of multiplying by weight^(3/4) (as experiment and allometric theory would recommend), why bother if the answer is wrong?

The "legible" direct comparison of clearances calculated using the per kg model in adults and children has caused many to claim that clearance is more rapid in children because the parameter value per kg is larger in children. This naive interpretation is readily explained by the use of the wrong model for accounting for size. It is naive because it confuses the parameter (standardised per kg) with an indefensible model. Comparison of values estimated using an allometric model (whether standardized per kg or per 70 kg) indicates that once children have grown out of the early infant stage they are simply small adults (from a pharmacokinetic perspective).

> Now to my real concern: what is the spelling of centering/centring.
> As usual I consulted the Oxford English Dictionary and as usual I am
> more confused after doing that than before. In the extracts below you
> will find three different spellings of it. I particularly like
> centreing. Take your pick.

I'd like to pick the first (centring) because
1) it is the oldest
2) ATTERBURY applied it to prophecy, which is the purpose for which I wish to use centring.
3) The definition deals with convergence, which centring is thought to speed up.

Nick

--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm
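
[Editorial illustration, not part of the original message.] The argument about per kg comparisons can be illustrated with the allometric model CL = CLstd x (WT/70)^(3/4) mentioned in the message. The sketch below uses a hypothetical standard clearance of 10 L/h/70 kg and a hypothetical 10 kg child; the numbers are assumptions chosen for illustration and are not taken from any of the cited papers.

```python
def cl_allometric(cl_std, wt, wt_std=70.0, exponent=0.75):
    """Allometric size model: CL = CL_std * (WT / WT_std) ** exponent."""
    return cl_std * (wt / wt_std) ** exponent


cl_std = 10.0  # hypothetical clearance, L/h per 70 kg

cl_adult = cl_allometric(cl_std, 70.0)
cl_child = cl_allometric(cl_std, 10.0)

# Expressed per kg, the child appears to clear the drug "faster" ...
per_kg_adult = cl_adult / 70.0
per_kg_child = cl_child / 10.0
ratio = per_kg_child / per_kg_adult

print("CL adult (L/h):", cl_adult, " per kg:", per_kg_adult)
print("CL child (L/h):", cl_child, " per kg:", per_kg_child)
print("per kg ratio child/adult:", ratio)

# ... although both individuals share the same size-standardised
# parameter (10 L/h/70 kg) under the allometric model.
```

In this sketch the child's clearance per kg is about 1.6 times the adult value (the fourth root of 7), even though the underlying standardised parameter is identical. This is the apparent "more rapid clearance in children" that the message attributes to using the per kg model.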

Centering

From: Peter Bonate Date: July 05, 2001 technical
From: peter.bonate@quintiles.com Subject: Centering Date: Thu, 5 Jul 2001 08:53:00 -0500

I would just like to add my two cents regarding centering.

The whole issue of centering revolves around scale and matrix inversion. As part of the optimization process that NONMEM or any other nonlinear regression program uses, the gradient matrix, and sometimes the Hessian, must be inverted. The gradient matrix is J'J, where J is the Jacobian matrix, or matrix of partial first derivatives with respect to the model parameters. If the columns of J are of different magnitudes (which happens when you have covariates of different magnitudes) this leads to matrix instability during the inversion process. However, if the columns of J are approximately the same magnitude, as happens when the predictor variables are centered, the resulting matrix inversion is more stable. Hence the reason to center. And Nick is right - always center.

In regards to what happens with the parameters:
1.) There should be no change in OFV or MSE.
2.) The standard errors will change but the relative precision of the estimates (SE/Theta) will not. Hence statistical inference will not change.

To see whether a model is unstable and centering would be useful, print out the eigenvalues using the PRINT=E option with $COV. Take the square root of the ratio of the largest to the smallest eigenvalue. This number is called the condition number and is a measure of the instability of the gradient matrix, with large numbers indicating instability.

Now the million dollar question. What is large? In a paper I wrote related to Pop PK (Pharm Res 16, 709-717, 1999), when the condition number was greater than a few hundred, significant matrix instability was present. In regards to nonlinear regression, condition numbers greater than 100,000 are considered high. Note that WinNonlin reports the log-condition number.

I wrote the following a while back for a paper. Although it is written in terms of nonlinear regression, I think some may find it useful. It is easy to extend the concept to multivariate predictor problems such as pop pk.

****************

The last example of where ill-conditioning may arise is when the columns of J are of differing magnitudes or scale. At the simplest level, for a pharmacokinetic model where time is the independent variable, simply changing the units of time can have a dramatic effect on the condition number and ill-conditioning. For example, suppose samples for pharmacokinetic analysis are collected at 0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 18, and 24 h after intravenous drug administration with values of 39.4, 33.3, 29.2, 29.8, 24.3, 20.7, 17.8, 10.7, 6.8, 3.4, 1.0, 0.3, respectively. Assume a 1-compartment model is appropriate to model the data (Embedded image moved to file: pic09232.pcx), where C is concentration, t is time, and the model has two estimable parameters. The first derivatives for the model are (Embedded image moved to file: pic00750.pcx) (Embedded image moved to file: pic25205.pcx) with Jacobian matrix (Embedded image moved to file: pic04975.pcx).

Only 9 iterations were required for convergence using a Gauss-Newton algorithm for optimization. The model parameter estimates (std error) are 38.11 (0.72) for the first parameter and 0.2066 (0.009449) per h for the second. The matrix J'J is (Embedded image moved to file: pic01539.pcx) with eigenvalues of 2.33 and 23,390 and a corresponding condition number of 100.

At one end of the scale is when time is transformed from hours to seconds. Under the transformation, the model parameter estimates are 38.12 (0.72) and 5.738E-5 (2.625E-6) per s. The matrix J'J is (Embedded image moved to file: pic00303.pcx) with eigenvalues of 2.33 and 303,088,318,808 and a corresponding condition number of 360,741.

At the other end of the scale is when time is transformed from hours to days. Under this transformation, the model parameter estimates are 38.12 (0.72) and 4.96 (0.227) per day. The matrix J'J is (Embedded image moved to file: pic11422.pcx) with eigenvalues of 2.23 and 42.44 and a corresponding condition number of 4.

It is clear that inverting J'J when time is in seconds results in an unstable matrix, whereas inverting J'J when time is in days results in a more stable matrix. But note that in all cases the estimate of the first parameter remains the same, the mean square error remains the same (1.198), as does the CV of the rescaled parameter estimate (4.57%). Changing the scale does not affect the model precision or parameter precision and hence any statistical inference on the model parameters. The only thing that changes is the estimate of the second parameter, and that change is proportional to the transformation.

So why all the fuss? Some optimization algorithms are sensitive to scale, whereas others are not. The algorithm used in the example above to estimate the model parameters was the Gauss-Newton algorithm in the NLIN procedure in SAS, which is relatively insensitive to scaling. However, the GRADIENT method in SAS, which uses the method of Steepest Descent, took more than 13,000 iterations before convergence was achieved, and then the parameter estimates were quite poor. For example, when time is scaled in seconds, the parameter estimates are 35.00 (1.20) and 5.123E-5 (4.40E-6) with a mean square error of 3.58. Note that one of the estimates did not even change from its starting value. Obviously algorithms that are not sensitive to scale are preferable to algorithms that are. But, by forcing the parameter estimates to be of approximately the same scale, less ill-conditioning results, thereby easing the convergence process.

**********************

I am pretty certain that NONMEM uses a Newton Raphson algorithm for optimization. In regards to convergence it should be fairly robust to situations where centering is not done.

I hope this helps, although I am fairly certain this topic is not quite dead yet.

Peter Bonate
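
The time-unit example above can be sketched in a few lines of Python. The model equation was an embedded image that did not survive in the archive, so this sketch assumes a mono-exponential form C = A*exp(-k*t), with A and k as hypothetical names for the two parameters. It evaluates the Jacobian at the reported estimates and takes the condition number as the square root of the ratio of the largest to the smallest eigenvalue of J'J, as described in the message.

```python
import numpy as np

# Sampling times from the message, in hours.
TIMES_H = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, 12, 18, 24], dtype=float)

# Reported estimates with time in hours (A and k are assumed parameter names).
A_HAT, K_HAT = 38.11, 0.2066


def condition_number(a, k, t):
    """sqrt(largest / smallest eigenvalue) of J'J for C = a * exp(-k * t)."""
    decay = np.exp(-k * t)
    jac = np.column_stack([decay, -a * t * decay])  # dC/da, dC/dk
    eig = np.linalg.eigvalsh(jac.T @ jac)
    return float(np.sqrt(eig.max() / eig.min()))


# Multiplying time by `factor` divides the rate constant by the same factor.
unit_factors = {"hours": 1.0, "seconds": 3600.0, "days": 1.0 / 24.0}
cond = {
    unit: condition_number(A_HAT, K_HAT / factor, TIMES_H * factor)
    for unit, factor in unit_factors.items()
}

for unit, value in cond.items():
    print(f"{unit:>8}: condition number {value:,.1f}")
```

Under that assumed model the three condition numbers land in the same range as the figures quoted in the message: roughly a hundred for hours, hundreds of thousands for seconds, and single digits for days. The fitted curve is identical in all three cases; only the scaling of the second Jacobian column changes.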

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 05, 2001 technical
Date: Thu, 05 Jul 2001 10:08:11 -0400 From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates)

Hi, folks,

I think I should clarify my question like this:

1) Assuming you are creating a graph of the concentration-time profile for a population, with measured concentrations and PRED of the model, should the PRED of the structural model predict the mean concentrations, the median concentrations, or neither?

2) Assuming you have a final model with all significant non-centered covariates, if you want to reduce this model to the structural model, should you set each covariate at its mean value, its median value, or something else? Or, put another way: assuming you have a final model with all significant non-centered covariates, if you want to predict (by PRED of this model) what the structural model predicts (by PRED), should you set each covariate at its mean value, its median value, or something else? (Please don't tell me to set all coefficients of covariates at zero.)

About 2): if you say MEDIAN, here is my follow-up question: how can you use the MEDIAN for a categorical covariate? If you use the MEAN, is this the MEAN over IDs or over concentration records? (Note that all PK parameter estimates are "regulated" by measured concentrations, not by measured CL or Vs.)

About the expression of a model: I am not against "standardization" at 70 kg (weight) or whatever for practical purposes. However, as a strict scientific principle, the expression of a model or any analysis should strictly (and as explicitly as possible) reflect the TRUE information that the model is based on. Readers/customers who don't have detailed knowledge of how and why the model was developed are easily misled into thinking that a model expressed as CL = THETA1 + THETA2*(AGE-65) was developed from data with a MEAN age of 65 plus some SE (especially when the CENTERING concept is implied), while the real measured data (for the model development) average an age of 50 plus some SE, if you artificially shift the "centering" from 50 to 65. A physician might then think: OK, since the model has been developed from a population with a mean age of 65 (which is not true), I can still use it for an 85-year-old patient. But actually, this might be beyond the limit that the original data can reasonably support.

I think a lot of work needs to be done to bridge PKPD scientists and clinical professionals. But I hope application won't twist what science (if it could be labeled as science) really looks like.

Thanks,
Alan.

RE: Centering (was Re: Missing covariates)

From: Leonid Gibiansky Date: July 05, 2001 technical
From: "Gibiansky, Leonid" <gibianskyl@globomax.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 5 Jul 2001 10:29:52 -0400

I think the answer to question 1 depends on your error model.

For an additive error model, the model predicts the mean: if Y=F+EPS1 then F=mean(Y).
For a proportional error model, it predicts the geometric mean: if Y=F*EXP(EPS1) then F=geomean(Y).
For a combined error model I am not quite sure; it should be possible to study this based on the simple examples above.

The answer to question 2 depends on your goal. You have a model for everyone. So it is your call whether you want to predict CL for the person with mean weight or median weight, for males or females, etc.

I am not sure that a model with covariates can simply be reduced to the base model. These are different statistical models, based on different assumptions. I would not try to derive the base model from the covariate model.

Leonid
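
The two simple cases in Leonid's answer can be checked with a small simulation. This is only a sketch of the single-level error model written in the message (Y=F+EPS1 and Y=F*EXP(EPS1)); it ignores between-subject variability, and the values of F, the residual SD and the sample size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2001)

F = 10.0      # hypothetical model prediction
SIGMA = 0.3   # hypothetical residual SD
eps = rng.normal(0.0, SIGMA, size=200_000)

# Additive error: Y = F + EPS1
y_add = F + eps
mean_add = y_add.mean()

# Exponential ("proportional") error: Y = F * EXP(EPS1)
y_exp = F * np.exp(eps)
mean_exp = y_exp.mean()
geomean_exp = np.exp(np.log(y_exp).mean())

print(f"additive:    arithmetic mean = {mean_add:.3f}")
print(f"exponential: arithmetic mean = {mean_exp:.3f}")
print(f"exponential: geometric mean  = {geomean_exp:.3f}")
```

With additive error the arithmetic mean of Y recovers F. With the exponential error model it is the geometric mean that recovers F, while the arithmetic mean sits above F.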
From: "KOWALSKI, KENNETH G. [PHR/1825]" <kenneth.g.kowalski@pharmacia.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 5 Jul 2001 09:48:17 -0500

Alan/NMUSERS:

Who really cares if INTERCEPT(no centering) has a different standard error than INTERCEPT(centering)? They are not estimating the same thing! Note that INTERCEPT(no centering)+SLOPE(no centering)*CENTEREDVALUE is estimating the same thing as INTERCEPT(centering). If you calculated the standard error for INTERCEPT(no centering)+SLOPE(no centering)*CENTEREDVALUE from the covariance matrix of the estimates of the thetas you would get the same standard error as that reported for INTERCEPT(centering).

The really important issue is the slope for the covariate effect. If a global minimum is achieved then SLOPE(no centering) and SLOPE(centering) will have the same estimates and standard errors because they are estimating the same thing. Again, the benefits of centering are purely numerical, not statistical. I have no problems with not centering if the model is numerically stable and you achieve the global minimum. This takes the question of how far off from the center of your data to do the centering to the extreme (i.e., centered about zero). Obviously, the answer depends on the numerical stability of the model that you are fitting to your data. Since one doesn't always know at the outset how stable the model is going to be, it is good practice to do centering.

Whether one uses some standard value (such as Nick suggests) or the mean or median doesn't really matter provided they all achieve the global minimum...pick the one that is most convenient to you. If achieving the global minimum is very sensitive to the choice of centering then you may need to look at the model more closely anyway...perhaps it is overparameterized.

I agree with Nick that it is convenient to choose standards. Nevertheless, I probably wouldn't use an age of 40 if the range of ages in the study were between 55 and 75...not that there is anything wrong with that (to quote Jerry Seinfeld) if a global minimum is achieved. I would probably use something like 60 or 65 regardless of what the mean or median of the distribution of ages was in the study. If the model was so sensitive to whether I used 60 or 65 versus the mean or the median then I would worry about the appropriateness of the model more so than what value I should use for centering.

Ken
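
Ken's point about the standard error of INTERCEPT(no centering)+SLOPE(no centering)*CENTEREDVALUE can be illustrated with ordinary least squares. The sketch below is not a NONMEM run: it uses a simulated linear CL versus AGE data set with hypothetical values, and the helper name `ols` is made up for the example.

```python
import numpy as np


def ols(x, y):
    """Least-squares fit of y = a + b*x; returns estimates and their covariance."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, cov


rng = np.random.default_rng(40)
age = rng.normal(40.0, 5.0, size=50)                      # hypothetical ages
cl = 9.0 + 0.025 * age + rng.normal(0.0, 0.2, size=50)    # hypothetical CL

CENTER = 40.0
b_raw, cov_raw = ols(age, cl)            # no centering
b_ctr, cov_ctr = ols(age - CENTER, cl)   # centered on 40

# SE of INTERCEPT + SLOPE*CENTER, from the covariance matrix of the uncentered fit.
w = np.array([1.0, CENTER])
se_combo = float(np.sqrt(w @ cov_raw @ w))
se_ctr_intercept = float(np.sqrt(cov_ctr[0, 0]))

print(f"uncentered intercept SE     : {np.sqrt(cov_raw[0, 0]):.4f}")
print(f"SE of intercept + 40*slope  : {se_combo:.4f}")
print(f"centered intercept SE       : {se_ctr_intercept:.4f}")
print(f"slope SE (raw, centered)    : {np.sqrt(cov_raw[1, 1]):.4f}, {np.sqrt(cov_ctr[1, 1]):.4f}")
```

The two intercepts have different standard errors because they describe different ages, but the combination evaluated at the centering value reproduces the centered intercept and its standard error, and the slope and its standard error are unchanged.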

RE: Centering (was Re: Missing covariates)

From: William Bachman Date: July 05, 2001 technical
From: "Bachman, William" <bachmanw@globomax.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 5 Jul 2001 11:32:52 -0400

Diane,

I think your statement requires a little clarification. The choice of centering value depends on the clinical situation, UP TO A POINT. In the extreme, one should probably not use a centering value of 65 years for a model developed in twenty-year-olds. It would not be logical, nor would you gain the numerical stability that comes from centering near a mean or median. I feel much more comfortable using a "nominal" centering value not too distant from the central tendency (e.g. 70 kg for a study in adults, etc.)

Bill

Re: Centering (was Re: Missing covariates)

From: Diane Mould Date: July 05, 2001 technical
From: "diane r mould" <drmould@attglobal.net> Subject: Re: Centering (was Re: Missing covariates) Date: Thu, 5 Jul 2001 12:07:32 -0400

Bill

I agree with your statement about using common sense to pick a value. In a later email to NMUSERS, I clarified my position by saying:

"The purpose of centering is two-fold. One purpose is to help out numerically and if the choice of value used for centering is far from the median or mean value for that demographic covariate, then you will certainly lose that benefit of centering. So if your study is in pediatric patients and you chose 65 yrs as the value to center age with, that could cause problems such as causing parameter values to become negative, or making it difficult to achieve convergence. If the latter occurred or could not be overcome by appropriate parameterization, one might be inclined to disregard potential covariates. So to that extent, I would agree that if you pick a value for centering that is greatly different from your present study you could have problems with the modeling exercise."

So as always with modeling, I would summarize by saying that we should use common sense as the final rationale for doing anything. :-)

diane
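
The numerical side of Bill's and Diane's remarks can be sketched with a plain linear design matrix, which is a simplified stand-in for the NONMEM problem and not a reproduction of it. The age range, sample size and helper name `design_condition` below are hypothetical. The condition number reported by `np.linalg.cond` for the design matrix X equals the square root of the ratio of the largest to the smallest eigenvalue of X'X, the same convention used earlier in the thread.

```python
import numpy as np


def design_condition(covariate, center):
    """Condition number of the design matrix [1, covariate - center]."""
    X = np.column_stack([np.ones_like(covariate), covariate - center])
    return float(np.linalg.cond(X))


# Hypothetical study in young adults, ages spread evenly from 18 to 25 years.
ages = np.linspace(18.0, 25.0, 30)

cond_by_center = {
    "no centering (0)": design_condition(ages, 0.0),
    "study mean": design_condition(ages, ages.mean()),
    "distant standard (65)": design_condition(ages, 65.0),
}

for label, value in cond_by_center.items():
    print(f"{label:>22}: {value:8.1f}")
```

In this toy setting, centering at the study mean gives the best-conditioned problem, while centering on 65 years for a population in its twenties is worse conditioned than not centering at all. That is consistent with the advice to keep a nominal centering value close to the central tendency of the data.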

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 05, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates) Date: Thu, 05 Jul 2001 12:52:03 -0400 That's correct, two INTERCEPTs have different physiological meanings. By the way, I'm not against centering. Actually, I have always been using that. Alan. -- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******

Question 2 about prediction and covariates

From: Alan Xiao Date: July 05, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Question 2 about prediction and covariates Date: Thu, 05 Jul 2001 13:29:30 -0400 Thanks Leonid, For question 2, I'm thinking, since the structural model predicts MEAN or GEOMEAN of concentration data, there should exist a combination of covariate values, with which the final model can predict (by PRED) exactly what the structural model predicts (by PRED) - i.e., the mean or geomean concentration data. What's this combination? Is it related to mean or median covariate values? Or no answer at all? (Certainly "reduce" is not the right word and I did not mean to reduce a covariate model to a structural model - which does not make sense, but meant predictions, see question 2 "Or in this way: ..."). Alan. -- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******

RE: Centering (was Re: Missing covariates)

From: Matt Hutmacher Date: July 06, 2001 technical
From: "HUTMACHER, MATTHEW [Non-Pharmacia/1825]" <matthew.hutmacher@pharmacia.com> Subject: RE: Centering (was Re: Missing covariates) Date: Fri, 6 Jul 2001 10:35:52 -0500 Alan, For categorical variables such as sex, race, etc., you could try parameterizing the model like: TVCL=(THETA(1)+Xi*THETA(2)) where Xi= 1 for females and Xi=-1 for males, for example. Then THETA(1) will be somewhat of a population average when Xi=0. This should help getting your covariate model to go through the center of your data like your structural model. Matt

RE: Centering (Impact on SE)

From: Vladimir Piotrovskij Date: July 09, 2001 technical
From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com> Subject: RE: Centering (Impact on SE) Date: Mon, 9 Jul 2001 11:17:17 +0200

One simple example (sorry, using S+, not NONMEM):

## SAMPLE FROM BIVARIATE NORMAL
> x <- as.data.frame(rmvnorm(50, mean = c(40, 10), sd = c(5, 0.2), rho = 0.5, d = 2))
> names(x) <- c("AGE", "CL")
> x$AGE.ctr <- x$AGE - 40     ## CENTERED AGE
> y <- lm(CL ~ AGE, x)        ## LINEAR MODEL W/O CENTERING
> z <- lm(CL ~ AGE.ctr, x)    ## LINEAR MODEL WITH CENTERING
> summary(y)                  ## FITTED MODEL
...
Coefficients:
              Value  Std. Error   t value  Pr(>|t|)
(Intercept)  9.0027      0.2345   38.3870    0.0000
        AGE  0.0249      0.0058    4.2960    0.0001
...
> summary(z)
...
Coefficients:
              Value  Std. Error   t value  Pr(>|t|)
(Intercept)  9.9994      0.0251  398.3339    0.0000
    AGE.ctr  0.0249      0.0058    4.2960    0.0001
...

As you see, t-value (Estimate/SE) is affected.

Best regards,
Vladimir
------------------------------------------------------------------------
Vladimir Piotrovsky, Ph.D.
Research Fellow
Global Clinical Pharmacokinetics and Clinical Pharmacology
Janssen Research Foundation
B-2340 Beerse
Belgium
Email: vpiotrov@janbe.jnj.com

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 09, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates) Date: Mon, 09 Jul 2001 08:31:27 -0400

Hi, Matt,

Thanks for your reply. For an ideal distribution (50% male/50% female), that's a clever choice. However, for most non-ideal situations, e.g., 70% female and 30% male, ..... I'm wondering if anyone has ever used a HYPOTHETICAL MEAN gender (or other dichotomous covariate, with a fractional value) before, for the purpose of centering or any other purpose?

Thanks,
Alan.
--
***** Alan Xiao, Ph.D ***************
***** PK/PD Scientist ***************
***** Cognigen Corporation **********
***** Tel: 716-633-3463 ext 265 ******

RE: Centering (Impact on SE)

From: Kenneth G. Kowalski Date: July 09, 2001 technical
From: "KOWALSKI, KENNETH G. [PHR/1825]" <kenneth.g.kowalski@pharmacia.com> Subject: RE: Centering (Impact on SE) Date: Mon, 9 Jul 2001 08:52:08 -0500 Vladimir, Note that in your example the inference (estimate, SE, and t-value) for the slope parameter for age does not change. It is not appropriate to compare the estimate, SE and t-value for the intercept for the centered and w/o centered parameterizations. In the centered case the t-test is testing whether the CL at age 40 = 0, whereas w/o centering the t-test is testing whether the CL at age 0 = 0. Ken

RE: Centering (Impact on SE)

From: Vladimir Piotrovskij Date: July 09, 2001 technical
From: "Piotrovskij, Vladimir [JanBe]" <VPIOTROV@janbe.jnj.com> Subject: RE: Centering (Impact on SE) Date: Mon, 9 Jul 2001 16:38:59 +0200 Ken, I totally agree with you. I did not suggest to compare two fits, but just stressed that the Wald statistics is affected by centering as well as SE itself. Best regards, Vladimir

RE: Centering (Impact on SE)

From: Smith Brian P Date: July 09, 2001 technical
From: SMITH_BRIAN_P@Lilly.com Subject: RE: Centering (Impact on SE) Date: Mon, 09 Jul 2001 10:14:34 -0500

Unfortunately, this is a classic apples to oranges comparison. In the uncentered model the intercept is estimating the clearance for a person with an age of 0. In the centered model the intercept is estimating the clearance for a person with an age of 40. Obviously, your estimate for a person with an age of 40 will be more precise (this is where most of your data is) than an estimate for a person with age 0.

Now, notice that
the estimate for a person of age 0 with your non centered model is 9.0027 + 0*0.0249 = 9.0027
the estimate for a person of age 40 with your non centered model is 9.0027 + 40*0.0249 = 9.9987
the estimate for a person of age 0 with your centered model is 9.9994 - 40*0.0249 = 9.0034
the estimate for a person of age 40 with your centered model is 9.9994 + 0*0.0249 = 9.9994

The only difference between the estimates of these two models is completely due to rounding error.

With this said, consider finding the standard error for a person with age 0 from your centered model. That is, find the standard error of 9.9994 - 40*0.0249. It is a mathematical fact that the standard error will be exactly the same as the standard error for the intercept of the non centered model. Further notice that the estimates and standard errors of the slopes of the two models are exactly the same. Thus, the two models give identical inference about the effect of age on clearance.

As has been mentioned, there are numerical analysis advantages to centering. Centering also allows the intercept to be an estimate of something meaningful. As others have said and I reiterate, these advantages make centering useful. However, statistically (given that both models properly converge) there is no advantage to centering.

Sincerely,
Brian Smith
Eli Lilly and Company

RE: Centering (was Re: Missing covariates)

From: Matt Hutmacher Date: July 09, 2001 technical
From: "HUTMACHER, MATTHEW [Non-Pharmacia/1825]" <matthew.hutmacher@pharmacia.com> Subject: RE: Centering (was Re: Missing covariates) Date: Mon, 9 Jul 2001 12:33:28 -0500 Alan, You could try averaging the Xi's=Xbar, then use TVCL=THETA(1)+Xbar*THETA(2). Matt
From: "Perez Ruixo, Juan Jose [JanBe]" <JPEREZRU@janbe.jnj.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 12 Jul 2001 10:13:07 +0200

Dear all,

We must take care with the centering approach for categorical data. Following the example of Matt (TVCL = THETA(1) + Xi * THETA(2)), the parametrization Xi = 1 for females and Xi = -1 for males gives the population average when Xi=0 and there are equal proportions of males and females. In this case, we have a null correlation between intercept and slope, but the intercept SE is the same as the slope SE, and both equal the residual standard error divided by SQRT(N). In other words, you don't directly know the population parameter for males and the difference from females, because THETA(2) is affected by the Xi codification. In this case THETA(2) is half of the real difference between males and females, and its absolute standard error (not relative standard error) is affected. It means the t-test is the same, but building a confidence interval requires a prior transformation in order to get the precision of the real difference between males and females.

I can show you a simple example (with S+ code):

weight <- c(rnorm(50, mean = 60, sd = 6), rnorm(50, mean = 50, sd = 5))
gender0 <- c(rep(1,50), rep(0,50))
gender1 <- c(rep(1,50), rep(-1,50))
G0 <- lm(weight~gender0)
G1 <- lm(weight~gender1)
summary(G0)
summary(G1)
.......

Coefficients G0:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  50.7279      0.7918  64.0675    0.0000
    gender0   7.9107      1.1198   7.0647    0.0000

Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: -0.7071

Coefficients G1:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  54.6832      0.5599  97.6698    0.0000
    gender1   3.9554      0.5599   7.0647    0.0000

Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: 0

For these reasons, I suggest not using the centering approach for categorical data (here, I don't include ordinal data). With the first approach the intercept has a useful meaning, and I don't need to center the variable.

Thanks,

Juan Jose Perez Ruixo
Global Pharmacokinetics and Clinical Pharmacology Division.
Janssen Research Foundation
Turnhoutseweg, 30
B-2340 Beerse
Belgium
Tel: (+32) 14 60 75 08
Email: jperezru@janbe.jnj.com
From: "Perez Ruixo, Juan Jose [JanBe]" <JPEREZRU@janbe.jnj.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 12 Jul 2001 10:17:54 +0200

Dear all,

Regarding standard errors when the centering approach is used, I would like to add some comments. For a simple linear model without centering the independent variable (y = a + bx), the variance of the predicted y (as a function of x) is equal to:

S**2 * { (1 / N) + (x - X)**2 / Sx}     eq. 1

where S is the residual standard error; N is the number of pairs x,y; X is the mean of x; and Sx is the sum of squares for x.

When x = X, we have the lowest variance of y, S**2 / N. This variance is equal to the variance from the centering approach at x - X = 0. It simply results from applying eq. 1 to the model y = a' + b (x-X), where a' = a + bX. In both cases, the variance at the mean of the independent variable is lower than the variance of the intercept. Only when the mean of x is equal to 0 are these variances equal. For these reasons, centering does not affect the standard errors, and the intercept errors are different because the intercepts represent different values.

We must be careful with the centering approach for categorical data. In this case, the slope (b) is affected by the codification used. Following the example of Matt (TVCL = THETA(1) + Xi * THETA(2)), the parametrization Xi = 1 for females and Xi = -1 for males gives the population average when Xi=0 and there are equal proportions of males and females. In this case, we have a null correlation between intercept and slope, but the intercept SE is the same as the slope SE, and both equal the residual standard error divided by SQRT(N). In other words, you don't directly know the population parameter for males and the difference from females. THETA(2) is half of the real difference between males and females, and its absolute standard error (not relative standard error) is affected. It means the t-test is the same, but building a confidence interval requires a prior transformation in order to get the precision of the real difference between males and females.

I can show a simple example (with S+ code):

weight <- c(rnorm(50, mean = 60, sd = 6), rnorm(50, mean = 50, sd = 5))
gender0 <- c(rep(1,50), rep(0,50))
gender1 <- c(rep(1,50), rep(-1,50))
G0 <- lm(weight~gender0)
G1 <- lm(weight~gender1)
summary(G0)
summary(G1)
.......

Coefficients G0:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  50.7279      0.7918  64.0675    0.0000
    gender0   7.9107      1.1198   7.0647    0.0000

Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: -0.7071

Coefficients G1:
               Value  Std. Error  t value  Pr(>|t|)
(Intercept)  54.6832      0.5599  97.6698    0.0000
    gender1   3.9554      0.5599   7.0647    0.0000

Residual standard error: 5.599 on 98 degrees of freedom
Multiple R-Squared: 0.3374
F-statistic: 49.91 on 1 and 98 degrees of freedom, the p-value is 2.362e-010
Correlation Intercept, gender: 0

For these reasons, I suggest not using the centering approach for categorical data (here, I don't include ordinal data). Without centering the intercept has a useful meaning, thereby making centering unnecessary.

> Thanks,
>
> Juan Jose Perez Ruixo
> Global Pharmacokinetics and Clinical Pharmacology Division.
> Janssen Research Foundation
> Turnhoutseweg, 30
> B-2340 Beerse
> Belgium
> Tel: (+32) 14 60 75 08
> Email: jperezru@janbe.jnj.com
>
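
The effect of the coding scheme on the slope, its standard error and the t-value can be reproduced with a short least-squares sketch. The data are simulated in the spirit of the S+ example (two balanced groups with means 60 and 50), so the numbers will differ from the output quoted above. The helper name `fit` and the extra codings (-0.5/+0.5 and 0/2) are additions for illustration.

```python
import numpy as np


def fit(x, y):
    """Least-squares fit of y = a + b*x; returns (estimates, standard errors)."""
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    return beta, np.sqrt(np.diag(sigma2 * xtx_inv))


rng = np.random.default_rng(12)
female = np.repeat([1.0, 0.0], 50)  # 50 subjects per group
weight = np.where(female == 1.0,
                  rng.normal(60.0, 6.0, size=100),
                  rng.normal(50.0, 5.0, size=100))

# Four codings of the same dichotomous covariate.
codings = {
    "0/1": female,
    "-1/+1": 2.0 * female - 1.0,
    "-0.5/+0.5": female - 0.5,
    "0/2": 2.0 * female,
}
results = {name: fit(x, weight) for name, x in codings.items()}

for name, (beta, se) in results.items():
    print(f"{name:>10}: intercept {beta[0]:7.3f} (SE {se[0]:.3f}), "
          f"slope {beta[1]:7.3f} (SE {se[1]:.3f}), t {beta[1] / se[1]:.3f}")
```

Codings that span two units (-1/+1 and 0/2) halve both the slope and its standard error relative to 0/1, so the t-value is unchanged. Shifting the code without changing its span (-0.5/+0.5) leaves the slope alone and only moves the intercept to the midpoint of the two group means.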

RE: Centering (was Re: Missing covariates)

From: Matt Hutmacher Date: July 12, 2001 technical
From: "HUTMACHER, MATTHEW [Non-Pharmacia/1825]" <matthew.hutmacher@pharmacia.com> Subject: RE: Centering (was Re: Missing covariates) Date: Thu, 12 Jul 2001 11:16:00 -0500 Dear all, I was not suggesting these be the parameters that are reported for a population analysis. I was merely suggesting this approach for graphical purposes. Alan wanted, it seemed, to be able to graph his final model (covariates included) as observed and predicted vs. time in a similar manner as his base model (no covariates). The suggestion was merely a convenient way to capture the central tendency for the data. Matt

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 12, 2001 technical
Date: Thu, 12 Jul 2001 19:12:58 -0400 From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates)

Hi,

I'll throw my 2 cents in here.

About centering for dichotomous or ordinal covariates, how about the following thought experiment, which starts from continuous covariates, for which centering seems the least controversial here.

For an ideal normal distribution of WEIGHT in a population with a mean of 40 and STD of 10, for example, probably no one will question the centering. Now assume that in a real population the sampled weight distribution is also normal (overall) around 40 but the samples are distributed in the ranges 10-20, 25-35, 45-55, 60-70. Do you have a problem with centering in this case? Further, assume the sampled data are distributed in the ranges 13-17, 27-33, 47-53, 63-67. Do you have a problem with centering in this case? Repeating similar operations will reduce the distribution to an ordinal/dichotomous distribution.

Now, my question is: why can't ordinal/dichotomous covariates be treated as special continuous covariates, or as continuous covariates with special sampling values? (Usually in science, a more general/complicated theory should be able to explain/predict a specific/simpler case, am I wrong?) Otherwise, what's the statistical basis for evaluating dichotomous covariates in PK/PD models?

Back to Perez's example, note that the scaling in gender0 (0~1) and gender1 (-1~1) is different. If you use the same scaling, for example gender0 (0~1) and gender1 (-0.5 ~ 0.5), you'll get a different story. Attached below is a SAS program and the corresponding results with 50 subjects (for WEIGHT and SEXF; the data distribution is attached). Notice the comparisons of results in the following pairs: sexf0 (0 ~ 1) and sexf1 (-1 ~ 1) (compare with Perez's case); sexf0 (0 ~ 1) and sexf2 (-0.5 ~ 0.5); and sexf1 (-1 ~ 1) and sexf3 (0 ~ 2).

N = 50.
Dependent variable WEIGHT

Parameter      Estimate  Standard Error  t Value  Pr > |t|
Intercept   50.55440307      0.94663607    53.40    <.0001
sexf0       10.85100036      1.39573874     7.77    <.0001

Intercept   55.97990325      0.69786937    80.22    <.0001
sexf1        5.42550018      0.69786937     7.77    <.0001

Intercept   55.97990325      0.69786937    80.22    <.0001
sexf2       10.85100036      1.39573874     7.77    <.0001

Intercept   50.55440307      0.94663607    53.40    <.0001
sexf3        5.42550018      0.69786937     7.77    <.0001

So, Perez's explanation seems to need some modification. ..... If you are interested, see the attached results for details or run it yourself.

**************** SAS programs **************

%macro sss;
data aa1;
%do i = 1 %to 50;
  sexf0 = round(0.5 + sqrt(0.027)*rannor(35179*&i), 1);
  sexf1 = 2*sexf0 -1;
  sexf2 = sexf0 -0.5;
  sexf3 = 2*sexf0;
  if sexf0 = 0 then weight = 50 + sqrt(25)*rannor(43820*&i);
  if sexf0 = 1 then weight = 60 + sqrt(36)*rannor(43820*&i);
  output;
%end;
run;
%mend sss;
%sss;

proc print data = aa1; run;

proc gplot data = aa1;
  plot weight * sexf0;
  plot weight * sexf1;
  plot weight * sexf2;
  plot weight * sexf3;
run;

proc GLM data = aa1; model weight = sexf0; output out = bb1 p = pred r= res; run;
proc GLM data = aa1; model weight = sexf1; output out = bb2 p = pred r= res; run;
proc GLM data = aa1; model weight = sexf2; output out = bb3 p = pred r= res; run;
proc GLM data = aa1; model weight = sexf3; output out = bb3 p = pred r= res; run;
quit;

************** results - graphs not shown here ***********

The SAS System 94 16:32 Thursday, July 12, 2001

Obs  sexf0  sexf1  sexf2  sexf3   weight
  1      1      1    0.5      2  54.0675
  2      1      1    0.5      2  54.4837
  3      0     -1   -0.5      0  50.1830
  4      0     -1   -0.5      0  59.8383
  5      1      1    0.5      2  57.8778
  6      0     -1   -0.5      0  47.4324
  7      1      1    0.5      2  57.7886
  8      1      1    0.5      2  65.4018
  9      0     -1   -0.5      0  54.5489
 10      0     -1   -0.5      0  55.4752
 11      0     -1   -0.5      0  50.8191
 12      1      1    0.5      2  60.9912
 13      0     -1   -0.5      0  47.2735
 14      0     -1   -0.5      0  57.7363
 15      1      1    0.5      2  55.2785
 16      1      1    0.5      2  59.8003
 17      0     -1   -0.5      0  57.1171
 18      0     -1   -0.5      0  52.8775
 19      0     -1   -0.5      0  44.2806
 20      0     -1   -0.5      0  51.8335
 21      1      1    0.5      2  67.7795
 22      0     -1   -0.5      0  53.4585
 23      0     -1   -0.5      0  48.2589
 24      0     -1   -0.5      0  48.2766
 25      1      1    0.5      2  59.9475
 26      1      1    0.5      2  56.2993
 27      1      1    0.5      2  64.0133
 28      0     -1   -0.5      0  48.0437
 29      1      1    0.5      2  61.5021
 30      0     -1   -0.5      0  52.9423
 31      0     -1   -0.5      0  46.8322
 32      0     -1   -0.5      0  45.1282
 33      1      1    0.5      2  68.9803
 34      0     -1   -0.5      0  51.5586
 35      0     -1   -0.5      0  53.0975
 36      0     -1   -0.5      0  55.8479
 37      1      1    0.5      2  66.0053
 38      1      1    0.5      2  58.9312
 39      1      1    0.5      2  52.7544
 40      1      1    0.5      2  68.0063
 41      0     -1   -0.5      0  46.0756
 42      1      1    0.5      2  65.4956
 43      0     -1   -0.5      0  48.3888
 44      0     -1   -0.5      0  48.8654
 45      1      1    0.5      2  70.5623
 46      0     -1   -0.5      0  40.6463
 47      1      1    0.5      2  68.1454
 48      1      1    0.5      2  61.6652
 49      0     -1   -0.5      0  48.1331
 50      1      1    0.5      2  56.5472

The GLM Procedure
Number of observations 50

The GLM Procedure
Dependent Variable: weight

Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              1     1462.383074  1462.383074    60.44  <.0001
Error             48     1161.371327    24.195236
Corrected Total   49     2623.754401

R-Square  Coeff Var  Root MSE  weight Mean
0.557363   8.855503  4.918865     55.54586

Source  DF    Type I SS  Mean Square  F Value  Pr > F
sexf0    1  1462.383074  1462.383074    60.44  <.0001

The GLM Procedure
Dependent Variable: weight

Source  DF  Type III SS  Mean Square  F Value  Pr > F
sexf0    1  1462.383074  1462.383074    60.44  <.0001

Parameter      Estimate  Standard Error  t Value  Pr > |t|
Intercept   50.55440307      0.94663607    53.40    <.0001
sexf0       10.85100036      1.39573874     7.77    <.0001

The GLM Procedure
Number of observations 50

The GLM Procedure
Dependent Variable: weight

Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              1     1462.383074  1462.383074    60.44  <.0001
Error             48     1161.371327    24.195236
Corrected Total   49     2623.754401

R-Square  Coeff Var  Root MSE  weight Mean
0.557363   8.855503  4.918865     55.54586

Source DF Type I SS
Mean Square F Value Pr > F sexf1 1 1462.383074 1462.383074 60.44 <.0001 ' The SAS System 101 16:32 Thursday, July 12, 2001 The GLM Procedure Dependent Variable: weight Source DF Type III SS Mean Square F Value Pr > F sexf1 1 1462.383074 1462.383074 60.44 <.0001 Standard Parameter Estimate Error t Value Pr > |t| Intercept 55.97990325 0.69786937 80.22 <.0001 sexf1 5.42550018 0.69786937 7.77 <.0001 ' The SAS System 102 16:32 Thursday, July 12, 2001 The GLM Procedure Number of observations 50 ' The SAS System 103 16:32 Thursday, July 12, 2001 The GLM Procedure Dependent Variable: weight Sum of Source DF Squares Mean Square F Value Pr > F Model 1 1462.383074 1462.383074 60.44 <.0001 Error 48 1161.371327 24.195236 Corrected Total 49 2623.754401 R-Square Coeff Var Root MSE weight Mean 0.557363 8.855503 4.918865 55.54586 Source DF Type I SS Mean Square F Value Pr > F sexf2 1 1462.383074 1462.383074 60.44 <.0001 ' The SAS System 104 16:32 Thursday, July 12, 2001 The GLM Procedure Dependent Variable: weight Source DF Type III SS Mean Square F Value Pr > F sexf2 1 1462.383074 1462.383074 60.44 <.0001 Standard Parameter Estimate Error t Value Pr > |t| Intercept 55.97990325 0.69786937 80.22 <.0001 sexf2 10.85100036 1.39573874 7.77 <.0001 ' The SAS System 105 16:32 Thursday, July 12, 2001 The GLM Procedure Number of observations 50 ' The SAS System 106 16:32 Thursday, July 12, 2001 The GLM Procedure Dependent Variable: weight Sum of Source DF Squares Mean Square F Value Pr > F Model 1 1462.383074 1462.383074 60.44 <.0001 Error 48 1161.371327 24.195236 Corrected Total 49 2623.754401 R-Square Coeff Var Root MSE weight Mean 0.557363 8.855503 4.918865 55.54586 Source DF Type I SS Mean Square F Value Pr > F sexf3 1 1462.383074 1462.383074 60.44 <.0001 ' The SAS System 107 16:32 Thursday, July 12, 2001 The GLM Procedure Dependent Variable: weight Source DF Type III SS Mean Square F Value Pr > F sexf3 1 1462.383074 1462.383074 60.44 <.0001 Standard Parameter Estimate Error t Value Pr > 
|t| Intercept 50.55440307 0.94663607 53.40 <.0001 sexf3 5.42550018 0.69786937 7.77 <.0001 -- ***** Alan Xiao, Ph.D *************** ***** PK/PD Scientist *************** ***** Cognigen Corporation ********** ***** Tel: 716-633-3463 ext 265 ******
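The pattern in the four fits above (slope halves when the two codes are 2 units apart, t value unchanged, intercept shifts with the location of zero) can be reproduced with a few lines of Python. This is only a sketch: the data are a small hypothetical set, not the SAS data set listed above, and `ols` is an illustrative helper written for this note.

```python
from math import sqrt

def ols(x, y):
    """Simple linear regression y = b0 + b1*x.

    Returns (intercept, slope, standard error of slope, t value of slope).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = sqrt(rss / (n - 2) / sxx)
    return b0, b1, se_b1, b1 / se_b1

# Hypothetical toy data (assumption, for illustration only)
sexf0 = [0, 0, 0, 0, 1, 1, 1]
weight = [48.0, 52.0, 50.0, 54.0, 58.0, 63.0, 62.0]

# The same four codings used in the SAS program
codings = {
    "sexf0": sexf0,                        # 0 / 1
    "sexf1": [2 * s - 1 for s in sexf0],   # -1 / 1
    "sexf2": [s - 0.5 for s in sexf0],     # -0.5 / 0.5
    "sexf3": [2 * s for s in sexf0],       # 0 / 2
}

results = {name: ols(x, weight) for name, x in codings.items()}
for name, (b0, b1, se, t) in results.items():
    print(f"{name}: intercept={b0:.4f} slope={b1:.4f} se={se:.4f} t={t:.4f}")
```

Any recoding of the form x' = a*x + b divides both the slope and its standard error by a, so the t value and the fitted values do not change; only the meaning of the intercept and the unit of the slope do.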
From: "Perez Ruixo, Juan Jose [JanBe]" <JPEREZRU@janbe.jnj.com> Subject: Re: Centering (was Re: Missing covariates) Date: Mon, 30 Jul 2001 13:08:26 +0200

Dear Alan and all,

I think everybody now agrees with the centering approach for quantitative covariates, even when they are sampled with different strategies. But I don't think the same is true for categorical covariates. Here we can distinguish nominal covariates (for instance, pharmaceutical form: solution, capsule or tablet; or sex) from ordinal covariates (for instance, disease progression: grade I, II, III or IV; or the APGAR scale). These types of covariates cannot be treated as continuous covariates with special sampling values.

Quantitative covariates are measured on an interval (for instance, temperature) or ratio (for instance, age) metric scale; additive operations are allowed on the former, and both additive and multiplicative operations on the latter. For this reason, the centering approach can be used independently of the sampling strategy.

Categorical covariates are not measured on a metric scale. For nominal covariates only equality operations are allowed, and for ordinal covariates equality and order operations are possible. By definition, additive and multiplicative operations are not applicable. For this reason, the centering approach must be avoided.

A special case is ordinal covariates with many values (for instance, APGAR). If you assume that the "distance" between scores of 6 and 7 is the same as between scores of 10 and 11, it is usually possible to treat this covariate as quantitative with an interval metric scale, and then the centering approach can be useful. But this assumption is hardly applicable to covariates with only a few scores, such as disease progression. Usually, the "distance" between grades I and II is not the same as that between grades III and IV.

Alan's example shows how the slope of a linear regression model with categorical data is affected by the coding used, as I said in my last email.
In that example, the real difference between male and female weight is 10.85 (SE: 1.39). I agree that we can get this value from codings like sexf0 or sexf2, but not from sexf1 or sexf3. In the latter two, the categories are coded 2 units apart, so the slope is half the difference and we need to multiply by 2 to get the confidence interval of the real difference between male and female.

Sexf0 and sexf2 represent two different types of coding. The sexf0 coding is called "reference cell coding" or the "partial method", and the sexf2 coding is called "deviations from mean coding" or the "marginal method". The choice of covariate coding depends on the effect that you want to fit.

Now consider the pharmaceutical form covariate. If I want to estimate the difference in absorption rate constant between capsule and solution, and also the difference between tablet and solution, I will use two dummy covariates with reference cell coding (D1: solution = 0 and capsule = 1; D2: solution = 0 and tablet = 1). But if I wish to compare the capsule absorption rate with the mean absorption rate of all pharmaceutical forms, I will use deviation from mean coding. If I have the same number of data for every pharmaceutical form, I could code: solution = -1, capsule = 1 and tablet = 0. If the number of categories increases, the complexity of this coding increases too. This does not happen with reference cell coding.

Moreover, in the health sciences, reference cell coding is the most often used coding because the regression coefficients are very easy to interpret. In the sexf0 example, the intercept has a direct meaning: it is the average weight for category 0, and it is independent of the male to female ratio. The same is not true of sexf2. Moreover, if the male to female ratio is not equal to 1:1, it is necessary to modify the code values appropriately when using deviations from mean coding; otherwise the intercept will be affected and will not represent the average weight of males and females.
Thanks, Juan Jose Perez Ruixo Global Clinical Pharmacokinetics and Clinical Pharmacodynamics Department. Jassen Research Foundation Turnhoutsweg, 30 B-2340 Beerse Belgium Telephone: +032 / 14 60 75 08 Fax: +032 / 14 60 58 34 E-mail: jperezru@janbe.jnj.com
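The difference between the two codings for a three-level covariate can be sketched in Python. This is a hedged illustration, not the poster's method: the absorption rate values are hypothetical, the function names are invented for this note, and "deviation from mean coding" is taken to mean the usual effect coding in which each coefficient is a level mean minus the unweighted mean of the level means. The sketch relies on the fact that a model with one parameter per category reproduces the category means exactly, so the coefficients follow directly from those means.

```python
def group_means(groups, values):
    """Mean of `values` within each level of `groups`."""
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def reference_cell(means, reference):
    """Intercept = reference mean; coefficient = level mean - reference mean."""
    coefs = {"intercept": means[reference]}
    for level, m in means.items():
        if level != reference:
            coefs[level] = m - means[reference]
    return coefs

def deviation_from_mean(means):
    """Intercept = unweighted mean of level means; coefficient = level mean - it.

    Only k-1 of the k level coefficients are free; they sum to zero.
    """
    grand = sum(means.values()) / len(means)
    coefs = {"intercept": grand}
    for level, m in means.items():
        coefs[level] = m - grand
    return coefs

# Hypothetical absorption rate constants (1/h) by pharmaceutical form
form = ["solution", "solution", "capsule", "capsule", "capsule", "tablet", "tablet"]
ka = [2.0, 2.2, 1.0, 1.2, 1.1, 0.8, 0.6]

means = group_means(form, ka)
ref = reference_cell(means, "solution")
dev = deviation_from_mean(means)
print(ref)
print(dev)
```

With reference cell coding each coefficient answers "how does this form differ from solution?", while with deviation coding it answers "how does this form differ from the average form?". The unequal group sizes in the toy data do not change the reference cell intercept.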

Re: Centering (was Re: Missing covariates)

From: Alan Xiao Date: July 30, 2001 technical
From: Alan Xiao <Alan.Xiao@cognigencorp.com> Subject: Re: Centering (was Re: Missing covariates) Date: Mon, 30 Jul 2001 17:25:53 -0400

Dear Juan and All,

What we have discussed so far is really about how to handle the question raised in your last sentence: "if the male to female ratio is not equal to 1:1, it's necessary to modify appropriately the values for codification with deviations from mean coding, otherwise the intercept will be affected and won't represent the weight average of male and female."

Leonid proposed using -1 and 1 for gender. It looks as if that gets around the problem (of defining the mean value for dichotomous covariates). But actually the problem is still there, because in this particular case (same number of males and females) the mean value is just zero, which drops out of the equation automatically.

Now back to the question in your last sentence: if we want the intercept to "represent the weight average of male and female", shall we use a fractional value as the mean of gender when the numbers of males and females differ (e.g., 30 males and 70 females)? Or is there another method?

Note that we know whether a covariate is dichotomous or continuous only because we defined it with a set of (arbitrary) values in an arbitrary scaling system. The computer does not know this at all. The computer treats all covariates in the same way, following the same statistical rules. With this in mind, if we can center continuous covariates at their means (in our minds they are continuous, but to the computer they are just ordinary variables), I don't see why we cannot do the same for dichotomous covariates. Of course, after doing that there is the problem of interpretation. The interpretation is tied to the values and the scaling system we defined for the dichotomous (or categorical) covariates.
Even for continuous covariates, we understand MEAN AGE = 40 years only because we chose, and are used to, the decimal scaling system and values expressed in years. If we chose a totally different scaling system (e.g., hexadecimal) and expressed AGE in different units (e.g., months), the computer could still give the same statistical inference, but the interpretation might look bizarre. It looks bizarre not because the statistical inference has changed but only because we are not used to that scaling system.

The same thing happens here. For a dichotomous covariate, we just use a different scaling system and assign values based on different units. However, this does not seem to influence the statistical inference at all, because the computer cannot distinguish a dichotomous covariate from a continuous one, and cannot distinguish the scaling systems they use; it converts every scaling system (where applicable) to binary for calculation and to decimal for output. Therefore, to make interpretations sound more reasonable (to people who are used to the decimal scaling system), a scaling system that closely matches the decimal one is preferred. That is, choose the codings whose results sound more reasonable (to the decimal world). Having many possible codings is not a reason why we cannot center a dichotomous covariate in a statistically meaningful way.

Yes, as Juan said, slopes depend on the coding (actually, on the choice of scaling system). However, this also happens with continuous covariates. If we change the scaling system for continuous covariates, e.g. from decimal to hexadecimal, the slope will change too. The decimal scaling system for continuous covariates is just one of the possible scaling systems; we chose it and we are used to it.
My point is that, for dichotomous covariates, we can do centering with the same statistical meaning and in the same statistical way as for continuous covariates (as run by the computer). The interpretation (by us) is tied to the scaling system and units we choose for the dichotomous covariates.

I agree that with nominal covariates we should be careful. But we can define or choose a scaling system (metric) to make it work. (Which one is not defined by us?) Of course, for each scaling system we define or choose, the unit scaling distance should be uniform (or is that not necessary?).

The following article might be helpful. In it the authors discuss the centering issue for dichotomous covariates, in a slightly different way from how we have discussed it here.

Jonsson EN and Karlsson MO. Automated covariate model building within NONMEM. Pharm Res 1998; 15(9): 1463-8.

Any further discussion on this topic is welcome, and any input will be appreciated.

Best regards,
Alan.
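The question of using a fractional value as the mean of gender can be checked numerically. The following Python sketch makes several assumptions for illustration: males are coded 1 and females 0, the weights are hypothetical values built so that the group means are exactly 60 and 50, and `fit_line` is a helper written for this note. It uses the 30 males and 70 females of the example above.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# 30 males (coded 1) and 70 females (coded 0); the coding is an assumption
sex = [1] * 30 + [0] * 70

# Hypothetical weights: small symmetric offsets keep the group means at 60 and 50
weight = [60 + (i % 3 - 1) for i in range(30)] + [50 + (i % 5 - 2) for i in range(70)]

# Uncentered 0/1 coding: the intercept is the mean of the group coded 0
b0_raw, b1_raw = fit_line(sex, weight)

# Centered at the sample mean of the indicator (0.3, the fraction of males)
mean_sex = sum(sex) / len(sex)
sex_centered = [s - mean_sex for s in sex]
b0_cen, b1_cen = fit_line(sex_centered, weight)

overall_mean = sum(weight) / len(weight)
print(f"raw:      intercept={b0_raw:.3f} slope={b1_raw:.3f}")
print(f"centered: intercept={b0_cen:.3f} slope={b1_cen:.3f}")
print(f"overall mean weight={overall_mean:.3f}")
```

Under these assumptions the slope (the male minus female difference) is the same in both fits, and centering at the fraction of males moves the intercept from the female mean to the overall sample mean. Whether that intercept is a useful "typical" value is exactly the interpretation question under discussion.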

RE: Centering (was Re: Missing covariates)

From: Leonid Gibiansky Date: July 30, 2001 technical
From: "Gibiansky, Leonid" <gibianskyl@globomax.com> Subject: RE: Centering (was Re: Missing covariates) Date: Mon, 30 Jul 2001 17:47:27 -0400

I am not sure whether the Leonid quoted here is me. If so, I did not propose centering categorical covariates, including gender. I think this is not very useful. For continuous covariates we implicitly assume a reasonable distribution (normal, or smooth, or just not very weird) around the mean or median. For categorical covariates, the distribution degenerates into delta functions and any centering becomes questionable.

I still cannot understand why one would need to reduce the covariate model to the base one. From the covariate model one can extract information about any typical person with their particular characteristics: gender, eye color, weight, etc. If one needs a prediction for a typical person, one may define this typical person via typical age and weight. You cannot define a typical gender; there is no way to do it!

Leonid Gibiansky