unbalanced data set

8 messages 7 people Latest: Jan 25, 2016

unbalanced data set

From: Zheng Liu Date: January 06, 2016 technical
Dear all, I recently received a data set for fitting PK parameters. The issue is that some patients have far more measurement points than others (a few patients have ~15 points, while others have only 1 or 2). I suspect that patients with many points contribute much more to the fitted parameters than those with fewer points. The population "average" values of the fitted PK parameters would then no longer be an average over all patients, but biased toward the patients with many points. This is not what I want. Of course, I could remove some points from the patients with many points so that they are comparable to the patients with few points, but then I would lose information from the data set. Does anyone have a better proposal for solving this problem? I appreciate your help very much! Best regards, Zheng

Re: unbalanced data set

From: Nick Holford Date: January 06, 2016 technical
Zheng, I think you are imagining a problem that does not really exist. Each observation contributes something to the overall fit. There is no intrinsic reason to require "balance" across subjects. It is always useful to have more information but it is not a good idea to remove observations. Best wishes, Nick
-- Nick Holford, Professor Clinical Pharmacology, Dept Pharmacology & Clinical Pharmacology, Bldg 503 Room 302A, University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand. office: +64(9)923-6730 mobile: NZ+64(21)46 23 53 http://holford.fmhs.auckland.ac.nz/

RE: unbalanced data set

From: Joachim Grevel Date: January 06, 2016 technical
Dear Zheng, This is indeed a fundamental and recurring problem in drug development. You have rich data from Phase 1 studies (single ascending dose, multiple ascending dose, and others, e.g. QTc) and sparse data from Phase 3 studies. Should you mix them all in one large meta-analysis and derive the definitive popPK model for that drug/project? After years of experience, I tend not to mix Phase 1 with Phase 3 data. Phase 1 can be used to establish the first popPK model, which may contain special features such as nonlinearities/saturation effects as a consequence of the wide range of doses studied. This can be the starting point for building a fit-for-purpose model using Phase 3 data only. I have come to believe that the specific patient population(s) of Phase 3 require their own popPK model that predicts exposure without bias. This model is then used in the exposure-response (E-R) modelling that is important for market approval. Only a dedicated Phase 3 popPK model that does not carry unnecessary legacies of Phase 1 development is fit for E-R modelling and can give the important answers about the dose rate(s) to be put in the drug label. I would be interested to hear some other opinions. Good luck, Joachim Joachim Grevel, PhD Scientific Director BAST Inc Limited Science & Enterprise Park Loughborough University Loughborough, LE11 3AQ United Kingdom Tel: +44 (0)1509 222908 http://www.bastinc.eu/

Re: unbalanced data set

From: Bill Denney Date: January 06, 2016 technical
Hi Zheng, I'll take an intermediate view between Joachim and Nick. The rich data from Phase 1 provide the ability to define the structural model and a few of the important covariates. The control of Phase 1 gives precision that cannot be achieved in Phase 2 or 3 studies. But there are usually important differences between Phase 1 and later-phase populations that make the later phase separately important. With later-phase trials, the range of covariates is expanded [1]. On top of the expanded covariate range, late-phase patient populations are sometimes categorically different from early-phase ones [2]. In practice, this means that I fit a single model to all data. The model accommodates the dense data from Phase 1 with more inter-individual variability (IIV) terms (fix the IIV to 0 for sparse data) and the expanded covariate range with a richer set of fixed effects as the model is expanded for later phases. Finally, due to typical differences in data quality, I will often include a different residual error structure for sparse data. This approach allows the complexity of the Phase 1 structural model to carry into the richness of the late-phase covariate model. [1] A specific example is that renal function is typically allowed to be lower, especially when Phase 1 is in healthy subjects. [2] My true belief is that there may be unobserved covariates causing what appears to be a categorical difference. The functional impact of that belief is semantic only. In practice, the model would include a categorical parameter. Thanks, Bill
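One possible reading of the two modelling devices Bill mentions (IIV switched off for sparse data, and a separate residual error for sparse data) is sketched below in Python rather than in a NONMEM control stream. This is only an illustration: the function names, the log-normal parameterisation, the combined additive-plus-proportional error form, and all numeric values are assumptions and are not taken from the thread.

```python
import math


def individual_value(typical_value, eta, estimate_iiv):
    """Log-normal individual parameter (assumed form).

    When estimate_iiv is False the random effect is switched off
    (IIV fixed to 0), so the typical value is returned unchanged.
    """
    return typical_value * math.exp(eta if estimate_iiv else 0.0)


def residual_sd(prediction, sparse, prop_dense=0.10, prop_sparse=0.25, additive=0.05):
    """Combined additive + proportional residual SD (assumed form),
    with a separate, hypothetical proportional component for sparse data."""
    prop = prop_sparse if sparse else prop_dense
    return math.sqrt(additive ** 2 + (prop * prediction) ** 2)


# Hypothetical parameter with typical value 10 and a random effect of 0.3.
q_dense = individual_value(10.0, 0.3, estimate_iiv=True)    # random effect applied
q_sparse = individual_value(10.0, 0.3, estimate_iiv=False)  # equals the typical value

# Same model prediction (2.0), different residual error by data type.
sd_dense = residual_sd(2.0, sparse=False)
sd_sparse = residual_sd(2.0, sparse=True)

print(q_dense, q_sparse, sd_dense, sd_sparse)
```

The point of the sketch is that one structural model can serve both data types, with a flag on each record deciding which variability terms apply.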

RE: unbalanced data set

From: Michael Fossler Date: January 06, 2016 technical
At the risk of being tiresome about this topic: absent specific differences between Phase 1 and Phase 2/3 data (e.g., renal function due to age or disease states), I'd argue that most of the differences seen between Phase 1 and Phase 2/3 data are due to adherence. In a sense, then, much of the difference in PK between these two groups is artificial, and due to the fact that patients do not reliably take their medication as prescribed, as opposed to Phase 1 volunteers, where adherence is near 100%. Bernard Vrijens has published a lot on this topic as it relates to PPK analyses. We, as a discipline, need to start pushing hard for adherence measures in clinical trials. As an n=1 case study, a few years ago I was involved with an analysis of a large Phase 2 study which consisted of an in-house phase, followed by discharge to home and an out-patient phase. The patients were significantly older and sicker than Phase 1 volunteers, so one might expect some PK differences. When we analyzed the data from the in-house portion of the study, we got results nearly identical to Phase 1. However, when we added in the out-patient phase, IIV on many of the parameters increased dramatically, and the residual error became extremely large. Clearly, patients were not taking their medication as prescribed (and as they wrote in their patient diaries). We ended up not using the out-patient portion of the data, which represents a huge waste of resources. This irritates people when I say it, but we as a discipline are so enamored of finding the magical covariate(s) that will explain variability that we neglect the most important one of all: did they take the medicine when they say they did? No biological covariate can have as big an effect as adherence. Accounting for adherence routinely results in up to a 50% decrease in residual variability; few standard covariates have this effect. Fossler M.J. Commentary: Patient Adherence: Clinical Pharmacology's Embarrassing Relative. Journal of Clinical Pharmacology (2015) 55(4): 365-367.
Mike Michael J. Fossler, Pharm. D., Ph. D., F.C.P. VP, Quantitative Sciences Trevena, Inc Office: 610-354-8840, ext. 249 Cell: 610-329-6636
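The mechanism Mike describes, where an unrecorded missed dose shows up as unexplained variability, can be illustrated with a toy calculation. The sketch below assumes a one-compartment model with IV bolus dosing and superposition; the dose, volume, half-life, and dosing times are hypothetical values chosen for easy arithmetic and are not from the study he mentions.

```python
import math


def concentration(t, dose_times, dose=100.0, volume=50.0, half_life=12.0):
    """Concentration at time t (h) by superposition of IV bolus doses.

    Assumed one-compartment model; only doses given at or before t contribute.
    """
    k = math.log(2.0) / half_life  # first-order elimination rate constant
    return sum(
        (dose / volume) * math.exp(-k * (t - td))
        for td in dose_times
        if td <= t
    )


diary_times = [0.0, 24.0, 48.0]  # doses as recorded in the patient diary
actual_times = [0.0, 24.0]       # what really happened: the 48 h dose was missed

sample_time = 60.0
predicted = concentration(sample_time, diary_times)   # model prediction from the diary
observed = concentration(sample_time, actual_times)   # error-free "true" concentration
relative_error = (observed - predicted) / predicted

print(predicted, observed, relative_error)
```

With these assumed values the diary-based prediction is 1.3125 and the true concentration is 0.3125, a discrepancy of about -76% with no assay error at all. A model that trusts the diary has to absorb that discrepancy into IIV or residual error.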

Re: unbalanced data set

From: Leonid Gibiansky Date: January 06, 2016 technical
I recently found out that the FDA has cleared an ingestible sensor that can be given with a tablet (any tablet) and that informs the patient (and the company, if needed) whether and when the tablet was taken: http://www.proteus.com/press-releases/first-medical-device-cleared-by-fda-with-adherence-claim/ If used in trials, it would end the guessing game about dose times, compliance, etc., by providing the exact times of doses for the analysis. Does anybody have experience with this type of data? It would be interesting to see the difference between a diary-based analysis and a sensor-based analysis. Thanks Leonid -------------------------------------- Leonid Gibiansky, Ph.D. President, QuantPharm LLC web: www.quantpharm.com e-mail: LGibiansky at quantpharm.com tel: (301) 767 5566

Re: unbalanced data set

From: Alison Boeckmann Date: January 22, 2016 technical
Nick's comment answered the question that was asked, although later responses moved to a somewhat different subject. I'd like to add a little history, as best as I remember what I was told, that may illuminate the original issue, especially for non-statisticians. Prior to 1978, PK data was obtained from drugs that were tested on healthy young volunteers (typically medical students). The data was balanced, i.e., same number of samples at the same times from each of them, typically over one day. If someone dropped out early, it was generally for a reason unrelated to the drug, and that subject's data was simply ignored. A methodology such as ANOVA could be used to analyze the data. Lewis Sheiner objected to this. He said the drugs should be tested on the target population. This sometimes meant sick people, in a clinical setting, over a multi-visit time frame. If a subject dropped out early, it might be because this person either over-responded to the drug or under-responded and needed to be put on a rescue medication. But these were the "outlier" subjects that the study was most interested in! Lewis needed a way of combining unbalanced data. Stuart Beal joined him in 1978. His PhD thesis was on a technique for analyzing such data sets. By 1980, they released the first version of NONMEM. To make the point more clear: At the Short Course, Stuart used to talk about a data set with 99 observed values of 100 and 1 observed value of 50. If there is no other information, then the best estimate of the mean in the population is a number close to 100. But what if you knew that the 99 values were from one subject, and the single value of 50 was from a second subject? You'd be very sure of the value 100, but much less sure about the value 50. Therefore, 75 would be a poor choice for the mean in the population. There is a methodology "BLUE" (Best Linear Unbiased Estimator). I can't remember what Stuart said this gave, but it was a number between 75 and 100.
That is the whole idea behind NONMEM: to provide a weight for each observation that takes into account the fact that observations come from different subjects. As Lewis says in Guide V, "mixed effect modeling ... is especially useful when there are only a few pharmacokinetic measurements from each individual sampled in the population, or when the data collection design varies considerably between these individuals." -- Alison Boeckmann
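Stuart's 99-versus-1 example can be worked through numerically. The sketch below assumes the simplest random-effects model, in which each observation is the population mean plus a between-subject effect (variance `omega2`) plus within-subject noise (variance `sigma2`). Under that assumption the variance of a subject's mean is `omega2 + sigma2 / n`, and the BLUE of the population mean weights each subject mean by the inverse of that variance. The variance values used here are hypothetical and are supplied by hand; NONMEM estimates them from the data, and this sketch is not its algorithm.

```python
def blue_population_mean(subject_means, n_obs, omega2, sigma2):
    """Inverse-variance weighted mean of subject means.

    Weight for subject i is 1 / (omega2 + sigma2 / n_i), the reciprocal of
    the variance of that subject's mean under the assumed model.
    """
    weights = [1.0 / (omega2 + sigma2 / n) for n in n_obs]
    weighted_sum = sum(w * m for w, m in zip(weights, subject_means))
    return weighted_sum / sum(weights)


subject_means = [100.0, 50.0]  # subject 1 averages 100, subject 2 has one value of 50
n_obs = [99, 1]

# No between-subject variability: every observation counts equally (pooled mean).
pooled = blue_population_mean(subject_means, n_obs, omega2=0.0, sigma2=1.0)

# No within-subject noise: every subject counts equally (mean of subject means).
per_subject = blue_population_mean(subject_means, n_obs, omega2=1.0, sigma2=0.0)

# Both sources of variability present (hypothetical equal variances).
mixed = blue_population_mean(subject_means, n_obs, omega2=1.0, sigma2=1.0)

print(pooled, per_subject, mixed)
```

The two extremes reproduce the naive answers, 99.5 (pool all observations) and 75 (average the two subjects), and any intermediate variance ratio gives a value between them; with the equal variances assumed here it is about 83.2. The subject with 99 observations gets more weight, but nowhere near 99 times more, which is the reassurance the original question was looking for.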

RE: unbalanced data set

From: Zheng Liu Date: January 25, 2016 technical
Dear Alison, Thanks a lot for your detailed comments, which answered all my questions. In fact, I felt completely relieved after reading about the "BLUE" (Best Linear Unbiased Estimator) methodology. This is exactly what I was hoping for. I guess most users can now use NONMEM happily, without worrying about this issue. Thank you also for describing the development history of NONMEM. Best regards, Zheng Liu, Ph.D. Pharmacometrician (postdoc), Melbourne Royal Children's Hospital