Re: unbalanced data set

From: Alison Boeckmann
Date: January 22, 2016
Source: mail-archive.com
Nick's comment answered the question that was asked, although later responses moved to a somewhat different subject. I'd like to add a little history, as best as I remember what I was told, that may illuminate the original issue, especially for non-statisticians.

Prior to 1978, PK data was obtained from drugs that were tested on healthy young volunteers (typically medical students). The data was balanced, i.e., the same number of samples at the same times from each subject, typically over one day. If someone dropped out early, it was generally for a reason unrelated to the drug, and that subject's data was simply ignored. A methodology such as ANOVA could be used to analyze the data.

Lewis Sheiner objected to this. He said the drugs should be tested on the target population. This sometimes meant sick people, in a clinical setting, over a multi-visit time frame. If a subject dropped out early, it might be because this person either over-responded to the drug or under-responded and needed to be put on a rescue medication. But these were the "outlier" subjects that the study was most interested in! Lewis needed a way of combining unbalanced data. Stuart Beal joined him in 1978. His PhD thesis was on a technique for analyzing such data sets. By 1980, they released the first version of NONMEM.

To make the point clearer: At the Short Course, Stuart used to talk about a data set with 99 observed values of 100 and 1 observed value of 50. If there is no other information, then the best estimate of the mean in the population is a number close to 100. But what if you knew that the 99 values were from one subject, and the single value of 50 was from a second subject? You'd be very sure of the value 100 for the first subject, but much less sure about the value 50 for the second. Therefore, 75 would be a poor choice for the mean in the population. There is a methodology called "BLUE" (Best Linear Unbiased Estimator). I can't remember what Stuart said this gave, but it was a number between 75 and 100.

That is the whole idea behind NONMEM: to provide a weight for each observation that takes into account the fact that observations come from different subjects. As Lewis says in Guide V, "mixed effect modeling ... is especially useful when there are only a few pharmacokinetic measurements from each individual sampled in the population, or when the data collection design varies considerably between these individuals."

-- Alison Boeckmann
Quoted reply history
On Tue, Jan 5, 2016, at 06:03 PM, Zheng Liu wrote:
> Dear all,
>
> I recently have a data set for pk parameters fitting. The issue is
> some patients have far more measurement points than others (i.e. a few
> patients have ~15 points, other patients have only 1 or 2). I
> speculate in the fitted parameters, those patients with many points
> would contribute much more than those with less points. Then the
> population "average" values of fitted pk parameters are not anymore
> average from all the patients, but more biased to those patients with
> many points. This is not what I expect.
>
> Of course I could take away some points from the patients with many
> points, in order to be comparable to less-points patients. Then I
> will be forced to lose some information from the data set. I just
> wonder are there anyone who have better proposal to solve this
> problem? I appreciate your help very much!
>
> Best regards,
>
> Zheng

-- Alison Boeckmann
Jan 06, 2016 Zheng Liu unbalanced data set
Jan 06, 2016 Nick Holford Re: unbalanced data set
Jan 06, 2016 Joachim Grevel RE: unbalanced data set
Jan 06, 2016 Bill Denney Re: unbalanced data set
Jan 06, 2016 Michael Fossler RE: unbalanced data set
Jan 06, 2016 Leonid Gibiansky Re: unbalanced data set
Jan 22, 2016 Alison Boeckmann Re: unbalanced data set
Jan 25, 2016 Zheng Liu RE: unbalanced data set