Theoretical questions about Beal's M2 method

9 messages 6 people Latest: Feb 27, 2009

Theoretical questions about Beal's M2 method

From: Sebastien Bihorel Date: February 13, 2009 technical
Dear colleagues, In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. I quote: "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? Any comment from nmusers will be greatly appreciated. -- *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] < mailto: [email protected] > Phone: (716) 633-3463 ext. 323

RE: Theoretical questions about Beal's M2 method

From: Lars Erichsen Date: February 13, 2009 technical
Dear Sebastian, The likelihood is not constrained by 1. It is a probability density function - not a probability. Br Lars Erichsen Modelling and simulation specialist Experimental Medicine Ferring Pharmaceuticals
Quoted reply history
-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Sebastien Bihorel Sent: 13 February 2009 16:00 To: [email protected] Subject: [NMusers] Theoretical questions about Beal's M2 method Dear colleagues, In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. I quote: "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? Any comment from nmusers will be greatly appreciated. -- *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] <mailto:[email protected]> Phone: (716) 633-3463 ext. 323 ********************************************************************** Proprietary or confidential information belonging to Ferring Holding SA or to one of its affiliated companies may be contained in the message. If you are not the addressee indicated in this message (or responsible for the delivery of the message to such person), please do not copy or deliver this message to anyone. In such case, please destroy this message and notify the sender by reply e-mail. Please advise the sender immediately if you or your employer do not consent to e-mail for messages of this kind. Opinions, conclusions and other information in this message represent the opinion of the sender and do not necessarily represent or reflect the views and opinions of Ferring. **********************************************************************

Re: Theoretical questions about Beal's M2 method

From: Leonid Gibiansky Date: February 13, 2009 technical
You can view it as: p(y ∩ y > LLQ) = 0 when y < LLQ p(y ∩ y > LLQ) = p(y) when y > LLQ Another way to look on this is to say that p(y | y > LLQ) is proportional to p(y) and should integrate to 1 integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below. As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number). Leonid -------------------------------------- Leonid Gibiansky, Ph.D. President, QuantPharm LLC web: www.quantpharm.com e-mail: LGibiansky at quantpharm.com tel: (301) 767 5566 Sebastien Bihorel wrote: > Dear colleagues, > > In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. > > I quote: > > "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" > > Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) > > In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? > > Any comment from nmusers will be greatly appreciated.
Thanks Leonid, However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)? *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] < mailto: [email protected] > Phone: (716) 633-3463 ext. 323 Leonid Gibiansky wrote: > You can view it as: > > p(y ∩ y > LLQ) = 0 when y < LLQ > p(y ∩ y > LLQ) = p(y) when y > LLQ > > Another way to look on this is to say that > p(y | y > LLQ) is proportional to p(y) and should integrate to 1 > > integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below. > > As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number). > > Leonid > > -------------------------------------- > Leonid Gibiansky, Ph.D. > President, QuantPharm LLC > web: www.quantpharm.com > e-mail: LGibiansky at quantpharm.com > tel: (301) 767 5566 > > Sebastien Bihorel wrote: > > > Dear colleagues, > > > > In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. > > > > I quote: > > > > "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" > > > > Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) > > > > In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? > > > > Any comment from nmusers will be greatly appreciated.

RE: Theoretical questions about Beal's M2 method

From: Yaning Wang Date: February 18, 2009 technical
Bihorel: If you are familiar with the concept of hazard in survival analysis, this conditional likelihood should be straightforward. A slight modification of h(y)=f(y)/S(y) to h(y)=f(y)/S(LOQ) where (y>=LOQ) will give you the expression used by M2: l(y)=f(y)/S(LOQ) at a specific time t. You should ignore t in the M2 expression and think of it as l(y)=P(Y=y|Y>=LOQ) where y>=LOQ at time t. P(Y=y|Y>=LOQ) is not a discrete probability but a probability density because y is a continuous variable. I think P(Y=y, Y>=LOQ) can be simplified to P(Y=y) because there is an inherent restriction: y>=LOQ. Yaning Yaning Wang, Ph.D. Team Leader, Pharmacometrics Office of Clinical Pharmacology Office of Translational Science Center for Drug Evaluation and Research U.S. Food and Drug Administration Phone: 301-796-1624 Email: [email protected] "The contents of this message are mine personally and do not necessarily reflect any position of the Government or the Food and Drug Administration."
Quoted reply history
-----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Sebastien Bihorel Sent: Monday, February 16, 2009 5:22 PM To: Leonid Gibiansky Cc: [email protected] Subject: Re: [NMusers] Theoretical questions about Beal's M2 method Thanks Leonid, However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)? *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] <mailto:[email protected]> Phone: (716) 633-3463 ext. 323 Leonid Gibiansky wrote: > You can view it as: > > p(y ∩ y > LLQ) = 0 when y < LLQ > p(y ∩ y > LLQ) = p(y) when y > LLQ > > Another way to look on this is to say that > p(y | y > LLQ) is proportional to p(y) and should integrate to 1 > > integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that > immediately leads to l(t) below. > > As to the 0 to 1 restriction, l(t) is the density, not probability. It > should integrate to one but can be smaller or greater than 1 (any > positive number). > > Leonid > > > > -------------------------------------- > Leonid Gibiansky, Ph.D. > President, QuantPharm LLC > web: www.quantpharm.com > e-mail: LGibiansky at quantpharm.com > tel: (301) 767 5566 > > > > > Sebastien Bihorel wrote: >> Dear colleagues, >> >> In a paper dated from 2001, Dr. Beal presented several methods to >> handle data below the quantification limit (Journal of >> Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), >> including the M2 method that can be implemented in NONMEM 6 via the >> YLO functionnality. I would like to submit some questions to the list >> about the theory associated to the M2 method. >> >> I quote: >> "...the BQL observations can be discarded, and under the assumption >> that all the D(t) [the distribution of residual errors] are normal, >> the method of maximum conditional likelihood estimation can be >> applied to the remaining observations (method M2). With this method, >> the likelihood for the data, conditional on the fact that by design, >> all (remaining) observations are above the QL, is maximized with >> respect to the model parameters. The density function of the >> distribution on possible observations at time t, evaluated at y(t), >> is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the >> probability that an observation at time t is above the QL is 1- >> phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution >> function. Therefore, conditional on the observation at time t being >> above QL, the likelihood for y(t) is the ratio: >> l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- >> phi((QL-f(t)/g(t))) [equation 1]" >> >> Now, lets A and B be two events. The probability of A, given B is: >> p(A|B) = p(A∩B) / p(B) >> >> In the context of Dr. Beal's paper, I interpret A as simply the >> observation y(t) and B as the fact that y(t) is above QL, and thus >> have the following questions about equation 1: >> - it looks like p(A∩B) in equation 1 simplifies to the probability of >> y(t) given the model parameters, i.e. p(A). Which part of the problem >> allows this simplification? >> - how can l(t) be constrained between 0 and 1 if both numerator and >> denominator can vary between 0 and 1? >> >> Any comment from nmusers will be greatly appreciated. >>

Re: Theoretical questions about Beal's M2 method

From: Jun Shen Date: February 22, 2009 technical
Sebastien, I have been reading the BQL papers recently. Here is what I think. 1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math ( http://en.wikipedia.org/wiki/Truncated_distribution) 2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf? 3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense? I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments. -- Jun Shen PhD PK/PD Scientist BioPharma Services Millipore Corporation 15 Research Park Dr. St Charles, MO 63304 Direct: 636-720-1589 On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel < Sebastien.Bihorel > Thanks Leonid, > > However, there is still one point that is unclear to me. You have > demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of > the text. Now, this is a discrete probability, while l(t) is a likelihood... > How can one mathematically demonstrate the expression of l(t) used by Dr. > Beal starting from the previous expression of p(y | y > LLQ)? > > *Sebastien Bihorel, PharmD, PhD* > PKPD Scientist > Cognigen Corp > Email: sebastien.bihorel > sebastien.bihorel > Phone: (716) 633-3463 ext. 323 > > > Leonid Gibiansky wrote: > >> You can view it as: >> >> p(y $B"A(B y > LLQ) = 0 when y < LLQ >> p(y $B"A(B y > LLQ) = p(y) when y > LLQ >> >> Another way to look on this is to say that >> p(y | y > LLQ) is proportional to p(y) and should integrate to 1 >> >> integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately >> leads to l(t) below. >> >> As to the 0 to 1 restriction, l(t) is the density, not probability. It >> should integrate to one but can be smaller or greater than 1 (any positive >> number). >> >> Leonid >> >> >> >> -------------------------------------- >> Leonid Gibiansky, Ph.D. >> President, QuantPharm LLC >> web: www.quantpharm.com >> e-mail: LGibiansky at quantpharm.com >> tel: (301) 767 5566 >> >> >> >> >> Sebastien Bihorel wrote: >> >>> Dear colleagues, >>> >>> In a paper dated from 2001, Dr. Beal presented several methods to handle >>> data below the quantification limit (Journal of Pharmacokinetics and >>> Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method >>> that can be implemented in NONMEM 6 via the YLO functionnality. I would like >>> to submit some questions to the list about the theory associated to the M2 >>> method. >>> >>> I quote: >>> "...the BQL observations can be discarded, and under the assumption that >>> all the D(t) [the distribution of residual errors] are normal, the method of >>> maximum conditional likelihood estimation can be applied to the remaining >>> observations (method M2). With this method, the likelihood for the data, >>> conditional on the fact that by design, all (remaining) observations are >>> above the QL, is maximized with respect to the model parameters. The density >>> function of the distribution on possible observations at time t, evaluated >>> at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the >>> probability that an observation at time t is above the QL is 1- >>> phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution >>> function. Therefore, conditional on the observation at time t being above >>> QL, the likelihood for y(t) is the ratio: >>> l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- >>> phi((QL-f(t)/g(t))) [equation 1]" >>> >>> Now, lets A and B be two events. The probability of A, given B is: p(A|B) >>> = p(A$B"A(BB) / p(B) >>> >>> In the context of Dr. Beal's paper, I interpret A as simply the >>> observation y(t) and B as the fact that y(t) is above QL, and thus have the >>> following questions about equation 1: >>> - it looks like p(A$B"A(BB) in equation 1 simplifies to the probability of >>> y(t) given the model parameters, i.e. p(A). Which part of the problem allows >>> this simplification? >>> - how can l(t) be constrained between 0 and 1 if both numerator and >>> denominator can vary between 0 and 1? >>> >>> Any comment from nmusers will be greatly appreciated. >>> >>>

Re: Theoretical questions about Beal's M2 method

From: Jun Shen Date: February 23, 2009 technical
Sebastien, I have been reading the BQL papers recently. Here is what I think. 1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math ( http://en.wikipedia.org/wiki/Truncated_distribution) 2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf? 3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense? I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments. -- Jun Shen PhD PK/PD Scientist BioPharma Services Millipore Corporation 15 Research Park Dr. St Charles, MO 63304 Direct: 636-720-1589
Quoted reply history
On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel < [email protected]> wrote: > Thanks Leonid, > > However, there is still one point that is unclear to me. You have > demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of > the text. Now, this is a discrete probability, while l(t) is a likelihood... > How can one mathematically demonstrate the expression of l(t) used by Dr. > Beal starting from the previous expression of p(y | y > LLQ)? > > *Sebastien Bihorel, PharmD, PhD* > PKPD Scientist > Cognigen Corp > Email: [email protected] <mailto: > [email protected]> > Phone: (716) 633-3463 ext. 323 > > > Leonid Gibiansky wrote: > >> You can view it as: >> >> p(y ∩ y > LLQ) = 0 when y < LLQ >> p(y ∩ y > LLQ) = p(y) when y > LLQ >> >> Another way to look on this is to say that >> p(y | y > LLQ) is proportional to p(y) and should integrate to 1 >> >> integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately >> leads to l(t) below. >> >> As to the 0 to 1 restriction, l(t) is the density, not probability. It >> should integrate to one but can be smaller or greater than 1 (any positive >> number). >> >> Leonid >> >> >> >> -------------------------------------- >> Leonid Gibiansky, Ph.D. >> President, QuantPharm LLC >> web: www.quantpharm.com >> e-mail: LGibiansky at quantpharm.com >> tel: (301) 767 5566 >> >> >> >> >> Sebastien Bihorel wrote: >> >>> Dear colleagues, >>> >>> In a paper dated from 2001, Dr. Beal presented several methods to handle >>> data below the quantification limit (Journal of Pharmacokinetics and >>> Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method >>> that can be implemented in NONMEM 6 via the YLO functionnality. I would like >>> to submit some questions to the list about the theory associated to the M2 >>> method. >>> >>> I quote: >>> "...the BQL observations can be discarded, and under the assumption that >>> all the D(t) [the distribution of residual errors] are normal, the method of >>> maximum conditional likelihood estimation can be applied to the remaining >>> observations (method M2). With this method, the likelihood for the data, >>> conditional on the fact that by design, all (remaining) observations are >>> above the QL, is maximized with respect to the model parameters. The density >>> function of the distribution on possible observations at time t, evaluated >>> at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the >>> probability that an observation at time t is above the QL is 1- >>> phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution >>> function. Therefore, conditional on the observation at time t being above >>> QL, the likelihood for y(t) is the ratio: >>> l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- >>> phi((QL-f(t)/g(t))) [equation 1]" >>> >>> Now, lets A and B be two events. The probability of A, given B is: p(A|B) >>> = p(A∩B) / p(B) >>> >>> In the context of Dr. Beal's paper, I interpret A as simply the >>> observation y(t) and B as the fact that y(t) is above QL, and thus have the >>> following questions about equation 1: >>> - it looks like p(A∩B) in equation 1 simplifies to the probability of >>> y(t) given the model parameters, i.e. p(A). Which part of the problem allows >>> this simplification? >>> - how can l(t) be constrained between 0 and 1 if both numerator and >>> denominator can vary between 0 and 1? >>> >>> Any comment from nmusers will be greatly appreciated. >>> >>>

RE: Theoretical questions about Beal's M2 method

From: Jae Eun Ahn Date: February 23, 2009 technical
Dear Jun, For your thought # 3, it is not quite true that “If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function“. Assuming f(t) is smaller than QL (which is usually the case for BQL data), the argument for the PHI function becomes a positive number. When the approximated value for cdf is returned, the final output is actually “1-phi” for a positive argument (note that NONMEM takes the absolute value for the argument when it computes the integral and returns the value according to the sign of the argument; see below). So using your terms, the smaller the f(t), yes, the higher the phi(); however, the cdf will be smaller because it is actually 1-phi(). ; below is a part of NONMEM’s PHI function. ARG=X IF (ARG.LT.0) PHI=PHITL(ARG) IF (ARG.GT.0) PHI=ONE-PHITL(ARG) IF (ARG.EQ.0) PHI=P5 Jae Eun Ahn
Quoted reply history
________________________________ From: [email protected] [mailto:[email protected]] On Behalf Of Jun Shen Sent: Sunday, February 22, 2009 8:09 PM To: Sebastien Bihorel Cc: Leonid Gibiansky; [email protected] Subject: Re: [NMusers] Theoretical questions about Beal's M2 method Sebastien, I have been reading the BQL papers recently. Here is what I think. 1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math ( http://en.wikipedia.org/wiki/Truncated_distribution) 2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf? 3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense? I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments. -- Jun Shen PhD PK/PD Scientist BioPharma Services Millipore Corporation 15 Research Park Dr. St Charles, MO 63304 Direct: 636-720-1589 On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel <[email protected]> wrote: Thanks Leonid, However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)? *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] <mailto:[email protected]> Phone: (716) 633-3463 ext. 323 Leonid Gibiansky wrote: You can view it as: p(y ∩ y > LLQ) = 0 when y < LLQ p(y ∩ y > LLQ) = p(y) when y > LLQ Another way to look on this is to say that p(y | y > LLQ) is proportional to p(y) and should integrate to 1 integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below. As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number). Leonid -------------------------------------- Leonid Gibiansky, Ph.D. President, QuantPharm LLC web: www.quantpharm.com e-mail: LGibiansky at quantpharm.com tel: (301) 767 5566 Sebastien Bihorel wrote: Dear colleagues, In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. I quote: "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? Any comment from nmusers will be greatly appreciated.

RE: Theoretical questions about Beal's M2 method

From: Jae Eun Ahn Date: February 27, 2009 technical
Dear Jun and NMusers, I would like to make a correction to my email below. Just to make things clearer, let me restate the way that NONMEM’s PHI function works: internally it takes the absolute value of the argument (X) and computes the approximate area of normal density curve from |X| to the positive infinity. Since the standard normal density curve is symmetric around 0, the approximated value is actually the same as the area from minus infinity to -|X|. However, regardless of whether X is positive or negative, the PHI function returns the cdf. Assuming f(t) is smaller than QL, the argument ((QL-f(t))/g(t)) is positive. The SMALLER f(t) makes the distance from QL larger (assuming g(t) is same or smaller) and so is the argument. Thus, the area between the argument and positive infinity gets smaller and the cdf gets BIGGER. I missed the part that the argument is not just f(t) but the difference from QL. Therefore, as Jun pointed out, the smaller f(t), the higher the cdf. This makes sense because the censored observation with smaller f(t) is more likely to be BQL than the one with bigger f(t) (in M3, the likelihood for BQL observation is the probability that the observation is BQL). For example, if the f(t) is same as QL, there is a 50% chance that an observation is BQL. If f(t) is 0, the probability of being BQL will be 1. However, it may not necessarily be the case that the censored observation with smaller f(t) will have more contribution to the objective function. For uncensored observations, the closer f(t) to the observations is more favorable as the algorithm tries to maximize the likelihood (actually minimize -2 log likelihood). For censored observations, the smaller f(t) is more favorable as it makes cdf bigger. The answer we will get from the software should be somewhere in between where the likelihood for all the data (the product) is maximized. Thank you, Jae Eun
Quoted reply history
________________________________ From: [email protected] [mailto:[email protected]] On Behalf Of Ahn, Jae Eun Sent: Monday, February 23, 2009 12:02 PM To: Jun Shen Cc: [email protected] Subject: RE: [NMusers] Theoretical questions about Beal's M2 method Dear Jun, For your thought # 3, it is not quite true that “If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function“. Assuming f(t) is smaller than QL (which is usually the case for BQL data), the argument for the PHI function becomes a positive number. When the approximated value for cdf is returned, the final output is actually “1-phi” for a positive argument (note that NONMEM takes the absolute value for the argument when it computes the integral and returns the value according to the sign of the argument; see below). So using your terms, the smaller the f(t), yes, the higher the phi(); however, the cdf will be smaller because it is actually 1-phi(). ; below is a part of NONMEM’s PHI function. ARG=X IF (ARG.LT.0) PHI=PHITL(ARG) IF (ARG.GT.0) PHI=ONE-PHITL(ARG) IF (ARG.EQ.0) PHI=P5 Jae Eun Ahn ________________________________ From: [email protected] [mailto:[email protected]] On Behalf Of Jun Shen Sent: Sunday, February 22, 2009 8:09 PM To: Sebastien Bihorel Cc: Leonid Gibiansky; [email protected] Subject: Re: [NMusers] Theoretical questions about Beal's M2 method Sebastien, I have been reading the BQL papers recently. Here is what I think. 1. M2 can be regarded as a truncated distribution method. You can look at it this way. Any probability density function (pdf) MUST be integrated into a cumulative distribution function (cdf) of 1 on its entire sample space, which is negative infinity to positive infinity for a normal distribution. If the BQL observations are discarded, the sample space is truncated from negative infinity to QL. This results in a cdf less than 1. In order to bring the cdf back to 1 on the new sample space (from QL to positive infinity), the pdf function has to be corrected by ( 1- phi((QL-f(t)/g(t))). Wikipedia has an exellent entry to explain the math ( http://en.wikipedia.org/wiki/Truncated_distribution) 2. The question I have is from M3 and M4 where the likelihood of an uncencored observation is expressed in pdf and the likelihood of a cencored observation is in cdf. Well, you can still combine everything together to get an objective function. But what is the definition of likelihood here exactly, pdf or cdf? 3. If we use cdf to represent the contribution of censored observations, then the smaller concentrations will have larger contribution to the objective function. For example, in M3 the likelihood of a censored observation is l(t)=phi((QL-f(t))/sqrt(g(t))). The smaller the f(t), the higher the phi(). Does that make sense? I believe the clear undersdanding on these theoretical concepts is just as important as to know those "know-how". Appreciate more comments. -- Jun Shen PhD PK/PD Scientist BioPharma Services Millipore Corporation 15 Research Park Dr. St Charles, MO 63304 Direct: 636-720-1589 On Mon, Feb 16, 2009 at 4:22 PM, Sebastien Bihorel <[email protected]> wrote: Thanks Leonid, However, there is still one point that is unclear to me. You have demonstrated that p(y | y > LLQ) = p(y) / p(y>LOQ), given the assumptions of the text. Now, this is a discrete probability, while l(t) is a likelihood... How can one mathematically demonstrate the expression of l(t) used by Dr. Beal starting from the previous expression of p(y | y > LLQ)? *Sebastien Bihorel, PharmD, PhD* PKPD Scientist Cognigen Corp Email: [email protected] <mailto:[email protected]> Phone: (716) 633-3463 ext. 323 Leonid Gibiansky wrote: You can view it as: p(y ∩ y > LLQ) = 0 when y < LLQ p(y ∩ y > LLQ) = p(y) when y > LLQ Another way to look on this is to say that p(y | y > LLQ) is proportional to p(y) and should integrate to 1 integral(p(y)) over y > LLQ is ( 1- phi((LLQ-f(t)/g(t))) that immediately leads to l(t) below. As to the 0 to 1 restriction, l(t) is the density, not probability. It should integrate to one but can be smaller or greater than 1 (any positive number). Leonid -------------------------------------- Leonid Gibiansky, Ph.D. President, QuantPharm LLC web: www.quantpharm.com e-mail: LGibiansky at quantpharm.com tel: (301) 767 5566 Sebastien Bihorel wrote: Dear colleagues, In a paper dated from 2001, Dr. Beal presented several methods to handle data below the quantification limit (Journal of Pharmacokinetics and Pharmacodynamics, Vol. 28, No. 5, October 2001), including the M2 method that can be implemented in NONMEM 6 via the YLO functionnality. I would like to submit some questions to the list about the theory associated to the M2 method. I quote: "...the BQL observations can be discarded, and under the assumption that all the D(t) [the distribution of residual errors] are normal, the method of maximum conditional likelihood estimation can be applied to the remaining observations (method M2). With this method, the likelihood for the data, conditional on the fact that by design, all (remaining) observations are above the QL, is maximized with respect to the model parameters. The density function of the distribution on possible observations at time t, evaluated at y(t), is 1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) and the probability that an observation at time t is above the QL is 1- phi((QL-f(t)/g(t))), where phi is the cumulative normal distribution function. Therefore, conditional on the observation at time t being above QL, the likelihood for y(t) is the ratio: l(t)=(1/sqrt( 2*pi*g(t) ))*exp( -0.5*( y(t)-f(t) )^2/g(t) ) /( 1- phi((QL-f(t)/g(t))) [equation 1]" Now, lets A and B be two events. The probability of A, given B is: p(A|B) = p(A∩B) / p(B) In the context of Dr. Beal's paper, I interpret A as simply the observation y(t) and B as the fact that y(t) is above QL, and thus have the following questions about equation 1: - it looks like p(A∩B) in equation 1 simplifies to the probability of y(t) given the model parameters, i.e. p(A). Which part of the problem allows this simplification? - how can l(t) be constrained between 0 and 1 if both numerator and denominator can vary between 0 and 1? Any comment from nmusers will be greatly appreciated.