Re: Validation
From: Pascal Girard <pg@upcl.univ-lyon1.fr>
Subject: Re: Validation
Date: Wed, 03 Feb 1999 17:24:57 +0100
Dear Vladimir,
Just a note on posterior predictive check (PPC).
I used this technique for validating the compliance model. It is described in the recently published paper:
Girard, P., Blaschke, T.F., Kastrissios, H., and Sheiner, L.B. A Markov mixed effect regression model for drug compliance. Stat.Med. 17(20):2313-2334, 1998,
and I recently posted the NONMEM code for the compliance model on the NONMEM repository in Palo Alto (http://pkpd.icon.palo-alto.med.va.gov/), but not the code for the PPC, since it is a complex mixture of Splus, UNIX C-shell and NONMEM.
We found the idea of the PPC in an outstanding (from my point of view) paper:
Belin, T.R. and Rubin, D.B. The analysis of repeated-measures data on schizophrenic reaction times using mixture models. Stat.Med. 14:747-768, 1995.
The appealing idea with PPC is that you validate your model using a statistic that is not sufficient to describe your data but that may directly interest the clinician. For example, with the compliance model, I used the posterior distribution of either the longest drug holiday or the non-therapeutic coverage, which speaks much more to a clinician than telling him that you have 30% interindividual variability in the logit of the marginal probability of not taking the treatment. For a population PK model, you can imagine looking at the distribution of the % of time during which the concentrations are within a therapeutic window, which once again can be more interesting than knowing you have 70%, 30% and 40% interindividual variability on Ka, CL and V and 50% residual variability.
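As an illustration of what such clinician-oriented statistics might look like in code, here is a minimal Python sketch. It assumes that a "drug holiday" is a run of consecutive missed doses in a 0/1 dosing record and that concentrations are sampled at equally spaced times; both function names and both definitions are assumptions made for this example, not the definitions used in the paper.

```python
def longest_drug_holiday(doses_taken):
    """Length of the longest run of consecutive missed doses.

    doses_taken is a sequence of 1 (dose taken) and 0 (dose missed).
    """
    longest = current = 0
    for taken in doses_taken:
        # Reset the run when a dose is taken, extend it when one is missed.
        current = 0 if taken else current + 1
        longest = max(longest, current)
    return longest


def percent_time_in_window(concentrations, low, high):
    """Percent of equally spaced concentration samples within [low, high]."""
    inside = sum(1 for c in concentrations if low <= c <= high)
    return 100.0 * inside / len(concentrations)
```

For example, longest_drug_holiday([1, 0, 0, 1, 0, 0, 0, 1]) counts a longest run of 3 missed doses.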
So the idea of PPC is to simulate the posterior distribution of a (non sufficient) statistic (NSS) and to compare this distribution with the statistic observed on your actual data. If there is no contradiction between the two, you accept your model. You can imagine a model for which one statistic, NSS1, agrees with its posterior distribution simulated using the model, while another statistic, NSS2, contradicts it. This may not be a problem if what is really important for the clinician is NSS1.
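One possible way to make "no contradiction" concrete is to locate the observed statistic within the simulated distribution. The sketch below is only an assumption about how the comparison could be done (an upper tail fraction and a simple percentile interval); the function names are hypothetical and the paper may use a different criterion.

```python
def ppc_tail_probability(simulated, observed):
    """Fraction of simulated statistics at least as large as the observed one."""
    return sum(1 for s in simulated if s >= observed) / len(simulated)


def ppc_interval(simulated, lower=0.025, upper=0.975):
    """Crude percentile interval of the simulated statistic (no interpolation)."""
    ordered = sorted(simulated)
    n = len(ordered)
    return ordered[int(lower * (n - 1))], ordered[int(upper * (n - 1))]


def is_consistent(simulated, observed, lower=0.025, upper=0.975):
    """True if the observed statistic falls inside the percentile interval."""
    lo, hi = ppc_interval(simulated, lower, upper)
    return lo <= observed <= hi
```

A tail fraction very close to 0 or 1, or an observed value outside the interval, would suggest that the model does not reproduce that particular statistic.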
From a technical point of view, in order to do PPC, you need the posterior distribution of <<all>>, fixed and random, parameters of your population model. And here is the difficulty.
NONMEM does not give you the posterior distribution of <<all>>, fixed and random, parameters, because you do not define any prior distribution. For THETAs you can approximate the posterior distribution by supposing it is (multivariate) normal (MVN) with mean equal to THETA, the final estimate, and covariance given by the asymptotic covariance matrix, obtained using $COV. But even in this case, when you sample from this distribution you may find negative parameters (e.g. CL or V), because the normal distribution is not constrained to be positive. So you will have either to truncate your resampled parameters or to suppose that your parameter is log-normally distributed. The problem is even worse when you want to resample the OMEGA and SIGMA matrices using a MVN with mean OMEGA and SIGMA, the final estimates, and variance given by the asymptotic covariance matrix.
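The negative-value problem can be seen in a one-parameter Python sketch. The estimate (2.0) and standard error (1.5) below are made-up numbers chosen to exaggerate the effect, the function names are hypothetical, and the log-scale standard error uses the delta-method approximation se / theta, which is an assumption of this example rather than something NONMEM reports directly.

```python
import math
import random


def sample_normal(theta, se, n, rng):
    """Draw n values from Normal(theta, se); draws can be negative."""
    return [rng.gauss(theta, se) for _ in range(n)]


def sample_lognormal(theta, se, n, rng):
    """Draw n strictly positive values whose median is theta.

    Uses the delta-method approximation SE(log theta) ~ se / theta.
    """
    se_log = se / theta
    return [theta * math.exp(rng.gauss(0.0, se_log)) for _ in range(n)]


rng = random.Random(0)          # fixed seed so the run is reproducible
normal_draws = sample_normal(2.0, 1.5, 1000, rng)
lognormal_draws = sample_lognormal(2.0, 1.5, 1000, rng)
n_negative = sum(1 for d in normal_draws if d < 0)
print("negative draws under the normal approximation:", n_negative)
print("smallest log-normal draw:", min(lognormal_draws))
```

The log-normal draws are positive by construction, at the price of changing the assumed shape of the uncertainty; truncating the normal draws is the other option mentioned above.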
Another solution is to use a fully Bayesian method, with MCMC algorithms, such as the ones implemented in POPKAN or the less expensive PHARM-BUGS. These programs allow you to define the prior distribution of all parameters, estimate the posterior distributions, and then compute the posterior distribution of any statistic you want.
When you don't have access to this Bayesian technique, or you don't like it, you can simulate the posterior distribution using a parametric bootstrap, as we did for the compliance model. Briefly, let Y be your observed data set; M the final model fitted to Y; TOS=(THETA, OMEGA, SIGMA), the final parameter estimates; and S(Y) a statistic computed on Y. You can approximate the posterior distribution of S by doing:
1. simulate a new set of observations Y* using THETA, etas sampled from MVN(0,OMEGA) and errors sampled from MVN(0,SIGMA)
2. fit M to Y* and get new estimates TOS*
3. simulate a new set of observations Y** using THETA*, etas sampled from MVN(0,OMEGA*) and errors sampled from MVN(0,SIGMA*)
4. compute S(Y**)
Repeat steps 1-4 a great number of times (at least 100). S is computed on Y** rather than on Y* (which would be less CPU intensive ...) in order to approximate the posterior distribution of TOS given the data. Notice also that steps 1 and 3 are easily implemented within NONMEM using $SIMUL.
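The four steps can be sketched in Python on a toy model. Everything here is an assumption made for illustration: the model is y_ij = THETA + eta_i + eps_ij with scalar variances OMEGA and SIGMA and a balanced design, and the "fit" is a simple method-of-moments estimator standing in for a NONMEM run. It is not the code used for the compliance model.

```python
import random
import statistics


def simulate(theta, omega, sigma, n_subjects, n_obs, rng):
    """Simulate y_ij = theta + eta_i + eps_ij (omega and sigma are variances)."""
    data = []
    for _ in range(n_subjects):
        eta = rng.gauss(0.0, omega ** 0.5)
        data.append([theta + eta + rng.gauss(0.0, sigma ** 0.5)
                     for _ in range(n_obs)])
    return data


def fit(data):
    """Method-of-moments estimates of (theta, omega, sigma).

    Toy stand-in for fitting model M with NONMEM; needs >= 2 subjects
    and >= 2 observations per subject.
    """
    n_obs = len(data[0])
    subject_means = [statistics.fmean(row) for row in data]
    theta = statistics.fmean(subject_means)
    sigma = statistics.fmean([statistics.variance(row) for row in data])
    # The between-subject variance estimate is floored at zero.
    omega = max(statistics.variance(subject_means) - sigma / n_obs, 0.0)
    return theta, omega, sigma


def ppc_bootstrap(data, statistic, n_rep, rng):
    """Return n_rep simulated values of statistic(Y**)."""
    n_subjects, n_obs = len(data), len(data[0])
    tos = fit(data)                                            # final estimates TOS
    values = []
    for _ in range(n_rep):
        y_star = simulate(*tos, n_subjects, n_obs, rng)        # step 1
        tos_star = fit(y_star)                                 # step 2
        y_2star = simulate(*tos_star, n_subjects, n_obs, rng)  # step 3
        values.append(statistic(y_2star))                      # step 4
    return values


def largest_observation(data):
    """Example statistic S: the largest observation in the data set."""
    return max(max(row) for row in data)
```

A call such as ppc_bootstrap(y, largest_observation, 100, random.Random(1)) returns 100 simulated values of S, which can then be compared with largest_observation(y) computed on the observed data. In a real application the calls to fit would be NONMEM runs, which is where the CPU time goes.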
Concerning CPU, the PPC results presented in the Stat. Med. paper for the compliance model took 1 week of computation using 3 SUN Sparcstations and one UltraSparc (the latter produced more than 60% of the iterations). No comments!
So there is space either for improved, less CPU intensive, methodology on PPC, or CPU improvement (UltraPENTIUM III, 450 GigaHz), and probably for both ...
I'm not sure all this helps industry today. But why not tomorrow ...
Best,
Pascal
--
Pascal Girard
-------------------------------------------------------------------
Service Pharmacologie Clinique
BP 3041,162, avenue Lacassagne
69394 LYON Cedex 03 FRANCE
PG@upcl.univ-lyon1.fr
Tel : +33 (0)4 78 78 57 26
Fax : +33 (0)4 78 78 57 19