RE: Simulation vs. actual data

From: Kenneth Kowalski | Date: June 15, 2005 | Source: cognigencorp.com
From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] Simulation vs. actual data
Date: Wed, June 15, 2005 4:49 pm

NMusers,

We seem to be using terms like confidence bands and prediction intervals somewhat loosely and interchangeably. To provide some clarity, here are the definitions I use for the different types of statistical intervals that are often constructed for different purposes:

Tolerance Interval - If we simulate DVs from our model and use the mean +/- some multiple of the SD (across subjects at a particular time point), or use the percentile method to obtain a lower and upper bound (of the individual responses at a particular time point), the resulting interval is akin to what is known in the statistical literature as a "tolerance interval". Here the interest is in characterizing the interval that contains a certain percentage of the individual observations. However, to be a truly valid tolerance interval, the interval should also take into account the uncertainty in the parameter estimates, and a confidence level is associated with the interval. Tolerance intervals are used to make statements like "I'm 90% confident that the interval (LTL, UTL) will contain 80% of the individual observations (i.e., if we repeat the study an infinite number of times, 90% of the intervals should contain 80% of the individual observations)."

Prediction Interval - If we simulate DVs from our model and calculate the mean (across subjects) conditioning on some specific design (with a specified number of subjects, n), repeat this process for N simulated datasets using a different set of population estimates (drawn from the parameter uncertainty distribution) for each of the N simulated datasets, and then take the grand mean (of the N means) +/- some multiple of the SD (across the N means), or use the corresponding percentile method to obtain a lower and upper bound, the resulting interval is a prediction interval on the future mean response of n subjects.
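[Editor's note: the two percentile constructions described above can be sketched in Python with a toy model. Everything below (the `simulate_subjects` helper, the PK model, and all parameter values, including the assumed standard error on theta) is hypothetical and for illustration only; it is not from the original post.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation of one DV per subject at a fixed time point,
# with between-subject variability (Omega) and residual error (Sigma).
def simulate_subjects(n, theta_cl=1.0, omega_sd=0.3, sigma_sd=0.1):
    eta = rng.normal(0.0, omega_sd, size=n)   # between-subject random effects
    eps = rng.normal(0.0, sigma_sd, size=n)   # residual error
    dose, t = 100.0, 2.0
    cl = theta_cl * np.exp(eta)               # individual clearances
    conc = dose * np.exp(-cl * t) / 10.0      # toy PK model
    return conc + eps

# Percentile-based tolerance-style interval: bounds containing 80% of the
# individual simulated observations.  (Parameter uncertainty is ignored
# here, so this is not a strictly valid tolerance interval.)
dvs = simulate_subjects(10_000)
tol_lo, tol_hi = np.percentile(dvs, [10, 90])

# Prediction-interval construction for the mean of n future subjects:
# repeat the trial N times, drawing a new population theta each time from
# its assumed sampling distribution, then take percentiles of the N means.
n, N = 20, 1000
means = np.empty(N)
for i in range(N):
    theta_i = rng.normal(1.0, 0.05)           # assumed parameter uncertainty
    means[i] = simulate_subjects(n, theta_cl=theta_i).mean()
pred_lo, pred_hi = np.percentile(means, [5, 95])
```

The key design difference is what varies between replicates: the tolerance-style interval takes percentiles across individual subjects within one simulation, while the prediction interval takes percentiles across replicate trial means, each replicate using a fresh parameter draw.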
A valid prediction interval takes into account the parameter uncertainty as well as the sampling variation in Omega (sampling of subjects) and Sigma (sampling of observations). Prediction intervals are used to make statements like "I'm 90% confident that the interval (LPL, UPL) contains the future mean response of n subjects (i.e., if we repeat the study an infinite number of times, 90% of the intervals will contain the future mean response of n subjects)."

Confidence Interval - If we simulate DVs from our model in a similar fashion as for the prediction interval, but choose n to be infinitely large in computing the mean across the n subjects, then we effectively average out the sampling variation in Omega and Sigma, and the resulting interval reflects only the uncertainty in the parameter estimates of the model. Confidence intervals are used to make statements like "I'm 90% confident that the interval (LCL, UCL) contains the true population mean response (i.e., if we repeat the study an infinite number of times, 90% of the intervals will contain the true population mean response)."

In general, confidence intervals have the shortest width, followed by prediction intervals and then tolerance intervals. However, prediction intervals can be wider than tolerance intervals when n (the number of future subjects) is small. For example, when n=1, where we want to predict the value of a single future observation, the prediction interval is typically wider than a tolerance interval. For more information on different types of statistical intervals, see Hahn, "Statistical Intervals for a Normal Population, Part I. Tables, Examples and Applications", J. of Quality Technology, 1970; 2:115-125.

I agree with Liping that taking into account parameter uncertainty by assuming that the parameter estimates come from a multivariate normal distribution, using the population estimates and the corresponding covariance matrix of the estimates, can be labor-intensive.
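[Editor's note: the confidence-interval construction with n infinitely large can be sketched as follows. Because the mean over infinitely many subjects averages out Omega and Sigma, it collapses to the population-typical prediction (ETA = EPS = 0), so only the uncertainty in theta is propagated. The model and all numbers are hypothetical, not from the post.]

```python
import numpy as np

rng = np.random.default_rng(1)

# Population-typical prediction: the mean response as n -> infinity,
# i.e. the model evaluated at ETA = 0 and EPS = 0 (toy model).
def typical_value(theta_cl, dose=100.0, t=2.0):
    return dose * np.exp(-theta_cl * t) / 10.0

# Propagate only parameter uncertainty: draw N thetas from their assumed
# sampling distribution and take percentiles of the typical predictions.
N = 1000
theta_draws = rng.normal(1.0, 0.05, size=N)
preds = typical_value(theta_draws)
ci_lo, ci_hi = np.percentile(preds, [5, 95])   # 90% confidence interval
```

Comparing this with the prediction-interval sketch makes the ordering of widths concrete: here no Omega or Sigma noise enters at all, so for the same parameter uncertainty the interval can only be narrower than one built on finite-n trial means.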
However, this does not mean that it is computationally intensive; in fact, quite the contrary. One can generate an N=1000 sample of population parameters (thetas, omegas, and sigmas) from the multivariate normal distribution in a matter of minutes, even for mean parameter vectors and covariance matrices of the dimensions typical of a pop PK or pop PK/PD model. The laborious aspect of the work comes from the fact that we don't have automated utilities to do this, so we have to do a lot of custom coding (pre- and post-processing) to generate the sampled parameter vectors (thetas, omegas, and sigmas), pass them into NONMEM to perform the simulations, and then post-process the simulated results to calculate the various intervals of interest.

However, the process can be computationally intensive if the distribution of the parameter estimates does not follow a multivariate normal distribution. In that setting we may have to perform nonparametric bootstrapping (sampling subjects with replacement from the observed dataset) to obtain the N=1000 sample of population parameters from the empirical bootstrap distribution, by fitting the model to each of the 1000 bootstrap datasets. Unless I'm already performing nonparametric bootstrapping for other purposes, I typically assume the multivariate normal distribution when taking into account parameter uncertainty, simply because it is computationally less intensive. My philosophy is that it is better to do something to take into account parameter uncertainty than to ignore it completely.

Ken
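[Editor's note: the multivariate normal sampling step described above can be sketched in a few lines. The estimate vector, its ordering, and the covariance matrix below are made-up placeholders; in practice they would come from the model fit (e.g. NONMEM's final estimates and covariance step output).]

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical final estimates and covariance matrix of the estimates,
# e.g. ordered as [THETA1, THETA2, OMEGA(1,1), SIGMA(1,1)].
est = np.array([1.0, 0.5, 0.09, 0.01])
cov = np.diag([0.05, 0.02, 0.01, 0.001]) ** 2   # assumed diagonal here

# Draw N=1000 joint population-parameter vectors in one call; each row
# would then be passed to the simulator as one replicate's parameters.
N = 1000
draws = rng.multivariate_normal(est, cov, size=N)

# Variance parameters must stay positive; in practice draws with negative
# OMEGA/SIGMA are rejected, or sampling is done on a transformed scale.
valid = draws[(draws[:, 2] > 0) & (draws[:, 3] > 0)]
```

This is the cheap step the post refers to: the sampling itself is nearly instantaneous, and the labor lies in wiring each row of `draws` through the simulation and summarizing the results.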
Jun 14, 2005 Toufigh Gordi Simulation vs. actual data
Jun 14, 2005 Nick Holford Re: Simulation vs. actual data
Jun 14, 2005 Liping Zhang Re: Simulation vs. actual data
Jun 15, 2005 Kenneth Kowalski RE: Simulation vs. actual data
Jun 25, 2005 Nick Holford Re: Simulation vs. actual data
Jul 05, 2005 Kenneth Kowalski RE: Simulation vs. actual data
Jul 12, 2005 Nick Holford Re: Simulation vs. actual data
Jul 12, 2005 Juan Jose Perez Ruixo RE: Simulation vs. actual data
Jul 12, 2005 Nick Holford Re: Simulation vs. actual data
Jul 13, 2005 Juan Jose Perez Ruixo RE: Simulation vs. actual data
Jul 14, 2005 Kenneth Kowalski RE: Simulation vs. actual data
Jul 14, 2005 Juan Jose Perez Ruixo RE: Simulation vs. actual data
Jul 14, 2005 Nick Holford Re: Simulation vs. actual data
Jul 15, 2005 Kenneth Kowalski RE: Simulation vs. actual data
Jul 16, 2005 Kenneth Kowalski RE: Simulation vs. actual data