RE: Simulation vs. actual data
From: "Kowalski, Ken" <Ken.Kowalski@pfizer.com>
Subject: RE: [NMusers] Simulation vs. actual data
Date: Sat, July 16, 2005 9:56 am
Juanjo,
If you perform 50 bootstraps but you are going to perform 2500 trial replicates
you shouldn't just replicate the 50 bootstrap sets 50 times to get 2500...that's
not random. I suppose you could bootstrap with replacement the 50 bootstrap sets
to generate 2500 sets of population parameters; however, I wouldn't advise it. If
you're going to generate 2500 trial replicates, presumably it is because you are
interested in getting accurate tail probabilities. 50 bootstraps is not enough to
accurately nail down the tail probabilities and a "double bootstrap" procedure to
"artificially" generate 2500 bootstrap sets from an original 50 doesn't help (i.e.,
the 2500 are ultimately based on only 50 unique sets). However, the 50 bootstrap
sets are probably good enough to estimate the sample covariance matrix, which
you can compare to the covariance matrix of the parameter estimates from the NONMEM
output ($COV step). In fact, this would be a good diagnostic to decide whether or
not to use the parametric multivariate normal assumption for the posterior distribution
when taking into account parameter uncertainty.
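That diagnostic could be sketched roughly as follows. This is an illustrative example only, not part of the original post: `boot_params` stands in for a (50 x p) array of population parameter estimates from 50 bootstrap fits, and `cov_nonmem` for the covariance matrix reported by the $COV step; both are simulated here so the snippet is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: in practice, boot_params would hold the 50 bootstrap
# parameter estimates and cov_nonmem the $COV matrix from the NONMEM output.
p = 3
cov_nonmem = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.02],
                       [0.00, 0.02, 0.25]])
boot_params = rng.multivariate_normal(np.zeros(p), cov_nonmem, size=50)

# Sample covariance of the 50 bootstrap parameter sets
cov_boot = np.cov(boot_params, rowvar=False)

# Simple diagnostics: ratio of bootstrap to NONMEM standard errors,
# and the largest discrepancy in the correlation structure
se_ratio = np.sqrt(np.diag(cov_boot)) / np.sqrt(np.diag(cov_nonmem))
corr_boot = np.corrcoef(boot_params, rowvar=False)
d = np.sqrt(np.diag(cov_nonmem))
corr_nonmem = cov_nonmem / np.outer(d, d)

print("SE ratio (bootstrap / NONMEM):", np.round(se_ratio, 2))
print("Max correlation difference:", np.round(np.abs(corr_boot - corr_nonmem).max(), 2))
```

SE ratios near 1 and small correlation differences would support using the parametric multivariate normal assumption described below.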
If the sample covariance matrix from 50 bootstraps is in fairly close agreement with
the covariance matrix reported by NONMEM, then I would use the parametric multivariate
normal distribution using the covariance matrix estimated by NONMEM to account for
parameter uncertainty (i.e., generate 2500 unique sets of population parameters from
the multivariate normal posterior distribution). On the other hand, if the sample
covariance matrix from the 50 bootstraps is substantially different from that estimated
by NONMEM, then you may be in a situation where it is not reasonable to assume that the
posterior distribution is multivariate normal, and you may need to generate considerably
more than 50 bootstrap sets from a nonparametric procedure, although I don't think you
necessarily need 2500 bootstraps...1000 bootstraps is probably sufficient if you want
to estimate the 5th and 95th percentiles.
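The parametric branch of that recommendation amounts to a single multivariate normal draw per trial replicate. A minimal sketch, assuming made-up final estimates (`theta_hat`) and a made-up $COV covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical final population estimates and NONMEM $COV covariance matrix;
# the numbers are invented for illustration only.
theta_hat = np.array([1.5, 0.8, 0.3])
cov_nonmem = np.array([[0.040, 0.005, 0.001],
                       [0.005, 0.020, 0.002],
                       [0.001, 0.002, 0.010]])

# 2500 unique parameter vectors from the multivariate normal posterior;
# each row would drive one simulated trial replicate.
param_sets = rng.multivariate_normal(theta_hat, cov_nonmem, size=2500)

print(param_sets.shape)  # (2500, 3)
```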
Note that when the multivariate normal assumption doesn't hold it is often because the
distribution is asymmetric. This is why nonparametric bootstrap confidence intervals
on the parameter estimates can be quite asymmetric when they do differ from Wald-based
intervals using the NONMEM standard errors. Nailing down this asymmetry in the
distribution well enough to get accurate tail probabilities really requires a lot more than 50
bootstraps. So, if you routinely advocate doing nonparametric bootstrapping to account
for parameter uncertainty out of concern for this potential asymmetry then I think
you've got to bite the bullet and do more than 50 bootstraps.
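The contrast between Wald-based and percentile intervals is easy to see on a skewed sampling distribution. The example below is illustrative only: it fakes a right-skewed (log-normal) bootstrap distribution for a clearance-like parameter rather than using real NONMEM output.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative only: pretend these are 1000 bootstrap estimates of a
# right-skewed parameter (log-normal around 5).
boot = rng.lognormal(mean=np.log(5.0), sigma=0.4, size=1000)

# Wald-style interval: symmetric about the mean by construction
wald_lo = boot.mean() - 1.96 * boot.std()
wald_hi = boot.mean() + 1.96 * boot.std()

# Nonparametric percentile interval: free to be asymmetric
pct_lo, pct_hi = np.percentile(boot, [2.5, 97.5])

print(f"Wald:       ({wald_lo:.2f}, {wald_hi:.2f})")
print(f"Percentile: ({pct_lo:.2f}, {pct_hi:.2f})")
```

With a skewed distribution the percentile interval sits noticeably to the right of the Wald interval, which is the asymmetry described above.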
Your suggestion of doing 50 bootstraps prompts me to respond to the issue Nick raised
regarding what to do when one is willing to accept a model where the COV step fails
but long runtimes prohibit performing the nonparametric bootstrap procedure to
generate 500-1000 sets of parameters from the empirical posterior distribution. If
the runtimes are short enough that 50 bootstraps are feasible, then I would
advocate calculating the sample covariance matrix and assuming the multivariate normal
distribution holds to generate 1000 sets of population parameters to take into account
parameter uncertainty. Basically, I think it is better to do something to take into
account parameter uncertainty than to do nothing at all...we don't really know when
parameter uncertainty may or may not be important unless we account for it and see
how much impact it has. Nevertheless, I'm sure that situations arise where runtimes
are so long that even performing 50 bootstraps is prohibitive.
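The fallback procedure above can be sketched in a few lines. Again, this is a hypothetical illustration: the 50 bootstrap parameter sets are simulated here, whereas in practice they would come from 50 bootstrap NONMEM fits whose $COV step may have failed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical: 50 bootstrap parameter sets (simulated here). Use their mean
# and sample covariance to define a multivariate normal, then draw 1000 sets
# to propagate parameter uncertainty through the trial simulations.
boot50 = rng.multivariate_normal([1.5, 0.8, 0.3],
                                 np.diag([0.04, 0.02, 0.01]), size=50)

mean_boot = boot50.mean(axis=0)
cov_boot = np.cov(boot50, rowvar=False)

param_sets = rng.multivariate_normal(mean_boot, cov_boot, size=1000)
print(param_sets.shape)  # (1000, 3)
```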
Ken