From: Nick Holford <n.holford@auckland.ac.nz>
Subject: Re: Bootstrap resampling!
Date: Wed, 28 Mar 2001 10:01:59 +1200
Paul,
Paul Williams wrote:
>
> Less synthetic percentile bootstrap:
>
A few months ago we discussed the bootstrap terminology in nmusers and the terms parametric vs non-parametric bootstrap were mentioned. Parametric might also be called "all simulation" or "totally synthetic" because all the bootstrap data sets are simulated from a parametric model for the fixed and random effects. The non-parametric bootstrap involves resampling from the original data and typically the sampling unit is an individual subject (when applied to mixed effect models).
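In code, the subject-level resampling of the non-parametric bootstrap might look like this (a minimal Python sketch; the data layout and function name are mine for illustration, not anything NONMEM provides):

```python
import random

def bootstrap_dataset(data_by_subject, rng=random):
    """Build one bootstrap data set by resampling whole subjects with
    replacement; each drawn subject keeps all of its own observations."""
    ids = list(data_by_subject)
    sampled = [rng.choice(ids) for _ in ids]
    # Relabel so repeated subjects get unique IDs in the new data set
    return {new_id: data_by_subject[old_id]
            for new_id, old_id in enumerate(sampled)}

# Hypothetical layout: subject ID -> list of (time, concentration) records
data = {1: [(0, 10.0), (2, 6.1)],
        2: [(0, 12.3), (2, 7.0)],
        3: [(0, 9.5), (2, 5.2)]}
boot = bootstrap_dataset(data)
```

The parametric ("all simulation") alternative would instead simulate each record from the fitted model rather than reuse the observed data.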
The next thing to consider is what to do with the 1000 estimates of a parameter such as clearance (CL) which you have obtained from analysing 1000 bootstrap data sets. At this point it does not matter if you used the parametric or non-parametric bootstrap method. You may be interested in a confidence interval for the estimate. (I prefer the term "credible interval" which is popular among BUGSy (Bayesian) types -- both are CI so you can interpret CI as you prefer).
There are two basic approaches here. The first we might call parametric and the second non-parametric. The parametric approach first computes the standard error from the empirical distribution of 1000 CL estimates then uses some formula such as CI=mean+/-1.96*SE to predict the asymptotic 95% CI. As you point out this can sometimes give you crazy results e.g. negative CL values if SE is large. The non-parametric approach is the one you describe below where you rank the estimates and pick the CL values that are at the 2.5% and 97.5% quantiles. This is much more robust and cannot predict negative CL values. It may also produce an asymmetrical CI which is fine by me but impossible using the simple parametric approach.
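The two interval constructions can be compared with a small sketch (the skewed set of "CL" estimates is made up, deliberately chosen so the parametric interval goes negative while the quantile interval cannot):

```python
import statistics

def parametric_ci(estimates, z=1.96):
    """Normal-approximation CI: mean +/- z * SE, where SE is the
    empirical standard deviation of the bootstrap estimates."""
    m = statistics.mean(estimates)
    se = statistics.stdev(estimates)
    return (m - z * se, m + z * se)

def percentile_ci(estimates, alpha=0.05):
    """Naive quantile CI: rank the estimates and pick the values at
    the alpha/2 and 1 - alpha/2 quantiles."""
    s = sorted(estimates)
    n = len(s)
    return (s[int(alpha / 2 * n)], s[int((1 - alpha / 2) * n) - 1])

# A deliberately skewed set of 1000 "CL" estimates (made-up numbers):
cl = [1.0] * 950 + [100.0] * 50
lo_p, hi_p = parametric_ci(cl)   # lower limit goes negative: nonsense for CL
lo_q, hi_q = percentile_ci(cl)   # lower limit stays at a real estimate (1.0)
```

Note that the percentile interval is also asymmetrical about the mean here, which the mean +/- 1.96*SE construction cannot be.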
So I think your term "less synthetic percentile" is what one might call the non-parametric naive quantile approach. There are other, less naive, methods. Davison and Hinkley have written an excellent book (Bootstrap Methods and Their Application, by A. C. Davison and D. V. Hinkley, Cambridge Series in Statistical and Probabilistic Mathematics No. 1) (which Steve Duffull "stole" from me last November so I cannot refer to it right now) which describes these in detail. It includes "studentizing" the quantiles by incorporating the standard error of the CL estimate for each replication. Of course this is rarely feasible with NONMEM because getting the covariance step to run for every replication is very unlikely!
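For reference, the studentized (bootstrap-t) interval works roughly as follows (a sketch under my own naming; each replicate's standard error se_boot[b] is exactly the quantity that would have to come from that replication's covariance step):

```python
def studentized_ci(theta_hat, se_hat, theta_boot, se_boot, alpha=0.05):
    """Bootstrap-t ("studentized") CI: scale each replicate's deviation
    by that replicate's own standard error, then invert the empirical
    distribution of the resulting t statistics."""
    t = sorted((tb - theta_hat) / sb for tb, sb in zip(theta_boot, se_boot))
    n = len(t)
    t_lo = t[int(alpha / 2 * n)]
    t_hi = t[int((1 - alpha / 2) * n) - 1]
    # Note the reversal: the upper t quantile sets the lower limit
    return (theta_hat - se_hat * t_hi, theta_hat - se_hat * t_lo)
```

If every replication's covariance step did run, theta_boot and se_boot would each hold 1000 entries, one pair per bootstrap fit.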
Paul Williams wrote:
> There are two approaches to applying the bootstrap method. The first is the standard bootstrap, which assumes some type of distribution (usually the normal distribution) throughout the entire modeling process (both development and model checking) and therefore relies on the formulae that are used to calculate means, standard errors, 95% CIs etc. The percentile bootstrap is less reliant on formulae that are a function of an assumed distribution because in the end it ranks the element(s) of interest, takes the 2.5th percentile element and the 97.5th percentile element, and constructs the 95% confidence interval for that element as the distance between these two.
>
> For example, I have previously been interested in a ppk model for an antifungal agent: Cl = theta1 * clcr + theta2. I was interested in the 95% CI for theta1. I constructed 1000 bootstrap data sets and estimated the model for each of them. Rather than plugging the 1000 values into a formula that assumes a normal distribution to calculate the standard error and then the 95% CI, I ranked the 1000 values for theta1 and took the 25th as the lower boundary of the 95% CI and the 975th as the upper boundary. It should be noted that when using the percentile method one must construct at least 1000 data sets and re-estimate the model on all 1000.
>
> So I call this a "less synthetic" approach because (1) it is less reliant on formulae and underlying assumptions about distributions and (2) the intervals come directly from a ranking of the data, not from a series of calculations. The percentile bootstrap can have the advantage of avoiding nonsense estimates which may sometimes come about when the normal distribution is assumed. For example, I have occasionally had results indicating that the lower boundary of a 95% CI for a coefficient of variation for inter-subject variability was intractable (i.e. would be less than 0, which would not make sense). This won't happen with the percentile bootstrap.
>
[stuff deleted -- NH]
Paul Williams wrote:
> A comment for the good and welfare of all: It does not seem to me that bootstrapping residuals is the appropriate approach for population PK or PD modeling. I have looked at this and the within subject residuals are correlated for population models. The exception would be if cross-sectional sampling was done. So it seems to me that one is restructuring the entire data set when the residuals from subject A are assigned to subject B. Also, sampling of residuals assumes that we know the population model(s) with certainty. I am not sure one can make such an assumption. The safe approach is to randomly sample individuals (with entire data associated with each individual) with replacement to create bootstrap data sets.
I agree. The sampling unit needs to be the subject, not the residual, both to preserve the within-subject correlations and to allow for heteroscedasticity. The residual sampling approach assumes that the residuals have the same typical size for all observations.
Nick
--
Nick Holford, Divn Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x6730 fax:373-7556
http://www.phm.auckland.ac.nz/Staff/NHolford/nholford.htm