Re: Bootstrap resampling! -> Randomization test

From: Smith Brian P Date: April 07, 2001 Source: cognigencorp.com
From: SMITH_BRIAN_P@Lilly.com
Subject: Re: Bootstrap resampling! -> Randomization test
Date: Sat, 07 Apr 2001 10:40:45 -0500

I just love the original question and Lewis's reply. I am going to add my 2 cents as well below.

Sincerely,
Brian Smith

I just want to point out that there is no reason, in hypothesis testing situations, that the null hypothesis has to be that there are no treatment differences. There is no reason, for instance, in a test of the means that the null hypothesis cannot be mu1 - mu2 = 5. What is interesting is that the region of a 95% confidence interval is exactly the set of values for which the null hypothesis cannot be rejected. That is, if my 95% confidence interval were (2, 7), then we would fail to reject the null hypothesis that mu1 - mu2 = 2.1 (at the 0.05 level), and we would fail to reject the null hypothesis that mu1 - mu2 = 6.9 (at the 0.05 level). (In addition, we would fail to reject all null hypotheses between 2 and 7.) However, we would reject the null hypothesis that the difference was equal to 1.9. The confidence interval can define how much treatment benefit to expect. In essence, the confidence interval is generated by the same mechanism as hypothesis tests.

> So a simple answer to your question is: we need valid (i.e., correct
> performance under the null) hypothesis testing procedures whenever we
> are in a testing situation (as above).

And since confidence intervals are developed by a similar mechanism as hypothesis tests, equivalently we need the correct coverage probabilities for confidence intervals.

> As scientists (as opposed to advocators), I believe there is another
> use of testing, and this is the one that RA Fisher advocated: we are
> so eager to find patterns in our data that we need some reality check
> on this tendency.
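The duality Brian describes can be sketched in a few lines of Python. This is only an illustration of the logic, using the hypothetical interval (2, 7) from the text: a two-sided 0.05-level test rejects exactly those null values that fall outside the 95% confidence interval.

```python
# Sketch of the CI / hypothesis-test duality: any hypothesized value of
# mu1 - mu2 inside the 95% CI is not rejected at the 0.05 level; any
# value outside it is. The interval (2, 7) is the example from the text.

def rejects(null_value, ci_low, ci_high):
    """Two-sided 0.05-level test via the 95% CI: reject H0 iff the
    hypothesized difference lies outside the interval."""
    return not (ci_low <= null_value <= ci_high)

ci = (2.0, 7.0)  # 95% CI for mu1 - mu2

for d in (1.9, 2.1, 5.0, 6.9, 7.1):
    verdict = "reject" if rejects(d, *ci) else "fail to reject"
    print(f"H0: mu1 - mu2 = {d}: {verdict}")
```

Running it reproduces the verdicts in the text: 2.1 and 6.9 are not rejected, 1.9 is.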
> So an hypothesis test of the null hypothesis that "nothing interesting
> is going on" may be useful, if it is not rejected, to encourage us to
> abandon fruitless quests to find signal where there is likely only
> noise.

I agree 100%. If we understand what a clinically significant effect is, we can in addition use confidence intervals to further help us discern whether anything interesting is or could be going on.

> Stephen Duffull wrote:
>>
>> Hi all
>>
>> To pose some simple questions. If model misspecification affects the
>> validity of the likelihood ratio test (which is not an altogether
>> unexpected finding), and when analysing actual data (rather than
>> simulated data) we always have model misspecification, then the LRT
>> will almost never be valid in real life. Therefore it would seem
>> appropriate to ignore deltaOBJ values in a statistical sense
>> altogether (ie not use them for hypothesis testing). Perhaps we
>> should ask ourselves: when do we need statistical verification of
>> our model? For non-nested models, when the LRT is inappropriate
>> anyway, have we always required statistical verification? Certainly
>> most publications that discuss non-nested models do not supply
>> statistical validation for their choice.

Well, as I have tried to make clear, hypothesis tests are not the only type of "statistical verification" that can be done. Whether or not you have model misspecification, the "hypothesis test" you are talking about is still valid. A p-value tells you the following: given that model A is the true model, what is the chance that I would get the values of model B or something more extreme? The only way we can easily extract a p-value from this situation is with nested models, in which case a p-value just becomes a function of the difference of the -2*ln(L)'s. That is, if we believe that the likelihood expresses the degree of fit of a model (NONMEM users implicitly assume this, since it is a maximum likelihood based program).
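That nested-model p-value can be sketched concretely, assuming the usual chi-square approximation for the drop in -2*ln(L) (closed forms for one or two extra parameters, so no statistics library is needed):

```python
import math

def lrt_pvalue(delta_obj, df=1):
    """Asymptotic p-value for the likelihood ratio test: the drop in
    -2*ln(L) between nested models is referred to a chi-square
    distribution with df equal to the number of extra parameters.
    Closed forms are used for df = 1 and df = 2."""
    if df == 1:
        # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))
        return math.erfc(math.sqrt(delta_obj / 2.0))
    if df == 2:
        # chi2 with 2 df is exponential with mean 2
        return math.exp(-delta_obj / 2.0)
    raise NotImplementedError("only df = 1 or 2 in this sketch")

# A drop of 3.84 points for one extra parameter sits right at p ~ 0.05
print(round(lrt_pvalue(3.84, df=1), 3))
```

As the text notes, this approximation is only asymptotically exact, but it is how a deltaOBJ is conventionally turned into a p-value.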
A p-value is just a reformulation of the likelihoods in a form that is more easily interpretable. (Note: the -2*ln(L) test is asymptotic, which means that the p-value only approaches its correct value as the sample size approaches infinity. Just because it is not exact, however, does not mean that it is not useful.)

It is harder to justify one non-nested model over another. The major principle is that if two models have the same degree of complexity, the one with the largest likelihood (smallest -2*ln(L)) is preferred. This principle obviously cannot be applied in a vacuum. Suppose that the goal is future dose adjustments. You have model a), which uses an easily obtainable variable, with a slightly smaller likelihood than model b), which measures the same quantity in a more complex fashion. It is unlikely that clinicians will use measure b) in clinical practice, but a) would not be a problem, so model a) is preferable. There is also the case in which one model just makes more sense than another but has a slightly smaller likelihood: go with the one that makes sense. However, I think we can conceive of "tests" for non-nested models.

In the case where you have non-nested models with differing degrees of complexity, you have a problem. This is why AIC and BIC exist. The problem is that in the statistics literature there seem to be 50 of these sorts of criteria. Which is correct? No one knows. Let us suppose that you pick your model with BIC. Model A gives you a BIC of 272 and model B gives 280. A is smaller, so we prefer A. But should we? Is 272 really different from 280? BIC, and the difference in BIC between two models, are statistics!! Guess what: we could do bootstrapping or randomization testing on the difference in the BICs. Stephen states correctly that we do not apply statistical criteria for the justification of non-nested models. This is mostly because it would be hard to do. With modern computing power this is no longer such an impediment.
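The "bootstrap the BIC difference" idea can be sketched as follows. Everything here is hypothetical (the toy data, the two one-parameter non-nested models, the number of replicates); it only illustrates that a delta-BIC is itself a statistic whose variability the bootstrap can expose:

```python
import math
import random

random.seed(0)

# Toy data (hypothetical): y is roughly proportional to x, with noise.
x = [1 + 0.5 * i for i in range(30)]
y = [2.0 * xi + random.gauss(0, 3.0) for xi in x]

def bic_one_param(xs, ys, basis):
    """BIC for the one-parameter model y = theta * basis(x) with Gaussian
    errors: n*ln(RSS/n) + k*ln(n), up to an additive constant that is
    shared by both models and so cancels in the difference."""
    bx = [basis(v) for v in xs]
    theta = sum(b * yi for b, yi in zip(bx, ys)) / sum(b * b for b in bx)
    rss = sum((yi - theta * b) ** 2 for b, yi in zip(bx, ys))
    n, k = len(ys), 2  # theta and the residual variance
    return n * math.log(rss / n) + k * math.log(n)

def delta_bic(xs, ys):
    # Model A: y = a*x  versus the non-nested model B: y = b*sqrt(x)
    return bic_one_param(xs, ys, lambda v: v) - bic_one_param(xs, ys, math.sqrt)

observed = delta_bic(x, y)

# Bootstrap the BIC difference: resample (x, y) pairs with replacement
# and recompute, to see how variable the observed difference really is.
n = len(x)
boot = []
for _ in range(1000):
    idx = [random.randrange(n) for _ in range(n)]
    boot.append(delta_bic([x[i] for i in idx], [y[i] for i in idx]))
boot.sort()
lo, hi = boot[25], boot[974]  # rough 95% percentile interval
print(f"observed dBIC = {observed:.1f}, bootstrap 95% interval ({lo:.1f}, {hi:.1f})")
```

If the whole interval sits well away from zero, the preference between the two non-nested models is more than noise; an interval straddling zero would say that "272 versus 280" is not a meaningful distinction for these data.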
Maybe we should start producing p-values to justify non-nested models. I would claim that the likelihood is the quantity that justifies your model. Besides practical concerns, the likelihood ought to be the sole judge of a model's fitness or lack thereof.

>> Do we need statistical verification when the covariate effect is
>> biologically plausible and biologically significant - or do we need
>> it when biological plausibility or significance cannot be assessed
>> (if so, then was the covariate that important anyway)?

This is why I would suggest that you look at confidence intervals as well. There are two reasons a covariate is not significant: 1) there really is nothing going on, or 2) your data are not powerful enough to detect a meaningful difference. A very wide confidence interval indicates that 2) is possible. A narrow confidence interval indicates 1).

>> I would certainly be interested in any thoughts that the group may
>> have on this, particularly in light of the discussion about
>> bootstrap and RT.
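Brian's wide-versus-narrow interval reasoning can be written out as a small decision rule. This is only a sketch; the threshold for a "clinically meaningful" effect and the example intervals are hypothetical:

```python
def covariate_verdict(ci_low, ci_high, clinical_effect):
    """Interpret a covariate's confidence interval against a clinically
    meaningful effect size (hypothetical threshold supplied by the user)."""
    if ci_low > 0 or ci_high < 0:
        return "significant"
    if -clinical_effect < ci_low and ci_high < clinical_effect:
        # Narrow CI around zero: a meaningful effect is ruled out,
        # so most likely there really is nothing going on (reason 1).
        return "likely nothing going on"
    # Wide CI: zero is included but a meaningful effect is not ruled
    # out, so the data may simply lack power (reason 2).
    return "data too weak to tell"

print(covariate_verdict(-0.05, 0.08, 0.5))  # narrow CI around zero
print(covariate_verdict(-0.6, 0.9, 0.5))    # wide CI: underpowered?
```

The point is that "not significant" splits into two very different conclusions once the interval width is compared with the effect size that would matter clinically.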
Mar 23, 2001 Ganesh R Iyer Bootstrap resampling!
Mar 23, 2001 Paul Williams Re: Bootstrap resampling!
Mar 24, 2001 Nick Holford Re: Bootstrap resampling!
Mar 24, 2001 Harry Mager Hm Antwort: Bootstrap resampling!
Mar 27, 2001 Paul Williams Re: Bootstrap resampling!
Mar 27, 2001 Nick Holford Re: Bootstrap resampling!
Mar 27, 2001 Stephen Duffull RE: Bootstrap resampling!
Mar 28, 2001 Ludger Banken RE: Bootstrap resampling!
Mar 29, 2001 Nick Holford Re: Bootstrap resampling!
Mar 29, 2001 Leonid Gibiansky RE: Bootstrap resampling!
Mar 29, 2001 Diane Mould Re: Bootstrap resampling!
Mar 29, 2001 Leonid Gibiansky RE: Bootstrap resampling!
Mar 29, 2001 Kenneth G. Kowalski RE: Bootstrap resampling!
Mar 29, 2001 Jogarao Gobburu Re: Bootstrap resampling!
Mar 29, 2001 Nick Holford Re: Bootstrap resampling! -> Randomization test
Mar 29, 2001 Leonid Gibiansky RE: Bootstrap resampling! -> Randomization test
Mar 29, 2001 Kenneth G. Kowalski RE: Bootstrap resampling! -> Randomization test
Mar 29, 2001 Jogarao Gobburu Re: Bootstrap resampling! -> Randomization test
Mar 29, 2001 Kenneth G. Kowalski RE: Bootstrap resampling! -> Randomization test
Mar 30, 2001 Stephen Duffull RE: Bootstrap resampling! -> Randomization test
Mar 30, 2001 Nick Holford Re: Bootstrap resampling! -> Randomization test
Mar 30, 2001 Stephen Duffull RE: Bootstrap resampling! -> Randomization test
Mar 30, 2001 Mats Karlsson Re: Bootstrap resampling!
Mar 30, 2001 Lewis B. Sheiner Re: Bootstrap resampling! -> Randomization test
Apr 07, 2001 Smith Brian P Re: Bootstrap resampling! -> Randomization test