RE: WRES AND OUTLIER IDENTIFICATION/EXCLUSION
From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] WRES AND OUTLIER IDENTIFICATION/EXCLUSION
Date: Thu, 28 Sep 2006 17:08:59 -0400
Mats,
We appear to be in good agreement on all points. Thank you for your
kind words regarding (4) and info on Dr. Sadray's (et al.) paper...I
will certainly take a look at it. I just have a follow-up with regard
to your responses to (2).
2) Certainly work needs to be done to evaluate whether this approach
indeed has merit. I agree there is not necessarily any reason to believe
outliers are normal; however, we will most likely lack sufficient power to
assess the distribution of these outlying data. There is precedent for
considering a mixture model of normal distributions, referred to in the
statistical literature as a contaminated normal distribution where Y is
distributed as (1-p)N(mu,sigma) + (p)N(mu,k(sigma)) where p represents
the fraction of outliers and k is the scale parameter for the increased
variation in the outliers (see Barnett and Lewis, Outliers in
Statistical Data, Wiley, 1978, pp 31-33, 127-130). We propose a
two-stage approach to this contaminated normal distribution by first
estimating p by use of a prespecified outlier criterion and fixing this
through the use of the FLAG variable. In the second stage we estimate k
which is the ratio of sigma2 to sigma1. The outlier criterion, which
would ideally be specified in the analysis plan before starting the
model development, might be something like "flag all data points as
potential outliers for further evaluation where abs(IWRES)>5" (perhaps a
reasonable criterion with dense data). Of course, we could look at a
full likelihood mixture model approach where p and k are simultaneously
estimated. There are other contaminated normal mixture models that
allow for asymmetry (a shift in mu as well as a scale increase in sigma)
and of course mixtures of different distributions between non-outliers
and outliers. Whether we have enough power to discern between various
contaminated distributions and how well they may perform in the context
of PK/PD is certainly an area that could benefit from some research.
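
To make the two-stage idea concrete, here is a minimal sketch in plain
Python rather than NONMEM. It is only an illustration under stated
assumptions: the residuals are treated as already standardized with
mean zero, the function names (simulate, two_stage) and the simulated
values of p and k are hypothetical, and the abs(residual) > 5 cutoff
simply mirrors the example criterion above. Stage 1 flags points and
takes p as the flagged fraction; stage 2 estimates k as sigma2/sigma1
from the flagged and unflagged groups.

```python
import math
import random


def simulate(n, p, k, sigma=1.0, seed=0):
    """Draw n values from (1-p)*N(0, sigma) + p*N(0, k*sigma).

    Hypothetical helper for illustration; mu is assumed to be 0.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, k * sigma if rng.random() < p else sigma)
            for _ in range(n)]


def two_stage(residuals, threshold=5.0):
    """Two-stage sketch: flag first, then estimate the scale ratio.

    Stage 1: flag abs(residual) > threshold; p_hat is the flagged fraction.
    Stage 2: k_hat = sigma2 / sigma1, where each sigma is the
    root-mean-square about zero within its group (shared mean assumed).
    """
    flags = [abs(r) > threshold for r in residuals]
    flagged = [r for r, f in zip(residuals, flags) if f]
    unflagged = [r for r, f in zip(residuals, flags) if not f]
    if not unflagged:
        raise ValueError("every point was flagged; sigma1 is undefined")

    p_hat = len(flagged) / len(residuals)
    sigma1 = math.sqrt(sum(r * r for r in unflagged) / len(unflagged))
    # With no flagged points there is nothing to estimate sigma2 from.
    sigma2 = (math.sqrt(sum(r * r for r in flagged) / len(flagged))
              if flagged else float("nan"))
    k_hat = sigma2 / sigma1
    return p_hat, sigma1, sigma2, k_hat, flags


if __name__ == "__main__":
    data = simulate(n=2000, p=0.05, k=10.0)
    p_hat, sigma1, sigma2, k_hat, _ = two_stage(data, threshold=5.0)
    print("p_hat =", p_hat)
    print("sigma1 =", sigma1, "sigma2 =", sigma2, "k_hat =", k_hat)
```

Because a hard cutoff truncates both components, the flagged fraction
and the ratio sigma2/sigma1 from this sketch should not be expected to
recover the simulated p and k exactly; how large that discrepancy is in
a PK/PD setting is part of what would need to be evaluated.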
Kind regards,
Ken