Re: WRES AND OUTLIER IDENTIFICATION/EXCLUSION
From: Nick Holford n.holford@auckland.ac.nz
Subject: Re: [NMusers] WRES AND OUTLIER IDENTIFICATION/EXCLUSION
Date: Monday, October 02, 2006 5:59 PM
Ken,
Thanks for confirming the distinction between outlier subjects (i.e.
large ETA) and subjects with outliers (i.e. large EPS). While attempting
to contribute to this thread I have realized the importance of
distinguishing between 'outlier subjects' and 'subjects with outliers'.
The $MIX code I posted earlier attempts to separate subjects with
outliers from those without outliers based on the size of EPS (see again
below). This is in the spirit of the discussion to identify subjects who
do not fit well and therefore may be subjects with outliers regardless
of any ETA adjustment.
The fractional likelihood method you and Matt H. propose estimates some
average fraction of outlier observations. I am still not very
comfortable about the fractional likelihood method because it assumes a
similar fraction of outlier and non-outlier observations in each subject
(which might be addressed by having an ETA on this fraction) but more
importantly (for this thread) it doesn't distinguish in a discrete way
between outlier and non-outlier observations.
I think we need to have something like $MIX that works at the
observation level (instead of only at the subject level) because it
produces a discrete classification.
You refer to a "two stage approach" in your earlier emails but I am
afraid I didnt really follow all the details. Here is a specific
proposal using NONMEM:
Step 1: Use the simple $MIX method (See Step 1 code below) to identify
subjects with outliers from subjects without outliers. Save posthoc
estimates of the ETA specific parameters for the next step. This step
estimates 'P' (your nomenclature) i.e. the proportion of subjects with
outliers and also discretely identifies (using MIXEST) which subject
belongs to each group (ME1).
Step 2: Code the data with ID changing for each observation and use the
individual posthoc parameters (FIXED) to make subject specific
predictions. Set up a $MIX model to identify the overall proportion of
observation level outliers in the total set of observations (See Step 2
code below). You can estimate different residual error model parameters
for outlier and non-outlier observations ('K') during this second step
but the main outcome would to use MIXEST at the observation level to
distinguish outlier observations (ME2).
The second step can be used to to test the hypothesis that there are
indeed two populations of observations i.e. outliers and non-outliers by
fitting with and without $MIX and 'K'.
Overall this 2 step procedure is not as elegant as doing it in one step
but it could maybe achieve the goal of this thread i.e. to identify
outlier observations with a somewhat objective and automatable
procedure. The main problem I think is in the posthoc estimates of the
parameters in Step 1 which will be somewhat contaminated with the
misspecification of the outlier vs nonoutlier residual error. Perhaps it
is here that the fractional likelihood method might help.
Best wishes,
Nick
Caution: The following code has not been tested. It is intended
primarily to demonstrate the ideas being discused. It might easily
contain errors!
************* STEP 1 ******************
$THETA
(0,0.1,1) ; P prob of being subject with outlier 1
(0,3,); Pop clearance 2
(0,10,); Pop volume 3
(0,1,) ; SD of additive residual error 4
(0,1,) ; K fractional difference in SD in subjects with outliers 5
$OMEGA
0.5 ; between subject variability in CL
0.5 ; between subject variability in V
$SIGMA
1 FIX ; unit random effect
$MIX
NSPOP=2
P(1)=THETA(1) ; P prob of being subject with outlier
P(2)=1-P(1)
$PRED
IF (NEWIND.EQ.0) OBID=0 ; counter for observations
CL=THETA(2)*EXP(ETA(1))
V=THETA(3)*EXP(ETA(2))
MYPRED=DOSE/V*EXP(-CL/V*TIME)
;SD= ... any residual error model you want expressed with THETAs e.g
SD=THETA(4) ; additive residual error SD
IF (MIXNUM.EQ.1) THEN ; subject with outliers
KOUT=THETA(3)
ELSE
KOUT=1
ENDIF
Y=MYPRED+KOUT*SD*EPS(1)
IF (MDV.EQ.0) OBID=OBID+1 ; create observation level ID
ME1=MIXEST
$TABLE ID ME1 OBID TIME DOSE CL V DV MDV
NOAPPEND NOPRINT ONEHEADER FILE=step1.fit
************* STEP 2 ******************
$DATA step1.fit IGNORE=@
;OBID is an observation level ID type of data item
;ICL and IV are the posthoc individual estimates of CL and V from Step 1
$INPUT SID ME1 ID=OBID TIME DOSE ICL IV DV MDV
$THETA
(0,0.1,1) ; proportion of obs which are outliers 1
(0,1,) ; SD of additive residual error 2
(0,1,) ; K fractional difference in SD in outlier population 3
$OMEGA 0 FIX ; keep NM-TRAN happy (no OMEGAs needed in this step
$SIGMA 1 FIX ; unit random effect
$MIX
NSPOP=2
P(1)=THETA(1) ; proportion of obs which are outliers
P(2)=1-P(1)
$PRED
MYPRED=DOSE/IV*EXP(-ICL/IV*TIME)
SD=THETA(2) ; additive residual error SD
IF (MIXNUM.EQ.1) THEN ; outlier observation
KOUT=THETA(3)
ELSE
KOUT=1
ENDIF
Y=MYPRED+KOUT*SD*EPS(1)
ME2=MIXEST
$TABLE SID ID ME1 ME2
--
Nick Holford, Dept Pharmacology & Clinical Pharmacology
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
email:n.holford@auckland.ac.nz tel:+64(9)373-7599x86730 fax:373-7556
http://www.health.auckland.ac.nz/pharmacology/staff/nholford/