Cross-validation script in NM

5 messages 4 people Latest: Dec 06, 2014

Cross-validation script in NM

From: Pieter Colin Date: December 04, 2014 technical
Dear nm-users,

I'm trying to construct a NONMEM control file to be used in a cross-validation study. In a first problem statement I run an estimation step on a subset of my data. In a subsequent problem statement (within the same control file) I am trying to predict the PK of the subset that was not included in part 1. I managed to do this by using the MSFO option (in the first part of the control file) and $MSFI in the second part.

However, it appears that time-varying covariates (defined under $PK in the first problem statement) are not evaluated when performing the predictions for the second problem statement. Does anyone know of a workaround for this, or is there another way of combining a fit and a predict action (each on different data) within the same control file?

Kind regards,
Pieter

--
Pieter Colin, Pharm.D., Ph.D.
Post-Doctoral researcher (Faculty of Pharmaceutical Sciences - Ghent University)
Associate Professor (Department of Anesthesiology - UMCG)

Re: Cross-validation script in NM

From: Kajsa Harling Date: December 05, 2014 technical
Dear Pieter,

If your end goal is to perform k-fold cross-validation, you can use PsN's crossval program. It is run with commands like

crossval mymodel.mod -groups=5

where mymodel.mod is any regular estimation-type control stream. This will perform 'groups'-fold cross-validation. The prediction models are copies of the estimation models, except that $DATA is changed, initial estimates are automatically set to the final estimates from the estimations, and MAXEVAL is set to 0 (or the corresponding setting for non-classical estimation methods). $PK will be identical, so there should be no problem with time-varying covariates.

Just make sure to delete the run folder after you have retrieved the results, because it will be very large.

Best regards,
Kajsa

--
Kajsa Harling, PhD
System Developer
Department of Pharmaceutical Biosciences
Uppsala University
+46-(0)18-471 4308
http://www.farmbio.uu.se/research/researchgroups/pharmacometrics/

RE: Cross-validation script in NM

From: Pieter Colin Date: December 05, 2014 technical
Dear Kajsa and Dennis,

Thank you for your thoughts on this. I know of (and have used several times in the past) the mentioned functionalities in PsN and PLT-tools. However, due to the specific nature of my problem, I'm afraid these will not work for me.

Allow me to further clarify my problem. (For clarity, I've included a piece of my control stream at the bottom of this message.) As Dennis pointed out, I'm fitting a training group and using the final parameter estimates in a subsequent run to predict the plasma concentrations of the validation group. I failed to clarify this in my previous message, but I'm predicting the plasma concentrations for the validation group according to a TDM setting. This means that for the validation group MAXEVAL=0 and only the first trough sample per ID is included in the dataset as an observation event (EVID=0 and MDV=0). It goes without saying that the objective is to accurately predict the other plasma concentrations (EVID=2 and MDV=1) for the IDs in the validation group.

Now to get to the problem. I tried this approach with two separate control streams and it works, i.e. plasma concentrations are predicted for the validation group based on the post-hoc corrected final parameter estimates of the training group.

However, when I combine these in a single control stream (as shown below), the time-varying covariates are not taken into account for the validation group. More specifically, the following statements (under $PK) are not evaluated for the IDs in the validation group (they are used to switch an additional CL due to hemodialysis on and off):

CL_DIA = 0
IF(DIALYSIS.EQ.1) CL_DIA = THETA(6)

IND=0
IF(IND_DIA.EQ.1) IND=1

This causes the hemodialysis moments to be ignored by NM in the validation group when using the control stream as shown below. Since it worked for me using separate control streams, it seems that the problem is associated with the use of MSFO=... and $MSFI in the training and validation set, respectively.

Do any of you have a specific solution for this problem, or could you shed some light on specific behavior of the $MSFI option in NM which might be causing this?

Kind regards,
Pieter Colin

$PROBLEM No covariates
;; 1. Based on:
;; COMMENT:

;---------------------------------------------------------------------------------
;----------------------- FIT XVAL --------------------------------------------
;---------------------------------------------------------------------------------

$INPUT ID TIME DV CMT AMT RATE EVID MDV UVOL EXTRA IND_DIA OCC
DIALYSIS ANALYSIS BV MISSING AGE WGT HGT BMI BSA SOFA M1F2 GFR XVAL

$DATA RawdataCFP_cov_ext.csv
IGNORE=@ IGNORE(MISSING.EQ.1) ;Exclude missing values
IGNORE(CMT.GT.3) ;Exclude CSF sample
IGNORE(XVAL.EQ.1) REWIND

$SUBROUTINE ADVAN13 TOL=12

$MODEL COMP(CENTRAL,DEFOBS,DEFDOSE) COMP(PERIPH)
COMP(URINE,INITIALOFF)

$PK
;------------- Calculation of Time After Dose ------------

IF (EVID.EQ.1.OR.EVID.EQ.4) THEN
  TDOS=TIME
  TAD=0.0
ENDIF
IF (EVID.NE.1.AND.EVID.NE.4) TAD=TIME-TDOS

TVCLOTHER =THETA(1)
CLOTHER =TVCLOTHER*EXP(ETA(4))

TVCL = THETA(2)
CL = TVCL*EXP(ETA(1))

TVV1 = THETA(3)
V1 =TVV1*EXP(ETA(2))

TVV2 =THETA(4)
V2 =TVV2*EXP(ETA(3))

TVQ =THETA(5)
Q =TVQ

;------------- Dialysis submodel ------------------------
CL_DIA = 0
IF(DIALYSIS.EQ.1) CL_DIA = THETA(6)

IND=0
IF(IND_DIA.EQ.1) IND=1

S1=V1
S3=UVOL

K10=CLOTHER/V1
K12=Q/V1
K21=Q/V2
K13=CL/V1
K11=CL_DIA/V1

$DES
DADT(1)=-K12*A(1)+K21*A(2)-K10*A(1)-K13*A(1)-K11*A(1)*IND
DADT(2)=K12*A(1)-K21*A(2)
DADT(3)=K13*A(1)

$ERROR
IPRED = 1E-3
IF(F.GT.0) IPRED=F

Y = IPRED*(1+EPS(1))
IRES = DV-IPRED
IWRES = IRES/(IPRED*SQRT(SIGMA(1,1)))

IF(CMT.EQ.3) THEN
  Y = IPRED*(1+EPS(2))
  IRES = DV-IPRED
  IWRES = IRES/SQRT(IPRED*IPRED*SIGMA(2,2))
ENDIF

$THETA
(1E-9,1.097450) ; CLOTHER; L/h
(1E-9,2.124530) ; CL; L/h
(1E-9,8.640870) ; V1; L
(1E-9,18.58180) ; V2; L
(1E-9,34.13580) ; Q; L/h
(1E-9,4.046690) ; CL_DIA; L/h

$OMEGA
1.265890 ; IIV_CL
0.387112 ; IIV_V1
0.186287 ; IIV_V2
0.371892 ; IIV_CLOTHER

$SIGMA
0.090199 ; Proportional plasma
0.106711 ; Proportional urine

$ESTIMATION SIG=2 MAX=9999 METHOD=1 SORT INTERACTION POSTHOC PRINT=1
MSFO=run61.msf

;---------------------------------------------------------------------------------
;----------------------- POST HOC --------------------------------------------
;---------------------------------------------------------------------------------

$PROBLEM PREDICT XVAL1

$INPUT ID TIME DV CP CMT AMT RATE EVID MDV UVOL EXTRA IND_DIA OCC
DIALYSIS ANALYSIS BV MISSING AGE WGT HGT BMI BSA SOFA M1F2 GFR
TDM XVAL

$DATA RawdataCFP_xval_ext.csv
IGNORE=@
IGNORE(MISSING.EQ.1) ;Exclude missing values
IGNORE(CMT.GT.3) ;Exclude CSF sample
IGNORE(XVAL.NE.1) REWIND

$MSFI run61.msf

$ESTIMATION SIG=2 MAX=0 METHOD=1 SORT INTERACTION POSTHOC PRINT=1

...

Re: Cross-validation script in NM

From: Ron Keizer Date: December 05, 2014 technical
Hi Pieter,

It is not exactly clear from the NONMEM manuals what information is included in the MSFO file, or the exact mechanism that is used to re-implement the model, so it's hard to say if this should really work the way you expect it to. To my knowledge (and according to the manual), model specification files are intended for restarting the estimation process, and not really for analyses where you use a different dataset in the second estimation (or simulation) step.

For an analysis such as yours, or in fact any simulation or re-estimation analysis, I would always do the data mangling outside of NONMEM, e.g. in R or Perl. That way you will have full control over what is going on, and you'll use NONMEM only for what it is good at (i.e. estimation).

Hope this helps,
Ron

----------------------------------------------
Ron Keizer, PharmD PhD
Dept. of Bioengineering & Therapeutic Sciences
University of California San Francisco (UCSF)
----------------------------------------------

Re: Cross-validation script in NM

From: Alison Boeckmann Date: December 06, 2014 technical
Dear Pieter,

I think there is a mistake in the control stream. (Thank you for including it.)

The two problems use two different data sets:

RawdataCFP_cov_ext.csv
RawdataCFP_xval_ext.csv

Nothing is wrong with that. However, the $INPUT statements are not the same.

For the first problem:

$INPUT ID TIME DV CMT AMT RATE EVID MDV UVOL EXTRA IND_DIA OCC
DIALYSIS ANALYSIS BV MISSING AGE WGT HGT BMI BSA SOFA M1F2 GFR XVAL

For the second problem:

$INPUT ID TIME DV CP CMT AMT RATE EVID MDV UVOL EXTRA IND_DIA OCC
DIALYSIS ANALYSIS BV MISSING AGE WGT HGT BMI BSA SOFA M1F2 GFR
TDM XVAL

Notice that CP and TDM are listed in the second problem but not in the first.

When the problems are part of the same NONMEM run, the $PK abbreviated code is specified only once. For the first problem, the variables of interest are obtained this way in FSUBS:

EVID=EVTREC(NVNT,007)
UVOL=EVTREC(NVNT,009)
IND_DIA=EVTREC(NVNT,011)
DIALYSIS=EVTREC(NVNT,013)
CMT=EVTREC(NVNT,004)

If I run the second problem by itself, the variables of interest are:

EVID=EVTREC(NVNT,008)
UVOL=EVTREC(NVNT,010)
IND_DIA=EVTREC(NVNT,012)
DIALYSIS=EVTREC(NVNT,014)
CMT=EVTREC(NVNT,005)

The presence of CP causes them to be shifted over by one position. If you have two separate runs, then the code in FSUBS can be different, and IND_DIA, DIALYSIS, etc. will be in the right places. But when there is only one run, the code generated for the first problem is reused, so the variables of interest have the wrong values in the second problem. The use of MSFO/$MSFI has nothing to do with it.

I suggest that you change the second $INPUT record to specify CP=DROP. You probably don't need TDM=DROP in the second problem; NM-TRAN will find XVAL during the data pre-processor step, and will IGNORE/ACCEPT the appropriate records.

As a general approach to complicated models and troubleshooting: there is no point running $ESTIM till you are sure you've got the model right! Instead, *each* problem should display items of interest:

$TABLE ID TIME IND_DIA DIALYSIS TDM etc.

Now you can make sure you are getting the correct data items from the correct records.

Alison Boeckmann
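For readers who want to see the suggestion above applied, here is a minimal sketch of how the second problem might look with CP=DROP added, so that EVID, UVOL, IND_DIA, DIALYSIS and CMT occupy the same positions as in the first problem. It also adds a $TABLE record for checking the dialysis items. The file name xval_check.tab and the particular choice of table items are assumptions made for illustration, as is the expectation that the $PK variables CL_DIA and IND from the first problem can be tabulated here. Nothing in this sketch has been run against the original data.

```
$PROBLEM PREDICT XVAL1

; CP=DROP removes CP from the event record, so the items after it
; keep the positions they have in the first problem's $INPUT
$INPUT ID TIME DV CP=DROP CMT AMT RATE EVID MDV UVOL EXTRA IND_DIA OCC
DIALYSIS ANALYSIS BV MISSING AGE WGT HGT BMI BSA SOFA M1F2 GFR
TDM XVAL

$DATA RawdataCFP_xval_ext.csv
IGNORE=@
IGNORE(MISSING.EQ.1) ;Exclude missing values
IGNORE(CMT.GT.3) ;Exclude CSF sample
IGNORE(XVAL.NE.1) REWIND

$MSFI run61.msf

$ESTIMATION SIG=2 MAX=0 METHOD=1 SORT INTERACTION POSTHOC PRINT=1

; Check table (file name is hypothetical): confirms that the dialysis
; flags and the derived CL_DIA and IND change on the expected records
$TABLE ID TIME EVID IND_DIA DIALYSIS CL_DIA IND
NOPRINT ONEHEADER FILE=xval_check.tab
```

If IND_DIA and DIALYSIS in the resulting table match the data file record by record, and CL_DIA and IND switch with them, the time-varying covariates are being read from the correct columns. A similar $TABLE in the first problem gives the same check for the training set.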