Imputation of multiple categorical covariates with missing data

From: Ykl7
Date: September 23, 2015
Source: mail-archive.com
Dear NMusers,

I would like to investigate the effect of several genotypes on clearance in a pop PK model. The issue is that most genotypes have some amount of missing data. I have discarded the genotypes with too many missing samples (>30%) and now want to handle the remaining genotypes appropriately before I move on to an automated stepwise covariate search in PsN.

A colleague informed me that the following mixture model can serve for imputation of a single categorical covariate (let's call it GENO):

-----------------------------------
; In the dataset, the genotype is saved in the variable GENO and coded -99 if unknown, otherwise it takes on the values 0,1,2
$INPUT ID OCC TIME AMT DV .... GENO ....

$PK
; here you check if the genotype is available or not (GENO==-99). If it's available, you save the new variable GENOME=GENO...
IF (GENO.NE.-99) THEN
  GENOME = GENO
; ... otherwise you use the mixture to impute GENOME
ELSE
  IF (MIXNUM.EQ.1) GENOME = 0
  IF (MIXNUM.EQ.2) GENOME = 1
  IF (MIXNUM.EQ.3) GENOME = 2
ENDIF

; then you use the variable GENOME (not GENO, which was in the dataset) to define CL, or whichever other parameter you want.
; you need to use a new variable since NONMEM won't let you change the value of one of the fields in the dataset.
IF (GENOME.EQ.0) THEN
  TVCL = THETA(1)*((WT/12.5)**0.75)
  TVBIO = 1
ENDIF

; Three sub-populations whose proportion is given by the THETAs
$MIX
NSPOP=3
P(1)=THETA(14)
P(2)=THETA(15)
P(3)=THETA(16)

$THETA 0.4 FIX ; GENO = 0 fixed to observed proportion in known genotype
$THETA 0.4 FIX ; GENO = 1 fixed to observed proportion in known genotype
$THETA 0.2 FIX ; GENO = 2 fixed to observed proportion in known genotype
-----------------------------------

*The question is how to repeat such an approach when there are several genotypes with missing data (GENO1, GENO2, ..., GENOX) which need to be explored?*

The answer I received from my colleague is that it would be rather difficult, as the mixture model would require the specification of every possible combination of the different genotypes.

One approach I am considering is performing the stepwise covariate search in PsN (where by default missing categorical data are set to the most common value). I would then retrace the steps of the search based on the scm log file and compare the OFV drops and p-values of the chosen relationships with those that would have been observed had a mixture model approach been used. If the difference is small and the chosen relationships remain far removed from any other relationships which could have been chosen, I accept them and build my covariate model.

Any input on this matter would be very much appreciated.

Have a good day.

Best regards,
Yassine
Roskilde Hospital
Denmark
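To make the "every possible combination" point concrete, here is a minimal Python sketch (not NONMEM code) that lists the sub-populations a joint mixture over several genotypes would need. It assumes the genotypes are independent, so each joint probability is simply the product of the observed marginal proportions; that assumption may not hold for linked genotypes. The function name `joint_mixture` and the GENO2 proportions are hypothetical and only for illustration; the GENO1 proportions are the ones fixed in the control stream above.

```python
from itertools import product


def joint_mixture(marginals):
    """Enumerate every genotype combination with its joint probability.

    marginals maps a covariate name to a list of category proportions,
    where the list index is the genotype code (0, 1, 2, ...).
    ASSUMPTION: genotypes are independent, so joint = product of marginals.
    """
    names = list(marginals)
    rows = []
    # Cartesian product over the genotype codes of every covariate
    for codes in product(*(range(len(marginals[n])) for n in names)):
        prob = 1.0
        for name, code in zip(names, codes):
            prob *= marginals[name][code]
        rows.append((dict(zip(names, codes)), prob))
    return rows


marginals = {
    "GENO1": [0.4, 0.4, 0.2],  # proportions used in the post
    "GENO2": [0.5, 0.3, 0.2],  # hypothetical proportions, for illustration
}

subpops = joint_mixture(marginals)
for mixnum, (combo, prob) in enumerate(subpops, start=1):
    print(f"MIXNUM={mixnum}: {combo}  P({mixnum})={prob:.3f}")
print(f"NSPOP={len(subpops)}")
```

With X three-level genotypes the number of sub-populations is 3**X, so two genotypes already need 9 and five would need 243, which illustrates why specifying the joint mixture by hand quickly becomes impractical.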