RE: covariate selection question

From: Kenneth Kowalski
Date: January 19, 2006
Source: cognigencorp.com
From: "Kowalski, Ken" Ken.Kowalski@pfizer.com
Subject: RE: [NMusers] covariate selection question
Date: Thu, 19 Jan 2006 16:49:25 -0500

NMusers,

I think blaming the NONMEM OBJ and stepwise procedures when we don't like the result of a covariate model selection process is a bit misplaced. I think there are 3 major factors that contribute to a successful covariate model building strategy. To make my point I'll draw an analogy to building a house. The quality of the house we build depends on 1) the materials, 2) the tools available to work with this material, and 3) the proficiency of the builder in using those materials and tools. In covariate model building, the material we have to work with is our data, the tools available to us are NONMEM, stepwise procedures, diagnostic plots, etc., and the builder is the modeler. A successful covariate model building strategy depends on how well the modeler understands the limitations of the data, and how effectively they use the available tools with the data, given those limitations. Of course, there are times when our tools are inadequate for the task at hand; however, I think more often the issue is not fully appreciating the limitations of our data and not tailoring our model building strategies to these limitations. I know I'm treading on old ground, but in my opinion the diagnostic output from a successful COV step is under-appreciated, both as a way to understand the limitations of our data and as a guide for our model building strategy.

Here is my 2 cents on covariate model building and stepwise procedures. I apologize in advance for the long and rambling message, and for treading on old ground.

1) We generally perform systematic procedures for covariate model building to identify a parsimonious model with the fewest covariates that explain as much of the inter-individual variability as possible.
We should not view such procedures as providing an assessment of the statistical significance of each covariate parameter. If we want to assess the statistical significance of each and every covariate parameter that we might entertain in a systematic covariate selection procedure (e.g., stepwise procedures), we are better off doing so based on a "full model" (see Point 9c below).

2) Stepwise procedures can routinely find a parsimonious model; however, there is no guarantee that they will find the most parsimonious model or the most biologically plausible model. There may be several almost equally parsimonious models, of which a stepwise procedure may find only one. Other parsimonious models not selected by a stepwise procedure may be more biologically plausible.

3) While stepwise procedures and the delta OBJ often cannot be used to find a biologically plausible parsimonious model within a search space containing both plausible and non-plausible candidate models, this should not be considered an indictment of stepwise procedures or the delta OBJ. In my opinion, it should be considered an indictment of the practice of casting too wide a net, searching numerous covariate parameters, many of which may have dubious biological relevance. While we try to be mechanistic and guided by biology/pharmacology in postulating structural models, when it comes to specifying covariate submodels we often resort to empiricism. We are willing to investigate numerous covariate effects on several parameter submodels in part because it is easy to use a systematic procedure and just turn the crank with little forethought about the covariate parameters we are evaluating. In so doing, we often cross our fingers and hope that the final model selected by a stepwise procedure is one that can be scientifically justified. When the selected model is not scientifically justifiable, it is easy but misguided to place the blame on the stepwise procedure.
In this setting, modelers should ask themselves why they are investigating covariate effects that cannot be scientifically justified. Of course, as Mark has pointed out, we need to be cautious here and recognize that in model building we are generating hypotheses, and we must be open-minded to hypotheses that may run counter to our prior beliefs.

4) Better upfront planning and judicious selection of covariates and covariate-parameter effects can help steer a stepwise procedure to focus only on biologically plausible models. In specifying the covariate parameters to be evaluated by a stepwise procedure, modelers should ask themselves upfront, "Am I prepared to accept any model within the search space as being scientifically justifiable?" If the answer to this question is "no," they should re-think the set of covariate parameters before undertaking the stepwise procedure.

5) There may be degrees of biological plausibility. For example, a gender or sex effect may be interpreted as a surrogate for body size rather than an intrinsic gender/sex effect. In this setting one may question whether gender/sex should be included in the investigation at all. To keep the search of covariate effects both plausible and parsimonious, the modeler may wish to choose the one body size covariate among several measures of body size (body weight, lean body weight, BMI, BSA, etc.) that they feel is most plausible and use only that one in the covariate search. Of course, the modeler can and should evaluate diagnostics (graphically) to ensure that any trends in the other body size covariates not included in the stepwise procedure can be explained by the one selected for evaluation.

6) To avoid, or at least reduce, the problems associated with collinearity and selection bias, we should try to understand the limitations of our data in providing information on the covariate parameters that we wish to evaluate in a stepwise procedure.
This is where I depart from others regarding the value of the COV step. I do agree that a successful COV step should not be used as a "stamp of approval," nor should models be down-weighted or penalized when the COV step fails. However, when the COV step runs successfully, its output contains useful diagnostic information that can help steer us away from some of the pitfalls of stepwise procedures, such as those encountered by Joern, who initiated this email thread (see Points 7 and 8).

7) During base structural model development, it is useful to inspect the COV step output to assess correlation among the parameter estimates before undertaking a stepwise procedure. If two structural parameter estimates are highly correlated, the modeler may face a difficult decision as to whether a particular covariate effect is more plausible on one structural parameter or the other, since there may be insufficient information in the data to investigate the covariate on both. For example, suppose concentration-response data have sufficient curvature to support fitting an Emax model, but Emax and EC50 may not be precisely estimated. In this setting the correlation between the estimates of Emax and EC50 may be high. This could lead to potentially unstable covariate model investigations (and convergence problems) if we begin to evaluate the same covariate on both parameters. For example, suppose that we are interested in evaluating the effect of sex on both Emax and EC50. Including a sex effect simultaneously on both Emax and EC50 may exacerbate the instability of the model to the point that it does not converge. Because of the correlation between these two structural parameter estimates, there may be insufficient information in the data to distinguish whether the sex effect should be on potency, efficacy, or both.
In this setting, the modeler should question whether it is more plausible to investigate a sex effect on potency or on efficacy, recognizing that the data are too limited to evaluate it on both. If one is more plausible than the other, we should not rely on a stepwise procedure to choose between the two, as it could by random chance select the less plausible one simply because of the collinearity in the parameter estimates.

8) Another place where I use the COV step output to help with covariate model building is in evaluating a "full model." By full model I mean the model in which all of the covariate parameters that one might evaluate in a stepwise procedure are included simultaneously. If the COV step output from this full model suggests that it is stable (i.e., no extremely high correlations, and not so many moderately high correlations that the ratio of the largest to the smallest eigenvalue of the correlation matrix of the estimates becomes extremely high; these eigenvalues are obtained from the PRINT=E option on the $COV step), then we have some diagnostic evidence that the data can support evaluation of all the covariate effects.

9) Evaluating a full model has intrinsic value regardless of whether or not the full model is used as part of a systematic covariate model building procedure. Some of the benefits of fitting a full model include:

a) The COV step output can be used to ensure that the data can support the evaluation of ALL the covariate effects of interest (see Point 8 above).

b) Among a class of hierarchical covariate models, the full model represents the best we can do with respect to OBJ. That is, the delta OBJ between the base and full model is the largest. Thus, the full model can be used to help assess the degree of parsimony of a final model selected by a stepwise procedure. A parsimonious model is one whose OBJ is as close as possible to the full model OBJ but with as few covariate parameters as possible.
So, if we use a forward selection procedure in a situation like Joern's, where perhaps the large drop in OBJ occurs only when two covariate effects are included simultaneously, that combination may never get evaluated by the forward selection procedure, and we may very well end up with a final model that is not very parsimonious in comparison to the full model. In this particular setting, it may be advantageous to perform a pure backward elimination procedure beginning with the full model, which by definition includes both of these covariate effects at the start of the procedure.

c) If one is interested in assessing the statistical significance of ALL the covariate effects, bootstrapping the full model to construct confidence intervals and/or bootstrap p-values is less prone to two statistical issues that arise when a final model from a covariate selection procedure is used to assess significance: the adequacy of the Chi-Square assumption for the likelihood ratio test, and multiplicity of testing. Both issues can inflate type I errors. Moreover, the question of ruling out a DDI effect can easily be addressed by including it in the full model.

d) I'll make a shameless plug for the WAM procedure (see Kowalski & Hutmacher, JPP 2001;28:253-275), which uses the COV step output from a full model run to identify a subset of potentially parsimonious models that can then be fit in NONMEM. Unlike stepwise procedures, which can only select a single parsimonious model, the WAM procedure can give the modeler a sense of the competing models that may have comparable degrees of parsimony. For those interested, Pfizer in collaboration with Pharsight has developed a freeware version of the WAM software that can be downloaded from the NONMEM repository (ftp:/ftp.globomaxnm.com/Public/nonmem).
10) The benefits of the COV step and full model evaluation are difficult to realize unless we are more judicious in our selection of covariates to be investigated. We need to change our practices so that we understand the limitations of our data when we perform covariate model building and apply biological reasoning more effectively in developing our submodels.

Ken
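The stability checks described in Points 7 and 8 can be sketched in Python. These checks are the pairwise correlations among the estimates and the ratio of the largest to the smallest eigenvalue of their correlation matrix. This is a minimal illustration, not part of NONMEM or the original message. It assumes you have already exported the covariance matrix of the estimates from the COV step into a NumPy array. The parameter names, the example matrix, and the 0.95 flagging threshold are hypothetical choices.

```python
import numpy as np


def correlation_from_covariance(cov):
    """Convert a covariance matrix of parameter estimates to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def eigenvalue_ratio(cov):
    """Ratio of the largest to the smallest eigenvalue of the correlation matrix."""
    corr = correlation_from_covariance(cov)
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix: eigenvalues in ascending order
    return eigvals[-1] / eigvals[0]


def flag_high_correlations(cov, names, threshold=0.95):
    """List parameter pairs whose estimates have |correlation| >= threshold (hypothetical cutoff)."""
    corr = correlation_from_covariance(cov)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], float(corr[i, j])))
    return flagged


# Hypothetical example: EMAX and EC50 estimates are strongly correlated (0.97).
names = ["EMAX", "EC50", "GAMMA"]
sd = np.array([2.0, 0.5, 0.1])
corr_true = np.array([[1.00, 0.97, 0.20],
                      [0.97, 1.00, 0.10],
                      [0.20, 0.10, 1.00]])
cov = corr_true * np.outer(sd, sd)

print(flag_high_correlations(cov, names))
print(eigenvalue_ratio(cov))
```

In the spirit of Point 7, a flagged Emax/EC50 pair or a large eigenvalue ratio would argue for testing a covariate on one of the two parameters rather than both. What counts as "extremely high" remains a judgment call.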
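The forward-selection pitfall described in Point 9b can be illustrated with a toy example. The sketch below uses made-up OBJ values, not taken from any real analysis, for every subset of three hypothetical covariate effects A, B and C. In this setup, A and B each lower OBJ by only 2 points on their own but by 20 points together. In practice each OBJ would come from a separate NONMEM run. The cutoffs 3.84 (forward) and 6.63 (backward) are the chi-square critical values for 1 degree of freedom at p = 0.05 and p = 0.01. Using a stricter cutoff for the backward step is an assumed, illustrative choice.

```python
from typing import Dict, FrozenSet

# Made-up OBJ values for every subset of three hypothetical covariate effects.
# A and B each lower OBJ by only 2 points alone, but by 20 points together.
OBJ: Dict[FrozenSet[str], float] = {
    frozenset(): 1000.0,
    frozenset({"A"}): 998.0,
    frozenset({"B"}): 998.0,
    frozenset({"C"}): 995.0,
    frozenset({"A", "B"}): 980.0,
    frozenset({"A", "C"}): 993.0,
    frozenset({"B", "C"}): 993.0,
    frozenset({"A", "B", "C"}): 975.0,  # full model
}


def forward_selection(obj, candidates, threshold=3.84):
    """Repeatedly add the effect with the largest OBJ drop while that drop >= threshold."""
    selected = frozenset()
    while candidates - selected:
        drops = {c: obj[selected] - obj[selected | {c}] for c in candidates - selected}
        best = max(drops, key=drops.get)
        if drops[best] < threshold:
            break
        selected = selected | {best}
    return selected


def backward_elimination(obj, candidates, threshold=6.63):
    """Starting from the full model, repeatedly remove the effect whose removal
    raises OBJ the least, as long as that rise is below threshold."""
    selected = frozenset(candidates)
    while selected:
        rises = {c: obj[selected - {c}] - obj[selected] for c in selected}
        weakest = min(rises, key=rises.get)
        if rises[weakest] >= threshold:
            break
        selected = selected - {weakest}
    return selected


candidates = frozenset({"A", "B", "C"})
fwd = forward_selection(OBJ, candidates)
bwd = backward_elimination(OBJ, candidates)
print("forward: ", sorted(fwd), OBJ[fwd])
print("backward:", sorted(bwd), OBJ[bwd])
print("full:    ", sorted(candidates), OBJ[candidates])
```

With these numbers, forward selection adds C and then stops at OBJ 995, because neither A nor B alone clears the cutoff. Backward elimination drops only C and keeps A and B, at OBJ 980, which is much closer to the full-model OBJ of 975.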
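The general idea behind the Wald-based screening mentioned in Point 9d can be sketched as follows. This sketch is not the WAM software and does not claim to reproduce its exact criterion. It assumes you have the full-model estimates of the covariate parameters and the corresponding block of the COV step covariance matrix. It approximates the rise in OBJ from fixing a subset of those parameters to zero with a Wald statistic, using only the full-model fit. It then ranks all submodels with a BIC-style penalty (number of covariate parameters kept times the natural log of the number of subjects), which is an assumed choice. The parameter names, estimates, standard errors, OBJ, and subject count are hypothetical.

```python
import itertools
import math

import numpy as np


def wald_delta_obj(theta, cov, dropped):
    """Wald approximation to the increase in OBJ when the covariate parameters
    indexed by `dropped` are fixed to zero (their no-effect value)."""
    if not dropped:
        return 0.0
    idx = list(dropped)
    t = theta[idx]
    c = cov[np.ix_(idx, idx)]
    return float(t @ np.linalg.solve(c, t))


def rank_submodels(names, theta, cov, obj_full, n_subjects):
    """Approximate OBJ and a BIC-style criterion (assumed) for every subset of
    covariate parameters, computed from the full-model fit without refitting."""
    k = len(names)
    results = []
    for r in range(k + 1):
        for kept in itertools.combinations(range(k), r):
            dropped = [i for i in range(k) if i not in kept]
            approx_obj = obj_full + wald_delta_obj(theta, cov, dropped)
            criterion = approx_obj + r * math.log(n_subjects)  # assumed penalty
            results.append((criterion, [names[i] for i in kept], approx_obj))
    results.sort(key=lambda row: row[0])
    return results


# Hypothetical full-model estimates and standard errors for three covariate effects
names = ["WT_ON_CL", "SEX_ON_CL", "AGE_ON_V"]
theta = np.array([0.75, 0.10, 0.02])
cov = np.diag([0.10, 0.08, 0.05]) ** 2  # uncorrelated estimates, for simplicity
obj_full = 1500.0

ranked = rank_submodels(names, theta, cov, obj_full, n_subjects=100)
for criterion, kept, approx_obj in ranked[:3]:
    print(f"{criterion:9.2f}  {approx_obj:9.2f}  {kept}")
```

As Point 9d describes, the shortlisted models would then be fit in NONMEM. The Wald values only approximate the likelihood ratio test, so the refit OBJ values are what should be compared.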
Jan 17, 2006 Joern Loetsch covariate selection question
Jan 17, 2006 Mark Sale RE: covariate selection question
Jan 17, 2006 Joern Loetsch RE: covariate selection question
Jan 17, 2006 Michael Fossler RE: covariate selection question
Jan 17, 2006 Jakob Ribbing RE: covariate selection question
Jan 17, 2006 Mark Sale RE: covariate selection question
Jan 18, 2006 Mats Karlsson RE: covariate selection question
Jan 18, 2006 Paul Hutson RE: covariate selection question
Jan 18, 2006 Mark Sale RE: covariate selection question
Jan 18, 2006 Jogarao V Gobburu RE: covariate selection question
Jan 18, 2006 Mark Sale RE: covariate selection question
Jan 19, 2006 Kenneth Kowalski RE: covariate selection question
Jan 20, 2006 Mark Sale RE: covariate selection question
Jan 20, 2006 William Bachman RE: covariate selection question
Jan 20, 2006 Mark Sale RE: covariate selection question
Jan 20, 2006 Kenneth Kowalski RE: covariate selection question
Jan 20, 2006 Leonid Gibiansky RE: covariate selection question
Jan 20, 2006 Anthony J. Rossini RE: covariate selection question
Jan 24, 2006 Mats Karlsson RE: covariate selection question