RE: Outliers and the FDA guideline

From: "Hutmacher, Matt" Matt.Hutmacher@pfizer.com
Subject: RE: [NMusers] Outliers and the FDA guideline
Date: Wed, August 18, 2004 11:43 am
Source: cognigencorp.com

Outliers are a difficult subject. I think if you asked 10 different modelers you would get 10 different answers on how to handle them. I would suggest a systematic approach to data elimination in general. A systematic approach is the analyst's best surrogate for objectivity, since only the reviewer/audience can ultimately determine the level of objectivity.

For an analysis which will be submitted to a regulatory authority, I would advocate specifying the criteria for classifying data as outliers a priori (before unblinding the data) in a population modeling analysis plan. This document should also specify how the analyst will determine whether the outlier is influential and how he/she will proceed if it is. This systematic, pre-specified approach will mitigate the subjectivity induced by eliminating data a posteriori.

In general, my opinion is that it is best to include all the data whenever possible. If there are a number of outliers, one might try using a mixture of epsilons (and hence variances) to down-weight these observations and reduce their influence. Sometimes the handling of outliers will depend on the goal of the analysis, and the outliers may not fulfill pre-specified criteria such as |residual| >= 3 or 4.

For example, we did a population PK (PPK) analysis on some sparse data. Because the estimated CV of residual variation was >80%, no data appeared as outliers by the usual residual criteria. When you looked at the data (2 samples, 1 hour apart, for each individual at each visit), some visits appeared to have concentrations that were ascending with time (as if absorption were occurring). However, these pseudo-absorption phases were occurring much too late in the dosing interval; these were "highly improbable" observations (as below), given that the drug had very predictable absorption in every other study.
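[Editorial sketch, not part of the original post.] The pre-specified residual criterion mentioned above can be made concrete with a few lines of code; the cutoff of 3 and the weighted-residual values below are invented for illustration:

```python
# Hypothetical sketch: flag observations whose absolute standardized
# (weighted) residual meets or exceeds a pre-specified cutoff, e.g. 3.
# In a real analysis plan the cutoff would be fixed before unblinding.
def flag_outliers(weighted_residuals, cutoff=3.0):
    """Return indices of observations meeting the a priori outlier criterion."""
    return [i for i, wres in enumerate(weighted_residuals) if abs(wres) >= cutoff]

# Example with made-up weighted residuals:
wres = [0.4, -1.2, 3.5, 0.8, -4.1, 2.9]
print(flag_outliers(wres))  # -> [2, 4]
```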
We figured these results were due to incorrect recollection/recording of the last administered dose. Thus, the large CV estimate arose from the model predicting elimination while the data were exhibiting this pseudo-absorption.

Ultimately, the purpose of the PPK analysis was to test for influential covariates. The large %CV would reduce the power to detect these covariates, so (in my opinion) it was of interest to eliminate these data points (since any attempt we made to include them failed) in order to better perform the exploratory covariate analysis. To mitigate the subjectivity induced by selecting the points by visual inspection (again, 10 analysts might end up with 10 different data sets), we used a mixture model on Tlag. Three mixtures were discovered: the "typical", "unrealistic 1", and "unrealistic 2" absorbers. The model classified each visit for each patient into one of these three categories. We plotted the data by the three mixture classifications, and it was easy to see that these data had different, unlikely characteristics. These data were deleted, and the CV was reduced to ~30%. The reviewer/audience could disagree with the procedure, but if he/she thought it was reasonable, then there would be no argument over classifying which data should be eliminated.

Matt
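[Editorial sketch, not part of the original post.] The classification step described above — assigning each visit to the "typical", "unrealistic 1", or "unrealistic 2" absorber component — can be illustrated with a simplified stand-in. In the actual analysis the mixture would be estimated within the population model (e.g. a NONMEM $MIXTURE record on Tlag); here the component weights, means, and SDs, and the Tlag values, are all invented, and each visit is simply assigned to the component with the highest posterior probability:

```python
import math

# Hypothetical three-component Gaussian mixture on Tlag (hours).
# All parameter values are made up for illustration; a real analysis
# would estimate them as part of the population model.
COMPONENTS = {
    "typical":       {"weight": 0.80, "mean": 0.5, "sd": 0.2},
    "unrealistic 1": {"weight": 0.15, "mean": 4.0, "sd": 1.0},
    "unrealistic 2": {"weight": 0.05, "mean": 8.0, "sd": 1.5},
}

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def classify(tlag):
    """Assign a Tlag estimate to the mixture component with the largest
    posterior weight (weight * likelihood)."""
    posteriors = {name: p["weight"] * normal_pdf(tlag, p["mean"], p["sd"])
                  for name, p in COMPONENTS.items()}
    return max(posteriors, key=posteriors.get)

# Invented Tlag estimates for a few visits:
for tlag in [0.4, 0.6, 3.8, 7.5]:
    print(tlag, "->", classify(tlag))
# 0.4 and 0.6 fall in "typical"; 3.8 in "unrealistic 1"; 7.5 in "unrealistic 2"
```

Plotting the data grouped by these classifications, as the post describes, is what made the unlikely visits easy to see and gave a defensible, reproducible rule for which data to delete.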
Aug 18, 2004 Thomas Klitgaard Outliers and the FDA guideline
Aug 18, 2004 Robert L. James RE: Outliers and the FDA guideline
Aug 18, 2004 Matt Hutmacher RE: Outliers and the FDA guideline
Aug 23, 2004 Mats Karlsson RE: Outliers and the FDA guideline
Aug 23, 2004 Matt Hutmacher RE: Outliers and the FDA guideline