Re: Condition number

From: Al Maloney
Date: December 01, 2022
Source: mail-archive.com
…trying to resend this, as I think only Pete got it, and not NMUSERS!

Hi All,

I wanted to make two comments related to the interesting discussions around the condition number (CN), and a related point regarding Dose-Exposure-Response (D-E-R) modelling with weak data.

As a foreword, I always wish to simulate from any model I have developed. Thus I am always interested in the joint distribution of the P parameters given the data, and never just in point estimates of the P parameters. We can either sample, say, 1000 parameter sets (e.g. using Bayesian MCMC), or we can draw 1000 samples from the multivariate normal (MVN) approximation using the estimated parameters and the approximate variance-covariance matrix (in a maximum likelihood estimation (MLE) world). The latter is weak for many reasons. For example, it is not parameterisation invariant: if you model ED50 as ED50 or as log(ED50), you will get a different MVN approximation (likelihood profiling may help here, but not nearly enough!). In addition, we are just "hoping" the full joint distribution is MVN, based on crude curvature measures at the peak of the likelihood function. These curvature measures can imply zero correlation between two parameters that are truly dependent (e.g. a U-shaped relationship).

Thus I would encourage all modellers to think of a "model" as these 1000 samples from the P-dimensional joint distribution of the parameters (e.g. I put 0.1% of my belief on each of the 1000 parameter sets). The crude summaries of the marginal distributions of each parameter that are typically reported (mean/median, SE, 95% CrI/CI, etc.) are univariate, but our model is always multivariate; hence any published model is only useful if the authors (minimally) provide the correlation matrix in addition to the point estimates and SEs.
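To make the parameterisation point concrete, here is a minimal Python/NumPy sketch for a single parameter. The ED50 estimate and SE are invented purely for illustration, and the log-scale SE uses the usual delta-method approximation; nothing here comes from an actual fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical MLE results for one parameter, ED50 (values invented
# purely for illustration).
ed50_hat, se = 10.0, 4.0

# Normal approximation on the ED50 scale...
ed50_draws = rng.normal(ed50_hat, se, 1000)

# ...versus on the log(ED50) scale, with a delta-method SE:
# SE[log(x)] ~= SE[x] / x, evaluated at the estimate.
se_log = se / ed50_hat
log_draws = np.exp(rng.normal(np.log(ed50_hat), se_log, 1000))

# Same point estimate, two different implied uncertainty distributions:
# the natural-scale draws can go negative, while the log-scale draws are
# strictly positive and right-skewed.
print("natural scale: min =", round(ed50_draws.min(), 2))
print("log scale:     min =", round(log_draws.min(), 2))
```

The two sets of 1000 draws are both "the MVN approximation" of the same fitted model, yet they imply visibly different simulation behaviour, which is exactly the non-invariance complained about above.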
In a Bayesian MCMC world you can make the P*P coplot to visualise these correlations. For example, this could show a U-shaped relationship between two parameters, whilst the equivalent MVN plot may (incorrectly) suggest the two parameters are uncorrelated; in this case the MVN approximation is not working well, and hence could yield simulations that are not realistic (i.e. is the model poor, or the MVN approximation poor?). Once you realise you really do want to draw random samples of the parameters from their joint distribution given a model and data, you will see why Bayesian MCMC (via Hamiltonian Monte Carlo (HMC)) is the way to go. After fitting a number of complex NLME models in both MLE and Bayesian HMC frameworks over the last 10-15 years, I consider MLE+MVN plain ugly (fast, yes, but "best", no).

On the condition number (CN) topic

From a modelling perspective, the CN only makes sense for the correlation matrix (not the covariance matrix or the Hessian; both are scale dependent!), and I understand this is what NM correctly reports. As Ken mentioned, you should always try to understand high correlations, and a heat map of the P*P correlations is a good place to start. This may suggest a group of parameters that can be reparameterised to reduce their correlations (e.g. modelling 5 Emax parameters as offsets from one Emax parameter, rather than as 6 unique Emax parameters). The CN will increase with any increase in the number of parameters (since no additional parameter will be perfectly orthogonal to all the others), hence any "rules of thumb" are of limited value. Lower correlations between parameters are desirable. From a numerical perspective, correlations of 0.999 and 0.998 may be (numerically) as similar as correlations of 0.002 and 0.001, but the former will have a much greater influence on your subsequent simulations. High correlations may also reflect poor trial design; consider using a better trial design (see next comment!).
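The scale-dependence point can be checked in a few lines. This sketch uses an invented 3x3 covariance matrix in which the third parameter merely lives on a larger scale: the covariance-matrix CN is huge for unit reasons alone, while the correlation-matrix CN reflects only the genuine correlation and is unchanged by rescaling.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix; the third parameter is on a much
# larger scale. All values are invented for illustration.
cov = np.array([[0.04,  0.018,   0.0],
                [0.018, 0.09,    0.0],
                [0.0,   0.0,   400.0]])

sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

def cond(m):
    """Condition number of a symmetric matrix via its eigenvalues."""
    ev = np.linalg.eigvalsh(m)
    return ev.max() / ev.min()

# cond(cov) is dominated by the units of the third parameter;
# cond(corr) reflects only the 0.3 correlation between the first two.
print("cov  CN:", round(cond(cov)))
print("corr CN:", round(cond(corr), 2))

# Rescale the third parameter (a pure change of units): cond(cov) changes,
# but the correlation matrix, and hence its CN, does not.
scale = np.diag([1.0, 1.0, 0.01])
cov2 = scale @ cov @ scale
sd2 = np.sqrt(np.diag(cov2))
corr2 = cov2 / np.outer(sd2, sd2)
print("rescaled cov CN:", round(cond(cov2), 2))
```

This is why a CN quoted for a covariance matrix or Hessian is not comparable across parameterisations, whereas the correlation-matrix CN is.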
Lower correlations also help most estimation algorithms run quicker, so some effort to better parameterise your model is often worthwhile (although Radford Neal showed that HMC still works very well with a CN of 10000 and 100 parameters: https://youtu.be/ZGtezhDaSpM?t=1659).

On fitting Dose-Exposure-Response (D-E-R) models to weak data, and why a linear D-E-R model is never sound

Firstly, all "dose-ranging" and "dose-finding" trials state their objective is to "determine the dose (exposure) response relationship". However, most have awful designs, with far too few dose levels that are far too closely spaced, and small N. Any simple simulation and re-estimation exercise would show just how awful they are; they are invariably incapable of accurately and precisely estimating the D-E-R relationship (for efficacy or safety). Pharma must do so much better here! Thus the first goal for any analyst is to ensure that any trial design you will be asked to analyse is fit for D-E-R modelling BEFORE you get any data. To avoid the following "car crash" scenario, you need to use a suitable model (like the sigmoidal Emax model), determine the range of credible parameter values, and then use simulation/re-estimation to show your company how well (or poorly) different candidate designs will quantify the true D-E-R relationship. You will find that designs like Placebo, 1 mg, 2 mg and 4 mg are really poor; you typically need much wider dose ranges, with wider dose spacing, and large N. Optimal/adaptive D-E-R trials are best.

The "car crash" scenario, which I, and many others, will have seen, is being presented with "weak" data from a poorly designed D-E-R trial. The analyst is given, for example, data from the Placebo, 1 mg, 2 mg and 4 mg arms of a (too small) trial and asked to perform the D-E-R modelling. The doses are far too closely spaced, and the observed data may even appear "linear".
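A quick sketch of why such a design can look "linear": take a hypothetical sigmoidal Emax curve (all parameter values invented, with a true ED50 of 20 mg sitting well above the 4 mg top dose) and compare how well a straight line fits the noise-free curve over a narrow versus a wide dose range.

```python
import numpy as np

# Hypothetical sigmoidal Emax model; all parameter values are invented.
def emax_curve(dose, e0=0.0, emax=100.0, ed50=20.0, hill=1.0):
    return e0 + emax * dose**hill / (ed50**hill + dose**hill)

def linear_r2(doses):
    """R^2 of a straight-line fit to the noise-free Emax curve."""
    y = emax_curve(doses)
    slope, intercept = np.polyfit(doses, y, 1)
    resid = y - (slope * doses + intercept)
    return 1.0 - resid.var() / y.var()

narrow = np.array([0.0, 1.0, 2.0, 4.0])         # the "car crash" design
wide = np.array([0.0, 2.0, 10.0, 50.0, 200.0])  # ~100-fold dose range

print("narrow design R^2:", round(linear_r2(narrow), 3))  # essentially 1
print("wide design R^2:  ", round(linear_r2(wide), 3))    # clearly non-linear
```

Over Placebo to 4 mg the true curve is indistinguishable from a straight line even with zero residual noise, so with real (noisy, small-N) data the design simply cannot tell you whether the relationship is linear.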
Some choices are:

1) Option 1: Refer to your simulation/re-estimation work; you told them it would be a "car crash".

2) Option 2 (weak): Fit moderate priors for Emax and Hill, and get 1000 D-E-R curves. Summarise, but warn that there is a lot of "guessing" here. Refer senior management back to Option 1, and politely ask that they listen to you next time :-)

3) Option 3 (wrong): Fit a linear D-E-R model. Do not do this! Think about the 1000 D-E-R curves you will show: 1000 linear relationships. As an expert in clinical pharmacology, how can you present such nonsense!? You KNOW D-E-R relationships are non-linear!

Option 2 will produce a wide "band" of D-E-R curves, with the 1000 curves more honestly reflecting the "ignorance" that such a weak design and dataset yield. Comparisons across doses will be (rightly) imprecise. In contrast, Option 3 will yield a "bow-tie" shaped uncertainty band, with high precision around the middle of the dose range and wider uncertainty towards the extremes. This precision in the middle of the dose range is a "fool's fallacy", since it is conditional on the true D-E-R being 100% linear. But you know that is wrong. Appearing linear and being linear are totally different things. If I gave you placebo (dose = 0) and dose = 100 mg, would you equally enjoy the narrow precision at 50 mg? I hope not! Alas, great D-E-R modelling often needs very wide dose ranges (think 10-100 fold) and large N if you truly wish to precisely estimate D-E-R relationships. Linear D-E-R models are plain wrong.

I am quite unfamiliar with NM, but Bob Bauer told me a few years ago that a full Bayesian HMC analysis is supported, so using moderate priors for Emax and Hill should be possible (as in Option 2). In STAN, Option 2 works well in those cases where you cannot influence the trial designs (e.g. in MBMA work, where you integrate D-R models across many drugs/classes, often with limited dose ranges for each drug), e.g.
like here: https://ascpt.onlinelibrary.wiley.com/doi/10.1002/cpt.1307

I hope this is clear. Feel free to reach out to me personally if you would like to follow up "quietly" on anything in the above.

Best wishes,
Al

Al Maloney PhD
Consultant Pharmacometrician