Re: Condition number
…trying to resend this, as I think only Pete got it, and not NMUSERS!
Hi All,
I wanted to make two comments related to the interesting recent discussions: one
on the condition number (CN), and a related one on Dose-Exposure-Response
(D-E-R) modelling with weak data.
As a foreword, I always wish to simulate from any model I have developed. Thus
I am always interested in the joint distribution of the P parameters given the
data, and never just some point estimates of the P parameters. We can either
sample, say, 1000 parameter sets (e.g. using Bayesian MCMC), or we can draw
1000 samples from the multivariate normal (MVN) approximation using the
estimated parameters and the approximate variance-covariance matrix (in a
maximum likelihood estimation (MLE) world).

The latter is weak for many reasons. For example, it is not parameterisation
invariant: if you model ED50 as ED50 or as log(ED50) you will get a different
MVN approximation (likelihood profiling may help here, but not nearly enough!).
In addition, we are just "hoping" the full joint distribution is MVN, based on
crude curvature measures at the peak of the likelihood function. These
curvature measures can imply zero correlation between two parameters that are
truly dependent (e.g. a U-shaped relationship).

Thus I would encourage all modellers to think of a "model" as these 1000
samples from the P-dimensional joint distribution of the parameters (e.g. I put
0.1% of my belief on each of the 1000 parameter sets). The crude summaries of
the marginal distributions of each parameter that are typically reported
(mean/median, SE, 95% CrI/CI, etc.) are univariate, but our model is always
multivariate; hence any published model is only useful if the authors
(minimally) provide the correlation matrix in addition to the point estimates
and SEs.

In a Bayesian MCMC world you can make the P*P coplot to visualise these
correlations. For example, this could show a U-shaped relationship between two
parameters, whilst the equivalent MVN plot may (incorrectly) suggest the two
parameters are uncorrelated; in that case the MVN approximation is not working
well, and hence could yield simulations that are not realistic (is the model
poor, or the MVN approximation poor?). Once you realise you really do want to
draw random samples of the parameters from their joint distribution given a
model and data, you will see why Bayesian MCMC (via Hamiltonian MC (HMC)) is
the way to go. After fitting a number of complex NLME models in both MLE and
Bayesian HMC frameworks over the last 10-15 years, I consider MLE+MVN plain
ugly (fast yes, but "best" no).
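To make the parameterisation point concrete, here is a minimal numpy sketch
(every number is hypothetical, chosen only for illustration): draw 1000
parameter sets from the MVN approximation of an (Emax, ED50) fit, first with
ED50 on the natural scale, then with log(ED50) (covariance rescaled by the
delta method). The two "equivalent" fits imply different ED50 distributions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical MLE results for an Emax model: theta = (Emax, ED50),
# with an approximate variance-covariance matrix from the fit.
theta_hat = np.array([100.0, 25.0])            # Emax, ED50
cov = np.array([[64.0, 12.0],
                [12.0, 36.0]])                 # assumed, for illustration

# 1000 parameter sets from the MVN approximation on the natural scale.
samples_nat = rng.multivariate_normal(theta_hat, cov, size=1000)

# The same fit parameterised with log(ED50): rescale the covariance by
# the delta method, sample, and back-transform. The implied ED50
# distribution is now right-skewed -- the approximation is not invariant.
theta_log = np.array([100.0, np.log(25.0)])
cov_log = np.array([[64.0, 12.0 / 25.0],
                    [12.0 / 25.0, 36.0 / 25.0**2]])
samples_log = rng.multivariate_normal(theta_log, cov_log, size=1000)
ed50_back = np.exp(samples_log[:, 1])

# Symmetric on the natural scale, right-skewed after back-transforming:
q10, q50, q90 = np.percentile(ed50_back, [10, 50, 90])
print(q50 - q10, q90 - q50)
```

Neither shape is "the" posterior; a full Bayesian fit sidesteps the choice.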
On the condition number (CN) topic
From a modelling perspective, the CN only makes sense for the correlation
matrix (not the covariance matrix or the Hessian, both of which are scale
dependent!), and I understand this is what NM correctly reports. As Ken
mentioned, you should always try to understand high correlations, and a heat
map showing the P*P correlations is a good place to start. This may suggest a
group of parameters that can be reparameterised to reduce their correlations
(e.g. modelling 5 Emax parameters as offsets to one Emax parameter, rather
than as 6 unique Emax parameters).
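As a small numerical illustration (the covariance matrix below is entirely
hypothetical), the CN of the correlation matrix is the ratio of its largest to
smallest eigenvalue, and unlike a covariance-based CN it does not change when
you rescale a parameter's units:

```python
import numpy as np

# Hypothetical covariance matrix for three parameters (units arbitrary).
cov = np.array([[4.0, 3.8, 0.2],
                [3.8, 4.0, 0.1],
                [0.2, 0.1, 1.0]])

# Scale out the units to get the correlation matrix...
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

# ...and take the ratio of its largest to smallest eigenvalue.
eig = np.linalg.eigvalsh(corr)
cn = eig.max() / eig.min()
print(round(cn, 1))

# Rescaling a parameter (e.g. changing its reporting units by 60x)
# changes the covariance-based CN but not the correlation-based CN.
cov_rescaled = np.diag([60.0, 1.0, 1.0]) @ cov @ np.diag([60.0, 1.0, 1.0])
sd2 = np.sqrt(np.diag(cov_rescaled))
same = np.allclose(cov_rescaled / np.outer(sd2, sd2), corr)
print(same)
```

Here the 0.95 correlation in the first 2x2 block drives the large CN.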
The CN will increase with any increase in the number of parameters (since no
additional parameter will be perfectly orthogonal to all other parameters).
Hence any “rules of thumb” are limited.
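One quick (simulated, purely illustrative) way to see that growth: build random
correlation matrices of increasing dimension P and watch the CN climb even
though no single pairwise correlation is engineered to be extreme.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_corr_cn(p):
    """CN of the correlation matrix of a random p-parameter 'fit'."""
    a = rng.normal(size=(p, p + 5))    # random latent-factor loadings
    s = a @ a.T                        # a random PSD "covariance"
    d = np.sqrt(np.diag(s))
    eig = np.linalg.eigvalsh(s / np.outer(d, d))
    return eig.max() / eig.min()

cns = {p: round(random_corr_cn(p), 1) for p in (2, 5, 20, 50)}
print(cns)   # CN typically climbs steeply with p
```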
Lower correlations between parameters are desirable. From a numerical
perspective, correlations of 0.999 and 0.998 may be (numerically) as similar
as correlations of 0.002 and 0.001, but the former pair will have a far
greater influence on your subsequent simulations. High correlations may be
reflective of poor trial design; consider using a better trial design (see
next comment!). Lower correlations also help most estimation algorithms run
quicker, so some effort to better parameterise your model is often worthwhile
(although Radford Neal showed that HMC still works very well with a CN of
10000 on 100 parameters: https://youtu.be/ZGtezhDaSpM?t=1659).
On fitting Dose-Exposure-Response (D-E-R) models to weak data, and why a linear
D-E-R model is never sound
Firstly, all "dose-ranging" and "dose-finding" trials state that their
objective is to "determine the dose (exposure) response relationship". However,
most have awful designs, with far too few dose levels that are far too closely
spaced, and small N. Any simple simulation and re-estimation exercise would
show just how awful they are: they are invariably incapable of accurately and
precisely estimating the D-E-R relationship (for efficacy or safety). Pharma
must do so much better here!
Thus the first goal for any analyst is to ensure that any trial design you
will be asked to analyse is fit for D-E-R modelling BEFORE you get any data.
To avoid the following "car crash" scenario, you need to use a suitable model
(like the sigmoidal Emax model), determine the range of credible parameter
values, and then use simulation/re-estimation to show your company how well or
poorly different candidate designs will quantify the true D-E-R relationship.
You will find that designs like placebo, 1 mg, 2 mg and 4 mg are really poor;
you typically need much wider dose ranges, with wider dose spacing, and large
N. Optimal/adaptive D-E-R trials are best.
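A bare-bones version of that simulation/re-estimation exercise might look like
the numpy sketch below (true parameters, residual SD, arm sizes, and the two
candidate designs are all hypothetical; for simplicity ED50 is profiled over a
grid with the linear parameters solved by least squares, rather than fitted in
a full NLME tool):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical truth and trial setup, purely for illustration.
E0, EMAX, ED50, SIGMA, N_PER_ARM = 10.0, 50.0, 20.0, 8.0, 30
GRID = np.geomspace(0.1, 500.0, 100)   # candidate ED50 values to profile

def emax_mean(d, e0, em, ed50):
    return e0 + em * d / (ed50 + d)

def fit_ed50(d, y):
    """Profile ED50 on a grid; (E0, Emax) solved by linear least squares."""
    best_rss, best_g = np.inf, GRID[0]
    for g in GRID:
        x = np.column_stack([np.ones_like(d), d / (g + d)])
        beta, *_ = np.linalg.lstsq(x, y, rcond=None)
        rss = float(np.sum((y - x @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_g = rss, g
    return best_g

def ed50_iqr(doses, n_trials=100):
    """Simulate trials under a design, refit, return the ED50 IQR."""
    est = []
    for _ in range(n_trials):
        d = np.repeat(doses, N_PER_ARM)
        y = emax_mean(d, E0, EMAX, ED50) + rng.normal(0.0, SIGMA, d.size)
        est.append(fit_ed50(d, y))
    q25, q75 = np.percentile(est, [25, 75])
    return q75 - q25

iqr_narrow = ed50_iqr(np.array([0.0, 1.0, 2.0, 4.0]))
iqr_wide = ed50_iqr(np.array([0.0, 2.0, 10.0, 50.0, 250.0]))
print(iqr_narrow, iqr_wide)   # the narrow design is far less precise
```

With doses only up to 4 mg against a true ED50 of 20 mg, the narrow design
cannot pin ED50 down at all; the wide design does dramatically better.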
The "car crash" scenario which I, and many others, will have seen is being
presented with "weak" data from a poorly designed D-E-R trial. The analyst is
given data, for example, from the placebo, 1 mg, 2 mg and 4 mg arms of a (too
small) trial and asked to perform the D-E-R modelling. The doses are far too
closely spaced, and the observed data may even appear "linear". Your options
are:
1) Option 1: Refer to your simulation/re-estimation work - you told them it
would be a "car crash".
2) Option 2 (weak): Fit moderate priors for Emax and Hill, and get 1000 D-E-R
curves. Summarise, but warn that there is a lot of "guessing" here. Refer
senior management back to 1), and politely ask that they listen to you next
time :-)
3) Option 3 (wrong): Fit a linear D-E-R model. Do not do this! Think about the
1000 D-E-R curves you will show: 1000 linear relationships. As an expert in
clinical pharmacology, how can you present such nonsense!? You KNOW D-E-R
relationships are non-linear!
Option 2 will produce a wide "band" of D-E-R curves, with the 1000 curves more
honestly reflecting the "ignorance" that such a weak design and dataset yield.
Comparisons across doses will be (rightly) imprecise.
In contrast, Option 3 will yield a "bow-tie" shaped uncertainty band: high
precision around the middle of the dose range, widening towards the extremes.
That precision in the middle of the dose range is a fool's fallacy, since it
is conditional on the true D-E-R being 100% linear. But you know that is
wrong: appearing linear and being linear are totally different things. If I
gave you placebo (dose = 0) and dose = 100 mg, would you equally enjoy the
narrow precision at 50 mg?? I hope not! Alas, great D-E-R modelling often
needs very wide dose ranges (think 10-100 fold) and large N if you truly wish
to estimate D-E-R relationships precisely. Linear D-E-R models are plain
wrong.
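The bow-tie is easy to reproduce (again, all numbers hypothetical): simulate
truly Emax-shaped data from a two-arm placebo/100 mg design, fit a straight
line, draw 1000 (intercept, slope) sets from the MVN approximation of the OLS
fit, and look at the band width across the dose range.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical two-arm "design": placebo and 100 mg, 40 subjects each,
# with a truly Emax-shaped dose-response (ED50 = 20 mg).
doses = np.repeat([0.0, 100.0], 40)
y = 50.0 * doses / (20.0 + doses) + rng.normal(0.0, 8.0, doses.size)

# Fit a straight line, then draw 1000 (intercept, slope) sets from the
# MVN approximation of the OLS estimates.
X = np.column_stack([np.ones_like(doses), doses])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
s2 = resid @ resid / (doses.size - 2)
cov = s2 * np.linalg.inv(X.T @ X)
draws = rng.multivariate_normal(beta, cov, size=1000)

# 95% band width of the 1000 fitted lines across the dose range.
grid = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
lines = draws @ np.vstack([np.ones_like(grid), grid])
width = np.ptp(np.percentile(lines, [2.5, 97.5], axis=0), axis=0)
print(dict(zip(grid, np.round(width, 2))))
# The band is narrowest near the middle (50 mg): the "bow-tie". That
# precision is conditional on the linear model being exactly right.
```

The narrow waist at 50 mg sits exactly where no subject was ever dosed, which
is the point of the placebo/100 mg thought experiment above.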
I am quite unfamiliar with NM, but Bob Bauer told me a few years ago that a
full Bayesian HMC analysis is supported, so using moderate priors for Emax and
Hill should be possible (as in Option 2). In Stan, Option 2 works well in
those cases where you cannot influence the trial designs (e.g. in MBMA work,
where you integrate D-R models across many drugs/classes, often with limited
dose ranges for each drug, e.g.
https://ascpt.onlinelibrary.wiley.com/doi/10.1002/cpt.1307).
I hope this is clear. Feel free to reach out to me personally if you would like
to follow up “quietly” on anything in the above.
Best wishes,
Al
Al Maloney PhD
Consultant Pharmacometrician