Re: Observed (yaxis) vs Predicted (xaxis) Diagnostic Plot - Scientific basis.
Hi Joga, Wilbert,
It is indeed an interesting aspect. I started thinking about this during my master's research (with dense time series), and I found it helpful to think in terms of orthogonal regression. Comparing the slope expressions in the Wikipedia entries at https://en.wikipedia.org/wiki/Deming_regression#Orthogonal_regression and https://en.wikipedia.org/wiki/Simple_linear_regression shows how the choice of which variable carries the error changes the fitted line. [Side note: as the authors of the paper you referenced, Wilbert, point out, the correlation coefficient r is symmetric in x and y and is not affected.] A good example of how the fit changes can be found in the last figure of this blog post < https://www.r-bloggers.com/2018/10/about-a-curious-feature-and-interpretation-of-linear-regressions/ >: basically, linear regression passes through the middle of the cloud in the y-direction, including at the edges of the cloud, while orthogonal regression balances the distances perpendicular to the fitted line.
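As a rough illustration of the comparison above, here is a minimal Python sketch (NumPy only) that computes the ordinary least-squares slope in both directions and the orthogonal slope (Deming regression with an error-variance ratio of 1) on simulated data. The data, seed, and function names are assumptions made up for this example; they do not come from the paper or the blog post.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (all error assumed in y)."""
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return sxy / np.var(x, ddof=1)

def orthogonal_slope(x, y):
    """Orthogonal slope (Deming regression with variance ratio delta = 1)."""
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Hypothetical data: true slope 1, random error only in y.
rng = np.random.default_rng(1)
x = rng.normal(10.0, 2.0, size=5000)
y = x + rng.normal(0.0, 2.0, size=5000)

slope_y_on_x = ols_slope(x, y)          # regress y on x
slope_x_on_y = 1.0 / ols_slope(y, x)    # regress x on y, expressed as dy/dx
slope_orth = orthogonal_slope(x, y)     # lies between the two OLS lines

# The correlation coefficient does not depend on which variable is called x.
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

print(slope_y_on_x, slope_orth, slope_x_on_y, r_xy, r_yx)
```

With error only in y, the y-on-x slope recovers the true slope, while the x-on-y and orthogonal fits are steeper; r is the same either way.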
But in the end it also comes down to the general convention in regression of putting the independent variable, assumed to be without error, on the x-axis and the dependent variable on the y-axis. From this it follows that the observations are best put on the y-axis (*). We therefore have two reasons to adhere to the approach of putting observed on the y-axis and predicted on the x-axis.
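The loess argument made earlier in this thread can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: predictions drawn uniformly between 1 and 10, additive normal residual error with SD 2, and binned means used as a crude stand-in for a loess line. None of these values come from the Pumas analysis linked in the original post.

```python
import numpy as np

rng = np.random.default_rng(2023)
n = 20000
pred = rng.uniform(1.0, 10.0, size=n)        # hypothetical model predictions
obs = pred + rng.normal(0.0, 2.0, size=n)    # observed = predicted + residual error

def binned_trend(x, y, edges):
    """Mean of x and y within each x-bin: a crude stand-in for a loess line.

    Points with x outside [edges[0], edges[-1]) are ignored.
    """
    idx = np.digitize(x, edges)
    bins = range(1, len(edges))
    xm = np.array([x[idx == k].mean() for k in bins])
    ym = np.array([y[idx == k].mean() for k in bins])
    return xm, ym

edges = np.linspace(1.0, 10.0, 10)           # nine bins of width 1

# Predicted on x, observed on y: the trend follows the line of identity.
px, oy = binned_trend(pred, obs, edges)
max_dev_pred_on_x = np.max(np.abs(oy - px))

# Observed on x, predicted on y: the trend is pulled toward the centre.
ox, py = binned_trend(obs, pred, edges)
low_bin_bias = py[0] - ox[0]       # positive: looks like over-prediction at low values
high_bin_bias = py[-1] - ox[-1]    # negative: looks like under-prediction at high values

print(max_dev_pred_on_x, low_bin_bias, high_bin_bias)
```

The simulated model is unbiased by construction, so the apparent bias at both ends in the second orientation comes purely from conditioning on the noisy variable.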
Hope this helps,
Jeroen
(*) Whether or not the predictions are without (residual) error is a matter of debate and depends on the situation. Going from PRED, to PRED from a model with a covariate, to post-hoc predictions, the amount of randomness in the predictions increases. The observed values will nevertheless retain the most randomness and therefore belong on the y-axis.
http://pd-value.com
[email protected]
@PD_value
+31 6 23118438
-- More value out of your data!
Quoted reply history
On 18-08-2023 08:07, Wilbert de Witte wrote:
> Hi Joga,
>
> Fully agree on this. Unfortunately it is still often shown the other way around, which is confusing at the very least. There is a publication on this very topic here < https://www.sciencedirect.com/science/article/abs/pii/S0304380008002305 > that arrives at the same conclusion and can be helpful.
>
> Best,
>
> Wilbert
>
> Op do 17 aug 2023 om 19:47 schreef Gobburu, Joga < [email protected] >:
>
> Dear James – how have you been?
>
> Yes, you said it most eloquently. It's not about plotting per se
> but “the problem is really that the loess line is fitting noise in
> the wrong direction if the observed is actually on the x-axis”.
> Thank you…J
>
> *From: *James G Wright <[email protected]>
> *Date: *Thursday, August 17, 2023 at 7:16 AM
> *To: *Gobburu, Joga <[email protected]>,
> [email protected] <[email protected]>
> *Subject: *Re: [NMusers] Observed (yaxis) vs Predicted (xaxis)
> Diagnostic Plot - Scientific basis.
>
> So whichever axis the observed data is plotted on is parallel to
> the direction of noise (random residual error). When you fit the
> loess line, I think it will generally assume noise is vertical
> i.e. parallel to the y-axis. So the problem is really that the
> loess line is fitting noise in the wrong direction if the observed
> is actually on the x-axis ... which means you are right, the
> observed needs to go on the y-axis and deviations need to be
> interpreted parallel to the y-axis.
>
> Kind regards, James
>
> https://product.popypkpd.com/
>
> PS Of course, if you were to fit a loess line with horizontal
> noise and observed data on the x-axis, you should reach identical
> conclusions to the conventional vertical noise and observed data
> on the y-axis.
>
> On 17/08/2023 11:35, Gobburu, Joga wrote:
>
> Dear Friends – Observations versus population predicted is
> considered a standard diagnostic plot in our field. I used to
> place observations on the x-axis and predictions on the y-axis.
> Then I was pointed to a publication from ISOP
>
> ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321813/figure/psp412161-fig-0001/)
> which recommended plotting predictions on the x-axis and
> observations on the y-axis. To the best of my knowledge, no
> justification was provided. It made me question my decades-old
> practice, so I did some thinking and digging. I thought I would
> share it here so others might benefit from it. If this is obvious
> to you all, then I can say I am caught up!
>
> 1. We write our models as observed = predicted + random
> error, which can be read as having the form y =
> f(x) + random error, although technically it is not of that
> form. Hence predicted goes on the x-axis, as it is free of
> random error. The plot is also considered a correlation plot,
> which would make plotting either way acceptable. This point is
> less critical than the next one.
> 2. However, there is a statistical reason why it is important
> to keep predictions on the x-axis. We almost always add
> a loess trend line to these diagnostic plots. To
> demonstrate the impact, I took a simple IV bolus single-dose
> dataset and compared both approaches. The results are
> available at this link:
>
> https://github.com/jgobburu/public_didactic/blob/main/iv_sd.html.pdf.
> I used Pumas software, but the scientific underpinning is
> agnostic to software. See the two plots on pages 5 and 6.
> The two approaches lead to different interpretations of the
> bias. This is the statistical reason why it
> matters to plot predictions on the x-axis.
>
> Joga Gobburu
>
> University of Maryland
>
> --
>
> James G Wright PhD,
>
> Scientist, Wright Dose Ltd
>
> Tel: UK (0)772 5636914