Hello everyone,
I have a curious problem with slow PRED calculations in tables. The
estimations are reasonably fast: 471 seconds for 23 iterations. If there is
no PRED in any tables, then NONMEM finishes a moment after the message about
elapsed estimation time. But if PRED is in a table, then it takes an
extremely long time to return; I waited more than 10 hours before I broke
off the run.
Since convergence of the algorithm is reasonably fast, I get the impression
that the differential equations are not problematic (it is $DES solved with
ADVAN13 TOL=9). In fact, if I restart the run with all $OMEGA set to
(0 FIXED) - to force all ETAs to zero - and do a MAX=0 run, then it is
quite fast.
I would expect that calculating PRED would simply be a matter of setting
ETA=0 and solving the differential equations just once. Am I missing
anything in how PRED in a table is calculated?
Has anyone seen this kind of thing before? Any advice or tips?
Warm regards,
Douglas Eleveld
________________________________
Slow PRED in tables?
8 messages
5 people
Latest: Feb 04, 2014
Douglas:
Thank you for sharing your code and data with me. When PRED, WRES, or RES
are requested, NONMEM calls the weighted residual routine PRRES, which
calculates PRED, WRES, and RES together.
Your problem has a large number of data records per subject (up to 1544),
and evaluating WRES requires some matrix algebra on up to a 1544x1544
matrix, which can take time to compute for each subject. The standard
method of calculating WRES in NONMEM 7.3 and lower versions is inefficient;
I have made it more efficient for the future version, NONMEM 7.4, with
which your $TABLE step will calculate in 15 minutes. Meanwhile, in the
recently released NONMEM 7.3, you can use the WRESCHOL option, with which
the $TABLE step was calculated in 3 minutes.
Robert J. Bauer, Ph.D.
Vice President, Pharmacometrics, R&D
ICON Development Solutions
7740 Milestone Parkway
Suite 150
Hanover, MD 21076
Tel: (215) 616-6428
Mob: (925) 286-0769
Email: [email protected]<mailto:[email protected]>
Web: http://www.iconplc.com/
Quoted reply history
From: [email protected] [mailto:[email protected]] On
Behalf Of Eleveld, DJ
Sent: Tuesday, January 28, 2014 9:19 AM
To: [email protected]
Subject: [NMusers] Slow PRED in tables?
Hello everyone,
I have a curious problem with slow PRED calculations in tables. The estimations
are reasonably fast, 471 seconds for 23 iterations. If there is no PRED in any
tables then NONMEM finishes a moment after the message about elapsed estimation
time. But if is PRED in a table then it takes an extremely long time to return,
I waited more than 10 hours before I broke off the run.
If convergence of the algorithm is reasonably fast I get the impression that
the differential equations are not problematic (it is $DES solved with ADVAN13
TOL=9). In fact if I restart the run with all $OMEGA to (0 FIXED) - to force
all ETAs to zero - and do a MAX=0 run then it is quite fast.
I would expect calculating PRED would be simply setting ETA=0 and solving the
differential equations just once. Am I missing anything in how PRED in a table
is calculated?
Has anyone seen this kind of thing before? Any advice or tips?
Warm regards,
Douglas Eleveld
________________________________
Doug and Bob,
This is a point that I have long been interested in. For each individual,
the original WRES in NONMEM is based on a Schur decomposition of the
estimated covariance matrix of the residual vector (I have verified this
numerically using MATLAB). Thus if the individual covariance matrix of the
residuals is C, and the residual vector is r, WRES is defined as
C^(-1/2)*r, where C = U*diag(lambda)*U', the columns of U are the
eigenvectors of C, and lambda is the vector of eigenvalues.
The main point of the computation is that the covariance of C^(-1/2)*r is a
unit matrix, where C^(-1/2) = U*diag(lambda)^(-1/2)*U' has the rough
interpretation of being a square root of C^(-1).
Further diagnostic analysis of the components of WRES treats them as if
they were independent N(0,1) random variables, since they are indeed
uncorrelated and normalized to have unit variance (although the normality
assumption is doubtful for highly nonlinear models).
Thus the computation of WRES in NM7.2 requires the computation of all
eigenvalues and eigenvectors of the C matrix, which for large matrices can
start to be expensive (although the best LAPACK implementation of a
symmetric positive definite eigendecomposition uses Jim Demmel's O(N^1/3)
routine and may be considerably faster than what you are currently seeing
in NM7.2 - I have no way of knowing what NM7.2 actually uses).
But in any event, all the WRES computation is doing is de-correlating and
normalizing the residuals. Any matrix W such that W*r has a covariance
matrix equal to the unit matrix will do. In particular, choosing W = the
lower triangular Cholesky decomposition of C^(-1) works just fine and is
much faster to compute than a Schur eigen-decomposition. Based on the name,
I assume that WRESCHOL is just WRES implemented with such a Cholesky
decomposition.
Which is better - WRES or WRESCHOL? This is a difficult question without
further information. It does point out that the details of the WRES
results, no matter how they are computed, should be taken with a large
grain of salt (WRESCHOL will look numerically totally different from WRES,
yet both are equally valid de-correlations/normalizations).
Electrical engineers have long been aware of this
de-correlation/normalization ambiguity. Basically, if C^(-1/2) is any
matrix that decorrelates and normalizes the residuals, then so is
U*C^(-1/2), where U is any orthogonal matrix (U'*U=I). Thus there are an
infinite number of choices for computing a reasonable WRES for further
diagnostic analysis. In the electrical engineering community, any matrix of
the form U*C^(-1/2) is called a whitening matrix, since it transforms
correlated data into 'whitened', decorrelated data with unit variances.
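As a small illustration of this ambiguity, here is a NumPy sketch (the
covariance matrix is a made-up AR(1)-style example, not taken from any
NONMEM run) showing that both the symmetric eigendecomposition square root
and the inverse Cholesky factor whiten the residuals, while producing
numerically different whitened vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residual covariance C for one subject (not from any NONMEM
# run): an AR(1)-style correlation structure over n observations.
n = 5
rho = 0.6
C = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Whitening 1: symmetric inverse square root via eigendecomposition
# (the WRES-style C^(-1/2) = U * diag(lambda)^(-1/2) * U').
lam, U = np.linalg.eigh(C)
W_eig = U @ np.diag(lam ** -0.5) @ U.T

# Whitening 2: inverse of the lower-triangular Cholesky factor
# (WRESCHOL-style): if C = L L', then W = L^(-1) also gives W C W' = I.
L = np.linalg.cholesky(C)
W_chol = np.linalg.inv(L)

# Both matrices whiten: the covariance of W*r becomes the unit matrix.
assert np.allclose(W_eig @ C @ W_eig.T, np.eye(n))
assert np.allclose(W_chol @ C @ W_chol.T, np.eye(n))

# Yet for the same residual vector r the two whitened vectors differ.
r = rng.multivariate_normal(np.zeros(n), C)
print("eigen:   ", np.round(W_eig @ r, 3))
print("cholesky:", np.round(W_chol @ r, 3))
```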
In order to decide which decorrelation is best, one must impose an
optimization criterion to pick the particular best whitening matrix. This
is not so easy to do, but it has given rise to a large literature under the
general name of 'projection pursuit' methods.
I have long suspected there may be something here worth looking into with
respect to population PK/PD model evaluation techniques, but this is still
an open question.
Quoted reply history
Hi Robert,
Thank you very much for your feedback! Indeed, I only saw this phenomenon with
large numbers of observations per individual, so that makes sense.
However, to get back to the original problem: I did not request that WRES
be calculated, only PRED. This is OK if the algorithm for WRES can be made
sufficiently fast.
The end result is kind of odd, though. If I want PRED as fast as possible,
I have to use an option (WRESCHOL) for something else (WRES), which then
gets deleted anyway.
I would think it would make more sense to only calculate WRES when it is
needed, but I can imagine there could be all sorts of practical
considerations with legacy code.
Warm regards,
Douglas Eleveld
________________________________
Quoted reply history
________________________________
Bob:
That is correct, the description for WRESCHOL is listed in nm730.pdf as
follows:
WRESCHOL (NM73)
Normally, population and individual weighted residuals are evaluated by square
root of the eigenvalues of the population or individual residual variance.
However, an alternative method is to Cholesky decompose the residual variance
(suggested by France Mentre, personal communication), by entering the WRESCHOL
option. This should be specified only on the first $TABLE command. The
Cholesky form has the property of sequentially decorrelating each additional
data point in the order of the data set.
The standard method used in NONMEM for WRES is singular value decomposition
(SVD), which for symmetric matrices is equivalent to obtaining the
eigenvalue and eigenvector matrices. Subsequently, the square roots of the
eigenvalues are taken, and the "square root of the matrix" is constructed
from them and the original eigenvector matrix. The SVD method can be
several fold slower than the Cholesky decomposition method. However, there
was also some inefficient code that constructed the square root matrix
after the SVD step, which resulted in Douglas's problem running several
hundred times longer than necessary, due to the very large number of data
per subject in his problem. The code has been made more efficient and will
be available in NONMEM 7.4. Meanwhile, the WRESCHOL option has efficient
code throughout and is available to use in NM73 as an alternative method;
as Bob said, these WRES values will not map one-to-one to the SVD method,
but they should have the usual statistical properties of decorrelated
samples.
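The sequential decorrelation property of the Cholesky form mentioned above
can be sketched in a few lines of NumPy (the compound-symmetry covariance
matrix below is hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical covariance (not Douglas's matrix): compound symmetry with
# variance 1 and correlation 0.5 over n = 4 observations.
n = 4
C = 0.5 * np.ones((n, n)) + 0.5 * np.eye(n)

# WRESCHOL-style whitening: W is the inverse of the lower Cholesky factor.
L = np.linalg.cholesky(C)   # C = L @ L.T
W = np.linalg.inv(L)        # lower triangular, with W @ C @ W.T = I

# Because W is lower triangular, component k of W @ r uses only r[0..k]:
# perturbing a later observation leaves earlier whitened components alone.
r1 = np.array([1.0, -0.5, 0.3, 2.0])
r2 = r1.copy()
r2[3] = -9.0                              # change only the last residual
w1, w2 = W @ r1, W @ r2
assert np.allclose(W, np.tril(W))         # sequential structure
assert np.allclose(w1[:3], w2[:3])        # first three unchanged
assert not np.isclose(w1[3], w2[3])       # only the last differs
```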
Robert J. Bauer, Ph.D.
Vice President, Pharmacometrics, R&D
ICON Development Solutions
7740 Milestone Parkway
Suite 150
Hanover, MD 21076
Tel: (215) 616-6428
Mob: (925) 286-0769
Email: [email protected]
Web: www.iconplc.com
Quoted reply history
________________________________
Bobs: Thanks for these insights. I've often wondered about the use of weighted
residuals as diagnostics where the pattern of residuals against an
observation-related covariate, such as time, is examined. Each weighted
residual is a linear combination of all residuals, so it is not attached to any
unique time.
How serious a concern is this? Is it possible that the pattern could be
misleading because of how the weights are distributed?
Is there some way of knowing how strong the attachment is? Could the strength
of the attachment be an optimization criterion for picking the whitening
matrix? Might the choice of best whitening matrix depend on the covariate being
diagnosed?
Thanks,
Jerry
Quoted reply history
________________________________
Jerry, you are correct - the temporal pattern of the WRES components
WRES(I) could be quite misleading. One simple measure of agreement might be
the correlation between WRES(I) and the actual residual RES(I).
For the case of the Cholesky decomposition, if we have M observations, then
the first component WRES(1) actually depends just on the first residual
RES(1), so there is a 100% correlation between WRES(1) and RES(1). But
WRES(2) depends on RES(1) and RES(2), and more generally WRES(K) depends on
the first K residuals. So the correlation between RES(I) and WRES(I) gets
weaker for the later values.
In the case of the eigen-decomposition version of WRES, each individual
WRES(I) value depends on all the RES(I) values, but there is still a
positive correlation between RES(I) and WRES(I). I looked at this some time
ago, and I think I convinced myself that the eigenvector method actually
maximizes some average measure of correlation (or more likely squared
correlation) between RES(I) and WRES(I), but offhand I don't remember the
details or whether I actually was able to prove something.
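These correlation patterns are easy to check by simulation. Here is a NumPy
sketch (using a hypothetical AR(1) covariance matrix, not taken from any
real dataset) that estimates corr(RES(I), WRES(I)) for both whitening
methods:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical AR(1) residual covariance over n observations (made up).
n = 6
C = 0.7 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# The two whitening matrices discussed in this thread.
lam, U = np.linalg.eigh(C)
W_eig = U @ np.diag(lam ** -0.5) @ U.T          # eigendecomposition WRES
W_chol = np.linalg.inv(np.linalg.cholesky(C))   # Cholesky (WRESCHOL) form

# Simulate many residual vectors and estimate corr(RES[i], WRES[i]).
R = rng.multivariate_normal(np.zeros(n), C, size=20000)
for name, W in (("eigen", W_eig), ("cholesky", W_chol)):
    WR = R @ W.T
    corr = [np.corrcoef(R[:, i], WR[:, i])[0, 1] for i in range(n)]
    print(name, np.round(corr, 3))
# The Cholesky form gives corr = 1 exactly for the first component and
# generally weaker correlations for later components; the eigen form
# spreads positive correlations across all components.
```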
One thing you need to be careful about is picking the whitening matrix to
be a function of the residual data RES(I). In this case the whitening
matrix becomes a random matrix, and we can no longer be sure that the
resulting product of the whitening matrix and the random residuals on which
it depends is actually anything close to normally distributed with
independent unit-variance components. For an extreme example, for any given
WRES vector, one could always multiply it by an orthogonal Householder
matrix H to get WRESNEW = H*WRES, where H*H'=I, but H is selected to make,
for example, the first component of WRESNEW equal to a positive value and
all the other components equal to zero. Such a selection would be close to
useless as a diagnostic.
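This extreme case is straightforward to construct explicitly; here is a
NumPy sketch (the WRES vector below is made up purely for illustration):

```python
import numpy as np

# A made-up WRES vector, used only to illustrate the construction.
wres = np.array([0.8, -1.2, 0.4, 1.5])

# Householder reflector H (orthogonal, H = H') chosen to send wres to
# ||wres|| * e1, i.e. a positive first component and zeros elsewhere.
e1 = np.zeros_like(wres)
e1[0] = 1.0
v = wres - np.linalg.norm(wres) * e1
H = np.eye(wres.size) - 2.0 * np.outer(v, v) / (v @ v)

wres_new = H @ wres
assert np.allclose(H @ H.T, np.eye(wres.size))        # H is orthogonal
assert np.isclose(wres_new[0], np.linalg.norm(wres))  # positive first entry
assert np.allclose(wres_new[1:], 0.0)                 # the rest are zero
```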
My own conjecture is that for each particular model and experimental
design, there is a whitening matrix which is best in some sense for
exposing model misspecification, and that perhaps the projection pursuit
methodology, in combination with a large number of simulations from that
model, could be used to find it. Whether this is a practical idea, I don't
know.
Quoted reply history
-----Original Message-----
From: Nedelman, Jerry [mailto:[email protected]]
Sent: Monday, February 03, 2014 8:06 AM
To: Bob Leary; Bauer, Robert; [email protected]
Subject: RE: Slow PRED in tables?
Bobs: Thanks for these insights. I've often wondered about the use of weighted
residuals as diagnostics where the pattern of residuals against an
observation-related covariate, such as time, is examined. Each weighted
residual is a linear combination of all residuals, so it is not attached to any
unique time.
How serious a concern is this? Is it possible that the pattern could be
misleading because of how the weights are distributed?
Is there some way of knowing how strong the attachment is? Could the strength
of the attachment be an optimization criterion for picking the whitening
matrix? Might the choice of best whitening matrix depend on the covariate being
diagnosed?
Thanks,
Jerry
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Bob Leary
Sent: Wednesday, January 29, 2014 7:03 PM
To: Bauer, Robert; Eleveld, DJ; [email protected]
Subject: [NMusers] RE: Slow PRED in tables?
Doug and Bob,
This is point that I have long been interested in. For each individual, the
original WRES in NONMEM is based on a Schur decomposition of the estimated
covariance matrix (I have verified this numerically using MATLAB) of the
residual vector. Thus if the individual covariance matrix of the residuals is
C, and the residual vector is r, WRES is define as C^(-1/2)*r, where C=
U*diag(lambda)*U' , and the columns of U are the eigenvectors of C, and lambda
is the diagonal matrix of eigenvalues.
The main point of the computation is that the covariance of C^(-1/2)*r is a
unit matrix, where C^(-1/2) = U*lambda(^-1.2)*U' has the rough interpretation
of being a square root of C^(-1).
Further diagnostic analysis of the components of WRES treats them as if they
were independent N(0,1) random variables since indeed are are uncorrelated and
normalized to have unit variance (although the normality assumption is doubtful
for hgihly nonlinear models).
Thus the computation of WRES in NM7.2 requires the computation of all
eigenvalues and eigenvectors of the C matrix, which for large matrices can
start to be expensive (although the best LAPACK implementation of a symmetric
positive definite eigendecomposition uses Jim Demmel's O(N^1/3) routine and
maybe considerably faster than what you currently seeing in NM7.2 - I have no
way of knowing what NM7.2 actually uses.)
But in any event, all the WRES computation is doing is de-correlating and
normalizing the residuals. Any matrix W such that W*r has a covariance matrix
equal to the unit matrix will do. In particular, choosing W = lower
triangular Cholesky decompostion of C^(-1), works just fine and is much faster
to compute than to compute than a Schur eigen-decompostion. Based on the name,
I assume that WRESCHOL is just WRES implemented with such a Cholesky
decomposition.
WHich is better - WRES or WRESCHOL ? This is difficult question without
further information. It does point out that the details of the WRES results,
no matter how done, should be take with a large grain of salt (WRESCHOL
numerically will look totally different than WRES, yet they are both equally
valid de-correlation/normalizations)
Electrical engineers have long been aware of this de-correlation/normalization
ambiguity. Basically, if C^(-1/2) is any matrix that decorrelates and
normalizes the residuals, so is U*C^(-1/2), where U is any orthogonal matrix
(U'*U=I). Thus there are an infinite number of choices to use for computing a
reasonable WRES for further diagnostic analysis. In the electrical engineering
community, any matrix of the form U*C^(-1/2) is called a whitening matrix,
since it transforms correlated data into 'whitened', decorrelated data with
unit variances.
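This ambiguity can also be demonstrated directly. The sketch below (pure
Python; the 2x2 covariance and the rotation angle are arbitrary made-up
values) builds one whitening matrix from a Cholesky factor and verifies that
rotating it by an orthogonal U yields a numerically different, but equally
valid, whitening matrix:

```python
# Sketch: any orthogonal rotation of a whitening matrix is again a
# whitening matrix (the U*C^(-1/2) ambiguity). Illustrative numbers only.
import math

C = [[4.0, 1.2],
     [1.2, 2.0]]

# One whitening matrix: W = M^(-1), with C = M M' and M = [[a, 0], [b, c]]
a = math.sqrt(C[0][0])
b = C[1][0] / a
c = math.sqrt(C[1][1] - b * b)
W = [[1 / a, 0.0],
     [-b / (a * c), 1 / c]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def whitens(W, C):
    """True if W C W' is the identity matrix (to rounding)."""
    P = mat_mul(mat_mul(W, C), transpose(W))
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) < 1e-9
               for i in range(2) for j in range(2))

theta = 0.7  # arbitrary rotation angle
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
W2 = mat_mul(U, W)  # rotated whitening matrix: also satisfies W2 C W2' = I
```

Both W and W2 whiten the same residual covariance, yet their entries differ,
so the individual "weighted residuals" they produce would look different.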
In order to decide which decorrelation is best, one must impose an optimization
criterion to pick the particular best whitening matrix. This is not so easy to
do, but has given rise to a large literature under the general name of
'projection pursuit' methods.
I have long suspected there may be something here worth looking into with
respect to population PK/PD model evaluation techniques, but this is still an
open question.
________________________________
This interesting thread has gotten far away from the original
question:
"I would expect calculating PRED would be simply setting ETA=0 and
solving the differential equations just once. Am I missing anything
in how PRED in a table is calculated?"
This is true. It is only the computation of WRES that involves the matrix
arithmetic that takes extra time. However, if *any* of PRED, RES, or WRES is
listed in the table, then NONMEM computes all three.
This situation may change with a future version of NONMEM. Here
is a technique that can be used with all versions of NONMEM and all
ADVAN routines.
There is a $TABLE option, NOAPPEND, that tells NONMEM not to append the
PRED, RES, and WRES items. Code in $ERROR or $PRED can then be used
to capture values of PRED with ETA=0 (and with conditional ETAs,
if desired). COMACT is 1 when NONMEM is making a copying pass,
i.e., when the data records are being passed to PRED for the purpose
of computing values of variables which will be obtained (i.e.,
copied) from NMPRD4 for tables and scatterplots. The ETAs are 0 during
such a pass, so values of PRE0 are identical to those of PRED.
$ERROR (or $PRED)
Y= ...
IF (COMACT==1) PRE0=Y ; uses ETA=0
IPRED=Y ; uses conditional etas
$TABLE PRE0 IPRED NOAPPEND
Quoted reply history
On Mon, Feb 3, 2014, at 06:38 AM, Bob Leary wrote:
> Jerry, you are correct - the temporal pattern of the WRES components
> WRES(I) could be quite misleading. One simple measure of agreement
> might be the correlation between WRES(I) and the actual residual RES(I).
>
> For the case of the Cholesky decomposition, if we have M observations,
> then the first component WRES(1) actually depends on just the first
> residual RES(1), so there is a 100% correlation between WRES(1) and
> RES(1). But WRES(2) depends on RES(1) and RES(2), and more generally
> WRES(K) depends on the first K residuals. So the correlation between
> RES(I) and WRES(I) gets weaker for the later values.
>
> In the case of the eigen-decomposition version of WRES, each individual
> WRES(I) value depends on all the RES(I) values, but there is still a
> positive correlation between RES(I) and WRES(I). I looked at this some
> time ago, and I think I convinced myself that the eigenvector method
> actually maximizes some average measure of correlation (or more likely
> squared correlation) between RES(I) and WRES(I), but offhand I don't
> remember the details, or whether I was actually able to prove anything.
>
> One thing you need to be careful about is picking the whitening matrix
> to be a function of the residual data RES(I). In this case the
> whitening matrix becomes a random matrix, and we can no longer be sure
> that the resulting product of the whitening matrix and the random
> residuals on which it depends is anything close to normally distributed
> with independent unit-variance components. For an extreme example, for
> any given WRES vector, one could always multiply it by an orthogonal
> Householder matrix H to get WRESNEW = H*WRES, where H*H'=I, but with H
> selected to make, for example, the first component of WRESNEW equal to
> a positive value and all the other components equal to zero. Such a
> selection would be close to useless as a diagnostic.
>
> My own conjecture is that for each particular model and experimental
> design, there is a whitening matrix which is best in some sense for
> exposing model misspecification, and that perhaps the projection
> pursuit methodology, in combination with a large number of simulations
> from that model, could be used to find it. Whether this is a practical
> idea, I don't know.
>
>
> -----Original Message-----
> From: Nedelman, Jerry [mailto:[email protected]]
> Sent: Monday, February 03, 2014 8:06 AM
> To: Bob Leary; Bauer, Robert; [email protected]
> Subject: RE: Slow PRED in tables?
>
> Bobs: Thanks for these insights. I've often wondered about the use of
> weighted residuals as diagnostics where the pattern of residuals against
> an observation-related covariate, such as time, is examined. Each
> weighted residual is a linear combination of all residuals, so it is not
> attached to any unique time.
>
> How serious a concern is this? Is it possible that the pattern could be
> misleading because of how the weights are distributed?
>
> Is there some way of knowing how strong the attachment is? Could the
> strength of the attachment be an optimization criterion for picking the
> whitening matrix? Might the choice of best whitening matrix depend on the
> covariate being diagnosed?
>
> Thanks,
> Jerry