Dear All,
We are using PsN version 3.4.2 together with NONMEM 7.2.0 on a Linux Sun
Grid Engine (SGE) cluster. When running multi-core jobs on SGE, NONMEM
sometimes returns a log file in which "MONITORING OF SEARCH" starts but
nothing is reported after it.
Looking in the PsN run directory, I found files named after my control
stream file plus an extension made of letters and digits. These files
contain an error message that does not appear in the log file. For example,
my NM-TRAN control stream is run003.mod, and my log file run003.lst ends with:
MONITORING OF SEARCH:
Stop Time:
Wed Jul 10 21:05:18 CEST 2012
I then find a file named run003.mod.o9501 in the run003/NM_run1 directory
created by PsN. Sometimes this file contains an explicit error message,
sometimes more cabalistic output such as:
WARNINGS AND ERRORS (IF ANY) FOR PROBLEM 1
(WARNING 2) NM-TRAN INFERS THAT THE DATA ARE POPULATION.
CREATING MUMODEL ROUTINE...
Recompiling certain components
USING PARALLEL PROFILE mpi_12cores.pnm
MPI TRANSFER TYPE SELECTED
Exit status = 1
IN MPI
Starting MPI version of nonmem execution ...
License Registered to: Merck KGaA
Expiration Date: 14 SEP 2013
Current Date: 11 JUL 2012
Days until program expires : 428
Iterative Two Stage (No Prior)
MONITORING OF SEARCH:
At line 240 of file (unit = 10, file = 'WK1_FILE10')
Fortran runtime error: End of file
Fatal error in MPI_Send: Other MPI error, error stack:
MPI_Send(174).....................: MPI_Send(buf=0xde71a0, count=80030,
MPI_INTEGER, dest=1, tag=1, MPI_COMM_WORLD) failed
MPIDI_CH3I_Progress(150)..........:
MPID_nem_mpich2_blocking_recv(948):
MPID_nem_tcp_connpoll(1720).......:
state_commrdy_handler(1556).......:
MPID_nem_tcp_recv_handler(1446)...: socket closed
rank 1 in job 1 deda1x0481_36189 caused collective abort of all ranks
exit status of rank 1: return code 2
Questions:
1) Is there a way to force PsN and/or NONMEM to write these error messages
to the log file for multi-core runs?
2) How should "cabalistic" error messages such as the one above be interpreted?
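Regarding question 1, a stopgap that changes neither PsN nor NONMEM is to scan the run directory for the scheduler output files and print the lines that look like errors. The sketch below is only an illustration: the file-name pattern (<model file>.o<job id>) is inferred from the run003.mod.o9501 example above, and the keyword list and the function name find_error_lines are hypothetical choices, not part of PsN or NONMEM.

```python
import re
from pathlib import Path

# Assumption: scheduler output files are named <model file>.o<job id>,
# as in the run003.mod.o9501 example from this thread.
SGE_OUTPUT = re.compile(r".+\.o\d+$")

# Hypothetical keyword list, chosen to match the messages quoted above
# ("Fortran runtime error", "Fatal error in MPI_Send", "collective abort").
KEYWORDS = ("runtime error", "fatal error", "abort")


def find_error_lines(run_dir):
    """Return (file name, line number, text) for suspicious lines.

    Walks run_dir recursively and inspects only files whose names match
    the assumed scheduler output pattern.
    """
    hits = []
    for path in sorted(Path(run_dir).rglob("*")):
        if not path.is_file() or not SGE_OUTPUT.match(path.name):
            continue
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                lowered = line.lower()
                if any(keyword in lowered for keyword in KEYWORDS):
                    hits.append((path.name, lineno, line.rstrip()))
    return hits
```

Calling find_error_lines("run003") after a failed run would list the matching lines from run003/NM_run1 and any other subdirectory, which can then be appended to a report by hand or by script.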
Thank you for your help,
Kind regards
Pascal Girard, PhD
[email protected]
Head of Modeling & Simulation - Oncology
Global Exploratory Medicine
Merck Serono S.A. · Geneva
Tel: +41.22.414.3549
Cell: +41.79.508.7898
This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient, you
must not copy this message or attachment or disclose the contents to any other
person. If you have received this transmission in error, please notify the
sender immediately and delete the message and any attachment from your system.
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept
liability for any omissions or errors in this message which may arise as a
result of E-Mail-transmission or for damages resulting from any unauthorized
changes of the content of this message and any attachment thereto. Merck KGaA,
Darmstadt, Germany and any of its subsidiaries do not guarantee that this
message is free of viruses and does not accept liability for any damages caused
by any virus transmitted therewith.
Click http://www.merckgroup.com/disclaimer to access the German, French,
Spanish and Portuguese versions of this disclaimer.
Error files when using multicore runs and psn
Latest: Jul 11, 2012
Pascal:
I cannot help regarding having all console messages sent to the proper files in
the PSN environment, but I can assist in avoiding your present NONMEM error.
If you insert at the beginning of the control stream file
$SIZES LIM1=??
and insert a large enough value for ??, then file buffer 10 will not be used,
and the error is avoided. The value should be at least as large as the number
of data records (lines) in your data file (see section I.6 of
..\guides\nm720.pdf).
Although nmfe72 in parallel mode has run successfully with file buffers for
large data sets in our tests, it may not work in all grid environments.
Setting the LIM values large enough avoids buffer files, so that only memory
is used. The problem also runs faster when buffer files are not used.
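One way to pick the value for ?? is to count the lines in the data file. The following is a minimal sketch, assuming a plain-text data file with one record per line. The function names and the margin parameter are illustrative only. Counting the header line as well over-counts slightly, which is harmless because the value only needs to be at least the number of data records.

```python
def count_data_records(path):
    """Count non-blank lines in the data file.

    A header line, if present, is counted too, so the result is an upper
    bound on the number of data records rather than an exact count.
    """
    with open(path, errors="replace") as fh:
        return sum(1 for line in fh if line.strip())


def sizes_record(path, margin=0):
    """Build a $SIZES line whose LIM1 is at least the number of records.

    margin is a hypothetical safety allowance added on top of the count.
    """
    n = count_data_records(path) + margin
    return f"$SIZES LIM1={n}"
```

The returned string can be pasted at the top of the control stream file. Whether other LIM constants also need raising, and what upper limits apply, should be checked in section I.6 of the guide cited above.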
Robert J. Bauer, Ph.D.
Vice President, Pharmacometrics, R&D
ICON Development Solutions
7740 Milestone Parkway
Suite 150
Hanover, MD 21076
Tel: (215) 616-6428
Mob: (925) 286-0769
Email: [email protected]
Web: http://www.iconplc.com/
Quoted reply history
________________________________
From: [email protected] [mailto:[email protected]] On
Behalf Of [email protected]
Sent: Wednesday, July 11, 2012 11:19 AM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: [NMusers] Error files when using multicore runs and psn