I'm running NONMEM 744 and 750 with MPI parallelization on a cluster managed by
Slurm (https://slurm.schedmd.com/overview.html). NM744 worked fine, but NM750
failed to communicate with the worker processes. IT support (Peter Maxwell)
discovered that using the -awnf option (no file copy to workers) allowed NM750
to run. The pnm file looks like this:
$GENERAL
NODES=[nodes] PARSE_TYPE=2 PARSE_NUM=200 TIMEOUTI=600 TIMEOUT=1000 PARAPRINT=0
TRANSFER_TYPE=1
$COMMANDS
srun --export=ALL -n [nodes] <<nmexec>> $* -awnf
However, with the -awnf option, stderror.txt gets an srun error line for each
worker, apparently triggered by an exit code of 1 that is passed on by nmfe75.
Peter explained this as follows:
"The 'srun: error: wbn202: task 21: Exited with exit code 1' messages and the
'FAILED' state of the MPI job step are both consequences of $nmexec exiting
with exit code 1 rather than the 0 that might be expected to indicate success.
I think that's deliberate, or at least expected, given that the code in nmfe75
that wraps its use only worries about exit codes greater than 1:
echo Starting MPI version of nonmem execution ...
./nmmpi.sh $1 $3 -licfile=$lfile $* -nmexec=$nmexec
status2=$?
echo Done with nonmem execution
if [ $status2 -gt 1 ] ; then exitcode=115 ; fi
"
So my questions are:
1. Why does MPI communication with workers fail if the -awnf option is not
used? Given that this option seems to fix the problem and even makes the run
faster, an answer would be of interest rather than necessary to make NM750
work.
2. Why does NONMEM return an exit code of 1 (via $status) when the run seems to
have completed without an error? Is there a way to avoid this? Doing so would be
primarily cosmetic, to suppress the misleading entries in stderror.txt.
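The nmfe75 excerpt quoted above treats only exit codes greater than 1 as failures. As a minimal sketch of that threshold alone, the function below maps a status to the exitcode nmfe75 would set. The name nmfe_exitcode is hypothetical, and the starting value exitcode=0 is an assumption, since the excerpt does not show how exitcode is initialized.

```shell
#!/bin/bash
# Hypothetical function isolating the nmfe75 check quoted above.
# Assumption: exitcode starts at 0 (the quoted excerpt does not show its initial value).
nmfe_exitcode() {
  local status2=$1
  local exitcode=0
  # Only codes greater than 1 count as failure, so 0 (manager) and 1 (worker) both pass.
  if [ "$status2" -gt 1 ]; then
    exitcode=115
  fi
  echo "$exitcode"
}

for s in 0 1 2; do
  echo "status2=$s -> exitcode=$(nmfe_exitcode "$s")"
done
```

This prints exitcode=0 for statuses 0 and 1 and exitcode=115 for status 2, which is why nmfe75 itself does not flag the workers' exit code 1 even though srun reports it.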
Thanks in advance if you can throw any light on these queries.
Nick
--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology, Bldg 503 Room 302A
University of Auckland,85 Park Rd,Private Bag 92019,Auckland,New Zealand
office:+64(9)923-6730 mobile:NZ+64(21)46 23 53 FR+33(6)62 32 46 72
http://holford.fmhs.auckland.ac.nz/
http://orcid.org/0000-0002-4031-2514
Read the question, answer the question, attempt all questions
mpiexec exit code 1 even though NONMEM MPI run was successful
Latest: Apr 15, 2021
Nick:
In NONMEM, exit code 1 identifies a worker process and exit code 0 identifies
the (successful) manager. This lets a user take a special action for the manager
versus the workers. You can create a wrapper script, such as the following
example (let's call it nonmem_wrap), so that the script ends with exit 0:
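#!/bin/bash
# nonmem_wrap: run the NONMEM executable and normalize its exit code.
# Usage: ./nonmem_wrap <nmexec> [nonmem arguments...]
exefile=$1
shift
"./$exefile" "$@"
status=$?
# Manager returns status 0, while workers return status 1.
if [ "$status" -ne 1 ]; then
  # do something here if manager
  # 0 on success; any other code passes through so real failures stay visible
  exit "$status"
else
  # do something here if worker
  exit 0
fi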
Modify your pnm file so it calls the wrapper script (which in turn calls the
nonmem executable), for example:
srun --export=ALL -n [nodes] ./nonmem_wrap <<nmexec>> $* -awnf
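Before editing the pnm file, the wrapper's exit-code handling can be checked locally without MPI, Slurm or a NONMEM licence. The sketch below is a minimal smoke test under stated assumptions: fake_manager and fake_worker are hypothetical stub executables, not NONMEM programs, and the wrapper is written out inline as a condensed copy. Because the stubs return only the two codes the wrapper is designed for (0 for the manager, 1 for a worker), both wrapped calls should exit 0 whether the manager branch exits 0 unconditionally or passes its own status through.

```shell
#!/bin/bash
# Local smoke test for the nonmem_wrap logic; no MPI, Slurm or NONMEM needed.
# fake_manager and fake_worker are hypothetical stubs standing in for NONMEM processes.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Condensed copy of the wrapper: worker status 1 becomes 0, anything else passes through.
cat > nonmem_wrap <<'EOF'
#!/bin/bash
exefile=$1
shift
"./$exefile" "$@"
status=$?
if [ "$status" -ne 1 ]; then
  exit "$status"
fi
exit 0
EOF
chmod +x nonmem_wrap

# Stubs: the "manager" exits 0, the "worker" exits 1.
printf '#!/bin/bash\nexit 0\n' > fake_manager
printf '#!/bin/bash\nexit 1\n' > fake_worker
chmod +x fake_manager fake_worker

./nonmem_wrap fake_manager; manager_rc=$?
./nonmem_wrap fake_worker; worker_rc=$?
echo "manager wrapped exit: $manager_rc"
echo "worker wrapped exit: $worker_rc"

# Clean up the temporary directory.
cd / && rm -rf "$workdir"
```

On the cluster itself, nonmem_wrap must likewise be executable (chmod +x) and reachable as ./nonmem_wrap from the directory in which srun starts the tasks.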
By oversight I had not documented this feature, but I shall do so.
Robert J. Bauer, Ph.D.
Senior Director
Pharmacometrics R&D
ICON Early Phase
820 W. Diamond Avenue
Suite 100
Gaithersburg, MD 20878
Office: (215) 616-6428
Mobile: (925) 286-0769
http://www.iconplc.com