mpiexec exit code 1 even though NONMEM MPI run was successful
I'm running NONMEM 744 and 750 with MPI parallelization on a cluster managed by
Slurm (https://slurm.schedmd.com/overview.html). NM744 worked fine but NM750
failed to communicate with the worker processes. IT support (Peter Maxwell)
discovered that using the -awnf option (no file copy to workers) allowed NM750
to run. The pnm file looks like this:
$GENERAL
NODES=[nodes] PARSE_TYPE=2 PARSE_NUM=200 TIMEOUTI=600 TIMEOUT=1000 PARAPRINT=0
TRANSFER_TYPE=1
$COMMANDS
srun --export=ALL -n [nodes] <<nmexec>> $* -awnf
However, the -awnf option leads to a line in stderror.txt for each worker,
apparently triggered by an exit code of 1 which is passed on by nmfe75. Peter
explained this as follows:
"The srun: error: wbn202: task 21: Exited with exit code 1 messages and
"FAILED" state of the MPI job step are both consequences of $nmexec exiting
with exit code 1 rather than the 0 which might be expected to indicate success.
I think that's deliberate, or at least expected, given that the code in nmfe75
which wraps its use only worries about exit codes greater than 1:
echo Starting MPI version of nonmem execution ...
./nmmpi.sh $1 $3 -licfile=$lfile $* -nmexec=$nmexec
status2=$?
echo Done with nonmem execution
if [ $status2 -gt 1 ] ; then exitcode=115 ; fi
"
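To make the effect of that test concrete, here is a minimal sketch that applies the same "-gt 1" comparison to a few exit statuses. The function name map_status and the default exitcode=0 are assumptions made for illustration only; they are not taken from nmfe75.

```shell
#!/bin/sh
# Hypothetical helper (not part of nmfe75): applies the same "-gt 1" test
# quoted above to a given exit status and prints the resulting exitcode.
map_status() {
    status2=$1
    exitcode=0   # assumption: 0 stands for "no error flagged"
    if [ "$status2" -gt 1 ] ; then exitcode=115 ; fi
    echo "$exitcode"
}

# Statuses 0 and 1 are both treated as success; only 2 and above set 115.
for s in 0 1 2 ; do
    echo "status2=$s -> exitcode=$(map_status "$s")"
done
```

If this reading is right, a status of 1 never reaches the exitcode=115 branch, which would be consistent with the run completing normally while srun still reports each task's exit code of 1.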
So my questions are:
1. Why does MPI communication with workers fail if the -awnf option is not
used? Given that this option seems to fix the problem and even makes the run
faster, an answer would be of interest rather than necessary to make NM750
work.
2. Why does NONMEM return an exit code of 1 (via $status) when the run seems to
have completed without an error? Is there a way to avoid this? Doing so would
be primarily cosmetic, to suppress the misleading entries in stderror.txt.
Thanks in advance if you can throw any light on these queries.
Nick
--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology, Bldg 503 Room 302A
University of Auckland,85 Park Rd,Private Bag 92019,Auckland,New Zealand
office:+64(9)923-6730 mobile:NZ+64(21)46 23 53 FR+33(6)62 32 46 72
email: [email protected]
http://holford.fmhs.auckland.ac.nz/
http://orcid.org/0000-0002-4031-2514
Read the question, answer the question, attempt all questions