mpiexec exit code 1 even though NONMEM MPI run was successful

From: Nick Holford Date: April 14, 2021 technical Source: mail-archive.com
I'm running NONMEM 744 and 750 with MPI parallelization on a cluster managed by Slurm (https://slurm.schedmd.com/overview.html). NM744 worked fine, but NM750 failed to communicate with the worker processes. IT support (Peter Maxwell) discovered that using the -awnf option (no file copy to workers) allowed NM750 to run. The pnm file looks like this:

    $GENERAL
    NODES=[nodes] PARSE_TYPE=2 PARSE_NUM=200 TIMEOUTI=600 TIMEOUT=1000 PARAPRINT=0 TRANSFER_TYPE=1
    $COMMANDS
    srun --export=ALL -n [nodes] <<nmexec>> $* -awnf

However, the -awnf option leads to a line in stderror.txt for each worker, apparently triggered by an exit code of 1 which is passed on by nmfe75. Peter explained this as follows:

"The srun: error: wbn202: task 21: Exited with exit code 1 messages and "FAILED" state of the MPI job step are both consequences of $nmexec exiting with exit code 1 rather than the 0 which might be expected to indicate success. I think that's deliberate, or at least expected, given that the code in nmfe75 which wraps its use only worries about exit codes greater than 1:

    echo Starting MPI version of nonmem execution ...
    ./nmmpi.sh $1 $3 -licfile=$lfile $* -nmexec=$nmexec
    status2=$?
    echo Done with nonmem execution
    if [ $status2 -gt 1 ] ; then exitcode=115 ; fi
"

So my questions are:

1. Why does MPI communication with workers fail if the -awnf option is not used? Given that this option seems to fix the problem and even makes the run faster, an answer would be for interest only rather than necessary to make NM750 work.

2. Why does NONMEM return an exit code of 1 (via $status) when the run seems to have completed without an error? Is there a way to avoid this? Being able to do so would be primarily cosmetic, to suppress the misleading entries in stderror.txt.

Thanks in advance if you can throw any light on these queries.
Nick

--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology, Bldg 503 Room 302A
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
office: +64(9)923-6730 mobile: NZ +64(21)46 23 53 FR +33(6)62 32 46 72
email: [email protected]
http://holford.fmhs.auckland.ac.nz/
http://orcid.org/0000-0002-4031-2514
Read the question, answer the question, attempt all questions
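
Note: the behaviour Peter describes can be reproduced in isolation. The sketch below is not NONMEM code; it only mimics the status check quoted from nmfe75. The function name check_status, the subshell standing in for ./nmmpi.sh, and the starting value exitcode=0 are assumptions made for illustration (the initial value used by nmfe75 is not shown in the quoted fragment).

```shell
#!/bin/sh
# Illustrative stand-in for the nmfe75 fragment quoted in the message.
# check_status is a hypothetical helper; "( exit N )" plays the role of
# ./nmmpi.sh returning status N. exitcode=0 as the starting value is assumed.
check_status() {
    exitcode=0
    status2=0
    ( exit "$1" ) || status2=$?      # capture the simulated exit status
    # Same test as in the quoted fragment: only statuses above 1 are flagged.
    if [ "$status2" -gt 1 ] ; then exitcode=115 ; fi
    echo "status2=$status2 exitcode=$exitcode"
}

check_status 0   # prints: status2=0 exitcode=0
check_status 1   # prints: status2=1 exitcode=0
check_status 2   # prints: status2=2 exitcode=115
```

Under these assumptions, a status of 1 passes through the wrapper's check without being flagged, which matches Peter's reading that nmfe75 tolerates it, while srun still reports each task's non-zero exit on its own.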
Apr 15, 2021 Robert Bauer RE: mpiexec exit code 1 even though NONMEM MPI run was successful