mpiexec exit code 1 even though NONMEM MPI run was successful

From: Nick Holford Date: April 14, 2021 technical

I'm running NONMEM 744 and 750 with MPI parallelization on a cluster managed by Slurm (https://slurm.schedmd.com/overview.html). NM744 worked fine but NM750 failed to communicate with the worker processes. IT support (Peter Maxwell) discovered that using the -awnf option (no file copy to workers) allowed NM750 to run. The pnm file looks like this:

    $GENERAL
    NODES=[nodes] PARSE_TYPE=2 PARSE_NUM=200 TIMEOUTI=600 TIMEOUT=1000 PARAPRINT=0 TRANSFER_TYPE=1
    $COMMANDS
    srun --export=ALL -n [nodes] <<nmexec>> $* -awnf

However, the -awnf option leads to lines in stderror.txt for each worker, apparently triggered by an exit code of 1 which is passed on by nmfe75. Peter explained this as follows:

"The 'srun: error: wbn202: task 21: Exited with exit code 1' messages and "FAILED" state of the MPI job step are both consequences of $nmexec exiting with exit code 1 rather than the 0 which might be expected to indicate success. I think that's deliberate, or at least expected, given that the code in nmfe75 which wraps its use only worries about exit codes greater than 1:

    echo Starting MPI version of nonmem execution ...
    ./nmmpi.sh $1 $3 -licfile=$lfile $* -nmexec=$nmexec
    status2=$?
    echo Done with nonmem execution
    if [ $status2 -gt 1 ] ; then exitcode=115 ; fi
"

So my questions are:

1. Why does MPI communication with workers fail if the -awnf option is not used? Given that this option seems to fix the problem and even makes the run faster, an answer would be of interest rather than necessary to make NM750 work.

2. Why does NONMEM return an exit code of 1 (via $status) when the run seems to have completed without an error? Is there a way to avoid this? Being able to do this would be primarily cosmetic, to suppress the misleading entries in stderror.txt.

Thanks in advance if you can throw any light on these queries.
Nick

--
Nick Holford, Professor Clinical Pharmacology
Dept Pharmacology & Clinical Pharmacology, Bldg 503 Room 302A
University of Auckland, 85 Park Rd, Private Bag 92019, Auckland, New Zealand
office: +64(9)923-6730 mobile: NZ+64(21)46 23 53 FR+33(6)62 32 46 72
email: [email protected]
http://holford.fmhs.auckland.ac.nz/
http://orcid.org/0000-0002-4031-2514
Read the question, answer the question, attempt all questions

RE: mpiexec exit code 1 even though NONMEM MPI run was successful

From: Robert Bauer Date: April 15, 2021 technical

Nick:

In NONMEM, exit condition 1 is issued to identify a worker, and exit condition 0 is issued to identify the (successful) manager. This is in case a user wants a special action taken on manager versus worker. You can create a wrapper script, such as this example (let's call it nonmem_wrap), so that a final exit 0 is called:

    #!/bin/bash
    exefile=$1
    shift
    ./$exefile $*
    status=$?
    # Manager returns status 0, while workers return status 1.
    if [ ! $status -eq 1 ]; then
        # do something here if manager
        exit 0
    else
        # do something here if worker
        exit 0
    fi

Modify your pnm file so it calls the wrapper script (which in turn calls the nonmem executable), for example:

    srun --export=ALL -n [nodes] ./nonmem_wrap <<nmexec>> $* -awnf

By oversight I did not have this feature documented, but I shall do so.

Robert J. Bauer, Ph.D.
Senior Director Pharmacometrics R&D
ICON Early Phase
820 W. Diamond Avenue Suite 100
Gaithersburg, MD 20878
Office: (215) 616-6428 Mobile: (925) 286-0769
[email protected]
http://www.iconplc.com
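The wrapper above exits 0 whatever the executable returned, so a status greater than 1 (the range nmfe75 checks for) would also be hidden from srun. A minimal sketch of a variant that maps only the worker identification code 1 to 0 and passes every other status through might look like the following. The function name run_wrapped is hypothetical and not part of NONMEM, and the sketch assumes a status of 1 always means "worker" rather than a genuine failure.

```shell
#!/bin/bash
# Hypothetical helper (not part of NONMEM): run a command and treat
# exit status 1, the worker identification code described above, as success.
# Assumption: status 1 never signals a real error for this executable.
run_wrapped() {
    local status
    "$@"
    status=$?
    if [ "$status" -eq 1 ]; then
        # Worker identification code: report success to the caller.
        return 0
    fi
    # Manager (0) or any other status is passed through unchanged.
    return "$status"
}
```

Used as the body of a wrapper script, the last line could be something like `run_wrapped "./$1" "${@:2}"`, so that srun sees 0 from both manager and workers while larger statuses still surface in stderror.txt.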
