Hello All,
I have used NONMEM for a while in an environment that uses Sun Grid Engine (SGE) and
MPICH2. Currently the two services do not interact: batch jobs go to the grid
engine, while parallel jobs run across all nodes via the MPI daemon. This is
usually not an issue, but during times of heavy user activity the number of
NONMEM processes running on each compute node can exceed the total number of
cores, causing inefficiency. I am looking for a way to run a parallel job so
that it waits until the required number of slots/cores is available and clear
of grid engine jobs before starting (and no new grid engine jobs are submitted
after it until it finishes). I have seen that grid engine supports parallel
queues, but a method of interfacing this with NONMEM's parafile specification
is not immediately apparent. I wanted to check whether there is any way of
using NONMEM/SGE like this before writing a wrapper bash script that does
something like the following:
-submit N number of shell scripts which sleep forever
-poll the grid engine until N number of shell scripts are seen running in the
queue
-begin the parallel run
-when done, qdel all of the shell scripts
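A minimal sketch of that placeholder-job wrapper, written as shell functions (the qsub/qstat/qdel invocations, the qstat column positions, and the job names are illustrative and may differ between SGE versions):

```shell
#!/bin/bash
# Sketch only: hold N slots with sleeper jobs, then release them afterwards.
# Assumes qstat prints the job name in column 3 and the state in column 5.

# Submit n sleeper jobs under a shared name so they can be found and deleted.
hold_slots() {
    local n="$1" tag="$2" i
    for i in $(seq 1 "$n"); do
        qsub -b y -N "$tag" sleep 86400 >/dev/null
    done
}

# Count how many placeholder jobs are currently in state 'r' (running).
running_holders() {
    qstat | awk -v t="$1" '$3 == t && $5 == "r"' | wc -l
}

# Block until all n placeholders actually occupy a slot.
wait_for_slots() {
    local n="$1" tag="$2"
    while [ "$(running_holders "$tag")" -lt "$n" ]; do
        sleep 30
    done
}

# Delete every placeholder job by name, freeing the slots.
release_slots() {
    qdel "$1"
}
```

The parallel run itself would then be bracketed by `hold_slots 4 nmhold`, `wait_for_slots 4 nmhold`, the nmfe call, and finally `release_slots nmhold`.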
This solution would cause some problems with using cores evenly, and would
require a fair amount of manual scripting, so I am looking for a better
solution first. Please advise if you have heard of one.
Thank you.
Usage of parallel NONMEM in tandem with SGE
3 messages; latest Apr 29, 2016
Paul,
I would think you can just use the parallel environments flag to reserve
(by_node) the required slots and execute once they are available.
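Assuming a parallel environment named `mpi` is configured, the reservation idea would look something like the following command fragment (names are illustrative; `-R y` asks SGE to reserve slots as they free up so the parallel job is not starved by a stream of smaller jobs):

```shell
# Request 4 slots in the 'mpi' parallel environment with resource reservation
qsub -pe mpi 4 -R y ./nmfe-launch.sh
```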
Either way, though, it can be pretty inefficient, as you're going to be
'wasting' compute resources while waiting for the additional nodes to become
available.
It's still not an ideal solution IMO, given that one long process could
leave many cores sitting idle until it finishes.
E.g. you need 4 cores and 3 are available, but the 1 additional core is taken
up by a job that will run for another 20 minutes; I don't know how much of a
concern that would be. As a "low effort" workaround, personally I'd
probably reserve N-X slots, then have the parallel job start "too early"
and let the OS handle the oversaturation of threads for the remainder of
the last couple of job(s). You could also do some basic introspection and
check what types of jobs are submitted; an ongoing sse/scm run vs. simple
execute statements could justify markedly different ways of proceeding.
No silver bullet out there that I know of, at least not without more
orchestration effort and training for the people using the cluster,
especially on small/resource-constrained clusters.
Good luck,
Devin
On Thu, Apr 28, 2016 at 12:23 PM Paul Jewell [Rudraya] <[email protected]>
wrote:
> [quoted message omitted]
Hi Paul,
How are you launching jobs?
Are you launching nmfe using qsub -pe myqueue 4 ./nmfe-launch.sh ?
I may be mistaken, but I assume that SGE should automatically handle this sort
of thing, no?
If you specified the correct number of slots in the cluster queue, only that
number of cores should be used at the same time. Any other jobs will wait
until enough slots are available.
See http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_pe.html , slots
attribute.
See also the -pe switch for qsub:
http://gridscheduler.sourceforge.net/htmlman/htmlman1/qsub.html
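For reference, a parallel environment with a capped slot count might be defined roughly like this (the PE name and values are illustrative; see the sge_pe(5) man page linked above for the full attribute list):

```
# qconf -sp mpi   (illustrative; create or edit with 'qconf -ap mpi')
pe_name            mpi
slots              32          # total cores this PE may use cluster-wide
allocation_rule    $fill_up    # pack slots onto nodes before spilling over
control_slaves     TRUE        # tight integration: SGE manages slave tasks
job_is_first_task  FALSE
```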
Of course, maybe your grid was configured with unlimited slots. This might make
sense for non-CPU-limited processes, where you want every CPU to multitask 2 or
3 jobs, ensuring it is always busy while waiting for I/O.
Maybe the environment for mpirun is not correctly set when it is actually
called from the nmfe script? In that case, I suggest you play around with the
command-line in the PNM file.
In my tests, I actually called my own ‘mpirun.sh’ which printed the full
environment, the mpirun call and then waited for user input. This way, I could
try the mpirun call manually and check that everything was okay.
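A debugging stand-in along those lines might be created like this (a sketch; the real `mpirun.sh` would be substituted for `mpirun` in the PNM file):

```shell
# Create a debugging stand-in for mpirun: it prints the environment and the
# exact mpirun call nmfe constructed, then pauses before handing off to the
# real mpirun so the call can be inspected and replayed by hand.
cat > mpirun.sh <<'EOF'
#!/bin/bash
env | sort                         # full environment as nmfe set it up
echo "would run: mpirun $*"        # the call nmfe constructed
read -r -p "press Enter to run it..."
exec mpirun "$@"                   # hand off to the real mpirun
EOF
chmod +x mpirun.sh
```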
From my experience with SLURM and NONMEM:
· It is best to assign only one job per (virtual hyper-threaded) CPU.
Hyper-threading will make sure any CPU can always choose from 2 processes,
ensuring that a single I/O-bound process does not impact cluster throughput.
· The above strategy requires some ‘user education’, since users might
feel unfairly treated if they get stuck at the end of a long queue.
· Some cluster schedulers can mitigate this (e.g. PriorityDecayHalfLife
in SLURM), or pre-empt low-priority jobs in favour of high-priority ones.
· Take care when combining PsN and NONMEM grid functionality. PsN might
reserve your full cluster, leaving you unable to start any MPI runs from
NONMEM. This will essentially deadlock the whole thing.
· You cannot predict whether NONMEM will actually benefit from these
cores. A user might specify 8 cores and wait 4h until these are available on
the cluster, only for NONMEM to decide it benefits from just 2 cores! It will
leave the rest occupied (and sometimes even MPI busy-waiting). There is no
good solution for this, apart from sensible defaults and user education…
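For the SLURM case discussed above, the equivalent reservation would be a command fragment along these lines (run name and parafile are illustrative):

```shell
# Ask SLURM for 8 tasks on one node, then launch nmfe inside the allocation
sbatch --ntasks=8 --nodes=1 \
    --wrap="nmfe75 run1.ctl run1.lst -parafile=mpilinux8.pnm '[nodes]=8'"
```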
As a side question:
Does anyone know whether PsN can automatically reserve the right number of
cores with SLURM (based on the -nodes parameter)?
Kind regards,
Ruben