RE: setup of parallel processing and supporting software - help wanted
Mark:
Regarding point 2, keep in mind that PARSE_TYPE=2 and PARSE_TYPE=4, the
algorithms you helped with, do empirical load balancing. They improve their
assessment with each iteration, so the idle time spent waiting for all
processes to finish is reduced.
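
To illustrate the general idea of empirical load balancing, here is a minimal
Python sketch. It is not the actual NONMEM PARSE_TYPE code: the function names,
the greedy longest-first heuristic and the timing numbers are all assumptions
made for illustration. Per-subject times measured in one iteration are used to
reassign subjects for the next, so the slowest process finishes sooner.

```python
import heapq

def rebalance(subject_times, n_proc):
    """Assign subjects to processes using per-subject times measured in the
    previous iteration (greedy longest-first heuristic, illustrative only)."""
    # Min-heap of (current load, process index): least-loaded process pops first.
    heap = [(0.0, p) for p in range(n_proc)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_proc)]
    # Place the most expensive subjects first.
    for subj in sorted(range(len(subject_times)), key=lambda s: -subject_times[s]):
        load, p = heapq.heappop(heap)
        assignment[p].append(subj)
        heapq.heappush(heap, (load + subject_times[subj], p))
    return assignment

def slowest_load(assignment, subject_times):
    """Time the manager waits for: the load of the most heavily loaded process."""
    return max(sum(subject_times[s] for s in group) for group in assignment)

# Hypothetical measured times (seconds) for 8 subjects; subject 0 is very slow.
times = [9.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
equal_split = [[0, 1, 2, 3], [4, 5, 6, 7]]  # equal subject counts per process
balanced = rebalance(times, 2)
print(slowest_load(equal_split, times))  # 12.0
print(slowest_load(balanced, times))     # 9.0
```

In this toy case the wait drops from 12 to 9 seconds per iteration, which is
the best possible here because the slow subject alone takes 9 seconds.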
Robert J. Bauer, Ph.D.
Vice President, Pharmacometrics R&D
ICON Early Phase
Office: (215) 616-6428
Mobile: (925) 286-0769
[email protected]<mailto:[email protected]>
http://www.iconplc.com
Quoted reply history
From: [email protected] [mailto:[email protected]] On
Behalf Of Mark Sale
Sent: Tuesday, December 08, 2015 3:00 PM
To: Pavel Belo; [email protected]
Subject: Re: [NMusers] setup of parallel processing and supporting software -
help wanted
Pavel,
The loss of efficiency with parallel computing in NONMEM has two sources:
1. I/O time. Each process has to do its calculation and then write the results
to a disk file. (On a single machine, even with the MPI method, the results are
written to a file. That file may or may not actually be written to disk by the
operating system, depending on the file size and on whether the OS decides the
file may be used again soon. The same is true of the FPI method, where the OS
may decide to buffer the file and not actually write it to disk.) This
inefficiency grows with the number of processes, and grows substantially when
you go to multiple machines, as they must send data over the network (and must
actually write the data to disk, with either the MPI or FPI method). You can
actually run parallel NONMEM over a VPN, but as you might imagine, this slows
it down substantially.
2. Inefficiency due to one process finishing its slice of the data before the
others. The manager program must wait until the last process is finished before
it can do the management (sum the OBJ, calculate the gradient, get the next
parameter values, send them out to the processes). This also gets larger with
more processes. In a well-conditioned problem, where every individual takes
roughly the same amount of time to calculate the OBJ for, this isn't too bad.
But occasionally, with stiff ODEs, you'll find a small number of individuals
who take much, much longer to solve the ODEs, and efficiency drops
substantially.
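
A rough way to put a number on the second source is sketched below in Python.
The function name and the slice times are hypothetical, and I/O time is
ignored. Since the manager waits for the slowest slice, efficiency is the
total compute time divided by the number of processes times the slowest slice.

```python
def parallel_efficiency(slice_times):
    """Fraction of total process time spent computing rather than idling,
    when the manager must wait for the slowest slice (I/O ignored)."""
    wall = max(slice_times)
    return sum(slice_times) / (len(slice_times) * wall)

# Hypothetical per-process times (seconds) for one function evaluation.
even = [10.0, 10.0, 10.0, 10.0]    # well-conditioned problem
skewed = [10.0, 10.0, 10.0, 40.0]  # one slice holds a few very slow individuals
print(parallel_efficiency(even))    # 1.0
print(parallel_efficiency(skewed))  # 0.4375
```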
Together these limit the speedup you can get, which is the point of Amdahl's
law:
https://en.wikipedia.org/wiki/Amdahl%27s_law
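
For reference, Amdahl's law gives the speedup as 1 / ((1 - p) + p / n), where
p is the fraction of the work that can be parallelized and n is the number of
processes. A minimal Python sketch follows. The value p = 0.9 is a made-up
illustration, not a measured NONMEM figure.

```python
def amdahl_speedup(parallel_fraction, n_proc):
    """Theoretical speedup under Amdahl's law for a fixed workload."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_proc)

# Hypothetical: 90% of each iteration parallelizes; the rest is management and I/O.
for n in (2, 8, 24, 96):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speedup can never exceed 10, however many processes are
added, so going from 24 to 96 processes buys very little.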
All that said, here are my recommendations:
Don't bother trying to parallelize a run that takes less than 10 minutes; the
I/O time will cancel out any gain in execution time.
Single machine:
If the execution time for a single function evaluation (note that a run is
often between 1000 and 5000 function evaluations) is more than 0.5 seconds, you
probably can improve performance with parallel execution. Note that 1000
function evaluations at 0.5 seconds each = 500 seconds, or about 8 minutes.
Multiple machines:
Assuming a 1 Gbit network, if the execution time for a single function
evaluation is > 1 second, you probably can improve performance with parallel
execution.
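
These rules of thumb can be written down as a small Python helper. This is
only a sketch: the function name and arguments are hypothetical, and it reads
both per-evaluation thresholds as lower bounds (0.5 seconds on a single
machine, 1 second across machines on a 1 Gbit network).

```python
def parallel_likely_helps(run_minutes, sec_per_eval, multiple_machines=False):
    """Rule-of-thumb check, not a guarantee: is parallel execution likely to pay off?"""
    # Short runs: I/O overhead cancels out any gain in execution time.
    if run_minutes < 10:
        return False
    # Network transfer raises the break-even time per function evaluation.
    threshold = 1.0 if multiple_machines else 0.5
    return sec_per_eval > threshold

print(parallel_likely_helps(5, 2.0))                           # False
print(parallel_likely_helps(60, 0.8))                          # True
print(parallel_likely_helps(60, 0.8, multiple_machines=True))  # False
```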
I have personally never found a problem that benefited from more than 24
processes, but in theory some very large problems (run time of weeks) may.
Here is a link to a nice paper from the Gibianskys and Bob Bauer with more
recent benchmarks than our early work.
http://www.ncbi.nlm.nih.gov/pubmed/22101761
Comparison of Nonmem 7.2 estimation methods and parallel ...
J Pharmacokinet Pharmacodyn. 2012 Feb;39(1):17-35.
doi: 10.1007/s10928-011-9228-y. Epub 2011 Nov 19.
Mark
Mark Sale M.D.
Vice President, Modeling and Simulation
Nuventra, Inc. (tm)
2525 Meridian Parkway, Suite 280
Research Triangle Park, NC 27713
Office (919)-973-0383
[email protected]<[email protected]>
http://www.nuventra.com
Empower your Pipeline
________________________________
From: [email protected] <[email protected]> on behalf of
Pavel Belo <[email protected]>
Sent: Tuesday, December 8, 2015 4:54 PM
To: [email protected]
Subject: [NMusers] setup of parallel processing and supporting software - help
wanted
Hello The Team,
We hear different opinions about the effectiveness of parallel processing with
NONMEM, from very helpful to less helpful. It can be task dependent. How
useful is it in phase 3 for basic and covariate models, as well as for
bootstrapping?
We have reached a non-exploratory (production) point where popPK is on the
critical path and sophisticated but slow home-made utilities may be
insufficient. Are there efficient, quick companies or institutions that set up
parallel processing, supporting software and, possibly, some other utilities
(cloud computing, ...)? A group that used to help us a while ago has
disappeared somewhere...
Thanks,
Pavel