Re: setup of parallel processing and supporting software - help wanted

From: Mark Sale. Date: December 08, 2015. Source: cognigen.com
Pavel,

The loss of efficiency with parallel computing in NONMEM has two sources:

1. I/O time. Each process has to do its calculation, then write those results to a disk file. On a single machine, even with the MPI method the results are written to a file; that file may or may not actually be written to disk by the operating system, depending on the file size and whether the OS decides the file may be used again soon. The same is true of the FPI method, where the OS may decide to buffer the file and not actually write it to disk. This inefficiency grows with the number of processes, and grows substantially when you go to multiple machines, which must send data over the network (and must actually write the data to disk, with either the MPI or FPI method). You can even run parallel NONMEM over a VPN, but as you might imagine, that slows it down substantially.

2. Inefficiency due to one process finishing its slice of the data before the others. The manager process must wait until the last worker is finished before it can do the management work (sum the OBJ, calculate the gradient, get the next parameter values, send them out to the processes). This also grows with more processes. In a well-conditioned problem, where the OBJ for every individual takes roughly the same amount of time to calculate, this isn't too bad. But occasionally, with stiff ODEs, you'll find a small number of individuals who take much, much longer to solve the ODEs, and efficiency drops substantially.

Together these make up Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law
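(A minimal sketch of Amdahl's law, my own illustration rather than anything specific to NONMEM: if a fraction p of a run can be parallelized across n processes, the best-case speedup is 1 / ((1 - p) + p / n). Even a small serial fraction caps the achievable speedup.)

```python
def amdahl_speedup(p, n):
    """Theoretical best-case speedup from Amdahl's law.

    p: fraction of the run that can be parallelized (0..1)
    n: number of parallel processes
    """
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelizable, speedup saturates quickly,
# which is consistent with seeing little benefit beyond ~24 processes:
for n in (2, 8, 24, 96):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

With p = 0.95 the speedup at 24 processes is only about 11x, and quadrupling to 96 processes adds relatively little, illustrating why adding more processes eventually stops paying off.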
All that said, here are my recommendations:

Don't bother trying to parallelize a run that takes less than 10 minutes; the I/O time will cancel out any gain in execution time.

Single machine: if the execution time for a single function evaluation (note that a run is often between 1000 and 5000 function evaluations) is more than 0.5 seconds, you probably can improve performance with parallel execution. Note that 1000 function evaluations at 0.5 seconds each = 500 seconds, about 8 minutes.

Multiple machines: assuming a 1 Gbit network, if the execution time for a single function evaluation is > 1 second, you probably can improve performance with parallel execution.

I have personally never found a problem that benefited from more than 24 processes, but in theory some very large problems (run times of weeks) may.

Here is a link to a nice paper from the Gibianskys and Bob Bauer with more recent benchmarks than our early work: http://www.ncbi.nlm.nih.gov/pubmed/22101761 (Comparison of NONMEM 7.2 estimation methods and parallel processing. J Pharmacokinet Pharmacodyn. 2012 Feb;39(1):17-35. doi: 10.1007/s10928-011-9228-y. Epub 2011 Nov 19.)

Mark

Mark Sale M.D.
Vice President, Modeling and Simulation
Nuventra, Inc.
2525 Meridian Parkway, Suite 280
Research Triangle Park, NC 27713
Office (919)-973-0383
msale_at_nuventra.com
http://www.nuventra.com
Empower your Pipeline

CONFIDENTIALITY NOTICE: The information in this transmittal (including attachments, if any) may be privileged and confidential and is intended only for the recipient(s) listed above. Any review, use, disclosure, distribution or copying of this transmittal, in any form, is prohibited except by or on behalf of the intended recipient(s).
If you have received this transmittal in error, please notify me immediately by reply email and destroy all copies of the transmittal.
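(To make the rule of thumb in Mark's recommendations concrete, here is a rough back-of-envelope sketch. This is my own illustration, and the per-evaluation overhead figure is an assumed parameter, not a measured NONMEM value: splitting an evaluation across n processes pays off only when the divided compute time plus the fixed I/O and synchronization overhead still beats the serial time.)

```python
def parallel_worth_it(eval_seconds, n_processes, overhead_seconds):
    """Rough break-even check for parallelizing one function evaluation.

    eval_seconds:     serial time for one function evaluation
    n_processes:      number of worker processes
    overhead_seconds: assumed fixed I/O/synchronization cost per evaluation
                      (an illustrative parameter, not a NONMEM measurement)
    """
    parallel_time = eval_seconds / n_processes + overhead_seconds
    return parallel_time < eval_seconds

# With an assumed 0.25 s overhead per evaluation, a 0.1 s evaluation is
# not worth splitting across 8 processes, while a 2 s evaluation is:
print(parallel_worth_it(0.1, 8, 0.25))  # False
print(parallel_worth_it(2.0, 8, 0.25))  # True
```

The same structure explains the single-machine versus multi-machine thresholds: network transfer raises the effective overhead, so the per-evaluation time has to be larger before parallelization wins.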
Quoted reply history
________________________________
From: owner-nmusers_at_globomaxnm.com <owner-nmusers_at_globomaxnm.com> on behalf of Pavel Belo <nonmem_at_optonline.net>
Sent: Tuesday, December 8, 2015 4:54 PM
To: nmusers_at_globomaxnm.com
Subject: [NMusers] setup of parallel processing and supporting software - help wanted

Hello The Team,

We hear different opinions about the effectiveness of parallel processing with NONMEM, from very helpful to less helpful. It can be task dependent. How useful is it in phase 3 for basic and covariate models, as well as for bootstrapping? We have reached a non-exploratory (production) point where popPK is on the critical path, and sophisticated but slow home-made utilities may be insufficient. Are there efficient/quick companies/institutions which set up parallel processing, supporting software and, possibly, some other utilities (cloud computing, ...)? A group which used to help us a while ago has disappeared somewhere...

Thanks,
Pavel
Dec 08, 2015 Mark Sale Re: setup of parallel processing and supporting software - help wanted
Dec 09, 2015 Bill Knebel Re: setup of parallel processing and supporting software - help wanted
Dec 09, 2015 Mark Sale Re: setup of parallel processing and supporting software - help wanted