RE: NONMEM/PsN benchmark for SGE expansion
Dear Julia,
I am very happy to see that Merck is leveraging parallelized computing so extensively.
We did some systematic testing on parallelized modeling and published this recently:
AAPS Journal 2011, DOI: http://dx.doi.org/10.1208/s12248-011-9258-9
http://www.springerlink.com/content/c215433172002281/
The results on the efficiency of parallelization were generally in good agreement with the testing that Chee and Bob presented for parallelized S-ADAPT earlier.
This was using the Importance Sampling EM algorithm (pmethod=4 in S-ADAPT;
equivalent to method IMPMAP in NONMEM).
For this example, parallelizing on 8 threads yielded a 6.9-fold faster estimation, and parallelizing on 48 threads yielded a 23-fold faster estimation. As the datasets had 48 subjects, each thread received one subject in the latter case, and about 50% of the estimation time was spent distributing data through the network.
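To put these numbers in perspective, parallel efficiency is simply the observed speedup divided by the thread count. A minimal sketch (the helper function is my own, not part of NONMEM or S-ADAPT):

```python
def parallel_efficiency(speedup: float, n_threads: int) -> float:
    """Parallel efficiency = observed speedup / number of threads."""
    return speedup / n_threads

# Figures from the benchmark above:
print(f"{parallel_efficiency(6.9, 8):.0%}")   # 8 threads  -> 86%
print(f"{parallel_efficiency(23, 48):.0%}")   # 48 threads -> 48%
```

The drop from 86% to 48% efficiency reflects the growing share of time spent on network communication as more threads are added.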
The benefit of parallelizing increases significantly:
1) If the algorithm has a large (>99%) parallelizable fraction that can be distributed among worker nodes. (IMPMAP is very well suited for this, MCMC is not; FOCE should have a smaller parallelizable fraction than IMPMAP.)
Example: A program with a 50% parallelizable fraction can only be accelerated to 2-fold the single-threaded speed, no matter how many cores one has.
2) If the dataset has many subjects. This is most critical for industry.
3) If the model is complex and requires differential equations. (Parallelizing a one-compartment model is unlikely to yield much benefit due to network traffic.)
4) Bootstrap analyses are ideal for distribution across the network. Each bootstrap run is best executed single-threaded, though, as one can parallelize with 100% efficiency across the 1000 bootstrap replicates.
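The effect of the parallelizable fraction in point 1 is Amdahl's law, which caps the achievable speedup by the serial remainder. A short sketch (the function name is my own):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: maximum speedup on n_cores when only
    parallel_fraction of the work can be distributed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

# 50% parallelizable: capped at 2-fold, regardless of core count.
print(round(amdahl_speedup(0.50, 1_000_000), 3))  # 2.0

# A >99% parallelizable algorithm scales much further on 48 cores:
print(round(amdahl_speedup(0.99, 48), 1))  # 32.7
```

This is why a highly parallelizable algorithm such as IMPMAP benefits far more from a large cluster than one with a substantial serial portion.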
Some additional thoughts:
a) The larger your cluster, the more important it is to invest in the model code and dataset debugger before a model is compiled, since you do not want to manually shut down the 2000 simultaneously running exe-files that might not have closed properly. This is one of the key reasons why we invested significant time in developing a free pre-processor for S-ADAPT.
b) If you have 2000 nodes, it may be worth considering launching jobs from several master nodes. You can run into trouble both with the available RAM and with the network traffic if everything needs to funnel through one master node, even if you use 4x InfiniBand networking, for example.
c) Creating a queuing system to prioritize jobs from different users and projects may help. Your computational chemistry group most likely has a system like this.
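As a toy illustration of point c), prioritizing jobs by project can be modeled with a priority queue; in practice SGE's own scheduling policies would handle this. The project names and priority scheme below are purely hypothetical:

```python
import heapq
import itertools

# Hypothetical priority scheme: lower number = scheduled first.
PROJECT_PRIORITY = {"regulatory": 0, "research": 1, "exploratory": 2}

_seq = itertools.count()  # tie-breaker: first-come, first-served

def submit(queue, project, user, job_id):
    """Push a job onto the priority queue."""
    heapq.heappush(queue, (PROJECT_PRIORITY[project], next(_seq),
                           project, user, job_id))

def next_job(queue):
    """Pop the highest-priority job."""
    _, _, project, user, job_id = heapq.heappop(queue)
    return project, user, job_id

queue = []
submit(queue, "exploratory", "alice", "run101")
submit(queue, "regulatory", "bob", "run7")
submit(queue, "research", "carol", "boot42")
print(next_job(queue))  # ('regulatory', 'bob', 'run7')
```

A real grid-engine setup would instead express such policies through queue and project configuration, but the ordering logic is the same.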
d) Saving and analyzing intermediate results is most critical for large parallelized jobs.
Hope this provides some useful ideas. Overall, I think that for (complex) models requiring differential equations, parallelization will decide whether a project is feasible in the time available. This is why we almost always parallelize.
Best wishes
Juergen
Jürgen B. Bulitta, Ph.D., Senior Scientist,
Ordway Research Institute,
150 New Scotland Avenue, Albany, NY 12208, USA
Phone: +1 (518) 641-6418, Fax: +1 (518) 641-6304
Email: [email protected]
http://www.ordwayresearch.org/profile_bulitta.html
Quoted reply history
From: [email protected] [mailto:[email protected]] On
Behalf Of Ivashina, Julia
Sent: Friday, March 25, 2011 11:43 AM
To: [email protected]
Subject: [NMusers] NONMEM/PsN benchmark for SGE expansion
Dear all,
We would like to benchmark our new SGE cluster, and would appreciate hearing from anyone who has performed a similar task and can share their findings.
We use NONMEM 7.1.2 with PsN 3.2.12 in two cluster environments.
Our older environment consists of 9 quad-core machines (about 40 work nodes, counting the head node), and the newer one of over 2000 work nodes with 512 CPUs each.
These are the questions we'd like to answer:
· What is a reasonable time one should expect to shave off by moving a PK/PD analysis from the smaller cluster to the bigger one?
· What type of analysis is the most sensitive to an increase in the number of work nodes?
· What should be the expected gain from increasing the number in -threads 50-fold?
· What parts of NONMEM/PsN are the most optimized for parallel execution?
· What are the scenarios where the gain from parallelization is the biggest?
The initial bootstrap test we've done showed some progress, although the model we chose did not run 50 times faster (2000/40 = 50). Some of the reasons: pre-processing (creation of the bootstrap samples), Fortran compiler work, and combining of the results are not spread across work nodes. Since the compute time for each job was small (5-10 seconds), the overhead of job submission was more significant.
We also use the vpc, npc, cdd, llp, sse and scm analyses, so we would like to get some ideas on the parallelization capability of these functions. Any benchmarking results or ideas that you can share are very much appreciated.
Thank you,
Julia