Re: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem

From: Ron Keizer
Date: May 21, 2011
Source: mail-archive.com

Dieter,

The 8-core run being slower than the 4-core run is probably not due to CPU hyperthreading, contrary to what you suggest. The CPU loads that you report also point away from that explanation. I agree with Mark that it is more likely due to the short time per iteration, i.e. the relatively high amount of overhead compared to the actual calculations. We noticed the same when using FPI. Use MPI or test a slower model and this effect will probably disappear.

We also did some benchmarking, and noticed that NM7.2 can parallelize quite efficiently. Our conclusions:

- MPI is much more efficient than FPI, especially for faster problems.
- The efficiency with MPI seems to hold across estimation methods (FOCE / BAYES / SAEM) and models (8 tested), at around 90% when using 5 cores. See results below.
- Parallelization efficiency depends on e.g. time per iteration, transfer type, and number of individuals in the dataset.
- Parallelization (MPI) was still efficient at higher numbers of cores. We tested up to 7 cores on 1 machine. In some basic tests, performance over network nodes seemed as good as when running on a single machine, although fair benchmarking is difficult on a production cluster.

We tested using the gfortran compiler, on a dedicated 8-core machine running Linux.
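The overhead argument can be illustrated with a toy timing model. This is only a sketch under stated assumptions: the function names, the linear per-node overhead term, and all numbers are hypothetical and are not taken from NONMEM or from the benchmarks in this thread.

```python
def wall_time(compute_s, overhead_s_per_node, n_nodes):
    """Toy model: compute time divides evenly over nodes, while
    communication overhead grows linearly with the node count.
    Both assumptions are illustrative, not measured."""
    if n_nodes == 1:
        return compute_s  # single node: no message or file passing
    return compute_s / n_nodes + overhead_s_per_node * n_nodes


def best_node_count(compute_s, overhead_s_per_node, max_nodes=8):
    """Node count with the lowest modelled wall time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda n: wall_time(compute_s, overhead_s_per_node, n),
    )


if __name__ == "__main__":
    # Hypothetical per-iteration costs in seconds.
    for label, compute_s in (("fast model", 4.0), ("slow model", 400.0)):
        times = {n: wall_time(compute_s, 0.4, n) for n in (1, 4, 8)}
        print(label, times, "best n:", best_node_count(compute_s, 0.4))
```

With a short time per iteration the overhead term dominates and the modelled 8-node run is slower than the 4-node run; with a long time per iteration the same overhead is negligible and more nodes keep helping.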
best regards,
Ron

--
-----------------------------------
Ron Keizer, PharmD PhD
Post-doctoral fellow
Pharmacometrics Research Group
Uppsala University
-----------------------------------

table 1: multicore efficiency (tt = transfer type, % = run time relative to the 1-core run)

| tt  | n cores | time_FOCE |   % | time_BAYES |   % |
|-----+---------+-----------+-----+------------+-----|
| -   |       1 |  13462.69 | 100 |    5283.78 | 100 |
| FPI |       2 |   7269.35 |  54 |    3096.51 |  58 |
| FPI |       3 |   5081.05 |  38 |    2470.52 |  46 |
| FPI |       4 |   4211.93 |  31 |    2709.43 |  51 |
| FPI |       5 |   3667.43 |  27 |     2729.8 |  51 |
| FPI |       6 |   3464.34 |  26 |    3254.91 |  61 |
|-----+---------+-----------+-----+------------+-----|
| -   |       1 |  13462.69 | 100 |    5283.78 | 100 |
| MPI |       2 |   7122.48 |  53 |    2731.38 |  51 |
| MPI |       3 |   4826.77 |  36 |    1853.94 |  35 |
| MPI |       4 |   3705.35 |  28 |    1464.69 |  27 |
| MPI |       5 |   2976.36 |  22 |    1179.11 |  22 |
| MPI |       6 |   2519.89 |  19 |    1011.94 |  19 |

table 2: efficiency across different models (distributed to 5 cores, t in sec; t% = t_mpi5 relative to t_orig, eff% = t_orig / (5 * t_mpi5))

| mo | model  | est   | n_ind | iter |   t_orig |  t_mpi5 |    t% |  eff% |
|----+--------+-------+-------+------+----------+---------+-------+-------|
| M1 | ADVAN6 | FOCEI |     9 |   16 |   5863.0 | 1881.88 |  32.1 | 62.31 |
| M2 | ADVAN6 | FOCEI |   454 |   28 |   4485.3 |  930.38 | 20.74 | 96.42 |
| M3 | ADVAN6 | FOCEI |   412 |   20 |   363.84 |   78.23 |  21.5 | 93.02 |
| M4 | ADVAN6 | FOCE  |   105 |  486 | 13616.83 | 2979.52 | 21.88 |  91.4 |
| M5 | ADVAN6 | FOCEI |    42 |   45 | 14183.92 | 3167.56 | 22.33 | 89.56 |
| M6 | ADVAN6 | FOCEI |    39 |   43 |  4698.34 |  992.52 | 21.12 | 94.67 |
| M7 | ADVAN6 | FOCE  |   100 |   29 |    33249 | 7493.82 | 22.54 | 88.74 |
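For readers who want to recompute the percentage columns, here is a minimal sketch. It assumes that t% is the parallel run time as a percentage of the single-core time and that eff% is speedup divided by the number of cores; both definitions reproduce the tabulated values, but the function names are hypothetical and not part of any NONMEM tooling.

```python
def relative_time(t_serial, t_parallel):
    """Parallel run time as a percentage of the single-core run time."""
    return 100.0 * t_parallel / t_serial


def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Speedup divided by core count, as a percentage (100 = ideal scaling)."""
    speedup = t_serial / t_parallel
    return 100.0 * speedup / n_cores


if __name__ == "__main__":
    # (t_orig, t_mpi5) pairs copied from table 2; all runs used 5 cores.
    table2 = {
        "M1": (5863.0, 1881.88),
        "M2": (4485.3, 930.38),
        "M4": (13616.83, 2979.52),
    }
    for model, (t_orig, t_mpi5) in table2.items():
        print(
            model,
            round(relative_time(t_orig, t_mpi5), 2),
            round(parallel_efficiency(t_orig, t_mpi5, 5), 2),
        )
```

The same two functions apply to table 1 by passing the 1-core time as `t_serial` and the matching `n cores` value.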

Quoted reply history

On 5/20/11 9:36 PM, Dieter Menne wrote:

> Here some quick-and-dirty results of my first benchmark with parallel processing in NONMEM 7.2
>
> Running Win7, 64 bit, intel i7, with 4 CPU (and 4 hyperthreading cores). One computer only.
>
> Using file message passing. Could not get mpi to work in this configuration.
>
> call nmfe72 mtl_KPreM2Pre_T2L2_.ctl -parafile=fpiwini8.pnm [nodes]= (1 or 4 or 8)
>
> 10 iterations of a very large Bayes problem (which should not profit from multiple cores, according to the manual)
>
> nodes  time
> 1      45 s
> 4      25 s
> 8      40 s
>
> So about a factor of 2 between 1 and 4 cores.
>
> It is not surprising that 8 gives worse values because these are no real CPUs. More surprising is the fact that with 8 "CPU", I have 100 load on all of them (huh?), while with 4 CPUs, I have the expected 50%.
>
> Dieter


Thread

May 20, 2011	Dieter Menne	Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
May 20, 2011	Mark Sale	RE: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
May 21, 2011	Ron Keizer	Re: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
May 21, 2011	Nick Holford	Re: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
May 23, 2011	Ron Keizer	Re: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
May 23, 2011	Xavier Woot de Trixhe	RE: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem