RE: Simple parallel benchmark for Nonmem 7.2 with large Bayes problem
Dieter,

We never expected parallel NONMEM to perform well with problems of this size. In our early benchmarks, the benefit really starts with problems that run for at least 20 minutes. The math is pretty simple: if a function evaluation takes more than about half a second (note that a "typical" NONMEM run may have 3000 function evaluations), it is worth sending out to multiple processes.

That was our conclusion with the file-based method. MPI might be more efficient, but I'm told that behind the curtains they both do pretty much the same thing: the OS buffers data blocks of this size very well, and the data never actually goes to the physical disc. Our early benchmarks were also run with multiple computers, across a 100 Mb/s LAN. Performance is likely also better with the very clever load balancing and dynamic sizing that Bob Bauer has put into the new release.

But don't expect any benefit with 1-minute runs: there is I/O overhead involved in sending out the data, even on the same CPU. Note that our benchmarks had a base run time of 6 hours. See our poster at http://2009.go-acop.org/acop2009/posters .

Mark

Mark Sale MD
President, Next Level Solutions, LLC
www.NextLevelSolns.com
919-846-9185
A carbon-neutral company
See our real time solar energy production at: http://enlighten.enphaseenergy.com/public/systems/aSDz2458
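
A rough sketch of the arithmetic in the reply above: 3000 function evaluations at half a second each is about 25 minutes, which matches the "at least 20 minutes" rule of thumb. The cost model below is an assumption made for illustration, not NONMEM's actual behaviour. It charges a fixed I/O overhead per function evaluation, and the 0.1 s overhead used in the example is a placeholder, not a measured value.

```python
# Illustrative cost model only: a fixed overhead per function evaluation is an
# assumption, and the function names here are hypothetical.

def serial_time(n_evals, t_eval):
    """Wall time in seconds when every function evaluation runs on one process."""
    return n_evals * t_eval


def parallel_time(n_evals, t_eval, n_procs, overhead_per_eval):
    """Estimated wall time when each evaluation is split across n_procs
    and pays a fixed data-transfer overhead (assumed, not measured)."""
    return n_evals * (t_eval / n_procs + overhead_per_eval)


def worth_parallelizing(t_eval, n_procs, overhead_per_eval):
    """True when the time saved per evaluation exceeds the overhead per evaluation."""
    return t_eval * (1 - 1 / n_procs) > overhead_per_eval


if __name__ == "__main__":
    n_evals = 3000    # "typical" number of function evaluations, from the text
    t_eval = 0.5      # seconds per evaluation, the threshold from the text
    overhead = 0.1    # seconds per evaluation: placeholder assumption
    print("serial minutes:", serial_time(n_evals, t_eval) / 60)
    print("parallel minutes (4 procs):", parallel_time(n_evals, t_eval, 4, overhead) / 60)
    print("worth it:", worth_parallelizing(t_eval, 4, overhead))
```

Under this model, a run with very short function evaluations loses time to overhead no matter how many processes are used, which is the point being made about 1-minute runs.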
Quoted reply history
-------- Original Message --------
Subject: [NMusers] Simple parallel benchmark for Nonmem 7.2 with large
Bayes problem
From: "Dieter Menne " < [email protected] > ;
Date: Fri, May 20, 2011 3:36 pm
To: "nmuser list" < [email protected] >
Here are some quick-and-dirty results of my first benchmark with parallel
processing in NONMEM 7.2.
Running Win7, 64 bit, Intel i7, with 4 physical cores (plus 4 hyperthreaded
logical cores). One computer only.
Using file message passing. Could not get MPI to work in this configuration.
call nmfe72 mtl_KPreM2Pre_T2L2_.ctl -parafile=fpiwini8.pnm [nodes]=(1 or 4 or 8)
10 iterations of a very large Bayes problem (which should not profit from
multiple cores, according to the manual)
nodes time
1 45 s
4 25 s
8 40 s
So about a factor of 2 between 1 and 4 cores.
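
For reference, the speedup and parallel efficiency implied by the table can be worked out with a few lines of Python. This is only a sketch of the arithmetic. The timings are the three values reported above, and the function name is hypothetical.

```python
# Speedup = time on 1 node / time on n nodes; efficiency = speedup / n.
# The timings are the values from the table above; nothing else is measured.

def speedup_and_efficiency(timings):
    """Map each node count to (speedup, efficiency) relative to the 1-node run."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in timings.items()}


if __name__ == "__main__":
    timings = {1: 45.0, 4: 25.0, 8: 40.0}  # nodes -> seconds
    for nodes, (speedup, eff) in speedup_and_efficiency(timings).items():
        print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

With these numbers the 4-node run has a speedup of 1.8, so "a factor of 2" is a slight round-up, and the 8-node run is barely faster than the single-node run.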
It is not surprising that 8 gives worse values, because these are not real
CPUs. More surprising is the fact that with 8 "CPUs" I have 100% load on all
of them (huh?), while with 4 CPUs I have the expected 50%.
Dieter