RE: Linear speedup of NONMEM on quad-core CPUs?

From: Mark Sale Date: March 12, 2007 technical Source: mail-archive.com
Steve,

It really, really should be the case that the speedup for multiple simultaneous runs is linear. Having looked at it for many years, I find that NONMEM execution speed really is consistently proportional to benchmarks like specfp95. It seems that disk I/O is trivial; the entire data set can typically be put into cache on modern machines. I have noted differences between "cheaper" 2.8 GHz dual core machines (Dell E510) and "better" 2.8 GHz machines I've gotten (from Gateway). But if you look at the specfp95 results (http://www.spec.org/cpu95/results/cfp95.html), there are differences between machines using the same CPU - I can't claim to understand why. Memory should not be an issue - NONMEM typically uses less than 5 Mb of memory.

I have done what you ask (I think) in two stages, but not the whole thing:

Dual core does increase run speed (1/time) linearly for 2 processes (note that dual core machines typically have a slightly slower clock speed) - this is what I currently run.

A 4 processor machine (single core - a Proliant 4 processor server running Windows Server 2000) does increase run speed (1/time) linearly for four processes.

The Intel quad core is just two dual core processors stuck together with a single front side bus; they don't share cache or registers. This probably is better for NONMEM than the AMD approach, sharing registers, since separate NONMEM runs obviously don't need to share anything. (The Intel approach is worse for games, since latency to cache memory is worse.)

But a 4 processor dual core will cost you > $12,000, and will not use less power than 4 dual core boxes - why go to the quad processor? (Trust me, it won't make less noise either.) You can buy 4 dual core boxes, set up a LAN and map the c: drive on one "main" machine to all the machines (so from the "main" machine, everything looks like it is happening on the local drive, when in fact execution is happening on the other machines), and use remote desktop to control all 4 computers from one monitor/mouse/keyboard.
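As a rough way to check the linear increase in run speed (1/time) described above on a given machine, here is a minimal Python sketch. It does not run NONMEM: the `BURN` snippet is only a stand-in for a single-threaded, CPU-bound job, and the names `time_jobs` and `throughput_speedup` are made up for this illustration. It starts N separate processes at once and compares jobs per unit time against one job run alone.

```python
import subprocess
import sys
import time

# Stand-in for one single-threaded, CPU-bound job (an assumption, NOT a NONMEM run).
BURN = "x = 0.0\nfor i in range(2_000_000):\n    x += (i % 7) * 0.5\n"


def time_jobs(n_jobs):
    """Wall-clock seconds to run n_jobs independent processes started together."""
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", BURN]) for _ in range(n_jobs)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def throughput_speedup(n_jobs, t_one, t_many):
    """Jobs per unit time with n_jobs running at once, relative to one job alone.

    A value close to n_jobs means the speedup is linear.
    """
    return (n_jobs / t_many) / (1.0 / t_one)


if __name__ == "__main__":
    t_one = time_jobs(1)
    for n in (2, 4):
        t_many = time_jobs(n)
        print(f"{n} processes: speedup {throughput_speedup(n, t_one, t_many):.2f}")
```

If N simultaneous jobs finish in about the same wall-clock time as one job alone, the function returns about N. Swapping the stand-in for a real model run would be the more meaningful test; the timings from this toy loop say nothing about NONMEM itself.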
A dual core Dell is about $700. Best price for quad core right now is about $2000 (i.e., more $/GHz than dual core). The current Intel quad core is intended for servers, and is expensive. The desktop version is due out late this year - it should be cheaper, and prices will probably come down when AMD comes out with their quad core CPU.

Brian,

Your observation (if I understand correctly that you are talking about running only one NONMEM run) is a little surprising: NONMEM is single threaded. So the current approach to parallel computing (multithreading) isn't going to happen. The parallel option on the Intel compiler can, in theory, "unroll" loops in Fortran. But in reality, the code has to be specifically written to do this, and NONMEM certainly is not. I tried this in collaboration with Silicon Graphics about 10 years ago (who claimed to have the best parallel compiler around, right before they went out of business), and got zero parallelization for a single run of NONMEM. But this was a long time ago; maybe Intel figured out something new.

Mark

Mark Sale MD
Next Level Solutions, LLC
www.NextLevelSolns.com
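Since a single run is single threaded, the practical gain on a multi-core box comes from keeping one independent run on each core. A minimal Python sketch of that idea follows. The job commands are placeholders (the real NONMEM command line is not given in this thread), and `run_job` and `run_all` are names invented for this example.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(cmd):
    """Run one external command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_all(commands, max_parallel=None):
    """Run independent commands, at most max_parallel at a time.

    Defaults to one job per core reported by the operating system.
    Exit codes are returned in the same order as the commands.
    """
    workers = max_parallel or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Placeholder jobs: replace each with the real command for one model run.
    jobs = [[sys.executable, "-c", f"print('job {i} done')"] for i in range(8)]
    print(run_all(jobs))
```

The threads here only wait on the child processes, so the CPU work happens in the separate processes, which is the same situation as starting several runs by hand.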
Quoted reply history
> -------- Original Message --------
> Subject: Re: [NMusers] Linear speedup of NONMEM on quad-core CPUs?
> From: Steve Chapel <[EMAIL PROTECTED]>
> Date: Mon, March 12, 2007 11:30 am
> To: [email protected]
>
> That's really not my question. My question was about speedup of multiple
> NONMEM runs, not one NONMEM run. Let me rephrase the question.
>
> Let's say I have eight NONMEM jobs to run each week. Each NONMEM job
> takes eight hours to run. I go to a computer and start one NONMEM job,
> and when it is finished, I start another, and so on. After eight hours,
> all eight NONMEM jobs are run.
>
> The next week, I get a great idea. Instead of using one computer, I can
> use eight computers. I start all eight NONMEM jobs at the same time, and
> after only one hour they are all done. I have achieved eightfold
> (linear) speedup in running eight jobs by using eight computers.
>
> The next week, I make a further realization. The computers I was running
> the NONMEM jobs are dual-core, so I need to use only four computers. I
> start two NONMEM jobs on each of the four computers, and after one hour
> all the jobs are done. The benefit is that this week I needed only four
> computers to be available.
>
> It might occur to me that all I really need is one computer with two
> quad-core processors. I could start all eight NONMEM jobs simultaneously
> on just one computer. The question is, has anyone actually tried this?
> Does it run all eight NONMEM jobs in the same time it would take to run
> one NONMEM jobs? In other words, has going from one core to eight cores
> enabled an eightfold (linear) speedup in running eight NONMEM jobs? If
> not, how much speedup might I expect from an eight-core computer?
>
> -- Steve
>
>
> Brian M. Sadler wrote:
> > Steve,
> >
> > I have just set up NONMEM 6 on a 4GB Core(2) Quad system running XP64. I
> > don't yet have benchmarks, but I have noted activity on all four CPU using
> > the "/Qparallel" option with the Intel Fortran Compiler. I look forward to
> > hearing of others' experiences.
> >
> > Cheers... Brian
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> > Behalf Of Steve Chapel
> > Sent: Friday, March 09, 2007 12:08 PM
> > To: [email protected]
> > Subject: [NMusers] Linear speedup of NONMEM on quad-core CPUs?
> >
> > A few years ago there was a post about benchmarking results for NONMEM
> > on a dual-core CPU ( http://huxley.phor.com/nonmem/nm/99nov212005.html).
> > Given the relatively recent release of Xeon quad-core processors I
> > wanted to know if anybody has compared NONMEM runs on a machine with two
> > dual-core processors to NONMEM runs on a quad-core CPU, or even NONMEM
> > runs on a computer with two quad-core CPUs. Has anyone confirmed that
> > having four or eight cores provides linear speedup of running four or
> > eight NONMEM jobs? Alternatively, if anyone has confirmed that the
> > speedup is not linear, what is the approximate speedup, and what was
> > model number of the CPU(s)?
> >
> > If a similar topic has been discussed recently (in January or February)
> > on this mailing list, could someone please re-post the information? I
> > just joined in March 2007, and the archives seem to contain no messages
> > from 2007.
> >
> > Thanks,
> > Steve
Mar 09, 2007 Steve Chapel Linear speedup of NONMEM on quad-core CPUs?
Mar 10, 2007 Brian Sadler RE: Linear speedup of NONMEM on quad-core CPUs?
Mar 12, 2007 Steve Chapel Re: Linear speedup of NONMEM on quad-core CPUs?
Mar 12, 2007 Mark Sale RE: Linear speedup of NONMEM on quad-core CPUs?
Mar 12, 2007 William Bachman RE: Linear speedup of NONMEM on quad-core CPUs?
Mar 12, 2007 Brian Sadler RE: Linear speedup of NONMEM on quad-core CPUs?
Mar 14, 2007 Steve Chapel Re: Linear speedup of NONMEM on quad-core CPUs?