Linear speedup of NONMEM on quad-core CPUs?

7 messages 4 people Latest: Mar 14, 2007

Linear speedup of NONMEM on quad-core CPUs?

From: Steve Chapel Date: March 09, 2007 technical
A few years ago there was a post about benchmarking results for NONMEM on a dual-core CPU ( http://huxley.phor.com/nonmem/nm/99nov212005.html ). Given the relatively recent release of Xeon quad-core processors, I wanted to know if anybody has compared NONMEM runs on a machine with two dual-core processors to NONMEM runs on a quad-core CPU, or even NONMEM runs on a computer with two quad-core CPUs. Has anyone confirmed that having four or eight cores provides linear speedup when running four or eight NONMEM jobs? Alternatively, if anyone has confirmed that the speedup is not linear, what is the approximate speedup, and what was the model number of the CPU(s)?

If a similar topic has been discussed recently (in January or February) on this mailing list, could someone please re-post the information? I just joined in March 2007, and the archives seem to contain no messages from 2007.

Thanks, Steve

RE: Linear speedup of NONMEM on quad-core CPUs?

From: Brian Sadler Date: March 10, 2007 technical
Steve, I have just set up NONMEM 6 on a 4 GB Core 2 Quad system running XP64. I don't yet have benchmarks, but I have noted activity on all four cores when using the "/Qparallel" option with the Intel Fortran Compiler. I look forward to hearing of others' experiences. Cheers... Brian

Re: Linear speedup of NONMEM on quad-core CPUs?

From: Steve Chapel Date: March 12, 2007 technical
That's really not my question. My question was about speedup of multiple NONMEM runs, not one NONMEM run. Let me rephrase the question.

Let's say I have eight NONMEM jobs to run each week. Each NONMEM job takes one hour to run. I go to a computer and start one NONMEM job, and when it is finished, I start another, and so on. After eight hours, all eight NONMEM jobs are run.

The next week, I get a great idea. Instead of using one computer, I can use eight computers. I start all eight NONMEM jobs at the same time, and after only one hour they are all done. I have achieved eightfold (linear) speedup in running eight jobs by using eight computers.

The next week, I make a further realization. The computers I was running the NONMEM jobs on are dual-core, so I need to use only four computers. I start two NONMEM jobs on each of the four computers, and after one hour all the jobs are done. The benefit is that this week I needed only four computers to be available.

It might occur to me that all I really need is one computer with two quad-core processors. I could start all eight NONMEM jobs simultaneously on just one computer. The question is, has anyone actually tried this? Does it run all eight NONMEM jobs in the same time it would take to run one NONMEM job? In other words, has going from one core to eight cores enabled an eightfold (linear) speedup in running eight NONMEM jobs? If not, how much speedup might I expect from an eight-core computer?

-- Steve
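One way to answer this question empirically is to time a batch of identical jobs started at the same moment and compare it with the time for a single job. The Python sketch below is only an illustration of that measurement: the function names are hypothetical, and the workload is a stand-in CPU-bound command (an assumption), to be replaced with whatever command launches a NONMEM run on the machine being tested.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, n_jobs):
    """Start n_jobs copies of cmd at once; return wall-clock seconds until all finish."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n_jobs)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


def throughput_speedup(t_single, t_batch, n_jobs):
    """Speedup of running n_jobs together versus one after another.

    Running the jobs sequentially would take n_jobs * t_single, so a result
    equal to n_jobs means linear speedup.
    """
    return n_jobs * t_single / t_batch


if __name__ == "__main__":
    # Stand-in CPU-bound workload (an assumption, not a NONMEM run).
    workload = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    t1 = time_concurrent(workload, 1)
    for n in (2, 4, 8):
        tn = time_concurrent(workload, n)
        ratio = throughput_speedup(t1, tn, n)
        print(f"{n} jobs: {tn:.2f} s, speedup {ratio:.2f} (linear would be {n})")
```

If eight simultaneous jobs finish in about the same wall-clock time as one job, the computed speedup is close to 8. A noticeably smaller value would point to contention for something the cores share, such as the front side bus or cache.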

RE: Linear speedup of NONMEM on quad-core CPUs?

From: Mark Sale Date: March 12, 2007 technical
Steve,

It really, really should be the case that speedup for multiple simultaneous runs is linear. In looking at it for many years, I have found that NONMEM execution speed really is consistently proportional to benchmarks like SPECfp95. Disk I/O seems to be trivial; the entire data set can typically be put into cache on modern machines. I have noted differences between "cheaper" 2.8 GHz dual-core machines (Dell E510) and "better" 2.8 GHz machines I've gotten (from Gateway). But if you look at the SPECfp95 results ( http://www.spec.org/cpu95/results/cfp95.html ), there are differences between machines using the same CPU. I can't claim to understand why. Memory should not be an issue: NONMEM typically uses less than 5 MB of memory.

I have done what you ask (I think) in two stages, but not the whole thing:

Dual core does increase run speed (1/time) linearly for 2 processes (note that dual-core CPUs typically run at a slightly slower clock speed). This is what I currently run.

A 4-processor machine (single core, a ProLiant 4-processor server running Windows Server 2000) does increase run speed (1/time) linearly for four processes.

The Intel quad core is just two dual-core processors stuck together with a single front side bus; they don't share cache or registers. This probably is better for NONMEM than the AMD approach, sharing registers, since separate NONMEM runs obviously don't need to share anything. (The Intel approach is worse for games, since latency to cache memory is worse.)

But a 4-processor dual-core machine will cost you > $12,000 and will not use less power than 4 dual-core boxes, so why go to the quad processor? (Trust me, it won't make less noise either.) You can buy 4 dual-core boxes, set up a LAN, and map the c: drive on one "main" machine to all the machines (so from the "main" machine, everything looks like it is happening on the local drive, when in fact execution is happening on the other machines), then use remote desktop to control all 4 computers from one monitor/mouse/keyboard.

A dual-core Dell is about $700. The best price for quad core right now is about $2000 (i.e., more $/GHz than dual core). The current Intel quad core is intended for servers and is expensive. The desktop version is due out late this year. It should be cheaper, and prices will probably come down when AMD comes out with their quad-core CPU.

Brian,

Your observation (if I understand correctly that you are talking about running only one NONMEM run) is a little surprising. NONMEM is single-threaded, so the current approach to parallel computing (multithreading) isn't going to happen. The parallel option on the Intel compiler can, in theory, "unroll" loops in Fortran. But in reality, the code has to be specifically written to do this, and NONMEM certainly is not. I tried this in collaboration with Silicon Graphics about 10 years ago (who claimed to have the best parallel compiler around, right before they went out of business), and got zero parallelization for a single run of NONMEM. But this was a long time ago; maybe Intel figured out something new.

Mark

Mark Sale MD Next Level Solutions, LLC www.NextLevelSolns.com
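The price comparison in Mark's message can be put on a per-core basis with a little arithmetic. The sketch below uses only the dollar figures quoted there; the per-core framing is an added illustration, the core counts are inferred from the descriptions, "> $12,000" is treated as a lower bound, and reading the $2000 quad-core figure as a whole-system price is an assumption.

```python
# Prices in USD as quoted in the message (early 2007); core counts inferred.
systems = {
    "dual-core Dell desktop": (700, 2),
    "quad-core system": (2000, 4),  # assumed to be a whole-system price
    "4-processor dual-core server": (12000, 8),  # "> $12,000", lower bound
}


def cost_per_core(price, cores):
    """Purchase price divided by the number of cores."""
    return price / cores


def boxes_needed(jobs, cores_per_box):
    """Number of boxes needed to give each job its own core (ceiling division)."""
    return -(-jobs // cores_per_box)


for name, (price, cores) in systems.items():
    n_boxes = boxes_needed(8, cores)
    print(f"{name}: ${cost_per_core(price, cores):.0f} per core, "
          f"{n_boxes} box(es) for 8 jobs = ${n_boxes * price}")
```

On these figures, eight simultaneous jobs need four dual-core desktops ($2800), two quad-core systems ($4000), or one 4-processor dual-core server (at least $12,000). Power, space, reliability, and administration costs are not included.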

RE: Linear speedup of NONMEM on quad-core CPUs?

I've actually run this experiment (not on a quad-core, but on a four-CPU cluster). I have no reason to believe the quad-core would behave differently than the cluster, assuming appropriate software on the quad-core.

N = number of CPUs
O = observed run time in minutes for multiple identical runs
D = observed decrease in run time with N CPUs
T = theoretical decrease in run time with N CPUs

N    O    D    T
1  159    -    0
2   79   80   80
3   53  106  106
4   40  119  119

The equations for the reduction in processing time were also derived and presented at AAPS in 2003, "Use of a Linux Cluster with PDx-Pop and NONMEM V to Streamline Population Analysis", W. Bachman and W. Knebel.
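The T column in the cluster results is consistent with the simple model T = O(1) x (1 - 1/N), where O(1) is the single-CPU run time. That formula is an assumption chosen because it reproduces the tabulated values after rounding; it is not taken from the AAPS poster, whose equations are not shown in this thread. A short Python sketch that recomputes the table from the observed times:

```python
# Observed run times in minutes, keyed by number of CPUs (from the table).
observed = {1: 159, 2: 79, 3: 53, 4: 40}
baseline = observed[1]


def theoretical_decrease(t_one_cpu, n_cpus):
    """Ideal reduction in run time if the work divides evenly across n_cpus.

    Assumed model: the ideal run time is t_one_cpu / n_cpus.
    """
    return t_one_cpu * (1 - 1 / n_cpus)


for n, minutes in observed.items():
    decrease = baseline - minutes       # column D
    ideal = theoretical_decrease(baseline, n)  # column T, before rounding
    speedup = baseline / minutes        # linear speedup would equal n
    print(f"N={n}  O={minutes}  D={decrease}  T={ideal:.2f}  speedup={speedup:.2f}")
```

With four CPUs the observed speedup is 159/40, just under 4, so these runs were linear to within the rounding of the reported times.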

RE: Linear speedup of NONMEM on quad-core CPUs?

From: Brian Sadler Date: March 12, 2007 technical
Mark, The 4-core "activity" was noted in the Windows Task Manager and may, according to the guy who built my machine, be an anomaly in the way the Task Manager reports activity on each processor. I am just starting my evaluation of this computer's performance and will share my experience with the group once I have more objective results. Cheers... Brian

Re: Linear speedup of NONMEM on quad-core CPUs?

From: Steve Chapel Date: March 14, 2007 technical
I didn't even think about disk I/O. I was more concerned about front side bus activity being a bottleneck. I found that NONMEM used only 2.4 MB of memory, but of course this would depend on computer architecture, compiler options, sizes of arrays, and so on. My impression is that the bus should not be a bottleneck, because the 4 MB shared cache of each dual core should be able to hold the most frequently accessed memory, as Mark points out. It's good that someone has determined that this really is the case.

As for why go the quad-core route, I'm going to need a very reliable system, so I want servers with Xeons anyway. Noise is not a concern, because these servers are going to get their own room. Comparing the cost and power of two quad-core systems vs. a single system with two quad-core processors, it looks like the single system with two quad-core processors will cost less and should use less power. If performance is no worse, it makes economic sense to pack as many cores into each box as possible.

-- Steve