Re: GPU port for NONMEM
Hi Mark, Xavier and all,
Thank you for sharing the information. Someone pointed me to this interesting
discussion about GPUs and NLME in the NMusers group.
Recently, I developed a prototype of a GPU-based QRPEM (Quasi-Random Parametric
EM) algorithm on a single laptop computer equipped with an Intel Core i7-920
(2.6 GHz) Extreme quad-core processor and an NVIDIA Quadro FX3800M video card
containing 128 stream processors (hopefully PAGE will accept my poster for
presentation in Greece this year). Using a simple one-compartment PK model, my
results are very similar to those obtained by Xavier with SAEM: the GPU
computation (based on a single graphics card) achieved close to a 20x speed
increase compared to the CPU (in this case the Intel Core i7-920 Extreme
processor, one of the fastest laptop CPUs). In addition, the GPU shows much
better scaling between computation time and the number of random samples (Nmc)
used to compute the E-step of the QRPEM algorithm. Increasing Nmc from 1,000 to
20,000 raised the mean computation time (over 30 iterations) from 2.9 to 38 min
for the CPU-based QRPEM, but only from 0.5 to 1.9 min for the GPU-based QRPEM.
I suspect that GPU computing will become even more efficient for more complex
population PK/PD models, and I am currently developing an ODE solver for the
GPU so I can test this hypothesis.
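To illustrate why the E-step scales so well, here is a minimal NumPy sketch
(my own illustration, not the actual QRPEM code; the function names and the
simple additive-error model are assumptions) of evaluating all Nmc parameter
samples against one subject's observations in a single vectorized pass. Each
of the Nmc rows is independent, which is exactly the data parallelism that
maps onto the GPU's stream processors:

```python
import numpy as np

def one_cmt_pred(dose, cl, v, times):
    """IV-bolus one-compartment model: C(t) = (dose / V) * exp(-(CL / V) * t).
    cl and v hold Nmc samples; returns an (Nmc, n_obs) prediction matrix."""
    return (dose / v)[:, None] * np.exp(-(cl / v)[:, None] * times)

def e_step_weights(dose, times, dv, cl_samp, v_samp, sigma):
    """Evaluate every parameter sample against one subject's observations at
    once; every row is independent, so this is the GPU-friendly inner loop."""
    resid = dv[None, :] - one_cmt_pred(dose, cl_samp, v_samp, times)
    loglik = -0.5 * np.sum((resid / sigma) ** 2, axis=1)
    w = np.exp(loglik - loglik.max())   # subtract max for numerical stability
    return w / w.sum()                  # normalized importance weights

# toy run: Nmc = 1000 samples, 4 observations simulated at CL = 2, V = 20
rng = np.random.default_rng(0)
times = np.array([0.5, 1.0, 2.0, 4.0])
dv = (100.0 / 20.0) * np.exp(-(2.0 / 20.0) * times)
cl_samp = rng.lognormal(np.log(2.0), 0.3, 1000)
v_samp = rng.lognormal(np.log(20.0), 0.3, 1000)
w = e_step_weights(100.0, times, dv, cl_samp, v_samp, sigma=0.5)
cl_hat = float(np.sum(w * cl_samp))     # weighted posterior mean of CL
```

On a GPU each row would be handled by one thread; the NumPy broadcasting here
is just a CPU stand-in for that parallelism.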
Because of its different programming logic and the limited memory that must be
shared by many stream processors (up to 448 processor cores with 6 GB of memory
for a single Tesla C2070 card), the GPU demands smart, efficient programming
and upfront thinking about the numerical algorithms to be implemented (think of
the comparison between a hybrid car and a V-8 GM Hummer). Based on my current
understanding, I agree with Xavier that the estimation core of NONMEM (and also
S-ADAPT) would probably need to be rewritten almost from scratch. Simply
compiling the existing NONMEM code with PGI Fortran will not make NONMEM run in
GPU mode; you have to change the source code in order to take advantage of the
slim but efficient GPU computing model.
Kind Regards,
Chee M Ng, PharmD, PhD, FCP
Children's Hospital of Philadelphia
School of Medicine
University of Pennsylvania
Philadelphia, PA 19104
________________________________
From: Mark Sale - Next Level Solutions <[email protected]>
Cc: [email protected]
Sent: Thu, March 17, 2011 8:36:57 AM
Subject: RE: [NMusers] GPU port for NONMEM
Xavier,
That is very exciting. But the question was, I think, about running NONMEM on a
GPU. My conclusion was that there isn't a clear way to break the NONMEM
algorithm up into pieces small enough for GPU computing, not that a new
application designed to run on a GPU couldn't be written. Your point about the
limited memory is a good one: a copy of the entire memory space does not need
to be loaded for each core, only the part specific to that core, which might be
quite small (or quite large; in the case of NONMEM, at least, some of the
arrays are very large, although this is dramatically better with the recent
dynamic sizing, something that wasn't available when we looked at GPU
computing). I'll look forward to hearing more about your work; it would be a
very important result.
Mark Sale MD
President, Next Level Solutions, LLC
www.NextLevelSolns.com
919-846-9185
A carbon-neutral company
See our real time solar energy production at:
http://enlighten.enphaseenergy.com/public/systems/aSDz2458
-------- Original Message --------
>Subject: Re: [NMusers] GPU port for NONMEM
>From: Xavier Woot de Trixhe <[email protected]>
>Date: Thu, March 17, 2011 7:52 am
>To: [email protected]
>
>Hi,
>
>
> I have to disagree with your conclusions.
>
> Last year a POC software was implemented in CUDA to check the concept
>rather than speculate about the subject.
> This POC used the SAEM algorithm on a fairly simple PK model: single dose
>and an analytical function.
>
> From this exercise it became clear that:
> - The data is a lot less dissimilar than you imply. All the data fits in
>a 2D CSV file, which can hardly be qualified as a complex data structure, and
>every individual has the same number of parameters...
>
> - The computation of IPREDs for analytical models is almost trivial, and
>for ODEs one can start with simpler, less efficient (fixed-step) ODE solvers
>and still expect an improvement.
> - The limited "memory per core" is the memory shared by threads; it is
>used for memory optimization, not to hold the whole problem. The main memory
>is up to 4 GB, which far exceeds the "1-2 GB per NONMEM run" rule of thumb.
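As a sketch of that fixed-step idea (my own illustration in Python, not the
POC code, with an assumed one-compartment elimination ODE): a classical RK4
solver that advances all individuals' state vectors in lockstep does nothing
but elementwise arithmetic, so it ports almost mechanically to a GPU kernel:

```python
import numpy as np

def rk4_lockstep(f, y0, t0, t1, n_steps):
    """Fixed-step RK4 advancing ALL individuals' states in lockstep.
    y0 has shape (n_individuals, n_states); every operation below is
    elementwise across individuals, the pattern a GPU kernel wants."""
    h = (t1 - t0) / n_steps
    y, t = y0.astype(float).copy(), t0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

# one-compartment elimination dA/dt = -(CL/V) * A, different (CL, V) per subject
cl = np.array([1.0, 2.0, 4.0])
v = np.array([10.0, 20.0, 20.0])
f = lambda t, a: -(cl / v)[:, None] * a
a0 = np.full((3, 1), 100.0)
a24 = rk4_lockstep(f, a0, 0.0, 24.0, 240)   # amounts at 24 h, all subjects at once
```

A fixed step wastes work compared with an adaptive solver, but it keeps every
thread on the same instruction, which is what the hardware rewards.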
>
>
> Our POC software achieved close to a 30x speed increase compared to our
>reference (CPU: Core i7 940 @ 2.93 GHz | GPU: nVidia GTX 285, $500) without
>even using within-individual parallelisation, which could in theory result in
>a total 240x speed-up.
>
> Where we agree is that NONMEM would probably need to be rewritten almost
>from scratch.
>
>
> So although it could (and should) be done, from a programmer's point of
>view the resulting software would no longer be NONMEM.
>
>Kind regards,
>
>Xavier
>
>On 03/16/2011 03:00 AM, Mark Sale - Next Level Solutions wrote:
> GPU was considered in the work leading up to the upcoming parallel NONMEM.
>GPU computing is intended essentially for very simple algorithms applied to a
>large number of very similar data sets (things like bit-shifting everything
>left, or adding a number to every value of a matrix). It is really not well
>suited to the NONMEM algorithm (at least the time-consuming part, which is
>the calculation of predictions). Most GPUs don't have nearly enough memory
>per core to run NONMEM in the way that it will be parallelized (which
>basically runs the entire PRED, just on a subset of the data).
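As a rough sketch of that coarse-grained scheme (illustrative Python of my
own, not the actual parallel NONMEM code; the toy mono-exponential model and
function names are assumptions): each worker evaluates the full prediction
step for whole subjects, and only the objective-function contributions are
combined at the end:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def subject_objective(args):
    """Objective-function contribution of one subject: the entire structural
    model ('PRED') runs on just that subject's records."""
    times, dv, theta, sigma = args
    pred = theta[0] * np.exp(-theta[1] * times)    # toy mono-exponential model
    return float(np.sum(((dv - pred) / sigma) ** 2))

def total_objective(subjects, theta, sigma=0.5, workers=4):
    """Split by subject, evaluate in parallel, sum the contributions."""
    tasks = [(times, dv, theta, sigma) for times, dv in subjects]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(subject_objective, tasks))

# toy data: three subjects with different sampling schedules
rng = np.random.default_rng(1)
subjects = []
for n in (3, 4, 5):
    t = np.linspace(0.5, 8.0, n)
    subjects.append((t, 5.0 * np.exp(-0.3 * t) + 0.1 * rng.standard_normal(n)))
theta = (5.0, 0.3)
ofv = total_objective(subjects, theta)
```

The contrast with a GPU is granularity: here each task is one subject's whole
PRED evaluation, while a GPU wants thousands of much smaller, uniform tasks.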
>>We tried really hard to think of a way to do NONMEM with a GPU, even
>>consulted with a GPU computing expert at the University of North Carolina,
>>and couldn't come up with a way to do it; even if the memory requirement
>>could be addressed, it would be a significant rewrite of the NONMEM code.
>>But mainly it wasn't clear to us how the algorithm could be broken up into
>>pieces small enough for GPU computing.
>>
>>
>>But, never say never.
>>
>>
>>Mark Sale MD
>>President, Next Level Solutions, LLC
>>www.NextLevelSolns.com
>>919-846-9185
>>A carbon-neutral company
>>See our real time solar energy production at:
>> http://enlighten.enphaseenergy.com/public/systems/aSDz2458
>>
>>
>>
>>-------- Original Message --------
>>>Subject: [NMusers] GPU port for NONMEM
>>>From: "Amr Ragab" <[email protected]>
>>>Date: Tue, March 15, 2011 9:01 pm
>>>To: <[email protected]>
>>>
>>>
>>>Hello NMUsers,
>>>I wanted to ask what has been done in terms of utilizing the bank of
>>>various GPU architectures available. Currently I have access to a few
>>>NVIDIA Tesla cards (with double-precision math). I know PGI has worked out
>>>a solution using NVIDIA's CUDA toolkit to create a C++/Fortran compiler
>>>that uses the GPU. Unfortunately the NONMEM SETUP7 script doesn't include
>>>an install script to utilize PGI's Fortran. Through some Windows trickery I
>>>managed to get NONMEM to think it is opening the Intel Fortran compiler
>>>when it instead opens CUDA Fortran... but it's not a permanent solution.
>>>
>>>That of course would be the first step, as the ideal solution would be for
>>>the entire NONMEM program to be optimized for the GPU. And it looks like
>>>there is a performance benefit for large datasets or for calling nmfe7 for
>>>multiple runs.
>>>
>>>Thanks
>>>Amr