RE: GPU port for NONMEM
GPU was considered in the work leading up to the upcoming parallel NONMEM. GPU computing is intended essentially for very simple algorithms applied to a large number of very similar data sets (things like bit-shifting everything left, or adding a number to every value of a matrix). It is really not well suited to the NONMEM algorithm, at least the time-consuming part, which is the calculation of predictions. Most GPUs don't have nearly enough memory per core to run NONMEM in the way that it will be parallelized, which basically runs the entire PRED, just on a subset of the data.

We tried really hard to think of a way to do NONMEM on a GPU, and even consulted with a GPU computing expert at the University of North Carolina, but couldn't come up with a way to do it. Even if the memory requirement could be addressed, it would require a significant rewrite of the NONMEM code. Mainly, though, it wasn't clear to us how the algorithm could be broken up into pieces small enough for GPU computing. But never say never.

Mark Sale MD
President, Next Level Solutions, LLC
www.NextLevelSolns.com
919-846-9185
A carbon-neutral company
See our real-time solar energy production at: http://enlighten.enphaseenergy.com/public/systems/aSDz2458
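[Editor's note: for readers less familiar with the distinction drawn above, here is a minimal Python sketch of the two kinds of parallelism. All names, and the toy one-compartment prediction used as a stand-in for PRED, are illustrative assumptions, not NONMEM code.]

```python
from math import exp

# GPU-friendly work: one tiny, identical operation per data element.
# A GPU could assign one lightweight thread per cell, since each
# addition is independent and needs almost no per-thread state.
def add_scalar(matrix, c):
    return [[x + c for x in row] for row in matrix]

# NONMEM-style work: a stand-in "PRED" (toy one-compartment decay
# model). Each evaluation carries far more code and state than a
# typical GPU core's memory budget allows.
def pred(dose, ke, times):
    return [dose * exp(-ke * t) for t in times]

# The planned parallel NONMEM instead gives each worker an entire
# subset of subjects and runs the whole PRED on it (coarse-grained).
def run_subset(subjects):
    # subjects: {subject_id: (dose, ke, observation_times)}
    return {sid: pred(d, ke, ts) for sid, (d, ke, ts) in subjects.items()}
```

The first function is the kind of uniform, fine-grained kernel GPUs excel at; the last is the coarse-grained, per-worker decomposition described above, which maps naturally to CPU processes but not to thousands of small GPU threads.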
-------- Original Message --------
Subject: [NMusers] GPU port for NONMEM
From: "Amr Ragab" < [email protected] >
Date: Tue, March 15, 2011 9:01 pm
To: < [email protected] >
Hello NMUsers,

I wanted to ask what has been done in terms of utilizing the various GPU architectures available. Currently I have access to a few NVIDIA Tesla cards (with double-precision math). I know PGI has worked out a solution using NVIDIA's CUDA toolkit to create a C++/Fortran compiler that uses the GPU. Unfortunately, the NONMEM SETUP7 script doesn't include an install option for PGI's Fortran. Through some Windows trickery I managed to get NONMEM to think it is opening the Intel Fortran compiler when it instead opens CUDA Fortran... but it's not a permanent solution.

That of course would only be the first step; the ideal solution would be for the entire NONMEM program to be optimized for the GPU. And it looks like there would be a performance benefit for large datasets, or for calling nmfe7 for multiple runs.

Thanks,
Amr