setup of parallel processing and supporting software - help wanted

7 messages 6 people Latest: Dec 09, 2015
Hello The Team, We hear different opinions about the effectiveness of parallel processing with NONMEM, from very helpful to less helpful. It can be task dependent. How useful is it in phase 3 for basic and covariate models, as well as for bootstrapping? We have reached a non-exploratory (production) point where popPK is on a critical path, and sophisticated but slow home-made utilities may be insufficient. Are there efficient/quick companies/institutions that set up parallel processing, supporting software and, possibly, some other utilities (cloud computing, ...)? A group that used to help us a while ago has disappeared... Thanks, Pavel
WRT bootstrap, it makes much more sense to simply parallelize across the many NONMEM runs (start 8 NONMEM runs at a time, rather than 1 run parallelized to 8 cores). So, no, don't use parallel NONMEM for bootstrap; run multiple bootstrap samples at the same time. Mark Mark Sale M.D. Vice President, Modeling and Simulation Nuventra, Inc. (tm) 2525 Meridian Parkway, Suite 280 Research Triangle Park, NC 27713 Office (919)-973-0383 [email protected]<[email protected]> http://www.nuventra.com Empower your Pipeline CONFIDENTIALITY NOTICE The information in this transmittal (including attachments, if any) may be privileged and confidential and is intended only for the recipient(s) listed above. Any review, use, disclosure, distribution or copying of this transmittal, in any form, is prohibited except by or on behalf of the intended recipient(s). If you have received this transmittal in error, please notify me immediately by reply email and destroy all copies of the transmittal.
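Mark's advice above can be sketched in a few lines of Python: launch several bootstrap samples at the same time, one core each, instead of parallelizing a single run across cores. This is a minimal sketch; `run_sample` is a hypothetical stand-in for launching nmfe on one bootstrap dataset (in practice it would call `subprocess`), and the concurrency limit of 8 matches Mark's example.

```python
# Run many single-core NONMEM runs concurrently rather than one run on N cores.
# run_sample is a hypothetical placeholder; it just returns a label here so
# the sketch is runnable without NONMEM installed.
from concurrent.futures import ThreadPoolExecutor

def run_sample(i):
    # In practice, something like: subprocess.run(["nmfe74", f"boot{i}.ctl", f"boot{i}.lst"])
    return f"boot{i}: done"

def run_bootstrap(n_samples, n_concurrent=8):
    # n_concurrent runs execute at a time; each run stays single-threaded
    with ThreadPoolExecutor(max_workers=n_concurrent) as pool:
        return list(pool.map(run_sample, range(n_samples)))

results = run_bootstrap(16)
```

Threads (rather than processes) are the natural choice here because each worker would spend its time waiting on an external NONMEM process, not computing in Python.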
Mark: Regarding point 2, keep in mind that PARSE_TYPE=2 or 4, the algorithms you helped with, do empirical load balancing, improving their assessment with each iteration, so the idle time waiting for all processes to finish is reduced. Robert J. Bauer, Ph.D. Vice President, Pharmacometrics R&D ICON Early Phase Office: (215) 616-6428 Mobile: (925) 286-0769 [email protected]<mailto:[email protected]> http://www.iconplc.com
Quoted reply history
From: [email protected] [mailto:[email protected]] On Behalf Of Mark Sale Sent: Tuesday, December 08, 2015 3:00 PM To: Pavel Belo; [email protected] Subject: Re: [NMusers] setup of parallel processing and supporting software - help wanted

Pavel, The loss of efficiency with parallel computing in NONMEM has two sources:

1. I/O time. Each process has to do its calculation, then write those results to a disc file. (On a single machine, even with the MPI method the results are written to a file; that file may or may not actually be written to disc by the operating system, depending on the file size and whether the OS decides the file may be used soon. The same is true with the FPI method, where the OS may decide to buffer the file and not actually write it to disc.) This inefficiency gets larger with the number of processes, and gets substantially larger when you go to multiple machines, as they must send data over the network (and must actually write the data to disk, with either the MPI or FPI method). You can actually run parallel NONMEM over a VPN, but as you might imagine, this slows it down substantially.

2. Inefficiency due to one process finishing its slice of the data before the others. The manager program must wait until the last process is finished before it can do the management (sum the OBJ, calculate the gradient, get the next parameter values, send them out to the processes). This also gets larger with more processes. In a well-conditioned problem, where every individual takes roughly the same amount of time to calculate the OBJ for, this isn't too bad. But occasionally, with stiff ODEs, you'll find a small number of individuals who take much, much longer to solve the ODEs, and you'll find that efficiency drops substantially.
Together these make up Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law

All that said, here are my recommendations:

Don't bother trying to parallelize a run that takes less than 10 minutes; the I/O time will cancel out any gain in execution time.

Single machine: if the execution time for a single function evaluation (note a run is often between 1000 and 5000 function evaluations) is more than 0.5 seconds, you probably can improve performance with parallel execution. Note that 1000 function evaluations at 0.5 seconds each = 500 seconds, about 8 minutes.

Multiple machines: assuming a 1 Gbit network, if the execution time for a single function evaluation is > 1 second, you probably can improve performance with parallel execution.

I have personally never found a problem that benefited from more than 24 processes, but in theory some very large problems (run time of weeks) may.

Here is a link to a nice paper from the Gibianskys and Bob Bauer with more recent benchmarks than our early work: http://www.ncbi.nlm.nih.gov/pubmed/22101761 (Comparison of NONMEM 7.2 estimation methods and parallel processing efficiency, J Pharmacokinet Pharmacodyn. 2012 Feb;39(1):17-35.)

Mark Mark Sale M.D. Vice President, Modeling and Simulation Nuventra, Inc.
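Amdahl's law, referenced above, can be sketched numerically. This is a quick illustration (not from the thread): with parallel fraction p and n processes, best-case speedup is 1 / ((1 - p) + p/n), which shows why the serial and overhead portions cap the benefit of adding cores.

```python
# Amdahl's law: best-case speedup for a workload whose parallelizable
# fraction is p, run on n processes.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even a 95%-parallel workload tops out below 20x, no matter how many cores:
print(round(amdahl_speedup(0.95, 8), 2))     # 5.93
print(round(amdahl_speedup(0.95, 1000), 2))  # 19.63
```

In NONMEM terms, the I/O and load-imbalance costs Mark describes act like the serial fraction (1 - p), which is why he caps useful parallelization around a few tens of processes.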
Hi Pavel, In general, parallelization discussions always revolve around the following question: "Can you create independent blocks of work?" You should make a clear distinction here between parallelizing NONMEM itself, and running several NONMEM runs in parallel. Let's talk about estimating a single model, doing a covariate search, and doing a bootstrap.

Very roughly speaking, NONMEM works as follows:
1) Pick a THETA
2) Estimate the probability curve of all ETAs for all subjects
3) Compute the integral over all probability curves to find a probability for THETA
4) Pick a more likely THETA, rinse and repeat

Parallelizing NONMEM means parallelizing step #2. Steps #1, #3 and #4 cannot be parallelized. In practice, we simply split up the subjects in N groups. Each worker calculates the probability curves for all subjects in its group and sends the results back to the main worker, which can then calculate step #3 and step #4. This works well for very complex models with a considerable estimation step: ODE systems. If you have a very fast model (e.g. a simple $PRED section) and a huge dataset, it might be faster to run all of this locally on a single node. Note that NONMEM 7.3 does not parallelize anything other than the subject ETA estimation step! NONMEM 7.4 will also parallelize the TABLE and/or COVARIANCE estimation step.

Conclusion: parallelizing NONMEM only works well in specific cases. You can execute the parallelization by specifying a parafile. If your system administrator specified a parametric parafile, you can also choose the number of CPUs to parallelize over using [nodes]=x in nmfe, or with -nodes=xxx in PsN execute.

Let's now talk about a covariate search. In this case, we want to evaluate 12 models; we can evaluate them concurrently, as there is no dependence between them. PsN works wonderfully here: you can configure the number of parallel runs using the -threads=xxx switch.
1) Estimate the base model
2) Create 12 instances of the base model, adding a single covariate to each instance. Launch all of these instances in parallel.
3) Once these 12 instances have completed, select the most significant covariate.
4) Create 11 instances of the model from step #3, adding a single covariate to each instance. Launch all of these instances in parallel.
5) etc.

As you can see, there is still some dependence: we need all results from step #2 to evaluate step #3. On top of that, parallelizing step #2 means you will have to collect all of those results back over the network to do step #3 (I/O impact). If you do not parallelize, they will already be sitting in main memory. In practice, we use the following calculation:
- #threads = #max_covariate_steps
- #nodes = #CPU_available / #max_covariate_steps

So for a cluster of 20 CPUs:
- A covariate search with 5 covariates would be launched using: scm myModel.scm -threads=5 -nodes=4
- A covariate search with 20 covariates would be launched using: scm myModel.scm -threads=20 -nodes=1

Remember that running multiple NONMEM runs concurrently should always be preferred over parallelizing a single NONMEM run.

Finally, let's talk about bootstrapping. In this case, there is no dependence between the results, so the problem can be perfectly parallelized. Again, always prefer to run multiple NONMEM runs concurrently instead of parallelizing a single NONMEM run: bootstrap myModel -samples=2000 -threads=2000 -nodes=1

Final summary: For a single NONMEM run, parallelization may or may not work, depending on how complex your model's $PRED code is. For a covariate search, prefer running multiple runs at the same time, rather than parallelizing single runs. For bootstraps, always run multiple runs at the same time; never parallelize a single NONMEM run. Kind regards, Ruben
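Ruben's rule of thumb for splitting a cluster between concurrent model runs (-threads) and per-run parallelization (-nodes) can be written out as a small helper. This is an illustrative sketch of the arithmetic only; the function name is ours, not PsN's.

```python
# Ruben's allocation rule: threads = max covariate steps,
# nodes = available CPUs integer-divided by threads (at least 1).
def scm_allocation(n_cpus, n_covariates):
    threads = n_covariates
    nodes = max(1, n_cpus // n_covariates)
    return threads, nodes

# The two worked examples from the thread, for a 20-CPU cluster:
print(scm_allocation(20, 5))   # (5, 4)  -> scm myModel.scm -threads=5 -nodes=4
print(scm_allocation(20, 20))  # (20, 1) -> scm myModel.scm -threads=20 -nodes=1
```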
Pavel, For a complete solution to modeling and simulation you should take a look at our Metworx Performance product. It is a cloud-based solution that allows the user to interface with a high-performance computing system via a web browser. The system includes NONMEM (7.2 and 7.3), gfortran or Intel Fortran, RStudio, R, Stan (MCMC analysis), PiranaJS, PsN, and a grid scheduler. Parallel computing via PiranaJS, PsN, or the metrumrg R package is supported. The system utilizes encrypted disks in the Amazon Web Services cloud, and all communications to and from the cloud are over encrypted channels. The cluster will autoscale (grow and shrink automatically) to meet the requirements of an analysis. In addition, the user has the ability to develop and deploy Shiny applications via the Metworx Envision product. More information can be found at the address below: http://metrumrg.com/metworx.html Regards, Bill
--
Bill Knebel, PharmD, PhD
Principal Scientist II, Group Leader, Pharmacology M&S
Metrum Research Group LLC, 2 Tunxis Road, Suite 112, Tariffville, CT 06081
O: 860.735.7043 C: 860.930.1370 F: 860.760.6014
Maybe a little more clarification: Thanks to Bob for pointing out that the PARSE_TYPE=2 or 4 option implements some code for load balancing; there really is no downside, so it should probably always be used. Contrary to other comments, NONMEM 7.3 (and 7.2) does parallelize the covariance step. Ruben is correct that the $TABLE step is not parallelized in 7.3.

WRT sometimes it works and sometimes it doesn't: we can be more specific than this. The parallelization takes place at the level of the calculation of the objective function. The data are split up and the OBJ for the subsets of the data is sent to multiple processes. When all processes are done, the results are compiled by the manager program. The total round-trip time for one process then is the calculation time + I/O time. Without parallelization, there is no I/O time. For each parallel process, the I/O time is essentially fixed (in our benchmarks maybe 20-40 msec per process on a single machine). The variable of interest then is the calculation time. If the calculation time is 1 msec and the I/O time is 20 msec, and you parallelize to 2 cores, you cut the calculation time to 0.5 msec but now have 40 msec (2*20 msec) of I/O time, for a total of 40.5 msec: much slower. If the calculation time is 500 msec and you parallelize to 2 cores, the total time is 250 msec (for calculation) + 2*20 msec (for I/O) = 290 msec.

The key parameter then is the time for a single objective function evaluation (not the total run time). If the time for a single function evaluation is > 500 msec, parallelization will be helpful (on a single machine). There really isn't anything very mystical about when it helps and when it doesn't. The efficiency depends very little on the size of the data set, except that the limit of parallelization is the number of subjects (the data set must be split up by subject). Mark Sale M.D. Vice President, Modeling and Simulation Nuventra, Inc.
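Mark's round-trip model above is simple enough to compute directly. A minimal sketch (the function name is ours): total time per objective-function evaluation is the calculation time divided across n processes, plus a fixed I/O cost paid once per process.

```python
# Mark's round-trip model: total = calc_time / n + n * per-process I/O time.
def round_trip_msec(calc_msec, io_msec, n_procs):
    return calc_msec / n_procs + n_procs * io_msec

# The two worked examples from the thread (I/O = 20 msec per process):
print(round_trip_msec(1, 20, 2))    # 40.5  -> much slower than 1 msec serial
print(round_trip_msec(500, 20, 2))  # 290.0 -> faster than 500 msec serial
```

The model makes Mark's threshold visible: parallelization only pays off when the per-evaluation calculation time dominates the per-process I/O cost.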
Quoted reply history
________________________________ From: [email protected] <[email protected]> on behalf of Faelens, Ruben (Belgium) <[email protected]> Sent: Wednesday, December 9, 2015 5:42 AM To: Pavel Belo; [email protected] Subject: RE: [NMusers] setup of parallel processing and supporting software - help wanted Hi Pavel, In general, parallelization discussions always revolve around the following question: “Can you create independent blocks of work?” You should make a clear distinction here between parallelizing nonmem, and running several nonmem runs in parallel. Let’s talk about estimating of a single model, doing a covariate search and doing a bootstrap. Very roughly speaking, Nonmem works as follows: 1) Pick a THETA 2) Estimate the probability curve of all ETA’s for all subjects 3) Compute the integral over all probability curves to find a probability for THETA 4) Pick a more likely THETA, rinse and repeat Parallelizing NONMEM means parallelizing step #2. Step #1, #3 and #4 cannot be parallelized. In practice, we simply split up the subjects in N groups. Each worker calculates the probability curve for all subject in its group and sends the results back to the main worker, who can then calculate step #3 and step #4. This works well for very complex models with a considerable estimation step: ODE systems. If you have a very fast model (e.g. simple $PRED section) and a huge dataset, it might be faster to run all of this locally on a single node. Note that Nonmem 7.3 does not parallelize anything other than the subject ETA estimation step! Nonmem 7.4 will parallelize also the TABLE and/or COVARIANCE estimation step. Conclusion: Parallelizing nonmem only works well in specific cases. You can execute the parallelization by specifying a parafile. If your system administrator specified a parametric parafile, you can also choose the number of CPU’s to parallelize over using [nodes]=x in nmfe, or with -nodes=xxx in PsN execute. Let’s now talk about a covariate search. 
In this case, we want to evaluate 12 models; we can evaluate them concurrently, as there is no dependence between them. PsN works wonderfully here: you can configure the amount of parallel runs using the -threads=xxx switch. 1) Estimate the base model 2) Create 12 instances of the base model, adding a single covariate to each instance. Launch all of these instances in parallel. 3) Once these 12 instances completed, select the most significant covariate. 4) Create 11 instances of the model from step #3, adding a single covariate to each instance. Launch all of these instances in parallel. 5) etc. As you can see, there is still some dependence: we need all results from step #2 to evaluate step #3. On top of that, parallelizing step #2 means you will have to collect all of those results back over the network to do step #3 (I/O impact). If you do not parallelize, they will already be sitting in main memory. In practice, we use the following calculation: · #threads = #max_covariate_steps · #CPU_available / #max_covariate_steps = #nodes So for a cluster of 20 CPU’s: · A covariate search with 5 covariates would be launched using: scm myModel.scm -threads=5 -nodes=4 · A covariate search with 20 covariates would be launched using: scm myModel.scm -threads=20 -nodes=1 Remember that running multiple nonmem runs concurrently should always be preferred over parallelizing a single nonmem run. Finally, let’s talk about bootstrapping: In this case, there is no dependence between the results. This problem can be perfectly parallelized. In this case, always prefer to run multiple nonmem runs concurrently, instead of parallelizing a single nonmem run. bootstrap myModel -samples=2000 -threads=2000 -nodes=1 Final summary: For a single nonmem run, parallelization may or may not work, depending on how complex your model $PRED code is. For a covariate search, try to prefer running multiple runs at the same time, rather than parallelizing single runs. 
For bootstraps, always run multiple runs at the same time. Never parallelize a single nonmem run. Kind regards, Ruben From: [email protected] [mailto:[email protected]] On Behalf Of Pavel Belo Sent: dinsdag 8 december 2015 22:54 To: [email protected] Subject: [NMusers] setup of parallel processing and supporting software - help wanted Hello The Team, We hear different opinions about effectiveness of parallel processing with NONMEM from very helpful to less helpful. It can be task dependent. How useful is it in phase 3 for basic and covariate models, as well as for bootstrapping? We reached a non-exploratory (production) point when popPK is on a critical path and sophisticated but slow home-made utilities may be insufficient. Are there efficient/quick companies/institutions, which setup parallel processing, supporting software and, possibly, some other utilities (cloud computing, ...)? A group which used to helped us a while ago disappeared somewhere... Thanks, Pavel Information in this email and any attachments is confidential and intended solely for the use of the individual(s) to whom it is addressed or otherwise directed. Please note that any views or opinions presented in this email are solely those of the author and do not necessarily represent those of the Company. Finally, the recipient should check this email and any attachments for the presence of viruses. The Company accepts no liability for any damage caused by any virus transmitted by this email. All SGS services are rendered in accordance with the applicable SGS conditions of service available on request and accessible at http://www.sgs.com/en/Terms-and-Conditions.aspx
Mark,

a) I have to disagree with you that the efficiency of the MPI implementation does not depend on the size of the data set. For a single desktop SMP machine with multiple processors, larger data sets mean higher granularity and more CPU-bound work between stoppages for communication. This assumes the NLME MPI implementation is done efficiently; I don't know the details of the NONMEM MPI implementation, particularly of how communications are handled.

b) Your I/O timings seem horrendously large (if by msec you mean milliseconds). I/O times of 40 milliseconds per function evaluation (assuming 1 function evaluation is a single sweep through all Nsub subjects, evaluating and summing the likelihood contribution from each subject) seem very high. I have been running MPI since its original release in 1994 (I was a member of the committee that designed the first release of MPI during 1992-1994); these communication timings would seem more appropriate for machines from that era. I/O timings for MPI are usually modeled by a latency (startup time, typically on the order of 1 microsecond on current single-desktop SMP machines) and a bandwidth (on the order of tens of gigabytes/sec for current-era SMPs, but much lower for clusters). Based on the latency/bandwidth model, the conventional wisdom is to manage the message processing so as to favor a few large messages over many small messages, minimizing the latency contribution. If possible, small messages should be concatenated into larger messages. I don't know the details of the MPI implementation in NONMEM, but for FOCE-like NLME algorithms it is possible to limit the number of messages to just a few per function evaluation. If the data set size is expanded by adding more subjects, then more work (more subjects processed) will be done between stoppages for communication at the function evaluation boundaries.
In the MPI implementation for Phoenix NLME, I find it almost impossible to find a model where I/O dominates to such an extent that the MPI version runs slower than the single-processor version on a 4-processor Intel i7 desktop. For example, I just tested (FOCE) the classic simple closed-form Emax model used in the INSERM estimation method comparison exercise from 2004 (Girard and Mentre', PAGE 2005, abstract 234) with Phoenix NLME. It would be hard to find a simpler model: E = E0 + EMAX*DOSE/(ED50 + DOSE) + EPS, with random effects on each of the three parameters E0, EMAX, and ED50, and three observations per subject. If I expand the data set to around 1600 subjects from the original 100 and run on a four-processor i7, the internally reported CPU time is 72 sec for one processor vs 18 sec for four processors (a speedup of 4). Wall clock times were a few seconds longer for each run. If I make the data set smaller, down to the original size of 100, the speedup clearly suffers a decrease, but I still observe a reported CPU-time speedup of 2.5x for the four processors (times are well under 1 sec, so reliable wall clock times are not available). (This was done on a relatively old i7 desktop, so more current machines may do better.)

c) It is not always necessary to parallelize over function evaluations (i.e. over subjects). In importance sampling EM methods (IMP in NONMEM, QRPEM in Phoenix NLME), in principle the parallelization can be done over the sample points used in the Monte Carlo or quasi-Monte Carlo integral evaluations; there are usually many more of these than processors available. In PHX QRPEM, we actually do it this way and it works fine. Now all processors are working on the same subject at the same time, so load balancing problems tend to go away, but communications overhead increases, since now you have to pass separate messages for each subject, whereas in FOCE-like algorithms you only have to pass messages at the end of a sweep through all the subjects.
One thing we have noticed is that QRPEM parallelized this way is much more reproducible - single processor results almost always match multiprocessor results exactly, which is not always the case with some of the FOCE-like methods. Bob Leary Fellow, Pharsight Corporation
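Bob Leary's latency/bandwidth model of message cost can be sketched numerically. This is an illustrative example only (the function name and exact figures are ours, using the orders of magnitude he cites: ~1 microsecond latency, ~10 GB/s bandwidth); it shows why a few large messages beat many small ones.

```python
# Latency/bandwidth model of MPI message cost: time = latency + bytes / bandwidth.
def msg_time_sec(n_bytes, latency_sec=1e-6, bandwidth_Bps=10e9):
    return latency_sec + n_bytes / bandwidth_Bps

# Moving 1 MB as 1000 messages of 1 kB pays the latency 1000 times;
# moving it as a single 1 MB message pays it once.
total_bytes = 1_000_000
many_small = 1000 * msg_time_sec(total_bytes / 1000)
one_large = msg_time_sec(total_bytes)
print(many_small > one_large)  # True: latency dominates for small messages
```

This is the basis for the conventional wisdom Bob describes: concatenate small messages into larger ones so the fixed latency term is paid as few times as possible.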