Re: [EXTERNAL] Segmentation fault - invalid memory reference for nonmem7.5.0 in HPC cluster
Thanks a lot Bob, I will give it a try
Best,
Tong
On Fri, Feb 18, 2022 at 2:48 PM Bauer, Robert <[email protected]>
wrote:
> Dear Tong Lu:
>
> Your segmentation fault may be related to bug #2 in the attached list. A
> recommended work-around is described there that you might want to try;
> alternatively, you can install NONMEM 7.5.1 from
>
>
>
> https://nonmem.iconplc.com/nonmem751
>
> Robert J. Bauer, Ph.D.
>
> Senior Director
>
> Pharmacometrics R&D
>
> ICON Early Phase
>
> 820 W. Diamond Avenue
>
> Suite 100
>
> Gaithersburg, MD 20878
>
> Office: (215) 616-6428
>
> Mobile: (925) 286-0769
>
> [email protected]
>
> www.iconplc.com
>
>
>
> *From:* [email protected] <[email protected]> *On
> Behalf Of *Tong Lu
> *Sent:* Thursday, February 17, 2022 10:34 AM
> *To:* [email protected]
> *Subject:* [EXTERNAL] [NMusers] Segmentation fault - invalid memory
> reference for nonmem7.5.0 in HPC cluster
>
>
>
> Hello NM users,
>
> We recently encountered a strange error with NONMEM 7.5.0 on our HPC
> cluster, which occurred only for models with long run times (after about
> 1.5 days with 180 nodes). The run was terminated by this error, but it
> could be resumed from the msf file.
>
>
>
> Any help would be appreciated. Please let me know if you need
> more information to figure this out.
>
>
>
> Thanks a lot,
>
> Tong
>
>
>
> Starting 1 NONMEM executions. 1 in parallel.
> S:1 ..
> All executions started.
>
> Program received signal SIGSEGV: Segmentation fault - invalid memory
> reference.
>
> Backtrace for this error:
> #0 0x7ae33f in ???
> #1 0x7e27ef in ???
> #2 0x7e4003 in ???
> #3 0x7d623f in ???
> #4 0x7b764c in ???
> #5 0x7b79ca in ???
> #6 0x7b0498 in ???
> #7 0x45de7c in ???
> #8 0x45e08a in ???
> #9 0x591109 in ???
> #10 0x5853d4 in ???
> #11 0x541c05 in ???
> #12 0x657652 in ???
> #13 0x5914fd in ???
> #14 0x404794 in ???
> #15 0x8fa8c3 in ???
> #16 0x8fab40 in ???
> #17 0x404c4b in ???
> #18 0xffffffffffffffff in ???
> [proxy:0:0@nc260] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:0@nc260] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:0@nc260] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> [proxy:0:6@nc266] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:4@nc264] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:4@nc264] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:4@nc264] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> [proxy:0:2@nc262] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:2@nc262] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:2@nc262] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> [proxy:0:1@nc261] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:1@nc261] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:1@nc261] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> [proxy:0:5@nc265] HYD_pmcd_pmip_control_cmd_cb
> (pm/pmiserv/pmip_cb.c:887): assert (!closed) failed
> [proxy:0:5@nc265] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:5@nc265] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> [proxy:0:6@nc266] HYDT_dmxu_poll_wait_for_event
> (tools/demux/demux_poll.c:76): callback returned error status
> [proxy:0:6@nc266] main (pm/pmiserv/pmip.c:202): demux engine error
> waiting for event
> srun: error: nc262: task 2: Exited with exit code 7
> srun: error: nc266: task 6: Exited with exit code 7
> srun: error: nc265: task 5: Exited with exit code 7
> srun: error: nc261: task 1: Exited with exit code 7
> srun: error: nc264: task 4: Exited with exit code 7
> srun: error: nc260: task 0: Exited with exit code 7
> [mpiexec@nc260] HYDT_bscu_wait_for_completion
> (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated
> badly; aborting
> [mpiexec@nc260] HYDT_bsci_wait_for_completion
> (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for
> completion
> [mpiexec@nc260] HYD_pmci_wait_for_completion
> (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for
> completion
> [mpiexec@nc260] main (ui/mpich/mpiexec.c:340): process manager error
> waiting for completion
> NONMEM run failed. Check the lst-file in NM_run1 for errors
> Not restarting this model.
> F:1 ..
> execute done
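[Editor's note: the resume-from-msf step Tong mentions relies on NONMEM writing a model specification file during the original run and reading it back with a $MSFI record. A minimal sketch, assuming file names of your own choosing (run1.msf here is hypothetical), and noting that the original run must have set MSFO for a restart to be possible:

    ; Original run: request a model specification file via MSFO
    $ESTIMATION METHOD=1 INTERACTION MAXEVAL=9999 MSFO=run1.msf

    ; Restart run: replace $THETA/$OMEGA/$SIGMA with $MSFI to
    ; resume estimation from the last saved state
    $MSFI run1.msf
    $ESTIMATION METHOD=1 INTERACTION MAXEVAL=9999 MSFO=run1b.msf

Consult the NONMEM user guides for the exact $MSFI/MSFO behavior in your version.]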