linux - Open MPI Virtual Timer Expired -


I'm using Open MPI 1.8 on Gentoo (kernel 3.13) to manage data transfer within one program via a server/client concept. Both the server and the clients are launched via mpiexec as separate processes. After some days (this is a quite heavy computation...), I receive the error

mpiexec noticed that process rank 0 with PID 17213 on node xxx exited on signal 26 (Virtual timer expired).

Unfortunately, the error is not reproducible in a reliable way, i.e., it does not always appear, and it does not occur at the same point in the program flow. I have also experienced this error on other machines. I tracked the issue down to ITIMER_VIRTUAL which, upon expiration, delivers SIGVTALRM (see, e.g., http://man7.org/linux/man-pages/man2/setitimer.2.html). In the BUGS section of that man page, it says

Under heavy loading, an ITIMER_REAL timer may expire before the signal from a previous expiration has been delivered. The second signal in such an event will be lost.

I wonder if something similar might hold for ITIMER_VIRTUAL? Has anyone experienced similar problems and can confirm this error?

The only workaround I can think of is to invoke setitimer(...) and try to manipulate the timer myself. However, I hope there is another way, since I can't modify the clients' source code. Any suggestions?

Since this question has not been answered officially, I am answering it on behalf of Hristo (@HristoIliev: I hope this is OK with you). As he pointed out in the first comment on the question, there is not a single hint in the Open MPI source code that it could have caused the virtual timer expiration. Indeed, the timer problem was related to a third-party library, which made the code crash after an unpredictable time (depending on the current loading of the machine).

