From: Bo-sung Lee <bslee@pvmcube4.snu.ac.kr>
Newsgroups: comp.parallel.mpi
Subject: MPI error in CrayT3E
Date: Tue, 08 Sep 1998 15:26:18 +0900
Organization: ETRI/Super Computer Center
Message-Id: <35F4CE0A.4AD77A91@pvmcube4.snu.ac.kr>
Reply-To: bslee@pvmcube4.snu.ac.kr, bslee@linuxsvr.seri.re.kr
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Has anybody seen an error like this on the Cray T3E?
Below is the error report from NQE on the Cray T3E.


+ cd /scc/bslee/data/airfoil/arc018/
+ ja
+ mpprun -n 16 /scc/bslee/mpi/bin/mpi_bench2d-o
-MPI- FATAL: Remote protocol queue full (PE -MPI- FATAL: Remote protocol
queue full (PE -MPI-
FATAL: Remote protocol queue full (PE -MPI- FATAL: Remote protocol queue
full (PE -MPI- FATAL:
 Remote protocol queue full (PE -MPI- FATAL: Remote protocol queue full
(PE -MPI- FATAL: Remot
e protocol queue full (PE -MPI- FATAL: Remote protocol queue full (PE
-MPI- FATAL: Remote prot
ocol queue full (PE -MPI- FATAL: Remote protocol queue full (PE -MPI-
FATAL: Remote protocol q
ueue full (PE -MPI- FATAL: Remote protocol queue full (PE -MPI- FATAL:
Remote protocol queue f
ull (PE -MPI- FATAL: Remote protocol queue full (PE -MPI- FATAL: Remote
protocol queue full (P
E 15)
10)
5)
11)
12)
13)
6)
14)
1)
2)
3)
8)
4)
7)
0)
SIGNAL: Abort ( from process 158053 )

 Beginning of Traceback (PE 15):
  Interrupt at address 0x800051050 in routine '_lwp_kill'.
  Called from line 30 (address 0x800050770) in routine 'raise'.
  Called from line 127 (address 0x800021220) in routine 'abort'.
  Called from line 4525 (address 0x800140040) in routine 'MPI_SEND'.
  Called from line 475 (address 0x800005968) in routine 'MPI_BENCH2D'.
  Called from line 475 (address 0x800000c98) in routine '$START$'.
 End of Traceback.
/usr/spool/nqe/spool/scripts/++2Z8+++++0+++[10]: 12274 Abort(coredump)
+ ja -csf

The 'Remote protocol queue full' error occurs frequently on the Cray T3E
system. I don't think it is a mistake in my MPI programming, because other
MPI users have run into the same error. To check, I developed two parallel
codes on the Cray T3E, one with PVM and one with MPI. The two codes are
exactly the same except for the message-passing routines.
The PVM code has no problem. The MPI code fails with the above error, but
only irregularly. So we think this is a problem in the Cray T3E's native MPI.
Under LAM MPI on Linux clusters, this MPI code and other users' codes have no
problem.

Please help.
If possible, please reply to me directly by e-mail.

+++++++++++++++++++++++++++++++++++++++
Bo-sung Lee
ETRI/Super Computer Center, Korea

mailto:bslee@pvmcube4.snu.ac.kr
mailto:bslee@linuxsvr.seri.re.kr
+++++++++++++++++++++++++++++++++++++++

