From: Blas Pedro Uberuaga <buber@u.washington.edu>
Newsgroups: comp.parallel.mpi
Subject: MPICH and net_recv_timeout error
Date: Wed, 05 May 1999 09:22:51 -0400
Organization: University of Washington
Message-Id: <3730462B.25D0FB60@u.washington.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Xref: ukc comp.parallel.mpi:5029


Hi All,

I have MPICH 1.1.2 running on a cluster of IBM 43P-260s.  When I launch
a program linked against MPICH from the command line, I often get this
error:

net_recv_timeout failed for fd = 8
p6_13266: 258:  p4_error: net_recv_timeout read, errno = : 25

The machine just sits there afterwards.  If I press Control-C and run
the same command again, it eventually works after a couple of
attempts.
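
To make the workaround concrete, here is a rough sketch (in Python) of
what I am doing by hand.  The function name, the retry count, and the
60-second timeout are placeholders I made up for illustration, and the
command passed in would be whatever launch line you normally type.  It
only kills the local launcher process; I am assuming any processes left
on the remote nodes have to be cleaned up separately.

```python
import subprocess
import sys


def run_with_retries(cmd, attempts=5, timeout_s=60):
    """Run cmd, retrying if it hangs or exits with an error.

    Returns the 1-based attempt number that succeeded, or None if
    every attempt failed.  All parameter values are illustrative.
    """
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the local child if the timeout expires,
            # which stands in for pressing Control-C by hand.
            result = subprocess.run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt}: timed out, retrying", file=sys.stderr)
            continue
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}: exit code {result.returncode}", file=sys.stderr)
    return None
```

This obviously hides the problem rather than fixing it, which is why I
am asking about the underlying cause.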

Is this a problem with my network?  We have a 100 Mbit connection
between all of the machines I'm using.  Is there a way to get MPICH to
probe the machines for a longer period of time before giving up?

Thank you.

