From: Will Reed <wr-reed@glsn15.ews.uiuc.edu>
Newsgroups: comp.parallel.mpi
Subject: Linux MPI_BCAST Problem
Date: Fri, 18 Jun 1999 11:09:51 -0500
Organization: University of Illinois at Urbana-Champaign
Message-Id: <Pine.GSO.4.10.9906181008180.3461-100000@glsn15.ews.uiuc.edu>


Hi,

I have a large numerical code written in F90 that I'm trying to run on
Linux (RH6 cluster, 12 processors, ch_p4 device).  The code runs fine on
an SGI O2k under SGI's MPI implementation.  Unfortunately, under Linux it
fails with the following runtime error (this run used mpirun -np 4):

p2_2229:  p4_error: interrupt SIGSEGV: 11
rm_l_2_2230:  p4_error: interrupt SIGINT: 2
p1_5730:  p4_error: interrupt SIGSEGV: 11
rm_l_1_5731:  p4_error: interrupt SIGINT: 2
p3_2232: (5.008455) Trying to receive a message when there are no
connections; Bailing out
bm_list_5727:  p4_error: net_recv read:  probable EOF on socket: 1

I have isolated the fault to a portion of the code that does a group of
MPI_BCASTs (about 100 variables, roughly 2 MB of data in total).  I have
had similar problems with large groups of BCASTs on Sun machines before,
and reducing the size and number of BCASTs cured them (it seemed like a
buffering problem).  Setting P4_SOCKBUFSIZE higher has not helped.

Any help with this would be greatly appreciated.

-Will Reed
wr-reed@uiuc.edu

