Newsgroups: comp.parallel.mpi
From: Mathias Waack <mathias@mufasa.informatik.uni-mannheim.de>
Subject: Deadlock in MPICH with blocking Send/Rec
Organization: Dept. of Computer Science, University of Mannheim, Germany
Date: Tue, 27 Jan 1998 10:44:48 +0100
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <34CDAC90.5851@mufasa.informatik.uni-mannheim.de>

Hi,

I've tested an installation of MPICH on some Linux boxes. I wrote
a simple program which sends messages of different sizes
and measures the time used by the data transfer. The program
only uses blocking sends and receives (MPI_Send and MPI_Recv).
But after some transfers the program blocks. I repeated the
test in a Solaris environment on Sun boxes and got the same
result. MPICH is configured to use the ch_p4 device for
communication.

The sender runs through the following code:
  for(i=stepsize;i<maxsize;i+=stepsize) {
    avgtime = 0.0;
    for (j=0;j<pingcount;j++) {
      start = MPI_Wtime();
      MPI_Send(msgbuf,i,MPI_CHAR,0,1,MPI_COMM_WORLD);
      avgtime += MPI_Wtime() - start;
      //sleep(1);
    }
    avgtime /= (float)pingcount;
    printf("msgsize: %d bytes\t\ttime: %f s\n",i,avgtime);
    fflush(stdout);
  }
and the receiver runs this:
  for(i=stepsize;i<maxsize;i+=stepsize) {
    for (j=0;j<pingcount;j++) {
      MPI_Recv(msgbuf,i,MPI_CHAR,MPI_ANY_SOURCE,MPI_ANY_TAG,
           MPI_COMM_WORLD,&stat);
    }
  }

As you can see, I tried forcing the sender to sleep after each send
(the sleep(1) that is now commented out), but it doesn't help. Since
both the send and the receive are blocking, there should be no buffer
overflow or anything similar that could cause a deadlock. I ran the
sender in a debugger and saw that it stops inside a call to MPI_Send().
Can anybody tell me why this program stops?
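
The reasoning above only holds if both loop nests really issue the
same number of calls with compatible sizes. As a sanity check, here is
a minimal sketch in plain C (no MPI calls) that walks the two loop
nests and compares them. The struct loop_params and the helpers
count_calls, size_of_call and loops_match are hypothetical names made
up for this illustration; only stepsize, maxsize and pingcount come
from the program above, and their actual values are not given here.

```c
#include <assert.h>

/* Loop parameters as seen by one rank (field names taken from the post). */
struct loop_params {
    int stepsize;
    int maxsize;
    int pingcount;
};

/* Number of MPI_Send or MPI_Recv calls the posted loop nest would make. */
static long count_calls(struct loop_params p)
{
    long calls = 0;
    int i, j;

    assert(p.stepsize > 0);   /* a non-positive step would not terminate normally */
    for (i = p.stepsize; i < p.maxsize; i += p.stepsize)
        for (j = 0; j < p.pingcount; j++)
            calls++;
    return calls;
}

/* Message size (in chars) used by the k-th call, counting from 0. */
static long size_of_call(struct loop_params p, long k)
{
    assert(p.pingcount > 0);
    return (long)p.stepsize * (1 + k / p.pingcount);
}

/* Hypothetical helper: 1 if sends and receives pair up, 0 otherwise. */
static int loops_match(struct loop_params sender, struct loop_params receiver)
{
    long n = count_calls(sender);
    long k;

    if (n != count_calls(receiver))
        return 0;             /* one side could be left waiting */
    for (k = 0; k < n; k++)
        if (size_of_call(sender, k) > size_of_call(receiver, k))
            return 0;         /* message longer than the receive count */
    return 1;
}
```

With identical parameters on both sides, for example stepsize 1024,
maxsize 8192 and pingcount 10 (assumed values), each side makes 70
calls and loops_match returns 1. Changing pingcount on one side only
makes it return 0.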

Thanks in advance,
	Mathias

-- 
Mathias Waack		|     mathias@mufasa.informatik.uni-mannheim.de
Tel.:  +49 621 292 1620  Fax.:  +49 621 292 5597

