From: Jim Tuccillo <jjt@radia1.com>
Newsgroups: comp.parallel.mpi
Subject: Re: Performance of overlapped communication and computation
Date: Fri, 08 Jan 1999 10:55:55 -0500
Organization: Posted via RemarQ, http://www.remarQ.com - Discussions start
    here!
Message-Id: <36962A8B.2781@radia1.com>
References: <76tkr7$efc$1@lux.doc.ic.ac.uk>
    <76tn46$ni3$1@pegasus.csx.cam.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


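On the IBM SP, the communication is completely overlapped: each reported
wait is well under a millisecond (taking the timings to be in seconds),
and the 10 iterations finish in about 40.01 seconds, which is essentially
the pure computation time. See the output below.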

[1] starting...
[0] starting...
[0] iteration 0
[1] iteration 0
wait = 3.40424e-05
wait = 0.000110851
[1] iteration 1
[0] iteration 1
wait = 3.3617e-05
wait = 3.87235e-05
[0] iteration 2
[1] iteration 2
wait = 3.14892e-05
wait = 3.80853e-05
[0] iteration 3
[1] iteration 3
wait = 3.48934e-05
wait = 4.74469e-05
[0] iteration 4
[1] iteration 4
wait = 3.23404e-05
wait = 3.37232e-05
[1] iteration 5
[0] iteration 5
wait = 3.32976e-05
wait = 3.56382e-05
[0] iteration 6
[1] iteration 6
wait = 3.28724e-05
wait = 3.54254e-05
[1] iteration 7
[0] iteration 7
wait = 3.12766e-05
wait = 3.56382e-05
[0] iteration 8
[1] iteration 8
wait = 3.64895e-05
wait = 3.79789e-05
[1] iteration 9
[0] iteration 9
wait = 3.19148e-05
wait = 3.81917e-05
[0] all done, tick = 17933592, time = 40.0098
[1] all done, tick = 17889158, time = 40.0091
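
As a rough check on the numbers above, here is a minimal sketch of the
arithmetic, not part of the original test program. It assumes the "wait ="
values and the final "time =" are in seconds, and that each iteration does
4 seconds of computation as William describes. The names exposed_seconds,
total_wait, sp_waits and sp_wait_count are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Wall time beyond pure computation, in seconds: communication (plus
 * start-up and timing overhead) that was NOT hidden behind the compute. */
double exposed_seconds(double wall_time, int iterations, double compute_per_iter)
{
    assert(iterations > 0 && compute_per_iter > 0.0);
    return wall_time - (double)iterations * compute_per_iter;
}

/* Sum of the reported wait timings, in the same unit as the inputs. */
double total_wait(const double *waits, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += waits[i];
    return sum;
}

/* The 20 "wait =" values from the log above, both ranks interleaved
 * (assumed to be seconds). */
const double sp_waits[] = {
    3.40424e-05, 0.000110851, 3.3617e-05,  3.87235e-05,
    3.14892e-05, 3.80853e-05, 3.48934e-05, 4.74469e-05,
    3.23404e-05, 3.37232e-05, 3.32976e-05, 3.56382e-05,
    3.28724e-05, 3.54254e-05, 3.12766e-05, 3.56382e-05,
    3.64895e-05, 3.79789e-05, 3.19148e-05, 3.81917e-05
};
const size_t sp_wait_count = sizeof sp_waits / sizeof sp_waits[0];
```

Calling exposed_seconds(40.0098, 10, 4.0) gives the time rank 0 spent on
anything other than the nominal 40 seconds of computation, and
total_wait(sp_waits, sp_wait_count) gives the combined time both ranks
spent in the wait. Under the assumptions above, both are tiny next to the
40 second run, which is what complete overlap should look like.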


Nick Maclaren wrote:
> 
> In article <76tkr7$efc$1@lux.doc.ic.ac.uk>,
> William Knottenbelt <wjk@pluto.doc.ic.ac.uk> wrote:
> >
> >I am trying to overlap communication and computation on a Fujitsu
> >AP3000 distributed-memory parallel computer (runs Solaris 2.5.1).
> >In particular, I am using MPI_Irecv and MPI_Isend calls to exchange
> >vectors between 2 processors while they each do some "computation"
> >(here 4 seconds worth which should be more than enough time for
> >the communication). I do 10 iterations of this process, so the
> >whole thing should take 40 seconds if the computation and communication
> >are overlapped nicely.
> 
> I don't know that system, so my comments may be wrong.
> 
> >Unfortunately none of the communication seems to be happening during
> >the computation loop - it all seems to happen at the MPI_Wait() statement
> >which happens after the computation. The result is not much better
> >than using blocking sends and receives.
> >
> >Is it reasonable to expect the communication to occur during the
> >computation - or am I expecting too much? Or is it supposed to happen
> >but I have a lousy MPI implementation? I'd very much appreciate
> >any comments, and I attach my code below in case I'm doing something
> >silly (or if you want to test it on your own machine/MPI version since
> >I'd be very interested in the results)!
> 
> It is reasonable but unrealistic.  MPI communication is not simple,
> and cannot usually be done solely by DMA even when that hardware
> exists - it needs a CPU to reorganise the data, control the transfer
> and so on.  So almost all MPI implementations on machines with one
> processor per node are effectively "always blocking", and most are
> even when there is more than one processor per node.
> 
> Regards,
> Nick Maclaren,
> University of Cambridge Computing Service,
> New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
> Email:  nmm1@cam.ac.uk
> Tel.:  +44 1223 334761    Fax:  +44 1223 334679

-- 
=========================================================================
Jim Tuccillo			jjt@radia1.com
IBM				tuccillo@us.ibm.com
415 Loyd Road			voice: 770.487.6694, fax: same, call first
Peachtree City, GA 30269	http://www.radia1.com/jjt1
=========================================================================

