From: Jim Tuccillo <jjt@radia1.com>
Newsgroups: comp.parallel.mpi
Subject: Re: Performance of overlapped communication and computation
Date: Sat, 09 Jan 1999 07:45:20 -0500
Organization: Posted via RemarQ, http://www.remarQ.com - Discussions start
    here!
Message-Id: <36974F60.59E2@radia1.com>
References: <76tkr7$efc$1@lux.doc.ic.ac.uk>
    <772v8p$8q7$1@nnrp1.dejanews.com>
    <3695C224.4557861A@mufasa.informatik.uni-mannheim.de>
    <77505v$749$1@lux.doc.ic.ac.uk> <775lro$t00$1@walter-fddi.cray.com>
    <7767ip$qq7$1@lux.doc.ic.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hi William,

I'm not sure if you saw my previous post regarding the SP. The CPU
"overhead" is about 3% as determined by looking at the number of "ticks"
with and without communication. As indicated by others, despite the
presence of hardware to handle the communications in the SP, some work
must still be done by the CPU to process the communication protocol,
which executes in user space.
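
To make the tick comparison concrete, here is a minimal sketch of how an
overhead figure like that can be derived from two runs of a test loop. The
function name and the tick counts are hypothetical, chosen only so that the
result comes out at 3%; they are not the measured SP numbers.

```python
def cpu_overhead(ticks_without_comm, ticks_with_comm):
    """Fraction of loop progress lost while communication is in flight.

    Both arguments are loop-iteration counts ("ticks") taken over the same
    wall-clock interval, one run with no messages and one run with them.
    """
    return (ticks_without_comm - ticks_with_comm) / ticks_without_comm


# Hypothetical tick counts, for illustration only.
print(f"{cpu_overhead(1_000_000, 970_000):.1%}")  # prints 3.0%
```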

Your test code can exhibit different behavior depending on the system. I
believe that, by and large, most systems will "stall" on the MPI_WAIT
because there is no guarantee that issuing an asynchronous MPI send
and receive will actually cause data to start transferring immediately. A
system may be busy doing your computational loop and then "stall" on the
MPI_WAIT as it waits for the communication to complete. On the SP, the
application can be signaled by the communications subsystem when
communication is ready to complete, so that the communication can be
completed during your computational loop. This is why the total time on
the SP is what you perceive to be correct: 40 seconds. There is some
overhead, as manifested by fewer "ticks" than without communications.
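
The two behaviors can be contrasted with a toy timing model. This is only a
sketch under simplifying assumptions, not a measurement of any real system:
the function name is hypothetical, the 10-second transfer time is invented
for illustration, and the small CPU overhead is ignored.

```python
def total_time(compute_s, comm_s, progresses_during_compute):
    """Wall-clock time for: post nonblocking send/recv, compute, then wait.

    Toy model: if the transfer advances while the loop runs, the wait only
    covers whatever is left of it; otherwise the whole transfer is paid
    for inside the wait.
    """
    if progresses_during_compute:
        remaining_comm = max(0.0, comm_s - compute_s)
    else:
        remaining_comm = comm_s
    return compute_s + remaining_comm


# Hypothetical durations: 40 s of computation, 10 s of data transfer.
print(total_time(40.0, 10.0, progresses_during_compute=True))   # 40.0
print(total_time(40.0, 10.0, progresses_during_compute=False))  # 50.0
```

In the first case the wait returns at once and the run takes only as long
as the computation. In the second, the transfer time is added on top.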

In summary, on the SP, the communications and computations are overlapped
as you would expect, BUT the communications are not free.

Regards,
  Jim 


> I have to say Jim's results on the IBM SP definitely appear to be the
> best so far - I'd be interested to see more results on SP to see if it
> does the transfer without much CPU overhead (by comparing the reported
> tick count with non-blocking send/receive to the tick count without
> any sending/receiving). If it does then IBM would appear to have
> got the right end of the stick many moons ago.

=========================================================================
Jim Tuccillo			jjt@radia1.com
IBM				tuccillo@us.ibm.com
415 Loyd Road			voice: 770.487.6694, fax: same, call first
Peachtree City, GA 30269	http://www.radia1.com/jjt1
=========================================================================