From: Lars Rzymianowicz <lr@mufasa.informatik.uni-mannheim.de>
Newsgroups: comp.parallel.mpi
Subject: Re: Performance of overlapped communication and computation
Date: Fri, 08 Jan 1999 09:30:28 +0100
Organization: Dept. of Computer Engineering, University of Mannheim, Germany
Message-Id: <3695C224.4557861A@mufasa.informatik.uni-mannheim.de>
References: <76tkr7$efc$1@lux.doc.ic.ac.uk>
    <76tn46$ni3$1@pegasus.csx.cam.ac.uk> <7705o8$jf$1@lux.doc.ic.ac.uk>
    <770p7a$cbm$1@nnrp1.dejanews.com> <77276u$rfp$1@lux.doc.ic.ac.uk>
    <772v8p$8q7$1@nnrp1.dejanews.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


don_schad@my-dejanews.com wrote:
> Perhaps you might not want to bother.  Here are the results of running
> the 3 versions of your code on a 4-CPU SGI Challenge RS-10000 super-dooper
> computer:
> 
> isend/irecv/wait:
> [0] all done, tick = 4017432, time = 42.8943
> [1] all done, tick = 4022494, time = 42.8962
> 
> send/recv blocking
> 
> [0] all done, tick = 4027718, time = 42.6865
> [1] all done, tick = 4034738, time = 42.7301
> 
> MPI_Sendrecv()
> 
> [0] all done, tick = 4015940, time = 42.8822
> [1] all done, tick = 4015395, time = 42.8825
>
> For some reason the blocking version seems to run faster,
> but perhaps this is due to fewer function calls or something?
> The difference is trivial (I repeated runs and we see the
> same thing).

Hi all,
I think the problem here is that the test program is not an adequate
benchmark for blocking vs. non-blocking calls. What we have is an app
where all processes do the same amount of work, and all tests were run
on equally powerful nodes. So all nodes reach their MPI calls at more
or less the same time, and a blocking call hardly ever has to wait for
its partner.
Non-blocking calls can help a lot in a more heterogeneous environment
(different CPU clocks, RAM sizes ...), and when processes have different
amounts of work. Think of a master-slave model where the master distributes
differently sized portions of data to the slaves. You wouldn't use blocking
calls there, would you?
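
To put rough numbers on the master-slave point, here is a toy timing
model in Python. It is not MPI code and not the test program from this
thread. The function names, the speeds list and the equal task cost are
all assumptions made for illustration. The "static" case stands for a
master that collects results with blocking receives in fixed rank order,
so every round lasts as long as the slowest worker. The "dynamic" case
stands for a master that posts non-blocking receives and serves whichever
worker finishes first (the MPI_Waitany style).

```python
import heapq


def makespan_static(task_cost, speeds, n_tasks):
    """Blocking master, fixed rank order (hypothetical model).

    Each round, the first `batch` workers get one task each, and the
    round ends only when the slowest of them has finished.
    """
    n_workers = len(speeds)
    total = 0.0
    remaining = n_tasks
    while remaining > 0:
        batch = min(remaining, n_workers)
        total += max(task_cost / speeds[w] for w in range(batch))
        remaining -= batch
    return total


def makespan_dynamic(task_cost, speeds, n_tasks):
    """Non-blocking master (hypothetical model).

    The next task always goes to the worker that becomes free first,
    which is what serving completed requests as they arrive gives you.
    """
    # Heap entries are (time at which the worker is free, worker rank).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(free_at)
        done = t + task_cost / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, w))
    return finish


if __name__ == "__main__":
    # Two equal workers: both schemes take the same time.
    print(makespan_static(1.0, [1.0, 1.0], 4), makespan_dynamic(1.0, [1.0, 1.0], 4))
    # One worker four times faster: the dynamic scheme finishes sooner.
    print(makespan_static(1.0, [1.0, 4.0], 8), makespan_dynamic(1.0, [1.0, 4.0], 8))
```

With two equal workers both schemes give 2.0 in this model, which matches
the "no difference" result quoted above. With one worker four times
faster, the static scheme needs 4.0 and the dynamic one 2.0. The model
ignores message cost entirely, so treat the numbers as an illustration only.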

In general I would say: if you don't need synchronization, prefer
non-blocking calls. The effect is even larger with HPC networks like
Myrinet, ServerNet etc., where much of the communication work is done by
the network interface rather than the CPU.
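
The network-interface argument can be sketched the same way. This is an
idealized model in Python, not a measurement: t_compute, t_comm and
cpu_share are hypothetical parameters, where cpu_share is the fraction
of the communication time during which the CPU is still busy with the
transfer (0.0 means the interface does all of it).

```python
def step_time_blocking(t_compute, t_comm):
    """Blocking calls: compute and communicate strictly one after the other."""
    return t_compute + t_comm


def step_time_overlapped(t_compute, t_comm, cpu_share=0.0):
    """Non-blocking calls with overlap (idealized assumption).

    The CPU is busy for the computation plus its share of the transfer;
    the transfer itself takes t_comm. The step ends when both are done.
    """
    cpu_busy = t_compute + cpu_share * t_comm
    return max(cpu_busy, t_comm)


if __name__ == "__main__":
    for share in (1.0, 0.5, 0.0):
        print(share, step_time_overlapped(10.0, 4.0, share))
```

With cpu_share = 1.0 the overlapped step costs the same as the blocking
one. The less of the transfer the CPU has to handle, the more of the
communication time disappears behind the computation.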

Lars
-- 
Homepage: http://mufasa.informatik.uni-mannheim.de/lsra/persons/lars/

