From: wjk@pluto.doc.ic.ac.uk (William Knottenbelt)
Newsgroups: comp.parallel.mpi
Subject: Re: Performance of overlapped communication and computation
Date: 8 Jan 1999 13:11:27 GMT
Organization: Dept. of Computing, Imperial College, University of London, UK.
Message-Id: <77505v$749$1@lux.doc.ic.ac.uk>
References: <76tkr7$efc$1@lux.doc.ic.ac.uk> <76tn46$ni3$1@pegasus.csx.cam.ac.uk> <7705o8$jf$1@lux.doc.ic.ac.uk> <770p7a$cbm$1@nnrp1.dejanews.com> <77276u$rfp$1@lux.doc.ic.ac.uk> <772v8p$8q7$1@nnrp1.dejanews.com> <3695C224.4557861A@mufasa.informatik.uni-mannheim.de>

Hi Lars

: i think, the problem is here, that the test program is not an adequate
: benchmark for blocking vs non-blocking calls. What we have is an app,
: where all processes do the same amount of work, and all tests were run
: with equal-strong nodes. So all nodes will reach their MPI calls at more
: or less the same time...

Thanks for your comments. My problem is that the non-blocking send
(MPI_Isend) takes just as long as a blocking send. By "time taken" I
mean the duration of the MPI_Isend() call itself, not of any subsequent
MPI_Wait(); i.e. the "non-blocking" call seems to block for just as long
as the blocking one.

The non-blocking receive seems to work as expected: the call returns
almost immediately, and data can be received in the background while
other computation proceeds.

Note that the data received/sent is not needed by the *current*
iteration of the computation loop (it is used by the next one), so it
would be handy to overlap the two.

: Non-blocking calls can help a lot in a more heterogenous environment
: (different CPU clock, RAM size ...). And if processes have different
: amount of work.

Yes, I agree they are very useful in that situation, but they should be
just as useful in the equal-work situation you described in your first
paragraph. Why can't the communication proceed in the background during
the fixed computation time?
If you use blocking calls, you are forcing yourself to wait for data to
be sent/received even though it is not needed in the current iteration
of the computation loop. See the threading experiment I outlined in my
previous post to Don for an idea of what I'm trying to achieve.

Best regards

-- 
William Knottenbelt
Department of Computing
Imperial College
180 Queens Gate
South Kensington
LONDON SW7 2BZ