Newsgroups: comp.parallel.mpi
From: Richard Barrett <rbarrett@lanl.gov>
Subject: Re: MPI vs. shmem
Organization: Los Alamos National Laboratory
Date: Wed, 01 Apr 1998 09:18:17 -0700
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <352268C9.67C5@lanl.gov>

> By learning shmem: 
> 1) My code runs faster than using MPI; 
> 2) I learned a great deal more about how the machine works;

Exactly. In fact, you really just found out how MPI works
on the T3D: MPI is implemented using SHMEM. (Obvious, since
SHMEM is how the T3D moves data from one processor to another.)
Since MPI is a send/recv model, each message takes two or three
shmem calls to accomplish.
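
A toy model of that idea, in Python rather than real SHMEM calls, may
make the "two or three" concrete. It is only a sketch under assumptions:
threads stand in for processors, and the names PE, put, send and recv
are made up for illustration. It is not the actual T3D MPI protocol,
which I have not seen. The sender writes the payload into the receiver's
memory, writes a flag to say it has landed, and the receiver writes an
acknowledgement back.

```python
import threading

class PE:
    """Toy processing element: memory that other PEs may write directly."""
    def __init__(self, nwords):
        self.buf = [0] * nwords          # landing area for a payload
        self.ready = threading.Event()   # set by the sender: payload has landed
        self.done = threading.Event()    # set by the receiver: buffer consumed

puts = []  # log of one-sided writes, kept only for illustration

def put(kind, action):
    """Model one one-sided write; the target does nothing to receive it."""
    puts.append(kind)
    action()

def send(me, dest, payload):
    def write_payload():
        dest.buf[:len(payload)] = payload
    put("data", write_payload)     # 1st put: the message itself
    put("flag", dest.ready.set)    # 2nd put: tell the receiver it arrived
    me.done.wait()                 # block until the receiver acknowledges
    me.done.clear()

def recv(me, src, n):
    me.ready.wait()                # wait for the sender's flag
    out = me.buf[:n]
    me.ready.clear()
    put("ack", src.done.set)       # 3rd put: acknowledgement to the sender
    return out

def demo():
    puts.clear()
    a, b = PE(4), PE(4)
    result = []
    t = threading.Thread(target=lambda: result.append(recv(b, a, 3)))
    t.start()
    send(a, b, [1, 2, 3])
    t.join()
    return result[0], list(puts)
```

Calling demo() moves one three-word message and logs three one-sided
writes for it. Dropping the acknowledgement would give the two-put
variant, at the price of the sender not knowing when the buffer is free.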

BTW, it should be noted that simply claiming shmem_put has a
3-4 us latency while MPI is 12-15 us (?) does not necessarily
identify what's happening in a code. The use of shmem often
requires barriers as well as cache flushes, so that
time should be included. So a true measure of shmem vs. MPI would
be the wall clock time of a quality application implemented
both ways. (Also note that shmem_get is slower than put because it
is implemented using put.)
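
A back-of-the-envelope sketch of why the synchronization time matters.
The put and MPI latencies are the rough figures quoted above; the
barrier-plus-flush cost (sync_us) is a made-up number for illustration,
not a measurement, and the function name is hypothetical. The model
assumes one barrier and flush can cover a batch of puts.

```python
def per_message_cost(put_us, sync_us, puts_per_sync):
    """Effective cost of one put when one barrier + cache flush
    (sync_us, an assumed figure) is shared by puts_per_sync puts."""
    return put_us + sync_us / puts_per_sync

# Assumed numbers: 4 us per put, 16 us for barrier + flush.
one_sync_per_put = per_message_cost(4.0, 16.0, 1)    # sync after every put
one_sync_per_eight = per_message_cost(4.0, 16.0, 8)  # sync after 8 puts
```

Under these assumed numbers a code that synchronizes after every put
pays 20 us per message, worse than the 12-15 us MPI figure, while one
that batches eight puts per synchronization pays 6 us. Which case
applies depends on the application, hence the wall clock comparison.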

Anyway, it seems reasonable that if you have the time and inclination
to use a faster data movement mechanism, you would use it.


Richard

[Please note that this is a rough sketch of what I've been told
is happening on the T3D. Perhaps an implementor or two will
correct me or add details.]