Newsgroups: comp.parallel.mpi
From: Erik Demaine <eddemain@daisy.uwaterloo.ca>
Subject: IDEA: MPI testbed
Organization: University of Waterloo
Date: Wed, 5 Mar 1997 20:03:25 GMT
Message-ID: <E6L6Dq.MIz@undergrad.math.uwaterloo.ca>

Good day,

As we all know, the development/testing/debugging/tuning stage is a big part of
parallel programming.  Typically, supercomputers and other parallel resources are
not available for this stage, only for the getting-results/jumping-for-joy/
using-the-application phase.  Sometimes, parallel resources (e.g., networks of
workstations) are available, but are usually loaded (people mind if you suck
the system dry of CPU cycles) and often are not "massive" enough.  That is,
they don't have as many processors as you'd like to test your application
with.  Some applications may not work if you use 31 processors because it's
the third Mersenne prime, etc.

Wouldn't it be nice if you could test MPI applications on your (or somebody
else's) workstation?  You can with existing MPI implementations, e.g., MPICH
and LAM.  Unfortunately:

        - these implementations use UNIX processes.  There are severe limits
          on the number of UNIX processes a user can own (16 or 32 is often
          out of the question), and the task-switching cost is large.

        - MPICH (I don't know about LAM) uses polling for communication, which
          will perform badly for a high process-to-processor ratio.

A nice goal to set would be: the n-processor version of an application should
run (on a single processor) almost as fast as the 1-processor version, assuming
the two versions solve the same problem.  Basically we want to minimize the
task-switching and communication overhead, given that all the processes will be
on a single processor.

A solution that I suggest is to make a multi-threaded (single address space)
implementation of MPI.  Task-switching costs are minimal.  Communication
(through shared storage) is fast.

The approach I was thinking of taking is to use user-level threads, and
task-switch at calls to MPI.  A similar project has been done for PVM by Manuel
Mollar Villanueva.  Here overheads should be absolutely minimal, assuming I
implement things right, and the system would be completely portable (assuming
you have setjmp() or whatever's necessary).
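
A minimal sketch of the user-level idea, in Python for brevity (a real
implementation would more likely be C, with setjmp() or similar doing the
context switches).  Generators stand in for user-level threads, and control
changes hands only at a send or receive.  The names run and ring and the
("send", ...) / ("recv", ...) tuples are made up for illustration; none of
this is MPI's actual API.

```python
from collections import deque


def run(bodies):
    """Run generator-based 'processes' on one OS thread.

    A process gives up the CPU only when it yields a ("send", dest, payload)
    or ("recv", src) request, i.e. at its communication calls.
    """
    size = len(bodies)
    procs = [body(rank, size) for rank, body in enumerate(bodies)]
    mailbox = {}    # (src, dest) -> deque of undelivered payloads
    waiting = {}    # rank -> src that rank is blocked receiving from
    results = {}    # rank -> return value of that process
    ready = deque((rank, None) for rank in range(size))

    while ready:
        rank, value = ready.popleft()
        try:
            op = procs[rank].send(value)    # resume until the next request
        except StopIteration as stop:
            results[rank] = stop.value
            continue
        if op[0] == "send":
            _, dest, payload = op
            if waiting.get(dest) == rank:
                # Receiver is already blocked on us: hand over directly.
                del waiting[dest]
                ready.append((dest, payload))
            else:
                mailbox.setdefault((rank, dest), deque()).append(payload)
            ready.append((rank, None))
        else:
            _, src = op
            pending = mailbox.get((src, rank))
            if pending:
                ready.append((rank, pending.popleft()))
            else:
                waiting[rank] = src         # blocked; not rescheduled yet
    if waiting:
        raise RuntimeError("deadlock: ranks %s never received" % sorted(waiting))
    return results


def ring(rank, size):
    """Pass a token once around the ring; each rank adds its own number."""
    if rank == 0:
        yield ("send", 1 % size, 0)
        total = yield ("recv", size - 1)
        return total
    token = yield ("recv", rank - 1)
    yield ("send", (rank + 1) % size, token + rank)
    return None
```

With this sketch, run([ring] * 31) simulates 31 "processors" in one ordinary
process, and rank 0 ends up with 0 + 1 + ... + 30 = 465.  A blocked receiver
is simply left off the ready queue, so nothing polls.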

An alternative approach is to use POSIX or other threads.  This would be
portable to any system that supports these threads.  Overheads should be
reasonable, but slightly higher.  One advantage is you can exploit SMP
workstations because of system-level threads.
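
The same ring test under the system-level-thread approach might look roughly
like this.  Again it is Python for brevity, with Python's threading module
standing in for POSIX threads; World, launch and ring_sum are hypothetical
names, not part of MPI or any existing library.  One caveat: CPython's global
interpreter lock keeps these threads from running Python code in parallel, so
this shows the structure only, not the SMP speedup.

```python
import queue
import threading


class World:
    """Hypothetical shared-memory 'communicator': one queue per (src, dest)."""

    def __init__(self, size):
        self.size = size
        self.inbox = {(s, d): queue.Queue()
                      for s in range(size) for d in range(size)}

    def send(self, src, dest, payload):
        self.inbox[(src, dest)].put(payload)

    def recv(self, src, dest):
        # Blocks the calling thread until a message arrives; no polling.
        return self.inbox[(src, dest)].get()


def launch(body, size):
    """Start one thread per rank, wait for all, return their results."""
    world = World(size)
    results = [None] * size

    def runner(rank):
        results[rank] = body(world, rank)

    threads = [threading.Thread(target=runner, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def ring_sum(world, rank):
    """Pass a token once around the ring; each rank adds its own number."""
    nxt = (rank + 1) % world.size
    prev = (rank - 1) % world.size
    if rank == 0:
        world.send(0, nxt, 0)
        return world.recv(prev, 0)
    token = world.recv(prev, rank)
    world.send(rank, nxt, token + rank)
    return None
```

Here launch(ring_sum, 8) runs eight ranks as eight threads, and rank 0's
result is 0 + 1 + ... + 7 = 28.  The scheduling is now the operating system's
job, which is where the slightly higher overhead comes from.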

Comments or suggestions on the basic idea or either of these approaches?
Or can you suggest other approaches?

One major problem I see with both approaches: applications will have to be
thread-safe.  Each process will use the same executable core and address
space.  If the application keeps per-process state in global data, all the
processes end up sharing one copy and you're in trouble.  I don't
know for sure, but I think that the MPI standard mandates that each process
should have its own address space.  Hence, it wouldn't be an "implementation
of MPI," but rather something else.
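
To make the global-data problem concrete, here is a small sketch (Python
threads again, hypothetical names throughout).  Each "process" stores its rank
in an ordinary global and in a thread-local variable, waits until everyone has
done the same, and then reads both back.  Thread-local storage is shown only
as one possible workaround under the threads approach; it would still require
changing the application.

```python
import threading

shared_state = {"rank": None}        # ordinary global: one copy for everyone
private_state = threading.local()    # thread-local: one copy per thread


def body(rank, barrier, seen_shared, seen_private):
    shared_state["rank"] = rank      # every rank overwrites the same slot
    private_state.rank = rank        # every rank writes its own slot
    barrier.wait()                   # all writes finish before any read
    seen_shared[rank] = shared_state["rank"]
    seen_private[rank] = private_state.rank


def demo(size):
    """Return what each rank read back from the global and from its local."""
    barrier = threading.Barrier(size)
    seen_shared = [None] * size
    seen_private = [None] * size
    threads = [
        threading.Thread(target=body,
                         args=(r, barrier, seen_shared, seen_private))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_shared, seen_private
```

With demo(4), the thread-local readings come back as [0, 1, 2, 3], while the
global readings are all the same value (whichever rank wrote last), so at most
one "process" sees the rank it thinks it has.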

Any suggestions on how to get around this, in either approach?  Perhaps there
is a way to get multiple address spaces reasonably cheaply?

I'll post a summary if I get any e-mail-only (unposted) responses.

Erik
-- 
Erik Demaine                 ()  e-mail: eddemain@daisy.uwaterloo.ca
Dept. of Computer Science    ||  URL: http://daisy.uwaterloo.ca/~eddemain/
University of Waterloo       ||  PGP key: finger me.  "Maturity is switching
Waterloo, ON Canada N2L 3G1  ()  from passive voice to active voice" -P. Alder