Newsgroups: comp.parallel.mpi
From: Erik Demaine
Subject: IDEA: MPI testbed
Organization: University of Waterloo
Date: Wed, 5 Mar 1997 20:03:25 GMT
Message-ID:

Good day,

As we all know, the development/testing/debugging/tuning stage is a big part of parallel programming. Typically, supercomputers and other parallel resources are not available for this stage. They are available only for the getting-results/jumping-for-joy/using-the-application phase. Sometimes parallel resources (e.g., networks of workstations) are available, but they are usually loaded (people mind if you suck the system dry of CPU cycles) and often are not "massive" enough. That is, they don't have as many processors as you'd like to test your application with. Some applications may fail only with particular processor counts, say 31 because it's the third Mersenne prime.

Wouldn't it be nice if you could test MPI applications on your (or somebody else's) workstation? You can with existing MPI implementations, e.g., MPICH and LAM. Unfortunately:

- These implementations use UNIX processes. There are severe limits on the number of UNIX processes a user can own (16 or 32 processes is often out of the question), and the task-switching cost is large.
- MPICH (I don't know about LAM) uses polling for communication, which will perform badly with a high process-to-processor ratio, because every waiting process keeps burning CPU time checking for messages.

A nice goal to set would be: the n-processor version of an application should run (on a single processor) almost as fast as the 1-processor version, assuming the two versions solve the same problem. Basically, we want to minimize the task-switching and communication overhead, given that all the processes will be on a single processor.

A solution that I suggest is to make a multi-threaded (single-address-space) implementation of MPI. Task-switching costs are minimal, and communication (through shared storage) is fast. The approach I was thinking of taking is to use user-level threads, and to task-switch at calls to MPI.
A similar project has been done for PVM by Manuel Mollar Villanueva. With user-level threads, overheads should be absolutely minimal, assuming I implement things right, and the system would be completely portable (assuming you have setjmp() or whatever's necessary).

An alternative approach is to use POSIX or other threads. This would be portable to any system that supports these threads. Overheads should be reasonable, but slightly higher. One advantage is that you can exploit SMP workstations, because these are system-level threads.

Comments or suggestions on the basic idea or either of these approaches? Or can you suggest other approaches?

One major problem I see with both approaches: applications will have to be thread-safe. Every process will share the same executable image and address space, so if you use global data on each process, you're in trouble. I don't know for sure, but I think that the MPI standard mandates that each process should have its own address space. Hence, it wouldn't be an "implementation of MPI," but rather something else. Any suggestions on how to get around this, in either approach? Perhaps there is a way to get multiple address spaces reasonably cheaply?

I'll post a summary if I get e-mail-only (unposted) responses.

Erik
--
Erik Demaine, Dept. of Computer Science, University of Waterloo
Waterloo, ON Canada N2L 3G1
e-mail: eddemain@daisy.uwaterloo.ca
URL: http://daisy.uwaterloo.ca/~eddemain/
PGP key: finger me.
"Maturity is switching from passive voice to active voice" -P. Alder