- DateofInformation
- 21 February 1999
- Summary
- The OVERFLOW-MLP code developed in the NAS Systems
Division is based on the standard C90 parallel-vector
version of OVERFLOW. The total time to generate this
new parallel version and correctly execute three widely
varying test cases was around three weeks. About 250
lines of code were inserted or changed in the base
version to produce the MLP code. Most changes came from the addition of
four small subroutines. The OVERFLOW-MLP code organizes
the calculations such that groups of zones are processed
by groups of CPUs in parallel. An initial distribution
of zones is made across a user-defined number of CPU
groups so that the work is approximately equal. The
number of CPUs in each group is adjusted to further
load-balance the work. Load balancing is fully automatic
and dynamic in time. During the solution process, each
group of CPUs advances the time level for its assigned
zones until the solution converges to the degree needed.
To fully stress-test both the MLP methodology and the
Origin2000 system, a very large test case was selected.
The test dataset was provided by Karlin Roth (high-lift
CFD team lead, NASA Ames Applied Computational Aerodynamics
Branch) and consisted of 153 zones totaling over 33
million grid points. The important and exciting aspect
of this work is that the Origin2000 performs this very
large calculation at a rate that is 72 percent of the
full dedicated C90. Scaling to 64 CPUs is completely
linear. There is none of the tailing-off that is frequently
seen when some aspect of a parallel system's architecture
(or the numerical algorithms) begins to impinge on
performance. "Back-of-the-envelope" calculations and
test executions of portions of the problem as it might
execute on larger systems indicate that this test case
may scale to several hundred CPUs. The code is not
very efficient in its reuse of level-1 cache, and subsequent
speedups can be expected with some tuning effort in
this area. It is estimated that two work-months of
optimization will result in a code that could be two
times faster on the large test problem.
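The initial zone-to-group distribution described above (assigning zones to a user-defined number of CPU groups so the work is approximately equal) can be sketched with a simple greedy largest-first heuristic. This is an illustrative assumption for exposition, not OVERFLOW-MLP's actual balancing algorithm, and the zone sizes below are invented:

```python
def distribute(zone_sizes, ngroups):
    """Greedy largest-first assignment of zones to CPU groups.

    zone_sizes -- work estimate per zone (e.g. grid points)
    ngroups    -- number of CPU groups
    Returns (groups, loads): zone indices per group and total load per group.
    """
    groups = [[] for _ in range(ngroups)]
    loads = [0] * ngroups
    # Visit zones from largest to smallest; place each in the
    # currently least-loaded group so loads stay roughly equal.
    for zid, size in sorted(enumerate(zone_sizes), key=lambda z: -z[1]):
        g = loads.index(min(loads))
        groups[g].append(zid)
        loads[g] += size
    return groups, loads

# Hypothetical zone sizes, two CPU groups:
groups, loads = distribute([9, 5, 4, 3, 2, 1], 2)
print(loads)  # both groups end up with equal work: [12, 12]
```

In the real code this static distribution is only the starting point; the number of CPUs within each group is then adjusted dynamically at run time to refine the balance.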
- MethodUsed
- Multi-Level Parallelism (MLP): The term MLP is generally
associated with shared-memory, multiprocessor architectures,
in which the shared memory features allow users to
dispense with message passing altogether. As its name
implies, code developed under MLP contains multiple
levels of parallelism. In general, this means fine-grained
parallelism at the loop level, with compilers generating
the parallel code, combined with coarse-grained parallelism
effected through the standard Unix fork system call.
Data that is truly "global" is shared
among the forked processes through standard shared-memory
arenas.
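The fork-plus-shared-arena pattern can be sketched in Python, using os.fork for the coarse level of parallelism and an anonymous shared mmap standing in for the shared-memory arena. The four worker processes and their partial results are illustrative assumptions, not OVERFLOW-MLP's actual structure:

```python
import mmap
import os
import struct

NPROCS = 4

# Anonymous MAP_SHARED mapping: the "arena" is visible to all
# forked children without any message passing.
arena = mmap.mmap(-1, NPROCS * 8)

pids = []
for rank in range(NPROCS):
    pid = os.fork()
    if pid == 0:
        # Child: compute a partial result (here just rank**2) and
        # deposit it at this rank's slot in the shared arena.
        struct.pack_into("d", arena, rank * 8, float(rank * rank))
        os._exit(0)
    pids.append(pid)

# Parent: wait for all workers, then reduce over the shared arena.
for pid in pids:
    os.waitpid(pid, 0)

total = sum(struct.unpack_from("d", arena, i * 8)[0]
            for i in range(NPROCS))
print(total)  # 0 + 1 + 4 + 9 = 14.0
```

The fine-grained (loop-level) half of MLP would sit inside each worker, where in the production code the compiler generates the parallel loops.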
- IsStudyOf
- OVERFLOW
- SupportedBy
- NASNews article on Overflow-MLP
- WasCreatedBy
- Jim Taft
- Name
- OVERFLOW-MLP Results on SGI Origin 2000