Contents of this PerformanceStudy file:
| Field Name | Value | HTML Tag Type |
|---|---|---|
| BIDM.PerformanceStudy.DateofInformation | 21 February 1999 | META |
| BIDM.PerformanceStudy.IsStudyOf.Asset | http://www.nhse.org/rib/repositories/arl_perf_studies/objects/Asset/overflow.html | LINK |
| BIDM.PerformanceStudy.MethodUsed | Multi-Level Parallelism (MLP): The term MLP is generally associated with shared-memory, multiprocessor architectures, in which the shared-memory features allow users to dispense with message passing altogether. As its name implies, code developed under MLP contains multiple levels of parallelism: fine-grained parallelism at the loop level, with compilers generating the parallel code, combined with coarse-grained parallelism implemented with the standard Unix fork system call. Data that is truly "global" is shared among the forked processes through standard shared-memory arenas. | META |
| BIDM.PerformanceStudy.Name | OVERFLOW-MLP Results on SGI Origin 2000 | META |
| BIDM.PerformanceStudy.Summary | The OVERFLOW-MLP code developed in the NAS Systems Division is based on the standard C90 parallel-vector version of OVERFLOW. The total time to generate this new parallel version and correctly execute three widely varying test cases was around three weeks. About 250 lines of code were inserted or changed relative to the base version; most of the changes came from the addition of four small subroutines. The OVERFLOW-MLP code organizes the calculations so that groups of zones are processed by groups of CPUs in parallel. An initial distribution of zones is made across a user-defined number of CPU groups so that the work is approximately equal, and the number of CPUs in each group is then adjusted to further load-balance the work. Load balancing is fully automatic and dynamic in time. During the solution process, each group of CPUs advances the time level for its assigned zones until the solution converges to the degree needed. To fully stress-test both the MLP methodology and the Origin 2000 system, a very large test case was selected. The test dataset, provided by Karlin Roth (high-lift CFD team lead, NASA Ames Applied Computational Aerodynamics Branch), consisted of 153 zones totaling over 33 million grid points. The important and exciting aspect of this work is that the Origin 2000 performs this very large calculation at a rate that is 72 percent of the full dedicated C90. Scaling to 64 CPUs is completely linear: there is none of the tailing-off frequently seen when some aspect of a parallel system's architecture (or the numerical algorithms) begins to impinge on performance. "Back-of-the-envelope" calculations and test executions of portions of the problem as it might execute on larger systems indicate that this test case may scale to several hundred CPUs. The code is not very efficient in its reuse of level-1 cache, so further speedups can be expected with some tuning effort in this area. It is estimated that two work-months of optimization would yield a code roughly twice as fast on the large test problem. | META |
| BIDM.PerformanceStudy.SupportedBy.PerformanceDocument | http://www.nhse.org/rib/repositories/arl_perf_studies/objects/PerformanceDocument/overflow_mlp_nasnews.html | LINK |
| BIDM.PerformanceStudy.WasCreatedBy.Organization | http://www.nhse.org/rib/repositories/arl_perf_studies/objects/Organization/jim_taft.html | LINK |