Crisis in High Performance Computing - Extended Abstract
11th September 1995
Lecture room G22 (also known as the Pearson Lecture Theatre)
Pearson Building
University College London
Gower Street
London WC1E 6BT
Extended Abstract
Efficiency levels on massively parallel super-computers have been
reported (e.g. in the NAS Parallel Benchmarks Results 3-95,
Technical Report NAS-95-011, NASA Ames Research Center, April
1995) as ranging from 50% for the
``embarrassingly parallel benchmarks'', through 20% for tuned
``real'' applications, past 10% for typical ``irregular''
applications and down to 3% when using a portable software
environment. Low efficiencies apply not only to the larger
system configurations (256 or 1024 nodes), but also to the
smaller ones (e.g. 16 nodes). Seven years ago, we would have been
disappointed with efficiency levels below 70% for any style of
application on the then state-of-the-art parallel
super-computers. What has caused this regression, and can it be
remedied?
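(``Efficiency'' is taken throughout in the usual sense of parallel
efficiency - an assumed reading, since the figures above are
quoted without a formal definition:

    E(P) = T(1) / (P x T(P))

where T(P) is the run-time on P processors; or, in performance
terms, sustained Mflop/s as a fraction of the aggregate peak of
the nodes used.)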
It seems to be proving difficult to build efficient
high-performance computer systems simply by taking very fast
processors and joining them together with very high bandwidth
interconnect. Apart from the need to keep the computational and
communication power in balance, it may also be essential to
reduce communication start-up costs (in line with increasing
bandwidth) and to reduce process context-switch time (in line
with increasing computational power). Failure in either of these
regards leads to coarse-grained parallelism, which may result in
insufficient parallel slackness to allow efficient use of
individual processing nodes, in potentially serious
cache-coherency problems for super-computing applications, and in
unnecessarily large worst-case latency guarantees for real-time
applications.
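To see how start-up cost sets the grain, consider the familiar
first-order model of point-to-point message cost (an illustrative
sketch only - the numbers below are invented round figures, not
measurements of any particular machine):

    T(n) = t0 + n/B        n_half = t0 x B

where t0 is the start-up latency, B the asymptotic bandwidth and
n_half the message length needed to reach half of that bandwidth.
With, say, t0 = 50 microseconds and B = 100 Mbytes/s, n_half is
5 Kbytes; a 1 Kbyte message then achieves barely a sixth of the
nominal bandwidth. Unless t0 falls in step with B, programmers
are pushed towards batching communication into large, infrequent
messages - that is, towards exactly the coarse grain described
above.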
A further cause of concern is the dwindling number of suppliers
of HPC technology that are still in the market. Will there be a
next generation of super-computers from the traditional sources?
Or will HPC users have to rely on products from the commercial
marketplace, in particular the PC Industry and the Games /
Consumer-Products Industries? If the latter, how will this
change the way we approach the design of HPC facilities and
applications?
At the other end of the spectrum, clusters of workstations are
reported as potentially offering good value for money, but only
for certain types of application (e.g. those with very high
compute/communicate ratios). What are those threshold ratios, and
how do we tell whether our application is above them? And what do
we do if it is not?
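As a rough guide (a first-order model, not a measured threshold):
if R is the ratio of compute time to communication time per node,
and the two do not overlap, the attainable efficiency is roughly

    E = R / (R + 1)

so 90% efficiency needs R >= 9 and 50% needs only R >= 1. Moving
an application from a dedicated HPC interconnect to a cluster
whose network is, say, a hundred times slower shrinks its R by
about the same factor - hence the restriction to applications
with very high compute/communicate ratios.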
Blame is often laid on the lack of software tools to support the
development of applications for high-performance architectures.
New standards have been introduced for parallel computing - in
particular, High Performance Fortran (HPF) and the Message
Passing Interface (MPI). Old standards stick around - e.g. the
Parallel Virtual Machine (PVM).
These standards raise two problems: depressed levels of
efficiency (this may be a temporary reflection of early
implementations) and a low-level hardware-oriented programming
model (HPF expects the world to be an array and processing
architectures to be a 2-D grid; MPI allows a free-wheeling view
of message-passing that is non-deterministic by default).
Neither standard allows the application developer to design and
implement systems in terms dictated by the application; bridging
the gap between the application and these hardware-oriented tools
remains a serious problem.
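To make the MPI point concrete, here is a minimal sketch in C
(illustrative only, not taken from any of the codes or benchmarks
mentioned above). Rank 0 gathers one message from every other
rank using MPI_ANY_SOURCE; the order in which those messages are
delivered, and hence the order of the output, is not determined
by the program text:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Accept messages in whatever order they arrive:
               that order may differ from run to run.          */
            for (i = 1; i < size; i++) {
                int value;
                MPI_Status status;
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("received %d from rank %d\n",
                       value, status.MPI_SOURCE);
            }
        } else {
            /* All other ranks send their rank number to rank 0. */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Any determinism has to be imposed by the programmer (for example,
by receiving from specific ranks in a fixed order); the standard
itself does not provide it.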
New pretenders, based upon solid mathematical theory and
analysis, are knocking on the door - such as Bulk Synchronous
Parallelism (BSP). Old pretenders, also based upon solid
mathematical theory and analysis and with a decade of industrial
application, lie largely unused and under-developed for
large-scale HPC - such as occam. Might either of these offer
some pointers to the future?
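For reference, the ``solid mathematical theory'' behind BSP is
essentially its cost model (stated here in its standard form). A
computation proceeds as a sequence of supersteps; a superstep in
which each processor performs at most w local operations and
sends or receives at most h messages costs

    w + h x g + l

where g is the communication cost per message under continuous
traffic and l the cost of the barrier synchronisation that closes
the superstep. Algorithms and machines can then be compared
through the small set of parameters (p, g, l).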
The above sections raise several interesting and contentious
issues. The aim of this workshop is to exercise and debate them
thoroughly, to see what people's real experiences have been and
to consider in what ways HPC needs to mature in order to become
viable. A major goal of the workshop is to start to identify
standards of ``good behaviour'' for software for parallel or
distributed systems that will:
- enable HPC hardware architectures to operate at much higher
efficiency levels;
- enable HPC applications to be developed in their own terms
without regard for the underlying hardware.
Or maybe the workshop will decide that:
- HPC architectures (hardware and software) do not have
fundamental problems;
- there are no lessons from the past that need re-discovery and
re-application;
- everything can be sorted out by better education and tools
for existing HPC standards.
Please come along and make this workshop work.
Peter Welch