Crisis in High Performance Computing - Extended Abstract
11th September 1995
Lecture room G22 (also known as the Pearson Lecture Theatre)
Pearson Building
University College London
Gower Street
London WC1E 6BT
Extended Abstract
Efficiency levels on massively parallel super-computers have been
reported (e.g. in the NAS Parallel Benchmarks Results 3-95,
Technical Report NAS-95-011, NASA Ames Research Center, April
1995) as ranging from 50% for the
``embarrassingly parallel benchmarks'', through 20% for tuned
``real'' applications, past 10% for typical ``irregular''
applications and down to 3% when using a portable software
environment. Low efficiencies apply not only to the larger
system configurations (256 or 1024 nodes), but also to the
smaller ones (e.g. 16 nodes). Seven years ago, we would have been
disappointed with efficiency levels below 70% for any style of
application on the then state-of-the-art parallel
super-computers. What has caused this regression, and can it be
remedied?
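(``Efficiency'' is taken throughout in the usual sense of parallel
efficiency - an assumed reading, since the figures above are
quoted without a formal definition:

    E(P) = T(1) / (P x T(P))

where T(P) is the run-time on P processors; or, in performance
terms, sustained Mflop/s as a fraction of the aggregate peak of
the nodes used.)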
It seems to be proving difficult to build efficient
high-performance computer systems simply by taking very fast
processors and joining them together with very high bandwidth
interconnect. Apart from the need to keep the computational and
communication power in balance, it may also be essential to
reduce communication start-up costs (in line with increasing
bandwidth) and to reduce process context-switch time (in line
with increasing computational power). Failure in either of these
regards leads to coarse-grained parallelism, which may result in
insufficient parallel slackness to allow efficient use of
individual processing nodes, in potentially serious
cache-coherency problems for super-computing applications, and in
unnecessarily large worst-case latency guarantees for real-time
applications.
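To see how start-up cost sets the grain, consider the familiar
first-order model of point-to-point message cost (an illustrative
sketch only - the numbers below are invented round figures, not
measurements of any particular machine):

    T(n) = t0 + n/B        n_half = t0 x B

where t0 is the start-up latency, B the asymptotic bandwidth and
n_half the message length needed to reach half of that bandwidth.
With, say, t0 = 50 microseconds and B = 100 Mbytes/s, n_half is
5 Kbytes; a 1 Kbyte message then achieves barely a sixth of the
nominal bandwidth. Unless t0 falls in step with B, programmers
are pushed towards batching communication into large, infrequent
messages - that is, towards exactly the coarse grain described
above.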
A further cause of concern is the dwindling number of suppliers
of HPC technology that are still in the market. Will there be a
next generation of super-computers from the traditional sources?
Or will HPC users have to rely on products from the commercial
marketplace, in particular the PC Industry and the Games /
Consumer-Products Industries? If the latter, how will this
change the way we approach the design of HPC facilities and
applications?
At the other end of the spectrum, clusters of workstations are
reported as potentially offering good value for money, but only
for certain types of application (e.g. those with very high
compute/communicate ratios). What are those threshold ratios, and
how do we tell whether our application is above them? And what do
we do if it is not?
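As a rough guide (a first-order model, not a measured threshold):
if R is the ratio of compute time to communication time per node,
and the two do not overlap, the attainable efficiency is roughly

    E = R / (R + 1)

so 90% efficiency needs R >= 9 and 50% needs only R >= 1. Moving
an application from a dedicated HPC interconnect to a cluster
whose network is, say, a hundred times slower shrinks its R by
about the same factor - hence the restriction to applications
with very high compute/communicate ratios.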
Blame is often laid on the lack of software tools to support the
development of applications for high-performance architectures.
New standards have been introduced for parallel computing - in
particular, High Performance Fortran (HPF) and the Message
Passing Interface (MPI). Old standards stick around - e.g. the
Parallel Virtual Machine (PVM).
These standards raise two problems: depressed levels of
efficiency (this may be a temporary reflection of early
implementations) and a low-level hardware-oriented programming
model (HPF expects the world to be an array and processing
architectures to be a 2-D grid; MPI allows a free-wheeling view
of message-passing that is non-deterministic by default).
Neither standard allows the application developer to design and
implement systems in terms dictated by the application; bridging
the gap between the application and these hardware-oriented tools
remains a serious problem.
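To make the MPI point concrete, here is a minimal sketch in C
(illustrative only, not taken from any of the codes or benchmarks
mentioned above). Rank 0 gathers one message from every other
rank using MPI_ANY_SOURCE; the order in which those messages are
delivered, and hence the order of the output, is not determined
by the program text:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* Accept messages in whatever order they arrive:
               that order may differ from run to run.          */
            for (i = 1; i < size; i++) {
                int value;
                MPI_Status status;
                MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &status);
                printf("received %d from rank %d\n",
                       value, status.MPI_SOURCE);
            }
        } else {
            /* All other ranks send their rank number to rank 0. */
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Any determinism has to be imposed by the programmer (for example,
by receiving from specific ranks in a fixed order); the standard
itself does not provide it.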
New pretenders, based upon solid mathematical theory and
analysis, are knocking on the door - such as Bulk Synchronous
Parallelism (BSP). Old pretenders, also based upon solid
mathematical theory and analysis and with a decade of industrial
application, lie largely unused and under-developed for
large-scale HPC - such as occam. Might either of these offer
some pointers to the future?
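For reference, the ``solid mathematical theory'' behind BSP is
essentially its cost model (stated here in its standard form). A
computation proceeds as a sequence of supersteps; a superstep in
which each processor performs at most w local operations and
sends or receives at most h messages costs

    w + h x g + l

where g is the communication cost per message under continuous
traffic and l the cost of the barrier synchronisation that closes
the superstep. Algorithms and machines can then be compared
through the small set of parameters (p, g, l).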
The above sections raise several interesting and contentious
issues. The aim of this workshop is to exercise and debate them
thoroughly, to see what people's real experiences have been and
to consider in what ways HPC needs to mature in order to become
viable. A major goal of the workshop is to start to identify
standards of ``good behaviour'' for software for parallel or
distributed systems that will:
- enable HPC hardware architectures to operate at much higher
efficiency levels;
- enable HPC applications to be developed in their own terms
without regard for the underlying hardware.
Or maybe the workshop will decide that:
- HPC architectures (hardware and software) do not have
fundamental problems;
- there are no lessons from the past that need re-discovery and
re-application;
- everything can be sorted out by better education and tools
for existing HPC standards.
Please come along and make this workshop work.
Peter Welch