Newsgroups: comp.parallel
From: eugene@cse.ucsc.edu (Eugene Miya)
Subject: 1997: Top-Ten list (readings list)
Organization: UC Santa Cruz CIS/CE
Date: 9 Mar 1997 05:14:40 GMT
Message-ID: <5fth00$5dq@darkstar.ucsc.edu>

Context: you are a prof. with first or second years CS grads students
(they have had the basic classes).
They are supposed to read documents about parallel computing.
This list is "democratically" kibitzed along your colleagues, the
readers of this news group.  First are the top ten (a lot of work this
year, I had to make a decision to drop the oldest for the first time
[that might not be a wise move on my part]).

Next come the recommended 100 (even more work!  I must cull 88 articles).

REQUIRED

%A George S. Almasi
%A Allan Gottlieb
%T Highly Parallel Computing, 2nd ed.
%I Benjamin/Cummings division of Addison Wesley Inc.
%D 1994
%K ISBN 0-8053-0443-6
%K ISBN # 0-8053-0177-1, book, text, Ultracomputer, grequired96, 91,
%d 1st edition, 1989
%K enm, cb@uk, ag, jlh, dp, gl, dar, dfk, a(umn),
%$ $36.95
%X This is a kinda neat book.  There are special net antecdotes
which make this interesting.
%X Oh, there are a few significant typos: LINPAK is really LINPACK. Etc.
These were fixed in the second edition.
%X It's cheesy in places and the typography is
pitiful, but it's still the best survey of parallel processing.  We really
need a Hennessy and Patterson for parallel processing.
(The topography was much improved in the second edition so much of
the cheesy flavor is gone --ag.)
%X (JLH & DP) The authors discuss the basic foundations, applications,
programming models, language and operating system issues and a wide
variety of architectural approaches.  The discussions of parallel
architectures include a section that describes the key concepts within
a particular approach.
%X Very broad coverage of architecture, languages, background theory,
software, etc. Not really a book on programming, of course, but
certainly a good book otherwise.
%X Top-10 required reading in computer architecture to Dave Patterson.
%X It is hardware oriented, but makes some useful comments on programming.

%A Michael Wolfe
%T Optimizing Supercompilers for Supercomputers
%S Pitman Research Monographs in Parallel and Distributed Computing
%I MIT
%C Cambridge, MA
%D 1989
%d October 1982
%r Ph. D. Dissertation
%K parallelization, compiler, summary,
%K book, text,
%K grequired91/3,
%K cbuk, dmp, lls, +6 c.compilers,
%K Recursion removal and parallel code
%X Good technical intro to dependence analysis, based on Wolfe's PhD Thesis.
%X This dissertation was re-issued in 1989 by MIT under it's Pittman
parallel processing series.
%X ...synchronization and locking instructions when compiling the
parallel procedures and those called by them. This is a bit like
the 'random synchronization' method described by Wolfe but
works with pointer-based datastructures rather than array elements.
%X Cited Chapters:
Data Dependence 11-57
Structure of a Supercomplier 214-218

%A W. Daniel Hillis
%A Guy. L. Steele, Jr.
%Z Thinking Machines Corp.
%T Data Parallel Algorithms
%J Communications of the ACM
%V 29
%N 12
%D December 1986
%P 1170-1183
%r DP86-2
%K Special issue on parallel processing,
grequired97: enm, hcc, dmp, jlh, dp, jwvz, sm,
CR Categories and Subject Descriptors: B.2.1 [Arithmetic and Logic Structures]:
Design Styles - parallel; C.1.2 [Processor Architectures]:
Multiple Data Stream Architectures (Multiprocessors) - parallel processors;
D.1.3 [Programming Techniques] Concurrent Programming;
D.3.3 [Programming Languages] Language Constructs -
concurrent programming structures: E.2 [Data Storage Representations]:
linked representations; F.1.2 [Computation by Abstract Devices]:
Modes of Computation - parallelism; G.1.0 [Numerical Analysis]
General- parallel algorithms,
General Terms: Algorithms
Additional Key Words and Phrases: Combinator reduction, combinators,
Connection Machine computer system, log-linked lists, parallel prefix,
SIMD, sorting, Ultracomputer
%K Rhighnam, algorithms, analysis, Connection Machine, programming, SIMD, CM,
%X (JLH & DP) Discusses the challenges and approaches for programming an SIMD
like the Connection Machine.

%A C. L. Seitz
%T The Cosmic Cube
%J Communications of the ACM
%V 28
%N 1
%D January 1985
%P 22-33
%r Hm83
%d June 1984
%K grequired91: enm, dmp, jlh, dp, j-lb, jwvz,
Rcccp, Rhighnam,
%K CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]:
Multiple Data Stream Architectures (Multiprocessors);
C.5.4 [Computer System Implementation]: VLSI Systems;
D.1.2 [Programming Techniques]: Concurrent Programming;
D.4.1 [Operating Systems]: Process Management
General terms: Algorithms, Design, Experimentation
Additional Key Words and Phrases: highly concurrent computing,
message-passing architectures, message-based operating systems,
process programming, object-oriented programming, VLSI systems,
homogeneous machine, hypercube, C^3P,
%X Excellent survey of this project.
Reproduced in "Parallel Computing: Theory and Comparisons,"
by G. Jack Lipovski and Miroslaw Malek,
Wiley-Interscience, New York, 1987, pp. 295-311, appendix E.
%X * Brief survey of the cosmic cube, and its hardware
%X (JLH & DP) This is a good discussion of the Caltech approach, which
embodies the ideas several of these machines (often called hypercubes).
The work at Caltech is the basis for the machines at JPL and the Intel iPSC,
as well as closely related to the NCUBE design.  Another paper by Seitz
on this same topic appears in the Dec. 1984 issue of IEEE Trans.
on Computers.
%X One of my top-10 papers to Dave Patterson (on computer architecture).
%X Literature search yielded:
1450906 C85023854
The Cosmic Cube (Concurrent Computing)
Seitz, C.L.
Author Affil: Dept. Of Comput. Sci., California Inst. Of Technol.,
Pasadena, Ca, Usa
Source: Commun. Acm (Usa) Vol.28, No.1, Pp.: 22-33
Publication Year: Jan. 1985
Coden: Cacma2 Issn: 0001-0782
U. S. Copyright Clearance Center Code: 0001-0782/85/0100-002275c
Treatment: Practical;
Document Type: Journal Paper
Languages: English
(14 Refs)
Abstract: Sixty-four small computers are connected by a network of
point-to-point communication channels in the plan of a binary 6-cube. this
cosmic cube computer is a hardware simulation of a future vlsi
implementation that will consist of single-chip nodes. the machine offers
high degrees of concurrency in applications and suggests that future
machines with thousands of nodes are both feasible and attractive. it uses
message switching instead of shared variables for communicating between
concurrent processes.
descriptors: multiprocessing systems; message switching
identifiers: message passing architectures; process programming; vlsi
systems; point-to-point communication channels; binary 6-cube; cosmic cube;
hardware simulation; VLSI implementation; single-chip nodes; concurrency
class codes: C5440; C5620

%A Edward Gehringer
%A Daniel P. Siewiorek
%A Zary Segall
%Z CMU
%T Parallel Processing: The Cm* Experience
%I Digital Press
%C Boston, MA
%D 1987
%K book, text, multiprocessor,
%K grequired91: enm, ag, jlh, dp, dar,
%O ISBN 0-932376-91-6
%$ 42
%X Looks okay!
%X [Extract from inside front cover]
... a comprehensive report of the important parallel-processing
research carried out on Cm* at Carnegie-Mellon University. Cm* is a
multiprocessing system consisting of 50 tightly coupled processors and
has been in operation since the mid-1970s. Two operating
systems-StarOs and Medusa-are part of its development, along with a
vast number of applications.
%X (JLH & DP) This book reviews the Cm* experience.  The book
discusses hardware issues, operating system strategies,
programming systems, and includes an extensive discussion of the
experience with over 20 applications on Cm*.
%X (DAR) a must read to avoid re-inventing the wheel.

%A John Hennessy
%A David Patterson
%T Computer Architecture: A Quantitative Approach, 2nd ed.
%I Morgan Kaufmann Publishers Inc.
%C Palo Alto, CA 94303
%D 1995
%O ISBN 1-55860-069-8
%K books, text, textbook, basic concepts, multiprocessors,
computer architecture, textbook, pario bib,
%K grequired97,
%K rgs, dn, a(umn), dab, sm,
%X http://Literary.com/mkp/new/hp2e/hp2e_index.shtml
%X This is an excellent book, and I would guess it was about suitable for
second or final-year undergraduate use.
%X The book emphasises quantitative measurement of various architectures, as
hinted at in the title. Thus, benchmarking, using real applications, is
heavily emphasised. Naturally, considering the authors, the benefits of the
class of processors generically referred to as 'RISC' are highlighted.
%X The book costs M-#25 Sterling here in England (hard-back).
%X Chapter titles are:
1. Fundamentals of Computer Design
2. Performance and Cost
3. Instruction Set Design: Alternatives and Principles
4. Instruction Set Examples and Measurements of Use
5. Basic Processor Implementation Strategies
6. Pipelining
7. Vector Processors
8. Memory-Heirarchy Design
9. Input/Output
10. Future Directions
Appendix A: Computer Arithmetic
Appendix B: Complete Instruction Set Tables
Appendix C: Detailed Instruction Set Measurements
Appendix D: Time Versus Frequency Measurements
Appendix E: Survey of RISC Architectures
%X Looks like a great coverage of architecture. Of course a chapter on I/O!
[David.Kotz@Dartmouth.edu]
%X Watch for printing or edition number in paper copies
(The "V. Pratt" Warning).

%A M. Ben-Ari
%T Principles of Concurrent and Distributed Programming
%I Prentice Hall International, Inc.
%C Englewood Cliffs, NJ
%D 1989
%O ISBN 0-13-711821-X
%K conditional grequired91 (1986 version was the suggested version, see VRP),
parallel processing (electronic computers),
%K sc, +3 votes posted from c.e. discussion.
%X Sound familiar?
%X I (VRP) ran into a problem with Prentice-Hall over Ben-Ari: they do not
regard his rewrite as a 2nd edition but as a completely new book.  If
you order it under the title you give in your bibliography THEY WILL
SHIP YOU THE OLD BOOK.  The Stanford bookstore even called them to ask
whether they'd be receiving the new edition and P-H told them that if
the instructor ordered it under the old title that was what he must want.
%X Why a publishing company would not only create a situation with such an
obvious built-in pitfall but then proceed to firmly and insistently
push their customers into this pit is utterly beyond me.  God and
publishers move in mysterious ways.
%X Moral: Change your title to "Principles of Concurrent and Distributed
Computing" and don't refer to it as "the second edition" since it isn't.

%K fox:cubix,
%A Geoffrey C. Fox
%A Mark A. Johnson
%A Gregory Lyzenga
%A Steve W. Otto
%A John Salmon
%A David Walker
%Z Caltech
%T Solving Problems on Concurrent Processors
%V 1, General Techniques and Regular Problems
%I Prentice-Hall
%C Englewood Cliffs, New Jersey
%D 1988
%K book, text, hypercubes, CCCP, MIMD, parallel programming,
communication, applications, physics, pario bib,
parallel processing, supercomputers,
%K grequired91,
%K bb, jlh, dp, dfk,
%K suggested supplemental ref by jh and dp
%K Barnes-Hut N-body problem,
%K parallel programming distributed memory
%K parallel scheduling bib,
%O ISBN 13-823022-6 (HB), 13-823469-8 (PB) $66.00
%X Interesting book.  Given out for free at Supercomputing'89.
%X My Bible of Distributed Parallel Computing; even if you are not using
Express it is a wonderful book to have !
%X "It is a good introduction to loosely synchronous
concurrent problems on hypercube topologies."
%X See fox:cubix for parallel I/O.
%P chapters 6 and 15
%K parallel file system, hypercube, pario bib,
%X Parallel I/O control, called CUBIX. Interesting method.
Depends a lot on ``loose synchronization'', which is sortof SIMD-like.

%A John L. Gustafson
%A Gary R. Montry
%A Robert E. Benner
%Z Sandia National Labs.
%T Development of Parallel Methods for a 1024-Processor Hypercube
%J SIAM Journal on Scientific and Statistical Computing
%V 9
%N 4
%D July 1988
%K fluid dynamics, hypercubes, MIMD machines, multiprocessor performance,
parallel computing, structural analysis, supercomputing, wave mechanics,
%K grequired91,
%K jlh, dp, hds, dar,
%X Introduces concept of operation efficiency, scaled speed-up.
Also covers communication cost, beam strain analysis, and a bit on
benchmarking.  Winner of 1988 Bell and Karp Prizes.
%X (JLH & DP) This paper report interesting results in using a
large scale NCUBE.  The authors won the Gordon Bell prize with their work.
They also suggest the idea of problem scaling to overcome the limitations of
sequential portions of an application.
%X (DAR) some application flavor mixed with performance analysis.

%A W. Daniel Hillis
%T The Connection Machine
%S Series in Artificial Inteligence
%I MIT Press
%C Cambridge, MA
%D 1985
%K book, text, PhD thesis,
%K grequired96, 91
%K JLb, dar, jwvz, dn,
%O ISBN #: 0262580977 $15.95 [1989 printing?]
%X Has a chapter on why computer science is no good.
%X Patent 4,709,327, Connection Machine, 24 Nov 87 (individuals)
"Parallel Processor / Memory Circuit", W. Daniel Hillis et al.
This looks like the meat of the connection machine design.
It probably has lots of stuff that up til the patent was considered
proprietary.
%X another dissertation rehash and woefully lacking in details
(a personal gripe about MIT theses) but otherwise a CM introduction.
%X Top-10 required reading in computer architecture to Dave Patterson.


From eugene@cse.ucsc.edu Mon Mar 17 15:04:57 1997
Path: ukc!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!uknet!usenet1.news.uk.psi.net!uknet!dispatch.news.demon.net!demon!cpk-news-hub1.bbnplanet.com!cam-news-hub1.bbnplanet.com!news.bbnplanet.com!howland.erols.net!agate!news.ucsc.edu!eugene
From: eugene@cse.ucsc.edu (Eugene Miya)
Newsgroups: comp.parallel
Subject: 1997: Next 90 Recommended readings for comp.parallel
Date: 9 Mar 1997 07:10:01 GMT
Organization: UC Santa Cruz CIS/CE
Approved: eugene@ames.arc.nasa.gov
Message-ID: <5ftno9$60u@darkstar.ucsc.edu>
NNTP-Posting-Host: arapaho.cse.ucsc.edu
Lines: 2125

Not quite 100, I had to cut down 90 articles.
Surprises: No recommendations from purely 1991 or earlier.
This cut out many noted previously recommended papers: Amdahl, Backus, etc.
Some important news papers/books at least made recommendations:
PVM and MPI papers.  Priority: tally first, then year of recommendation.

RECOMMENDED

%A J. R. Gurd
%A C. C. Kirkham
%A I. Watson
%T The Manchester Prototype Dataflow Computer
%J Communications of the CACM
%V 28
%N 1
%D January 1985
%P 34-52
%K CR Categories and Subject Descriptors:
C.1.3 [Processor Architectures]: Other Architecture Styles;
C.4 [ Performance of Systems]; D.3.2 [Programming Languages]: Language
Classifications
General Terms: Design, Languages, Performance
Additional Key Words and Phrases: tagged-token dataflow,
single assignment programming, SISAL, parallel computation,
%K grecommended91,
%K dmp, jlh, dp, jwvz,
%K define average parallelism - use speedup and efficiency
%K parallel scheduling bib,
%X A special issue on Computer Architecture.  Mentions SISAL, but not LLNL.
Using tagged-token dataflow, the Manchester processor is running
reasonably large user programs at maximum rates of between 1 and 2 MIPS.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 111-129.
%X (JH & DP) This paper discusses the machine, its software, and
evaluates performance.

%A Kai Hwang
%A Faye A. Briggs
%T Computer Architecture and Parallel Processing
%I McGraw-Hill
%C New York, NY
%D 1984
%O ISBN 0-07-031556-6
%K grecommended95, book, text,
%K meb, jlh, dp, dfk,
%K Rhighnam, analysis, architecture, survey, classification/description,
%X This text is quite weighty.  It covers much about the interconnection
problem.  It's a bit weak on software and algorithms.  Dated.
%X Extensive survey with large bibliography.  Lots of details.
%X (JLH & DP) Covers a wide variety of subjects including sections
on interconnection networks, special-purpose machines, dataflow, and
programming and applications issues for parallel processing.
%X A good book on the theory of high-level design. Hwang is at USC and is
interested in supercomputing, and the book reflects that, though cost
issues occasionally come in. They do take a sort of narrow view of
SIMD machines, seeing them mainly as vector processors. It seems to be
strong on pipelining, which is an important topic in microprocessors
these days. It does occasionally reach down to the gate level for
such matters as HW implementation of cache replacement policies. It
doesn't cover such issues as instruction set design, which I'm
interested in, but other than that most of its flaws are that it's
already five years old. Work on a second edition has reportedly begun,
but it's likely to be a while before it's out.
%X recommended by JAGROGAN@vax1.tcd.ie, in conjunction with Quinn's book.
%X (Anon.) Anti-recommendation: Specifically not on the list:
Any book which deals substantially with interconnection networks.
%X The standard reference book on parallel architectures.
%X I would say "one" reference.  I would not say "standard."

%A George H. Barnes
%A Richard M. Brown
%A Maso Kato
%A David J. Kuck
%A Daniel L. Slotnick
%A Richard A. Stokes
%T The ILLIAC IV Computer
%J IEEE Transactions on Computers
%V C-17
%N 8
%D August 1968
%P 746-757
%K grecommended91, ag, jlh, dp, JLb,
array, computer structures, look-ahead, machine language, parallel processing,
speed, thin-film memory, multiprocessors,
rwa, Rhighnam, Rmaeder biblio: parallel hardware and devices,
%K architecture, ILLIAC-IV, SIMD,
%X This was the original paper on the ILLIAC IV when it was proposed as
a 256 processing element machine, a follow on to the SOLOMON.  It was a
very ambitious design.
%X Contains ILLIAC IV assembler (among other things).
%X (JLH & DP) This is the original paper on the ILLIAC IV hardware;
some aspects of the machine (especially the memory system) changed
subsequently.  A later paper, cited as Bourknight, et. al. (1972)
provides a more accurate description of the real hardware.
%X (J-LB) paper of historical significance.

%A W. B. Ackerman
%Z MIT
%T Data flow languages
%J Computer
%V 15
%N 2
%D February 1982
%P 15-25
%K grecommended(3), programming languages,
special issue on data flow,
%K bmiya,
%X Very good summary of data flow, and changes made to
traditional languages to accommodate parallelism [e.g. outlawing side-effects]
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 179-188.  Earlier version in Proc. NCC,
1979 reproduced in Kuhn, Padua's tutorial 1981.

%A Arvind
%A David E. Culler
%T Dataflow Architectures
%Z MIT
%J Annual Reviews in Computer Science
%V 1
%P 225-53
%D 1986
%r TM-294
%d February 1986
%K grecommended91(3),
%K jlh, dp, JLb,
%X Not detailed, but reasonably current survey paper on data flow.
Includes status information of American, English, France, and Japanese
dataflow projects like the SIGMA-1 (Japan), Manchester (English), and so
forth.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 79-101.
%X (JLH & DP) This paper discusses the basic ideas behind dataflow machines.
The basic concepts of dataflow were espoused by Dennis: Dennis, J. [1980].
"Dataflow Supercomputers," Computers vol. 13, no. 11 (November), pages 48-56.

%A Tom Axford
%T Concurrent Programming: Fundamental Techniques for
Real-Time and Parallel Software Design
%S Series in Parallel Computing
%I John Wiley & Sons
%C New York
%D 1989
%K book, text,
%O ISBN 0 471 92303 6
%K grecommended91(3),
%K js, (2 c.e. votes),
%X more about software techniques for concurrency than about parallel
programming, but still useful.
%X ...quite happy with it. ... primary language used was Modula-2 ...
... concepts, architectures, and so forth. ... used transputers as
an inexpensive platform for parallel computing. We used C for
the transputer programming.

%A Robert G. Babb, II, ed.
%T Programming Parallel Processors
%I Addison-Wesley
%C Reading, MA
%D 1988
%K book, text, software, Alliant FX/8, BBN Butterfly, Cray X-MP,
FPS T-Series, IBM 3090, Loral LDF-100, Intel iPSC, hypercube, shared memory,
message passing, Sequent Balance,
%K grecommended91(3),
%K dwf, enm, dfk,
%X Reviewed, IEEE Computer, by T. DeMarco, April 1988, pp. 141.
Good quote in review:
	If juggling three balls is of difficulty 3 on a scale of 1 to 10,
	then juggling four balls is a difficulty of 20. -- from a
	juggler's handbook.
This book is a compilation of experiences by grad students (and undergrads)
on a diversity of machines.  I suspect the monogram format will become
popular as a publishing vehicle on this topic for the future since
comparisons will be difficult.  Over half of the book consists of
appendices of source code or commentary.  I like the way certain
significant things are encircled by boxes: . . . WANTED
for Hazardous Journey. Small wages, bitter cold, long months of
complete darkness, constant danger, safe return doubtful.
Honor and recognition in case of success.
--Ernest Shackleton.
%X This is essentially a how-to book for programming several different
multiprocessors. Handy if you want to use several different parallel
processors. More practical, less conceptual.

%A Uptal Banerjee
%A Rudolf Eigenmann
%A Alexandru Nicolau
%A David A. Padua
%T Automatic program parallelization
%J Proceedings of the IEEE
%V 81
%N 2
%D Feb. 1993
%K grecommended93(3),
%K +3 c.compilers,

%A Dimitri P. Bertsekas
%A John N. Tsitsiklis
%T Parallel and Distributed Computation: Numerical Methods
%I Prentice Hall
%C Englewood Cliffs NJ
%D 1989
%K book, text, parallel processing,
%K grecommended96(3), 91,
%K ab, dfk, a(umn),
%K asynchronous algorithms, overcoming communication latency,
%O ISBN 0-13-648700-9 $40 715 pp.
%X It covers most of the basic ideas in a thorough manner.
%X parallel computing is very good for operations research applications.
%X Received one verbal view that this book isn't great.
It was that person's opinion that the authors didn't implement
important details of algorithms (like boundary conditions).
The view is unconfirmed, and requires further investigation.

%A K. Mani Chandy
%A Jayadev Misra
%Z University of Texas--Austin
%T Parallel Program Design: A Foundation
%I Addison-Wesley
%C Reading, MA
%D 1988
%K book, text, UNITY,
%K grecommended91(3),
%K hcc, fpst, dfk,
%O ISBN 0-201-05866-9
%X requires a more in depth review.
%X Theoretically useful.
%X Mathematical approach to programming.
%X The authors introduce the UNITY programming language.  UNITY
lacks any sense of flow control and is quite a different sort
of programming language from the more traditional imperative
programming languages.  However, it is quite interesting
reading.  After writing a UNITY program, the programmer is
required to "map" the program onto a specific architecture
and programming language.  Apt and Olderog
claim that this two-step process is a drawback to UNITY.

%A D. D. Gajski
%A D. A. Padua
%A D. J. Kuck
%A R. H. Kuhn
%T A Second Opinion on Data Flow Machines and Languages
%J Computer
%V 15
%N 2
%D Feb. 1982
%P 15-25
%K bmiya, multiprocessing,
%K grecommended91(3),
%K meb, jlh, dp,
%X (SKS) or why I'm afraid people won't use FORTRAN.
This paper should only be read (by beginners) in conjunction with a
pro-dataflow paper for balance: maybe McGraw's "Physics Today" May 1984.
Also reprinted in the text compiled by Kai Hwang:
"Supercomputers: Design and Application," IEEE, 1984.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 165-176.
%X * Due to their simplicity and strong appeal to intuition, data flow
techniques attract a great deal of attention.  Other alternatives,
however, offer more hope for the future.
%X (JLH & DP) The most complete critique of the dataflow approach.
%X Using dataflow graphs for parallel programming is too difficult and
error-prone for human programmers.

%A Allen Gottlieb
%A Ralph Grishman
%A Clyde P. Krukal
%A Kevin P. McAuliffe
%A Larry Rudolph
%A Marc Snir
%T The NYU Ultracomputer -- Designing an MIMD Shared Memory Parallel Computer
%J IEEE Transactions on Computers
%V C-32
%N 2
%D February 1983
%P 175-190
%K multiprocessors, parallel processing, Ultracomputer,
computer architecture, fetch-and-add, MIMD, Omega-network,
parallel computer, shared memory, systolic queues, VLSI,
%K grecommended91(3),
%K jlh, dp, JLb,
%X describes the design of the NYU Ultracomputer, its synchronization,
the omega network. Analyzes the projected performance of the Ultracomputer.
Reproduced in "Computer Architecture," D. D. Gajski,
V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE, 1987,
pp. 471-485.
%X This paper represents an architectural approach that uses shared memory
and caches without cache-coherence.  Several machines have explored
this approach including IBM's RP3, the U. of Illinois CEDAR, and a number
of early multiprocessors (e.g., C.mmp).

%A C. A. R. Hoare
%T Communicating Sequential Processes
%I Prentice-Hall
%C Englewood Cliffs, NJ
%D 1985
%O ISBN 0-13-153271-5 & 0-13-153289-8
%K book, text, CSP, parallel processing,
%K grecommended91(3),
%K hcc, fpst, jb,
%K process migration,
%X One informed dissenting opinion holds that CSP does not adequately cover
process migration, another differs.
%X A better book than the original CSP papers.  Hoare comes down to
earth and tries to give concrete examples of CSP notation.  Still
has some problems.
%X Somewhat esoteric.
%X Must reading for those interested in distributed processing. High
level discussions of various operators one might think about in the
message passing realm. Discusses failures.
%X This defines CSP, upon which occam is based.  REAL parallelism here!
Very theoretical.  Must read for serious parallelism students!
%X This describes the mathematical foundation for languages like occam.
It's very well written and readable and is an "important" work for
anyone concerned with the mathematics of programming.

%A Robert H. Kuhn
%A David A. Padua, eds.
%T Tutorial on Parallel Processing
%I IEEE
%D August 1981
%K grecommended91(3),
%K bmiya, book, text, survey, supercomputers,
%K enm, ag, fpst,
%O ISBN #: 0818603674
%X This is a collection of noted papers on the subject, collected for
the tutorial given at the 10th conference (1981) on Parallel Processing.
It eases the search problem for many of the obscure papers.
Some of these papers might not be considered academic, others are
applications oriented.  Data flow is given short coverage.  Still, a
quick source for someone getting into the field.  Where ever possible,
papers in this bibliography are noted as being in this text.
%X Check on literature search:
Tutorial on parallel processing; initially presented at the Tenth
International Conference on Parallel Processing, August 25-28, 1981,
Bellaire, Michigan / [edited by] Robert H. Kuhn, David A. Padua
Kuhn, Robert H; Padua, David A
CONFERENCE: International Conference on Parallel Processing (10th : 1981
: Bellaire, Mich.)

%A Leslie Lamport
%T Time, Clocks, and the Ordering of Events in a Distributed System
%J Communications of the ACM
%V 21
%N 7
%D July 1978
%P 558-565
%K distributed systems, computer networks, clock synchronization,
multiprocess systems,
CR categories: 4.32, 5.29
distributed processing computer networks multiprocessing programs
ordering of events distributed system synchronising total ordering
clocks computer networks multiprocessing, operating system, grad reading list,
%K causality, happens-before, event ordering, fault tolerance,
%K bsatya
%K grecommended91(3),
%K enm, dmp, jw,
%O 4 Refs.
treatment: theoretical
%X classic paper on logical clocks.
%X A classic paper on synchronization.
Reproduced in "Distributed Computing: Concepts and Implementations" edited by
McEntire, O'Reilly and Larson, IEEE, 1984.
%X The concept of one event happening before another in a distributed system
is examined, and is shown to define a partial ordering of the events. A
distributed algorithm is given for synchronising a system of logical clocks
which can be used to totally order the events. The use of the total
ordering is illustrated with a method for solving synchronisation problems.
The algorithm is then specialised for synchronising physical clocks, and a
bound is derived on how far out of synchrony the clocks can become.
%X The classic paper on the relativity of event ordering in
distributed systems. Its author explains why it may not be possible to
impose a complete ordering on events in a distributed system, then presents a
simple algorithm for constructing a consistent partial order based on local
virtual clocks.

%A Frank C. H. Lin
%A Robert M. Keller
%T The Gradient Model Load Balancing Method
%J IEEE Transactions on Software Engineering
%V SE-13
%N 1
%D January 1987
%P 32-38
%K special issue on distributed systems, applicative systems,
computer architecture, data flow, distributed systems,
load balancing, multiprocessor systems, reduction architecture
%K dynamic load balancing in distributed systems, workload diffusion,
%K grecommended91(3),
%K dmp, (+1), dar,
%X (DAR) an old reference to the problem of dynamic task placement.

%A David A. Padua
%A Michael J. Wolfe
%Z CSRD, U. Ill.
%T Advanced Compiler Optimization for Supercomputers
%J Communications of the ACM
%V 29
%N 12
%D December 1986
%P 1184-1201
%r CSRD TR#592
%d July 1986
%K Special issue on parallel processing,
CR Categories and Subject Descriptors: C.1.2 [Processor Architectures]:
Multiple Data Stream Architectures (Multiprocessors) -
array and vector processors;
multiple-instruction-stream, multiple-data-stream processors (MIMD);
parallel processors; pipeline processors;
single-instruction-stream, multiple-data-stream processors (SIMD);
D.2.7 [Software Engineering]: Distribution and Maintenance -
restructuring; D.3.3 [Programming Languages] Language Constructs -
concurrent programming structures; D.3.4 [Programming Languages] Processors -
code generation; compilers; optimization; preprocessors;
General Terms: Languages, performance
%K RCedar,
%K grecommended91(3),
%K ak, dar, anon.,
%X a reasonable introduction to the problem.

%A Michael J. Quinn
%T Designing Efficient Algorithms for Parallel Computers
%I McGraw Hill
%D 1987
%K book, text
%K grecommended91(3),
%K dgg, fpst, dfk,
%X Have used in classes. Readable.
%X Has exercises.
Some comments (not mine):
"A nice outline of a good book, but if you are wanting to present the
 details, he will often abandon you with a high level sketch or an
 algorithm that takes a lot of tracing to understand the basic
 features.  It is usually advisable to have the papers he bases his
 presentation upon prior to starting the chapter, as they often contain
 the detail that is needed to fully understand the presentation. On the
 other hand, some of his presentations are complete and well done, and
 his problems are more basic than Akl's (as a rule)."
"Too much like a survey."
I have heard of mistakes in this book, and coorboration of the
incompleteness of some presentations. The intro is good.

%Q Thinking Machines Corporation
%T Connection Machine Model CM-2 technical summary
%R Technical Report HA87-4
%C Boston, MA
%D April 1987
%K grecommended91(3), hardware architecture,
%K Rhighnam, Connection Machine
%K jlh, dp, (+1),
%K parallel I/O, connection machine, disk array, disk architecture,
SIMD, pario bib,
%X (JLH & DP) Another architecture reference for the CM-2 is Tucker, L.
and G. Robertson [1988].  "Architecture and Applications of the
Connection Machine," Computer, vol. 21, no. 8 (August), pages 26-38.
%X I/O and Data Vault, pp. 27--30 [David.Kotz@Dartmouth.edu]

%A Philip C. Treleaven
%A David R. Brownbridge
%A Richard P. Hopkins
%T Data-Driven and Demand-Driven Computer Architecture
%J Computing Surveys
%V 14
%N 1
%D March 1982
%P 93-143
%K Rdf, enm, dmp, ak,
%K grecommended91(3),
CR Categories and Subject Descriptors:
C.0 [Computer System Organization]:
General - hardware/software interfaces; system architectures;
C.1.2 [Processor Architecture]:
Multiple Data Stream Architectures (Multiprocessors);
C.1.3 [Processor Architecture]: Other Architecture Styles
- data flow architectures; high level language architectures;
D.3.2 [Programming Languages]: Language Classifications - data-flow
languages; macro and assembly languages; very high-level languages
General Terms: Design
Additional Key Words and Phrases: Demand = driven architecture,
data = driven architecture
%X * The aim of this paper is to identify the concepts and relationships
that exist both within and between the two areas of research of
data-driven and demand-driven architectures.
Reproduced in "Selected Reprints on Dataflow and Reduction Architectures"
ed. S. S. Thakkar, IEEE, 1987, pp. 4-54.
%X Top-10 required reading in computer architecture to Dave Patterson.
%X Literature search yields:
01151084   E.I. Monthly No: EI8210086143   E.I. Yearly No: EI82017616
Title: Data-Driven And Demand-Driven Computer Architecture.
Author: Treleaven, Philip C.; Brownbridge, David R.; Hopkins, Richard P.
Corporate Source: Univ of Newcastle upon Tyne, Engl
Source: Computing Surveys v 14 n 1 Mar 1982 p 93-143
Publication Year: 1982
CODEN: CMSVAN   ISSN: 0010-4892
Language: ENGLISH
Journal Announcement: 8210
Abstract: Novel data-driven and demand-driven computer architectures are
under development in a large number of laboratories in the United States,
Japan, and Europe. The concepts and relationships that exist both within
and between the two areas of research are identified by examining
data-driven and demand-driven architecture at three levels: computation
organization, (stored) program organizations, and machine organization.
Finally, a survey of various novel computer architectures under development
is given. 118 refs.
Descriptors: *COMPUTER ARCHITECTURE
Classification Codes: 723  (Computer Software); 722  (Computer Hardware)
72  (COMPUTERS & DATA PROCESSING)

%A Gul A. Agha
%Z U. of Mich.
%T Actors: A Model of Concurrent Computation in Distributed Systems
%I MIT Press
%C Cambridge, MA
%D 1986
%K book, text, communication, evaluation, abstraction,
distributed computing, agents, parallel processing,
%K grecommended91(2),
%K hcc, fpst,
%X See also his PhD thesis of the same title.
%X Now considered a classic text.

%A Selim G. Akl
%T The Design and Analysis of Parallel Algorithms
%I Prentice Hall
%C Englewodd Cliffs, NJ
%D 1989
%K book, text,
%K grecommended96(2), 95,
%K db, dn, parallel matrix multiplication,
%O ISBN 0-13-200056-3

%A Arvind
%A Kattamuri Ekanadham
%T Future Scientific Programming on Parallel Machines
%J Journal of Parallel and Distributed Computing
%V 5
%N 5
%D October 1988
%P 460-493
%K special issue on languages, compilers, and environments
for parallel programming, grecommended(2),
%K parallel execution functional languages

%A Utpal Banerjee
%T Dependence Analysis for Supercomputing
%I Kluwer Academic Publishers
%C Boston
%D 1988
%K book, text, grecommended91(2), parallel compilers, parallel proceessing,
supercomputers,
%K lls, anon.,
%O ISBN #: 0898382890 $60.00

%A John Beetem
%A Monty Denneau
%A Don Weingarten
%Z IBM TJW, Yorktown Heights
%T The GF11 Supercomputer
%J Proceedings of the 12th International Symposium on Computer Architecture
%I IEEE
%D June 1985
%C Boston, MA
%P 108-115
%K Quantum chromodynamics,
%K Special Purpose Parallel Processors, IBM
%r RC 10852
%K grecommended91(2),
%K jlh, dp,
%K suggested supplemental ref jh and dp
%X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS.
Memphis Switch, 50 ns. 1,125 Macho Bytes, 11,520 Macho FLOPS.

%A John Beetem
%A Monty Denneau
%A Don Weingarten
%Z IBM TJW, Yorktown Heights
%T The GF11 Supercomputer
%B Experimental Parallel Computing Architectures
%E J. J. Dongarra
%S Special Topics in Supercomputing
%V 1
%I Elsevier Science Publishers B.V. (North-Holland)
%C Amsterdam
%D 1987
%P 255-298
%K Quantum chromodynamics,
%K grecommended91(2),
%K jlh, dp,
%K suggested supplemental ref jh and dp
%X 576 processors, modified SIMD, 2 MB memory per processor at 20 MFLOPS.
Memphis Switch, 50 ns. 1,125 Macho Bytes, 11,520 Macho FLOPS.
See also the Proceedings of the 12th International Symposium on
Computer Architecture, IEEE, June 1985, Boston, MA, pages 108-115
or see the IBM TR RC 10852 from TJW.

%A Nicholas Carriero
%A David Gelernter
%T How to Write Parallel Programs: A Guide to the Perplexed
%J ACM Computing Surveys
%V 21
%N 3
%D September 1989
%P 323-357
%d April 1988
%r YALEU/DCS/RR-628
%K special issue on programming language paradigms,
%K Categories and Subject Descriptors:
D.1.3 [Programming Techniques]: Concurrent Programming;
D.3.2 [Programming Languages]: Language classifications -
parallel languages; D.3.3 [Programming Languages]:
Language constructs - concurrent programming structures;
E.1.m [Data Structures]: Miscellaneous -
distributed data structures; live data structures;
General Terms: Algorithms, Program Design, Languages,
Additional Key Words and Phrases: Linda,
parallel programming methodology, parallelism,
%K grecommended91(2),
%K hcc, ag,
%X From page: 326:
It is nonetheless a subtle but essential point that these approaches
represent three clearly separate ways of thinking about the problem:
Result parallelism focuses on the shape of the finished product;
specialist parallelism focuses on the makeup of the work crew; and
agenda parallelism focuses on the list of tasks to be performed.
Also the terms: message-passing, distributed data structures or
live data structures.  Notes that it does not deal with data parallelism
(ala CM) nor speculative parallelism (OR-parallelism).  Tries to be
practical, but it does admit distributed programs are harder and more complex.
%X The authors distinguish between three classes of parallelism,
result, agenda, and specialist,
and between three roughly corresponding methods for implementation,
live data structures, distributed (shared) data structures, and
message passing systems.  The Linda model is then introduced and related to
each class and method #209# it serves as a kind of universal model for
describing the essential parallelism, as opposed to sequential processes.
An example is treated in some detail.
%X Fairly good overview of the language. Nothing about implementation.
Shows good performance.
%X Linda Parallel Programming Guide. Conceptual
Classes of Parallel Programming: result parallel approach, specialist
parallel aproach, agenda parallel approach. Programming Methods:
message passing, live data structures, distributed data structures.

%A Nicholas Carriero
%A David Gelernter
%T How to Write Parallel Programs: A First Course
%I MIT Press
%C Cambridge, MA
%D 1990
%K book, text, Linda,
%K grecommended91,
%K hcc, dfk,
%X Has exercises.
This centers around C-Linda (of course). I like this book. It has an
interesting way of breaking up parallel programming into result
parallelism, agenda parallelism, and specialist parallelism. Linda is
easy to learn and to use, and runs on a variety of platforms. But it
costs money. This book would have to be supplemented with material
about other programming models, architecture, etc. The authors suggest
ways to do this.
%X Used it in Parallel Processing class last spring
good prose about parallel approach and good intro to C-Linda
%X The title should really include the postscript
``in C-Linda'', as that is the only parallel programming system discussed,
and the book's examples are problems which sit well on top of Linda. Despite
that, this is a good introductory text; it describes different ways of
exploiting parallelism while avoiding race conditions, deadlock, and low
performance.

%A David Culler
%A Richard Karp
%A David Patterson
%A Abhijit Sahay
%A Klaus E. Schauser
%A Eunice Santos
%A Ramesh Subramonian
%A Thomas von\ Eicken
%T Log P: Towards a Realistic Model of Parallel Computation
%J 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
(PPOPP 93)
%C San Diego, CA
%D May 1993
%P 1-12
%K theory,
%K grecommended97(2), dab, sm,
basic concepts, multiprocess, theory,
%X Must also read (in conjunction):
A. Alexandrov, M. Ionescu, K. Schauser and C. Scheiman,
LogGP: Incorporating Long Messages into the LogP Model -
One step closer towards a realistic model for parallel computation,
7th Annual ACM Symposium on Parallel Algorithms and Architectures
(7th SPAA'95) Santa Barbara, CA, July 1995, pp. 95-105.

%A William J. Dally
%A Charles L. Seitz
%Z Caltech
%T Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
%J IEEE Transactions on Computers
%V C-36
%N 5
%D May 1987
%P 547-553
%r TR 5231:TR:86
%d June 1986
%K Caltech Cosmic Cube, hypercube, C^3P, wormhole, Rcccp,
%K grecommended91(2),
%K dmp, JLb,

%A F. Darema-Rogers
%A G. F. Pfister
%A K. So
%T Memory Access Patterns of Parallel Scientific Programs
%J Proc. 1987 ACM/SIGMETRICS Conf. in Meas. and Mod. of Comp. Syst.
%V 15
%N 1
%D May 1987
%C Banff, Alberta, Canada
%P 46-57
%K parallel systems, RP3,
%K grecommended91(2),
%K jlh, dp,
%X (JLH & DP) Reports results of measurements of some applications for
the RP3.

%A Jarek Deminent
%T Experience with Multiprocessor Algorithms
%J IEEE Transactions on Computers
%V C-31
%N 4
%P 278-288
%D April 1982
%K bmiya, Cm*, parallel algorithms, applications,
%K grecommended91(2),
%K jlh, dp,
%X This paper reports experience using Cm* for several applications
There are other references available on experience with Cm*; these
are published as CMU technical reports or theses.  Also, see the book
by Gehringer, et al.

%A Jerome A. Feldman
%A Patrick J. Hayes
%A David E. Rumelhart, eds.
%T Parallel Distributed Processing, Explorations in the Microstructure
of Cognition
%V 1, Foundations
%I MIT Press
%D 1986
%K AT15 AI04 AI03 AI08
%K The PDP Perspective, book, text, parallel scheduling bib,
%K grecommended91,
%K jb, jwvz,
%X Two volume set for $35.95.
%X AI meets parallelism and neural nets.
(A bible of sorts)

%A Jerome A. Feldman
%A Patrick J. Hayes
%A David E. Rumelhart, eds.
%T Parallel Distributed Processing, Explorations in the Microstructure
of Cognition
%I MIT Press
%V 2, Psychological and Biological Models
%D 1986
%K AT15 AI04 AI03 AI08
%K The PDP Perspective
%K book, text,
%K grecommended91,
%K jb, jwvz,
%X Two volume set for $35.95.
%X AI meets parallelism and neural nets.
(A bible of sorts)

%A Geoffrey C. Fox
%T Concurrent Processing for Scientific Calculations
%J Digest of Papers COMPCON, Spring 84
%I IEEE
%D Feb. 1984
%P 70-73
%r Hm62
%K super scientific computers, hypercube, Rcccp,
%K grecommended91(2),
%K jlh, dp,
%K suggested supplemental ref by jh and dp
%X An introduction the the current 64 PE Caltech hypercube.  Based
on the dissertation by Lang (Caltech 1982) on the `Homogeneous machine.'

%A Samuel H. Fuller
%A John K. Ousterhout
%A Levy Raskin
%A Paul I. Rubinfeld
%A Pradeep J. Sindhu
%A Richard J. Swan
%T Multi-Microprocessors: An Overview and Working Example
%J Proceedings of the IEEE
%V 66
%N 2
%D February 1978
%P 216-228
%K multiprocessor architectures and operating systems
%K bsatya
%K grecommended91(2),
%K jlh, dp,
%X
Reprinted in "Tutorial: Distributed Processing," IEEE,
compiled by Burt H. Liebowitz and John H. Carson, 1981, 3rd edition.
%X (JLH & DP) Cm* was the first multiprocessor based on microprocessor
technology.  Despite the limited success of the machine, it's impact and ideas
are present in many machines being built today including the
Encore Ultramax (Multimax) and Stanford's DASH.

%A Narain Gehani
%A Andrew D McGettrick, eds.
%T Concurrent Programming
%I Addison-Wesley
%C Reading, MA
%D 1988
%O ISBN 0-201-17435-9
%K book, text, language,
%K grecommended91(2),
%K ps, dfk,
%X Sounds more like concurrent programming than parallel programming.

%A M. J. Gonzalez, Jr.
%T Deterministic processor scheduling
%J Computing Surveys
%V 9
%N 3
%D September 1977
%P 173-204
%K scheduling
%K bsatya
%K grecommended91,
%K dmp, ak,
%X References are classified into various types (single, dual, multiple,
and flow shop) at end of paper.

%z Article
%K Hoare78
%A C. A. R. Hoare
%T Communicating Sequential Processes
%J Communications of the ACM
%V 21
%N 8
%D August 1978
%P 666-677
%K bhibbard
%K grecommended91(2),
%K hcc, ak,
%K programming, programming languages, programming primitives,
program structures, parallel programming, concurrency, input, output,
guarded commands, nondeterminacy, coroutines, procedures, multiple entries,
multiple exits, classes, data representations, recursion,
conditional critical regions, monitors, iterative arrays, CSP,
CR categories: 4.20, 4.22, 4.32
maeder biblio: synchronisation and concurrency in processes,
parallel programming,
%K Guarded commands, parbegin, synchronous message-passing.
%X This paper is now expanded into an excellent book detailed by Hoare
and published by Prentice-Hall.
This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
Reproduced in "Distributed Computing: Concepts and Implementations" edited by
McEntire, O'Reilly and Larson, IEEE, 1984.
%X Somewhat dated.
%X Hoare's original CSP paper; not very mathematical.
%X A simple program paradigm based on messages whence came Occam and
the Transputer.
%K messages
%K distributed processing
%K parallel processing
%s darrell@cypress (Sun Mar 14 14:51:09 1993)

%A R. W. Hockney
%A C. R. Jesshope
%T Parallel Computers: 2 Architecture, Programming, and Algorithms,
2nd ed.
%C Pennsylvania
%I IOP Publishing Ltd.
%I Adams Hilger
%D 1988
%K book, text, taxonomy, classification/description, parallel processing,
supercomputers,
parallel processing (electronic computers),
%K grecommended96(2) 91,
%K meb, rgs,
%O ISBN 0-85274-811-6
%X QA 76.6 H63 1988
%X World's most expensive paperback; matched only by Hockney's book on
particle simulations; both worth the price.
%X This book tends to the more mainframe oriented machines and bigger
supercomputers.  Killer micros get a little coverage (CM).  Shows
its UK bias (DAP).
%X review of other taxonomies; introduction of ASN, pages 53-81, chapter 1.2.

%A R. Michael Hord
%T The ILLIAC IV: The First Supercomputer
%I Computer Science Press
%D 1982
%K book, text,
%K bmiya, Rhighnam
%K grecommended91(2),
%K jlh, dp,
%K suggested supplemental reference by jh and dp
%K analysis, algorithm, architecture, ILLIAC-IV, SIMD, software, survey
%K trade/popular/business press, industry references,
%O 510.7809
%X A collection of papers dealing with the ILLIAC IV.  These papers include
reminisces and applications on the ILLIAC.
It is slightly apologetic in tone.
%X Describes in detail the background of the Illiac IV,
programming and software tools, and applications. The chapters are a
little disjointed, and the instruction set is not well explained or motivated.

%A Kai Hwang
%T Advanced Computer Architecture:
Parallelism, Scalability, Programmability
%I McGraw-Hill
%C New York
%D 1993
%K book, text,
%K grecommended96(2) 95,
%K db, a(umn),
%O ISBN 0-07-031622-8

%A Donald E. Knuth
%T The Art of Computer Programming
%V 1, Fundamental Algorithms, 2nd ed.
%I Addison-Wesley
%C Reading, MA
%D 1973
%K book, text,
%K grecommended97(2),
%K rgs, dab,
%X While not specically about parallelism, unquestionably useful.

%A Donald E. Knuth
%T The Art of Computer Programming
%V 2, Seminumerical Algorithms, 2nd ed.
%I Addison-Wesley
%C Reading, Mass.
%D 1981
%K book, text,
%K parallel GCD algorithms,
%K grecommended97(2),
%K rgs, dab,
%X While not specically about parallelism, unquestionably useful.

%A Donald E. Knuth
%T The Art of Computer Programming
%V 3, Sorting and Searching
%I Addison-Wesley
%C Reading, MA
%D 1973
%K book, text,
%K grecommended97(2),
%K rgs, dab,
%X While not specically about parallelism, unquestionably useful.

%A David J. Kuck
%T ILLIAC IV Software and Application Programming
%J IEEE Transactions on Computers
%V C-17
%N 8
%P 758-770
%D August 1968
%K bhibbard
%K applications of array computer, array computer, array language, compiler,
operating system,
%K Rhighnam, language, TRANQUIL,
%K grecommended91(2),
%K jlh, dp,
%X The early proposals for system software for the 256 PE ILLIAC IV.
Examples are given.
%X Contains a rationale for the ``TRANQUIL'' language.
%X (JLH & DP) Kuck's paper discusses the software and programming strategies
for ILLIAC.  Hord's book has more material about software, as well as some
discussion of experience in using the machine.

%A David J. Kuck
%A Edward S. Davidson
%A Duncan H. Lawrie
%A Ahmed H. Sameh
%T Parallel Supercomputing Today and the Cedar Approach
%J Science
%V 231
%N 4741
%D February 28, 1986
%P 967-974
%K paracompiler,
%K grecommended91(2),
%K ab, dar,
%r CSRD TR#558
%X Cedar uses a "paracompiler" to parallelize do-loops.
A more recent paper appears in Dongarra's Experimental Parallel
Computing Architectures book.  Photos.

%A Vipin Kumar
%A Ananth Grama
%A Anshul Gupta
%A George Karypis
%T Introduction to Parallel Computing:
Design and Analysis of Algorithms
%I Benjamin Cummings
%C Redwood City, CA
%D 1994
%K book, text, grecommended96(2) 94, scaleability,
%K mp, a (umn),
%X This new book takes an in-depth look at techniques for the
design and analysis of parallel algorithms.  Its broad, balanced
coverage of important core topics includes sorting and graph
algorithms, discrete optimization techniques, and scientific
computing applications.  The authors focus on parallel
algorithms for realistic machine models while avoiding
architectures that are unrealizable in practice. Numerous
examples and diagrams illustrate potentially difficult subjects
and each chapter concludes with an extensive list of
bibliographic references. In addition, problems of varying
degrees of difficulty challenge readers at different levels.
This is an ideal book for students and professionals who want
insight into problem-solving with parallel computers.
%X For a detailed ASCII brochure reply to: parallel@bc.aw.com.
%X This is to announce the availability of supplementary material and
other information regarding the text book "INTRODUCTION TO PARALLEL
COMPUTING: DESIGN AND ANALYSIS OF ALGORITHMS" (by Kumar, Grama,
Gupta and Karypis, Publisher: Benjamin Cummings, November 93) by
anonymous ftp.
%X The following supplementary material is currently available via
anonymous ftp from the sites ftp.cs.umn.edu:users/kumar/book and
bc.aw.com:bc/kumar:
%X
a) Postscript files containing the figures, tables and pseudocodes
in the text.
b) Errata sheet.
%X If you would like to receive more information on how to retrieve
these, or about the book in general, or be added to a mailing list
announcing updates and additional material on the book, you can
send E-MAIL to book-vk@cs.umn.edu.
%X Solutions to problem sets in the book are available in an
instructors guide directly from Benjamin/Cummings (or contact your
local Addison Wesley / Benjamin/Cummings representative).
%X An excellent text on parallel algorithms. The book's
great strengths is that its authors invest little time in PRAM algorithms,
preferring ones which can be implemented on machines which can actually be
built. The chapters discussing architectures and programming languages
are run-of-the-mill, but the bulk of the book describes and analyzes a
wide variety of algorithms for mesh- and hypercube-based multicomputers
(chosen as representative of sparsely-connected and densely-connected machines
respectively).

%A Richard M. Russell
%T The Cray-1 Computer System
%J Communications of the ACM
%V 21
%N 1
%P 63-72
%D January 1978
%K bhibbard
%K grecommended91,
%K enm, j-lb,
existing classic architecture,
maeder biblio: parallel hardware and devices, implementation,
ginsberg biblio:
%X The original paper describing the Cray-1.
This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
Also reproduced in "Computer Structures: Principles and Examples" by
Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell, McGraw-Hill,
1982, pp. 743-752.
Reproduced in Dharma P. Agrawal's (ed.) "Advanced Computer Architecture,"
IEEE, 1986, pp.15-24.
%X Literature search yields:
00712248   E.I. Monthly No: EI7804023850   E.I. Yearly No: EI78014612
Title: Cray-1 Computer System.
Author: Russell, Richard M.
Corporate Source: Cray Res Inc, Minneapolis, Minn
Source: Communications of the ACM v 21 n 1 Jan 1978 p 63-72
Publication Year: 1978
CODEN: CACMA2   ISSN: 0001-0782
Language: ENGLISH
Journal Announcement: 7804
Abstract: The CRAY-1 is described, the evolution of its architecture is
discussed, and an account is given of some of the problems that were
overcome during its manufacture. The CRAY-1 is the only computer to have
been built to date that satisfies ERDA's Class VI requirement (a computer
capable of processing from 20 to 60 million floating point operations per
second). The CRAY-1's Fortran compiler (CFT) is designed to give the
scientific user immediate access to the benefits of the CRAY-1's vector
processing architecture. An optimizing compiler, CFT, " vectorizes "
innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with
many commonly supported Fortran extensions, CFT does not require any source
program modifications or the use of additional nonstandard Fortran
statements to achieve vectorization. 6 refs.
Descriptors: *COMPUTER ARCHITECTURE; COMPUTER SYSTEMS, DIGITAL
Classification Codes: 722  (Computer Hardware); 723  (Computer Software)
72  (COMPUTERS & DATA PROCESSING)
%X "Designing the Circuits:"
CRAY-1 modules are 6 inches wide.  The distance across the board is about
a nanosecond which is just about the edge time of the electrical signals.
Unless due precautions are taken, when electric signals run around a board,
standing waves can be induced in the ground plane.  Part of the solution is
to make all signal paths in the machine the same length.  This is done by
padding out paths with foil runs and integrated circuit packages.  All told,
between 10 and 20 per cent of the IC packages in the machine are there simply
to pad out a signal line.  The other part of the solution was to use only
simple gates and make sure that both sides of every gate are always
terminated.  This means that there is no dynamic component presented to
the power supply.  This is the principal reason why simple gates are used
in the CRAY-1.  If a more complex integrated circuit package is used, it
is impossible to terminate both sides of every gate.  So all of the CRAY-1's
circuits are perfectly balanced.  Five layer boards have one ground layer,
two voltage layers, and then the two logic layers on the outside.  Twisted
pairs which interconnect the modules are balanced and there are equal and
opposite signals on both sides of the pairs.  The final result is that
there is just a purely resistive load to the power supply.

%A Richard J. Swan
%A S. H. Fuller
%A Daniel P. Siewiorek
%T Cm* -- A Modular, Multi-Microprocessor
%J Proceedings AFIPS National Computer Conference
%I AFIPS Press
%V 46
%D 1977
%P 637-644
%K CMU,
%K grecommended91(2),
%K btartar
%K jlh, dp,
%X This paper is reproduced in Kuhn and Padua's (1981, IEEE)
survey "Tutorial on Parallel Processing."
%X (JLH & DP) Cm* was the first multiprocessor based on microprocessor
technology.  Despite the limited success of the machine, it's impact and ideas
are present in many machines being built today including the
Encore Ultramax (Multimax) and Stanford's DASH.
%X Literature search yields:
00719649   E.I. Monthly No: EI7806040291   E.I. Yearly No: EI78016355
Title: Cm* -- A Modular, Multi-Microprocessor.
Author: Swan, R. J.; Fuller, S. H.; Siewiorek, D. P.
Corporate Source: Carnegie-Mellon Univ, Pittsburgh, Pa
Source:  AFIPS  Conference  Proceedings v 46 1977, Dallas, Tex, Jun 13-16
1977. Publ by AFIPS Press, Montvale, NJ, 1977 p 637-644
Publication Year: 1977
CODEN: AFPGBT   ISSN: 0095-6880
Language: ENGLISH
Journal Announcement: 7806
Abstract: The paper describes the architecture of a new large
multiprocessor computer system being built at Carnegie-Mellon University.
The system allows close cooperation between large numbers of inexpensive
processors. All processors share access to a single virtual memory address
space. There are no arbitrary limits on the number of processors, amount of
memory or communication bandwidth in the system. Considerable support is
provided for low-level operating system primitives and inter-process
communication. 18 refs.
Descriptors: *COMPUTERS, MICROPROCESSOR
Classification Codes: 721  (Computer Circuits & Logic Elements); 722
(Computer Hardware)
72  (COMPUTERS & DATA PROCESSING)

%A Shreekant S. Thakkar
%A Mark Sweiger
%T Performance of an OLTP Application on the Symmetry Multiprocessor System
%J Proc. 17th Annual Symposium on Computer Architecture,
Computer Architecture News
%I ACM
%V 18
%N 2
%D June 1990
%P 228-238
%K applications, Sequent,
%K grecommended91(2),
%K jlh, dp,
%X (JLH & DP) One of the few papers evaluating a
nonscientific application running on a multiprocessor.

%A William A. Wulf
%A Roy Levin
%A Samuel P. Harbison
%T HYDRA/C.mmp: An Experimental Computer System
%I McGraw-Hill
%D 1981
%K grecommended91(2), CMU, C.mmp, HYDRA OS,
multiprocessor architecture and operating systems
%K bsatya, book, text,
%K enm, ag,
%O ISBN 0-07-072120-3
%X A detailed description of the philosophy, design, and implementation of
Hydra, similar in a sense to Organick's monograph on Multics.
Highly recommended to anyone desiring an understanding of
multiprocessor operating systems in general and Hydra in particular.
Text reproduced with the permission of Prentice-Hall \(co 1980.
%X * Describes the architecture of C.mmp, and details the goals, design, and
performance of HYDRA, its capability based OS.

RECOMMENDED W/1 vote

%A Sarita Adve
%A Kourosh Gharachorloo
%T Shared Memory Consistency Models: A Tutorial
%J IEEE Computer
%V 29
%N 12
%D December, 1996
%P 66-76
%K grecommended97, sm,
basic concepts, multiprocess, theory,

%A Albert Alexandrov
%A Mihai Ionescu
%A Klaus E. Schauser
%A Chris Scheiman
%Z UC Santa Barbara
%T LogGP: Incorporating Long Messages into the LogP model -
One step closer towards a realistic model for parallel computation
%J Proc. 7th Annual ACM Symposium on Parallel Algorithms and Architectures
SPAA'95
%C Santa Barbara, California
%D July 1995
%P 95-105
%K Log GP, Log P, Meiko CS-2, FFT, split-C, sort,
%K grecommended97, dab,
%X Must also read (in conjunction):
D.E. Culler, R.M. Karp, D.A. Patterson, A. Sahay, K. E. Schauser,
E. Santos, R. Subramonian and T. von Eicken,
LogP: Towards a Realistic Model of Parallel Computation,
Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
(4th PPoPP'93), May 1993.

%A Kenneth E. Batcher
%T Sorting Networks and Their Applications
%J Conference Proceedings of the 1968 Spring Joint Computer Conference
%V 32
%I AFIPS Press
%C Reston, VA
%D May 1968
%P 307-314
%K bhibbard
%K Rhighnam, algorithm, analysis, bitonic-sort, mesh,
%K grecommended97, dab,
%X Reproduced in the 1984 tutorial: "Interconnection Network for parallel
and distributed processing" by Wu and Feng.

%A Guy Blelloch
%T Prefix Sums and Their Applications
%R CMU-CS-90-190
%I School of Computer Science, Carnegie Mellon University
%C Pittsburgh, PA
%D November 1990
%K segmented parallel prefix,
%K grecommended97, dab,
%X Tutorial.

%A Guy E. Blelloch
%A Charles E. Leiserson
%A Bruce M. Maggs
%A C. Gregory Plaxton
%A Steven J. Smith
%A Marco Zagha
%Z Thinking Machine Corp.
%T A comparison of sorting algorithms for the Connection Machine CM-2
%J Symposium on Parallel Algorithms and Architectures, SPAA'91
%C Hilton Head, South Carolina
%D July 1991
%P 3-16
%r TMC-222
%K Performance Evaluation and Modelling,
%K grecommended97, dab,
%X http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/www/alg/sort.html
%X A high-radix radix sort algorithm (which has been implemented on the
CM-2, CM-5, and Cray Y-MP) requires a large number of parallel prefix
computations.
%X Describes several sorting algorithms for massively-parallel machines,
and their performance on the CM-2 Connection Machine.
The authors also describe some ways to optimize these
algorithms, and explain why they believe parallel radix sort to be the best
general-purpose sorting algorithm for the CM-2.
%X A nice use of the elementwise parallel prefix operation on vectors occurs
in parallel radix sort.  Suppose we choose the radix to be 256, so that
each iteration in the radix sort arranges the inputs according to their
value in an 8-bit field.  Each processor looks through its local values
and counts the number of occurrences of each "digit" between 0 and 255,
yielding a vector of 256 counts in each processor.  To convert this local
count into a global destination for each value we must compute the
parallel prefix sum of each of the 256 vector elements across the
processors (and broadcast the sum in the last processor to adjust the
prefix sums so that all "1" digits follow all "0" digits, etc.).

%A T.H. Cormen
%A C.E. Leiserson
%A R.L. Rivest
%T Introduction to Algorithms
%I MIT Press
%C Cambridge, MA
%D 1990
%K book, text, parallel algorithms,
%K grecommended97, dab,

%A Joseph JaJa
%T Introduction to Parallel Algorithms
%I Addison-Wesley
%C New York
%D 1992
%K book, text,
%K grecommended97, mp, dfk, dab,
%K parallel programming, textbook, recommendations, summary,
parallel data structures, Transitive Closure,
%O ISBN 0-201-54856-9
%X Looooooooovely; but except the first chapter it is on PRAMs only.

%A S. Kleiman
%A D. Shah
%A B. Smaalders
%T Programming with Threads
%I Prentice-Hall
%C Englewood Cliffs, NJ
%D 1996
%K book, text,
%K grecommended97, dab,

%A F. Thomas Leighton
%Z MIT
%T Introduction to Parallel Algorithms and Architectures:
Arrays, Trees, Hypercubes
%I Morgan Kaufmann
%C San Mateo, CA
%D 1991
%K book, text, speedup, parallel matrix multiplication,
%K grecommended97/96, a (umn), mp, csc,
%X A huge tome.
%O ISBN 1-55860-117-1 $54.95 US 831 pages
%X Significant topics in computer architecture and parallel algorithms.
The text is written for designers, programmers and engineers
to understand issues at a fundamental level to utilize the full power
afforded by parallel computation.
It is an important resource for students and researchers.
Writing for an advanced general audience, the author assumes
few pre-requisites developing an elegant narrative of
fundamental issues in parallel computation and applied algorithm design.
%X Glowing advertising prose removed.
%X Table of Contents:
1 ARRAYS AND TREES 1
 1.1 Elementary Sorting and Counting 4
  1.1.1 on a Linear Array 5
	- Assessing the Performance of the Algorithm 7
	- Sorting N Numbers with Fewer Than N Processors 10
  1.1.2 in the Bit Model 12
  1.1.3 Lower Bounds 18
  1.1.4 A Counterexample-Counting 22
  1.1.5 Properties of the Fixed-Connection Network Model 29
 1.2 Integer Arithmetic 32
  1.2.1 Carry-Lookahead Addition 32
  1.2.2 Prefix Computations 37
	- Segmented Prefix Computations 43
  1.2.3 Carry-Save Addition 44
  1.2.4 Multiplication and Convolution 48
  1.2.5 Division and Newton Iteration 55
 1.3 Matrix Algorithms 59
  1.3.1 Elementary Matrix Products 60
  1.3.2 Triangular Matrices 66
  1.3.3 Tridiagonal Matrices 72
	- Odd-Even Reduction 72
	- Parallel Prefix Algorithms 78
  1.3.4 Gaussian Elimination 82
  1.3.5 Iterative Methods 92
	- Jacobi Relaxation 93
	- Gauss-Seidel Relaxation 95
	- Finite Difference Methods 97
	- Multigrid Methods 99
 1.4 Retiming and Systolic Conversion 102
  1.4.1 A Motivating Example
	-Palindrome Recognition 103
  1.4.2 Systolic and Semisystolic Models of Computation 103
  1.4.3 Retiming Semisystolic Networks 108
  1.4.4 Conversion of a Semisystolic Network into a Systolic Network 113
  1.4.5 Broadcasting 118
  1.4.6 Retiming the Host 119
  1.4.7 Design by Systolic Conversion--A Summary 123
 1.5 Graph Algorithms 125
  1.5.1 Transitive Closure 125
  1.5.2 Connected Components 130
  1.5.3 Shortest Paths 131
  1.5.4 Breadth-First Spanning Trees 132
  1.5.5 Minimum-Weight Spanning Trees 136
 1.6 Sorting Revisited 139
  1.6.1 Odd-Even Transposition Sort on a Linear Array 139
  1.6.2 A Simple $\sqrt N (log N + 1)$-Steps 144
  1.6.3 A $(3 \sqrt N + o(\sqrt N))$-Steps 148
  1.6.4 A Matching Lower Bound 151
 1.7 Packet Routing 154
  1.7.1 Greedy Algorithms 155
  1.7.2 Average-Case Analysis of 163
	- Routing $N$ Packets to Random Destinations 163
	- Analysis of Dynamic Routing Problems 173
  1.7.3 Randomized Routing Algorithms 178
  1.7.4 Deterministic Algorithms with Small Queues 183
  1.7.5 An Off-line Algorithm 186
  1.7.6 Other Routing Models and Algorithms 197
 1.8 Image Analysis and Computational Geometry 200
  1.8.1 Component-Labelling Algorithms 201
	- Levialdi's Algorithm 202
	- An $O(\sqrt N)$-Step Recursive Algorithm 207
  1.8.2 Computing Hough Transforms 210
  1.8.3 Nearest-Neighbor Algorithms 214
  1.8.4 Finding Convex Hulls 216
 1.9 Higher-Dimensional Arrays 222
  1.9.1 Definitions and Properties 223
  1.9.2 Matrix Multiplication 226
  1.9.3 Sorting 229
  1.9.4 Packet Routing 232
  1.9.5 Simulating High-Dimensional Arrays on Low-Dimensional Arrays 234
2 MESHES OF TREES 277
 2.1 Two-Dimensional Mesh of Trees 280
  2.1.1 Definition and Properties 280
  2.1.2 Recursive Decomposition 282
  2.1.3 Derivation from $K_ N,N $ 283
  2.1.4 Variations 286
  2.1.5 Comparison With the Pyramid and Multigrid 287
 2.2 Elementary $O(log N)$-Step Algorithms 288
  2.2.1 Routing 288
  2.2.2 Sorting 289
  2.2.3 Matrix-Vector Multiplication 291
  2.2.4 Jacobi Relaxation 292
  2.2.5 Pivoting 294
  2.2.6 Convolution 295
  2.2.7 Convex Hull 296
 2.3 Integer Arithmetic 298
  2.3.1 Multiplication 298
  2.3.2 Division and Chinese Remaindering 301
  2.3.3 Related Problems 306
	- Iterated Products 306
	- Root Finding 308
 2.4 Matrix Algorithms 309
  2.4.1 Three-Dimensional Mesh of Trees 310
  2.4.2 Multiplication 311
  2.4.3 Inverting Lower Triangular Matrices 312
  2.4.4 Inverting Arbitrary Matrices 316
	- Csanky's Algorithm 316
	- Inversion by Newton Iteration 319
  2.4.5 Related Problems 320
 2.5 Graph Algorithms 324
  2.5.1 Minimum-Weight Spanning Trees 325
  2.5.2 Connected Components 338
  2.5.3 Transitive Closure 339
  2.5.4 Shortest Paths 340
  2.5.5 Matching Problems 341
 2.6 Fast Evaluation of Straight-Line Code 354
  2.6.1 Addition and Multiplication Over a Semiring 355
  2.6.2 Extension to Codes with Subtraction and Division 367
  2.6.3 Applications 371
 2.7 Higher-Dimensional Meshes of Trees 373
  2.7.1 Definitions and Properties 373
  2.7.2 Shuffle-Tree Graph 374
3 HYPERCUBES AND RELATED NETWORKS 389
 3.1 Hypercube 392
  3.1.1 Definitions and Properties 393
  3.1.2 Containment of Arrays 396
	- Higher-Dimensional Arrays 399
	- Non-Power-of-2 Arrays 401
  3.1.3 Containment of Complete Binary Trees 404
  3.1.4 Embeddings of Arbitrary Binary Trees 410
	- Embeddings with Dilation 1 and Load $O(M/(N + log N))$ 412
	- Dilation $O(1)$ and Load $O((M/N) + 1)$ 416
	- A Review of One-Error-Correcting Codes 418
	- Embedding $P_ log N $ into $H_ log N $ 427
  3.1.5 Containment of Meshes of Trees 430
  3.1.6 Other Containment Results 437
 3.2 Butterfly, Cube-Connected-Cycles, and Benes Network 439
  3.2.1 Definitions and Properties 440
  3.2.2 Simulation of Arbitrary Networks 456
  3.2.3 of Normal Hypercube Algorithms 461
  3.2.4 Some Containment and Simulation Results 465
 3.3 Shuffle-Exchange and de Bruijn Graphs 473
  3.3.1 Definitions and Properties 474
  3.3.2 Diaconis Card Tricks 483
  3.3.3 Simulation of Normal Hypercube Algorithms 491
  3.3.4 Similarities with the Butterfly 495
  3.3.5 Some Containment and Simulation Results 509
 3.4 Packet-Routing Algorithms 511
  3.4.1 Definitions and Routing Models 513
  3.4.2 Greedy Routing Algorithms and Worst-Case Problems 515
	- A General Lower Bound for Oblivious Routing 521
  3.4.3 Packing, Spreading, and Monotone Routing Problems 524
	- Reducing a Routing Problem to a Sorting Problem 538
  3.4.4 Average-Case Behavior of the Greedy Algorithm 539
	- Bounds on Congestion 542
	- on Running Time 547
	- Analyzing Non-Predictive Contention-Resolution Protocols 556
  3.4.5 Converting Worst-Case Routing Problems into Average-Case
	Routing Problems 561
	- Hashing 562
	- Randomized Routing 568
  3.4.6 Bounding Queue Sizes 571
	- Routing on Arbitrary Levelled Networks 588
  3.4.7 Routing with Combining 591
  3.4.8 Information Dispersal Approach to Routing 598
	- Using Information Dispersal to Attain Fault-Tolerance 604
	- Finite Fields and Coding Theory 608
  3.4.9 Circuit-Switching Algorithms 612
 3.5 Sorting 621
  3.5.1 Odd-Even Merge Sort 622
	- Constructing a Sorting Circuit with Depth $log N(log N + 1)/2$ 628
  3.5.2 Sorting Small Sets 632
  3.5.3 A Deterministic $O(log N loglog N)$-Step Sorting Algorithm 642
  3.5.4 Randomized $O(log N)$-Step Sorting Algorithms 657
  - A Circuit with Depth $7.45 log N$ that Usually Sorts 662
 3.6 Simulating a Parallel Random Access Machine 697
  3.6.1 PRAM Models and Shared Memories 698
  3.6.2 Randomized Simulations Based on Hashing 700
  3.6.3 Deterministic Simulations Using Replicated Data 703
  3.6.4 Using Information Dispersal to Improve Performance 709
 3.7 Fast Fourier Transform  711
  3.7.1 Algorithm 711
  3.7.2 Implementation on the Butterfly and Shuffle-Exchange Graph 713
  3.7.3 Application to Convolution and Polynomial Arithmetic 717
  3.7.4 to Integer Multiplication 722
 3.8 Other Hypercubic Networks 730
  3.8.1 Butterflylike Networks 730
	- Omega Network 730
	- Flip Network 732
	- Baseline and Reverse Baseline Networks 732
	- Banyan and Delta Networks 736
	- $k$-ary Butterflies 739
  3.8.2 De Bruijn-Type Networks 739
	- $k$-ary de Bruijn Graph  741
	- Generalized Shuffle-Exchange Graph 742
%X THIS IS a 8192 Byte limit record.  Must trim if adding.

%A Charles E. Leiserson
%A Zahi S. Abuhamdeh
%A David C. Douglas
%A Carl R. Feynman
%A Mahesh N. Ganmukhi
%A Jeffrey V. Hill
%A W. Daniel Hillis
%A Bradley C. Kuszmaul
%A Margaret A. St.\ Pierre
%A David S. Wells
%A Monica C. Wong
%A Shaw-Wen Yang
%A Robert Zak
%T The Network Architecture of the Connection Machine CM-5
%J 4th Annual ACM Symposium on Parallel Algorithms and Architectures
(4th SPAA'92)
%C San Diego
%D June 1992
%K grecommended97, sm,
%K fat trees, parallel machines,
%X http://www.umiacs.umd.edu/research/dbader/
%X For a reference that gives a *little* detail on
how they're actually constructed (which is what makes them interesting, IMO).
-- ST

%A Daniel E. Lenoski
%A Wolf-Dietrich Weber
%T Scalable Shared Memory Multiprocessing
%I Morgan Kaufmann
%C San Francisco, CA
%D 1995
%K grecommended97, csc,
%K book, text, parallel processing, supercomputers, DASH, TERA-DASH,
%X K: General concepts involved in building scalable multiprocessors,
tradeoffs, design of large scale shared memory, design alternatives
of same, survey of historical and proposed scalable shared- memory
machines.  Plus a retrospective examination of the design of the
DASH multiprocessor and extension of a cache-coherant design
like DASH to thousands of processors.
%X This model of processing is worth reading about
(and quite current in a commercial product after some amount of
academic development).  A little lukewarm: some of the early exposition
is rather more biased than necessary.

%A Tom Lovett
%A Russell Clapp
%Z Sequent Computer Systems
%T STiNG: A CC-NUMA Computer System for the Commercial Marketplace
%J 23rd Annual International Symposium on Computer Architecture (23rd ISCA'96),
Computer Architecture News
%I ACM SIGARCH
%V 24
%N 5
%D May 1996
%P 308-317
%K grecommended97, sm,
%K parallel machines,

%Q Message Passing Interface Forum
%T MPI: A Message-Passing Interface Standard
%I University of Tennessee
%C Knoxville, TN
%D June 1995
%K grecommended97, dab,
%X Version 1.1.

%A Shubhendu S. Mukherjee
%A Shamik D. Sharma
%A Mark D. Hill
%A James R. Larus
%A Anne Rogers
%A Joel Saltz
%T Efficient Support for Irregular Applications on Distributed-Memory Machines
%J Proc. 5th ACM SIGPLAN Symposium on Principles and Practice of
Parallel Programming, PPoPP'95
%C Santa Barbara, California
%D July 1995
%P 68-79
%K grecommended97, sm,
%K CM-5, CHAOS, XSM, TSM, Tempest, DSMC, applications,

%A Peter Pacheco
%T Parallel Programming with MPI
%I Morgan Kaufmann
%C San Francisco, CA
%D 1997
%K book, text, PPMPI, C, Fortran77,
%K grecommended97, es,
%O http://www.usfca.edu/mpi  (source programs available)
ISBN is 1-55860-339-5
%X Parallel Programming with MPI is an elementary introduction to programming
parallel systems that uses the MPI 1.1 library of extensions to C and Fortran.
It is intended for use by students and professionals with some knowledge of
programming for conventional, single-processor systems, but
who have little or no experience programming multiprocessor systems.
It is an extensive revision and expansion of A User's Guide to MPI
(by William Gropp, Ewing Lusk and Anthony Skjellum, MIT Press, 1994).
%X 1.Introduction 
   2.An Overview of Parallel Computing 
   3.Greetings! 
   4.An Application: Numerical Integration 
   5.Collective Communication 
   6.Grouping Data for Communication 
   7.Communicators and Topologies 
   8.Dealing with I/O 
   9.Debugging Your Program 
  10.Design and Coding of Parallel Programs 
  11.Performance 
  12.More on Performance 
  13.Advanced Point-to-Point Communication 
  14.Parallel Algorithms 
  15.Parallel Libraries 
  16.Wrapping Up 
  17.Summary of MPI Commands 
  18.MPI on the Internet

%A Greg Pfister
%T In search of clusters
%I Prentice Hall
%C Englewood Cliffs, NJ
%D 1995
%O ISBN 0-13-437625-0
%K book, text,
%K grecommended97, dab,

%A Steve Scott
%T Synchronization and Communication in the T3E Multiprocessor
%J Proceedings of the Seventh International Conference
on Architectural Support for Programming Languages and Operating Systems
(ASPLOS VII),
Computer Architecture News
%D 1996
%P 26-36
%K grecommended97, sm, Cray, parallel machines, distributed memory,
Alpha,

%A V. S. Sunderam
%Z Emory U., Atlanta, GA
%T PVM: A Framework for Parallel Distributed Computing
%J Concurrency: Practice & Experience
%V 2
%N 4
%D December 1990
%P 315-339
%r ORNL/TM-11375
%d February 1990
%K grecommended97, enm,
%K parallel virtual machine,
%K heterogeneous computing, distributed computing, network parallel computing
checkpointing,
%X See netlib.
%X Good overview of PVM, though now a little out of date. Supports
dynamic, location-transparent, process initiation, typed message passing and
shared memory, broadcast and distributed synchronization, and heterogeneity
in the form of language- and machine-independence, type conversion, and
multiple executables for each component. Seems to be heavily dependent
on broadcast. Shared memory is somewhat limited. See also beguelin:concsuper.
[David.Kotz@Dartmouth.edu]

%A Leslie G. Valiant
%T A Bridging Model for Parallel Computation
%J Communications of the ACM
%V 22
%N 8
%D August 1990
%P 103-111
%K CR Categories and Subject Descriptors:
C.1.2 [Processor Architectures]: Multiple Data Stream Architectures
(multiprocessors) - parallel processors; F.1.2
[Computation by Abstract Devices]: Modes of computation - parallelism,
General Terms: Design
Additional Key Words and Phrases: Bulk synchronous parallel (BSP) model,
taxonomy, classification/description,
%K grecommended97, dab,
%X Outlines the BSP - bulk-synchronous parallel - an
extension to the PRAM model, enabling async
operation and modelling latency and limited
bandwidth.  The basic thesis of the paper is that
some model is needed to bridge the gap between
hardware and software, enabling advances in both to
be decoupled form the other. If a certain
programming methodology is adhered to, only a few
extra parameters are necessary  (i.e. only a certain
number of messages are sent/received by any one
processes or groups of processes).  If there is
sufficient slack in the program, automatic
compilation will result in near optimal efficiency
(using hashing techniques), though direct control by
the programmer is possible, and in certain
situations, recommended.  Examples show how several
well know operations can be performed on the
machine, and also discusses two possible techniques
for implementing such a system in hardware.
%X The success of the von Neumann model of
sequential computation is attributable to the fact that it is an
efficient bridge between software and hardware: high-level languages
can be efficiently compiled on to this model; yet it can be
efficiently implemented in hardware.  The author argues that an
analagous bridge between software and hardware is required for
parallel computation if that is to become as widely used.  This
article introduces the bulk-synchronous parallel (BSP) model as a
candidate for this role, and gives results quantifyin its efficiency
both in implementing high-level language features and algorithms, as
well as in being implemented in hardware.
%X Introduces a parallel computer model with virtual parallel processors and
virtual shared memory.

%A David A. Wood
%A Mark D. Hill
%T Cost-effective parallel computing
%J IEEE Computer
%V 28
%N 2
%D February 1995
%P 69-72
%K grecommended97, sm,

%T Top 500 sites list
%D continuing, fill in date
%K grecommended96,
%K dn (sgi),
%O http://parallel.rz.uni-mannheim.de/top500/top500.html
%X 500 most MFLOPS/GFLOPS in the world.
Security targets......

%A Greg Astfalk
%T High Performance Computing for Industry
%J International Mechanical Engineering Congress and Exposition
%C San Francisco
%D November 1995
%K grecommended96,
%K rgs,

%Q Cray Research
%T Cray Research Home Page: http://www.cray.com
%D continuing, fill in date
%K grecommended96,
%K dn (sgi),
%X See also comp.unix.cray.

%A Ian Foster
%T Designing and Building Parallel Programs
%I Addison-Wesley Publishing Company
%C Reading, MA
%D 1995
%K book, text, speedup, parallel, design, tools, resources, CC++, FortranM,
HPF, MPI
%K grecommended96,
%K a(umn),
%O ISBN:  0-201-57594-9 400 pp     Hardcover
%X DESCRIPTION
  Designing and Building Parallel Programs is a practitioner's guide to
  the programming of parallel and distributed computer systems.  It
  provides a comprehensive introduction to parallel algorithm design,
  performance analysis, and program construction.  It describes the
  tools needed to write parallel programs and provides numerous examples.
%X ONLINE VERSION
  A unique feature is the companion on-line version, accessible via the
  World Wide Web using browsers such as Mosaic at URL:
                http://www.mcs.anl.gov/dbpp/
and mirror sites:
        http://www.cs.rdg.ac.uk/dbpp/
        http://www.qpsf.edu.au/mirrors/dbpp/
Additional mirror sites:
        http://www.mcs.anl.gov/dbpp/mirror_sites.html
  The on-line version provides a convenient hypertext version of the
  text with pointers to programming tools, example programs, and other
  resources on parallel and distributed computing.
%X FEATURES
* Shows how to design parallel programs in a methodical fashion and
  how to analyze and improve program performance.
* Introduces modern parallel languages, including Compositional C++
  and Fortran M, that enable you to develop programs easily and
  efficiently.
* Describes the parallel programming standards High Performance
  Fortran (HPF) and Message Passing Interface (MPI).
* Presents different tools for collecting and analyzing parallel program
  performance data.
* Provides numerous bibliographic references so that you can further
  explore special areas of interest.
* Explains how parallel programming can assist in a broad range of
  applications, from circuit design to climate modeling.
* Includes many examples and exercises to expand upon concepts
  introduced in the book.
%X AVAILABILITY
* Wherever technical books are sold.
* For direct orders to Addison-Wesley, please call 1-800-822-6339
  and have your credit card handy.
* To be included on the Addison-Wesley on-line information server,
  please send an email message to awbook@aw.com.  The subject line
  should be 'information' and the body of the message should be
  'send information'.  Adding your name to this listing will enable us
  to keep you informed of all new titles.

%A Michael Metcalf
%A John Reid
%T Fortran 90 Explained
%I Oxford Press
%D 1989
%K book, text, grecommended96,
%K dn,

%A David A. Patterson
%A John L. Hennessy
%T Computer Organization and Design:
Hardware and Software Interface
%I Morgan Kaufmann Publishers
%C San Mateo, California
%D June 1993
%K book, text, computer architecture,
%K grecommended96,
%K
%O ISBN #: 155860281X $73

%A W. Press
%A Teukolsky
%A Vetterling
%A Flannery
%T Numerical Receipes in Fortran, 2nd ed.
%I Cambridge University Press
%C UK
%D 1993
%K book, text, numeric algorithms, grecommended96,
%K dn (sgi),

%A Lawrence Snyder
%Z Dept. of Computer Science, University of Washington, Seattle, CA
%T Type Architectures, Shared Memory, and the Corollary of Modest Potential
%J Annual Reviews of Computer Science
%V 1
%D 1986
%P 289-317
%K CHiP,
%r TR 86-03-04
%K grecommended96
%K rgs,
%X Should be read by most other advocates of the promise of
massive parallelism.
%X It's certainly still premature to be pessimistic regarding
ultimate success of achieving the benefits of massive parallelism, yet
THERE IS REASON TO BE EXTREMELY CAUTIOUS.
%X Continuing to paraphrase Snyder: The frequent challenge with physical
phemomena models is not to solve a fixed-size problem faster, but to
solve larger instances within a fixed time budget.  Simple rationale:
Solutions are so computationally intensive that problem instances of an
"interesting" size do not complete execution in a tolerable amount of
time.  What is the effect of parallelism on problem size?  Keeping time
fixed and UNREALISTICALLY assuming the best possible speedup, it follows
that superlinear problems (the interesting, realistic ones) can only be
improved sublineraly by parallel computation.  Specifically, if a problem
takes t = c*n^x sequential steps, and if the application of p processors
permits an increase in the problem size by a factor of m, t = c*(m*n)^x/p,
then the problem size can increase by at most a factor of m = p^(1/x).
For example, to increase by two orders of magnitude the size of a problem
whose sequential performance is given by t = cn^4 (three spacial, one time
dimension) requires, WITH UNREALISTIC OPTIMISM, 100,000,000 processors!
%X A critique of the parallel random access machine (PRAM)
model much-loved by theoreticians, wrapped up in a larger commentary
on the nature and purpose of abstract models of computation. This paper also
includes a rather pessimistic discussion of the limits to what
parallelism can be expected to deliver.

%Q Usenet news group.
%J comp.parallel newsgroup
%D continuing, fill in date
%K grecommended96,
%K dn (sgi),
%X Limited archives.

%Q Usenet news group.
%J comp.parallel.pvm
%D continuing, fill in date
%K grecommended96,
%K dn (sgi),

%Q Usenet news group.
%J comp.parallel.mpi
%D continuing, fill in date
%K grecommended96,
%K dn (sgi),

%A Patrick H. Worley
%T The Effect of Time Constraints on Scaled Speedup
%J SIAM J. Sci. Stat. Computing
%V 11
%N 5
%D Sept. 1990
%P 838-858
%r ORNL/TM-11031
%d January 1989
%K grecommended96,
%K scalability,
%K rgs,
%Z Oak Ridge National Laboratory, Oak Ridge, TN 37831-8083
%X Wherein he shows that a meaningful interpretation of speed-up
depends on constraints on execution time.  Why don't advocates of
the promise of massive parallelism acknowledge such issues?
%X Not all problems scale the same.
%X Points out that even though a problem might be scalable this
fact may not be usable.   He has a very nice example of a problem
thatis scalable but who run time increase by 12*N where N is the number
of process.  If the problem time took 1 hr for one processes you
would be waiting 512 day for the answer for the scalled up problem on
a 1024 node system.

%A William W. Collier
%Z IBM
%T Reasoning about Parallel Architectures
%I Prentice-Hall
%C Englewood Cliff, NJ
%D 1992
%K book, text,
%K grecommended94, mp,

%A Michel Raynal
%T Algorithms for Mutual Exclusion
%S Scientific Computation Series
%I MIT Press
%C Cambridge, MA
%D 1986
%K book, text,
%K grecommended94, mp, page 107, parallel processing,
%O Algorithmique du parallelisme, translated by D. Beeson
%O ISBN 0-262-18119-3

%A Hans P. Zima
%A Barbara Chapman
%T Supercompilers for Parallel and Vector Computers
%S ACM Press Frontier Series
%I Addison-Wesley (ACM)
%C Menlo Park, CA (New York)
%D 1991
%O ISBN 0-201-17560-6 $39.95
%K book, text, summary, parallelizing compilers, parallel processing,
%K grecommended94/93/91 +8, c.compilers,
%K cbuk, mp,
%X Zima and Chapman provide an excellent introduction to the field of
compilation for vector and parallel computers.  It is self-contained, well
organized and full of examples, algorithms and supporting references.
I recommend it to anyone interested in getting beyond lexical analysis,
parsing and the standard serial processor optimization techniques. (anon.)
%X Ken Kennedy, in the intro, describes this as the first satisfactory
textbook on dependence analysis and its application to vectorization and
parallelization.  Good bibliography.
%X Contents (chapter headings):-
1. Supercomputers and Supercompilers - general stuf,
architecture, applications, early work...
2. Supercomputer Architectures - vector and parallel systems
3. Scalar Analysis - control and data flow, monotone data flow
systems, use-definition chains, dominance relation, simple
optimizations,...
4. Data Dependence - basic concepts and definition, dependence
testing, algorithms for DD testing.
5. Standard Transformations - DO-loop normalization, subscript
normalization, scalar renaming.
6. Vectorization - fundamental transformations, concurrency in
loops, vector code-generation, loop transformations, control dependence
and vectorization, procedure calls.
7. Parallelization - parallel code generation for DOALL,
DOACROSS loops, shared memory parallelization, distributed memory
parallelization.
8. Supercompilers and their Environments - case studies of
'supercompilers', and issues in their environments.
A. Tarjan's algorithm
B. The Banerjee Test
C. Mathematical Notation
%X Each chapter has a series of bibliographic notes and there is a large
bibliography which is very helpful.  The book is quite mathematical in
style, making familiarity with set notation helpful.  I am most
interested in tools to aid in automatic parallelization, particularly
for distributed memory machines, and have therefore read more in the
chapters more relevant to this area - in particular I haven't read
chapters 5 or 6 at all.
Chapters 1 and 2 are fairly standard summaries.
Chapter 3 (53 pages) provides a fairly weighty intro to scalar analysis.
Chapter 4 (62 pages): a thorough introduction, including direction
vectors, gcd, Banerjee and separability tests, etc.  Personally I found
the book form of Wolfe's thesis (Optimizing Supercompilers for
Supercomputers) clearer as a newcomer to the field, but everything is
covered here.
Chapter 7 (56 pages) is biased towards shared memory systems, as that is
where most of the existing work has been done.  The style is clear, and
less mathematical in formulation than earlier chapters.  Definitions of
shared, local, reduction, shared ordered, etc.  variables are clear.
For the distributed memory case an SPMD model is assumed, and all
discussion is in terms of this model.
Chapter 8 (20 pages) gives case studies on Parafrase (Illinois), PFC
(Rice) and SUPERB (Suprenum) as well as looking at the place of
parallelization tools within an integrated development environment for
high-performance computing.
Overall - good summary of the field, somewhat mathematical, and as Ken
Kennedy writes in the Foreword:
"...no satisfactory textbook on dependence analysis and its
applications to vectorization and parallelization has yet emerged.
Given the importance of the subject, this is quite surprising.

%J IEEE Spectrum
%V 30
%N 1
%D January 1993
%K grecommended93 (ls),
%X Entire issue.

%Q Sequent Computer Systems
%T Parallel Programming Guide, 3rd ed.
%I Sequent Computer Systems
%C Portland, OR
%D 1992
%K book, text, grecommended93 (ls),

%A Glenn Zorpette
%T Teraflops Galore
%J IEEE Spectrum
%V 29
%N 9
%D September 1992
%P 26-27
%K special issue on supercomputing, Teraflops galore, reinventing the machine,
%K survey,
%K grecommended93 (ls),
%X Introduction to special issue.  Largely shallow.  The paper on
Japanese supercomputing is very significant (topical).
Slightly more than a page for a marginal picture.
%X Entire issue a recommended reading.

%A P. M. Kogge
%T The Architecture of Pipelined Computers
%I McGraw-Hill Books
%D 1981
%K bmiya, book, text,
%K grecommended92,
%K jwvz,

%A Daniel P. Siewiorek
%A C. Gordon Bell
%A Allen Newell
%T Computer Structures: Principles and Examples
%C New York
%I McGraw-Hill
%D 1982
%K book, text,
%K bmiya,
%K grecommended92,
%K jwvz,
%X Classic big thick expensive book with collection of papers.
%X My "10 required readings" on architecture to Dave Patterson:
Especially:
Chapter 9
Design of the B 5000 System
Chapter 33
Alto: A Personal Computer
Chapter 43
Parallel Operation in the Control Data 6600
Chapter 44
The CRAY-1 Computer System

%A Herbert A. Simon
%T The Sciences of the Artificial, 2nd ed.
%I The MIT Press
%C Cambridge, MA
%D 1981
%K bmiya, book, text,
%K grecommended92
%K jwvz,
%X Oft cited chapter 6 on the architecture of complexity.
%X This is just an all around book that everyone should have read.
%X Now in the third edition.  Have not yet seen changes.

%A John van\ Zandt
%T Parallel Processing in Information Systems: with Examples and Cases
%I Wiley
%C New York
%D 1992
%K grecommended92,
%K book, text, jwvz,
%O 51734U-0-471-54822-7  1992  224pp  cloth  $49.95
%X Thin survey text:
	Why Parallel Processing
	Classification of Parallel Processors
	Early Generations of Parallel Processors
	The Third Generation -- Commercializing Parallel Processing
	Parallel Processor -- New Releases
	Application Domains
	Programming Parallel Processors
	Selecting a Parallel Computer
	The Future of Parallel Processors
%X This book seems like it is written as an intro for those in the
business sector.
%X ** Description **
Written by one of the leading authorities in the field, it
discusses the essential facts of parallel computing as applied to
MIS systems. Thoroughly covers the growth of parallel processors
from early forms to current and future generations; applications
including databases, on line transaction processing, general
purpose time-sharing support and network information servers.
Features programming techniques in order to create parallel
applications using practical, existing examples.
%X ** Market **
Information Systems Managers, Systems Analysts, MIS Strategic
Planners, Students.