High Performance Computing and Communications Glossary

Version 2.1


A significant part of the material of this glossary was adapted from material originally written by Gregory V. Wilson which appeared as "A Glossary of Parallel Computing Terminology" (IEEE Parallel & Distributed Technology, February 1993), and is being re-printed in the same author's "Practical Parallel Programming" (MIT Press, 1995). Several people have contributed additions to this glossary, especially Jack Dongarra, Geoffrey Fox and many of my colleagues at Edinburgh and Syracuse.

A

acyclic graph (n.) a graph without any cycles.

adaptive (adj.) Taking available information into account. For example, an adaptive mesh-generating algorithm generates a finer resolution mesh near to discontinuities such as boundaries and corners. An adaptive routing algorithm may send identical messages in different directions at different times depending on the local density information it has on message traffic. See also oblivious

address generation (n.) a cycle during execution of an instruction in which an effective address is calculated by means of indexing or indirect addressing.

address mask (n.) in internet communications, a 32 bit long mask used to select an IP address for subnet addressing. The mask selects the network portion of the IP address and one or more bits of the local LAN address.

address space (n.) A region of a computer's total memory within which addresses are contiguous and may refer to one another directly. A shared memory computer has only one user-visible address space; a disjoint memory computer can have several.

address translation (n.) in networking, the process of converting external addresses into standardized network addresses and vice versa. This facilitates the interconnection of multiple networks which each have their own address plan.

adjacency matrix (n.) a Boolean matrix indicating for each pair of vertices I and J whether there is an edge from I to J.

all-pairs shortest-path problem (n.) given a weighted graph, find the shortest path between every pair of vertices.

alpha-beta pruning (n.) an algorithm used to determine the minimax value of a game tree. Alpha-beta algorithm avoids searching subtrees whose evaluation cannot influence the outcome of the search.

Amdahl's Law (n.) A rule first formalised by Gene Amdahl in 1967, which states that if F is the fraction of a calculation that is serial and 1-F the fraction that can be parallelized, then the speedup that can be achieved using P processors is: 1/( F + (1-F)/P) which has a limiting value of 1/F for an infinite number of processors. This no matter how many processors are employed, if a calculation has a 10% serial component, the maximum speedup obtainable is 10.

AMI (n.) alternate mark inversion, a line code used for T1 and E1 lines that has a 12.5% ones density minimum, and the one conditions of the signal alternate between positive and negative polarity.

AND tree (n.) a search tree whose nonterminal nodes are all AND nodes.

AND-parallelism (n.) A form of parallelism exploited by some implementations of parallel logic programming languages, in which the terms in conjunctions are evaluated simultaneously, and parent computations are allowed to proceed only once their immediate children have completed. See also OR-parallelism .

AND/OR tree (n.) a search tree with both AND and OR nonterminal nodes.

ANSI (n.) American National Standards Institute, a United States based organization which develops standards and defines interfaces for telecommunications amongst other things.

antidependence (n.) See dependence.

AppleTalk (n.) Apple Computer's proprietary network suite of protocols.

applicative language a language that performs all computations by applying functions to values.

architecture (n.) The basic plan along which a computer has been constructed. Popular parallel architectures include processor arrays, bus-based multiprocessors (with caches of various sizes and structures) and disjoint memory multicomputers. See also Flynn's taxonomy.

arity (n.) The number of arguments taken by a function. Arity is also sometimes used in place of valence.

ARP (n.) Address resolution protocol under TCP/IP used to dynamically bind a high level IP address to a low-level physical hardware address. ARP is limited to a single physical network that supports hardware broadcasting.

ARPAnet (n.) The US Advanced Research Project Agency network, which was the first large multi-site computer network.

array constant (n.) an array reference within an iterative loop, all of whose subscripts are invariant.

array processor (n.) Any computer designed primarily to perform data parallel calculations on arrays or matrices. The two principle architectures used are the processor array and the vector processor.

artificial intelligence (n.) A class of problems such as pattern recognition, decision making and learning in which humans are clearly very proficient compared with current computers.

associative address (n.) a method whereby access is made to an item whose key matches an access key, as distinct from an access to an item at a specific address in memory. See associative memory.

associative memory (n.)Memory that can be accessed by content rather than by address; content addressable is often used synonymously. An associative memory permits its user to specify part of a pattern or key and retrieve the values associated with that pattern. The tuple space used to implement the generative communication model is an associative memory.

associative processor (n.) a processor array with associative memory.

asynchronous (n.) A method of transmission which does not require a common clock, but separates fields of data by stop and start bits.

asynchronous algorithm (n.) See relaxation method.

asynchronous message passing (n.) See message passing .

asynchronous transfer mode (n.) A broadband cell relay protocol that cuts subscriber data into fixed size packets for transfer across the wide area network.

ATM (n.) See asynchronous transfer mode.

atomic(adj.) Not interruptible. An atomic operation is one that always appears to have been executed as a unit.

attached vector-processor (n.) A specialised processor for vector computations, designed to be connected to a general purpose host processor. The host processor supplies I/O functions, a file system and other aspects of a computing system environment.

automatic vectorization (n.) A compiler that takes code written in a serial language (often Fortran or C) and translates it into vector instructions. The vector instructions may be machine specific or in a source form such as array extensions or as subroutine calls to a vector library. See also compiler optimization.

auxiliary memory (n.) Memory that is usually large, in capacity, but slow and inexpensive, often a rotating magnetic or optical memory, whose main function is to store large volumes of data and programs not being actively used by the processors.

AVL tree (n.) binary tree having the property that for any node in the tree, the difference in height between the left and right subtrees of that node is no more than 1.

B

B8ZS (n.) Bipolar with eight zero substitution, a line code used for T1 which converts any string of 8 zeros of a DS-1 signal into a code which at the far end is converted back to eight zeros. The coding actually inserts BPVs that are realized at the next multiplexer point and that taken out of the signal.

back substitution (n.) See LU decomposition.

banded matrix (n.) a matrix in which the non-zero elements are clustered around the main diagonal. If all the non-zero elements in the matrix are within m columns of the main diagonal, then the bandwidth of the matrix is 2m+1.

bandwidth (n.) The communications capacity (measured in bits per second) of a transmission line or of a specific path through the network. Contiguous bandwidth is a synonym for consecutive grouped channels in mux, switch or DACS; i.e. 256kbps (4 64kbps channels).

bank conflict (n.) A bank "busy-wait" situation. Memory chip speeds are relatively slow when required to deliver a single word, so supercomputer memories are placed in a large number of independent banks (usually a power of 2). A vector of data laid out contiguously in memory with one component per successive bank, can be accessed at one word per cycle (despite the intrinsic slowness of the chips) through the use of pipelined delivery of vector-component words at high bandwidth. When the number of banks is a power of two, then vectors requiring strides of a power of 2 can run into a bank conflict.

bank cycle time (n.) The time, measured in clock cycles, taken by a memory bank between the honoring of one request to fetch or store a data item and accepting another such request. On most supercomputers this value is either four or eight clock cycles. See also prefetch.

barrier (n.) A point in a program at which barrier synchronization occurs. See also fuzzy barrier.

barrier synchronization(n.) An event in which two or more processes belonging to some implicit or explicit group block until all members of the group have blocked. They may then all proceed. No member of the group may pass a barrier until all processes in the group have reached it. See also fuzzy barrier.

basic block (n.) A section of program that does not cross any conditional branches, loop boundaries or other transfers of control. Most compiler optimization is done within basic blocks.

batch search (n.) a concurrent search for a number of names.

benchmark (n.) a quantitative measure of performance for a computing system. The measure may be in terms of computational performance, which is often rated in FLOPS, or in terms of memory speed, or communications bandwidth or some more application-oriented measure such as LIPS or OLTPS. A collection of benchmark results for many computer systems is available on line from the National Software Exchange.

Bernstein's Condition (n.) A sufficient condition for the independence of two sections of a program as stated by Bernstein in 1966. If Ri(Wi) is the set of variables read(written) by a section of code i, then Bernstein's Condition states that sections i and j may be executed in an arbitrary order, or concurrently if: there is no true dependence, output dependence or antidependence among the statements in the sections.

BGP (n.) border gateway protocol.

BiCMOS (n.) Bi-polar CMOS. BiCMOS is a merger of ECL and CMOS wafer processes allowing both types of circuit to exist on the same chip. This gives the advantage of the small feature size and large scale integration of CMOS with ECL's high power, fast driver circuits.

binary semaphore (n.) a semaphore that can only take on the values 0 and 1.

BISDN (n.) Broadband Integrated Services Digital Network is a packet switching technique which uses packets of fixed length, resulting in lower processing and higher speeds. See also ATM or cell relay.

bisection bandwidth(n.) The rate at which communication can take place between one half of a computer and the other. A low bisection bandwidth, or a large disparity between the maximum and minimum bisection bandwidths achieved by cutting the computers elements in different ways is a warning that communications bottlenecks may arise in some calculations.

bit-addressable (adj.) Allowing direct access to individual bits, rather than requiring bits to be selected by applying arithmetic or other operations to whole words. The local memory of each processing element in many processor arrays is bit addressable.

bitonic merge (n.)a parallel algorithm to merge two bitonic sequences of length 2^k into a single bitonic sequence of length 2^(k+1) in k+1 steps.

bitonic sequence (n.) a sequence of numbers A0,A1,...An-1 with the property that (1) there exists an index i, 0<=i<=n-1, such that A0 through Ai is monotonically increasing and Ai through An-1 is monotonically decreasing, or else (2) there exists a cyclic shift of indices so that condition 1 is satisfied.

BLAS (n.) Basic linear algebra software: a suite of very basic linear algebra routines, out of which almost any other matrix calculation can be built. See the National Software Exchange for the BLAS source and documentation.

block (v.) To suspend one's own operation, or the operation of another process. A process may block itself, or be blocked by the system, until some event occurs. If all processes are simultaneously blocked, and no external event can cause any of them to become unblocked, then deadlock has occurred. The term block is also often used to refer to a chunk of data or program. See also basic block.

blocking (adj.) An operation that causes the process executing it to block. Usually applied to communications operations, where it implies that the communicating process cannot perform any other operations until the communication has completed. See also asynchronous, non-blocking and synchronous.

boundary value swapping(n.) A technique used when performing calculations on a mesh on which geometric decomposition has been used. During a boundary value swap, each process exchanges values on the edges of its tile or tiles for the complementary values of its neighbours. See also indirect method.

BPS (n.) bytes per second, a unit of memory access speed or communications transfer speed. See also WARPS and FLOPS.

bridge (n.) A LAN internetworking device that filters and passes data between LANs based on Layer 2 (MAC layer) information. Bridges do not use any routing algorithms.

broadcast (n.) To send a message to all possible recipients. Broadcast can be implemented as a repeated send, but is more efficiently implemented by using spanning trees and having each node in the tree propagate the message to its descendants. See also multicast and process group.

brouter (n.) A term used by some venders, normally refering to a bridge also having some of the characteristics of a router.

buffer (n.) A temporary storage area in memory. Many methods for routing messages between processors use buffers at the source and destination, or at intermediate processors. See also packet switching, virtual cut-through and wormhole routing.

buffered message passing (n.) See message passing system.

bus (n.) A single physical communications medium shared by two or more devices. The network shared by processors in many distributed computers is a bus, as is the shared data path in many multiprocessors. See also link.

busy-waiting (n.) a situation whereby processor cycles are used to test a variable until it assumes a desired value.

butterfly(n.) A topology in which nodes are organised into levels, and there is a unique path from any node in the first level to any node in the last. A butterfly is a recursive topology . See also hypercube and shuffle exchange network.

C

cache (n.) A high-speed memory, local to a single processor , whose data transfers are carried out automatically in hardware. Items are brought into a cache when they are referenced, while any changes to values in a cache are automatically written when they are no longer needed, when the cache becomes full, or when some other process attempts to access them. Also (v.) To bring something into a cache.

cache consistency (n.) The problem of ensuring that the values associated with a particular variable in the caches of several processors are never visibly different.

cache hit (n.) a cache access that successfully finds the requested data.

cache line (n.) The unit in which data is fetched from memory to cache.

cache miss (n.) A cache access that fails to find the requested data. The cache must then be filled from main memory at the expense of time.

CAD (n.) Computer-Aided Design; a term which can encompass all facets of the use of computers in manufacturing although the term CAM is also in use.

CAE (n.)Computer-Aided Engineering, like CAD, but usually applied to the use of computers in fields such as civil and nautical engineering.

CAM (n.) Computer-Aided Manufacturing.

CCITT (n.) Consultative Committee International Telephone and Telegraph is an international organization which develops standards and defines interfaces for telecommunications.

cell relay (n.) Packet switching technique which uses packets of fixed length, resulting in lower processing speeds. Also known as BISDN and ATM.

cellular automata (n.) A system made up of many discrete cells, each of which may be in one of a finite number of states. A cell or automaton may change state only at fixed, regular intervals, and only in accordance with fixed rules that depend on cells own values and the values of neighbours within a certain proximity.

cellular computer (n.) A term sometimes used to describe fine grain systems such as neural networks, systolic array, and SIMD systems. Such systems tend to be well suited for the implementation of cellular automata .

CEPT (n.) Conference on European Post and Telegraph is a European organization which develops standards and defines interfaces for telecommunications.

CFD (n.) Computational fluid dynamics; the simulation or prediction of fluid flow using computers, a field which has generally required twice the computing power available at any given time.

chain (n.) A topology in which every processor is connected to two others, except for two end processors that are connected to only one other. See also Hamiltonian, ring.

chaining (n.) the ability to take the results of one vector operation and use them directly as input operands to a second vector instruction, without the need to store to memory or registers the results of the first vector operation. Chaining (or linking as it is sometimes called) can significantly speed up a calculation.

channel (n.) A point-to-point connection between two processes through which messages can be sent. Programming systems that rely on channels are sometimes called connection-oriented, to distinguish them from the more widespread connectionless systems in which messages are sent to named destinations rather than through named channels. See also CSP, channel mask.

channel mask (n.) A non-negative integer specifying a set of communication channels. If the numbers of the channels to be specified are I0,I1,...In-1, then the channel mask is the integer for which these bit numbers are set to 1 and all other bits are 0. For example, if the channels to be specified are numbered 0, 1 and 4, the corresponding channel mask is the binary number 10011, or 19, in which the bits numbered 0, 1 and 4 are set to 1.

chime (n.) Chained vector time, approximately equal to the vector length in a DO-loop. The number of chimes required for a loop dominates the time required for execution. A new chime begins each time a resource such as a functional unit, vector register or memory path, must be reused.

chunksize (n.) The number of iterations of a parallel DO-loop grouped together as a single task in order to increase the granularity of the task.

CIR (n.) See MIR.

circuit switching (n.) A switching method where a dedicated path is set up between the transmitter and receiver. The connection is transparent, meaning that the switches do not try to interpret the data.

CISC (adj.) Complicated instruction set computer; a computer that provides many powerful but complicated instructions. This term is also applied to software designs that give users a large number of of complex basic operations. See also RISC.

clausal logic (n.) a form of logic in which all propositions are expressed in terms of AND, OR and NOT. A six-stage process transforms any predicate calculus formula into clausal form. See also clause.

clause (n.) a sentence in formal logic. See also clausal logic.

CLNP (n.) connectionless network protocol, also known as ISO-IP. This protocol provides a datagram service and is OSI's equivalent to IP.

clock cycle (n.) The fundamental period of time in a computer. Current technology will typically have this measured in nanoseconds.

clock time(n.) Physical or elapsed time, as seen by an external observer. Nonrelativistic time. In small computer systems where all components can be synchronized, clock time and logical time may be the same everywhere, but in large systems it may be difficult for a processor to correlate the events it sees with the clock time an external observer would see. The clock times of events define a complete order on those events.

CLTP (n.) connectionless transport protocol, which provides end-to-end transport data addressing and is OSI's equivalent to UDP.

CMIP (n.) common management internet protocol is the OSI protocol for network management.

CMIS (n.) is the service performed by the common management internet protocol.

CMOS (n.) Complementary Metal Oxide on Silicon. A widely used chip technology. See also BiCMOS.

CMOT (n.) common management internet protocol (CMIP) over TCP is ISO's CMIP/CMIS mapping of the OSI network management protocols and is based on the internet suite of protocols.

co-processor (n.) an additional processor attached to a main processor, to accelerate arithmetic, I/O or graphics operations.

coarse grain(adj.) See granularity

Cobegin/Coend (n.) a structured way of indicating a set of statements that can be executed in parallel.

combining (v.) Joining messages together as they traverse a network. Combining may be done to reduce the total traffic in the network, to reduce the number of times the start-up penalty of messaging is incurred, or to reduce the number of messages reaching a particular destination.

combining switch (n.) an element of an interconnection network that can combine certain types of requests into one request and produce a response that mimics serial execution of the requests.

common subexpression (n.) a combination of operations and operands that is repeated, especially in a loop. A good compiler will not recompute the common subexpressions but will save them in a register for reuse.

communicating sequential processes (n.) See CSP.

communication channel (n.) See channel

communication overhead (n.) A measure of the additional workload incurred in a parallel algorithm due to communication between the nodes of the parallel system. See also latency, startup cost

communication width (n.) the size of shared memory in an SIMD model assuming a global memory.

comparator (n.) a device that performs the compare-exchange operation.

compare-exchange the fundamental operation of the bitonic merge algorithm. Two numbers are brought together, compared, and then exchanged, if necessary, so that they are in the proper order.

compiler directives (n.) special keywords often specified as comments in the source code, but recognised by the compiler as providing additional information from the user for use in optimization.

compiler optimization (n.) Rearranging or eliminating sections of a program during compilation to achieve higher performance. Compiler optimization is usually applied only within basic blocks and must account for the possible dependence of one section of a program on another.

complex instruction set computer (adj.) See CISC.

complexity (n.) a measure of time or space used by an algorithm. Without adjective this refers to time complexity.

compress/index (n.) a vector operation used to deal with the nonzeroes of a large vector with relatively few nonzeroes. The location of the nonzeroes is indicated by an index vector (usually a bit vector of the same length in bits as the full vector in words). The compress operation uses the index vector to gather the nonzeroes into a dense vector where they are operated on with a vector instruction. See also gather/scatter.

computation-to-communication ratio (n.) The ratio of the number of calculations a process does to the total size of the messages it sends. A process that performs a few calculations and then sends a single short message may have the same computation-to-communication ratio as a process that performs millions of calculations and then sends many large messages. The ratio may also be measured by the ratio of the time spent calculating to the time spent communicating, in which case the ratio's value depends on the relative speeds of the processor and communications medium, and on the startup cost and latency of communication. See also granularity.

concurrent computer (n.) A generic category, often used synonymously with parallel computer to include both multicomputer and multiprocessor.

concurrent processing (adj.) simultaneous execution of instructions by two or more processors within a computer.

condition synchronization (n.) process of delaying the continued execution of a process until some data object it shares with another process is in an appropriate state.

configuration (n.) A particular selection of the types of processes that could make up a parallel program. Configuration is trivial in the SPMD model, in which every processor runs a single identical process, but can be complicated in the general MIMD case, particularly if user-level processes rely on libraries that may themselves require extra processes. See also mapping.

conjugate gradient method (n.) A technique for solving systems of linear algebraic equations, which proceeds by minimizing a quadratic residual error function. The method is iterative but quite powerful: in the absence of roundoff error, it will converge exactly in M steps, where M is the order of the system in question.

content addressable (adj.) See associative memory.

contention (n.) Conflict that arises when two or more requests are made concurrently for a resource that cannot be shared. Processes running on a single processor may contend for CPU time, or a network may suffer from contention if several messages attempt to traverse the same link at the same time.

context switching (n.) Saving the state of one process and replacing it with that of another that is time sharing the same processor. If little time is required to switch contexts, processor overloading can be an effective way to hide latency in a message passing system.

control process (n.) A process which controls the execution of a program on a concurrent computer. The major tasks performed by the control process are to initiate execution of the necessary code on each node and to provide I/O and other service facilities for the nodes.

control-driven (adj.) refers to an architecture with one or more program counters that determine the order in which instructions are executed.

cost (n.) complexity of an algorithm multiplied by the number of processors used.

cpu (n.) central processing unit or processor unit of a sequential computer system. Sometimes used to mean one processor element of a concurrent computer system.

CRCW (n.) See PRAM.

CREW (n.) See PRAM.

critical node (n.) when a node is inserted into an AVL tree, the critical node is the root of the subtree about which a rebalancing is going to take place.

critical section (n.) a section of program that can be executed by at most one process at a time.

CSP (n.) Communicating sequential processes; an approach to parallelism in which anonymous processes communicate by sending messages through named point-to-point channels. CSP was coined by Hoare in 1985. All communication is synchronous in that the process that reaches the communication operation first is blocked until a complementary process reaches the same operation. See also guard.

cube-connected cycles network (n.) a processor organization that is a variant of a hypercube. Each hypercube node becomes a cycle of nodes, and no node has more than three connections to other nodes.

cycle a cycle of the computer clock is an electronic signal that counts a single unit of time within a computer.

cycle time (n.) the length of of a single cycle of a computer function such as a memory cycle or processor cycle. See also clock cycle.

cyclic reduction (n.) a parallel algorithm to solve general first order linear recurrence relations.

D

D4 (n.) An AT&T specified format for T1 facilities. Designates every 193rd bit as reserved for framing and synchronization information. Term orignates from the fourth generation digital channel bank.

DACS (n.) Digital Access Cross-connect System is a switch that enables test access and switching of digital signals in a T system.

data cache (n.) a cache that holds data but does not hold instructions.

data dependency (n.) a situation existing between two statements if one statement can store into a location that is later accessed by the other statement. See also dependence.

data flow analysis (n.) process of finding dependencies among instructions.

data flow graph (n.) (1) machine language for a data flow computer; (2) result of data flow analysis.

data parallelism (n.) A model of parallel computing in which a single operation can be applied to all elements of a data structure simultaneously. Typically, these data structures are arrays, and the operations are arithmetic and act independently on every array element, or reduction operations. See also array processor, processor array, SIMD and vector processor.

data-driven (n.) a data flow architecture in which execution of instructions depends on availability of operands.

dataflow (n.) A model of parallel computing in which programs are represented as dependence graphs and each operation is automatically blocked until the values on which it depends are available. The parallel functional and parallel logic programming models are very similar to the dataflow model.

dead code(n.) A portion of a program that does not have to be executed (because the values it calculates will never be used) or that will never be entered. Compiler optimization usually removes sections of dead code. See also dependence.

deadlock (n.) A situation in which each possible activity is blocked, waiting on some other activity that is also blocked. If a directed graph represents how activities depend on others, then deadlock arises if and only if there is a cycle in this graph. See also dependence graph.

decision problem (n.) a problem whose solution, if any, is found by satisfying a set of constraints.

declustered (adj.) A file system that distributes blocks of individual files between several disks. This contrasts with a traditional file system, in which all blocks of a single file are placed on the same disk. See also striped.

DECnet (n.) See DNA.

decomposition (n.) A division of a data structure into substructures that can be distributed separately, or a technique for dividing a computation into subcomputations that can be executed separately. The most common decomposition strategies in parallel computing are: functional decomposition; geometric decomposition and iterative decomposition.

dedicated throughput (n.) the number of results returned for a single job per time unit.

demand-driven (n.) data flow architecture in which execution of an instruction depends upon both availability of its operands and a request for the result.

dependence (n.) The relationship of a calculation B to a calculation A if changes to A, or to the ordering of A and B, could affect B. If A and B are calculations in a program, for example, then B is dependent on A if B uses values calculated by A. There are four types of dependence: true dependence, where B uses values calculated by A; antidependence, where A uses values overwritten by B; output dependence, where A and B both write to the same variables; control dependence, where B's execution is controlled by values set in A. Dependence is also used in message routing to mean that some activity X cannot proceed until another activity Y has completed. For example, if X and Y are messages attempting to pass through a region with limited buffer space, and Y currently holds some or all of the buffer, X may depend on Y releasing some buffer space before proceeding.

dependence graph (n.) A directed graph whose nodes represent calculations and whose edges represent dependencies among those calculations. If the calculation represented by node k depends on the calculations represented by nodes i and j, then the dependence graph contains the edges i-k and j-k. See also compiler optimization, dataflow, dependence.

dependence analysis (n.) an analysis by compiler or precompiler that reveals which portions of a program depend on the prior completion of other portions of the program. Dependency analysis usually relates statements in an iterative code construct.

depth (n.) parallel time complexity.

deque (n.) a double ended queue; that is a list of elements on which insertions and deletions can be performed at both the front and rear.

deterministic model (n.) a task model in which precedence relations between tasks and the execution time needed by each task are fixed and known before the schedule is devised.

diameter (n.) The distance across a graph, measured by the number of links traversed. Diameter is usually taken to mean maximum diameter (ie the greatest internode distance in the graph, but it can also mean the average of all internode distances. Diameter is sometimes used as a measure of the goodness of a topology.

direct mapping (n.) a cache that has a set associativity of one so that each item has a unique place in the cache at which it can be stored.

direct memory access (n.) See DMA.

direct method (n.) Any technique for solving a system of equations that relies on linear algebra directly. LU decomposition with back substitution is an example of a direct method. See also indirect method.

direct naming (n.) a message passing scheme in which source and destination designators are the names of processes.

directed graph (n.) a graph in which the edges have an orientation, denoted by arrowheads.

disjoint memory (n.) Memory that appears to the user to be divided amongst many separate address spaces. In a multicomputer, each processor typically has its own private memory and manages requests to it from processes running on other processors. Disjoint memory is more commonly called distributed memory, but the memory of many shared memory> computers is physically distributed.

disk striping (n.) technique of interleaving a disk file across two or more disk drives to enhance input/output performance. The performance gain is a function of the number of drives and channels used.

distributed computer (n.) A computer made up of many smaller and potentially independent computers, such as a network of workstations. This architecture is increasingly studied because of its cost effectiveness and flexibility. Distributed computers are often heterogeneous. See also multi-processor, multicomputer.

distributed memory (n.) Memory that is physically distributed amongst several modules. A distributed memory architecture may appear to users to have a single address space and a single shared memory or may appear as disjoint memory made up of many separate address spaces.

divide and conquer (n.) a problem solving methodology that involves partitioning a problem into subproblems, solving the subproblems, and then combining the solutions to the subproblems into a solution for the original problem.

DLCI (n.) Data Link Connection Identifier, a frame relay header field that identifies the destination of the packet.

DMA (n.) Direct Memory Access; allows devices on a bus to access memory without requiring intervention by the CPU.

DNA (n.) Digital Network Architecture is Digital Equipment Corporation's proprietary digital network architecture and is also known as DECnet.

domain (n.) That part of a larger computing resource allocated for the sole use of a specific user or group of users. See also space sharing.

domain decomposition (n.) See decomposition

DRAM (n.) Dynamic RAM; memory which periodically needs refreshing, and is therefore usually slower than SRAM but is cheaper to produce.

DS0 (n.) .Digital service hierachy level 0 with a maximum channel capacity of 64kbps.

DS1 (n.) Digital service hierachy level 1 with a maximum channel capacity of 1.544Mbps. This term is used interchangeably with T1. 24 DS-0 channels per DS1.

DS3 (n.) Digital service hierachy level 3 with a maximum channel capacity of 44.736. This term is used interchangeably with T3. 28 DS1 channels per DS3.

dusty deck (n.) A term applied to old programs (usually Fortran or Cobol). The term is derived from the image of a deck of punched cards grown dusty over the years.

dynamic channel naming (n.) a message passing scheme allowing source and destination designators to be established at run time.

dynamic decomposition (n.) a task allocation policy that assumes tasks are generated at execution time. See also decomposition.

E

e-cube routing(n.) A message routing algorithm used on binary hypercubes. If <|Sn-1...S0|> and <|Dn-1...D0|> are the binary co-ordinates of the source and destination of the message, and <|Xn-1...X0>| their difference, then the e-cube algorithm routes the message along link parallel to the axes corresponding to 1's in <|Xn-1...X0|>, in order from highest to lowest. See also Metropolis routing, randomized routing.

E1 (n.) The European standard for high speed digital transmission, defined as 2.048Mbps. The E1 has 31 available 64k channels for traffic use. Also refered as a 2Meg, European T1 and CEPT.

eager evaluation (n.) A scheduling policy under which each computation begins as soon as its inputs are ready or its necessary preconditions have been satisfied. This is the scheduling policy used in most programs, but contrasts with the lazy evaluation policy often used in functional and logic programming. See also dataflow, dependence dependence graph.

ECL (n.) Emitter-coupled logic; a high speed, high power transistor technology for chip manufacture. See also BiCMOS, CMOS.

efficiency (n.) A measure of hardware utilization, equal to the ratio of speedup achieved on P processors to P itself. See also isoefficiency, optimality.

EGP (n.) External Gateway protocol is used by gateways in different autonomous systems. EGP allows gateways to share routing information through advertisements.

enumeration sort (n.) a sort that finds the position of each name by determining the number of names smaller than it.

EPROM (n.) Electronically programmable ROM; a memory whose contents can be changed using special hardware. This usually involves removing the chips from their environment in order to "burn" a new pattern into them.

EREW (n.) See PRAM.

ES-IS (n.) End System to Intermediate System is an OSI protocol that allows end systems to announce themselves as intermediate systems. End Systems are equivalent to the internet concept of a host that uses all layers in the communications model. Intermediate Systems are relay communications nodes or forwarders between End Systems and are effectively a subset of End Systems.

ethernet (n.) A LAN protocol that supports high speed communications in the local area. Usually rates are at 10Mbps.

expand(n.) a vector computer instruction that stores the elements of a vector according to the values in a corresponding masking vector.

expected space complexity (n.) the average amount of space used by an algorithm over all possible inputs.

expected time complexity (n.) the average amount of time used by an algorithm over all possible inputs.

explicitly parallel (n.) language semantics that describe which computations are independent and can be performed in parallel. See also implicitly parallel.

F

fact (n.) in the context of logic programming, a fact is a Horn clause with a head but no body.

fairness (n.) A property of a concurrent system. If a system is fair, then in the long run all processes are served equally. No process has preferential access to semaphores and in particular no process can livelock. See also deadlock.

fast Fourier transform (n.) See FFT

fast packet (n.) Fast packet is a general term for various streamlined packet technologies including frame relay, BISDN and ATM. Compared to X.25 packet switching, fast packet contains a much reduced functionality, but with the lower overhead, fast packet systems can operate at higher rates at the same processing cost.

FastPacketTM (n.) StrataCom Corporation's trademarked term for their proprietary switching technique, which uses 192 bit packets.

FDDI (n.) Fast digital data interface; a standard for fibre optic communications systems. See also ethernet.

fetch-and-add (n.) a computer synchronization instruction that updates a word in memory, returns the value before the update, and produces a set of results as if the processors executed in some arbitrary order. See also prefetch.

FFT (n.) The fast Fourier transform is a technique for the rapid calculation of discrete Fourier transform of a function specified discretely at regular intervals. The technique makes use of a butterfly data structure.

FIFO (n.) First in, first out, a queue.

file server (n.) A process running on a computer that provides access to files for remote user systems.

file system (n.) The hardware used for nonvolatile data storage; the system software that controls this hardware; the architecture of this hardware and software. A parallel file system is one that can be read or written to by many processors simultaneously. See also RAID.

fine grain (adj.) See granularity

finite difference method (n.) A direct method for the approximate solution of partial differential equations on a discrete grid, by approximating derivatives of the unknown quantities on the grid by linear differences. Array processors lend themselves very well to the efficient implementation to this sort of application. See also finite element method.

finite element method (n.) An approximate method for solving partial differential equations by replacing continuous functions by piecewise approximations defined on polygons, which are referred to as elements. Usually polynomial approximations are used. The finite element method reduces the problem of finding the solution at the vertices of the polygons to that of solving a set of linear equations. This task may then be accomplished by a number of methods, including Gaussian elimination, the conjugate gradient method and the multigrid method. See also finite difference method.

flit (n.) A flow control unit - the basic unit of information sent through a message passing system that uses virtual cut-through or wormhole routing.

FLOPS (n.) Floating point operations per second; a measure of memory access performance, equal to the rat eat which a machine can perform single-precision floating-point calculations. See also WARPS.

Flynn's Taxonomy (n.) A classification system for architectures that has two axes: the number of instructions streams executing concurrently, and the number of data sets to which those instructions are being applied. The scheme was proposed by Flynn in 1966. See also SPMD, MIMD, SIMD, SISD.

forall (n.) A programming construct that specifies a set of loop iterations and further specifies that these iterations can be done in any order. data parallel and shared variable programs are often expressed using forall loops.

fork (v.) To create a new process that is a copy of its immediate parent. See also: join, spawn>.

forward reduction (n.) See LU decomposition.

FPU (n.) Floating point unit; either a separate chip or an area of silicon on the CPU specialised to accelerate floating point arithmetic.

FRAD (n.) Frame relay assembler/disassembler, used to interface a LAN with a frame relay WAN.

frame relay (n.) A packet interface data transmission protocol used for connecting LANs via a WAN topology at rates from 56Kbps to T1/E1.

frame slip (n.) Also called just slip, is any shift of the timing on a circuit.

FT-1 (n.) Fractional digital service hierachy level 1 with service in multiples of 56/64Kbps 2 channels (112/128Kbps) or above, and up to 23 channels. 256/512/768/1024Kbps are common rates for this type of service. Also called fractional T1.

FT-3 (n.) Fractional digital service hierachy level 3 with service in multiples of 1.544Mbps. Also called fractional T3.

FTP (n.) file transfer protocol is the TCP/IP standard for remote file transfer. FTP uses telnet and TCP protocols.

full matrix (n.) A full matrix is one in which the number of zero elements is small (in some sense) compared to the total number of elements. See also sparse matrix.

functional decomposition (n.) See decomposition.

functional unit (n.) functionally independent part of the ALU each of which performs a specific function, for example: address calculation; floating-point add or floating-point multiply.

futures (n.) A programming construct specifying that the value of some computation will be needed at some later point, but allowing the system to schedule that computation to run at any arbitrary time. Futures are one way to present lazy evaluation in shared variable programming. See also eager evaluation.

fuzzy barrier(n.) A pair of points in a program such that no process may pass the second point before all processes have reached the first. Fuzzy barriers are often used inside loop iterations to ensure that no changes are made during an iteration before all values needed in the previous iteration have been read. See also barrier, barrier synchronization, dependence.

G

GaAs (n.) Gallium arsenide; an relatively new semiconductor material, still not yet fully mastered by chip manufacturers. GaAs components can run much faster than silicon-based components. See also CMOS.

game tree (n.) the state space tree representation of a game position.

Gantt chart (n.) a diagram used to illustrate a deterministic schedule.

gather/scatter (n.) the operations related to large sparse data structures. A full vector with relatively few nonzeroes is transformed into a vector with only those nonzeroes by using a gather operation. The full vector, or one with the same structure, is built from the inverse operation or scatter. The process is accomplished with an index vector, which is usually the length of the number of nonzeroes, with each component being the relative location in the full vector. See also compress/index.

Gauss-Seidel method (n.) An iterative method for solving partial differential equations on a grid. When updating a grid point the new value depends on the current values at the neighbouring grid points, some of which are from the previous iteration and some, which have already been updated, are from the current iteration. So updates are performed by sweeping through the grid in some user-chosen fashion. The key feature of this algorithm is that it makes use of new information (updated grid point values) as soon as they become available.

Gaussian elimination (n.) A method for solving sets of simultaneous linear equations by eliminating variables from the successive equations. The original equation in the form Ax=b (A is a matrix, b the vector of known values, and x the unknown solution vector) is reduced to Ux=c, where U is an upper triangular matrix. The solution vector x can then be found by back substitution. This method is usually formulated as LU decomposition.

GAXPY (n.) a generalised SAXPY operation, taking linear combinations of a set of columns and accumulating the columns in a vector, as in a matrix-vector product.

generative communication(n.) A model of parallel computing in which processes that have been spawned dynamically turn into data upon completion, and data may be stored in tuples in one or more shared tuple spaces. A process may add tuples to a tuple space, or remove then by matching against their contents. See also associative memory, shared variables, virtual shared memory.

geometric decomposition (n.) See decomposition.

GGP (n.) gateway to gateway protocol is the protocol used by core gateways to exchange routing information.

gigaFLOPS (n.) 10^9 FLOPS

GKS (n.) the Graphics Kernel Standard; a graphics standard developed for pen plotters and now supported on a wide variety of pixel based devices.

global addressing (n.) A frame relay addressing scheme which uses the DLCI to identify a specific end across device somewhere else in the frame relay network. In a global addressing scheme, the DLCI is a unique identifier for each IPX port in the network.

grain (n.) See granularity

granularity (n.) The size of operations done by a process between communications events. A fine grained process may perform only a few arithmetic operations between processing one message and the next, whereas a coarse grained process may perform millions. See also computation-to-communication ratio.

graph (n.) an entity consisting of a set of vertices and a set of edges between pairs of vertices.

Gray code (n.) A mapping which labels the lattice points on an n-dimensional grid with the integers 0,1,2,...2^d-1, so that the labels at adjacent grid points differ in precisely one bit in their integer representation.

Grosch's Law (n.) an empirical rule that the cost of computer systems increases as the square root of the computational power of the systems.

guard (n.) A logical condition that controls whether a communication operation can take place. Guards are usually defined as part of the syntax and semantics of CSP-based langauges.

guard ring (n.) Parallel algorithms operating on a two dimensional grid generally store grid values in a two dimensional array, geometrically decomposed across the nodes of the parallel computer. The guard ring is the additional storage allocated around the edges of the array to hold the values at the grid points lying along the adjacent boundaries of neighbouring nodes. These values are obtained by communication between nodes.

Gustafson's Law(n.) A rule stating that if the size of most problems is scaled up sufficiently, then any required efficiency can be achieved on any number of processors, as stated by Gustafson in 1988. See also Amdahl's Law, efficiency.

H

halo (n.) The region around a point in a mesh, whose values are used when updating the value of the central point. In a cellular automata, the halo comprises those neighbouring cells whose values are used by the automaton's update rule.

Hamiltonian(n.) A closed path through a graph which passes through each node in the graph exactly once.

Hawick (n.) a Scots word meaning a "town surrounded by a hedge"; also an actual town on the border between Scotland and England; also my surname. This is not relevant to HPCC except that this is a usful way of ensuring my email address (hawick@npac.syr.edu) does not get lost from this file so you can always seek out the latest version of this glossary.

HDB3 (n.) High density bipolar three, a line interface standard for E1 which is similar to B8ZS, which eliminates data streams with 8 or more consecutive zeros. Allows for 64Kbps clear channel capacity and still assure a minimum ones density required by E1 lines.

HDLC (n.) high level link control is an ISO link level protocol standard. CCITT uses HDLC for its link access protocol with X.25 networks. HDLC was used in the ARPANET to transfer frames between hosts and packet switched networks.

height (n.) in graph theory, height is the length of the longest path from the root of a tree to one of its leaves.

heterogeneous(adj.) Containing components of more than one kind. A heterogeneous architecture may be one in which some components are processors, and others memories, or it may be one that uses different types of processor together. See also distributed computer, homogeneous.

high-order interleaving (n.) memory interleaving strategy based on high-order bits of an address.

HiPPI (n.) High performance parallel interface; a point to point 100 MByte/sec interface standard used for networking components of high performance multicomputers together.

hit ratio (n.) the ratio of the number of times data requested from a cache is found (or hit) to the number of times it is not found (or missed).

homogeneous(adj.) Made up of identical components. A homogeneous architecture is one in which each element is of the same type; processor arrays and multicomputers are usually homogeneous. See also heterogeneous.

hop (n.) A network connection between two distant nodes.

horizontal processing (n.) act of processing a two dimensional array row by row.

Horn clause (n.) a clause that contains at most one conclusion.

hot-spot contention (n.) an interference phenomenon observed in multiprocessors caused by memory access statistics being slightly skewed from a uniform distribution to favour a specific memory module.

HPCC (n.) an acronymn for High Performance Computing and Communications, which is the field of information addressed by this glossary. A USA National Coordination Office for HPCC also exists, and other information on HPCC can be found from the Northeast Parallel Architectures Center, the Center for Research in Parallel Computing the National Software Exchange or the Edinburgh Parallel Computing Centre depending upon your geography.

hypercube (n.) A topology of which each node is the vertex of a d-Dimensional cube. In a binary hypercube, each node is connected to n others, and its co-ordinates are one of the 2^n different n-bit sequences of binary digits. Most early American multicomputers used hypercubic topologies, and so the term hypercube is sometimes used as a synonym for multicomputers. See also butterfly, e-cube routing, shuffle exchange network.

I

I/O (n.) refers to the hardware and software mechanisms connecting a computer with its environment. This includes connections between the computer and its disk and bulk storage system and also connections to user terminals, graphics systems, and networks to other computer systems or devices. Standard I/O is a particular software package developed under UNIX for the C programming language.

ICMP (n.) is the internet control message protocol and is used to handle errors and control messges at the internet protocol layer. ICMP is considered to be part of IP and is used to test whether a destination is reachable and responding.

IEEE 802.3 (n.) The standard for Carrier Sense Multiple Access with Collision Detection is one of the most used LAN protocols.

IGP (n.) interior gateway protocol is used to exchange routing information between routers in the internet. It is generally used for routers within an autonomous system. The RIP and OSPF are examples of IGP.

ILMI (n.) Interim Local Management Interface that is the de facto standard until a standard based committee designates a standardized protocol suite. In the last few years ILMI has been popular due to the frequency of changes in the tele-communications industry.

implicitly parallel (n.) language semantics that do not allow the user to explicitly describe which computations are independent and can be performed in parallel. For an implicitly parallel language, the compiler must deduce or prove independence in order to make use of parallel hardware. The comparative difficulty of the deduction separates implicitly parallel languages from explicitly parallel ones.

indirect method (n.) Any technique for solving a system of equations that does not rely on linear algebra directly. Successive over-relaxation is an example of an indirect method. See also direct method.

information (n.) a collection of related data objects.

inner product method (n.) method of matrix multiplication in which one element of the resultant matrix is computed at a time. See also middle product method

input/output (n.) See I/O.

instruction buffering (n.) process of prefetching instructions with the goal of never making the processor wait for an instruction to be fetched. This is sometimes also known as instruction look-ahead.

instruction cache (n.) a cache memory that holds only instructions but not data.

instruction pipelining (n.) strategy of allowing more than one instruction to be in some stage of execution at the same time. See also MISD.

instruction scheduling (n.) a strategy of a compiler to analyse the outcome of the operations specified in a program and to issue instructions in an optimal manner. That is, the instructions are not necessarily issued in the order specified by the programmer, but in an order that optimally uses the registers, functional units and memory paths of the computer, while at the same time guaranteeing correct results for the computation.

instruction set (n.) the set of low level instructions that a computer is capable of executing. Programs expressed in a high level language must ultimately be reduced to these.

instruction stream (n.) sequence of instructions performed by a computer.

interactive vectorizer (n.) an interactive program to help a user vectorize source code. See also true ratio.

interconnection network (n.) the system of logic and conductors that connects the processors in a parallel computer system. Some examples are bus, mesh, hypercube and Omega networks.

interleaved memory (n.) memory divide into a number of modules or banks that can be accessed simultaneously.

internal sort (n.) sorting a table of names contained entirely in primary memory.

interprocessor communication (n.) the passing of data and information among the processors of a parallel computer during the execution of a parallel program.

interprocessor contention (n.) conflicts caused when multiple CPU's compete for shared system resources. For example, memory bank conflicts for a user's code in global memory architectures are caused by other processors running independent applications.

interrupt-driven system (n.) A system whose processes communicate by message passing in such a way that when a message is delivered to its destination process it interrupts execution of that process and initiates execution of an interrupt handler process, which stores the message for subsequent retrieval. On completion of the interrupt-handler process (which sets some flag or sends some signal to denote an available message), the original process resumes execution.

interval routing (n.) A routing algorithm that assigns an integer identifier to each possible destination and then labels the outgoing links of each node with a single contiguous interval or window so that a message can be routed simply by sending it out the link in whose interval its destination identifier falls.

invariant (n.) a variable, especially in a DO-loop, that appears only on the right side of an equals sign. The variable is read only, it is never assigned a new value.

invariant expression (n.) an expression, especially in a DO-loop, all of whose operands are invariant or constant.

IP (n.) The internet protocol that defines the unit of information passed between systems that provides a basis packet devlivery service. It handles best effort connectionless delivery service and includes ICMP. See also IP address.

IP address (n.) The Internet protocol address which is a 32-bit address assigned to a host. The IP address has a host component and a network component.

IPX (n.) Integrated Packet Exchange for example Stratacom's Packet switch for public and private T1 and E1 networks.

IPX/link (n.) This application for NetWare connects a PC Novell NetWare LAN through a network interface device.

IS-IS (n.) intermediate system to intermediate system protocol is the ISO protocol that defines how intermediate systems exchange routing information.

ISO (n.) International Standards Organization, which, among other things, sets standards for programming languages.

ISODE (n.) ISO development environment software provides an ISO transport level protocol interface that resides on top of TCP/IP.

isoefficiency(n.) A way to quantify the effect of scaling problem size on an algorithm's efficiency. For a given efficiency, the isoefficiency function for an algorithm specifies what size of problem must be solved on a given number of processors to achieve that efficiency.

iterative decomposition (n.) a decomposition involving breaking down a calculation in which one or more operations are repeatedly applied to one or more data values by treating each data subset / operation subset as a separate task and distributing tasks amongst available processors, either deterministically or randomly. In a deterministic decomposition, the data to be processed are fixed and the same operations are applied to each. In a speculative decomposition, different operations are applied simultaneously to the same input until at least one has completed. See also functional decomposition and geometric decomposition.

iterative deepening (n.) use of a D-ply search to prepare for a (D+1)-ply search.

J

Jacobi method (n.) A stationary, iterative method for solving a partial differential equation on a discrete grid. The update of each grid point depends only on the values at neighbouring grid points from the previous iteration. See also Gauss-Seidel method.

join(v.) To wait for the termination of one or more descendent processes that were forked at some earlier time. See also spawn.

K

KA9Q (n.) an implementation of TCP/IP for amateur packet radio systems.

Kermit (n.) a public domain file transfer and terminal emulation program. Similar in functionality to uucp.

kernel (n.) A process providing basic services. A service kernel can run on every processor to support minimal operating system services, while a routing kernel can handle or forward incoming messages.

key (n.) unique object of a search.

knowledge (n.) information plus semantic meaning.

knowledge information processing system(n.) a computer system that processes knowledge, rather than data or information.

L

LAN (n.) Local area network, a network of multiple interconnected data terminals or devices within a local area to facilitate data transfer. Most notable of LAN topologies is ethernet, token ring, FDDI, etc.

LAP (n.) link access procedure is a modified form of HDLC that CCITT specified for X.25 networks. LAP-B is link access procedures- balanced and is the X.25 implementation of SDLC and similarly, LAP-D is the ISDN and frame relay implementation of SDLC.

LAPACK (n.) A linear algebra software package, which has been mounted on a wide range of platforms. It evolved from the older LINPACK package from Netlib. See also ScaLAPACK.

latency (n.) The time taken to service a request or deliver a message which is independent of the size or nature of the operation. The latency of a message passing system is the minimum time to deliver a message, even one of zero length that does not have to leave the source processor; the latency of a file system is the time required to decode and execute a null operation. See also startup cost.

lazy evaluation (n.) A Scheduling policy under which no calculation is begun until it is certain that its result is needed. This policy contrasts with the eager evaluation used in most programs, but is often used in functional and logic programming. See also dataflow, dependence, dependence graph, futures.

LD-1 (n.) An integrated T1 with possibly voice, data and frame relay circuits. Fractional digital service hierachy level 1 with service much the same as FT-1 except the service is integrated with voice, data, video and frame relay.

light-weight process (n.) A process which executes concurrently with other processes, in the same address space and in an unprotected fashion. Light-weight processes are used by systems such as MACH to reduce the overhead of process start-up.

linear speedup (n.) Speedup that is directly proportional to the number of processors used. According to Amdahl's Law, linear speedup is not possible for a problem that contains any sequential portion, no matter how small. Gustafson's Law however, states that linear speedup can be achieved if the problem size is increased as well as the number of processor employed. See also superlinear speedup.

linear vector scan (n.) a table lookup algorithm for pipelined vector processors that in a single operation compares a large number of contiguous elements of the table against the key.

link (n.) A one-to-one connection between two processors or nodes in a multicomputer. See also bus.

link loading (n.) The amount of communication traffic carried by a link, or by the most heavily loaded link in the system. As link loading increases, both latency and contention are likely to increase. See also bisection bandwidth.

linked array (n.) a data structure designed to allow the joining of various sized lists so that the inserting and deleting of the list elements can be done in parallel without contention.

linked triadic operation (n.) act of executing a pair of vector operations, such as V+S*V as if they were a single, longer operation by taking the output of the first pipelined functional unit and directing it to the second pipelined functional unit, avoiding a trip to main memory and back.

linking (n.) See chaining.

LINPACK (n.) A linear algebra software package, which has been mounted on a wide range of platforms. It has now been superceded by LAPACK. (n.) also a set of widely quoted performance benchmarks based on linear algebra and available from the National Software Exchange.

LIPS (n.) logical inferences per second; procedure calls per second in a logic programming system.

live variable(n.) A variable visible to a process and whose value can be changed by actions taken outside that process. For example, if a variable is shared by two processes, one of which can write to it at some point, then that variable is live within the other process. See also race condition, shared variables.

livelock (n.) A situation in which a process is forever blocked because another process has preferential access to a resource needed by both processes.

LIW(n.) Long Instruction Words: the use of long (64 or more bits) instruction words in a processor to improve its ability to pipeline calculations.

LMI (n.) Local Management Interface, a protocol, with 4 different versions, used to control the local interface from a routing device to the IPX Switch. Also used for configuration, flow control and maintenance of the local connection.

load balance(n.) The degree to which work is evenly distributed among available processors. A program executes most quickly when it is perfectly load balanced, that is when every processor has a share of the total amount of work to perform so that all processors complete their assigned tasks at the same time. One measure of load imbalance is the ratio of the difference between the finishing times of the first and last processors to complete their portion of the calculation to the time taken by the last processor.

locality (n.) The degree to which the computations done by a processor depend only on data values held in memory that is close to that processor, or the degree to which computations done on a point in some data structure depend only on values near that point. Locality can be measured by the ratio of local to nonlocal data accesses, or by the distribution of distances of, or times taken by, nonlocal accesses. See also halo, stencil.

locality of reference (n.) the observation that references to memory tend to cluster. Temporal locality refers to the observation that a particular datum or instruction, once referenced, is often referenced again in the near future. Spatial locality refers to the observation that once a particular location is referenced, a nearby location is often referenced in the near future.

lock (n.) Any device or algorithm whose use guarantees that only one process can perform some action or use some resource at a time.

logarithmic cost criterion (n.) cost criterion that assumes the cost of performing an instruction is proportional to the length of the operands. The integer N requires at least log N + 1 bits of memory, hence the name.

logic (n.) the branch of mathematics that investigates the relationships between premises and conclusions of arguments.

logic program (n.) a set of Horn clauses, or procedures.

logical inferences per second (n.) See LIPS.

logical time (n.) Elapsed time as seen from within processes. This may differ from clock time, because processes can block or be suspended during multitasking and because they can run at different speeds. The logical times of events only define a partial order on those events.

loop unrolling (n.) A compiler optimization technique in which the body of a loop is replicated L times, and the number of iterations of that loop reduced by a corresponding factor L. By lengthening the basic block inside the loop, this can increase the scope for vectorization and other optimizations.

loopback test (n.) A circuit test at any device which will tie the transmit data to the receive data in order to apply a signal and receive the data back for interpretation.

loose synchronization (adj.) A program running on a concurrent computer is said to be running in loose synchronization if the nodes are constrained to intermittently synchronize with each other via some communication. Frequently, some global computational parameter such as a time or iteration count provides a natural synchronization reference. This parameter divides the running program into compute and communication cycles. See also synchronous.

low-order interleaving (n.) a memory interleaving based on the low-order bits of the address.

lower triangular matrix (n.) a matrix with no nonzero elements above the main diagonal.

LU decomposition (n.) a technique where a matrix A is represented as the product of a lower triangular matrix, L, and an upper triangular matrix U. This decomposition can be made unique either by stipulating that the diagonal elements of L be unity, or that the diagonal elements of L and U be correspondingly identical.

M

M-section (n.) a table lookup algorithm for pipelined vector processors that combines features of bisection and linear vector scan.

MACH (n.) an operating system based on Berkely UNIX developed by Carnegie Mellon University.

macropipelining(n.) See pipelining.

macrotasking (n.) technique of dividing a computation into two or more large tasks to be executed in parallel. Typically the tasks are subroutine calls executed in parallel.

mailbox (n.) an address used as a source or destination designator in a message.

mapping (n.) often used to indicate an allocation of processes to processors; allocating work to processes is usually called scheduling. See also load balance.

marshall(v.) To compact the values of several variables, arrays, or structures into a single contiguous block of memory; copying values out of a block of memory is called unmarshalling. In most message passing systems, data must be marshalled to be sent in a single message.

mask (n.) A Boolean array or array-valued expression used to control where a data parallel operation has effect; the operation is only executed where array elements are true.

MegaFLOPS (n.) 10^6 FLOPS.

memory bank conflict (n.) a condition that occurs when a memory unit receives a request to fetch or store a data item prior to completion of its bank cycle time since its last such request.

memory protection (n.) Any system that prevents one process from accessing a region of memory being used by another. Memory protection is supported both in hardware and by the operating system of most serial computers, and by the hardware kernel and service kernel of the processors in most parallel computers.

mesh(n.) A topology in which nodes form a regular acyclic d-dimensional grid, and each edge is parallel to a grid axis and joins two nodes that are adjacent along that axis. The architecture of many multicomputers is a two or three dimensional mesh; meshes are also the basis of many scientific calculations, in which each node represents a point in space, and the edges define the neighbours of a node. See also hypercube, torus.

message passing (n.) A style of interprocess communication in which processes send discrete messages to one another. Some computer architectures are called message passing architectures because they support this model in hardware, although message passing has often been used to construct operating systems and network software for uniprocessors and distributed computers. See also routing.

message typing (n.) The association of information with a message that identifies the nature of its contents. Most message passing systems automatically transfer information about a message's sender to its receiver. Many also require the sender to specify a type for the message, and let the receiver select which types of messages it is willing to receive.

message-oriented language (n.) a programming language in which process interaction is strictly through message passing.

Metropolis routing(n.) A routing algorithm for meshes, in which an ordering is imposed on axes, and messages are sent as far along the most significant axis as they need to go, then as far along the next most significant axis, and so on until they reach their destination. See also e-cube routing, randomized routing.

MHS (n.) is CCITT's X.400 series of recommendations for electronic mail transfer. MHS defines the system of message user agents, message transfer agents, message stores and access units.

MIB (n.) management information base is a variable database for gateways running CMOT or SNMP. MIB-II refers to a database not shared by CMOT and SNMP.

microtasking (n.) technique of employing parallelism at the DO-loop level. Different iterations of a loop are executed in parallel on different processors.

middle product method (n.)a method of matrix multiplication in which entire columns of the result are computed concurrently. See also inner product method.

MIMD (n.) Multiple Instruction, Multiple Data; a category of Flynn's taxonomy in which many instruction streams are concurrently applied to multiple data sets. A MIMD architecture is one in which heterogeneous processes may execute at different rates.

minimax (n.)algorithm used to determine the value of a game tree.

minimum spanning tree (n.) a spanning tree with the smallest possible weight among all spanning trees for a given graph.

MIPS(n.) one Million Instructions Per Second. A performance rating usually referring to integer or non-floating point instructions. See also MOPS.

MIR (n.) Minimum Information Rate or Committed Information Rate (CIR), is the minimum transmit and receive data rate for a connection.

MISD (n.) Multiple Instruction, Single Data. A member of Flynn's taxonomy almost never used. This category has an ambiguous meaning. It refers to a computer which applies several instructions to each datum. The closest real implementation to this category is a vector computer with an instruction pipeline.

module (n.) a memory bank, often used in the context of interleaved memory.

monitor (n.) a structure consisting of variables representing the state of some resource, procedures to implement operations on that resource, and initialization code.

Monte Carlo (adj.) Making use of randomness. A simulation in which many independent trials are run independently to gather statistics is a Monte Carlo simulation. A search algorithm that uses randomness to try to speed up convergence is a Monte Carlo algorithm.

MOPS(n.) one Million Operations Per Second. Usually used for a general operation, either integer, floating point or otherwise. See also MIPS, FLOPS.

MOS (n.) Metal oxide on Silicon: a basic technology for fabricating semiconductors. See also CMOS, BiCMOS.

motherboard (n.) A printed circuit board or card on which other boards or cards can be mounted. Motherboards will generally have a number of slots for other boards, by which means the computer system may be expanded. When all the slots are used up however, it is usually difficult to expand further, and this is the manufacturer's way of telling you to buy a bigger system.

multicast(n.) To send a message to many, but not necessarily all possible recipient processes. See also broadcast, process group.

multicomputer(n.) A computer in which processors can execute separate instruction streams, have their own private memories and cannot directly access one another's memories. Most multicomputers are disjoint memory machines, constructed by joining nodes (each containing a microprocessor and some memory) via links. See also architecture, distributed computer, multiprocessor, processor array.

multigrid method (n.) A method for solving partial differential equations in which an approximate solution on a coarse resolution grid is used to obtain an improved solution on a finer resolution grid. The method reduces long wavelength components of the error or residual by iterating between a hierarchy of coarse and fine resolution grids.

multiprocessor(n.) A computer in which processors can execute separate instruction streams, but have access to a single address space. Most multiprocessors are shared memory machines, constructed by connecting several processors to one or more memory banks through a bus or switch. See also architecture, distributed computer, multicomputer, processor array.

multiprogramming (n.) the ability of a computer system to time share its (at least one) CPU with more than one program at once. See also multitasking.

multitasking(n.) Executing many processes on a single processor. This is usually done by time-slicing the execution of individual processes and performing a context switch each time a process is swapped in or out, but is supported by special-purpose hardware in some computers. Most operating systems support multitasking, but it can be costly if the need to switch large caches or execution pipelines makes context switching expensive in time.

mutual exclusion(n.) A situation in which at most one process can be engaged in a specified activity at any time. Semaphores are often used to implement this. See also contention, deadlock, critical sections.

N

n-1/2 (n.) (pronounced as n-one-half, usually written with a subscript.) The minimum vector length on which a pipelined architecture delivers one half of its theoretical peak performance. The larger n-1/2 is, the longer calculations must be to amortize the startup cost of the pipeline. This measure was coined by Hockney and Jesshope in 1988. See also r-inf.

NC (n.) The class of parallel algorithms that can run in polylogarithmic (polynomial in the logarithm of the problem size) time on a polynomial number of processors in the PRAM model.

necklace (n.) the nodes a data item travels through in response to a sequence of shuffles.

need-predictable (adj.) Used to describe a concurrent algorithm in which the need for but not the nature of point-to-point communication between nodes is known prior to program execution. Need predictable problems are loosely synchronous.

NeTBIOS (n.) Network Basic Input/Output System that provides a Session layer interface between network applications running on a PC and the underlying protocol software of the Transport and Network layers on the OSI model. Normally a LAN protocol.

network(n.) A physical communication medium. A network may consist of one or more buses, a switch, or the links joining processors in a multicomputer.

neural network (n.) Man made devices using interconnects and processing capabilities suggested by models of the cortex are termed neural networks. These systems are widely used for optimization problems including content addressable memories and pattern recognition.

NFS (n.) network file system ia a protocol developed to use IP and allow a set of computers to access each other's file systems as if they were on the local host.

NNTP (n.) network news transport protocol is used with uucp to transfer news across the Usenet.

node (n.) generic term used to refer to an entity that accesses a network.

non-blocking(adj.) An operation that does not block the execution of the process using it. Usually applied to communications operations, where it implies that the communicating process may perform other operations before the communication has completed. See also blocking.

non-deterministic model (n.) a task model in which the execution time of each task is represented by a random variable.

non-uniform memory access (adj.) See NUMA.

NP complete (adj.) A term used to describe a problem if it can only be solved in polynomial time by non-deterministic methods. In practice such problems are "hard" and are only solved by a number of heuristic methods that only give approximate answers near to the optimum solution.

NUMA (adj.) Nonuniform memory access; not supporting constant-time read and write operations. In most NUMA architectures, memory is organised hierarchically, so that some portions can be read and written more quickly by some processors than by others. See also UMA

O

oblivious(adj.) Working in the same fashion regardless of surroundings or history. An oblivious scheduling strategy always schedules processes in the same way, no matter how much work they have done in the past; an oblivious routing algorithm always routes messages in the same direction, regardless of local load. See also adaptive.

OEM (n.) Original Equipment Manufacturer; a company which adds components to someone else's computers and sells the result as a complete product.

OLTP (n.) On-line transaction processing; handling transactions (such as deposits and withdrawals) as they occur. An application area of great importance to banks and insurance companies.

Omega network (n.) a composition of shuffle-exchange networks with programmable switches.

OOF (n.) out of frame condition counter that increments every change in the framing status of a circuit or device.

operating system (n.) That software responsible for providing standard services and supporting standard operations such as multitasking and file access. See also kernel.

operation oriented language (n.) a programming language using remote procedure calls as the principle means for interprocess communication and synchronization.

optimal (adj.) Cannot be bettered. An optimal mapping is one that yields the best possible load balance; an optimal parallel algorithm is one that has the lowest possible time-processor product.

optimization block (n.) a block of code (rarely a whole subprogram, but often a single DO-loop) in which a compiler optimizes the generated code. A few compilers attempt to optimize across such blocks; many just work on each block separately.

optimization problem (n.) a problem whose solution involves satisfying a set of constraints and minimising (or maximising) and objective function.

OR-parallelism (n.) A form of parallelism exploited by some implementations of parallel logic programming languages, in which the terms in disjunctions are evaluated simultaneously, and the parent computation allowed to proceed as soon as any of its children have completed. See also AND-parallelism.

OS (n.) See operating system.

OSF (n.) Open Software Foundation; an organization established by a number of the major computer manufacturers to set software standards.

OSPF (n.) open shortest path first is a proposed IGP for the internet.

output dependence (n.) See dependence.

P

packet switching(n.) A routing technique in which intermediate nodes wait until they have received the whole of a message before forwarding any of it. Packet switching often requires a large amount of buffer space, and contention for access to this space can lead to deadlock. See also virtual cut-through, wormhole routing.

PAD (n.) packet assembler/disassembler, a device that converts a serial data stream into discrete packets in the transmit direction and converts the received packets back into a serial data stream. Adds header information in the transmit packet to allow it to be routed to the proper destination.

page (n.) The smallest unit of a virtual memory system. The system maintains separate virtual-to-physical translation information for each page.

parallel balance point (n.) The point at which using more processors to do a calculation increases the time taken to complete that calculation. See also Amdahl's Law, efficiency, Gustafson's Law, isoefficiency, speedup.

parallel computation thesis (n.) time on an unbounded parallel model of computation is (polynomially) equivalent to space on a sequential model of computation. (Unproved)

parallel computer (n.) A computer system made up of many identifiable processing units working together in parallel. The term is often used synonymously with concurrent computer to include both multiprocessor and multicomputer. The term concurrent generally dominates in usage in the USA, whereas the term parallel is the more widely used in Europe.

parallel prefix (n.) An operation applying an associative binary operator o to an n-vector V to produce: <(V0)(V0oV1)(V0oV1oV2)...(V0oV1o...oVn)>. Variations of this operation, which is also sometimes called a scan, may leave the operations identity element in the first element of the vector, apply the operation downward from the last element of the vector, and so on. See also reduction, scan-vector model, segmented parallel prefix operation.

parallel random access machine (n.) See PRAM

parallel slackness (n.) Hiding the latency of communication by giving each processor many different task, and having them work on the tasks that are ready while other tasks are blocked (waiting on communication or other operations).

parallel unification(n.) finding the set of possible unifications for a goal in parallel.

parallelization (n.) Turning a serial computation into a parallel one. Also sometimes turning a vector computation into a parallel one. This may be done automatically by a parallelising compiler or (more usually) by rewriting (parts of) the program so that it uses some parallel paradigm. See also dataflow, data parallelism, futures, generative communication, message passing, shared variables.

parsing (n.) process whereby a compiler analyzes the syntax of a program to establish the relationship among operators, operands, and other tokens of a program. Parsing does not involve any semantic analysis.

partial cascade sum (n.)parallel algorithms to compute partial sums in logarithmic time.

partial sum (n.) the result from a summation of some subsequence of the elements in a sequence of operands.

partitioning (n.) process of restructuring a program or algorithm into independent computational segments. The goal is to have multiple processors simultaneously work on these independent computational segments.

PDE (n.) partial differential equation.

percentage parallelization (n.) the percentage of processor expenditure processed in parallel on a single job. It is not usually possible to achieve 100 percent of an application's processing time to be shared equally on all processors. See Amdahl's Law.

percentage vectorization (n.) The percentage of an application executing in vector mode. This percentage may be calculated as a percentage of CPU time or as the percentage of lines of code (usually Fortran) in vector instructions. The two approaches are not consistent and may give very different ratings. The first calculation method leads to performance improvement as measured by CPU time, while the second method measures the success rate of the compiler in converting scalar code to vector code. The former is the more meaningful hardware performance measure. See also vectorize.

performance model (n.) A formula that predicts the speed, efficiency, memory requirements, or other execution characteristics of a program on a particular architecture

ping (n.) The packet internet groper is a program useful in testing and debugging LAN/WAN troubles. It sends out an echo and expects a specified host to respond back in a specified time frame.

pipe (n.) A communication primitive which involves the transmission of information through a linearly connected subset of the nodes of a parallel computer.

pipelining (n.) Overlapping the execution of two or more operations. Pipelining is used within processors by prefetching instructions on the assumption that no branches are going to preempt their execution; in vector processors, in which application of a single operation to the elements of a vector or vectors may be pipelined to decrease the time needed to complete the aggregate operation; and in multiprocessors and multicomputers, in which a process may send a request for values before it reaches the computation that requires them. See also architecture.

PIR (n.) peak information rate, is the peak rate of information transfer available on a connection. See also MIR and CIR and QIR.

pivot (n.) A particular element of a matrix during some algorithm. Many matrix algorithms need carefully chosen pivots to improve their numerical accuracy and stability. Pivoting involves arranging that the pivotal element has an appropriate numerical value by reordering and/or swapping rows and columns in the matrix.

PLN (n.) Packet Line, an inter-switch trunk, usually an E1 or T1, designed to carry packets between IPX nodes.

point of presence (pop) (n.) The physical access location interface between a local exchange carrier and the main network. The point to which the telephone company terminates a subscriber's circuit for long distance service or leased line communications.

polling (adj.) of a communication system. Polling involves a node inspecting the communication hardware - typically a flag bit - to see if information has arrived or departed. Polling is an alternative to an interrupt driven system. The natural synchronization of the nodes imposed by polling is used in the implementation of blocking communications primitives.

ports (n.) a variant of mailboxes allowing multiple client processes but only a single server process.

POSIX (n.) A standard of definition of the interface to the UNIX operating system.

PPP (n.) point to point protocol is an alternative to SLIP and provides router to router and host to network connections over both synchronous and asynchronous circuits.

PRAM (n.) Parallel random access machine; a theoretical model of parallel computation in which an arbitrary but finite number of processors can access any value in an arbitrarily large shared memory in a single time step. Processors may execute different instruction streams, but work synchronously. The three most important variations of the PRAM are: EREW - Exclusive read, exclusive write; any memory location may only be accessed once in any one step. CREW - Concurrent read, exclusive write; any memory location may be read any number of times during a single step, but only written to once, with the write taking place after the reads. CRCW - Concurrent read, concurrent write; any memory location may be written to or read from any number of times during a single step. A CRCW PRAM model must define some rule for resolving multiple writes, such as giving priority to the lowest-numbered processor or choosing amongst processors randomly. The PRAM is popular because it is theoretically tractable and because it gives algorithm designers a common target. However, PRAMs cannot be emulated optimally on all architectures. See also NC.

preconditioning (n.) a technique for improving the convergence of iterative matrix inversion algorithms such as the conjugate gradient method, by transforming the matrix of equation coefficients so that the eigenvalues are redistributed.

prefetch (v.) to fetch or load a data entity or program instruction from memory in advance of actually starting to process it. Processors that have prefetch instructions can avoid some of the bottlenecks that arise from a memory system that is slower than processing speed.

private line (n.) A full duplex dedicated channel between two specified points.

private memory (n.) Me