CPA 2000 - Academic Programme

Communicating Process Architectures - 2000

Sunday 10th. September (evening) through Wednesday 13th. September (lunchtime)

Academic Programme

[Note: the categories listed below are not definitive - many papers fit into several]

Invited Talk:

Concurrency in Industry (Wot, no CSPs?)
Johan P.E. Sunter (Philips TASS, Eindhoven, The Netherlands)

Abstract: Within an international company such as Philips, systems are made from the very small to the extremely large. And although parallelism is an important aspect of system architecture, it is often overlooked. Due to the wide variety of systems, parallelism presents itself in many different shapes, requiring different design approaches. What approaches have been chosen in the past, and do they have something in common? Can we learn from them how to do it better? What do we need in the future?

Theory:

Formal Analysis of Concurrent Java Systems
Peter H. Welch (Computing Laboratory, University of Kent)
Jeremy M.R. Martin (Oxford Supercomputing Centre)

Abstract: Java threads are synchronised through primitives based upon monitor concepts developed in the early 1970s. The semantics of Java's primitives have only been presented in natural language -- this paper remedies this with a simple and formal CSP model. In view of the difficulties encountered in reasoning about any non-trivial interactions between Java threads, being able to perform that reasoning in a formal context (where careless errors can be highlighted by mechanical checks) should be a considerable confidence boost. Further, automated model-checking tools can be used to root out dangerous states (such as deadlock and livelock), find overlooked race hazards and prove equivalence between algorithms (e.g. between optimised and unoptimised versions).
A case study using the CSP model to prove the correctness of the JCSP and CTJ channel implementations (which are built using standard Java monitor synchronisation) is presented. In addition, the JCSP mechanism for ALTing (i.e. waiting for and, then, choosing between multiple events) is verified. Given the history of erroneous implementations of this key primitive, this is a considerable relief.

Practice:

CSP: Arriving at the CHANnel Island (an Industrial Practitioner's Diary: in Search of a New Fairway)
Oyvind Teig (Navia Maritime AS, division Autronica, Norway)

Abstract: This paper is a non-academic hands-on log of practical experiences of software engineering problems, where the process leading to the decision to use CSP to program real-time systems is on trial. A new and hopefully objective decision process is instigated. What we have previously learnt by using CSP in embedded real-time process control is used as subjective basis (or bias?) for the new process. The conclusion seems to be that CSP should be sufficiently future-proof to justify its use even in new projects. The term "CSP" is here used as shorthand for both the CSP language proper and different implementations of subsets.

Hardware Infrastructure:

Synchronisation in a Multithreaded Processor
Shondip Sen, Henk Muller and David May (Computer Science, University of Bristol)

Abstract: A multithreaded architecture exploits instruction level parallelism by interleaving instructions from disjoint thread contexts. As each thread executes within its own instruction stream with private data (the context registers), there is no interdependency between instructions from different threads. This allows high resource utilisation of a super scalar pipelined processor at a very low cost, in terms of complexity and silicon area. A new synchronisation mechanism for a multithreaded architecture is outlined. Two new instructions have been introduced to perform one to one and n-way synchronisation. The operation allows synchronisations to be requested and actioned efficiently on chip in as little as four clock cycles. Barriers and CSP style channels can easily be constructed with this new synchronisation instruction. A brief examination of performance of this multithreaded architecture shows that the optimum number of contexts per multithreaded processing element is four, based on test programs.

Effective Caching for Multithreaded Processors
David May, James Irwin, Henk Muller and Dan Page (Computer Science, University of Bristol)

Abstract: Multithreaded architectures have been developed as a way to hide latencies in memory access, communication, and long pipelines. Caches have been developed to hide latencies and reduce memory bandwidth requirements. Caches do not work well in multithreaded environments, because threads unintentionally evict each others data and instructions. To enable effective use of caches in a multithreaded environment (giving high execution speed even in the context of high memory latencies), we propose to use a cache architecture where the cache can be divided into partitions. Each thread is assigned a set of partitions which are used to cache a view of data structures, or part of the instruction stream. The partition assignment is completely automated in the compiler. With our compiler and architecture, all forms of interference are eliminated and predictable execution of multithreaded programs is achieved in moderately sized caches.

occam on Field Programmable Gate Arrays - Optimising for Performance
Roger M.A. Peel (Electrical Engineering, University of Surrey)
Barry M. Cook (Computer Science, University of Keele)

Abstract: This paper shows how the parallel occam code for a graphics application has been compiled to run on a Field Programmable Gate Array (FPGA). This application has stringent timing constraints, and many optimisations to the sequencing of operations and occam constructs have been employed to meet them. In particular, two crucial pipelined processes are shown to operate with no control overhead despite containing parallel and looping constructs and channel communications.

Software Infrastructure:

Distributed Computing using Channel Communications in Java
A. Ripke, Alastair R. Allen and Y. Feng (Department of Engineering, University of Aberdeen, Scotland)

Abstract: Various methods of connecting CSP channels to external software systems are examined. The aim is to facilitate the implementation of distributed heterogeneous systems using channel communication. The paper concentrates on TCP/IP connections in a Java environment. The approaches used are: custom protocol; object serialization stream protocol; Remote Method Invocation (RMI); Common Object Request Broker Architecture CORBA). Practical systems are described and compared.

A Comparison of Linda Implementations in Java
George Wells, Peter Clayton (Computer Science, Rhodes University, South Africa)
Alan Chalmers (Computer Science, University of Bristol)

Abstract: This paper describes the implementation of an extended version of Linda in Java. The extensions have been made with a view to increasing the efficiency of the underlying communication mechanisms and the flexibility with which data may be accessed, particularly in the area of distributed multimedia applications. The system is compared with two other recent implementations of Linda for Java: JavaSpaces from Sun Microsystems, and TSpaces from IBM. The comparison is performed both qualitatively, comparing and contrasting the features of the systems, and quantitatively, using a simple communication benchmark program and a ray-tracing program to assess the performance and scalability of the different systems for networks of workstations.

Conditional Communication in the Presence of Priority
Gerald H. Hilderink and Jan F. Broenink (Control Laboratory, University of Twente, the Netherlands)

Abstract: In this paper the behavior of conditional communication in the presence of priority will be described. In the theory of CSP, conditional communications are expressed by a (external) choice construct (also known as the alternative construct) by which the choice of one communication out of a set of communications is non-deterministic. In practice, some communication can be more important than others, so that, the choice is deterministic. Therefore, one can distinguish two prioritized alternative constructs; a symmetric alternative construct (ALT) and an asymmetric alternative construct (PRIALT). Current formal semantics and implementations of the ALT and PRIALT are based on the priorities of communication and not related to the surrounding priorities of communicating processes. This can result in a structural mismatch that can cause performance problems. In this paper a practical solution for realizing fair and unfair conditional communication in the presence of the PAR and the PRIPAR will be discussed.

Using Java for Parallel Computing - JCSP versus CTJ
Nan C, Schaller (Computer Science, Rochester Institute of Technology, NY)
Gerald H. Hilderink (Control Laboratory, University of Twente, the Netherlands)
Peter H. Welch (Computing Laboratory, University of Kent)

Abstract: Java provides support for concurrent and parallel programming through threads, monitors, and its socket and Remote Method Invocation (RMI) classes. However, there have been many concerns expressed about the way in which this support is provided -- e.g. [Brinch-Hansen][Welch], citing problems such as an improper implementation of monitors and the difficulty of programming with threads. Hoare's Communicating Sequential Processes (CSP) model fully describes thread synchronization and is based on processes, composition and channel communication. It provides a mathematical notation for describing patterns of communication using algebraic expressions and contains formal proofs for analyzing, verifying and eliminating undesirable conditions, such as race hazards, deadlocks, livelock, and starvation. Two independent research efforts provide a CSP based process-oriented design pattern for concurrency implemented in Java: Java Communicating Sequential Processes (JCSP) and Communication Threads in Java (CTJ). In this paper, we compare these two packages, looking at the philosophy behind their development, their similarities, their differences, their performance and their use.

Blocking System Calls in KRoC/Linux
Frederick R.M. Barnes (Computing Laboratory, University of Kent)

Abstract: This paper describes an extension to Kent Retargetable occam Compiler (KRoC), which enables the execution of a blocking call, without blocking the occam-kernel. This allows a process to make a blocking system call (eg, read, write), without blocking processes running in parallel with it. Blocking calls are implemented using Linux clones which communicate using shared memory, and synchronise using kernel level semaphores. The usefulness of this is apparent in server applications with a need to handle multiple clients simultaneously. An implementation of an occam web-server is described, which uses standard TCP sockets via an occam socket library. The web-server comes with the ability to execute CGI scripts as well as dispensing static pages, which demonstrates some level of OS process management from within occam.
However, this mechanism is not limited to blocking in the Linux kernel. On multi-processor machines, the clones are quite free to be scheduled on different processors, allowing computationally heavy processing to be performed aside the occam world, but still with a reasonable level of interaction with it. Using them in this way provides a coarse-grained level of parallelism from within the fine-grained occam world.

An Experiment with Recursion in occam
David C. Wood (Computing Laboratory, University of Kent)

Abstract: An experimental version of KRoC has been written that implements recursion in occam, using Brinch Hansen's algorithm for allocating activation records. It shows that efficient parallel recursion is possible in occam.

Post-Mortem Debugging in KRoC
David C. Wood (Computing Laboratory, University of Kent)
Fred M.R. Barnes (Computing Laboratory, University of Kent)

Abstract: A simple post-mortem debugging facility has been added to KRoC, to identify and locate run-time errors, including deadlock. It has been implemented for the SPARC/Solaris and i386/Linux versions of KRoC.

Native JCSP - the CSP for Java library with a Low-Overhead CSP Kernel
James Moores (Computing Laboratory, University of Kent)

Abstract: The JCSP library provides a superior framework for building concurrent Java applications. Currently, CSP is a collection of classes that uses the standard Java Threads mechanism to provide low-level facilities such a process scheduling and synchronization. The overheads of using Java Threads can be quite large though, especially for synchronization and context switching.
This paper begins by describing various options for increasing performance, and then how the standard Java threads work. The integration of the low-overhead CCSP run-time system into a Linux-based Sun JDK 1.2.1 Java Virtual Machine is then described. This integration provides the low-level support required to dramatically increase the performance of the JCSP library's model of concurrency. The paper then looks at the problem of maintaining backward compatibility by preserving the functionality of the existing threads mechanism on which much legacy code depends.
The paper finishes by looking at the performance displayed by the current prototype JVM and contrasting it with the performance of both Green (co-operatively scheduled) and Native (operating-system scheduled) Java Threads.

libcsp - a Building mechanism for CSP Communication and Synchronisation in Multithreaded C Programs
Rick Beton (Roke Manor Research Ltd.)

Abstract: A C library is described which exists to support synchronisation and communication in the CSP model using the Posix Threads API as its basis. This differs from other notable approaches (e.g. [Moores:1999]) in that it uses Posix Threads. Whilst this approach has drawbacks, its main achievement is being easily adopted by people already familiar with Posix Threads and being useful to a wide range of target platforms.
libcsp is open-source, available under the GNU Library General Public License. See http://www.beton.freeserve.co.uk/libcsp/.

Tools:

The Automated Serialization of Concurrent CSP Scripts using Mathematica
Weiyang Zhou and G.S. Styles (Electrical and Computer Engineering, Utah State University)

Abstract: This report discusses the design and implementation of a package of Mathematica-based tools for serializing scripts describing the behavior of concurrent systems in CSP. Under some conditions, serialization can yield code that is more efficient than multi-process versions and can better meet processor constraints. Under most conditions concurrent design is easier and more reliable so we prefer to design concurrently and only implement serially when necessary. The tools include a set of definitions and procedures to automatically change concurrent code to serial code. Conversions of the following step laws have been implemented thus far: external choice, generalized parallel, and alphabetized parallel; additional laws will be implemented in the next month or so. The tools support the input of the more formal typeset notation of CSP, which makes the script writing intuitive, and can convert the typeset version to the machine-readable version required by software such as FDR. The correctness of the conversions has been checked with the FDR package. This package should be of use to both those regularly working on concurrent systems and those learning CSP.

Parallel Algorithms for Deadlock and Livelock Analysis of Concurrent Systems
Jeremy M.R. Martin (Oxford Supercomputing Centre)
Yvonne Huddart (Oxford University Computing Laboratory)

Abstract: Conventional model-checking techniques for analysing concurrent systems for deadlock or livelock are hampered by the problem of exponential state explosion: the overall number of global states that needs to be checked may grow exponentially with the number of component processes in the system. The state-of-the-art commercial tool FDR provides deadlock and livelock analysis but it is limited at present to analysing systems of up to one hundred million global states.
The Deadlock Checker tool is capable of analysing very much larger systems by taking certain short-cuts. But this is achieved at the cost of incompleteness -- there are certain deadlock-free and livelock-free networks that may not be proven so using that tool.
Here we investigate a different approach. We present parallelised model-checking algorithms for deadlock and livelock analysis and describe their implementation. The techniques are found to scale well running either on a conventional supercomputer or on a PC cluster.

CSP Design Model and Tool Support
H.J. Volkerink, Gerald H. Hilderink, Jan F. Broenink, W.A. Veroort and Andre W.P. Bakkers
(Control Laboratory, University of Twente, the Netherlands)

Abstract: The CSP paradigm is known as a powerful concept for designing and analysing the architectural and behavioural parts of concurrent software. Although the theory of CSP is useful for mathematicians, the programming language occam has been derived from CSP that is useful for any engineering practice. Nowadays, the concept of occam/CSP can be used for almost every object-oriented programming language. This paper describes a tree-based description model and prototype tool that elevates the use of occam/CSP concepts at design level and performs code generation for the level of implementation in Java, C/C++, and machine-readable CSP. The tool is a kind of browser that is able to assist modern workbenches (like Borland Builder, Microsoft Visual C++ and 20-SIM) with coding concurrency. The tree description model can be used to browse through the generated source code. The tool will guide the user through the design trajectory using supporting messages and several semantic and syntax rule checks. The machine-readable CSP can be read by FDR, enabling more advanced analysis on the design. Early experiments with the prototype tool show that the browser concept, combined with the tree description model, enables a user-friendly way to create a design using the CSP concepts and benefits. The design tool is available from our URL, http://www.rt.el.utwente.nl/javapp.

Applications:

A Self-Configuring Distributed Kernel for Satellite Networks
Scott Cannon and Larry Denys (Space Software Laboratory, Utah State University)

Abstract: The Space Software Laboratory is developing a self-configuring distributed kernel to be used on future satellite missions. The completion of this system will allow a network of heterogeneous processor nodes to communicate and broadcast in a scalable, self-configuring manner. Node applications software will be transparent to the underlying network architecture, message routing, and number of network nodes. Nodes may halt or be reset and later rejoin the network. The kernel will support in-flight programming of individual nodes.

Steering High-Performance Parallel Programs: a Case Study
Peter J. Love (Theoretical Physics, University of Oxford)
Jeremy M.R. Martin (Oxford Supercomputing Centre)

Abstract: Computational steering is the ability to visualise the data from a computation in progress and to modify the future behaviour of the computation in response to this. It is often perceived as being something very difficult to implement, especially for parallel computations. However, given a good visualisation environment, we have found that this is not necessarily the case. We have sought to dispel this myth, using a very simple model which makes it easy to 'wire-up' an existing MPI parallel program for steering. New insights may quickly be gained by continually monitoring and guiding the progress of computational simulations, that were perhaps previously analysed only in their final state.

A Cruise Control in occam based on an Implementation of KRoC on the Philips 8051 Microcontroller
Frank T.M. van Vugt (Vos-Bertens Consultants in Technology, the Netherlands)
Andre W.P. Bakkers (Control Laboratory, University of Twente, the Netherlands)

Abstract: This paper summarises the results of the realisation of a Cruise Control system in occam using an implementation of the Kent Retargettable occam Compiler (KRoC) on the Philips 8051 microcontroller.
The increase in complexity of systems designed comes with difficulties that can probably be overcome using concurrent programming languages. occam is such a language, originally developed for use with transputers. The KRoC initiative allows one to translate the transputer assembly produced from a program written in occam into the assembly of another processor. In this case, it was implemented for the Philips 8051 microcontroller, which is an 8-bits processor. The design and realisation in occam of the Cruise Control system of Yourdon demonstrate its proper functioning. The generated code is tested using a real-time in-circuit 8051 emulator and special hardware to represent car and interface. The design process using occam is compared to a regular solution using a language like C.
Since this port is the first of its kind inasmuch as it is not targeting 'large' processors like the SPARC, important conclusions can be drawn regarding the power of the CSP-concept. The port is not complete yet, future work on it is recommended.