Communicating Process Architectures - 2002

The University of Reading (England)

Sunday 15th. September (evening) through Wednesday 18th. September (lunchtime)

Academic Programme

[Note: the categories listed below are not definitive - many papers fit into several]

Invited Talk:

How to SPIN (to analyse real-life CPAs)
Theo C. Ruys (University of Twente, The Netherlands)

Abstract: SPIN is a model checker for the verification of software systems. During the last decade, SPIN has been successfully applied to trace logical design errors in distributed systems, such as operating systems, data communications protocols, switching systems, concurrent algorithms, railway signaling protocols, etc. SPIN checks the logical consistency of a specification; it reports on deadlocks, unspecified receptions, flags incompleteness, race conditions, and unwarranted assumptions about the relative speeds of processes. SPIN is considered to be one of the most powerful and advanced model checkers (freely) available today. SPIN is widely distributed and has a large user base. Moreover, last year, SPIN received the prestigious ACM System Award.
SPIN uses a high level language called PROMELA (PROcess MEta LAnguage) to specify systems descriptions. The tutorial gives a crash-course introduction to PROMELA and presents an overview of the validation and verification features of SPIN. The material will be illustrated by several demonstrations using XSpin, the graphical user interface to SPIN. The second part of the tutorial discusses guidelines to construct efficient PROMELA models and shows how to use SPIN in the most effective way. Topics to be discussed include: SPIN's optimisation algorithms, directives and options to tune SPIN, aggressive PROMELA modelling, literate modelling, using SPIN as a debugger, validation management and rules of thumb.
CV: Theo Ruys is an assistant professor in the Formal Methods and Tools group at the University of Twente, The Netherlands, headed by Professor Ed Brinksma. He received his MSc in computer science in 1995 on the design and implementation of a retargetable code-generator for TCGS, the in-house compiler generator used in Twente. In 2001, he received his PhD under supervision of Ed Brinksma. His PhD Thesis "Towards Effective Model Checking" investigates methods and techniques to improve the effectiveness of the model checking process with the goal to reliably apply model checking technology `in the large'. The research can be classified by three themes: methodology, pragmatics and true concurrency. The pragmatics part is devoted to the model checker SPIN, where several techniques and methods are discussed to construct efficient PROMELA models and to use SPIN in the most effective way.

Theory:

Semantics of prialt in Handel-C (tm)
Andrew Butterfield (Dublin University, Ireland)
Jim Woodcock (Computing Laboratory, University of Kent, UK)
Pages 1-16.

Abstract: This paper discusses the semantics of the prialt construct in Handel-C. The language is essentially a static subset of C, augmented with a parallel construct and channel communication, as found in CSP. All assignments and channel communication events take one clock cycle, with all updates synchronised with the clock edge marking the cycle end. The behaviour of prialt in Handel-C is similar in concept to that of occam, and to the p-priority concept of Adrian Lawrence's CSPP . However, we have to contend with both input and output guards in Handel-C, unlike the situation in occam, although prialts with conflicting priority requirements are not legal in Handel-C. This makes our problem simpler than the more general case including such conflicts considered by Lawrence. We start with an informal discussion of the issues that arise when considering the semantics of Handel-C's prialt construct. We define a resolution function (R) that determines which requests in a collection of prialts become active. We describe a few properties that we expect to hold for resolution, and discuss the issue of compositionality.

Acceptances, Behaviours and Infinite Activity in CSPP
Adrian Lawrence (Department of Computer Science, Loughborough University, UK)
Pages 17-38.

Abstract: The denotational semantics presented here defines an extension of CSP called CSPP. It includes a full description of infinite behaviour in one simple model using only finite traces. This is true for both finite and infinite alphabets. The structure is a complete lattice, and so also a complete partial order, under refinement. Thus recursion is defined by fixed points in the usual way. It is also a complete restriction metric space so this provides an alternative treatment of recursion for contraction mappings.

HCSP: Imperative State and True Concurrency
Adrian Lawrence (Department of Computer Science, Loughborough University, UK)
Pages 39-56.

Abstract: HCSP is an extension of CSPP which captures the semantics of hardware compilation. Because it is a superset of CSPP, it can describe both hardware and software and so is useful for co-design. The extensions beyond CSPP include: true concurrency; new hardware constructors; and a simple and natural way to represent imperative state. Both CSPP and HCSP were invented to cope with problems that arose while the author was trying to prove that the hardware that he had designed correctly implemented channels between a processor and an FPGA. Standard CSP did not capture priority, yet the circuits in the FPGA and the occam processes in the transputer both depended on priority for their correctness. The attempt to extend CSP rigorously to handle such problems of co-design has led to develoments that seem to have a much wider significance including a new way of unifying theories for imperative programming. This paper reports on the current state of HCSP and focuses on handling imperative state and true concurrency. The acceptance denotational semantics is described briefly.

A Predicate Transformer Semantics for a Concurrent Language of Refinement
Ana Cavalcanti and Jim Woodcock (Computing Laboratory, University of Kent, UK)
Pages 157-176.

Abstract: Circus is a combination of Z and CSP; its chief distinguishing feature is the inclusion of the ideas of the refinement calculus. Our main objective is the definition of refinement methods for concurrent programs. The original semantic model for Circus is Hoare and He's unifying theories of programming. In this paper, we present an equivalent semantics based on predicate transformers. With this new model, we provide a more adequate basis for the formalisation of refinement and verification-condition generation rules. Furthermore, this new framework makes it possible to include logical variables and angelic nondeterminism in Circus. The consistency of the relational and predicate transformer models gives us confidence in their accuracy.

View-Centric Reasoning for Linda and Tuple Space Computation
Marc L. Smith (Computer Science Department, Colby College, USA)
Rebecca J. Parsons (ThoughtWorks, Inc., Chicago, USA)
Charles E. Hughes (School of EE and CS, University of Central Florida, USA)
Pages 233-264.

Abstract: In contrast to sequential computation, concurrent computation gives rise to parallel events. Efforts to translate the history of concurrent computations into sequential event traces result in the potential uncertainty of the observed order of these events. Loosely coupled distributed systems complicate this uncertainty even further by introducing the element of multiple imperfect observers of these parallel events. Properties of such systems are difficult to reason about, and in some cases, attempts to prove safety or liveness lead to ambiguities. We present a survey of challenges of reasoning about properties of concurrent systems. We then propose a new approach, view-centric reasoning, that avoids the problem of translating concurrency into a se-quential representation. Finally. we demonstrate the usefulness of view-centric reasoning as a framework for disambiguating the meaning of tuple space predicate operations, versions of which exist commercially in IBM?s T Spaces and Sun?s JavaSpaces.

On the Complexity of Buffer Allocation in Message Passing Systems
Alex Brodsky, Jan Bækgaard Pedersen and Alan Wagner (Department of Computer Science, University of British Columbia, Canada)
Pages 89-106.

Abstract: In modern cluster systems message passing functionality is often off-loaded to the network interface card for efficiency reasons. However, this limits the amount of memory available for message buffers. Unfortunately, buffer insufficiency can cause an otherwise correct program to deadlock, or at least slow down. Hence, given a program trace from an execution in an unrestricted environment, determining the minimum number of buffers needed for a safe execution is an important problem. We present three related problems, all concerned with buffer allocation for safe and efficient execution. We prove intractability results for the first two problems and present a polynomial time algorithm for the third.

Design and Modelling:

A Graphical Modeling Language for Specifying Concurrency based on CSP
Gerald Hilderink (Department of Electrical Engineering - Control Laboratory, University of Twente, The Netherlands)
Pages 265-294.

Abstract: In this paper, we introduce a graphical specification language for modeling concurrency in software design. The language notations are derived from CSP and the resulting designs form so-called CSP diagrams. The notations reflect data-flow aspects, control-flow aspects, and they reflect CSP algebraic expressions that can be used for formal analysis. The designer does not have to be aware of the underlying mathematics. The techniques and rules presented provide guidance to the development of concurrent software architectures. One can detect and reason about compositional conflicts (errors in design), potential deadlocks (errors at run-time), and priority inversion problems (performance burden) at a high level of abstraction. The CSP diagram collaborates with object-oriented and structured methods. The CSP diagram is UMLable and can be presented as a new process diagram for the UML to capture concurrent, real-time, and event-flow oriented software architectures. The extension to the UML is not presented in this paper.

Consolidating The Agreement Problem Protocol Verification Environment
James Pascoe and Roger Loader (Department of Computer Science, University of Reading, UK)
Pages 57-78.

Abstract: The Agreement Problem Protocol Verification Environment (APPROVE) has become a mature and robust platform for the automated verification of proposed solutions to agreement problems. APPROVE is based on the Spin model checker and leverages the literate programming tool noweb to provide Promela code supplied with L A T E X documentation. The APPROVE discussion opened in Communicating Process Architectures 2001 and described the initial project phases and summarised some preliminary results. This paper presents a follow up, providing a canonical report on the development, application and insight gained from the project.

Languages, Libraries and Kernels (soft and hard):

Prioritised Dynamic Communicating Processes - Part I
Fred Barnes and Peter Welch (Computing Laboratory, University of Kent)
Pages 331-362.

Abstract: This paper reports continuing research on language design, compilation and kernel support for highly dynamic concurrent reactive systems. The work extends the occam multiprocessing language, which is both sufficiently small to allow for easy experimentation and sufficiently powerful to yield results that are directly applicable to a wide range of industrial and commercial practice. Classical occam was designed for embedded systems and enforced a number of constraints - such as statically pre-determined memory allocation and concurrency limits - that were relevant to that generation of application and hardware technology. Most of these constraints have been removed in this work and a number of new facilities introduced (channel structures, mobile channels, channel ends, dynamic process creation, extended rendezvous and process priorities) that significantly broaden occam?s field of application and raise the level of concurrent system design directly supported. Four principles were set for modifications/enhancements of the language. They must be useful and easy to use. They must be semantically sound and policed (ideally, at compile-time) to prevent mis-use. They must have very lightweight and fast implementation. Finally, they must be aligned with the concurrency model of the original core language, must not damage its security and must not add (significantly) to the ultra-low overheads. These principles have all been observed. All these enhancements are available in the latest release (1.3.3) of KRoC, freely available (GPL/open source) from: http://www.cs.ukc.ac.uk/projects/ofa/kroc/ .

Prioritised Dynamic Communicating Processes - Part II
Fred Barnes and Peter Welch (Computing Laboratory, University of Kent)
Pages 363-380.

Abstract: This paper illustrates the work presented in 'Part I', giving additional examples of use of channel-types, extended rendezvous and FORKs that lean towards real applications. Also presented are a number of other additions and extensions to the occam language that correct, tidy up or complete facilities that have long existed. These include fixing the PRI ALT bug, allowing an unconditional SKIP guard as the last in a PRI ALT, replicator STEP sizes, run-time computed PAR replication counts, RESULT parameters and abbreviations, nested PROTOCOL definitions, inline array constructors and parallel recursion. All are available in the latest release (1.3.3) of KRoC, freely available (GPL/open source) from: http://www.cs.ukc.ac.uk/projects/ofa/kroc/ .

The "Honeysuckle" Programming Language: Event and Process
Ian East (School of Computing and Mathematical Sciences, Oxford Brookes University, UK)
Pages 295-310.

Abstract: A new language for programming systems with Communicating Process Architecture [1] is introduced which builds upon the success of occam [2]. Some of the principal objectives are presented and justified. The means employed to express behaviour are then described, including a transfer primitive, which conveys object ownership as well as value [3], and an alternation construct. The latter replaces PRI PAR and PRI ALT, and affords explicit expression of conflict-free prioritized reactive (event-driven) behaviour, including exception response [4]. HPL also offers source-code modularity, object encapsulation, and the recursive definition of both object and process. Despite such ambition, a primary aim has been to retain simplicity in abstraction, expression, and implementation.

The "Honeysuckle" Programming Language: Object and Protocol
Ian East (School of Computing and Mathematical Sciences, Oxford Brookes University, UK)
Pages 311-320.

Abstract: Applications are demanding greater concurrency, higher integrity, and more reactive behaviour. There is a need to meet this demand without a requirement for additional tools and skills. To this end, HPL adds to occam's inherent security by enforcing formal design rules which guarantee deadlock-freedom [1]. Service protocol is introduced to constrain inter-process communication. Recursive data type definition [2] is introduced to allow a little object-oriented programming (OOP) within communicating process architecture (CPA) [3] and restore the balance between process and object abstraction.

Synchronous Active Objects Introduce CSP's Primitives in Java
Claude Petitpierre (Laboratoire de Téléinformatique, EPFL,Switzerland)
Pages 119-132

Abstract: This paper presents a proposal of a language, based on synchronous active objects that introduces the CSP primitives into Java. The proposal does not use channels to realise the inter-process communications, but is shown to offer the same expressive power as the channel based solutions. The paper also shows that the rendezvous defined by CSP or our synchronous objects are very well adapted to the industrial development of event driven applications, handling simultaneously GUIs and accesses to remote applications.

Cluster Computing and JCSP Networking
Brian Vinter (Department of Mathematics and Computer Science, University of Southern Denmark, Denmark)
Peter Welch (Computing Laboratory, University of Kent at Canterbury, UK)
Pages 213-232.

Abstract: Hoare's algebra of Communicating Sequential Processes (CSP) enables a view of systems as layered networks of concurrent components, generating and responding to events communicated to each other through channels, barriers and other (formally defined) synchronisation primitives. The resulting image and discipline is close to hardware design and correspondingly easy to visualise, reason about, compose and scale. JCSP is a library of Java packages providing an (occam) extended version of this model that may be used alongside, or as a replacement for, the very different threads-and-monitors concurrency mechanisms built into Java. The current release (JCSP 1.0) supports concurrency within a single Java Virtual Machine (which may be multi-processor). This paper reports early experiments with JCSP.net, an extension of JCSP for the dynamic construction of CSP networks across distributed environments. The aims of JCSP.net are to simplify the construction and programming of dynamically distributed and parallel systems. It provides high-level support for CSP architectures, unifying concurrency logic within and between processors. The experiments are on some classical HPC problems, an area of work for which JCSP.net was not primarily designed. However, low overheads in the supporting infrastructure were a primary consideration - along with an intuitive and high-level distributed programming model (based on CSP). Results reported show JCSP holding up well against - and often exceeding - the performance obtained from existing tools such as mpiJava and IBM?s TSpaces. The experimental platform was a cluster of 16 dual-processor PIII Linux machines. It is expected that future optimisations in the pipeline for the JCSP.net infrastructure will improve the results presented here. JCSP and JCSP.net were developed at the University of Kent.

Configurable Collective Communication in LAM-MPI
John Markus Bjorndalen, Otto J. ANSHUS and Tore LARSEN (Department of Computer Science, University of Tromsø, Norway)
Brian Vinter (Department of Mathematics and Computer Science, University of Southern Denmark, Denmark)
Pages 133-144.

Abstract: In an earlier paper, we observed that PastSet (our experimental tuple space system) was 1.83 times faster on global reductions than LAM-MPI. Our hypothesis was that this was due to the better resource usage of the PATHS framework (an extension to PastSet that supports orchestration and configuration) due to a mapping of the communication and operations which matched the computing resources and cluster topology better. This paper reports on an experiment to verify this and represents on-going work to add some of the same configurability of PastSet and PATHS to MPI. We show that by adding run-time configurable collective communication, we can reduce the latencies without recompiling the application source code. For the same cluster where we experienced the faster PastSet, we show that Allreduce with our con-figuration mechanism is 1.79 times faster than the original LAM-MPI Allreduce. We also experiment with the configuration mechanism on 3 different cluster platforms with 2-, 4-, and 8-way nodes. For the cluster of 8-way nodes, we show an improvement by a factor of 1.98 for Allreduce.

Java PastSet - A Structured Distributed Shared Memory System
Kei Simon Pedersen and Brian Vinter (Department of Mathematics and Computer Science, University of Southern Denmark, Denmark)
Pages 107-118.

Abstract: The architecture and performance of a Java implementation of a structured distributed shared memory system, PastSet, is described. The PastSet abstraction allows programmers to write applications that run efficiently on different architectures, from clusters to widely distributed systems. PastSet is a tuple-based three-dimensional structured distributed shared memory system, which provides the programmer with operations to perform causally ordered reads and writes of tuples to a virtual structured memory space called the PastSet. It has been shown that the original, native code, PastSet was able to outperform MPI and PVM when running real applications and we show that the design translates into Java so that Java PastSet is a qualified competitor to other cluster application programming interfaces for Java.

Cache-Affinity Scheduling for Fine Grain Multithreading
Kurt Debattista, Kevin Vella and Joseph Cordina (Department of Computer Science and Artificial Intelligence, University of Malta, Malta)
Pages 145-156.

Abstract: Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ?batching?, for reducing the negative impact of fine grain multithreading on cache performance. Prototype schedulers running on uniprocessors and shared memory multiprocessors are described, and finally experimental results which illustrate the improvements observed after applying our techniques are presented.

Reconnetics: A System for the Dynamic Implementation of Mobile Hardware Processes in FPGAs
Ralph Moseley (Computing Laboratory, University of Kent at Canterbury, UK)
Pages 177-190.

Abstract: The capacity to utilise FPGA designs in such a way that they are compositional, mobile, and ?interactive?, offers many new possibilities. This holds true for both research and commercial applications. With the system described here, hardware becomes as easy to distribute as software and the mediating links between the two domains allow for manipulation of physical resources in real time. Designs are no longer monolithic ?images? downloaded at one time, but mobile entities that can be communicated over a distance and dynamically installed at run-time many times and at many places. Such twin domain designs can be as complex as a processor, or as small in scale as a few logic gates. The run-time system, Reconnetics, provides an environment of high-level control over such elements, which requires little knowledge of the underlying hardware technology.

Applications:

Performance Analysis and Behaviour Tuning for Optimisation of Communicating Systems
Mark Green and Ali E. Abdallah (Centre for Applied Formal Methods, South Bank University, UK)
Pages 191-200.

Abstract: Improving performance is the main driving force behind the use of parallel systems. Models for performance evaluation and techniques for performance optimisation are crucial for effectively exploiting the computational power of parallel systems. This paper focuses on methods for evaluating the performance of parallel applications built from components using the software architecture methodology. Minor differences in the low-level behaviour of functionally equivalent processing elements can have a dramatic effect upon the performance of the overall system; this confounds attempts to predict performance at any step prior to implementation. VisualNets is a tool supporting the construction and graphical manipulation of interacting systems built from components. It makes use of specifications in the formal method CSP, which enables the relevant component behaviours and the linkage between components to be concisely described. The tool allows the behaviour of the whole system over time, and the patterns of interaction between the components, to be visualised through graphical animation. The graphical display produced facilitates the analysis and evaluation of performance, and highlights areas where performance could be improved via better utilisation of parallel resources. VisualNets also allows the timing properties of the components and of the architecture that underlies them to be changed, to represent different component implementations or platform configurations. A case study, based on the dual pipeline architecture, is presented to show how the graphical animation capability of VisualNets can be used, firstly to evaluate performance, and secondly to guide the development of more efficient versions of the parallel system.

Configuration Discovery and Mapping of a Home Network
Keith Pugh (Computer Science Department, Keele University, UK)
Pages 201-212.

Abstract: A home network comprising many heterogeneous devices requires a scaleable interconnect capable of satisfying the often vastly different network resource demands of the devices. In addition, it is likely that over time the configuration of the network will change as new devices are added to the network and older ones removed or replaced. While it is acceptable for enterprise networks to be managed by a trained network administrator it is unreasonable to expect a home owner to have such expertise. Consequently, a free topology network coupled with a capacity for plug and play that allows nodes to be added anywhere, and at any time without interruption to the operation of the networked system are essential requirements. IEEE 1355 standard technology has the potential to satisfy these criteria. The demand for a free topology and a capacity for plug and play require that the configuration of the network is re-discovered and mapped automatically, and at regular intervals. This paper describes such a configuration mapping process.

A Communicating Threads Case Study: JIWY
Dusko Jovanovic, Gerald Hilderink, Jan Broenink (Department of Electrical Engineering, University of Twente, The Netherlands)
Pages 321-330.

Abstract: We strive to allow a control system designer the power of designing control computer code concurrently and generating it efficiently, in spite of her or his lack of skills in software engineering. This is important for many reasons. Nowdays, it is impossible to separate control engineering from software engineering. There is no way of implementing control strategies other than to transform them into computer code for the chosen processing target. Usually, there are not so many general-purpose programmers available in control engineering research teams in companies and universities. In these cases, software development techniques suffer from insufficient knowledge in disciplines of software modelling, concurrency, reusability and testing. This paper develops a case study whose solution is based upon the CSP principles supported by the Communicating Threads libraries developed at Twente and argues why the techniques are accessible to non-computing-specialist control engineers..

Implementing a Distributed Algorithm for Detection of Local Knots and Cycles in Directed Graphs
Geraldo Pereira de Souza and Gerson Henrique Pfitscher (Department of Computer Science, University of Brasilia, Brazil)
Pages 381-396.

Abstract: In general, most deadlocks take form of cycles (in database systems) and knots (in communication systems). Boukerche and Tropper have proposed a distributed algorithm to detect cycles and knots in generic graphs. Their algorithm has a message complexity of 2m vs. (at least) 4m for the Chandy and Misra algorithm, where m is the number of links in the graph, and requires O (n log n) bits of memory, where n is the number of nodes. We have implemented Boukerche´s algorithm. Our implementation of the algorithm is based on the construction of processes of the CSP model. The implementation was done using JCSP, an implementation of CSP for Java.