From: Simon Thorpe <thorpe@cerco.ups-tlse.fr>
Newsgroups: comp.parallel.pvm
Subject: Multiprocessor PCI boards
Date: Thu, 03 Sep 1998 13:09:11 +0100
Organization: CNRS
Message-Id: <35EE86DC.2DFE@cerco.ups-tlse.fr>
Reply-To: thorpe@cerco.ups-tlse.fr
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit


Hi,

	Over the past few months I have been in discussion with Neil Carson
from Causality Ltd. http://www.causality.com/ in the UK about the
possibility of developing hardware for doing neural network
simulations. Right now, the project is looking very promising, and Neil
has said that he would be happy to go ahead and build a first batch of
25 boards as soon as he can be reasonably confident that there will be
enough buyers. I myself will be buying 8-10 boards, but we need a few
other interested people to get the project off the ground. I was
wondering whether anyone on the comp-neuro mailing list might be
interested.

	Briefly, what we are proposing is the following. The basic board will
be a standard PCI board that could be used in PCs, Macs or indeed any
other computer with PCI slots in it. Each board would be fitted with 6
processor modules, three on each side, which would plug into standard
SODIMM memory slots (Small Outline Dual In-Line Memory Modules), the
same type of slots that are used for memory expansion on laptop
computers. Each of these daughter boards would have a StrongARM SA-110
microprocessor, a Digital 21285-A PCI bridge circuit, and 32 Mbytes of
SDRAM. In effect, each daughter board would be a computer in its own right, and
would run a version of Unix (Linux or NetBSD). It would also have an IP
address, allowing messages to be sent efficiently via the PCI bus from
processor to processor. Neil and the other programmers at Causality will
look after the message passing mechanisms, probably using I2O
(Intelligent I/O) protocols, if you know what those are.

	In our case, we want to use these boards to run a parallel version of
SpikeNET, our asynchronous spiking neuronal network simulator. It turns
out that this particular program would parallelise very nicely on such a
system, because the communication bandwidth between processors is kept
very low. If you're interested in SpikeNET, just let me know - we're
seriously thinking about making the program available to whoever wants
to use it. But in fact, the same hardware could also be used for any
sort of parallel program that could be run using PVM or MPI message
passing protocols.
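
	To make the low-bandwidth point concrete, here is a minimal sketch in
Python. It is not taken from SpikeNET: the even split of neurons across
processors, the 4-byte neuron ID and the function names are all
assumptions made for illustration. The idea shown is simply that each
processor only needs to pass on the IDs of the neurons that fired,
rather than the state of every neuron it holds.

```python
# Hypothetical sketch, not SpikeNET code: neurons are split evenly
# across processors, and only the IDs of neurons that fired are sent.

BYTES_PER_ID = 4  # assumption: each neuron ID travels as a 32-bit integer


def owner(neuron_id, neurons_per_proc):
    """Return the index of the processor that holds a given neuron."""
    return neuron_id // neurons_per_proc


def spike_messages(spiking_ids, neurons_per_proc, n_procs):
    """Group the IDs of the neurons that fired by owning processor."""
    outgoing = [[] for _ in range(n_procs)]
    for nid in spiking_ids:
        outgoing[owner(nid, neurons_per_proc)].append(nid)
    return outgoing


def traffic_bytes(spiking_ids):
    """Bytes needed to report the spikes, under the 4-byte assumption."""
    return BYTES_PER_ID * len(spiking_ids)


# Example: one board (6 processors), 1000 neurons each, 5 spikes this step.
spikes = [3, 1500, 1501, 4200, 5999]
msgs = spike_messages(spikes, 1000, 6)
print(msgs)
print(traffic_bytes(spikes), "bytes, versus", BYTES_PER_ID * 6000,
      "bytes to send one value for every neuron")
```

	In a real PVM or MPI program each of these lists would become the
payload of a send to the processors that need it.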

	There are some limitations though. The StrongARM SA-110 doesn't have a
floating point unit, but as long as you only want to do integer
calculations, it goes like a rocket. Our own software (which is integer
only) runs as fast on the StrongARM as on a Pentium II of the same clock
speed. This is really impressive, since the StrongARM doesn't have a
second level cache (unlike the Pentium II) and the code has (as yet) not
been optimised for the StrongARM at all. 
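
	As a rough illustration of what "integer only" means in practice,
here is a minimal fixed-point sketch in Python. It is not SpikeNET
code; the 16.16 format and the helper names are assumptions chosen for
the example. Fractional values are scaled up to integers once, and the
arithmetic after that uses only integer multiplies and shifts, which
need no floating point unit.

```python
# Hypothetical fixed-point sketch: fractional values stored as integers.

FRAC_BITS = 16          # assumption: 16 fractional bits (16.16 format)
ONE = 1 << FRAC_BITS    # the value 1.0 in fixed point


def to_fixed(x):
    """Convert a float to fixed point (done once, e.g. at load time)."""
    return int(round(x * ONE))


def fixed_mul(a, b):
    """Multiply two fixed-point values using integer operations only."""
    return (a * b) >> FRAC_BITS


def to_float(a):
    """Convert back to a float, for display only."""
    return a / ONE


weight = to_fixed(0.75)
activity = to_fixed(2.5)
product = fixed_mul(weight, activity)
print(to_float(product))  # prints 1.875
```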

	The reasons for choosing the StrongARM are pretty straightforward. It
is very small, doesn't get hot (< 1 Watt) and is cheap. This means that
it becomes perfectly feasible to imagine packing large numbers of
StrongARMs in a very small space without having to worry about
overheating (imagine trying to do the same thing with Pentium IIs). In
addition, although the future of StrongARM was once in doubt (it was
co-developed by Digital and Advanced RISC Machines), it has now been
bought up by Intel, who have recently announced that they will be
investing heavily in StrongARM development. See
http://developer.intel.com/design/strong/ for details.

	The current top-of-the-range StrongARM runs at 233 MHz, and this is
what Neil Carson is proposing to use in this first batch. However, in
the not too distant future there will be 400 MHz StrongARMs with 100 MHz
SDRAM memory busses. And there will be a new StrongARM processor (the
SA-1500) which will have a separate floating point unit for multimedia
operations. One of the nice features of this daughter board arrangement
is that it would be pretty simple and cost effective to do a new batch
of boards using whatever the best technology is at that moment. Another
advantage of using this sort of parallel hardware is that even last
year's technology will still be useful to you - not like conventional PCs
where you feel that you have to buy a new computer every six months if
you don't want to be obsolete.

	So, what about prices, you may be asking. Well, if you are interested, it
should be possible to do such a board for 1200 pounds ($2000) on this
first run.  Each board would only take one PCI slot, so with four free
PCI slots you could put up to 24 processors in a single PC! If we can
round up enough interested people, we should be able to get the boards
done in about 2 months. Please note that I am not personally going to
be making any money on this, and Causality are only expecting to break even
on it. However, both Neil and I are confident that this could be a
really promising approach - we just need to get enough support to get
the ball rolling. Obviously, the more people that are interested, the
cheaper it gets....
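
	For anyone who wants the sums spelled out, the short Python sketch
below works through the figures quoted above (1200 pounds per board, 6
processor modules per board, 32 Mbytes per module, and the example of a
PC with four free PCI slots). The variable names are just for
illustration, and the prices are the first-run estimates given here,
not a quotation.

```python
# Worked arithmetic using the figures quoted in this message.

BOARD_PRICE_GBP = 1200     # estimated first-run price per board
MODULES_PER_BOARD = 6      # StrongARM daughter boards per PCI board
MBYTES_PER_MODULE = 32     # SDRAM on each daughter board
FREE_PCI_SLOTS = 4         # the example PC mentioned above

processors = MODULES_PER_BOARD * FREE_PCI_SLOTS
total_price_gbp = BOARD_PRICE_GBP * FREE_PCI_SLOTS
price_per_processor_gbp = BOARD_PRICE_GBP // MODULES_PER_BOARD
total_mbytes = MBYTES_PER_MODULE * processors

print(processors, "processors")
print(total_price_gbp, "pounds for", FREE_PCI_SLOTS, "boards")
print(price_per_processor_gbp, "pounds per processor")
print(total_mbytes, "Mbytes of SDRAM in total")
```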

	If you want more information, don't hesitate to contact either me at
thorpe@cerco.ups-tlse.fr or Neil at neil@causality.com.

	Best wishes

	Simon Thorpe

