From: Control Room <VERLE_L_KESZLER@rl.gov>
Newsgroups: comp.parallel.mpi,comp.parallel.pvm
Subject: Re: Dual- vs. single processor motherboards
Date: Sat, 13 Mar 1999 11:03:31 -0800
Organization: Pacific Northwest National Lab
Message-Id: <36EAB683.CA10BFCA@rl.gov>
References: <UTkG2.4447$_n2.99958@carnaval.risq.qc.ca>
Reply-To: VERLE_L_KESZLER@rl.gov


Michael,

As I understand it, you will lose some performance to memory contention on each
dual-processor board. The PVM setup I use is one dual-processor machine plus
five Pentium II machines (300-400 MHz), and it works fine to run a task on every
machine with the control process on the dual as well. One thing I have not tried
is overloading (more than one pvm on a machine), but this may be the way to go
in a multi-dual system.
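
A minimal sketch of what overloading could look like on a multi-dual system:
two tasks per host, with consecutive task numbers sharing a board. This is plain
Python for illustration, not PVM code; the host names and the place_tasks
function are made up for the example.

```python
def place_tasks(n_tasks, hosts, cpus_per_host=2):
    """Map task numbers to hosts, filling each host with cpus_per_host tasks."""
    if n_tasks > len(hosts) * cpus_per_host:
        raise ValueError("more tasks than CPUs")
    # Tasks 0 and 1 go to the first host, 2 and 3 to the second, and so on.
    return {task: hosts[task // cpus_per_host] for task in range(n_tasks)}


hosts = [f"dual{i}" for i in range(8)]  # hypothetical host names
placement = place_tasks(16, hosts)

for task, host in placement.items():
    print(f"task {task:2d} -> {host}")
```

Keeping neighbouring tasks on the same board means half of each task's
nearest-neighbour traffic can stay off the network.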

crb@owt.com


Michael Guevara wrote:

> We are planning to build a cluster made up of 16 Pentium II (350 MHz) CPUs.
> We intend to use Linux RedHat as operating system and to run mostly PVM and
> some MPI.
>
> At Montreal prices, we can save about Canadian $4K by using 8 dual-CPU boards
> rather than 16 single-CPU boards (overall system price C$20K vs. C$24K).
> Another advantage of using dual-CPU boards is that, when operating in
> non-cluster mode, one then has access to individual workstations that are
> more powerful.
>
> Does it make any sense to use dual-CPU machines? Will there be any penalty?
>
> For example, is it harder to write PVM or MPI code for a dual-processor
> machine?
>
> Will the overall computing speed of a network of 8 dual-processor machines
> be reduced in comparison with a network of 16 single-processor machines?
>
> Our application involves numerical integration of a partial differential
> equation (a nonlinear cable equation of the reaction-diffusion sort).
> On a network of 16 single-CPU machines, one basically ends up partitioning
> a matrix into submatrices, with each of the 16 processors doing almost
> exactly the same amount of computation at each iteration before entering
> the message-passing phase.  At present, less than 10% of the total
> computation time is spent in message-passing (using PVM in a network of
> single-CPU machines), with node i sending and receiving info from nodes
> i-1 and i+1 at the end of each numerical time-step.
>
> I can visualize two scenarios when using a dual-processor motherboard:
> (1) partition the problem into 16 parts, with each of the two CPUs on each
> board working independently on different submatrices, so that each CPU
> computes 1/16 of the overall problem;
> (2) partition the problem into 8 parts, with the two CPUs on each
> motherboard somehow working jointly on a submatrix representing 1/8 of
> the problem size.
>
> In the first case a typical CPU (node i) would be communicating with the
> other CPU on its own board (node i-1) and another CPU off-board (node i+1)
> via Fast Ethernet 100 Mbps, full-duplex.  Unfortunately, it's my guess that
> any increased inter-processor communication speed within the dual-processor
> board (with respect to Ethernet) would probably not result in savings in
> overall computation time, since node i would still have to wait on node i+1
> (which is on another board) via Ethernet.
>
> Two questions:
> (1) are both scenarios above possible?
> (2) if so, which one is better from the point of view of:
>           (a) overall computation speed?
>           (b) ease of PVM/MPI programming?
>
> Thanks for any help in the above.
>
> Michael Guevara
> Department of Physiology
> McGill University
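
The guess in scenario (1) can be checked with a toy model. The Python sketch
below classifies each nearest-neighbour link in a chain of 16 nodes on 8 dual
boards as on-board or Ethernet, then estimates the time per step. It rests on
assumptions that are not in the original post: the chain is not periodic, a
node's two exchanges overlap in time, and every node waits for its slowest
exchange before the next step. The cost values are arbitrary placeholders, not
measurements, and the function names are invented for the example.

```python
def link_kinds(n_nodes=16, cpus_per_board=2):
    """Classify each link (i, i+1) as 'board' or 'ethernet'."""
    kinds = {}
    for i in range(n_nodes - 1):
        same_board = i // cpus_per_board == (i + 1) // cpus_per_board
        kinds[(i, i + 1)] = "board" if same_board else "ethernet"
    return kinds


def step_time(t_compute, t_board, t_ethernet, kinds):
    """Time per step when all nodes synchronise on the slowest exchange.

    Assumes exchanges overlap, so only the slowest link matters.
    """
    cost = {"board": t_board, "ethernet": t_ethernet}
    slowest = max(cost[kind] for kind in kinds.values())
    return t_compute + slowest


kinds = link_kinds()
n_board = sum(kind == "board" for kind in kinds.values())
n_ethernet = sum(kind == "ethernet" for kind in kinds.values())
print(n_board, "on-board links,", n_ethernet, "Ethernet links")

# Placeholder costs in arbitrary time units (assumed, not measured).
same_speed = step_time(9.0, 1.0, 1.0, kinds)  # on-board as slow as Ethernet
fast_board = step_time(9.0, 0.1, 1.0, kinds)  # on-board ten times faster
print(same_speed, fast_board)
```

Under these assumptions the two step times come out equal, which agrees with
the guess that the Ethernet link sets the pace. If a node's two exchanges ran
one after the other rather than overlapping, a faster on-board link would trim
its share of the communication time.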

