From: Srinidhi <srinidhi@ece.neu.edu>
Newsgroups: comp.parallel.pvm
Subject: Re: PVM Performance on SGI Power Challenge
Date: Fri, 04 Dec 1998 10:30:23 -0500
Organization: Northeastern University, Boston
Message-Id: <3668000F.D7A1A3DA@ece.neu.edu>
References: <36675533.74EFF212@its-gipps1.cc.monash.edu.au>
Mime-Version: 1.0
Content-Type: multipart/alternative;
    boundary="------------763EC3DCDE346843CE41E400"


--------------763EC3DCDE346843CE41E400
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi David,

One reason for that performance drop is the time it takes to
transmit A and B. If the slaves could generate their partial As and Bs
themselves, or read them from a file, you should see a significant
speed-up.
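
To make that concrete, here is a rough sketch of how each slave could
work out for itself which rows it owns, so that it can read or generate
only the data it needs. The function name row_block and the idea of
numbering the slaves 0 .. nproc-1 are my own assumptions for
illustration; neither is part of PVM or of your program.

```c
/* Hypothetical helper (not a PVM call): compute which rows of the
 * result the slave with index `rank` (0 .. nproc-1) is responsible for.
 * The first (nrow % nproc) slaves take one extra row, so every row is
 * covered even when nrow is not a multiple of nproc. */
void row_block(int nrow, int nproc, int rank, int *first, int *count)
{
    int base  = nrow / nproc;   /* rows every slave gets           */
    int extra = nrow % nproc;   /* leftover rows to hand out       */

    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

With nrow = 500 and nproc = 4, slave 3 gets rows 375 to 499. With
nrow = 10 and nproc = 3, slave 1 gets rows 4 to 6. The master then only
has to tell each slave its index rather than ship the whole matrices.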

You'll find it useful to check out NetPIPE: A Network Protocol
Independent Performance Evaluator, by Snell, Mikler and Gustafson of
Ames Laboratory in Iowa. It lets you determine the optimum block size
for transmission so that you extract the most bandwidth out of the
system. Breaking a job up into smaller and smaller pieces does not
automatically mean more and more speed-up. I don't have their URL, but
it was posted on this newsgroup recently. Worth the effort, I would
think ....
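
To see why block size matters, here is a back-of-the-envelope cost
model. It is only an assumption for illustration, not NetPIPE's method:
each message pays a fixed start-up latency plus its size divided by the
bandwidth. The function name and the numbers below are made up; real
values would have to come from measurements on your machine.

```c
/* Simple cost model (an assumption, not a NetPIPE formula): every
 * message pays a fixed start-up latency plus its size divided by the
 * bandwidth. Returns the total time in seconds for nmsgs messages. */
double transfer_seconds(long nmsgs, long bytes_per_msg,
                        double latency_s, double bandwidth_bytes_per_s)
{
    double per_msg = latency_s
                   + (double)bytes_per_msg / bandwidth_bytes_per_s;
    return (double)nmsgs * per_msg;
}
```

Taking placeholder values of 1 ms latency and 10 MB/s bandwidth, sending
the 250,000 elements of a 500 x 500 result one at a time (8 bytes each)
costs about 250 s in this model, while sending them as 500 rows of 4000
bytes costs about 0.7 s. Since your slaves send each calculated element
back separately, the per-message overhead may well dominate.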

-Srinidhi


> Dear all,
>
> I have written a simple matrix multiplication program to run on an SGI
> Power Challenge using PVM. It uses a master-slave model in which the
> master reads the [A] and [B] matrices from the input file and
> broadcasts them to all slaves; each slave performs (nrow / nproc) of
> the calculation. After each multiplication is done, the slave sends
> the calculated element back to the master. Finally, the master program
> stores the [C] matrix in an output file.
>
> The reason I chose PVM for parallel programming is to shorten my
> processing time for very large calculations, hopefully to about two to
> three times faster than the job done by a single processor. In this
> case, it takes about 3 minutes to complete the multiplication of
> 500 x 500 matrices without PVM. However, when using PVM, it requires
> 30 - 40 minutes to get the job done. The speed-up (single processing
> time / parallel processing time) for PVM is 0.38, which is about 3
> times slower than the job done on a single processor.
>
> The questions I would like answered are:
> 1. What are the possible reasons for the slow-down in PVM
> performance?
>
> 2. Is there any possible remedy?
>
> 3. I have tried to spawn 2 slaves on a single architecture
> (a multiprocessor machine) using the routine:
>            pvm_spawn(slavename, (char**)0, 2, "SGI64", nproc, tids);
>
>    Is it true that 2 slaves would be created that could run on 2
> separate CPUs in a multiprocessor environment?
>
> 4. When more slaves are spawned in PVM, say 4 slaves, the speed-up
> should get better because more CPUs are used and fewer calculations are
> done by each slave. But the processing time I'm getting is even slower
> than that with two slaves. May I know why this is so?
>
> I would appreciate any comments on the questions asked.
> Thank you.
>
> David Leong
>


--------------763EC3DCDE346843CE41E400--