From: Clark Dorman <clark@s3i.com>
Newsgroups: comp.parallel.pvm
Subject: Re: PVM Performance on SGI Power Challenge
Date: 04 Dec 1998 10:00:19 -0500
Organization: PSINet
Message-Id: <d90goc5q4.fsf@s3i.com>
References: <36675533.74EFF212@its-gipps1.cc.monash.edu.au>
Cc: dleong@its-gipps1.cc.monash.edu.au


David Leong <dleong@its-gipps1.cc.monash.edu.au> writes:
[snip matrix multiplication problem]

> The following question I like to know are:
> 1.  What could the possible reasons that causes the slow down of PVM
> performance?

o you are taking a very long time to pack the information into the pvm
  message, or

o your network is hideously slow for your packet size, or

o you are waiting on messages when you do not need to, or 

o your algorithm is faulty.
 

> 2. Is there any possible remedy?

You really need to figure out exactly why it is taking so long.  To be
honest, it makes no sense to me whatsoever that a simple matrix
multiplication could take any longer with more processors under PVM.
Also, how fast is an SGI Power Challenge?  I'm on a SUN Sparc 20, and
a naive 500x500 multiplication takes 73 seconds.

Are you using C or C++?  One way to figure out where the slow-down is
is to use a performance monitoring tool.  However, when I am having
timing problems, I add the following to the top of my files:

//----------------------------------------------------------------------
// These are c-code abominations for quicky timing
//----------------------------------------------------------------------
#include <sys/time.h>   // gettimeofday() and struct timeval
#include <sys/types.h>
// static, so that each file pasting this in gets its own pair of timers
static struct timeval tv1, tv2;
#define TIMER_CLEAR     (tv1.tv_sec = tv1.tv_usec = tv2.tv_sec = tv2.tv_usec =0)
#define TIMER_START     gettimeofday(&tv1, (struct timezone*)0)
#define TIMER_STOP      gettimeofday(&tv2, (struct timezone*)0)
// elapsed wall-clock time between START and STOP, in seconds
#define TIMER_ELAPSED   (tv2.tv_sec-tv1.tv_sec+(tv2.tv_usec-tv1.tv_usec)*1.E-6)
//----------------------------------------------------------------------

Then, around any routine that looks like it may be the trouble, you
just put

	TIMER_CLEAR;
	TIMER_START;
	<routine goes here>
	TIMER_STOP;
	cout << "Time to do routine: " << TIMER_ELAPSED << endl;

If you put these things liberally in your code, you should be able to
figure out what the problem is.
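
If you would rather not use macros, the same idea can be written as a
plain function.  This is only a sketch: the names wall_seconds,
naive_multiply and time_multiply are made up for illustration, and the
multiply is just a stand-in for whatever routine you suspect.

```cpp
#include <sys/time.h>   // gettimeofday(), struct timeval (POSIX)
#include <vector>

// Current wall-clock time in seconds, as a double.
double wall_seconds()
{
    struct timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

// Naive O(n^3) product of two n x n row-major matrices: c = a * b.
void naive_multiply(const std::vector<double>& a,
                    const std::vector<double>& b,
                    std::vector<double>& c, int n)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Time one multiplication of n x n matrices; returns elapsed seconds.
double time_multiply(int n)
{
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), c(n * n, 0.0);
    double start = wall_seconds();
    naive_multiply(a, b, c, n);
    return wall_seconds() - start;
}
```

Timing the packing, the send/receive, and the arithmetic separately in
this way should show which of the three dominates.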
 
> 3. I have tried to spawn 2 slaves on a single architect (multiprocessors
> machine) using the routine:
>            pvm_spawn(slavename, (char**)0, 2, "SGI64", nproc, tids);
> 
>    Is it true that 2 slaves would be created that could  run on 2
> separate CPUs on a multiprocessor environment?

If nproc=2, then yes.  The 2 in the line above is the flag argument
(PvmTaskArch), which tells PVM that "SGI64" names an architecture; it
does not mean that you are starting two processes.  nproc determines
that.

Make sure that the array tids is large enough to hold the ids of the
child processes, and check them to confirm that the slaves actually
got started.
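
The check could look something like the sketch below.  The helper name
report_spawn is hypothetical; it relies only on the documented behaviour
that pvm_spawn returns the number of tasks started (negative on a system
error) and puts error codes in the last entries of tids for any tasks
that failed.  The PVM call itself is shown in a comment so that the
helper can be compiled without pvm3.h.

```cpp
#include <iostream>

// Returns how many of the requested tasks failed to start.
// 'started' is the value returned by pvm_spawn(); when it is smaller
// than 'requested', the last (requested - started) entries of tids
// hold negative error codes instead of task ids.
int report_spawn(int requested, int started, const int* tids)
{
    if (started < 0) {   // the spawn call itself failed
        std::cerr << "pvm_spawn failed with code " << started << std::endl;
        return requested;
    }
    for (int i = started; i < requested; ++i)
        std::cerr << "slave " << i << " not started, error code "
                  << tids[i] << std::endl;
    return requested - started;
}

// Intended use (needs pvm3.h and a running PVM, so not compiled here):
//   int tids[2];
//   int numt = pvm_spawn(slavename, (char**)0, PvmTaskArch, "SGI64", 2, tids);
//   if (report_spawn(2, numt, tids) > 0) { pvm_exit(); return 1; }
```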
 
> 4. When more slaves are spawn in PVM, say...4 slaves. The speed-up
> should get better because more CPUs are used and less calculations are
> done by each slave. But the processing time I'm getting is even slower
> than that of two slaves. May I know the reasons why is it so?

Because (possibly):

o your basic process is so fast on a single cpu that the process of
  creating, packing, and passing messages takes longer than the basic
  process itself, or

o your algorithm is faulty. 

Again, your results are counter-intuitive.
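
To put rough numbers on the first possibility, here is a back-of-envelope
model.  Everything in it is an assumption for illustration, not a
measurement of your code: the master sends each slave its block of rows
of A plus all of B, collects a block of rows of C from each, and talks to
the slaves one at a time.  The function name and parameters are made up.

```cpp
// Rough estimate, in seconds, of a master/slave n x n matrix multiply
// spread over p slaves, under the assumed model described above.
double estimate_seconds(int n, int p, double flops_per_sec,
                        double latency_sec, double bytes_per_sec)
{
    double n2 = double(n) * n;
    // 2n^3 floating-point operations, split evenly p ways
    double compute = 2.0 * n2 * n / (p * flops_per_sec);
    // 8-byte doubles: A sent once in pieces, B sent p times, C returned once
    double bytes = 8.0 * n2 * (p + 2);
    // one send and one receive per slave
    double messages = 2.0 * p * latency_sec;
    return compute + messages + bytes / bytes_per_sec;
}
```

With made-up figures of 1e8 flops/sec, 1 ms latency and 1e6 bytes/sec,
n=500 comes out at about 9.3 seconds on 2 slaves and 12.6 seconds on 4,
so adding slaves makes things worse.  Drop the CPU to 3.4e6 flops/sec and
the same model gives roughly 45 seconds on 2 slaves and 30 on 4.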
 
> I'll be appreciate if you can provide comments on the questions asked.

Would you be willing to show me your code?  I may be able to help.  As
it stands though, I don't think anybody can help until the problem is
better specified.

-- 
Clark