From: Xingfu Wu <wuxf@bit.csc.lsu.edu>
Newsgroups: comp.lang.fortran,comp.parallel.pvm,comp.parallel.mpi
Subject: Re: superlinear speedup, (was: fortran77 programming)
Date: Tue, 15 Dec 1998 12:36:56 -0600
Organization: Department of Computer Science, LSU
Message-Id: <3676AC48.18DE0329@bit.csc.lsu.edu>
References: <3660D892.E1EDBF12@est.it> <3665044D.A0F57E59@kings.uq.edu.au>
    <3665E491.7993355@flash.net> <y6k909mqww.fsf@tweedledumb.cygnus.com>
    <m21zmgnuau.fsf@blinky.bfr.co.il> <y6emqejtgq.fsf@tweedledumb.cygnus.com>
    <m2d85xi988.fsf@blinky.bfr.co.il> <y6yaokiw0r.fsf@tweedledumb.cygnus.com>
    <366BF6F3.1E8DC934@cic-mail.lanl.gov> <m2iuffrq59.fsf@blinky.bfr.co.il>
    <l6lnka7t6k.fsf@rebutosa.ime.usp.br> <ueemq2pnk6.fsf@altair.dfrc.nasa.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Xref: ukc comp.lang.fortran:62199 comp.parallel.pvm:7878
    comp.parallel.mpi:4444


I think that the defination of superlinear speedup is a key problem.

The most commonly used performance metric for parallel processing is speedup,
which gives the performance gain of parallel processing versus serial
processing. Traditionally, speedup is defined as the ratio of the runtime on a
uniprocessor to the
runtime on a parallel processor, i.e., the speedup is defined as the serial
runtime
divided by parallel runtime.When the ratio is larger than system size, it is
called
superlinear speedup. How to measure the serial runtime and parallel runtime?
Especially how to measure the serial runtime?

For a given problem, more than one serial algorithm may be available, but all
of these
may not be equally suitable for parallelization. When a sequential computer is
used, it
is natural to use the serial algorithm that solves the problem in the least
amount of
time. Given a parallel algorithm, it is fair to judge its performance with
respect to the
fastest serial algorithm for solving the same problem on a uniprocessor.
Speedup is normally less than the system size because of the time lost to
synchronization, communication time, and other overheads required by the
parallel computation. I think that it is not useful to only pursue the
superlinear speedup.

However, it is interesting to mention that superlinear speedups have been
reported
for search operations, such as parallel genetic algorithms, etc. Because of
parallel
processing in a number of branches, some paths leading to bad results may be
eliminated earlier than with sequential processing. Therefore, by avoiding some

unnecessary computations, the speedup may increase by more than the system
size.
Of course, superlinear speedups may also be produced, especially in modern
workstations with large external cache memories, i.e., if running on a single
workstation, the computer uses main memory, perhaps even swap memory; if
running on many computers, the entire application runs in main memory, perhaps
even entirely in a large external cache. Hence the superlinear speedups may be
experienced.

These have been discussed by my monograph: Performance Evaluation, Prediction
and Visualization of Parallel Systems, Kluwer Academic Publishers, March 1999.


Richard Maine wrote:

> rsilva@ime.usp.br (Paulo J. da Silva e Silva) writes:
>
> > Please, correct me if I am wrong. But I keep on thinking that SUPERLINEAR
> > speedup is not possible.
>
> As has been mentioned elsewhere in this thread, there seems to be a
> disconnect between those discussing abstract theory and those discussing
> practice.
>
> I've seen superliner speedup in real life.  Not a lot, but enough to
> be measurable.  Ran a set of jobs with a single-cpu system (or a
> multi-cpu system with all but 1 cpu disabled - twas long enough ago
> that I forget that detail).  Then rerun it after adding (or enabling)
> the multiple cpus.  Yes, 'tis true that when the extra cpus were
> added, they brought with them extra cache, registers, etc..  And
> that might have more to do with the speedup than with the cpu per
> se.  But then, in practice I don't know how to separate these effects.
> I've never had handy a theoretical machine that had multiple cpus
> without also having changes like more registers.
>
> The explanations I heard were some of the same ones mentioned in
> this thread (fewer process context switches because the multiple
> cpus allow more process contexts), but I was more interested in
> the measurable result than in the explanation.
>
> Turned out the speedup wasn't enough to get very excited about
> (a few percent), in fact it was not a whole lot above the threshhold
> of repeatability, but it was by a little.
>
> (Not that it much matters, but the machine I recall measuring
> superlinear speedup on was an Elxsi).
>
> --
> Richard Maine
> maine@altair.dfrc.nasa.gov