From: Peter Rander <rander+@cs.cmu.edu>
Newsgroups: comp.lang.fortran,comp.parallel.pvm
Subject: Re: superlinear speedup, (was: fortran77 programming)
Date: Mon, 14 Dec 1998 10:18:30 -0500
Organization: School of Computer Science, Carnegie Mellon University
Message-Id: <36752C46.41C6@cs.cmu.edu>
References: <3660D892.E1EDBF12@est.it> <3665044D.A0F57E59@kings.uq.edu.au>
    <3665E491.7993355@flash.net> <y6k909mqww.fsf@tweedledumb.cygnus.com>
    <m21zmgnuau.fsf@blinky.bfr.co.il> <y6emqejtgq.fsf@tweedledumb.cygnus.com>
    <m2d85xi988.fsf@blinky.bfr.co.il> <y6yaokiw0r.fsf@tweedledumb.cygnus.com>
    <366BF6F3.1E8DC934@cic-mail.lanl.gov> <m2iuffrq59.fsf@blinky.bfr.co.il>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Xref: ukc comp.lang.fortran:62158 comp.parallel.pvm:7869


Like many readers of these two bboards, I suspect, I have been 
sporadically following this discussion, waiting for the active
players involved to discover that they are talking at two
*EXTREMELY* different levels.  

On the side of the one-fast-processor-beats-N-slower(by_1/N)-
processors, we have a *PURELY
THEORETICAL* argument based on 
the assumption that NOT ONLY does the processor run N times 
faster, but that there are N times the number of registers, 
N times as much L1/L2 cache, N times as much memory bandwidth,
N times as much... I think you get the idea: N times as much
of *everything*.  In addition, since this is theory, we can
*STATE* that the processor does not need to conform to modern
processor design.  With N program counters, we don't need to
push/pop state, etc, since we can just keep all those registers
around.  In fact, if this were actually done, then the argument
goes that things might actually go *faster*, since process A
might have an L2 cache miss, forcing a memory read, while process
B then gets to process, which could be a computation -- thus
reducing the loss of a main memory access.

On the other side, we have numerous this-is-the-real-world-
so-what's-your-point authors stating all kinds of things like
"that would hurt modern processors", "you can't build such a
system", etc.  The arguments here are pretty clear, mostly
valid, and almost obvious in many cases, even to a guy like
me who is not a computer architect.

So perhaps this thread can be split into "theoretical computer
science issues in PVM and FORTRAN" and "real-world issues in
PVM and FORTRAN" (PVM and FORTRAN being the two bboard homes
of this incredibly bizarre miscommunication).  I suspect that
the discussion would have ended millenia ago with such a
simple segmentation....

-Pete
----- 
Harvey J. Stein wrote:
> 
> "Tony T. Warnock" <u091889@cic-mail.lanl.gov> writes:
> 
>  > A simple argument (due to Vance Faber of LANL) shows that
>  > superlinear speedup is not possible. Given an N times faster
>  > processor, one can have the processor execute the first instruction
>  > of each of the N slower processors, then the second instruction,
>  > etc. The faster processor will finish at least as fast as the
>  > slower set of processors.
> 
> One can't do this and it'd make a modern processor slower.
> 
> One can't do it because the the registers will be in the wrong state
> 
> Even if you could do it, it'd make a modern processor slower because
> you'd break pipeline optimizations that had been done to the
> individual programs.
> 
> --
> Harvey J. Stein
> BFM Financial Research
> hjstein@bfr.co.il