From: Craig Burley <burley@tweedledumb.cygnus.com>
Newsgroups: comp.lang.fortran,comp.parallel.pvm
Subject: Re: fortran77 programming
Date: 06 Dec 1998 20:24:51 -0500
Organization: Cygnus Support
Message-Id: <y6yaokiw0r.fsf@tweedledumb.cygnus.com>
References: <3660D892.E1EDBF12@est.it> <3665044D.A0F57E59@kings.uq.edu.au>
    <3665E491.7993355@flash.net> <y6k909mqww.fsf@tweedledumb.cygnus.com>
    <m21zmgnuau.fsf@blinky.bfr.co.il> <y6emqejtgq.fsf@tweedledumb.cygnus.com>
    <m2d85xi988.fsf@blinky.bfr.co.il>
Xref: ukc comp.lang.fortran:61888 comp.parallel.pvm:7843


hjstein@bfr.co.il (Harvey J. Stein) writes:

> I might have quoted the wrong paragraph.  I meant this in response to
> the comment that an N times faster CPU is a win over N CPUs.

Whoops, indeed, that comment is (I *think*) wrong: an N-times faster
CPU is never a *lose* over N CPUs, but I suppose it can be *equal*.
A subtle distinction, perhaps.  Sorry I didn't clarify that when I
entered the thread.  (Like off-by-one errors, > vs. >= errors are
easy to make even outside writing code, at least for me.  :)

That is, I believe it is *possible* for the N-CPU system to perform
as well as the N*faster, 1-CPU system on all applications -- though,
again, I might be wrong.

In any case, regardless of what application domain you throw at it,
all else being equal, an N*faster 1-CPU system is going to perform
at least as well as an N-CPU system.  ("All else being equal" means,
of course, software suitably optimized for the system, not "software
is exactly the same", etc.)

And, AFAIK, that's either an accepted axiom or a proven result of
computer science.

>  > I didn't say anything about a context-switch occurring.  You simply
>  > interleave instructions, if necessary.
> 
> The single processor machine still has more instructions to execute
> because of the context switches needed for multitasking.

Doesn't matter.  You won't get superlinear speedups by using parallel
hardware.  I don't care whether your application is real-time or not.

Trying to get superlinear speedups out of parallel hardware for *any*
application is like playing those "bonk the gopher" games at
carnivals -- you hit on one area you think is the "final" bottleneck,
and another area pops up as the new problem, sometimes due to the
"fixes" you just made.  When you *think* you're done -- which really
means you've finally exhibited what you think are superlinear
speedups on the few benchmarks you're now doggedly running in hopes
of striking oil -- then, if you're conscientious, you take all the
time, silicon, and software you've spent reaching that goal, devote
similar amounts to the single-processor version you're benchmarking
against, and then hang your head as it runs those very benchmarks
fast enough to destroy the superlinearity you thought you'd reached.

And, no, you don't *need* multitasking when "all else [is] equal" on
the N*faster 1-CPU system any more than you need it on the N-CPU system.  Exercise
left to the reader to figure out why (it's not exactly obvious,
except perhaps to people experienced in low-level system architecture
and design, e.g. doing low-level OS stuff).
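A toy model may make the interleaving point concrete.  The sketch below
is a hypothetical illustration, not a model of any real machine: it
assumes every instruction costs exactly 1/speed time units, ignores
caches, memory, and I/O entirely, and the function names are made up.
One CPU running N times faster issues one instruction from each
unfinished task in round-robin order (no context switch, just
interleaving), and no task finishes later than it would on its own
1x CPU.

```python
from fractions import Fraction


def parallel_finish_times(task_lengths):
    """N CPUs at speed 1, one task each: task i finishes at time L_i."""
    return [Fraction(length) for length in task_lengths]


def interleaved_finish_times(task_lengths, speed):
    """One CPU executing `speed` instructions per time unit.

    Hypothetical toy model: each pass issues one instruction from every
    unfinished task, in round-robin order, with no context-switch cost.
    """
    remaining = list(task_lengths)
    finish = [Fraction(0)] * len(task_lengths)
    issued = 0  # total instructions executed so far on the single CPU
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r > 0:
                remaining[i] -= 1
                issued += 1
                if remaining[i] == 0:
                    # Wall-clock time = instructions issued / instructions per time unit
                    finish[i] = Fraction(issued, speed)
    return finish


if __name__ == "__main__":
    tasks = [3, 1, 2]  # instruction counts for N = 3 hypothetical tasks
    par = parallel_finish_times(tasks)
    one = interleaved_finish_times(tasks, speed=len(tasks))
    for p, s in zip(par, one):
        print(f"N CPUs: {p}   one {len(tasks)}x CPU: {s}")
    # No task finishes later on the single fast CPU.
    assert all(s <= p for p, s in zip(par, one))
```

With tasks = [3, 1, 2] the interleaved finish times come out as 2, 2/3,
and 5/3, versus 3, 1, and 2 on the three slower CPUs.  The general
reason, in this model: each pass issues at most N instructions and task
i completes during pass L_i, so it is done after at most N*L_i
instructions, i.e. by time L_i.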

As I pointed out earlier, I *too* used to labor under the delusion
of such superlinear speedups.  I've been corrected as to that delusion
by people whom I highly respect in a forum that I believe
is populated by many experts in computer architecture and computer
science.  I've since corrected many others as to that delusion, and
have yet to meet anyone who can explain, without invoking obscure,
anecdotal evidence, how my original belief could be correct.

Further, my original delusion was not the product of detailed
knowledge of how CPUs and software actually interact -- it was more
a "gee, this seems obvious" sort of thing.  Once I was corrected,
I re-examined my beliefs using my experience, trying to construct
examples in my head of such superlinear speedups, and realized it
was impossible.  Any such speedup was always attributable to
parallelizing the *application*, not the hardware -- "equivalent",
faster, single-processing hardware would achieve those speedups
just as well, if not better, though I had to use my detailed
knowledge to see how.

>  > Remember, *all else is equal*.  The double-speed processor has twice
>  > the "fast" registers as the single-speed processor in the dual-processor
>  > has, to make up for the fact that there are, effectively, twice as
>  > many fast registers available.  Same for I/O ports, cache, whatever
>  > else you might try to invoke to fortify the spurious claim that
>  > a dual-processor system will be able to do "some tasks" better than the
>  > "equivalent" single-processor system.
> 
> That's not what anyone ever compares when comparing 1 X mhz CPU vs N
> X/N mhz CPUs.

Yet, it's what *I* was talking about.  I was careful to explain the
issues in very general terms in my post.

Perhaps the *original* comment (not mine) about one double-speed-CPU
system always being better than a dual single-speed-CPU system was
meant, or interpreted, to apply only to the popular Intel processors,
or even to the Celeron.  If so, I agree that comment might not be
right (even ignoring the > vs. >= discrepancy in the comparison),
but I'd be interested in any real examples of cases where it isn't,
actually, right when everything *other* than the CPU itself is
made "equal".

Still, on the anecdotal-evidence side, I've been in this industry
long enough to have seen several dual-processor machines built,
sold, and benchmarked.

And, every time, either they did *not* get at least 2X speedup on
*any* benchmark, or something fundamentally wrong was discovered
in either the single-processor or dual-processor system explaining
that >=2X speedup.  (Not that *any* of these were examples of "all
else being equal", since the single-processor machine had fewer fast
resources, e.g. registers, including the PC.)

Another example of this sort of thing: about 8 years ago, while doing
a contract IIRC, I was shown an article on some start-up's new technology
that would enable compression of *any* stored data so it'd
take up less space.  (I think a 2:1 compression ratio was billed as
"typical", but perhaps it claimed a *guaranteed* compression at that
ratio.  I think the example of 10MB->5MB was billed as "typical" as
well.  It doesn't matter.)

Anyway, there was some discussion whether this represented a breakthrough
technology (that being back in the days when GBs of disk space were still
too expensive and hard to configure for most PC users) or snake oil.

Someone, probably not me, pointed out that, if such a technology were
for real, then one could infer that it could be used to compress *any*
data down to exactly one bit.  Exercise left to the reader to understand
why.

Since that's clearly impossible, the technology therefore is snake oil,
and the purveyors and supporters of it should have known that, and
most probably did.
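The counting (pigeonhole) argument behind that exercise can be checked
mechanically.  A minimal sketch, with hypothetical helper names: there
are 2**n bit strings of exactly n bits but only 2**n - 1 strings that
are strictly shorter, so no lossless scheme can shrink every input; and
a scheme that always halved its input could simply be reapplied until
anything fit in one bit.  The 10MB figure is taken as 10 * 2**20 bytes
purely for illustration.

```python
from itertools import product


def strings_of_length(n):
    """All bit strings of exactly n bits (n = 0 gives just the empty string)."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def strings_shorter_than(n):
    """All bit strings with fewer than n bits: 2**0 + ... + 2**(n-1) = 2**n - 1."""
    return [s for k in range(n) for s in strings_of_length(k)]


def halvings_to_one_bit(bits):
    """Passes of a hypothetical 'always halves its input' compressor
    needed to get down to a single bit."""
    passes = 0
    while bits > 1:
        bits = (bits + 1) // 2  # round up: output can't be a fractional bit
        passes += 1
    return passes


if __name__ == "__main__":
    for n in range(1, 9):
        inputs = len(strings_of_length(n))
        outputs = len(strings_shorter_than(n))
        # More inputs than shorter outputs: two inputs must share an output,
        # so at least one of them cannot be decompressed correctly.
        assert inputs > outputs
        print(f"{n}-bit inputs: {inputs}, shorter outputs available: {outputs}")
    ten_mb_bits = 10 * 2**20 * 8
    print("halving passes from 10MB to 1 bit:", halvings_to_one_bit(ten_mb_bits))
```

Under those assumptions, 27 passes of such a compressor would take 10MB
down to a single bit.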

So, yes, if you spend $2500 on a dual 250MHz-CPU system, you can get
better performance on some applications than if you spend $2000 on a
single 500MHz-CPU system.  (And, yes, these are probably not
unreasonable examples of *potential* prices of such systems, though,
of course, markets can distort prices, e.g. if everyone insists on
buying "parallel-ready" CPUs even in single-processor configurations,
the cost of those systems is relatively higher than that of
multi-processor systems.)

But that doesn't mean you've seen superlinear speedups due to using
parallel processors, and it especially doesn't mean you should
commit any significant resources based on the snake-oil promise that
parallel processing means superlinear speedups.

Again, exercise left to the reader to understand why.

The performance of *any* application -- INCLUDING REAL-TIME -- involves
so many factors that anyone making claims about generic performance
characteristics had better be an expert on the relevant topics: process
exchange, interrupt latencies, response times, deadlock prevention, shared
memories, process migration, backing register files, designing TLBs and
caches for multitasking and multiprocessing, and so on.  (Or, they
shouldn't raise the relevant issues to attack just the single-processor
performance.)

As far as how much *I* know about these topics: I'm not saying it's a lot,
but it's sure a heck of a lot more than I know about Fortran.  Still,
I defer to the experts, such as the people who've been studying these
issues and building systems for decades, and are still doing so today.
-- 

"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson    burley@gnu.org