From: "Michael T. Bird" <bird@sd.aetc.com>
Newsgroups: comp.parallel
Subject: Re: Unrolling murders performance
Date: 7 Jun 1999 13:58:36 GMT
Organization: CTS Network Services
Approved: bigrigg@cs.cmu.edu
Message-Id: <7jgj6c$dqb$1@goldenapple.srv.cs.cmu.edu>
Originator: bigrigg@ux6.sp.cs.cmu.edu
Xref: ukc comp.parallel:15658


Ramiro,

One obvious source of problems is all the redundant index arithmetic.
The expressions (i+1), (i+2), (i+3) and (i+4) are each computed 10 times
per pass through the unrolled loop body in your code.  Compute each of
them once per iteration and reuse the result.
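
A minimal sketch of that change, assuming nroll = 5 as in the quoted
loop.  The temporaries i1..i4, the array size n and the test data are
illustrative names and values, not taken from the original program, and
nothing here has been timed.  An optimizing compiler may already remove
the repeated index arithmetic on its own, so treat this as something to
measure rather than a guaranteed fix.

```fortran
      program hoist
*     Sketch only: n, the test data and i1..i4 are illustrative.
      integer n, nroll
      parameter (n = 23, nroll = 5)
      double precision a(n), b(n), c(n), d(n), cref(n), dref(n)
      integer i, i1, i2, i3, i4, nstep
*     Set up data; keep a reference result from the plain loop.
      do 10 i = 1, n
         a(i) = dble(i)
         b(i) = 0.5d0*dble(i)
         c(i) = 1.0d0
         cref(i) = c(i) + a(i)*a(i) + b(i)*b(i)
         dref(i) = cref(i) + a(i)*b(i)
 10   continue
      nstep = n/nroll
*     Unrolled by nroll = 5; each offset index is formed once.
      do 20 i = 1, (nstep-1)*nroll+1, nroll
         i1 = i + 1
         i2 = i + 2
         i3 = i + 3
         i4 = i + 4
         c(i)  = c(i)  + a(i)*a(i)   + b(i)*b(i)
         d(i)  = c(i)  + a(i)*b(i)
         c(i1) = c(i1) + a(i1)*a(i1) + b(i1)*b(i1)
         d(i1) = c(i1) + a(i1)*b(i1)
         c(i2) = c(i2) + a(i2)*a(i2) + b(i2)*b(i2)
         d(i2) = c(i2) + a(i2)*b(i2)
         c(i3) = c(i3) + a(i3)*a(i3) + b(i3)*b(i3)
         d(i3) = c(i3) + a(i3)*b(i3)
         c(i4) = c(i4) + a(i4)*a(i4) + b(i4)*b(i4)
         d(i4) = c(i4) + a(i4)*b(i4)
 20   continue
*     Remaining iterations.
      do 30 i = nstep*nroll+1, n
         c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
         d(i) = c(i) + a(i)*b(i)
 30   continue
*     Compare against the reference from the plain loop.
      do 40 i = 1, n
         if (abs(c(i)-cref(i)) .gt. 1.0d-9) stop 'c mismatch'
         if (abs(d(i)-dref(i)) .gt. 1.0d-9) stop 'd mismatch'
 40   continue
      print *, 'ok'
      end
```

The final loop only checks that the unrolled version gives the same
results as the plain one; drop it once the two agree.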

Mike Bird

Ramiro Willmersdorf wrote:
> Hi,
>
> Is there a reason for an unrolled loop such as:
>
> ***************************************************************
>       do 100 iter = 1, niter
> *---!MIC$ DOALL PRIVATE(i, istep), SHARED(a,b,c,d)
>          do 20 i = 1, (nstep-1)*NROLL+1, nroll
> *     0
>             c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
>             d(i) = c(i) + a(i)*b(i)
> *     1
>             c(i+1) = c(i+1) + a(i+1)*a(i+1) + b(i+1)*b(i+1)
>             d(i+1) = c(i+1) + a(i+1)*b(i+1)
> *     2
>             c(i+2) = c(i+2) + a(i+2)*a(i+2) + b(i+2)*b(i+2)
>             d(i+2) = c(i+2) + a(i+2)*b(i+2)
> *     3
>             c(i+3) = c(i+3) + a(i+3)*a(i+3) + b(i+3)*b(i+3)
>             d(i+3) = c(i+3) + a(i+3)*b(i+3)
> *     4
>             c(i+4) = c(i+4) + a(i+4)*a(i+4) + b(i+4)*b(i+4)
>             d(i+4) = c(i+4) + a(i+4)*b(i+4)
>
>  20      continue
>
> *     Remaining iterations
>          do 30 i = nstep*nroll+1, BIG
>             c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
>             d(i) = c(i) + a(i)*b(i)
>  30      continue
>
>  100  continue
> ***************************************************************
>
> to run *four* times as slow as the original loop:
>
> ***************************************************************
>
>       do 100 iter = 1, niter
> *--- !MIC$ DOALL PRIVATE(i), SHARED(a,b,c,d)
>            do 20 i = 1, BIG
>                 c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
>                 d(i) = c(i) + a(i)*b(i)
>    20      continue
>   100 continue
>
> ***************************************************************
>
> The loop just above is representative of the loop that uses
> most time in a Fortran program I'm trying to parallelize on
> a Sun Enterprise 450 with 4 processors. ``BIG'' is very big :)
> It runs Solaris 2.6 with Workshop Fortran 4.2.
>
> Obviously, with such light loops and long vectors, when I try to
> force parallelism, the performance just sucks: it takes
> twice as long to run. That's actually what I expected.
>
> ``Hey, no problem, I'll just unroll the loop, and eventually
> it's *gotta* get enough load to make the startup costs low enough.''
>
> The problem is that when I run the loop as unrolled above,
> it takes about *four* times as long to run!  22 seconds
> against 5 seconds, give or take.
>
> I hardly expected that.  The unrolled loop even parallelizes
> very well; however, it still takes longer than the sequential
> loop since its baseline is so much slower.
>
> The loops in the original program are actually somewhat
> more complex, but I don't want to tackle them before I
> understand what's going on here.
>
> I'd be very grateful for any insight,
>
> Ramiro.

--
Michael T. Bird               email:  bird@sd.aetc.com
AETC, Inc.
8910 University Center Ln.
Suite 900                     voice:  (619) 450-1211
San Diego, CA  92122-1012     FAX:    (619) 450-1794

--
Articles to bigrigg+parallel@cs.cmu.edu (Admin: bigrigg@cs.cmu.edu)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel

