From: "Michael T. Bird"
Newsgroups: comp.parallel
Subject: Re: Unrolling murders performance
Date: 7 Jun 1999 13:58:36 GMT
Organization: CTS Network Services
Approved: bigrigg@cs.cmu.edu
Message-Id: <7jgj6c$dqb$1@goldenapple.srv.cs.cmu.edu>
Originator: bigrigg@ux6.sp.cs.cmu.edu
Xref: ukc comp.parallel:15658

Ramiro,

One obvious source of problems is all the redundant computations. (i+1),
(i+2), (i+3), (i+4) are each computed 10 times in your code. Compute
them once per iteration.
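
A minimal sketch of that suggestion, assuming the same five-way
unrolling as the quoted loop. The index names i1..i4, the array
size n, and the test data are illustrative assumptions and do not
come from the original program. It checks the hoisted version
against the plain loop. Whether hoisting pays off with a given
compiler is something to measure, since an optimizer may already
eliminate the repeated index arithmetic.

```fortran
      program hoist
*     Sketch only: i1..i4, n and the test data are assumed names
*     and values, not taken from the original program.
      integer n, nroll
      parameter (n = 23, nroll = 5)
      double precision a(n), b(n), c(n), d(n), cref(n), dref(n)
      integer i, i1, i2, i3, i4, nstep
      logical same

*     Small integer-valued data keeps the arithmetic exact.
      do 10 i = 1, n
         a(i) = dble(i)
         b(i) = dble(2*i)
         c(i) = 1.0d0
         cref(i) = 1.0d0
 10   continue

*     Reference: the plain (rolled) loop.
      do 15 i = 1, n
         cref(i) = cref(i) + a(i)*a(i) + b(i)*b(i)
         dref(i) = cref(i) + a(i)*b(i)
 15   continue

*     Unrolled loop with each offset index computed once.
      nstep = n / nroll
      do 20 i = 1, (nstep-1)*nroll+1, nroll
         i1 = i + 1
         i2 = i + 2
         i3 = i + 3
         i4 = i + 4
         c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
         d(i) = c(i) + a(i)*b(i)
         c(i1) = c(i1) + a(i1)*a(i1) + b(i1)*b(i1)
         d(i1) = c(i1) + a(i1)*b(i1)
         c(i2) = c(i2) + a(i2)*a(i2) + b(i2)*b(i2)
         d(i2) = c(i2) + a(i2)*b(i2)
         c(i3) = c(i3) + a(i3)*a(i3) + b(i3)*b(i3)
         d(i3) = c(i3) + a(i3)*b(i3)
         c(i4) = c(i4) + a(i4)*a(i4) + b(i4)*b(i4)
         d(i4) = c(i4) + a(i4)*b(i4)
 20   continue

*     Remaining iterations (n is not a multiple of nroll).
      do 30 i = nstep*nroll+1, n
         c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
         d(i) = c(i) + a(i)*b(i)
 30   continue

*     Compare the unrolled results with the reference loop.
      same = .true.
      do 40 i = 1, n
         if (c(i) .ne. cref(i)) same = .false.
         if (d(i) .ne. dref(i)) same = .false.
 40   continue
      if (same) then
         print *, 'OK'
      else
         print *, 'MISMATCH'
      end if
      end
```

Compiled as fixed-form source, this prints OK when the hoisted
loop matches the plain one. For timing, n would have to be far
larger and each loop wrapped in an outer repeat loop, as in the
quoted code.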

Mike Bird

Ramiro Willmersdorf wrote:

Hi,
Is there a reason for an unrolled loop such as:
***************************************************************
      do 100 iter = 1, niter
*---!MIC$ DOALL PRIVATE(i, istep), SHARED(a,b,c,d)
         do 20 i = 1, (nstep-1)*nroll+1, nroll
*     0
            c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
            d(i) = c(i) + a(i)*b(i)
*     1
            c(i+1) = c(i+1) + a(i+1)*a(i+1) + b(i+1)*b(i+1)
            d(i+1) = c(i+1) + a(i+1)*b(i+1)
*     2
            c(i+2) = c(i+2) + a(i+2)*a(i+2) + b(i+2)*b(i+2)
            d(i+2) = c(i+2) + a(i+2)*b(i+2)
*     3
            c(i+3) = c(i+3) + a(i+3)*a(i+3) + b(i+3)*b(i+3)
            d(i+3) = c(i+3) + a(i+3)*b(i+3)
*     4
            c(i+4) = c(i+4) + a(i+4)*a(i+4) + b(i+4)*b(i+4)
            d(i+4) = c(i+4) + a(i+4)*b(i+4)
20      continue
*     Remaining iterations
         do 30 i = nstep*nroll+1, BIG
            c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
            d(i) = c(i) + a(i)*b(i)
30      continue
 100  continue
***************************************************************
to run *four* times as slow as the original loop:
***************************************************************
      do 100 iter = 1, niter
*---!MIC$ DOALL PRIVATE(i), SHARED(a,b,c,d)
         do 20 i = 1, BIG
            c(i) = c(i) + a(i)*a(i) + b(i)*b(i)
            d(i) = c(i) + a(i)*b(i)
 20      continue
 100  continue
***************************************************************
The loop just above is representative of the loop that uses
the most time in a Fortran program I'm trying to parallelize on
a Sun Enterprise 450 with 4 processors. ``BIG'' is very big :)
It runs Solaris 2.6 with Workshop Fortran 4.2.
Obviously, with such light loops and long vectors, when I try to
force parallelism, the performance just sucks: it takes
twice as long to run. That's actually what I expected.
``Hey, no problem, I'll just unroll the loop, and eventually
it's *gotta* get enough load to make the startup costs low enough.''
The problem is that when I run the loop as unrolled above,
it takes about *four* times as long to run! 22 seconds
against 5 seconds, give or take.
I hardly expected that. The unrolled loop even parallelizes
very well; however, it still takes longer than the sequential
loop since its baseline is so much slower.
The loops in the original program are actually somewhat
more complex, but I don't want to tackle them before I
understand what's going on here.
I'd be very grateful for any insight,
Ramiro.
--
Articles to bigrigg+parallel@cs.cmu.edu (Admin: bigrigg@cs.cmu.edu)
Archive: http://www.hensa.ac.uk/parallel/internet/usenet/comp.parallel

-- 
Michael T. Bird               email:  bird@sd.aetc.com                         
AETC, Inc.                     
8910 University Center Ln.     
Suite 900                     voice:  (619) 450-1211                             
San Diego, CA  92122-1012     FAX:    (619) 450-1794
