From: Paul Clark <prc@sysmag.com>
Newsgroups: comp.sys.arm,comp.sys.transputer
Subject: Re: Floating Point Performance of the StrongARM
Date: Wed, 03 Feb 1999 17:20:31 +0000
Organization: Systems Magic Ltd.
Message-Id: <36B8855F.118C581D@sysmag.com>
References: <797adq$br7$1@nnrp1.dejanews.com>


Brian.Oneill@ntu.ac.uk wrote:
>                             Integer                Float
> StrongARM
>    2.11 compiler           0.366 sec             34.50 sec
>    2.50 compiler           0.366 sec              1.29 sec
                                                    ^^^^
>    J Browm's FP                                  10.30 sec

Well, that certainly woke me from my lurking slumber!  I'm afraid I find
it very difficult to believe, though.  A float op in less than 4 integer
ops?

I'm afraid (having played with this myself many years ago), I think
you're being bitten by an optimiser...

> Below copy of our test code.

Excellent.  Time to pretend to be an optimiser.  I'm sure you've already
done this, but I'm still suspicious.  What I'd really like to see is the
ARM output...

> void BenchMark2(void)
> {
>     //section used for floating points op
>     long unsigned int i, j;
>     float p, q, k, l, m;
>     float ans[10];

Nothing volatile here, but I think you've ensured that the results are
all used by the two-stage accumulation into ans[0].  Fine.

>     Time();
>
>     j=0;
>
>     for (i=0;i<100000;i++)
>     {
>     p=4.0F;
>     q=200.0F;

Alarm bell - constants.  Better to pass these into the function from
outside and hope there isn't any inter-function optimisation.

>         //benchmarking starts here
>         for (j=0;j<10;j++)
>         {
>         p++;
>         q++;

Hmm.  So p=[5..14], q=[201..210]

>         k = p + q;

=> k = [206..224]

>         l = k*p;
>         m = l*q;

I can't work it out, but a compiler could, if it chose to unroll this
j[0..9] loop - the whole shebang would be constant folded down to

  ans[j] = <some number>
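
For what it's worth, the folded numbers are easy to get by running the
inner loop once on any machine with hardware FP.  A rough sketch, not
anything from the original test: fold_inner_loop and
print_folded_values are names I've made up, and I'm assuming ordinary
IEEE single precision (every intermediate here is a whole number below
2^24, so the results are exact).

```c
#include <stdio.h>

/* Fill out[0..9] with what the inner j loop produces on any one pass
   of the outer loop, starting from p=4, q=200 as in the benchmark. */
void fold_inner_loop(float out[10])
{
    float p = 4.0F, q = 200.0F;
    int j;

    for (j = 0; j < 10; j++) {
        float k, l, m;

        p++;
        q++;
        k = p + q;
        l = k * p;
        m = l * q;
        out[j] = k + l + m;
    }
}

/* Print the ten constants an unrolling compiler could store directly. */
void print_folded_values(void)
{
    float out[10];
    int j;

    fold_inner_loop(out);
    for (j = 0; j < 10; j++)
        printf("ans[%d] = %.1f\n", j, out[j]);
}
```

By hand, the first pass gives k=206, l=1030, m=207030, so ans[0] is
208266; the last gives k=224, l=3136, m=658560, so ans[9] is 661920.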

>         ans[j] = k + l + m;

This is dangerous, because only the last iteration of the 100,000 is
actually significant, and it's the same calculation every time.  Better
to accumulate in ans[j] (not forgetting to reset it at the start), and
make sure some part of the calculation involves 'i'.  However, since
you're not getting 100,000* speedup, this probably isn't the problem.
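
Something along these lines is what I have in mind.  It's only a
sketch: bench_accumulate and its parameters are my own invention, and
it assumes the compiler doesn't optimise across the call, so that p0
and q0 really are unknown inside the function.

```c
/* p0 and q0 arrive from the caller, so nothing here is a visible
   constant (assuming no inter-function optimisation). */
float bench_accumulate(float p0, float q0, unsigned long outer,
                       float ans[10])
{
    unsigned long i, j;
    float total = 0.0F;

    /* Reset the accumulators before the timed work starts. */
    for (j = 0; j < 10; j++)
        ans[j] = 0.0F;

    for (i = 0; i < outer; i++) {
        float p = p0 + (float)i;   /* each pass now depends on i */
        float q = q0;

        for (j = 0; j < 10; j++) {
            float k, l, m;

            p++;
            q++;
            k = p + q;
            l = k * p;
            m = l * q;
            ans[j] += k + l + m;   /* accumulate, don't overwrite */
        }
    }

    /* Fold everything into one value so every ans[j] is live. */
    for (j = 0; j < 10; j++)
        total += ans[j];
    return total;
}
```

Note this adds one integer-to-float conversion and one add per outer
pass, so the operation count isn't quite the same as the original.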

>     }
>
>     Time();

Whoa!  Think code motion here.  What's to stop the compiler shifting
some or all of that calculation right >here<, since it's all on the
stack, and has no side effects that can affect Time().  Indeed, it'd be
a cool thing to do, because it'll save ever having all the stack for
ans[] alive during a subcall.  I'd seriously consider including this
final accumulation in the timing, and setting a global with the answer
to force it to calculate it before calling Time().
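
Roughly like this.  I don't know what your Time() does, so the sketch
uses the standard clock() as a stand-in; g_result and timed_benchmark
are made-up names.  It relies on the compiler not being able to see
inside clock(), so it has to assume the call might read the global.

```c
#include <time.h>

float g_result;   /* global sink for the answer (hypothetical name) */

/* Returns elapsed CPU seconds for 'outer' passes of the inner loop. */
double timed_benchmark(float p0, float q0, unsigned long outer)
{
    float ans[10] = { 0.0F };
    float sum = 0.0F;
    unsigned long i, j;
    clock_t start, stop;

    start = clock();

    for (i = 0; i < outer; i++) {
        float p = p0, q = q0;

        for (j = 0; j < 10; j++) {
            float k, l, m;

            p++;
            q++;
            k = p + q;
            l = k * p;
            m = l * q;
            ans[j] += k + l + m;
        }
    }

    /* The final accumulation is inside the timed region... */
    for (j = 0; j < 10; j++)
        sum += ans[j];

    /* ...and the store to a global comes before the clock stops. */
    g_result = sum;

    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A compiler that does whole-program analysis could still see through
this, which is one more reason to look at the generated code.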

The problem is, if the compiler _was_ doing this, you'd probably be
reading zero time!

>     i = 0;
>     for (i=0;i<10;i++)
>     {
>     ans[0]=ans[i] + ans[0];
>     }
>
>     writeHex(ans[1]);

Oops!  I think you meant ans[0] here.  This could be it, in fact.
You're only demanding that ans[1] is calculated, which would take one
tenth of the time to calculate ans[0..9], which is roughly what you're
seeing (at least compared to Julian's library).  If it decided to unroll
the j[0..9] loop, it would do this easily, because ans[0] and ans[2..9]
are never live, partly because of this typo, and partly because you
always reset them on each iteration.

>     Exit();

[Stage left ;-]

One way of solving most of these problems at a stroke is to put all the
variables into volatile globals.  That way it won't dare optimise
anything, but the base overhead of the calculation will be higher.
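
A sketch of the volatile version, with names of my own choosing (vp,
vq and friends, bench_volatile).  Every read and write of a volatile
has to happen as written, so each assignment costs a load and a store
on top of the FP operation itself.

```c
/* Volatile globals: the compiler must perform every access. */
volatile float vp, vq, vk, vl, vm;
volatile float vans[10];

void bench_volatile(float p0, float q0, unsigned long outer)
{
    unsigned long i, j;

    for (i = 0; i < outer; i++) {
        vp = p0;
        vq = q0;

        for (j = 0; j < 10; j++) {
            vp = vp + 1.0F;
            vq = vq + 1.0F;
            vk = vp + vq;
            vl = vk * vp;
            vm = vl * vq;
            vans[j] = vk + vl + vm;
        }
    }
}
```

Timing the same loop with the arithmetic replaced by plain copies
would give a figure for that base overhead to subtract.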

Hope this helps,

P.
-- 
Paul Clark             mailto:prc@sysmag.com      $ whois pc52
Systems Magic Ltd.     http://www.sysmag.com

