2006-03-24 17:04:40

by Andreas Mohr

[permalink] [raw]
Subject: delay_tsc(): inefficient delay loop (2.6.16-mm1)

Hello all,

I discovered that the delay loop used there (which simply found a new place
in 2.6.16-mm1; it existed much longer) is inefficient,
since it does an unnecessary subtraction *in the main loop* which also hits
asm code:

00000024 <delay_tsc>:
24: 53 push %ebx
25: 89 c3 mov %eax,%ebx
27: 0f 31 rdtsc
29: 89 c1 mov %eax,%ecx
2b: f3 90 pause
2d: 0f 31 rdtsc
2f: 29 c8 sub %ecx,%eax
31: 39 d8 cmp %ebx,%eax
33: 72 f6 jb 2b <delay_tsc+0x7>
35: 5b pop %ebx
36: c3 ret

With such a patch:

--- linux-2.6.16-mm1/arch/i386/lib/delay.c.orig 2006-03-23 12:52:45.000000000 +0100
+++ linux-2.6.16-mm1/arch/i386/lib/delay.c 2006-03-24 12:53:01.000000000 +0100
@@ -44,10 +44,12 @@
unsigned long bclock, now;

rdtscl(bclock);
+ /* offset with bclock to have very simple comparison below */
+ loops += bclock;
do {
rep_nop();
rdtscl(now);
- } while ((now-bclock) < loops);
+ } while (now < loops);
}

/*

the result is:

00000024 <delay_tsc>:
24: 89 c1 mov %eax,%ecx
26: 0f 31 rdtsc
28: 01 c1 add %eax,%ecx
2a: f3 90 pause
2c: 0f 31 rdtsc
2e: 39 c8 cmp %ecx,%eax
30: 72 f8 jb 2a <delay_tsc+0x6>
32: c3 ret

Improvement: no unnecessary stuff after having hit the timer target value
(read: faster), better power saving, 4 bytes less opcodes.

The patch above could be considered weird since it fiddles with "loops"
directly. A possibly cleaner way would be to introduce a new variable
end = bclock + loops.

Comments? Anything that I'm missing here as to why it hasn't been done that
way before?

If this holds water then I'll submit a final patch soon.

Thanks!

Andreas Mohr


2006-03-24 17:22:53

by Ray Lee

[permalink] [raw]
Subject: Re: delay_tsc(): inefficient delay loop (2.6.16-mm1)

On 3/24/06, Andreas Mohr <[email protected]> wrote:
> + loops += bclock;
[...]
> - } while ((now-bclock) < loops);
> + } while (now < loops);

Erm, aren't you introducing an overflow problem here?

if loops is 2^32-1, bclock is 1, the old version would execute the
proper number of times, the new one will blow out in one tick.

2006-03-24 17:37:54

by Edgar Toernig

[permalink] [raw]
Subject: Re: delay_tsc(): inefficient delay loop (2.6.16-mm1)

Andreas Mohr wrote:
>
> rdtscl(bclock);
> + /* offset with bclock to have very simple comparison below */
> + loops += bclock;
> do {
> rep_nop();
> rdtscl(now);
> - } while ((now-bclock) < loops);
> + } while (now < loops);
> }

Hehe, optimizing delay loops *g* But your optimization is
wrong. 'loops+bclock' and/or 'now' is likely to wrap around
and then the test condition becomes bogus.

Ciao, ET.

2006-03-24 17:41:52

by Andreas Mohr

[permalink] [raw]
Subject: Re: delay_tsc(): inefficient delay loop (2.6.16-mm1)

Hi,

On Fri, Mar 24, 2006 at 09:22:51AM -0800, Ray Lee wrote:
> On 3/24/06, Andreas Mohr <[email protected]> wrote:
> > + loops += bclock;
> [...]
> > - } while ((now-bclock) < loops);
> > + } while (now < loops);
>
> Erm, aren't you introducing an overflow problem here?
>
> if loops is 2^32-1, bclock is 1, the old version would execute the
> proper number of times, the new one will blow out in one tick.

Doh. That's what happens if you get too excited about some new trick...
Back to the drawing board, methinks.

Andreas Mohr

--
No programming skills!? Why not help translate many Linux applications!
https://launchpad.ubuntu.com/rosetta