The "mull" instruction in __const_udelay() discards the lower 32 bits of its
64-bit product -- that is, it rounds down. This is an issue both for small
ndelay()s with _all_ values of loops_per_jiffy, and for certain {n,u}delay()s
with many loops_per_jiffy values.
Assuming

	LPJ = 1501115
	udelay(87)

results in 130597 loops being spent. However, 1000 * 130597 / 1501115 is
86.999997 us, so the delay is actually slightly _too short_. 1000 * 130598 /
1501115 is 87.000662841 us, which would be the technically correct thing to
do.
Of course, for the TSC case this won't matter, as the maths themselves take
some time; the actual delay is

	1000 * __udelay(x) / lpj + __OVERHEAD(x)
Anybody worried both about this additional overhead and about the fact that
the calculation itself takes some time to run should add a check

	if (unlikely(xloops < OVERHEAD))
		return;
	xloops -= OVERHEAD;

to the delay() routines in arch/i386/kernel/timers/*.c and measure what
OVERHEAD actually is.
Signed-off-by: Dominik Brodowski <[email protected]>
diff -ruN linux-original/arch/i386/lib/delay.c linux/arch/i386/lib/delay.c
--- linux-original/arch/i386/lib/delay.c 2004-06-15 07:54:16.938687280 +0200
+++ linux/arch/i386/lib/delay.c 2004-06-15 07:54:36.275747600 +0200
@@ -35,7 +35,7 @@
 	__asm__("mull %0"
 		:"=d" (xloops), "=&a" (d0)
 		:"1" (xloops),"0" (current_cpu_data.loops_per_jiffy * (HZ/4)));
-	__delay(xloops);
+	__delay(++xloops);
 }
 
 void __udelay(unsigned long usecs)