LinuxLists.cc - Question and patch about spinlocks (x86)

2001-10-11 20:37:43

Subject: Question and patch about spinlocks (x86)

Just looking through at the spinlock assembly I noticed a few things which I
think are bugs:

"js 2f\n" \
".section .text.lock,\"ax\"\n" \
"2:\t" \
"cmpb $0,%0\n\t" \
"rep;nop\n\t" \
"jle 2b\n\t" \
"jmp 1b\n" \
".previous"

We do the cmp loop as a 'soft' check, as the lock operand locks the whole system
bus, stopping the system for a while (as much as 70 cycles, I believe). However,
I don't understand why it was put before the 'rep; nop' which just sets the
processor to wait for a bit. Surely it would be better to test *after* we have
waited, as then we have a better chance of it being correct.

Any comments? Attached is a patch to fix it.

--

Mark Zealey (aka JALH on irc.openprojects.net: #zealos and many more)
[email protected]
[email protected]

UL++++>$ G!>(GCM/GCS/GS/GM) dpu? s:-@ a16! C++++>$ P++++>+++++$ L+++>+++++$
!E---? W+++>$ N- !o? !w--- O? !M? !V? !PS !PE--@ PGP+? r++ !t---?@ !X---?
!R- b+ !tv b+ DI+ D+? G+++ e>+++++ !h++* r!-- y--

(http://www.geekcode.com)

Attachments:

(No filename) (995.00 B)
patch (302.00 B)
Download all attachments

2001-10-11 22:56:42

by Davide Libenzi

[permalink] [raw]

Subject: Re: Question and patch about spinlocks (x86)

On Thu, 11 Oct 2001, Mark Zealey wrote:

> Just looking through at the spinlock assembly I noticed a few things which I
> think are bugs:
>
> "js 2f\n" \
> ".section .text.lock,\"ax\"\n" \
> "2:\t" \
> "cmpb $0,%0\n\t" \
> "rep;nop\n\t" \
> "jle 2b\n\t" \
> "jmp 1b\n" \
> ".previous"
>
> We do the cmp loop as a 'soft' check, as the lock operand locks the whole system
> bus, stopping the system for a while (as much as 70 cycles, I believe). However,
> I don't understand why it was put before the 'rep; nop' which just sets the
> processor to wait for a bit. Surely it would be better to test *after* we have
> waited, as then we have a better chance of it being correct.

The effect of the rep-nop is not to wait but to slow down the cpu to the
speed of the memory bus.
This to not overload ( due pipeline prefetch ) the memory controller with
requests that 1) will be useless coz the watched memory location can
change only at the membus speed 2) will have a big cost on loop exit due
the invalidation of a number>1 requests issued on the memory controller.
Beside this i kindly agree to move the pause before the cmp.
There should be a valid reason to not have followed the intel scheme but i
don't know why.

- Davide