Commit 51d7d5205d338 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
added an smp_mb() to arch_spin_is_locked(), in order to ensure that
Thread 0 Thread 1
spin_lock(A); spin_lock(B);
r0 = spin_is_locked(B) r1 = spin_is_locked(A);
never ends up with r0 = r1 = 0, and reported one example (in ipc/sem.c)
relying on such guarantee.
It is however understood (and undocumented) that spin_is_locked() is not
required to provide such an ordering guarantee (a guarantee that is
currently _not_ provided by all implementations/archs), and that callers
relying on such ordering should instead use suitable memory barriers
before acting on the result of spin_is_locked().
Following a recent audit[1] of the callers of {,raw_}spin_is_locked(),
which revealed that none of them rely on this guarantee anymore, this
commit removes the leading smp_mb() from the primitive, thus effectively
reverting 51d7d5205d338.
[1] https://marc.info/?l=linux-kernel&m=151981440005264&w=2
Signed-off-by: Andrea Parri <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
---
arch/powerpc/include/asm/spinlock.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index b9ebc3085fb79..ecc141e3f1a73 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -67,7 +67,6 @@ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
static inline int arch_spin_is_locked(arch_spinlock_t *lock)
{
- smp_mb();
return !arch_spin_value_unlocked(*lock);
}
--
2.7.4
On Mon, 2018-03-26 at 12:37 +0200, Andrea Parri wrote:
> Commit 51d7d5205d338 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
> added an smp_mb() to arch_spin_is_locked(), in order to ensure that
>
> Thread 0 Thread 1
>
> spin_lock(A); spin_lock(B);
> r0 = spin_is_locked(B) r1 = spin_is_locked(A);
>
> never ends up with r0 = r1 = 0, and reported one example (in ipc/sem.c)
> relying on such guarantee.
>
> It is however understood (and undocumented) that spin_is_locked() is not
> required to provide such an ordering guarantee
Shouldn't we start by documenting it ?
> (a guarantee that is currently
> _not_ provided by all implementations/archs), and that callers relying on
> such ordering should instead use suitable memory barriers before acting
> on the result of spin_is_locked().
>
> Following a recent audit[1] of the callers of {,raw_}spin_is_locked(),
> which revealed that none of them rely on this guarantee anymore, this
> commit removes the leading smp_mb() from the primitive, thus effectively
> reverting 51d7d5205d338.
I would rather wait until it is properly documented. Debugging that IPC
problem took a *LOT* of time and energy, I wouldn't want these issues
to come and bite us again.
> [1] https://marc.info/?l=linux-kernel&m=151981440005264&w=2
>
> Signed-off-by: Andrea Parri <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> ---
> arch/powerpc/include/asm/spinlock.h | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index b9ebc3085fb79..ecc141e3f1a73 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -67,7 +67,6 @@ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>
> static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> {
> - smp_mb();
> return !arch_spin_value_unlocked(*lock);
> }
>
On Tue, Mar 27, 2018 at 11:06:56AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2018-03-26 at 12:37 +0200, Andrea Parri wrote:
> > Commit 51d7d5205d338 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
> > added an smp_mb() to arch_spin_is_locked(), in order to ensure that
> >
> > Thread 0 Thread 1
> >
> > spin_lock(A); spin_lock(B);
> > r0 = spin_is_locked(B) r1 = spin_is_locked(A);
> >
> > never ends up with r0 = r1 = 0, and reported one example (in ipc/sem.c)
> > relying on such guarantee.
> >
> > It is however understood (and undocumented) that spin_is_locked() is not
> > required to provide such an ordering guarantee
>
> Shouldn't we start by documenting it ?
I do sympathize with your concern about the documentation! ;) The patch in
[1] was my (re)action to this concern; the status of that patch is unclear
to me at this time (and I'm not aware of other proposals in this respect).
>
> > (a guarantee that is currently
> > _not_ provided by all implementations/archs), and that callers relying on
> > such ordering should instead use suitable memory barriers before acting
> > on the result of spin_is_locked().
> >
> > Following a recent audit[1] of the callers of {,raw_}spin_is_locked(),
> > which revealed that none of them rely on this guarantee anymore, this
> > commit removes the leading smp_mb() from the primitive, thus effectively
> > reverting 51d7d5205d338.
>
> I would rather wait until it is properly documented. Debugging that IPC
> problem took a *LOT* of time and energy, I wouldn't want these issues
> to come and bite us again.
I understand. And I'm grateful for this debugging as well as for the (IMO)
excellent account of it you provided in 51d7d5205d338.
That said ;) I can't help saying that I would probably have resisted that
solution (adding an smp_mb() to my arch_spin_is_locked()), and instead
"blamed"/suggested that the caller fix its memory ordering...
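For the record, this is the kind of caller-side fix I have in mind; a
minimal sketch with hypothetical locks A and B, mirroring the scenario
in the commit message:

  #include <linux/spinlock.h>

  static DEFINE_SPINLOCK(A);    /* hypothetical locks, illustration only */
  static DEFINE_SPINLOCK(B);

  static int thread0(void)
  {
          int r0;

          spin_lock(&A);
          smp_mb__after_spinlock();       /* order the store acquiring A ... */
          r0 = spin_is_locked(&B);        /* ... before this load of B's lock word */
          spin_unlock(&A);
          return r0;
  }

  /* thread1() is the mirror image: lock B, smp_mb__after_spinlock(), read A. */

With the barrier on both sides, the r0 = r1 = 0 outcome from the commit
message is forbidden without any help from spin_is_locked() itself.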
Andrea
>
> > [1] https://marc.info/?l=linux-kernel&m=151981440005264&w=2
> >
> > Signed-off-by: Andrea Parri <[email protected]>
> > Cc: Benjamin Herrenschmidt <[email protected]>
> > Cc: Paul Mackerras <[email protected]>
> > Cc: Michael Ellerman <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > ---
> > arch/powerpc/include/asm/spinlock.h | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> > index b9ebc3085fb79..ecc141e3f1a73 100644
> > --- a/arch/powerpc/include/asm/spinlock.h
> > +++ b/arch/powerpc/include/asm/spinlock.h
> > @@ -67,7 +67,6 @@ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
> >
> > static inline int arch_spin_is_locked(arch_spinlock_t *lock)
> > {
> > - smp_mb();
> > return !arch_spin_value_unlocked(*lock);
> > }
> >
On Tue, 2018-03-27 at 12:25 +0200, Andrea Parri wrote:
> > I would rather wait until it is properly documented. Debugging that IPC
> > problem took a *LOT* of time and energy, I wouldn't want these issues
> > to come and bite us again.
>
> I understand. And I'm grateful for this debugging as well as for the (IMO)
> excellent account of it you provided in 51d7d5205d338.
>
> That said ;) I can't help saying that I would probably have resisted that
> solution (adding an smp_mb() to my arch_spin_is_locked()), and instead
> "blamed"/suggested that the caller fix its memory ordering...
This is debatable. I tend to go for the conservative approach, assuming
most people using fairly high level APIs such as spin_is_locked() are
not fully cognisant of the subtleties of our memory model.
After all, it works on x86 ...
The fact that the load can leak into the internals of the spin_lock()
implementation and reorder with the lock-word load itself is quite
hard to fathom for somebody who doesn't have years of experience
dealing with that sort of issue.
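Spelled out with hypothetical locks, and only a rough sketch of what the
CPU is allowed to do:

  /* CPU 0 (CPU 1 is the mirror image, with A and B swapped) */
  spin_lock(&A);                  /* acquire semantics only: the store that
                                   * takes A can still be sitting in the
                                   * store buffer, invisible to CPU 1 ... */
  r0 = spin_is_locked(&B);        /* ... while nothing forces this plain
                                   * load of B's lock word to wait for it */

  /* hence both CPUs can see the other lock word as unlocked: r0 == r1 == 0 */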
So people will get it wrong.
So unless it's very performance sensitive, I'd rather have things like
spin_is_locked be conservative by default and provide simpler ordering
semantics.
Cheers,
Ben.
On Tue, Mar 27, 2018 at 10:33:06PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-03-27 at 12:25 +0200, Andrea Parri wrote:
> > > I would rather wait until it is properly documented. Debugging that IPC
> > > problem took a *LOT* of time and energy, I wouldn't want these issues
> > > to come and bite us again.
> >
> > I understand. And I'm grateful for this debugging as well as for the (IMO)
> > excellent account of it you provided in 51d7d5205d338.
> >
> > That said ;) I can't help saying that I would probably have resisted that
> > solution (adding an smp_mb() to my arch_spin_is_locked()), and instead
> > "blamed"/suggested that the caller fix its memory ordering...
>
> This is debatable. I tend to go for the conservative approach, assuming
> most people using fairly high level APIs such as spin_is_locked() are
> not fully cognisant of the subtleties of our memory model.
>
> After all, it works on x86 ...
>
> The fact that the load can leak into the internals of the spin_lock()
> implementation and reorder with the lock-word load itself is quite
> hard to fathom for somebody who doesn't have years of experience
> dealing with that sort of issue.
>
> So people will get it wrong.
>
> So unless it's very performance sensitive, I'd rather have things like
> spin_is_locked be conservative by default and provide simpler ordering
> semantics.
Well, it might not be "very performance sensitive" but allow me to say
that "40+ SYNCs in stuff like BUG_ON or such" is sadness to my eyes ;),
especially considering that our "high level API" provides the means to
avoid this situation (e.g., smp_mb__after_spinlock(); BTW, if you look
at architectures for which this macro is "non-trivial", you can get an
idea of the architectures which "wouldn't work"; of course, x86 is not
among these). Yes, we do appear to have different views on what is to
be considered the "simpler ordering semantics". I'm willing to change
mine _as soon as_ this gets documented: would you be willing to send a
patch (on the lines of my [1]) to describe/document such semantics?
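For reference, the definitions I'm alluding to are roughly the following
(quoting from memory, please double-check the tree): the generic fallback
is a no-op, while powerpc has to turn it into a full sync.

  /* include/linux/spinlock.h (generic fallback): */
  #ifndef smp_mb__after_spinlock
  #define smp_mb__after_spinlock()        do { } while (0)
  #endif

  /* arch/powerpc/include/asm/barrier.h: */
  #define smp_mb__after_spinlock()        smp_mb()        /* i.e. sync */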
Andrea
>
> Cheers,
> Ben.
>
On Tue, 2018-03-27 at 15:13 +0200, Andrea Parri wrote:
> >
> > So unless it's very performance sensitive, I'd rather have things like
> > spin_is_locked be conservative by default and provide simpler ordering
> > semantics.
>
> Well, it might not be "very performance sensitive" but allow me to say
> that "40+ SYNCs in stuff like BUG_ON or such" is sadness to my eyes ;),
In the fast path or the trap case ? Because the latter doesn't matter
at all...
> especially considering that our "high level API" provides the means to
> avoid this situation (e.g., smp_mb__after_spinlock(); BTW, if you look
> at architectures for which this macro is "non-trivial", you can get an
> idea of the architectures which "wouldn't work"; of course, x86 is not
> among these). Yes, we do appear to have different views on what is to
> be considered the "simpler ordering semantics". I'm willing to change
> mine _as soon as_ this gets documented: would you be willing to send a
> patch (on the lines of my [1]) to describe/document such semantics?
Not really :-) Just expressing an opinion. I don't fully object to your
approach, just saying it's open for debate.
At this point, I have too many other things to chase to follow up too
much on this.
Cheers,
Ben.
Andrea Parri <[email protected]> writes:
> On Tue, Mar 27, 2018 at 11:06:56AM +1100, Benjamin Herrenschmidt wrote:
>> On Mon, 2018-03-26 at 12:37 +0200, Andrea Parri wrote:
>> > Commit 51d7d5205d338 ("powerpc: Add smp_mb() to arch_spin_is_locked()")
>> > added an smp_mb() to arch_spin_is_locked(), in order to ensure that
>> >
>> > Thread 0 Thread 1
>> >
>> > spin_lock(A); spin_lock(B);
>> > r0 = spin_is_locked(B) r1 = spin_is_locked(A);
>> >
>> > never ends up with r0 = r1 = 0, and reported one example (in ipc/sem.c)
>> > relying on such guarantee.
>> >
>> > It is however understood (and undocumented) that spin_is_locked() is not
>> > required to provide such an ordering guarantee
>>
>> Shouldn't we start by documenting it ?
>
> I do sympathize with your concern about the documentation! ;) The patch in
> [1] was my (re)action to this concern; the status of that patch is unclear
> to me at this time (and I'm not aware of other proposals in this respect).
>>
>> > (a guarantee that is currently
>> > _not_ provided by all implementations/archs), and that callers relying on
>> > such ordering should instead use suitable memory barriers before acting
>> > on the result of spin_is_locked().
>> >
>> > Following a recent audit[1] of the callers of {,raw_}spin_is_locked(),
>> > which revealed that none of them rely on this guarantee anymore, this
>> > commit removes the leading smp_mb() from the primitive, thus effectively
>> > reverting 51d7d5205d338.
>>
>> I would rather wait until it is properly documented. Debugging that IPC
>> problem took a *LOT* of time and energy, I wouldn't want these issues
>> to come and bite us again.
>
> I understand. And I'm grateful for this debugging as well as for the (IMO)
> excellent account of it you provided in 51d7d5205d338.
Thanks, you're welcome :)
> That said ;) I can't help saying that I would probably have resisted that
> solution (adding an smp_mb() to my arch_spin_is_locked()), and instead
> "blamed"/suggested that the caller fix its memory ordering...
That was tempting, but it leaves unfixed all the other potential
callers, both in-tree and out-of-tree, and in code that's yet to be
written.
Looking today nearly all the callers are debug code, where we probably
don't need the barrier but we also don't care about the overhead of the
barrier.
Documenting it would definitely be good, but even then I'd be inclined
to leave the barrier in our implementation. Matching the documented
behaviour is one thing, but the actual real-world behaviour on well
tested platforms (ie. x86) is more important.
cheers
On Wed, Mar 28, 2018 at 08:51:35AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-03-27 at 15:13 +0200, Andrea Parri wrote:
> > >
> > > So unless it's very performance sensitive, I'd rather have things like
> > > spin_is_locked be conservative by default and provide simpler ordering
> > > semantics.
> >
> > Well, it might not be "very performance sensitive" but allow me to say
> > that "40+ SYNCs in stuff like BUG_ON or such" is sadness to my eyes ;),
>
> In the fast path or the trap case ? Because the latter doesn't matter
> at all...
Both: you must execute the sync "before" issuing the load of lock.slock.
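That is, with the barrier in place every spin_is_locked() call boils down
to roughly the following (sketch of the generated code, from memory, with
a made-up register name):

  /*
   *      sync                    <- the smp_mb()
   *      lwz     rN, 0(lock)     <- the load of lock->slock
   *      (compare against 0)     <- arch_spin_value_unlocked()
   *
   * and the sync is executed before we know whether the check fires,
   * i.e. it is paid on the non-trap path as well.
   */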
>
> > especially considering that our "high level API" provides the means to
> > avoid this situation (e.g., smp_mb__after_spinlock(); BTW, if you look
> > at architectures for which this macro is "non-trivial", you can get an
> > idea of the architectures which "wouldn't work"; of course, x86 is not
> > among these). Yes, we do appear to have different views on what is to
> > be considered the "simpler ordering semantics". I'm willing to change
> > mine _as soon as_ this gets documented: would you be willing to send a
> > patch (on the lines of my [1]) to describe/document such semantics?
>
> Not really :-) Just expressing an opinion. I don't fully object to your
> approach, just saying it's open for debate.
Always (open for debate).
I'm sending a v2 of this series shortly, integrating some feedback;
I prefer to keep this patch in the series (feel free to ignore).
Andrea
>
> At this point, I have too many other things to chase to follow up too
> much on this.
>
> Cheers,
> Ben.
>
On Wed, Mar 28, 2018 at 01:04:36PM +0200, Peter Zijlstra wrote:
> On Wed, Mar 28, 2018 at 04:25:37PM +1100, Michael Ellerman wrote:
> > Documenting it would definitely be good, but even then I'd be inclined
> > to leave the barrier in our implementation. Matching the documented
> > behaviour is one thing, but the actual real-world behaviour on well
> > tested platforms (ie. x86) is more important.
>
> By that argument you should switch your spinlock implementation to RCpc
> and include that SYNC in either lock or unlock already ;-)
*RCsc* obviously... clearly I need to wake up moar.
> Ideally we'd completely eradicate the *_is_locked() crud from the
> kernel, not sure how feasible that really is, but it's a good goal. At
> that point the whole issue of the barrier becomes moot of course.
On Wed, Mar 28, 2018 at 04:25:37PM +1100, Michael Ellerman wrote:
> That was tempting, but it leaves unfixed all the other potential
> callers, both in-tree and out-of-tree, and in code that's yet to be
> written.
So I myself don't care one teeny tiny bit about out of tree code, they
get to keep their pieces :-)
> Looking today nearly all the callers are debug code, where we probably
> don't need the barrier but we also don't care about the overhead of the
> barrier.
Still, code like:
WARN_ON_ONCE(!spin_is_locked(foo));
will unconditionally emit that SYNC. So you might want to be a little
careful.
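For the debug-only assertions there is also the lockdep route, which
sidesteps the question entirely since it never touches the lock word;
for example (sketch, reusing the hypothetical 'foo' from above):

  lockdep_assert_held(foo);       /* no-op without CONFIG_LOCKDEP, and no
                                   * barrier in either configuration */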
> Documenting it would definitely be good, but even then I'd be inclined
> to leave the barrier in our implementation. Matching the documented
> behaviour is one thing, but the actual real-world behaviour on well
> tested platforms (ie. x86) is more important.
By that argument you should switch your spinlock implementation to RCpc
and include that SYNC in either lock or unlock already ;-)
Ideally we'd completely eradicate the *_is_locked() crud from the
kernel, not sure how feasible that really is, but it's a good goal. At
that point the whole issue of the barrier becomes moot of course.
Peter Zijlstra <[email protected]> writes:
> On Wed, Mar 28, 2018 at 04:25:37PM +1100, Michael Ellerman wrote:
>> That was tempting, but it leaves unfixed all the other potential
>> callers, both in-tree and out-of-tree, and in code that's yet to be
>> written.
>
> So I myself don't care one teeny tiny bit about out of tree code, they
> get to keep their pieces :-)
I agree, but somehow I still end up seeing the pieces from time to time :)
>> Looking today nearly all the callers are debug code, where we probably
>> don't need the barrier but we also don't care about the overhead of the
>> barrier.
>
> Still, code like:
>
> WARN_ON_ONCE(!spin_is_locked(foo));
>
> will unconditionally emit that SYNC. So you might want to be a little
> careful.
>
>> Documenting it would definitely be good, but even then I'd be inclined
>> to leave the barrier in our implementation. Matching the documented
>> behaviour is one thing, but the actual real-world behaviour on well
>> tested platforms (ie. x86) is more important.
>
> By that argument you should switch your spinlock implementation to RCpc
> and include that SYNC in either lock or unlock already ;-)
Yes!
Unfortunately performance is also a thing :/
I ran the numbers again: switching to sync on unlock is a 30% hit for an
uncontended lock/unlock loop. Which is not going to make me any friends.
> Ideally we'd completely eradicate the *_is_locked() crud from the
> kernel, not sure how feasible that really is, but it's a good goal. At
> that point the whole issue of the barrier becomes moot of course.
Yeah that would be good. The majority of them could/should be using
lockdep_is_held(), though there are still a handful that are actually
using it in non-debug logic.
cheers
Peter Zijlstra <[email protected]> writes:
> On Wed, Mar 28, 2018 at 01:04:36PM +0200, Peter Zijlstra wrote:
>> On Wed, Mar 28, 2018 at 04:25:37PM +1100, Michael Ellerman wrote:
>> > Documenting it would definitely be good, but even then I'd be inclined
>> > to leave the barrier in our implementation. Matching the documented
>> > behaviour is one thing, but the actual real-world behaviour on well
>> > tested platforms (ie. x86) is more important.
>>
>> By that argument you should switch your spinlock implementation to RCpc
>> and include that SYNC in either lock or unlock already ;-)
>
> *RCsc* obviously... clearly I need to wake up moar.
It's just a jumble of letters to me - I didn't even notice ;)
cheers