2021-09-09 11:06:52

by Peter Zijlstra

Subject: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

While looking at current_save_and_set_rtlock_wait_state() I'm thinking
it really ought to use smp_store_mb(), because something like:

    current_save_and_set_rtlock_wait_state();
    for (;;) {
            if (try_lock())
                    break;

            raw_spin_unlock_irq(&lock->wait_lock);
            schedule();
            raw_spin_lock_irq(&lock->wait_lock);

            set_current_state(TASK_RTLOCK_WAIT);
    }
    current_restore_rtlock_saved_state();

which is the advertised usage in the comment, is actually broken,
since trylock() will only need a load-acquire in general and that
could be re-ordered against the state store, which could lead to a
missed wakeup -> BAD (tm).
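
For illustration, the hazard can be written as a litmus test; this is an
editorial sketch with made-up variable names (state standing in for
->__state, lock for the lock word read by the trylock fast path), not
something taken from the patch itself:

C rtlock-missed-wakeup-sketch

{
}

P0(int *state, int *lock)
{
        int r0;

        WRITE_ONCE(*state, 1);          /* __state = TASK_RTLOCK_WAIT */
        r0 = smp_load_acquire(lock);    /* try_lock(): load-acquire only */
}

P1(int *state, int *lock)
{
        int r1;

        smp_store_release(lock, 1);     /* unlock: lock becomes available */
        r1 = READ_ONCE(*state);         /* wakeup path checks the task state */
}

exists (0:r0=0 /\ 1:r1=0)

The LKMM allows this outcome (store-buffering is not excluded by
release/acquire alone): P0 misses the lock and would go to sleep while P1
sees the old state and skips the wakeup, which corresponds to the missed
wakeup described above.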

While there, make them consistent with the IRQ usage in
set_special_state().

Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/sched.h | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -245,7 +245,8 @@ struct task_group;
* if (try_lock())
* break;
* raw_spin_unlock_irq(&lock->wait_lock);
- * schedule_rtlock();
+ * if (!cond)
+ * schedule_rtlock();
* raw_spin_lock_irq(&lock->wait_lock);
* set_current_state(TASK_RTLOCK_WAIT);
* }
@@ -253,22 +254,24 @@ struct task_group;
*/
#define current_save_and_set_rtlock_wait_state() \
do { \
- lockdep_assert_irqs_disabled(); \
- raw_spin_lock(&current->pi_lock); \
+ unsigned long flags; /* may shadow */ \
+ \
+ raw_spin_lock_irqsave(&current->pi_lock, flags); \
current->saved_state = current->__state; \
debug_rtlock_wait_set_state(); \
- WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
- raw_spin_unlock(&current->pi_lock); \
+ smp_store_mb(current->__state, TASK_RTLOCK_WAIT); \
+ raw_spin_unlock_irqrestore(&current->pi_lock, flags); \
} while (0);

#define current_restore_rtlock_saved_state() \
do { \
- lockdep_assert_irqs_disabled(); \
- raw_spin_lock(&current->pi_lock); \
+ unsigned long flags; /* may shadow */ \
+ \
+ raw_spin_lock_irqsave(&current->pi_lock, flags); \
debug_rtlock_wait_restore_state(); \
WRITE_ONCE(current->__state, current->saved_state); \
current->saved_state = TASK_RUNNING; \
- raw_spin_unlock(&current->pi_lock); \
+ raw_spin_unlock_irqrestore(&current->pi_lock, flags); \
} while (0);

#define get_current_state() READ_ONCE(current->__state)
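
For readers unfamiliar with smp_store_mb(): its generic fallback is roughly
the following, i.e. the plain store followed by a full memory barrier
(architectures may implement it differently, e.g. x86 uses xchg()), which is
what keeps a later load such as the try_lock()/cond check from being
satisfied before the ->__state store:

    /* Sketch of the generic fallback, not quoted verbatim from the tree. */
    #define smp_store_mb(var, value)            \
    do {                                        \
            WRITE_ONCE(var, value);             \
            smp_mb();                           \
    } while (0)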



2021-09-09 13:49:52

by Will Deacon

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because something like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (try_lock())
>                     break;
>
>             raw_spin_unlock_irq(&lock->wait_lock);
>             schedule();
>             raw_spin_lock_irq(&lock->wait_lock);
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> which is the advertised usage in the comment, is actually broken,
> since trylock() will only need a load-acquire in general and that
> could be re-ordered against the state store, which could lead to a
> missed wakeup -> BAD (tm).

Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
order the state change before the successful try_lock? I'm just struggling
to envisage how this actually goes wrong.

Will

2021-09-09 14:45:51

by Peter Zijlstra

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > it really ought to use smp_store_mb(), because something like:
> >
> >     current_save_and_set_rtlock_wait_state();
> >     for (;;) {
> >             if (try_lock())
> >                     break;
> >
> >             raw_spin_unlock_irq(&lock->wait_lock);
> >             schedule();
> >             raw_spin_lock_irq(&lock->wait_lock);
> >
> >             set_current_state(TASK_RTLOCK_WAIT);
> >     }
> >     current_restore_rtlock_saved_state();
> >
> > which is the advertised usage in the comment, is actually broken,
> > since trylock() will only need a load-acquire in general and that
> > could be re-ordered against the state store, which could lead to a
> > missed wakeup -> BAD (tm).
>
> Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> order the state change before the successful try_lock? I'm just struggling
> to envisage how this actually goes wrong.

Moo yes, so the earlier changelog I wrote was something like:

    current_save_and_set_rtlock_wait_state();
    for (;;) {
            if (try_lock())
                    break;

            raw_spin_unlock_irq(&lock->wait_lock);
            if (!cond)
                    schedule();
            raw_spin_lock_irq(&lock->wait_lock);

            set_current_state(TASK_RTLOCK_WAIT);
    }
    current_restore_rtlock_saved_state();

which is more what the code looks like before these patches, and in that
case the @cond load can be lifted before __state.

It all sorta works in the current application because most things are
serialized by ->wait_lock, but given the 'normal' wait pattern I got
highly suspicious of there not being a full barrier around.
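
As a sketch of what the full barrier buys: model the waiter as "store
__state, then load cond" and the waker as "store cond, then load __state"
(names made up for this sketch; on the waker side the barrier comes from
smp_mb__after_spinlock() after taking ->pi_lock in try_to_wake_up()). With
smp_store_mb() on the wait side, both sides then have a full barrier between
their store and their load, and the LKMM forbids the "both sides miss"
outcome:

C rtlock-smp-store-mb-sketch

{
}

P0(int *state, int *cond)
{
        int r0;

        WRITE_ONCE(*state, 1);  /* smp_store_mb(): the store ... */
        smp_mb();               /* ... plus the full barrier */
        r0 = READ_ONCE(*cond);
}

P1(int *state, int *cond)
{
        int r1;

        WRITE_ONCE(*cond, 1);
        smp_mb();               /* e.g. smp_mb__after_spinlock() in ttwu */
        r1 = READ_ONCE(*state);
}

exists (0:r0=0 /\ 1:r1=0)

herd7 with the linux-kernel memory model reports this exists clause as
"Never"; without the two smp_mb() it is allowed.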

by Sebastian Andrzej Siewior

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On 2021-09-09 12:59:16 [+0200], Peter Zijlstra wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -253,22 +254,24 @@ struct task_group;
> */
> #define current_save_and_set_rtlock_wait_state() \
> do { \
> - lockdep_assert_irqs_disabled(); \
> - raw_spin_lock(&current->pi_lock); \
> + unsigned long flags; /* may shadow */ \
could we haz __flags so no shadow?

> + \
> + raw_spin_lock_irqsave(&current->pi_lock, flags); \
> current->saved_state = current->__state; \
> debug_rtlock_wait_set_state(); \
> - WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
> - raw_spin_unlock(&current->pi_lock); \
> + smp_store_mb(current->__state, TASK_RTLOCK_WAIT); \
> + raw_spin_unlock_irqrestore(&current->pi_lock, flags); \

Sebastian

2021-09-10 12:58:40

by Will Deacon

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > >
> > >     current_save_and_set_rtlock_wait_state();
> > >     for (;;) {
> > >             if (try_lock())
> > >                     break;
> > >
> > >             raw_spin_unlock_irq(&lock->wait_lock);
> > >             schedule();
> > >             raw_spin_lock_irq(&lock->wait_lock);
> > >
> > >             set_current_state(TASK_RTLOCK_WAIT);
> > >     }
> > >     current_restore_rtlock_saved_state();
> > >
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> >
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
>
> Moo yes, so the earlier changelog I wrote was something like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (try_lock())
>                     break;
>
>             raw_spin_unlock_irq(&lock->wait_lock);
>             if (!cond)
>                     schedule();
>             raw_spin_lock_irq(&lock->wait_lock);
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.

Ah, so that makes more sense, thanks. I can't see how the try_lock() could
be reordered though, as it's going to have to do an atomic rmw.

Will
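
A minimal sketch of the point about the atomic rmw: a trylock fast path is
typically a compare-and-swap, so the lock word is read and written in a
single atomic operation. my_try_lock() below is a hypothetical toy model,
not the rtmutex code:

    #include <stdatomic.h>
    #include <stdbool.h>

    static inline bool my_try_lock(atomic_int *owner)
    {
            int expected = 0;       /* 0 == unlocked in this toy model */

            /* acquire ordering on success, relaxed on failure */
            return atomic_compare_exchange_strong_explicit(owner, &expected, 1,
                            memory_order_acquire, memory_order_relaxed);
    }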

2021-09-10 13:21:20

by Peter Zijlstra

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > Moo yes, so the earlier changelog I wrote was something like:
> >
> >     current_save_and_set_rtlock_wait_state();
> >     for (;;) {
> >             if (try_lock())
> >                     break;
> >
> >             raw_spin_unlock_irq(&lock->wait_lock);
> >             if (!cond)
> >                     schedule();
> >             raw_spin_lock_irq(&lock->wait_lock);
> >
> >             set_current_state(TASK_RTLOCK_WAIT);
> >     }
> >     current_restore_rtlock_saved_state();
> >
> > which is more what the code looks like before these patches, and in that
> > case the @cond load can be lifted before __state.
>
> Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> be reordered though, as it's going to have to do an atomic rmw.

OK, lemme go update the Changelog and make it __flags for bigeasy :-)

2021-09-10 14:05:17

by Peter Zijlstra

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > > Moo yes, so the earlier changelog I wrote was something like:
> > >
> > >     current_save_and_set_rtlock_wait_state();
> > >     for (;;) {
> > >             if (try_lock())
> > >                     break;
> > >
> > >             raw_spin_unlock_irq(&lock->wait_lock);
> > >             if (!cond)
> > >                     schedule();
> > >             raw_spin_lock_irq(&lock->wait_lock);
> > >
> > >             set_current_state(TASK_RTLOCK_WAIT);
> > >     }
> > >     current_restore_rtlock_saved_state();
> > >
> > > which is more what the code looks like before these patches, and in that
> > > case the @cond load can be lifted before __state.
> >
> > Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> > be reordered though, as it's going to have to do an atomic rmw.
>
> OK, lemme go update the Changelog and make it __flags for bigeasy :-)

The patch now reads:

---
Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
From: Peter Zijlstra <[email protected]>
Date: Thu, 09 Sep 2021 12:59:16 +0200

While looking at current_save_and_set_rtlock_wait_state() I'm thinking
it really ought to use smp_store_mb(), because using it for a more
traditional wait loop like:

    current_save_and_set_rtlock_wait_state();
    for (;;) {
            if (!cond)
                    schedule();

            set_current_state(TASK_RTLOCK_WAIT);
    }
    current_restore_rtlock_saved_state();

is actually broken, since the cond load could be re-ordered against
the state store, which could lead to a missed wakeup -> BAD (tm).

While there, make them consistent with the IRQ usage in
set_special_state().

Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/sched.h | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -218,7 +218,7 @@ struct task_group;
*/
#define set_special_state(state_value) \
do { \
- unsigned long flags; /* may shadow */ \
+ unsigned long __flags; /* may shadow */ \
\
raw_spin_lock_irqsave(&current->pi_lock, flags); \
debug_special_state_change((state_value)); \
@@ -245,7 +245,8 @@ struct task_group;
* if (try_lock())
* break;
* raw_spin_unlock_irq(&lock->wait_lock);
- * schedule_rtlock();
+ * if (!cond)
+ * schedule_rtlock();
* raw_spin_lock_irq(&lock->wait_lock);
* set_current_state(TASK_RTLOCK_WAIT);
* }
@@ -253,22 +254,24 @@ struct task_group;
*/
#define current_save_and_set_rtlock_wait_state() \
do { \
- lockdep_assert_irqs_disabled(); \
- raw_spin_lock(&current->pi_lock); \
+ unsigned long __flags; /* may shadow */ \
+ \
+ raw_spin_lock_irqsave(&current->pi_lock, flags); \
current->saved_state = current->__state; \
debug_rtlock_wait_set_state(); \
- WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
- raw_spin_unlock(&current->pi_lock); \
+ smp_store_mb(current->__state, TASK_RTLOCK_WAIT); \
+ raw_spin_unlock_irqrestore(&current->pi_lock, flags); \
} while (0);

#define current_restore_rtlock_saved_state() \
do { \
- lockdep_assert_irqs_disabled(); \
- raw_spin_lock(&current->pi_lock); \
+ unsigned long __flags; /* may shadow */ \
+ \
+ raw_spin_lock_irqsave(&current->pi_lock, flags); \
debug_rtlock_wait_restore_state(); \
WRITE_ONCE(current->__state, current->saved_state); \
current->saved_state = TASK_RUNNING; \
- raw_spin_unlock(&current->pi_lock); \
+ raw_spin_unlock_irqrestore(&current->pi_lock, flags); \
} while (0);

#define get_current_state() READ_ONCE(current->__state)

2021-09-10 15:09:21

by Will Deacon

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Fri, Sep 10, 2021 at 04:01:29PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
> > > On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> > > > Moo yes, so the earlier changelog I wrote was something like:
> > > >
> > > >     current_save_and_set_rtlock_wait_state();
> > > >     for (;;) {
> > > >             if (try_lock())
> > > >                     break;
> > > >
> > > >             raw_spin_unlock_irq(&lock->wait_lock);
> > > >             if (!cond)
> > > >                     schedule();
> > > >             raw_spin_lock_irq(&lock->wait_lock);
> > > >
> > > >             set_current_state(TASK_RTLOCK_WAIT);
> > > >     }
> > > >     current_restore_rtlock_saved_state();
> > > >
> > > > which is more what the code looks like before these patches, and in that
> > > > case the @cond load can be lifted before __state.
> > >
> > > Ah, so that makes more sense, thanks. I can't see how the try_lock() could
> > > be reordered though, as it's going to have to do an atomic rmw.
> >
> > OK, lemme go update the Changelog and make it __flags for bigeasy :-)
>
> The patch now reads:
>
> ---
> Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
> From: Peter Zijlstra <[email protected]>
> Date: Thu, 09 Sep 2021 12:59:16 +0200
>
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because using it for a more
> traditional wait loop like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (!cond)
>                     schedule();
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> is actually broken, since the cond load could be re-ordered against
> the state store, which could lead to a missed wakeup -> BAD (tm).
>
> While there, make them consistent with the IRQ usage in
> set_special_state().

Cheers, that's much better:

Acked-by: Will Deacon <[email protected]>

Will

2021-09-10 16:11:40

by Waiman Long

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On 9/10/21 10:01 AM, Peter Zijlstra wrote:
> On Fri, Sep 10, 2021 at 03:17:04PM +0200, Peter Zijlstra wrote:
>> On Fri, Sep 10, 2021 at 01:57:26PM +0100, Will Deacon wrote:
>>> On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
>>>> Moo yes, so the earlier changelog I wrote was something like:
>>>>
>>>>     current_save_and_set_rtlock_wait_state();
>>>>     for (;;) {
>>>>             if (try_lock())
>>>>                     break;
>>>>
>>>>             raw_spin_unlock_irq(&lock->wait_lock);
>>>>             if (!cond)
>>>>                     schedule();
>>>>             raw_spin_lock_irq(&lock->wait_lock);
>>>>
>>>>             set_current_state(TASK_RTLOCK_WAIT);
>>>>     }
>>>>     current_restore_rtlock_saved_state();
>>>>
>>>> which is more what the code looks like before these patches, and in that
>>>> case the @cond load can be lifted before __state.
>>> Ah, so that makes more sense, thanks. I can't see how the try_lock() could
>>> be reordered though, as it's going to have to do an atomic rmw.
>> OK, lemme go update the Changelog and make it __flags for bigeasy :-)
> The patch now reads:
>
> ---
> Subject: sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()
> From: Peter Zijlstra <[email protected]>
> Date: Thu, 09 Sep 2021 12:59:16 +0200
>
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because using it for a more
> traditional wait loop like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (!cond)
>                     schedule();
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> is actually broken, since the cond load could be re-ordered against
> the state store, which could lead to a missed wakeup -> BAD (tm).
>
> While there, make them consistent with the IRQ usage in
> set_special_state().
>
> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> include/linux/sched.h | 21 ++++++++++++---------
> 1 file changed, 12 insertions(+), 9 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -218,7 +218,7 @@ struct task_group;
> */
> #define set_special_state(state_value) \
> do { \
> - unsigned long flags; /* may shadow */ \
> + unsigned long __flags; /* may shadow */ \

Do you still need the "may shadow" comment?

Cheers,
Longman

2021-09-10 17:13:28

by Peter Zijlstra

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Fri, Sep 10, 2021 at 12:07:40PM -0400, Waiman Long wrote:
> > #define set_special_state(state_value) \
> > do { \
> > - unsigned long flags; /* may shadow */ \
> > + unsigned long __flags; /* may shadow */ \
>
> Do you still need the "may shadow" comment?

Strictly speaking yes, there _could_ be a local variable called __flags,
however unlikely.

At some point someone was going around and doing dodgy patches based on
compiler warnings about variables being shadowed; this is to preempt
anybody trying to 'fix' this (again).
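
A minimal sketch of what the comment guards against (hypothetical names, not
actual kernel code): a caller that happens to have its own __flags gets it
shadowed by the macro's local, and -Wshadow then warns, which is exactly the
kind of warning the comment is trying to preempt:

    #define set_special_state_sketch(x)                     \
    do {                                                    \
            unsigned long __flags = (x); /* may shadow */   \
            (void)__flags;                                  \
    } while (0)

    void caller(void)
    {
            unsigned long __flags = 0;      /* caller's own variable */

            set_special_state_sketch(1);    /* macro __flags shadows it */
            (void)__flags;
    }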

2021-09-12 04:05:09

by Boqun Feng

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Thu, Sep 09, 2021 at 04:27:46PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 09, 2021 at 02:45:24PM +0100, Will Deacon wrote:
> > On Thu, Sep 09, 2021 at 12:59:16PM +0200, Peter Zijlstra wrote:
> > > While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> > > it really ought to use smp_store_mb(), because something like:
> > >
> > >     current_save_and_set_rtlock_wait_state();
> > >     for (;;) {
> > >             if (try_lock())
> > >                     break;
> > >
> > >             raw_spin_unlock_irq(&lock->wait_lock);
> > >             schedule();
> > >             raw_spin_lock_irq(&lock->wait_lock);
> > >
> > >             set_current_state(TASK_RTLOCK_WAIT);
> > >     }
> > >     current_restore_rtlock_saved_state();
> > >
> > > which is the advertised usage in the comment, is actually broken,
> > > since trylock() will only need a load-acquire in general and that
> > > could be re-ordered against the state store, which could lead to a
> > > missed wakeup -> BAD (tm).
> >
> > Why doesn't the UNLOCK of pi_lock in current_save_and_set_rtlock_wait_state()
> > order the state change before the successful try_lock? I'm just struggling
> > to envisage how this actually goes wrong.
>
> Moo yes, so the earlier changelog I wrote was something like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (try_lock())
>                     break;
>
>             raw_spin_unlock_irq(&lock->wait_lock);
>             if (!cond)
>                     schedule();
>             raw_spin_lock_irq(&lock->wait_lock);
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> which is more what the code looks like before these patches, and in that
> case the @cond load can be lifted before __state.
>
> It all sorta works in the current application because most things are
> serialized by ->wait_lock, but given the 'normal' wait pattern I got
> highly suspicious of there not being a full barrier around.

Hmm.. I think ->pi_lock actually protects us here. IIUC, a missing
wake-up would happen if try_to_wake_up() failed to observe the __state
change by the about-to-wait task, and the about-to-wait task didn't
observe the condition set by the waker task, for example:

    TASK 0                              TASK 1
    ======                              ======
                                        cond = 1;
                                        ...
                                        try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
                                          ttwu_state_match(...)
                                            if (t0->__state & TASK_RTLOCK_WAIT) // false
                                            ..
                                            return false; // don't wake up
    ...
    current->__state = TASK_RTLOCK_WAIT
    ...
    if (!cond) // !cond is true because of memory reordering
      schedule(); // sleep, and may not be woken up again.

But let's add ->pi_lock critical sections into the example:

    TASK 0                              TASK 1
    ======                              ======
                                        cond = 1;
                                        ...
                                        try_to_wake_up(t0, TASK_RTLOCK_WAIT, ..):
                                          raw_spin_lock_irqsave(->pi_lock,...);
                                          ttwu_state_match(...)
                                            if (t0->__state & TASK_RTLOCK_WAIT) // false
                                            ..
                                            return false; // don't wake up
                                          raw_spin_unlock_irqrestore(->pi_lock,...); // A
    ...
    raw_spin_lock_irqsave(->pi_lock, ...); // B
    current->__state = TASK_RTLOCK_WAIT
    raw_spin_unlock_irqrestore(->pi_lock, ...);
    if (!cond)
      schedule();

Now the read of cond on TASK0 must observe the store of cond on TASK1,
because accesses to __state are serialized by ->pi_lock: if TASK1's
read of __state didn't observe the write of TASK0 to __state, then the
lock B must read from the unlock A (or another unlock co-after A), and
then we have a release-acquire pair guaranteeing that the read of cond
on TASK0 sees the write of cond on TASK1. This can be simplified into the
litmus test below:

C unlock-lock

{
}

P0(spinlock_t *s, int *cond, int *state)
{
        int r1;

        spin_lock(s);
        WRITE_ONCE(*state, 1);
        spin_unlock(s);
        r1 = READ_ONCE(*cond);
}

P1(spinlock_t *s, int *cond, int *state)
{
        int r1;

        WRITE_ONCE(*cond, 1);
        spin_lock(s);
        r1 = READ_ONCE(*state);
        spin_unlock(s);
}

exists (0:r1=0 /\ 1:r1=0)

and result is:

Test unlock-lock Allowed
States 3
0:r1=0; 1:r1=1;
0:r1=1; 1:r1=0;
0:r1=1; 1:r1=1;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (0:r1=0 /\ 1:r1=0)
Observation unlock-lock Never 0 3
Time unlock-lock 0.01
Hash=e1f914505f07e380405f65d3b0fb6940

In short, since we write to __state with ->pi_lock held, I don't
think we need the smp_store_mb() for __state. But maybe I'm missing
something subtle here ;-)

Regards,
Boqun

2021-09-14 01:17:44

by Thomas Gleixner

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
> it really ought to use smp_store_mb(), because something like:
>
>     current_save_and_set_rtlock_wait_state();
>     for (;;) {
>             if (try_lock())
>                     break;
>
>             raw_spin_unlock_irq(&lock->wait_lock);
>             schedule();
>             raw_spin_lock_irq(&lock->wait_lock);
>
>             set_current_state(TASK_RTLOCK_WAIT);
>     }
>     current_restore_rtlock_saved_state();
>
> which is the advertised usage in the comment, is actually broken,
> since trylock() will only need a load-acquire in general and that
> could be re-ordered against the state store, which could lead to a
> missed wakeup -> BAD (tm).

I don't think so because both the state store and the wakeup are
serialized via tsk->pi_lock.

> While there, make them consistent with the IRQ usage in
> set_special_state().
>
> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/sched.h | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -245,7 +245,8 @@ struct task_group;
> * if (try_lock())
> * break;
> * raw_spin_unlock_irq(&lock->wait_lock);
> - * schedule_rtlock();
> + * if (!cond)
> + * schedule_rtlock();

cond is not really relevant here.

> * raw_spin_lock_irq(&lock->wait_lock);
> * set_current_state(TASK_RTLOCK_WAIT);
> * }
> @@ -253,22 +254,24 @@ struct task_group;
> */
> #define current_save_and_set_rtlock_wait_state() \
> do { \
> - lockdep_assert_irqs_disabled(); \
> - raw_spin_lock(&current->pi_lock); \
> + unsigned long flags; /* may shadow */ \
> + \
> + raw_spin_lock_irqsave(&current->pi_lock, flags); \

Why? This is solely for the rtlock use case which invokes this with
interrupts disabled. So why do we need that irqsave() overhead here?

> current->saved_state = current->__state; \
> debug_rtlock_wait_set_state(); \
> - WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
> - raw_spin_unlock(&current->pi_lock); \
> + smp_store_mb(current->__state, TASK_RTLOCK_WAIT); \

The try_lock() does not matter at all here, really. All that matters is
that the unlocker cannot observe the wrong state and that's fully
serialized via tsk::pi_lock.

Thanks,

tglx
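
For reference, the wakeup side being referred to looks roughly like this
(paraphrased from try_to_wake_up(), not quoted verbatim): the task's state
is only inspected after taking ->pi_lock and issuing a full barrier, which
is the serialization Thomas means:

    raw_spin_lock_irqsave(&p->pi_lock, flags);
    smp_mb__after_spinlock();
    if (!ttwu_state_match(p, state, &success))
            goto unlock;            /* state not as expected: no wakeup */
    /* ... actual wakeup ... */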

2021-09-14 01:26:59

by Thomas Gleixner

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Tue, Sep 14 2021 at 00:08, Thomas Gleixner wrote:

> On Thu, Sep 09 2021 at 12:59, Peter Zijlstra wrote:
>> While looking at current_save_and_set_rtlock_wait_state() I'm thinking
>> it really ought to use smp_store_mb(), because something like:
>>
>>     current_save_and_set_rtlock_wait_state();
>>     for (;;) {
>>             if (try_lock())
>>                     break;
>>
>>             raw_spin_unlock_irq(&lock->wait_lock);
>>             schedule();
>>             raw_spin_lock_irq(&lock->wait_lock);
>>
>>             set_current_state(TASK_RTLOCK_WAIT);
>>     }
>>     current_restore_rtlock_saved_state();
>>
>> which is the advertised usage in the comment, is actually broken,
>> since trylock() will only need a load-acquire in general and that
>> could be re-ordered against the state store, which could lead to a
>> missed wakeup -> BAD (tm).
>
> I don't think so because both the state store and the wakeup are
> serialized via tsk->pi_lock.
>
>> While there, make them consistent with the IRQ usage in
>> set_special_state().
>>
>> Fixes: 5f220be21418 ("sched/wakeup: Prepare for RT sleeping spin/rwlocks")
>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> ---
>> include/linux/sched.h | 19 +++++++++++--------
>> 1 file changed, 11 insertions(+), 8 deletions(-)
>>
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -245,7 +245,8 @@ struct task_group;
>> * if (try_lock())
>> * break;
>> * raw_spin_unlock_irq(&lock->wait_lock);
>> - * schedule_rtlock();
>> + * if (!cond)
>> + * schedule_rtlock();
>
> cond is not really relevant here.
>
>> * raw_spin_lock_irq(&lock->wait_lock);
>> * set_current_state(TASK_RTLOCK_WAIT);
>> * }
>> @@ -253,22 +254,24 @@ struct task_group;
>> */
>> #define current_save_and_set_rtlock_wait_state() \
>> do { \
>> - lockdep_assert_irqs_disabled(); \
>> - raw_spin_lock(&current->pi_lock); \
>> + unsigned long flags; /* may shadow */ \
>> + \
>> + raw_spin_lock_irqsave(&current->pi_lock, flags); \
>
> Why? This is solely for the rtlock use case which invokes this with
> interrupts disabled. So why do we need that irqsave() overhead here?
>
>> current->saved_state = current->__state; \
>> debug_rtlock_wait_set_state(); \
>> - WRITE_ONCE(current->__state, TASK_RTLOCK_WAIT); \
>> - raw_spin_unlock(&current->pi_lock); \
>> + smp_store_mb(current->__state, TASK_RTLOCK_WAIT); \
>
> The try_lock() does not matter at all here, really. All what matters is
> that the unlocker cannot observe the wrong state and that's fully
> serialized via tsk::pi_lock.

If your reasoning were correct, then set_special_state() would be
broken as well.

Thanks,

tglx

2021-09-14 06:48:32

by Peter Zijlstra

Subject: Re: [PATCH 1/4] sched/wakeup: Strengthen current_save_and_set_rtlock_wait_state()

On Tue, Sep 14, 2021 at 12:52:29AM +0200, Thomas Gleixner wrote:
> On Tue, Sep 14 2021 at 00:08, Thomas Gleixner wrote:

> If your reasoning would be correct, then set_special_state() would be
> broken as well.

I've dropped the patch.