If a spinner is present, there is a chance that the load done by
rwsem_has_spinner() in rwsem_wake() can be reordered with respect
to the decrement of the rwsem count in __up_write(), leading to a
missed wakeup.
spinning writer                    up_write caller
---------------                    -----------------------
[S] osq_unlock()                   [L] osq
 spin_lock(wait_lock)
 sem->count=0xFFFFFFFF00000001
           +0xFFFFFFFF00000000
 count=sem->count
 MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
 rwsem_try_write_lock(count)
 spin_unlock(wait_lock)
 schedule()
Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can thus cause the wakeup
to be missed in the up_write() context. In the spinning writer,
sem->count and the local variable count are both 0xFFFFFFFE00000001,
so rwsem_try_write_lock() fails to acquire the rwsem and the
spinning writer goes to sleep in rwsem_down_write_failed().
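For reference, these are the bias constants behind the hex values above
(64-bit xadd variant; quoted from memory of asm-generic/rwsem.h, so verify
against the tree) and the resulting arithmetic:

#define RWSEM_ACTIVE_MASK		0xffffffffL
#define RWSEM_ACTIVE_BIAS		0x00000001L
#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)			 /* 0xFFFFFFFF00000000 */
#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS) /* 0xFFFFFFFF00000001 */

/*
 * write locked:                           sem->count = 0xFFFFFFFF00000001
 * spinner adds RWSEM_WAITING_BIAS:        sem->count = 0xFFFFFFFE00000001  (the stale "count")
 * up_write() subtracts ACTIVE_WRITE_BIAS: sem->count = 0xFFFFFFFF00000000  (= RWSEM_WAITING_BIAS,
 *                                          the value rwsem_try_write_lock() needs to see)
 */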
The smp_rmb() added in rwsem_wake() makes sure that the spinner state
is only consulted after sem->count has been updated in the up_write()
context.
Signed-off-by: Prateek Sood <[email protected]>
---
kernel/locking/rwsem-xadd.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 34e727f..21c111a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -585,6 +585,40 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
unsigned long flags;
DEFINE_WAKE_Q(wake_q);
+ /*
+ * If a spinner is present, there is a chance that the load done by
+ * rwsem_has_spinner() in rwsem_wake() can be reordered with respect
+ * to the decrement of the rwsem count in __up_write(), leading to a
+ * missed wakeup.
+ *
+ * spinning writer                    up_write caller
+ * ---------------                    -----------------------
+ * [S] osq_unlock()                   [L] osq
+ *  spin_lock(wait_lock)
+ *  sem->count=0xFFFFFFFF00000001
+ *            +0xFFFFFFFF00000000
+ *  count=sem->count
+ *  MB
+ *                                    sem->count=0xFFFFFFFE00000001
+ *                                              -0xFFFFFFFF00000001
+ *                                    spin_trylock(wait_lock)
+ *                                    return
+ *  rwsem_try_write_lock(count)
+ *  spin_unlock(wait_lock)
+ *  schedule()
+ *
+ * Reordering of atomic_long_sub_return_release() in __up_write()
+ * and rwsem_has_spinner() in rwsem_wake() can thus cause the wakeup
+ * to be missed in the up_write() context. In the spinning writer,
+ * sem->count and the local variable count are both 0xFFFFFFFE00000001,
+ * so rwsem_try_write_lock() fails to acquire the rwsem and the
+ * spinning writer goes to sleep in rwsem_down_write_failed().
+ *
+ * The smp_rmb() here makes sure that the spinner state is only
+ * consulted after sem->count has been updated in the up_write() context.
+ */
+ smp_rmb();
+
/*
* If a spinner is present, it is not necessary to do the wakeup.
* Try to do wakeup only if the trylock succeeds to minimize
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
On 07/26/2017 04:17 PM, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
>
> spinning writer                    up_write caller
> ---------------                    -----------------------
> [S] osq_unlock()                   [L] osq
>  spin_lock(wait_lock)
>  sem->count=0xFFFFFFFF00000001
>            +0xFFFFFFFF00000000
>  count=sem->count
>  MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>  rwsem_try_write_lock(count)
>  spin_unlock(wait_lock)
>  schedule()
>
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count is 0XFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
>
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
>
> Signed-off-by: Prateek Sood <[email protected]>
Did you actually observe that the reordering happens?
I am not sure if some architectures can actually speculatively execute
instructions ahead of a branch and then ahead into a function call. I
know it can happen if the function call is inlined, but rwsem_wake()
will not be inlined into __up_read() or __up_write().
Even if that is the case, I am not sure if smp_rmb() alone is enough to
guarantee the ordering as I think it will depend on how the
atomic_long_sub_return_release() is implemented.
Cheers,
Longman
On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> atomic_long_sub_return_release() is implemented.
I've not had time to really think about the problem at hand, but this I
can answer:
TSO (x86, s390, sparc): fully serialized
PPC: lwsync; ll/sc (RCpc)
ARM64: ll/sc-release (RCsc)
other: smp_mb(); atomic_sub_return_relaxed();
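For the "other" case, the release form is built by putting a full barrier in
front of the relaxed op. A simplified paraphrase, not the literal
include/linux/atomic.h macro (my_sub_return_release is a made-up name used
only for illustration):

#define my_sub_return_release(i, v)					\
({									\
	smp_mb();	/* the full barrier on archs with unordered RMWs */ \
	atomic_long_sub_return_relaxed(i, v);				\
})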
On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> On 07/26/2017 04:17 PM, Prateek Sood wrote:
> > If a spinner is present, there is a chance that the load of
> > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > respect to decrement of rwsem count in __up_write() leading
> > to wakeup being missed.
> >
> > spinning writer                    up_write caller
> > ---------------                    -----------------------
> > [S] osq_unlock()                   [L] osq
> >  spin_lock(wait_lock)
> >  sem->count=0xFFFFFFFF00000001
> >            +0xFFFFFFFF00000000
> >  count=sem->count
> >  MB
> >                                    sem->count=0xFFFFFFFE00000001
> >                                              -0xFFFFFFFF00000001
> >                                    spin_trylock(wait_lock)
> >                                    return
> >  rwsem_try_write_lock(count)
> >  spin_unlock(wait_lock)
> >  schedule()
> >
> > Reordering of atomic_long_sub_return_release() in __up_write()
> > and rwsem_has_spinner() in rwsem_wake() can cause missing of
> > wakeup in up_write() context. In spinning writer, sem->count
> > and local variable count is 0XFFFFFFFE00000001. It would result
> > in rwsem_try_write_lock() failing to acquire rwsem and spinning
> > writer going to sleep in rwsem_down_write_failed().
> >
> > The smp_rmb() will make sure that the spinner state is
> > consulted after sem->count is updated in up_write context.
> >
> > Signed-off-by: Prateek Sood <[email protected]>
>
> Did you actually observe that the reordering happens?
>
> I am not sure if some architectures can actually speculatively execute
> instruction ahead of a branch and then ahead into a function call. I
> know it can happen if the function call is inlined, but rwsem_wake()
> will not be inlined into __up_read() or __up_write().
Branches/control dependencies targeting a read do not necessarily preserve
program order; this is for example the case for PowerPC and ARM.
I'd not expect more than a compiler barrier from a function call (in fact,
not even that if the function happens to be inlined).
>
> Even if that is the case, I am not sure if smp_rmb() alone is enough to
> guarantee the ordering as I think it will depend on how the
> atomic_long_sub_return_release() is implemented.
AFAICT, the pattern under discussion is MP with:

 - a store-release to osq->tail (osq_unlock()) followed by a store to
   sem->count, separated by an MB (from atomic_long_add_return()), on CPU0;

 - a load of sem->count (for atomic_long_sub_return_release()) followed by
   a load of osq->tail (rwsem_has_spinner()) on CPU1.

Thus an RMB between the two loads suffices to forbid the weak behaviour.
Andrea
>
> Cheers,
> Longman
>
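For anyone who wants to poke at the pattern outside the kernel, here is a
user-space C11 analogue of the MP shape Andrea describes. It is only a
sketch with made-up names (thread 0 plays the spinning writer, thread 1 the
up_write() caller) and makes no attempt to reproduce the race reliably:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define WAITING_BIAS		((long)0xFFFFFFFF00000000UL)
#define ACTIVE_WRITE_BIAS	((long)0xFFFFFFFF00000001UL)

static atomic_long osq_tail = 1;			/* non-zero: spinner queued */
static atomic_long count    = ACTIVE_WRITE_BIAS;	/* write locked */

static void *spinning_writer(void *unused)
{
	/* osq_unlock(): store-release of the OSQ tail */
	atomic_store_explicit(&osq_tail, 0, memory_order_release);
	/* atomic_long_add_return(): fully ordered RMW adding the waiting bias */
	atomic_fetch_add_explicit(&count, WAITING_BIAS, memory_order_seq_cst);
	return NULL;
}

static void *up_write_caller(void *unused)
{
	/* atomic_long_sub_return_release(): the load half is not ordered
	 * against the later load of the OSQ tail */
	long c = (long)atomic_fetch_sub_explicit(&count, ACTIVE_WRITE_BIAS,
						 memory_order_release)
		 - ACTIVE_WRITE_BIAS;
	if (c < 0) {		/* waiters present -> rwsem_wake() */
		/*
		 * Without a read barrier here the load below can still return
		 * the stale non-zero osq_tail even though the RMW above saw
		 * the waiting bias; the patch's smp_rmb() corresponds roughly
		 * to atomic_thread_fence(memory_order_acquire) at this point.
		 */
		long spinner = atomic_load_explicit(&osq_tail,
						    memory_order_relaxed);
		printf("count=%#lx spinner=%ld\n", (unsigned long)c, spinner);
	}
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, spinning_writer, NULL);
	pthread_create(&t1, NULL, up_write_caller, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}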
On Thu, Aug 10, 2017 at 10:32:56AM +0200, Andrea Parri wrote:
> On Thu, Jul 27, 2017 at 11:48:53AM -0400, Waiman Long wrote:
> > On 07/26/2017 04:17 PM, Prateek Sood wrote:
> > > If a spinner is present, there is a chance that the load of
> > > rwsem_has_spinner() in rwsem_wake() can be reordered with
> > > respect to decrement of rwsem count in __up_write() leading
> > > to wakeup being missed.
> > >
> > > spinning writer                    up_write caller
> > > ---------------                    -----------------------
> > > [S] osq_unlock()                   [L] osq
> > >  spin_lock(wait_lock)
> > >  sem->count=0xFFFFFFFF00000001
> > >            +0xFFFFFFFF00000000
> > >  count=sem->count
> > >  MB
> > >                                    sem->count=0xFFFFFFFE00000001
> > >                                              -0xFFFFFFFF00000001
> > >                                    spin_trylock(wait_lock)
> > >                                    return
> > >  rwsem_try_write_lock(count)
> > >  spin_unlock(wait_lock)
> > >  schedule()
> > >
> > > Reordering of atomic_long_sub_return_release() in __up_write()
> > > and rwsem_has_spinner() in rwsem_wake() can cause missing of
> > > wakeup in up_write() context. In spinning writer, sem->count
> > > and local variable count is 0XFFFFFFFE00000001. It would result
> > > in rwsem_try_write_lock() failing to acquire rwsem and spinning
> > > writer going to sleep in rwsem_down_write_failed().
> > >
> > > The smp_rmb() will make sure that the spinner state is
> > > consulted after sem->count is updated in up_write context.
> > >
> > > Signed-off-by: Prateek Sood <[email protected]>
> >
> > Did you actually observe that the reordering happens?
> >
> > I am not sure if some architectures can actually speculatively execute
> > instruction ahead of a branch and then ahead into a function call. I
> > know it can happen if the function call is inlined, but rwsem_wake()
> > will not be inlined into __up_read() or __up_write().
>
> Branches/control dependencies targeting a read do not necessarily preserve
> program order; this is for example the case for PowerPC and ARM.
>
> I'd not expect more than a compiler barrier from a function call (in fact,
> not even that if the function happens to be inlined).
Indeed. Reads can be speculated by deep out-of-order CPUs, no problem.
That's what branch predictors are for.
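To make the control-dependency point concrete, the caller in question looks
roughly like this (abridged from memory of include/asm-generic/rwsem.h;
treat it as a sketch, not the exact tree contents):

static inline void __up_write(struct rw_semaphore *sem)
{
	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
						    &sem->count) < 0))
		/*
		 * Only a control dependency separates the count load above
		 * from the loads inside rwsem_wake(); PPC/ARM may satisfy the
		 * osq load in rwsem_has_spinner() speculatively, before the
		 * RMW above is ordered.
		 */
		rwsem_wake(sem);
}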
> > Even if that is the case, I am not sure if smp_rmb() alone is enough to
> > guarantee the ordering as I think it will depend on how the
> > atomic_long_sub_return_release() is implemented.
>
> AFAICT, the pattern under discussion is MP with:
>
> - a store-release to osq->tail (osq_unlock()) followed by a store to
>   sem->count, separated by an MB (from atomic_long_add_return()), on CPU0;
>
> - a load of sem->count (for atomic_long_sub_return_release()) followed by
Which is a regular load, as 'release' need only apply to the store.
> a load of osq->tail (rwsem_has_spinner()) on CPU1.
>
> Thus an RMB between the two loads suffices to forbid the weak behaviour.
Agreed.
On Thu, Jul 27, 2017 at 01:47:52AM +0530, Prateek Sood wrote:
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 34e727f..21c111a 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -585,6 +585,40 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
> unsigned long flags;
> DEFINE_WAKE_Q(wake_q);
>
> + /*
> + * If a spinner is present, there is a chance that the load of
> + * rwsem_has_spinner() in rwsem_wake() can be reordered with
> + * respect to decrement of rwsem count in __up_write() leading
> + * to wakeup being missed.
> + *
> + * spinning writer                    up_write caller
> + * ---------------                    -----------------------
> + * [S] osq_unlock()                   [L] osq
> + *  spin_lock(wait_lock)
> + *  sem->count=0xFFFFFFFF00000001
> + *            +0xFFFFFFFF00000000
> + *  count=sem->count
> + *  MB
> + *                                    sem->count=0xFFFFFFFE00000001
> + *                                              -0xFFFFFFFF00000001
> + *                                    spin_trylock(wait_lock)
> + *                                    return
> + *  rwsem_try_write_lock(count)
> + *  spin_unlock(wait_lock)
> + *  schedule()
> + *
> + * Reordering of atomic_long_sub_return_release() in __up_write()
> + * and rwsem_has_spinner() in rwsem_wake() can cause missing of
> + * wakeup in up_write() context. In spinning writer, sem->count
> + * and local variable count is 0XFFFFFFFE00000001. It would result
> + * in rwsem_try_write_lock() failing to acquire rwsem and spinning
> + * writer going to sleep in rwsem_down_write_failed().
> + *
> + * The smp_rmb() here is to make sure that the spinner state is
> + * consulted after sem->count is updated in up_write context.
I feel that comment could use some help... for example, the RMB you add
below is not present in it at all.
> + */
> + smp_rmb();
> +
> /*
> * If a spinner is present, it is not necessary to do the wakeup.
> * Try to do wakeup only if the trylock succeeds to minimize
Your patch is whitespace damaged; all the indentation on the + lines is
with spaces. Please resend with \t.