If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed.
  spinning writer                  up_write caller
  ---------------                  -----------------------
  [S] osq_unlock()                 [L] osq
   spin_lock(wait_lock)
   sem->count=0xFFFFFFFF00000001
             +0xFFFFFFFF00000000
   count=sem->count
   MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
   rwsem_try_write_lock(count)
   spin_unlock(wait_lock)
   schedule()
Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause a missed
wakeup in up_write() context. In the spinning writer, both sem->count
and the local variable count are 0xFFFFFFFE00000001. As a result,
rwsem_try_write_lock() fails to acquire the rwsem and the spinning
writer goes to sleep in rwsem_down_write_failed().
The smp_rmb() makes sure that the spinner state is
consulted after sem->count is updated in up_write context.
Signed-off-by: Prateek Sood <[email protected]>
---
kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 02f6606..1fefe6d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/*
+	 * __rwsem_down_write_failed_common(sem)
+	 *   rwsem_optimistic_spin(sem)
+	 *     osq_unlock(sem->osq)
+	 *   ...
+	 *   atomic_long_add_return(&sem->count)
+	 *
+	 *      - VS -
+	 *
+	 * __up_write()
+	 *   if (atomic_long_sub_return_release(&sem->count) < 0)
+	 *     rwsem_wake(sem)
+	 *       osq_is_locked(&sem->osq)
+	 *
+	 * And __up_write() must observe !osq_is_locked() when it observes the
+	 * atomic_long_add_return() in order to not miss a wakeup.
+	 *
+	 * This boils down to:
+	 *
+	 * [S.rel] X = 1                [RmW] r0 = (Y += 0)
+	 *         MB                         RMB
+	 * [RmW]   Y += 1               [L]   r1 = X
+	 *
+	 * exists (r0=1 /\ r1=0)
+	 */
+	smp_rmb();
+
+	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
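The count arithmetic in the diagram above can be checked in isolation. What follows is a userspace model, not kernel code: the bias constants are the 64-bit values read off the hex figures in the changelog, and model_try_write_lock() is a hypothetical stand-in that keeps only one check, on the assumption that rwsem_try_write_lock() rejects any count other than the waiting bias (which is what the changelog's conclusion implies).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 64-bit bias values, taken from the hex figures in the changelog. */
#define MODEL_ACTIVE_BIAS        INT64_C(1)
#define MODEL_WAITING_BIAS       (-INT64_C(0x100000000))	/* 0xFFFFFFFF00000000 */
#define MODEL_ACTIVE_WRITE_BIAS  (MODEL_WAITING_BIAS + MODEL_ACTIVE_BIAS)
							/* 0xFFFFFFFF00000001 */

/* Hypothetical stand-in: only the "count must equal the waiting bias" check. */
static bool model_try_write_lock(int64_t count)
{
	return count == MODEL_WAITING_BIAS;
}

/* Spinner side: add the waiting bias and keep the result in a local. */
static int64_t model_spinner_add(int64_t *sem_count)
{
	*sem_count += MODEL_WAITING_BIAS;
	return *sem_count;
}

/* up_write side: drop the write bias; a negative result means waiters exist. */
static int64_t model_up_write_sub(int64_t *sem_count)
{
	*sem_count -= MODEL_ACTIVE_WRITE_BIAS;
	return *sem_count;
}
```

Starting from a write-locked count, the spinner's local copy ends up as 0xFFFFFFFE00000001 and fails the check, while the value left behind by the up_write side would pass it. That gap is why a skipped wakeup leaves the writer asleep.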
On Thu, Sep 07, 2017 at 08:00:58PM +0530, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
>
>   spinning writer                  up_write caller
>   ---------------                  -----------------------
>   [S] osq_unlock()                 [L] osq
>    spin_lock(wait_lock)
>    sem->count=0xFFFFFFFF00000001
>              +0xFFFFFFFF00000000
>    count=sem->count
>    MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>    rwsem_try_write_lock(count)
>    spin_unlock(wait_lock)
>    schedule()
>
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count is 0XFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
>
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
>
> Signed-off-by: Prateek Sood <[email protected]>
Reviewed-by: Andrea Parri <[email protected]>
I understand that the merge window and LPC made this stall for
a while... What am I missing? Are there other changes that need
to be considered for this patch?
Andrea
> ---
> kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 02f6606..1fefe6d 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
> DEFINE_WAKE_Q(wake_q);
>
> /*
> + * __rwsem_down_write_failed_common(sem)
> + * rwsem_optimistic_spin(sem)
> + * osq_unlock(sem->osq)
> + * ...
> + * atomic_long_add_return(&sem->count)
> + *
> + * - VS -
> + *
> + * __up_write()
> + * if (atomic_long_sub_return_release(&sem->count) < 0)
> + * rwsem_wake(sem)
> + * osq_is_locked(&sem->osq)
> + *
> + * And __up_write() must observe !osq_is_locked() when it observes the
> + * atomic_long_add_return() in order to not miss a wakeup.
> + *
> + * This boils down to:
> + *
> + * [S.rel] X = 1 [RmW] r0 = (Y += 0)
> + * MB RMB
> + * [RmW] Y += 1 [L] r1 = X
> + *
> + * exists (r0=1 /\ r1=0)
> + */
> + smp_rmb();
> +
> + /*
> * If a spinner is present, it is not necessary to do the wakeup.
> * Try to do wakeup only if the trylock succeeds to minimize
> * spinlock contention which may introduce too much delay in the
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.,
> is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.
>
On Thu, 07 Sep 2017, Prateek Sood wrote:
> /*
>+ * __rwsem_down_write_failed_common(sem)
>+ * rwsem_optimistic_spin(sem)
>+ * osq_unlock(sem->osq)
>+ * ...
>+ * atomic_long_add_return(&sem->count)
>+ *
>+ * - VS -
>+ *
>+ * __up_write()
>+ * if (atomic_long_sub_return_release(&sem->count) < 0)
>+ * rwsem_wake(sem)
>+ * osq_is_locked(&sem->osq)
>+ *
>+ * And __up_write() must observe !osq_is_locked() when it observes the
>+ * atomic_long_add_return() in order to not miss a wakeup.
>+ *
>+ * This boils down to:
>+ *
>+ * [S.rel] X = 1 [RmW] r0 = (Y += 0)
>+ * MB RMB
>+ * [RmW] Y += 1 [L] r1 = X
>+ *
>+ * exists (r0=1 /\ r1=0)
>+ */
>+ smp_rmb();
Instead, how about just removing the release from atomic_long_sub_return_release()
such that the osq load is not hoisted over the atomic compound (along with Peter's
comment):
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 6c6a2141f271..487ce31078ff 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -101,7 +101,7 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
+	if (unlikely(atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
 						    &sem->count) < 0))
 		rwsem_wake(sem);
 }
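In C11 terms, the proposal above roughly corresponds to strengthening the read-modify-write itself so that the later osq load cannot be hoisted over it. The userspace sketch below is an analogy only: memory_order_seq_cst is used as the nearest C11 stand-in for a fully ordered kernel atomic, which is an assumption, and struct model_sem, model_up_write_full() and the osq_locked flag are hypothetical names that do not exist in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WAITING_BIAS       (-INT64_C(0x100000000))
#define MODEL_ACTIVE_WRITE_BIAS  (MODEL_WAITING_BIAS + INT64_C(1))

/* Hypothetical, heavily reduced model of the semaphore state. */
struct model_sem {
	_Atomic(int64_t) count;
	atomic_bool osq_locked;	/* true while a spinner holds the osq */
};

/*
 * Returns true when the wakeup path would go ahead, that is, when
 * waiters exist and no spinner is observed.
 */
static bool model_up_write_full(struct model_sem *sem)
{
	/* Strongly ordered RMW: the load below cannot be satisfied before it. */
	int64_t old = atomic_fetch_sub_explicit(&sem->count,
						MODEL_ACTIVE_WRITE_BIAS,
						memory_order_seq_cst);
	int64_t new_count = old - MODEL_ACTIVE_WRITE_BIAS;

	if (new_count >= 0)
		return false;	/* no waiters, nothing to wake */

	return !atomic_load_explicit(&sem->osq_locked, memory_order_relaxed);
}
```

The ordering cost is paid on every call in this variant, including the uncontended case where no waiter exists.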
On Wed, Sep 20, 2017 at 07:52:54AM -0700, Davidlohr Bueso wrote:
> On Thu, 07 Sep 2017, Prateek Sood wrote:
> > /*
> >+ * __rwsem_down_write_failed_common(sem)
> >+ * rwsem_optimistic_spin(sem)
> >+ * osq_unlock(sem->osq)
> >+ * ...
> >+ * atomic_long_add_return(&sem->count)
> >+ *
> >+ * - VS -
> >+ *
> >+ * __up_write()
> >+ * if (atomic_long_sub_return_release(&sem->count) < 0)
> >+ * rwsem_wake(sem)
> >+ * osq_is_locked(&sem->osq)
> >+ *
> >+ * And __up_write() must observe !osq_is_locked() when it observes the
> >+ * atomic_long_add_return() in order to not miss a wakeup.
> >+ *
> >+ * This boils down to:
> >+ *
> >+ * [S.rel] X = 1 [RmW] r0 = (Y += 0)
> >+ * MB RMB
> >+ * [RmW] Y += 1 [L] r1 = X
> >+ *
> >+ * exists (r0=1 /\ r1=0)
> >+ */
> >+ smp_rmb();
>
> Instead, how about just removing the release from atomic_long_sub_return_release()
> such that the osq load is not hoisted over the atomic compound (along with Peter's
> comment):
This solution will actually enforce a stronger (full) ordering w.r.t. the
solution described by Prateek and Peter. Also, it will "trade" two lwsync
for two sync (powerpc), one dmb.ld for one dmb (arm64).
What are the reasons you would prefer this?
Andrea
>
> diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
> index 6c6a2141f271..487ce31078ff 100644
> --- a/include/asm-generic/rwsem.h
> +++ b/include/asm-generic/rwsem.h
> @@ -101,7 +101,7 @@ static inline void __up_read(struct rw_semaphore *sem)
> */
> static inline void __up_write(struct rw_semaphore *sem)
> {
> - if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
> + if (unlikely(atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS,
> &sem->count) < 0))
> rwsem_wake(sem);
> }
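For comparison, the litmus test from the patch comment can be written down with C11 atomics. This is a sketch under stated assumptions rather than a statement about the kernel memory model: an acquire fence stands in for smp_rmb(), a seq_cst fence stands in for the MB, and model_spinner(), model_waker() and struct model_result are hypothetical names. X and Y are the variables from the comment.

```c
#include <stdatomic.h>

static atomic_int X;	/* becomes 1 once the spinner has released the osq */
static atomic_int Y;	/* stands in for sem->count */

struct model_result {
	int r0;
	int r1;
};

/* Left column: [S.rel] X = 1; MB; [RmW] Y += 1 */
static void model_spinner(void)
{
	atomic_store_explicit(&X, 1, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* MB */
	atomic_fetch_add_explicit(&Y, 1, memory_order_relaxed);
}

/* Right column: [RmW] r0 = (Y += 0); RMB; [L] r1 = X */
static struct model_result model_waker(void)
{
	struct model_result res;

	/* Adding 0 returns the current value, so old and new are the same. */
	res.r0 = atomic_fetch_add_explicit(&Y, 0, memory_order_release);
	atomic_thread_fence(memory_order_acquire);	/* assumed smp_rmb() analogue */
	res.r1 = atomic_load_explicit(&X, memory_order_relaxed);
	return res;
}
```

Calling the two functions one after the other on a single thread only exercises the trivial interleavings. Ruling out r0=1 with r1=0 in general takes two threads or a memory-model checker. In this model the fence sits only on the waker side and the read-modify-write keeps its release ordering, which is the weaker of the two options being compared.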
On 09/07/2017 08:00 PM, Prateek Sood wrote:
> If a spinner is present, there is a chance that the load of
> rwsem_has_spinner() in rwsem_wake() can be reordered with
> respect to decrement of rwsem count in __up_write() leading
> to wakeup being missed.
>
>   spinning writer                  up_write caller
>   ---------------                  -----------------------
>   [S] osq_unlock()                 [L] osq
>    spin_lock(wait_lock)
>    sem->count=0xFFFFFFFF00000001
>              +0xFFFFFFFF00000000
>    count=sem->count
>    MB
>                                    sem->count=0xFFFFFFFE00000001
>                                              -0xFFFFFFFF00000001
>                                    spin_trylock(wait_lock)
>                                    return
>    rwsem_try_write_lock(count)
>    spin_unlock(wait_lock)
>    schedule()
>
> Reordering of atomic_long_sub_return_release() in __up_write()
> and rwsem_has_spinner() in rwsem_wake() can cause missing of
> wakeup in up_write() context. In spinning writer, sem->count
> and local variable count is 0XFFFFFFFE00000001. It would result
> in rwsem_try_write_lock() failing to acquire rwsem and spinning
> writer going to sleep in rwsem_down_write_failed().
>
> The smp_rmb() will make sure that the spinner state is
> consulted after sem->count is updated in up_write context.
>
> Signed-off-by: Prateek Sood <[email protected]>
> ---
> kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 02f6606..1fefe6d 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
> DEFINE_WAKE_Q(wake_q);
>
> /*
> + * __rwsem_down_write_failed_common(sem)
> + * rwsem_optimistic_spin(sem)
> + * osq_unlock(sem->osq)
> + * ...
> + * atomic_long_add_return(&sem->count)
> + *
> + * - VS -
> + *
> + * __up_write()
> + * if (atomic_long_sub_return_release(&sem->count) < 0)
> + * rwsem_wake(sem)
> + * osq_is_locked(&sem->osq)
> + *
> + * And __up_write() must observe !osq_is_locked() when it observes the
> + * atomic_long_add_return() in order to not miss a wakeup.
> + *
> + * This boils down to:
> + *
> + * [S.rel] X = 1 [RmW] r0 = (Y += 0)
> + * MB RMB
> + * [RmW] Y += 1 [L] r1 = X
> + *
> + * exists (r0=1 /\ r1=0)
> + */
> + smp_rmb();
> +
> + /*
> * If a spinner is present, it is not necessary to do the wakeup.
> * Try to do wakeup only if the trylock succeeds to minimize
> * spinlock contention which may introduce too much delay in the
>
Hi Folks,
Do you have any more suggestions or feedback on this patch?
Regards
Prateek
On Wed, 20 Sep 2017, Andrea Parri wrote:
>> Instead, how about just removing the release from atomic_long_sub_return_release()
>> such that the osq load is not hoisted over the atomic compound (along with Peter's
>> comment):
>
>This solution will actually enforce a stronger (full) ordering w.r.t. the
>solution described by Prateek and Peter. Also, it will "trade" two lwsync
>for two sync (powerpc), one dmb.ld for one dmb (arm64).
>
>What are the reasons you would prefer this?
It was mainly to maintain consistency about dealing with sem->count, but sure
I won't argue with the above.
Commit-ID: 9c29c31830a4eca724e137a9339137204bbb31be
Gitweb: https://git.kernel.org/tip/9c29c31830a4eca724e137a9339137204bbb31be
Author: Prateek Sood <[email protected]>
AuthorDate: Thu, 7 Sep 2017 20:00:58 +0530
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 29 Sep 2017 10:10:20 +0200
locking/rwsem-xadd: Fix missed wakeup due to reordering of load
If a spinner is present, there is a chance that the load of
rwsem_has_spinner() in rwsem_wake() can be reordered with
respect to decrement of rwsem count in __up_write() leading
to wakeup being missed:
  spinning writer                  up_write caller
  ---------------                  -----------------------
  [S] osq_unlock()                 [L] osq
   spin_lock(wait_lock)
   sem->count=0xFFFFFFFF00000001
             +0xFFFFFFFF00000000
   count=sem->count
   MB
                                   sem->count=0xFFFFFFFE00000001
                                             -0xFFFFFFFF00000001
                                   spin_trylock(wait_lock)
                                   return
   rwsem_try_write_lock(count)
   spin_unlock(wait_lock)
   schedule()
Reordering of atomic_long_sub_return_release() in __up_write()
and rwsem_has_spinner() in rwsem_wake() can cause a missed
wakeup in up_write() context. In the spinning writer, both sem->count
and the local variable count are 0xFFFFFFFE00000001. As a result,
rwsem_try_write_lock() fails to acquire the rwsem and the spinning
writer goes to sleep in rwsem_down_write_failed().
The smp_rmb() makes sure that the spinner state is
consulted after sem->count is updated in up_write context.
Signed-off-by: Prateek Sood <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 02f6606..1fefe6d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -613,6 +613,33 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/*
+	 * __rwsem_down_write_failed_common(sem)
+	 *   rwsem_optimistic_spin(sem)
+	 *     osq_unlock(sem->osq)
+	 *   ...
+	 *   atomic_long_add_return(&sem->count)
+	 *
+	 *      - VS -
+	 *
+	 * __up_write()
+	 *   if (atomic_long_sub_return_release(&sem->count) < 0)
+	 *     rwsem_wake(sem)
+	 *       osq_is_locked(&sem->osq)
+	 *
+	 * And __up_write() must observe !osq_is_locked() when it observes the
+	 * atomic_long_add_return() in order to not miss a wakeup.
+	 *
+	 * This boils down to:
+	 *
+	 * [S.rel] X = 1                [RmW] r0 = (Y += 0)
+	 *         MB                         RMB
+	 * [RmW]   Y += 1               [L]   r1 = X
+	 *
+	 * exists (r0=1 /\ r1=0)
+	 */
+	smp_rmb();
+
+	/*
 	 * If a spinner is present, it is not necessary to do the wakeup.
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the