2022-09-29 18:08:55

by Waiman Long

Subject: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

A non-first waiter can potentially spin in the for loop of
rwsem_down_write_slowpath() without sleeping but fail to acquire the
lock even if the rwsem is free if the following sequence happens:

   Non-first waiter       First waiter      Lock holder
   ----------------       ------------      -----------
   Acquire wait_lock
   rwsem_try_write_lock():
     Set handoff bit if RT or
       wait too long
     Set waiter->handoff_set
   Release wait_lock
                          Acquire wait_lock
                          Inherit waiter->handoff_set
                          Release wait_lock
                                            Clear owner
                                            Release lock
   if (waiter.handoff_set) {
     rwsem_spin_on_owner();
     if (OWNER_NULL)
       goto trylock_again;
   }
   trylock_again:
   Acquire wait_lock
   rwsem_try_write_lock():
      if (first->handoff_set && (waiter != first))
          return false;
   Release wait_lock

It is especially problematic if the non-first waiter is an RT task and
it is running on the same CPU as the first waiter as this can lead to
live lock.

Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/locking/rwsem.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262f635e..ad676e99e0b3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
new = count;

if (count & RWSEM_LOCK_MASK) {
+ /*
+ * A waiter (first or not) can set the handoff bit
+ * if it is an RT task or wait in the wait queue
+ * for too long.
+ */
if (has_handoff || (!rt_task(waiter->task) &&
!time_after(jiffies, waiter->timeout)))
return false;
@@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
} while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));

/*
- * We have either acquired the lock with handoff bit cleared or
- * set the handoff bit.
+ * We have either acquired the lock with handoff bit cleared or set
+ * the handoff bit. Only the first waiter can have its handoff_set
+ * set here to enable optimistic spinning in slowpath loop.
*/
if (new & RWSEM_FLAG_HANDOFF) {
- waiter->handoff_set = true;
+ if (waiter == first)
+ waiter->handoff_set = true;
lockevent_inc(rwsem_wlock_handoff);
return false;
}
--
2.31.1
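
For reference, the spinning described in the changelog happens in the writer
slowpath loop. The following is a condensed paraphrase of that loop in
rwsem_down_write_slowpath() (not the verbatim kernel code; details such as
preemption control are elided), showing the path a waiter with handoff_set
takes:

	for (;;) {
		if (rwsem_try_write_lock(sem, &waiter))
			break;			/* lock acquired */

		raw_spin_unlock_irq(&sem->wait_lock);

		if (signal_pending_state(state, current))
			goto out_nolock;

		/*
		 * A waiter with handoff_set spins on the lock owner instead
		 * of sleeping. If the owner is gone (OWNER_NULL), it jumps
		 * straight back to the trylock without calling schedule().
		 * This is the loop a non-first waiter keeps cycling through
		 * once it wrongly inherits handoff_set.
		 */
		if (waiter.handoff_set) {
			enum owner_state owner_state;

			owner_state = rwsem_spin_on_owner(sem);
			if (owner_state == OWNER_NULL)
				goto trylock_again;
		}

		schedule();
		lockevent_inc(rwsem_sleep_writer);
		set_current_state(state);
trylock_again:
		raw_spin_lock_irq(&sem->wait_lock);
	}

With the patch above, only the first waiter can have handoff_set set, so a
non-first waiter falls through to schedule() instead of spinning.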


2022-09-29 18:32:28

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 9/29/22 14:04, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> Non-first waiter First waiter Lock holder
> ---------------- ------------ -----------
> Acquire wait_lock
> rwsem_try_write_lock():
> Set handoff bit if RT or
> wait too long
> Set waiter->handoff_set
> Release wait_lock
> Acquire wait_lock
> Inherit waiter->handoff_set
> Release wait_lock
> Clear owner
> Release lock
> if (waiter.handoff_set) {
> rwsem_spin_on_owner(();
> if (OWNER_NULL)
> goto trylock_again;
> }
> trylock_again:
> Acquire wait_lock
> rwsem_try_write_lock():
> if (first->handoff_set && (waiter != first))
> return false;
> Release wait_lock
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/rwsem.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)

Mukesh, can you test if this patch can fix the RT task lockup problem?

Thanks,
Longman

2022-09-29 22:30:10

by John Donnelly

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 9/29/22 13:04, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> Non-first waiter First waiter Lock holder
> ---------------- ------------ -----------
> Acquire wait_lock
> rwsem_try_write_lock():
> Set handoff bit if RT or
> wait too long
> Set waiter->handoff_set
> Release wait_lock
> Acquire wait_lock
> Inherit waiter->handoff_set
> Release wait_lock
> Clear owner
> Release lock
> if (waiter.handoff_set) {
> rwsem_spin_on_owner(();
> if (OWNER_NULL)
> goto trylock_again;
> }
> trylock_again:
> Acquire wait_lock
> rwsem_try_write_lock():
> if (first->handoff_set && (waiter != first))
> return false;
> Release wait_lock
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/rwsem.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> new = count;
>
> if (count & RWSEM_LOCK_MASK) {
> + /*
> + * A waiter (first or not) can set the handoff bit
> + * if it is an RT task or wait in the wait queue
> + * for too long.
> + */
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
> return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>
> /*
> - * We have either acquired the lock with handoff bit cleared or
> - * set the handoff bit.
> + * We have either acquired the lock with handoff bit cleared or set
> + * the handoff bit. Only the first waiter can have its handoff_set
> + * set here to enable optimistic spinning in slowpath loop.
> */
> if (new & RWSEM_FLAG_HANDOFF) {
> - waiter->handoff_set = true;
> + if (waiter == first)
> + waiter->handoff_set = true;
> lockevent_inc(rwsem_wlock_handoff);
> return false;
> }
Hi,

Are you missing


[PATCH 5.18 87/88] locking/rwsem: Allow slowpath writer to ignore
handoff bit if not set by first waiter



Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent")

Or is this another regression ?

2022-09-30 01:52:39

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 9/29/22 17:15, John Donnelly wrote:
> On 9/29/22 13:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>> consistent")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>>                   return false;
>> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count,
>> new));
>>         /*
>> -     * We have either acquired the lock with handoff bit cleared or
>> -     * set the handoff bit.
>> +     * We have either acquired the lock with handoff bit cleared or set
>> +     * the handoff bit. Only the first waiter can have its handoff_set
>> +     * set here to enable optimistic spinning in slowpath loop.
>>        */
>>       if (new & RWSEM_FLAG_HANDOFF) {
>> -        waiter->handoff_set = true;
>> +        if (waiter == first)
>> +            waiter->handoff_set = true;
>>           lockevent_inc(rwsem_wlock_handoff);
>>           return false;
>>       }
> Hi,.
>
> Are you missing
>
>
> [PATCH 5.18 87/88] locking/rwsem: Allow slowpath writer to ignore
> handoff bit if not set by first waiter
>
>
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
> consistent")
>
> Or is this another regression ?
>
It is another regression.

Cheers,
Longman

2022-09-30 05:14:18

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi,

Thanks for looking into this issue.

On 9/29/2022 11:34 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> Non-first waiter First waiter Lock holder
> ---------------- ------------ -----------
> Acquire wait_lock
> rwsem_try_write_lock():
> Set handoff bit if RT or
> wait too long
> Set waiter->handoff_set
> Release wait_lock
> Acquire wait_lock
> Inherit waiter->handoff_set
> Release wait_lock
> Clear owner
> Release lock
> if (waiter.handoff_set) {
> rwsem_spin_on_owner(();
> if (OWNER_NULL)
> goto trylock_again;
> }
> trylock_again:
> Acquire wait_lock
> rwsem_try_write_lock():
> if (first->handoff_set && (waiter != first))
> return false;
> Release wait_lock
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/rwsem.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> new = count;
>
> if (count & RWSEM_LOCK_MASK) {
> + /*
> + * A waiter (first or not) can set the handoff bit
> + * if it is an RT task or wait in the wait queue
> + * for too long.
> + */
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
> return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>
> /*
> - * We have either acquired the lock with handoff bit cleared or
> - * set the handoff bit.
> + * We have either acquired the lock with handoff bit cleared or set
> + * the handoff bit. Only the first waiter can have its handoff_set
> + * set here to enable optimistic spinning in slowpath loop.
> */
> if (new & RWSEM_FLAG_HANDOFF) {
> - waiter->handoff_set = true;
> + if (waiter == first)
> + waiter->handoff_set = true;
> lockevent_inc(rwsem_wlock_handoff);

nit: Should this not get incremented on waiter->handoff_set=true only ?


> return false;
> }

-Mukesh

2022-09-30 05:28:50

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi,

On 9/29/2022 11:34 PM, Waiman Long wrote:
> A non-first waiter can potentially spin in the for loop of
> rwsem_down_write_slowpath() without sleeping but fail to acquire the
> lock even if the rwsem is free if the following sequence happens:
>
> Non-first waiter First waiter Lock holder
> ---------------- ------------ -----------
> Acquire wait_lock
> rwsem_try_write_lock():
> Set handoff bit if RT or
> wait too long
> Set waiter->handoff_set
> Release wait_lock
> Acquire wait_lock
> Inherit waiter->handoff_set
> Release wait_lock
> Clear owner
> Release lock
> if (waiter.handoff_set) {
> rwsem_spin_on_owner(();
> if (OWNER_NULL)
> goto trylock_again;
> }
> trylock_again:
> Acquire wait_lock
> rwsem_try_write_lock():
> if (first->handoff_set && (waiter != first))
> return false;
> Release wait_lock
>
> It is especially problematic if the non-first waiter is an RT task and
> it is running on the same CPU as the first waiter as this can lead to
> live lock.
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/rwsem.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262f635e..ad676e99e0b3 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> new = count;
>
> if (count & RWSEM_LOCK_MASK) {
> + /*
> + * A waiter (first or not) can set the handoff bit
> + * if it is an RT task or wait in the wait queue
> + * for too long.
> + */
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))

Not related to this issue, but I wanted to understand the idea behind this.

If an RT task comes in any order, either first or later, it sets the
RWSEM_FLAG_HANDOFF bit.
So here we are giving some priority to the RT task right away, yet it
cannot get waiter->handoff_set=true since it is not the first waiter
(after this patch). Is that not conflicting?


Why can't we just do it as below, and not set
new |= RWSEM_FLAG_HANDOFF; but return false from here?

--------------0<------------------------------------
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 65f0262..dbe3e16 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
                 new = count;

                 if (count & RWSEM_LOCK_MASK) {
-                       if (has_handoff || (!rt_task(waiter->task) &&
-                                           !time_after(jiffies, waiter->timeout)))
+                       if (has_handoff || (rt_task(waiter->task) && waiter != first) ||
+                           (!rt_task(waiter->task) && !time_after(jiffies, waiter->timeout)))
                                 return false;



-Mukesh

> return false;
> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
> } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count, new));
>
> /*
> - * We have either acquired the lock with handoff bit cleared or
> - * set the handoff bit.
> + * We have either acquired the lock with handoff bit cleared or set
> + * the handoff bit. Only the first waiter can have its handoff_set
> + * set here to enable optimistic spinning in slowpath loop.
> */
> if (new & RWSEM_FLAG_HANDOFF) {
> - waiter->handoff_set = true;
> + if (waiter == first)
> + waiter->handoff_set = true;
> lockevent_inc(rwsem_wlock_handoff);
> return false;
> }

2022-09-30 05:40:21

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi,

On 9/29/2022 11:36 PM, Waiman Long wrote:
> On 9/29/22 14:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>> consistent")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>
> Mukesh, can you test if this patch can fix the RT task lockup problem?

I will come back with the result.

Thank you.

>
> Thanks,
> Longman
>

2022-09-30 14:39:31

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 9/30/22 01:06, Mukesh Ojha wrote:
> Hi,
>
> On 9/29/2022 11:34 PM, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>> consistent")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>
> Not related to this issue, however wanted to understand the idea about
> this.
>
> If RT task comes in any order either come first or later it is setting
> the RWSEM_FLAG_HANDOFF bit.
> So, here we are giving some priority right a way to RT task however it
> can not get waiter->handoff_set=true since it is not the first
> waiter.(after this patch), is it not conflicting ?

I have thought about moving the RT task forward in the wait queue, but
then it will greatly complicate the code to try to do what a PREEMPT_RT
kernel does using an rt_mutex variant of rwsem. The reason why the HANDOFF
bit is set when an RT task is in the wait queue is to speed up the
progression of the wait queue without the interference of optimistic
spinners.

>
>
> Why can't we just keep like as below and not set
> new |= RWSEM_FLAG_HANDOFF; and return false from here.
>
> --------------0<------------------------------------
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 65f0262..dbe3e16 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct
> rw_semaphore *sem,
>                  new = count;
>
>                  if (count & RWSEM_LOCK_MASK) {
> -                       if (has_handoff || (!rt_task(waiter->task) &&
> -                                           !time_after(jiffies,
> waiter->timeout)))
> +                       if (has_handoff || (rt_task(waiter->task) &&
> waiter != first) ||
> +                          (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
>                                  return false;
>
As I said above, we want to make more forward progress in the wait queue
if an RT task is waiting there, to try to reduce its latency. That is the
point of that if statement.

Cheers,
Longman
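
For context, the interference from optimistic spinners mentioned above comes
from the unqueued trylock they use. A condensed paraphrase (not the verbatim
kernel code) of rwsem_try_write_lock_unqueued() shows how a set handoff bit
shuts that path down:

	static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
	{
		long count = atomic_long_read(&sem->count);

		/* Bail out once the lock is held or a handoff is pending */
		while (!(count & (RWSEM_LOCK_MASK | RWSEM_FLAG_HANDOFF))) {
			if (atomic_long_try_cmpxchg_acquire(&sem->count, &count,
						count | RWSEM_WRITER_LOCKED)) {
				rwsem_set_owner(sem);
				return true;
			}
		}
		return false;
	}

So once a queued waiter sets RWSEM_FLAG_HANDOFF (e.g. on behalf of a waiting
RT task), spinners stop stealing the lock and the wait queue can make
progress.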


2022-09-30 14:58:45

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 9/30/22 00:46, Mukesh Ojha wrote:
> Hi,
>
> Thanks for looking into this issue.
>
> On 9/29/2022 11:34 PM, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>> consistent")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262f635e..ad676e99e0b3 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>           new = count;
>>             if (count & RWSEM_LOCK_MASK) {
>> +            /*
>> +             * A waiter (first or not) can set the handoff bit
>> +             * if it is an RT task or wait in the wait queue
>> +             * for too long.
>> +             */
>>               if (has_handoff || (!rt_task(waiter->task) &&
>>                           !time_after(jiffies, waiter->timeout)))
>>                   return false;
>> @@ -643,11 +648,13 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>       } while (!atomic_long_try_cmpxchg_acquire(&sem->count, &count,
>> new));
>>         /*
>> -     * We have either acquired the lock with handoff bit cleared or
>> -     * set the handoff bit.
>> +     * We have either acquired the lock with handoff bit cleared or set
>> +     * the handoff bit. Only the first waiter can have its handoff_set
>> +     * set here to enable optimistic spinning in slowpath loop.
>>        */
>>       if (new & RWSEM_FLAG_HANDOFF) {
>> -        waiter->handoff_set = true;
>> +        if (waiter == first)
>> +            waiter->handoff_set = true;
>>           lockevent_inc(rwsem_wlock_handoff);
>
> nit: Should this not get incremented on waiter->handoff_set=true only ?

The lock event counter records the number of times the handoff bit is set.
It is not related to how the handoff_set flag in the waiter structure is
being set.

cheers,
Longman

2022-09-30 16:15:29

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi,

On 9/30/2022 7:38 PM, Waiman Long wrote:
> On 9/30/22 01:06, Mukesh Ojha wrote:
>> Hi,
>>
>> On 9/29/2022 11:34 PM, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>> consistent")
>>> Signed-off-by: Waiman Long <[email protected]>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>>> index 65f0262f635e..ad676e99e0b3 100644
>>> --- a/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -628,6 +628,11 @@ static inline bool rwsem_try_write_lock(struct
>>> rw_semaphore *sem,
>>>           new = count;
>>>             if (count & RWSEM_LOCK_MASK) {
>>> +            /*
>>> +             * A waiter (first or not) can set the handoff bit
>>> +             * if it is an RT task or wait in the wait queue
>>> +             * for too long.
>>> +             */
>>>               if (has_handoff || (!rt_task(waiter->task) &&
>>>                           !time_after(jiffies, waiter->timeout)))
>>
>> Not related to this issue, however wanted to understand the idea about
>> this.
>>
>> If RT task comes in any order either come first or later it is setting
>> the RWSEM_FLAG_HANDOFF bit.
>> So, here we are giving some priority right a way to RT task however it
>> can not get waiter->handoff_set=true since it is not the first
>> waiter.(after this patch), is it not conflicting ?
>
> I have thought about moving the RT task forward in the wait queue, but
> then it will greatly complicate the code to try to do what a PREEMPT_RT
> kernel does using a rt_mutex variant of rwsem. The reason why HANDOFF
> bit is set when an RT task is in the wait queue is speed up the
> progression of the wait queue without the interference of optimistic
> spinner.
>

Thanks for taking time to explain the motivation behind it.

It will take some time to test your patch on our side; however, like the
other patches, this patch too seems to fix the issue.

Feel free to add
Reported-by: Mukesh Ojha <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>

-Mukesh

>>
>>
>> Why can't we just keep like as below and not set
>> new |= RWSEM_FLAG_HANDOFF; and return false from here.
>>
>> --------------0<------------------------------------
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262..dbe3e16 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -628,8 +628,8 @@ static inline bool rwsem_try_write_lock(struct
>> rw_semaphore *sem,
>>                  new = count;
>>
>>                  if (count & RWSEM_LOCK_MASK) {
>> -                       if (has_handoff || (!rt_task(waiter->task) &&
>> -                                           !time_after(jiffies,
>> waiter->timeout)))
>> +                       if (has_handoff || (rt_task(waiter->task) &&
>> waiter != first) ||
>> +                          (!rt_task(waiter->task) &&
>> !time_after(jiffies, waiter->timeout)))
>>                                  return false;
>>
> As I said above, we want to make more forward progress in the wait queue
> if a RT task is waiting there to try to reduce its latency. That is the
> point of that if statement.
>
> Cheers,
> Longman
>
>

2022-10-10 11:18:06

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi Waiman,


On 9/29/2022 11:36 PM, Waiman Long wrote:
> On 9/29/22 14:04, Waiman Long wrote:
>> A non-first waiter can potentially spin in the for loop of
>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>> lock even if the rwsem is free if the following sequence happens:
>>
>>    Non-first waiter       First waiter      Lock holder
>>    ----------------       ------------      -----------
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>      Set handoff bit if RT or
>>        wait too long
>>      Set waiter->handoff_set
>>    Release wait_lock
>>                           Acquire wait_lock
>>                           Inherit waiter->handoff_set
>>                           Release wait_lock
>>                        Clear owner
>>                                             Release lock
>>    if (waiter.handoff_set) {
>>      rwsem_spin_on_owner(();
>>      if (OWNER_NULL)
>>        goto trylock_again;
>>    }
>>    trylock_again:
>>    Acquire wait_lock
>>    rwsem_try_write_lock():
>>       if (first->handoff_set && (waiter != first))
>>           return false;
>>    Release wait_lock
>>
>> It is especially problematic if the non-first waiter is an RT task and
>> it is running on the same CPU as the first waiter as this can lead to
>> live lock.
>>
>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>> consistent")
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>   1 file changed, 10 insertions(+), 3 deletions(-)
>
> Mukesh, can you test if this patch can fix the RT task lockup problem?
>

Looks like there is still a window for a race.

There is a chance when a reader who came first added its BIAS and goes
to the slowpath, and before it gets added to the wait list it gets
preempted by an RT task which goes to the slowpath as well and, being
the first waiter, gets its hand-off bit set but is not able to get the
lock due to the following condition in rwsem_try_write_lock():

 630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has set its bias
 ..
 ...
 634
 635                         new |= RWSEM_FLAG_HANDOFF;
 636                 } else {
 637                         new |= RWSEM_WRITER_LOCKED;


---------------------->----------------------->-------------------------

First reader (1)          writer(2) RT task             Lock holder(3)

It sets
RWSEM_READER_BIAS.
while it is going to
slowpath(as the lock
was held by (3)) and
before it got added
to the waiters list
it got preempted
by (2).
                          RT task also takes
                          the slowpath and adds           release the
                          itself into the waiting list    rwsem lock
                          and since it is the first       clear the
                          it is the next one to get       owner.
                          the lock but it can not
                          get the lock as (count &
                          RWSEM_LOCK_MASK) is set
                          as (1) has added it but
                          is not able to remove its
                          adjustment.


-Mukesh


> Thanks,
> Longman
>
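
For context on the window described above, a condensed paraphrase (not the
verbatim kernel code) of the down_read() entry path shows where the reader
bias can linger:

	static inline void __down_read(struct rw_semaphore *sem)
	{
		long count = atomic_long_add_return_acquire(RWSEM_READER_BIAS,
							    &sem->count);

		if (!(count & RWSEM_READ_FAILED_MASK))
			return;		/* fast path: reader lock taken */

		/*
		 * Slowpath: the RWSEM_READER_BIAS added above is only backed
		 * out later, under wait_lock, when the reader is queued (via
		 * the "adjustment" value). If the reader is preempted before
		 * that point, sem->count keeps a RWSEM_LOCK_MASK bit set even
		 * though nobody holds the rwsem, so the first waiter's
		 * rwsem_try_write_lock() keeps failing.
		 */
		rwsem_down_read_slowpath(sem, count, TASK_UNINTERRUPTIBLE);
	}

In the scenario above, the preempting RT writer sets the handoff bit but then
keeps seeing the stale reader bias in RWSEM_LOCK_MASK, so neither task makes
progress on that CPU.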

2022-10-10 17:04:03

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 10/10/22 06:24, Mukesh Ojha wrote:
> Hi Waiman,
>
>
> On 9/29/2022 11:36 PM, Waiman Long wrote:
>> On 9/29/22 14:04, Waiman Long wrote:
>>> A non-first waiter can potentially spin in the for loop of
>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>> lock even if the rwsem is free if the following sequence happens:
>>>
>>>    Non-first waiter       First waiter      Lock holder
>>>    ----------------       ------------      -----------
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>      Set handoff bit if RT or
>>>        wait too long
>>>      Set waiter->handoff_set
>>>    Release wait_lock
>>>                           Acquire wait_lock
>>>                           Inherit waiter->handoff_set
>>>                           Release wait_lock
>>>                        Clear owner
>>>                                             Release lock
>>>    if (waiter.handoff_set) {
>>>      rwsem_spin_on_owner(();
>>>      if (OWNER_NULL)
>>>        goto trylock_again;
>>>    }
>>>    trylock_again:
>>>    Acquire wait_lock
>>>    rwsem_try_write_lock():
>>>       if (first->handoff_set && (waiter != first))
>>>           return false;
>>>    Release wait_lock
>>>
>>> It is especially problematic if the non-first waiter is an RT task and
>>> it is running on the same CPU as the first waiter as this can lead to
>>> live lock.
>>>
>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>> consistent")
>>> Signed-off-by: Waiman Long <[email protected]>
>>> ---
>>>   kernel/locking/rwsem.c | 13 ++++++++++---
>>>   1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>
>
> Looks like, There is still a window for a race.
>
> There is a chance when a reader who came first added it's BIAS and
> goes to slowpath and before it gets added to wait list it got
> preempted by RT task which  goes to slowpath as well and being the
> first waiter gets its hand-off bit set and not able to get the lock
> due to following condition in rwsem_try_write_lock()
>
>  630                 if (count & RWSEM_LOCK_MASK) {  ==> reader has
> sets its bias
> ..
> ...
>
>  634
>  635                         new |= RWSEM_FLAG_HANDOFF;
>  636                 } else {
>  637                         new |= RWSEM_WRITER_LOCKED;
>
>
> ---------------------->----------------------->-------------------------
>
> First reader (1)          writer(2) RT task             Lock holder(3)
>
> It sets
> RWSEM_READER_BIAS.
> while it is going to
> slowpath(as the lock
> was held by (3)) and
> before it got added
> to the waiters list
> it got preempted
> by (2).
>                         RT task also takes
>                         the slowpath and add              release the
>                         itself into waiting list          rwsem lock
>             and since it is the first         clear the
>                         it is the next one to get         owner.
>                         the lock but it can not
>                         get the lock as (count &
>                         RWSEM_LOCK_MASK) is set
>                         as (1) has added it but
>                         not able to remove its
>             adjustment.
>
Good catch!

It is indeed a possible livelock scenario. Will update the patch to
address that.

Thanks,
Longman

2022-10-11 13:51:09

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi Hillf,

Thanks for looking into this issue.

On 10/11/2022 4:16 PM, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <[email protected]>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>> Non-first waiter First waiter Lock holder
>>>> ---------------- ------------ -----------
>>>> Acquire wait_lock
>>>> rwsem_try_write_lock():
>>>> Set handoff bit if RT or
>>>> wait too long
>>>> Set waiter->handoff_set
>>>> Release wait_lock
>>>> Acquire wait_lock
>>>> Inherit waiter->handoff_set
>>>> Release wait_lock
>>>> Clear owner
>>>> Release lock
>>>> if (waiter.handoff_set) {
>>>> rwsem_spin_on_owner(();
>>>> if (OWNER_NULL)
>>>> goto trylock_again;
>>>> }
>>>> trylock_again:
>>>> Acquire wait_lock
>>>> rwsem_try_write_lock():
>>>> if (first->handoff_set && (waiter != first))
>>>> return false;
>>>> Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <[email protected]>
>>>> ---
>>>> kernel/locking/rwsem.c | 13 ++++++++++---
>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()

[]

>>
>> 630 if (count & RWSEM_LOCK_MASK) { ==> reader has
>> sets its bias
>> ..
>> ...
>>
>> 634
>> 635 new |= RWSEM_FLAG_HANDOFF;
>> 636 } else {
>> 637 new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>> First reader (1) writer(2) RT task Lock holder(3)
>>
>> It sets
>> RWSEM_READER_BIAS.
>> while it is going to
>> slowpath(as the lock
>> was held by (3)) and
>> before it got added
>> to the waiters list
>> it got preempted
>> by (2).
>> RT task also takes
>> the slowpath and add release the
>> itself into waiting list rwsem lock
>> and since it is the first clear the
>> it is the next one to get owner.
>> the lock but it can not
>> get the lock as (count &
>> RWSEM_LOCK_MASK) is set
>> as (1) has added it but
>> not able to remove its
>> adjustment.

[]

>>
> Hey Mukesh,
>
> Can you test the diff if it makes sense to you?
>
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
>
> Hillf
>
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
> long count, new;
>
> lockdep_assert_held(&sem->wait_lock);
> + waiter->handoff_set = false;
>
> count = atomic_long_read(&sem->count);
> do {
> bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>
> if (has_handoff) {
> - /*
> - * Honor handoff bit and yield only when the first
> - * waiter is the one that set it. Otherwisee, we
> - * still try to acquire the rwsem.
> - */
> - if (first->handoff_set && (waiter != first))
> + if (waiter != first)
> return false;

You mean you want to check and change waiter->handoff_set on every run
of rwsem_try_write_lock().

But does it break optimistic spinning? @Waiman?

-Mukesh

> -
> - /*
> - * First waiter can inherit a previously set handoff
> - * bit and spin on rwsem if lock acquisition fails.
> - */
> - if (waiter == first)
> - waiter->handoff_set = true;
> }
>
> new = count;

2022-10-12 13:33:48

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 10/12/22 00:04, Hillf Danton wrote:
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an uphand over the first waiter wrt acquiring
> the lock, and it is not a bad option for the first waiter to take a step
> back off.
>
> if (count & RWSEM_LOCK_MASK) {
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
> return false;
>
> new |= RWSEM_FLAG_HANDOFF;
> } else {
>
>> But does it break optimistic spinning ? @waiman ?
> Waiters spin for acquiring lock instead of lockup and your report shows
> spinning too much makes trouble. The key is stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

Yes, too much spinning is bad if we have RT tasks in the mix, otherwise
it should be fine.

Cheers,
Longman

2022-10-12 13:52:30

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 10/11/22 06:46, Hillf Danton wrote:
> On 10/10/22 06:24 Mukesh Ojha <[email protected]>
>> Hi Waiman,
>>
>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>> On 9/29/22 14:04, Waiman Long wrote:
>>>> A non-first waiter can potentially spin in the for loop of
>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>> lock even if the rwsem is free if the following sequence happens:
>>>>
>>>> Non-first waiter First waiter Lock holder
>>>> ---------------- ------------ -----------
>>>> Acquire wait_lock
>>>> rwsem_try_write_lock():
>>>> Set handoff bit if RT or
>>>> wait too long
>>>> Set waiter->handoff_set
>>>> Release wait_lock
>>>> Acquire wait_lock
>>>> Inherit waiter->handoff_set
>>>> Release wait_lock
>>>> Clear owner
>>>> Release lock
>>>> if (waiter.handoff_set) {
>>>> rwsem_spin_on_owner(();
>>>> if (OWNER_NULL)
>>>> goto trylock_again;
>>>> }
>>>> trylock_again:
>>>> Acquire wait_lock
>>>> rwsem_try_write_lock():
>>>> if (first->handoff_set && (waiter != first))
>>>> return false;
>>>> Release wait_lock
>>>>
>>>> It is especially problematic if the non-first waiter is an RT task and
>>>> it is running on the same CPU as the first waiter as this can lead to
>>>> live lock.
>>>>
>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>> consistent")
>>>> Signed-off-by: Waiman Long <[email protected]>
>>>> ---
>>>> kernel/locking/rwsem.c | 13 ++++++++++---
>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>
>> Looks like, There is still a window for a race.
>>
>> There is a chance when a reader who came first added it's BIAS and
>> goes to slowpath and before it gets added to wait list it got
>> preempted by RT task which goes to slowpath as well and being the
>> first waiter gets its hand-off bit set and not able to get the lock
>> due to following condition in rwsem_try_write_lock()
>>
>> 630 if (count & RWSEM_LOCK_MASK) { ==> reader has
>> sets its bias
>> ..
>> ...
>>
>> 634
>> 635 new |= RWSEM_FLAG_HANDOFF;
>> 636 } else {
>> 637 new |= RWSEM_WRITER_LOCKED;
>>
>>
>> ---------------------->----------------------->-------------------------
>>
>> First reader (1) writer(2) RT task Lock holder(3)
>>
>> It sets
>> RWSEM_READER_BIAS.
>> while it is going to
>> slowpath(as the lock
>> was held by (3)) and
>> before it got added
>> to the waiters list
>> it got preempted
>> by (2).
>> RT task also takes
>> the slowpath and add release the
>> itself into waiting list rwsem lock
>> and since it is the first clear the
>> it is the next one to get owner.
>> the lock but it can not
>> get the lock as (count &
>> RWSEM_LOCK_MASK) is set
>> as (1) has added it but
>> not able to remove its
>> adjustment.
>>
> Hey Mukesh,
>
> Can you test the diff if it makes sense to you?
>
> It simply prevents the first waiter from spinning any longer after detecting
> it barely makes any progress to spin without lock owner.
>
> Hillf
>
> --- mainline/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
> long count, new;
>
> lockdep_assert_held(&sem->wait_lock);
> + waiter->handoff_set = false;
>
> count = atomic_long_read(&sem->count);
> do {
> bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>
> if (has_handoff) {
> - /*
> - * Honor handoff bit and yield only when the first
> - * waiter is the one that set it. Otherwisee, we
> - * still try to acquire the rwsem.
> - */
> - if (first->handoff_set && (waiter != first))
> + if (waiter != first)
> return false;
> -
> - /*
> - * First waiter can inherit a previously set handoff
> - * bit and spin on rwsem if lock acquisition fails.
> - */
> - if (waiter == first)
> - waiter->handoff_set = true;
> }
>
> new = count;

That is somewhat equivalent to allowing the first waiter to spin only once
after setting the handoff bit. It also removes the code that handles some
of the corner cases.

Cheers,
Longman

2022-10-12 13:55:25

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 10/11/22 09:16, Mukesh Ojha wrote:
> Hi @Hilf,
>
> Thanks for looking into this issue.
>
> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>> On 10/10/22 06:24 Mukesh Ojha <[email protected]>
>>> Hi Waiman,
>>>
>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>> A non-first waiter can potentially spin in the for loop of
>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>
>>>>>     Non-first waiter       First waiter      Lock holder
>>>>>     ----------------       ------------      -----------
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>       Set handoff bit if RT or
>>>>>         wait too long
>>>>>       Set waiter->handoff_set
>>>>>     Release wait_lock
>>>>>                            Acquire wait_lock
>>>>>                            Inherit waiter->handoff_set
>>>>>                            Release wait_lock
>>>>>                         Clear owner
>>>>>                                              Release lock
>>>>>     if (waiter.handoff_set) {
>>>>>       rwsem_spin_on_owner(();
>>>>>       if (OWNER_NULL)
>>>>>         goto trylock_again;
>>>>>     }
>>>>>     trylock_again:
>>>>>     Acquire wait_lock
>>>>>     rwsem_try_write_lock():
>>>>>        if (first->handoff_set && (waiter != first))
>>>>>            return false;
>>>>>     Release wait_lock
>>>>>
>>>>> It is especially problematic if the non-first waiter is an RT task
>>>>> and
>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>> live lock.
>>>>>
>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>> consistent")
>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>> ---
>>>>>    kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>    1 file changed, 10 insertions(+), 3 deletions(-)
>>>>
>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>
>>>
>>> Looks like, There is still a window for a race.
>>>
>>> There is a chance when a reader who came first added it's BIAS and
>>> goes to slowpath and before it gets added to wait list it got
>>> preempted by RT task which  goes to slowpath as well and being the
>>> first waiter gets its hand-off bit set and not able to get the lock
>>> due to following condition in rwsem_try_write_lock()
>
> []
>
>>>
>>>   630                 if (count & RWSEM_LOCK_MASK) { ==> reader has
>>> sets its bias
>>> ..
>>> ...
>>>
>>>   634
>>>   635                         new |= RWSEM_FLAG_HANDOFF;
>>>   636                 } else {
>>>   637                         new |= RWSEM_WRITER_LOCKED;
>>>
>>>
>>> ---------------------->----------------------->-------------------------
>>>
>>>
>>> First reader (1)          writer(2) RT task             Lock holder(3)
>>>
>>> It sets
>>> RWSEM_READER_BIAS.
>>> while it is going to
>>> slowpath(as the lock
>>> was held by (3)) and
>>> before it got added
>>> to the waiters list
>>> it got preempted
>>> by (2).
>>>                         RT task also takes
>>>                          the slowpath and add release the
>>>                          itself into waiting list rwsem lock
>>>              and since it is the first         clear the
>>>                          it is the next one to get owner.
>>>                          the lock but it can not
>>>                          get the lock as (count &
>>>                          RWSEM_LOCK_MASK) is set
>>>                          as (1) has added it but
>>>                          not able to remove its
>>>              adjustment.
>
> []
>
>>>
>> Hey Mukesh,
>>
>> Can you test the diff if it makes sense to you?
>>
>> It simply prevents the first waiter from spinning any longer after
>> detecting
>> it barely makes any progress to spin without lock owner.
>>
>> Hillf
>>
>> --- mainline/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>       long count, new;
>>         lockdep_assert_held(&sem->wait_lock);
>> +    waiter->handoff_set = false;
>>         count = atomic_long_read(&sem->count);
>>       do {
>>           bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>             if (has_handoff) {
>> -            /*
>> -             * Honor handoff bit and yield only when the first
>> -             * waiter is the one that set it. Otherwisee, we
>> -             * still try to acquire the rwsem.
>> -             */
>> -            if (first->handoff_set && (waiter != first))
>> +            if (waiter != first)
>>                   return false;
>
> you mean, you want to check and change waiter->handoff_set on every
> run rwsem_try_write_lock().
>
> But does it break optimistic spinning ? @waiman ?
>
As I said in my previous mail, this is equivalent to allowing only one
optimistic spinning attempt after setting the handoff bit. That will
likely reduce the performance benefit provided by this feature.

Cheers,
Longman

2022-10-12 14:04:29

by Waiman Long

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

On 10/11/22 03:20, Ting11 Wang 王婷 wrote:
> Hi Waiman,
>
> Thanks for your quick reply.
> Regarding this issue, we have done a lot of testing on our devices and found that the first patch is the most effective. Is it possible to use this patch?
>
> https://lore.kernel.org/lkml/[email protected]/
>
Yes, I am going to resurrect this patch.

Cheers,
Longman

2022-10-12 15:14:39

by Mukesh Ojha

Subject: Re: [PATCH] locking/rwsem: Prevent non-first waiter from spinning in down_write() slowpath

Hi,

On 10/12/2022 9:34 AM, Hillf Danton wrote:
> On 11 Oct 2022 18:46:20 +0530 Mukesh Ojha <[email protected]>
>> On 10/11/2022 4:16 PM, Hillf Danton wrote:
>>> On 10/10/22 06:24 Mukesh Ojha <[email protected]>
>>>> Hi Waiman,
>>>>
>>>> On 9/29/2022 11:36 PM, Waiman Long wrote:
>>>>> On 9/29/22 14:04, Waiman Long wrote:
>>>>>> A non-first waiter can potentially spin in the for loop of
>>>>>> rwsem_down_write_slowpath() without sleeping but fail to acquire the
>>>>>> lock even if the rwsem is free if the following sequence happens:
>>>>>>
>>>>>> Non-first waiter First waiter Lock holder
>>>>>> ---------------- ------------ -----------
>>>>>> Acquire wait_lock
>>>>>> rwsem_try_write_lock():
>>>>>> Set handoff bit if RT or
>>>>>> wait too long
>>>>>> Set waiter->handoff_set
>>>>>> Release wait_lock
>>>>>> Acquire wait_lock
>>>>>> Inherit waiter->handoff_set
>>>>>> Release wait_lock
>>>>>> Clear owner
>>>>>> Release lock
>>>>>> if (waiter.handoff_set) {
>>>>>> rwsem_spin_on_owner(();
>>>>>> if (OWNER_NULL)
>>>>>> goto trylock_again;
>>>>>> }
>>>>>> trylock_again:
>>>>>> Acquire wait_lock
>>>>>> rwsem_try_write_lock():
>>>>>> if (first->handoff_set && (waiter != first))
>>>>>> return false;
>>>>>> Release wait_lock
>>>>>>
>>>>>> It is especially problematic if the non-first waiter is an RT task and
>>>>>> it is running on the same CPU as the first waiter as this can lead to
>>>>>> live lock.
>>>>>>
>>>>>> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
>>>>>> consistent")
>>>>>> Signed-off-by: Waiman Long <[email protected]>
>>>>>> ---
>>>>>> kernel/locking/rwsem.c | 13 ++++++++++---
>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>>>
>>>>> Mukesh, can you test if this patch can fix the RT task lockup problem?
>>>>>
>>>>
>>>> Looks like, There is still a window for a race.
>>>>
>>>> There is a chance when a reader who came first added it's BIAS and
>>>> goes to slowpath and before it gets added to wait list it got
>>>> preempted by RT task which goes to slowpath as well and being the
>>>> first waiter gets its hand-off bit set and not able to get the lock
>>>> due to following condition in rwsem_try_write_lock()
>>
>> []
>>
>>>>
>>>> 630 if (count & RWSEM_LOCK_MASK) { ==> reader has
>>>> sets its bias
>>>> ..
>>>> ...
>>>>
>>>> 634
>>>> 635 new |= RWSEM_FLAG_HANDOFF;
>>>> 636 } else {
>>>> 637 new |= RWSEM_WRITER_LOCKED;
>>>>
>>>>
>>>> ---------------------->----------------------->-------------------------
>>>>
>>>> First reader (1) writer(2) RT task Lock holder(3)
>>>>
>>>> It sets
>>>> RWSEM_READER_BIAS.
>>>> while it is going to
>>>> slowpath(as the lock
>>>> was held by (3)) and
>>>> before it got added
>>>> to the waiters list
>>>> it got preempted
>>>> by (2).
>>>> RT task also takes
>>>> the slowpath and add release the
>>>> itself into waiting list rwsem lock
>>>> and since it is the first clear the
>>>> it is the next one to get owner.
>>>> the lock but it can not
>>>> get the lock as (count &
>>>> RWSEM_LOCK_MASK) is set
>>>> as (1) has added it but
>>>> not able to remove its
>>>> adjustment.
>>
>> []
>>
>>>>
>>> Hey Mukesh,
>>>
>>> Can you test the diff if it makes sense to you?
>>>
>>> It simply prevents the first waiter from spinning any longer after detecting
>>> it barely makes any progress to spin without lock owner.
>>>
>>> Hillf
>>>
>>> --- mainline/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -611,26 +611,15 @@ static inline bool rwsem_try_write_lock(
>>> long count, new;
>>>
>>> lockdep_assert_held(&sem->wait_lock);
>>> + waiter->handoff_set = false;
>>>
>>> count = atomic_long_read(&sem->count);
>>> do {
>>> bool has_handoff = !!(count & RWSEM_FLAG_HANDOFF);
>>>
>>> if (has_handoff) {
>>> - /*
>>> - * Honor handoff bit and yield only when the first
>>> - * waiter is the one that set it. Otherwisee, we
>>> - * still try to acquire the rwsem.
>>> - */
>>> - if (first->handoff_set && (waiter != first))
>>> + if (waiter != first)
>>> return false;
>>
>> you mean, you want to check and change waiter->handoff_set on every run
>> rwsem_try_write_lock().
>>
> Yes, with RWSEM_FLAG_HANDOFF set, it is too late for non first waiters to
> spin, and with both RWSEM_LOCK_MASK and RWSEM_FLAG_HANDOFF set, the rivals
> in the RWSEM_LOCK_MASK have an uphand over the first waiter wrt acquiring
> the lock, and it is not a bad option for the first waiter to take a step
> back off.
>
> if (count & RWSEM_LOCK_MASK) {
> if (has_handoff || (!rt_task(waiter->task) &&
> !time_after(jiffies, waiter->timeout)))
> return false;
>
> new |= RWSEM_FLAG_HANDOFF;
> } else {
>
>> But does it break optimistic spinning ? @waiman ?
>
> Waiters spin for acquiring lock instead of lockup and your report shows
> spinning too much makes trouble. The key is stop spinning neither too
> late nor too early. My proposal is a simple one with as few heuristics
> added as possible.

From a high level, it looks like it will work.
Let me check and get back on this.

-Mukesh
>
> Hillf