2022-07-06 14:28:42

by Waiman Long

Subject: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
stealing") allows an unlimited number of lock steals by non-RT
tasks. That can lead to lock starvation of a non-RT top waiter if
there is a constant incoming stream of non-RT lockers. This can cause
an rcu_preempt self-detected stall or even a task lockup in a
PREEMPT_RT kernel. For example,

[77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.

Avoid this problem and ensure forward progress by limiting the
number of times that a lock can be stolen from each waiter. This patch
sets a threshold of 32. That number is arbitrary and can be changed
if needed.

Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/locking/rtmutex.c | 9 ++++++---
kernel/locking/rtmutex_common.h | 8 ++++++++
2 files changed, 14 insertions(+), 3 deletions(-)

[v3: Increase threshold to 32 and add rcu_preempt self-detected stall]

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8abc2a..bdddb3dc36c2 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -359,10 +359,13 @@ static inline bool rt_mutex_steal(struct rt_mutex_waiter *waiter,
if (rt_prio(waiter->prio) || dl_prio(waiter->prio))
return false;

- return rt_mutex_waiter_equal(waiter, top_waiter);
-#else
- return false;
+ if (rt_mutex_waiter_equal(waiter, top_waiter) &&
+ (top_waiter->nr_steals < RT_MUTEX_LOCK_STEAL_MAX)) {
+ top_waiter->nr_steals++;
+ return true;
+ }
#endif
+ return false;
}

#define __node_2_waiter(node) \
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index c47e8361bfb5..d5eeba955f07 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -26,6 +26,7 @@
* @task: task reference to the blocked task
* @lock: Pointer to the rt_mutex on which the waiter blocks
* @wake_state: Wakeup state to use (TASK_NORMAL or TASK_RTLOCK_WAIT)
+ * @nr_steals: Number of times the lock is stolen
* @prio: Priority of the waiter
* @deadline: Deadline of the waiter if applicable
* @ww_ctx: WW context pointer
@@ -36,11 +37,17 @@ struct rt_mutex_waiter {
struct task_struct *task;
struct rt_mutex_base *lock;
unsigned int wake_state;
+ unsigned int nr_steals;
int prio;
u64 deadline;
struct ww_acquire_ctx *ww_ctx;
};

+/*
+ * The maximum number of times a lock can be stolen per waiter.
+ */
+#define RT_MUTEX_LOCK_STEAL_MAX (1U << 5)
+
/**
* rt_wake_q_head - Wrapper around regular wake_q_head to support
* "sleeping" spinlocks on RT
@@ -194,6 +201,7 @@ static inline void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter)
RB_CLEAR_NODE(&waiter->tree_entry);
waiter->wake_state = TASK_NORMAL;
waiter->task = NULL;
+ waiter->nr_steals = 0;
}

static inline void rt_mutex_init_rtlock_waiter(struct rt_mutex_waiter *waiter)
--
2.31.1


2022-07-06 14:35:29

by Waiman Long

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On 7/6/22 09:59, Waiman Long wrote:
> [...]
> [v3: Increase threshold to 32 and add rcu_preempt self-detected stall]

Note that I decided to increase the threshold to 32 from 10 to reduce
the potential performance impact of this change, if any. We also found
out that this patch can fix some of the rcu_preempt self-detected stall
problems that we saw with the PREEMPT_RT kernel. So I added that
information in the patch description.

Cheers,
Longman

2022-07-07 18:25:49

by Boqun Feng

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
> On 7/6/22 09:59, Waiman Long wrote:
> > [...]
>
> Note that I decided to increase the threshold to 32 from 10 to reduce the
> potential performance impact of this change, if any. We also found out that
> this patch can fix some of the rcu_preempt self-detected stall problems that
> we saw with the PREEMPT_RT kernel. So I added that information in the patch
> description.
>

Have you considered (and tested) whether we can set the threshold
directly proportional to nr_cpu_ids? Because IIUC, the favorable case
for lock stealing is that every CPU gets a chance to steal once. If one
CPU can steal twice, either 1) there was a context switch between two
tasks, which costs about the same as waking up the waiter, or 2) a task
dropped and re-grabbed the lock, which means the task wants to yield to
other waiters of the lock.

Regards,
Boqun


2022-07-07 18:52:46

by Waiman Long

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On 7/7/22 14:22, Boqun Feng wrote:
> On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
>> [...]
>
> Have you considered (and tested) whether we can set the threshold
> directly proportional to nr_cpu_ids? Because IIUC, the favorable case
> for lock stealing is that every CPU gets a chance to steal once. If one
> CPU can steal twice, either 1) there was a context switch between two
> tasks, which costs about the same as waking up the waiter, or 2) a task
> dropped and re-grabbed the lock, which means the task wants to yield to
> other waiters of the lock.

There is no inherent restriction against the same CPU stealing the
lock twice or more. With rtmutex, the top waiter may be sleeping and
the wakeup latency can be considerable. Allowing another ready lock
waiter to steal the lock for productive use can improve system
throughput. There is no fairness in lock stealing, and I don't believe
it is a worthwhile goal to allow each CPU to steal the lock once. It
will just complicate the code.

On the other hand, unlimited lock stealing is bad, and we have to put
a limit somewhere to ensure forward progress.

Cheers,
Longman

2022-07-07 19:09:48

by Boqun Feng

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On Thu, Jul 07, 2022 at 02:45:10PM -0400, Waiman Long wrote:
> On 7/7/22 14:22, Boqun Feng wrote:
> > [...]
>
> There is no inherent restriction against the same CPU stealing the
> lock twice or more. With rtmutex, the top waiter may be sleeping and the

Well, I'm not saying we need to restrict the same CPU from stealing a
lock twice or more. Think about this: a task running on CPU 1 has
already stolen the lock once, for example:

<lock release>
{task C is the top waiter}

CPU 1
=====
<now task A running>
lock(); // steal the lock
...
unlock():
// set owner to NULL
<switch task B> // similar cost to wake up A
lock(); // steal the lock

, which means that if a CPU steals a lock twice or more, it's almost
certain that a context switch happened between the two steals ("almost"
because there could be a case where task A does lock()+unlock() twice,
but as I said, that means task A is willing to yield).

Therefore, if there are @nr_cpu_ids lock steals, it means either a
context switch happened somewhere or a task was willing to yield. And I
think that is a reasonable signal to stop lock stealing.

Thoughts?

Regards,
Boqun

> [...]

2022-07-07 19:41:49

by Waiman Long

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On 7/7/22 15:04, Boqun Feng wrote:
> On Thu, Jul 07, 2022 at 02:45:10PM -0400, Waiman Long wrote:
>>> [...]
>> There is no inherent restriction against the same CPU stealing the
>> lock twice or more. With rtmutex, the top waiter may be sleeping and the
> Well, I'm not saying we need to restrict the same CPU from stealing a
> lock twice or more. Think about this: a task running on CPU 1 has
> already stolen the lock once, for example:
>
> <lock release>
> {task C is the top waiter}
>
> CPU 1
> =====
> <now task A running>
> lock(); // steal the lock
> ...
> unlock():
> // set owner to NULL
> <switch task B> // similar cost to wake up A
> lock(); // steal the lock
>
> , which means that if a CPU steals a lock twice or more, it's almost
> certain that a context switch happened between the two steals ("almost"
> because there could be a case where task A does lock()+unlock() twice,
> but as I said, that means task A is willing to yield).
>
> Therefore, if there are @nr_cpu_ids lock steals, it means either a
> context switch happened somewhere or a task was willing to yield. And I
> think that is a reasonable signal to stop lock stealing.
>
> Thoughts?

The reality is that a task can acquire the same lock multiple times
before a context switch, so stealing a lock from the same sleeping top
waiter multiple times can certainly happen. For large SMP systems with
hundreds or even thousands of CPUs, allowing that many lock steals may
significantly increase the lock acquisition latency for the unfortunate
tasks.

Another alternative that I have tried in the past is to record a
timestamp when a task becomes the top waiter and refrain from stealing
the lock once the elapsed time since that timestamp exceeds a certain
limit. That will limit the maximum lock acquisition latency of a non-RT
task as long as there are no other RT tasks competing with it for the
lock.

Cheers,
Longman

2022-07-11 10:26:33

by Peter Zijlstra

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On Wed, Jul 06, 2022 at 09:59:16AM -0400, Waiman Long wrote:
> [...]

Why not do the same thing we do for regular mutexes?

2022-07-11 20:07:32

by Waiman Long

Subject: Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

On 7/11/22 05:34, Peter Zijlstra wrote:
> On Wed, Jul 06, 2022 at 09:59:16AM -0400, Waiman Long wrote:
>> [...]
> Why not do the same thing we do for regular mutexes?
>
The mutex way is another possible alternative. We can set a flag to
disable lock stealing once the current top waiter wakes up and finds
that the rtmutex has been stolen. I will need to run some tests to find
out how many times lock stealing can happen before it is blocked. I
would like to allow a sufficient number of lock steals to minimize the
performance impact of this change.

Cheers,
Longman