2022-03-11 22:39:50

by Zqiang

Subject: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

When RCU_BOOST is enabled, the boost kthreads boost readers that are
blocking a given grace period. If the current reader task has a higher
priority than the boost kthread (the boost kthread's priority is not
always 1 if kthread_prio is set), boosting it is useless. Skip that
task and select the next task to boost, reducing the time for a given
grace period.

Suggested-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Zqiang <[email protected]>
---
v1->v2:
Rename label 'end' to 'skip_boost'.
Add a 'boost_exp_tasks' pointer that points to 'rnp->exp_tasks',
mirroring what is done for the normal grace period.
v2->v3:
Remove the redundant dl_task() check.

kernel/rcu/tree.h | 2 ++
kernel/rcu/tree_plugin.h | 30 ++++++++++++++++++++++--------
2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index b8d07bf92d29..862ca09b56c7 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -103,6 +103,8 @@ struct rcu_node {
 				/* queued on this rcu_node structure that */
 				/* are blocking the current grace period, */
 				/* there can be no such task. */
+	struct list_head *boost_exp_tasks;
+
 	struct rt_mutex boost_mtx;
 				/* Used only for the priority-boosting */
 				/* side effect, not as a lock. */
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c3d212bc5338..fd37042ecdb2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -535,6 +535,8 @@ rcu_preempt_deferred_qs_irqrestore(struct task_struct *t, unsigned long flags)
 			drop_boost_mutex = rt_mutex_owner(&rnp->boost_mtx.rtmutex) == t;
 			if (&t->rcu_node_entry == rnp->boost_tasks)
 				WRITE_ONCE(rnp->boost_tasks, np);
+			if (&t->rcu_node_entry == rnp->boost_exp_tasks)
+				WRITE_ONCE(rnp->boost_exp_tasks, np);
 		}

 		/*
@@ -1022,7 +1024,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	struct task_struct *t;
 	struct list_head *tb;

-	if (READ_ONCE(rnp->exp_tasks) == NULL &&
+	if (READ_ONCE(rnp->boost_exp_tasks) == NULL &&
 	    READ_ONCE(rnp->boost_tasks) == NULL)
 		return 0; /* Nothing left to boost. */

@@ -1032,7 +1034,7 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * Recheck under the lock: all tasks in need of boosting
 	 * might exit their RCU read-side critical sections on their own.
 	 */
-	if (rnp->exp_tasks == NULL && rnp->boost_tasks == NULL) {
+	if (rnp->boost_exp_tasks == NULL && rnp->boost_tasks == NULL) {
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return 0;
 	}
@@ -1043,8 +1045,8 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * expedited grace period must boost all blocked tasks, including
 	 * those blocking the pre-existing normal grace period.
 	 */
-	if (rnp->exp_tasks != NULL)
-		tb = rnp->exp_tasks;
+	if (rnp->boost_exp_tasks != NULL)
+		tb = rnp->boost_exp_tasks;
 	else
 		tb = rnp->boost_tasks;

@@ -1065,14 +1067,24 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * section.
 	 */
 	t = container_of(tb, struct task_struct, rcu_node_entry);
+	if (t->prio <= current->prio) {
+		tb = rcu_next_node_entry(t, rnp);
+		if (rnp->boost_exp_tasks)
+			WRITE_ONCE(rnp->boost_exp_tasks, tb);
+		else
+			WRITE_ONCE(rnp->boost_tasks, tb);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		goto skip_boost;
+	}
+
 	rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	/* Lock only for side effect: boosts task t's priority. */
 	rt_mutex_lock(&rnp->boost_mtx);
 	rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. */
 	rnp->n_boosts++;
-
-	return READ_ONCE(rnp->exp_tasks) != NULL ||
+skip_boost:
+	return READ_ONCE(rnp->boost_exp_tasks) != NULL ||
 	       READ_ONCE(rnp->boost_tasks) != NULL;
 }

@@ -1090,7 +1102,7 @@ static int rcu_boost_kthread(void *arg)
 		WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_WAITING);
 		trace_rcu_utilization(TPS("End boost kthread@rcu_wait"));
 		rcu_wait(READ_ONCE(rnp->boost_tasks) ||
-			 READ_ONCE(rnp->exp_tasks));
+			 READ_ONCE(rnp->boost_exp_tasks));
 		trace_rcu_utilization(TPS("Start boost kthread@rcu_wait"));
 		WRITE_ONCE(rnp->boost_kthread_status, RCU_KTHREAD_RUNNING);
 		more2boost = rcu_boost(rnp);
@@ -1129,13 +1141,15 @@ static void rcu_initiate_boost(struct rcu_node *rnp, unsigned long flags)
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return;
 	}
-	if (rnp->exp_tasks != NULL ||
+	if ((rnp->exp_tasks != NULL && rnp->boost_exp_tasks == NULL) ||
 	    (rnp->gp_tasks != NULL &&
 	     rnp->boost_tasks == NULL &&
 	     rnp->qsmask == 0 &&
 	     (!time_after(rnp->boost_time, jiffies) || rcu_state.cbovld))) {
 		if (rnp->exp_tasks == NULL)
 			WRITE_ONCE(rnp->boost_tasks, rnp->gp_tasks);
+		else
+			WRITE_ONCE(rnp->boost_exp_tasks, rnp->exp_tasks);
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		rcu_wake_cond(rnp->boost_kthread_task,
 			      READ_ONCE(rnp->boost_kthread_status));
--
2.25.1
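
For readers less familiar with kernel priority values, the skip test that the patch adds to rcu_boost() can be illustrated in userspace. This is only a sketch under stated assumptions, not the kernel code: struct demo_task and first_boostable() are hypothetical names, and the loop stands in for what the patch does one task at a time across repeated rcu_boost() calls. It relies on the kernel convention that a numerically smaller ->prio means a higher priority, so a SCHED_FIFO task with rt_priority 1 has ->prio 98 and a nice-0 SCHED_OTHER task has ->prio 120.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the priority matters here. */
struct demo_task {
	int prio;			/* smaller value = higher priority */
	struct demo_task *next;		/* next blocked reader, or NULL */
};

/*
 * Return the first blocked reader that the booster could actually help,
 * i.e. one whose priority is strictly lower than the booster's own
 * (numerically greater ->prio).  Returns NULL if there is none.
 */
static struct demo_task *first_boostable(struct demo_task *head,
					 int booster_prio)
{
	struct demo_task *t;

	for (t = head; t != NULL; t = t->next) {
		/* Same comparison as the patch: t->prio <= current->prio. */
		if (t->prio <= booster_prio)
			continue;	/* at or above the booster: skip it */
		return t;
	}
	return NULL;
}
```

With readers at ->prio 50, 98 and 120 and a booster at ->prio 98, only the last reader would be selected; the first two are skipped because boosting them to the booster's priority would change nothing.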


2022-03-12 10:41:36

by Zqiang

Subject: RE: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On 2022-03-11 10:22:26 [+0800], Zqiang wrote:
> When RCU_BOOST is enabled, the boost kthreads will boosting readers
> who are blocking a given grace period, if the current reader tasks
^ Period.

> have a higher priority than boost kthreads(the boost kthreads priority
> not always 1, if the kthread_prio is set),

>>This confuses me:
>>- Why does this matter

In preempt-rt system, if the kthread_prio is not set, it prio is 1.
the boost kthreads can preempt almost rt task, It will affect
the real-time performance of some user rt tasks. In preempt-rt systems,
in most scenarios, this kthread_prio will be configured.

Thanks
Zqiang

>>- If it is not RT prio, what is then? Higher or lower? Afaik it is
>> always >= 1.

>>>If it is not an RT prio, sanitize_kthread_prio() will limit it to an RT prio.

> boosting is useless, skip
> current task and select next task to boosting, reduce the time for a
> given grace period.

>>So if the task, that is stuck in a rcu_read() section, has a higher
>>priority than the boosting thread then boosting is futile. Understood.
>>
>>Please correct me if I'm wrong but this is intended for !SCHED_OTHER
>>tasks since there shouldn't a be PI chain on boost_mtx so that its
>>default RT priority is boosted above what has been configured.

>>>Yes, you are right. If the task to be boosted is itself already boosted due to a PI chain,
>>>its priority may be only temporarily higher than that of the boost kthreads. Once that
>>>PI boost is lifted, the task may still be in an RCU section, but if we have skipped it,
>>>the task has missed the opportunity to be boosted.

>>
>>You skip boosting tasks which are itself already boosted due to a PI
>>chain. Once that PI boost is lifted the task may still be in a RCU
>>section. But if I understand you right, your intention is skip boosting
>>tasks with a higher priority and concentrate and those which are in
>>need. This shouldn't make a difference unless the scheduler is able to
>>move the rcu-boosted task to another CPU.
>>

>>>Yes, it makes sense when the rcu-boost kthread and the task to be boosted
>>>run on different CPUs.

>>Am I right so far? If so this should be part of the commit message (the
>>intention and the result). Also, please add that part with
>>boost_exp_tasks. The comment above boost_mtx is now above
>>boost_exp_tasks with a space so it looks (at least to me) like these two
>>don't belong together.

>>>Yes, I will add your description to the commit information.


> Suggested-by: Uladzislau Rezki (Sony) <[email protected]>
> Signed-off-by: Zqiang <[email protected]>

>Sebastian
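
The clamping that the reply above attributes to sanitize_kthread_prio() can be sketched in userspace as follows. This is a simplified model based on one reading of kernel/rcu/tree.c, not a copy of it: demo_sanitize_prio() and its boost_enabled parameter are hypothetical names, and the special case the kernel applies when rcutorture is built in is deliberately left out.

```c
#include <stdbool.h>

/*
 * Simplified model of how the rcutree.kthread_prio request is limited.
 * With boosting enabled, the result is always a real-time priority
 * (at least 1); in every case it is kept within 0..99.
 */
static int demo_sanitize_prio(int requested, bool boost_enabled)
{
	if (boost_enabled && requested < 1)
		return 1;	/* boost kthreads must run at an RT priority */
	if (requested < 0)
		return 0;
	if (requested > 99)
		return 99;	/* highest SCHED_FIFO priority */
	return requested;
}
```

Under this model an unset (zero) request yields priority 1 when boosting is enabled, which matches the "always >= 1" observation in the question above.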

2022-03-17 04:52:05

by Paul E. McKenney

Subject: Re: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Sat, Mar 12, 2022 at 03:11:04AM +0000, Zhang, Qiang1 wrote:
> On 2022-03-11 10:22:26 [+0800], Zqiang wrote:
> > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > who are blocking a given grace period, if the current reader tasks
> ^ Period.
>
> > have a higher priority than boost kthreads(the boost kthreads priority
> > not always 1, if the kthread_prio is set),
>
> >>This confuses me:
> >>- Why does this matter
>
> In preempt-rt system, if the kthread_prio is not set, it prio is 1.
> the boost kthreads can preempt almost rt task, It will affect
> the real-time performance of some user rt tasks. In preempt-rt systems,
> in most scenarios, this kthread_prio will be configured.

Just following up... These questions might have been answered, but
I am not seeing those answers right off-hand.

Is the grace-period latency effect of choosing not to boost high-priority
tasks visible at the system level in any actual workload?

Suppose that a SCHED_DEADLINE task has exhausted its time quantum,
and has thus been preempted within an RCU read-side critical section.
Can priority boosting from a SCHED_FIFO prio-1 task cause it to start
running?

Do delays in RCU priority boosting cause excessive grace-period
latencies on real workloads, even when all the to-be-boosted
tasks are SCHED_OTHER?

Thoughts?

Thanx, Paul

2022-03-18 13:14:22

by Zqiang

Subject: RE: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Sat, Mar 12, 2022 at 03:11:04AM +0000, Zhang, Qiang1 wrote:
> On 2022-03-11 10:22:26 [+0800], Zqiang wrote:
> > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > who are blocking a given grace period, if the current reader tasks
> ^ Period.
>
> > have a higher priority than boost kthreads(the boost kthreads priority
> > not always 1, if the kthread_prio is set),
>
> >>This confuses me:
> >>- Why does this matter
>
> In preempt-rt system, if the kthread_prio is not set, it prio is 1.
> the boost kthreads can preempt almost rt task, It will affect
> the real-time performance of some user rt tasks. In preempt-rt systems,
> in most scenarios, this kthread_prio will be configured.
>
>Just following up... These questions might have been answered, but
>I am not seeing those answers right off-hand.
>
>Is the grace-period latency effect of choosing not to boost high-priority
>tasks visible at the system level in any actual workload?
>
>Suppose that a SCHED_DEADLINE task has exhausted its time quantum,
>and has thus been preempted within an RCU read-side critical section.
>Can priority boosting from a SCHED_FIFO prio-1 task cause it to start
>running?
>
>Do delays in RCU priority boosting cause excessive grace-period
>latencies on real workloads, even when all the to-be-boosted
>tasks are SCHED_OTHER?
>
>Thoughts?

I have been testing this modification over the past few days. I originally planned to add a
Kconfig option to control whether to skip tasks with higher priority than the boost kthreads,
but that doesn't seem necessary, because the optimization is not particularly noticeable in
real scenarios. I find that tasks with higher priority than the boost kthreads quickly exit
the RCU read-side critical section, even if they are preempted within it.
Sorry for the noise.

Thanks,
Zqiang

2022-03-18 16:52:07

by Paul E. McKenney

Subject: Re: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Fri, Mar 18, 2022 at 05:50:35AM +0000, Zhang, Qiang1 wrote:
> I have tested this modification these days, I originally planned to generate a Kconfig option to control
> whether to skip tasks with higher priority than boost kthreads. but it doesn't seem necessary
> because I find it's optimization is not particularly
> obvious in the actual scene, I find that tasks with higher priority than boost kthreads
> will quickly exit the rcu critical area , even if be preempted in the rcu critical area.
> sorry for the noise.

Thank you for getting back with this information, and no need to
apologize. We all get excited about a potential change from time to time.
Part of us maintainers' jobs is to ask hard questions when that appears
to be happening. ;-)

If you have continued interest in this area, it would be good to keep
looking. After all, neither RCU expedited grace periods nor RCU priority
boosting were designed with these new use cases in mind, so it is quite
likely that there is a useful change to be made in there somewhere.

You see, RCU expedited grace periods were designed for throughput rather
than latency. The original use case was an old networking API that
needed to wait for a grace period on each and every one of a series of
some tens of thousands of system calls. If one or two of those system
calls took a few hundred milliseconds, but the rest completed in less than
a millisecond, no harm done. (Yes, there are now newer APIs that allow
many changes to be made with only the one grace-period wait. But the
kernel must continue to support the old API: Never Break Userspace.)

For its part, RCU priority boosting was originally designed for
debugging. The point was to avoid OOMing the system when someone
misconfigured their application's real-time priorities. As you know,
such misconfiguration can easily prevent low-priority RCU readers from
ever completing.

So it is reasonably likely that some change or another is needed. After
all, new use cases require new functionality and new fixes. The trick
is figuring out which change makes sense amongst the huge group of other
possible changes that each add much more complexity than improvement.
But part of the process of finding that change that makes sense is trying
out quite a few changes that don't help all that much. ;-)

Thanx, Paul

2022-03-31 02:46:14

by Uladzislau Rezki

Subject: Re: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

> On Fri, Mar 18, 2022 at 05:50:35AM +0000, Zhang, Qiang1 wrote:
> > On Sat, Mar 12, 2022 at 03:11:04AM +0000, Zhang, Qiang1 wrote:
> > > On 2022-03-11 10:22:26 [+0800], Zqiang wrote:
> > > > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > > > who are blocking a given grace period, if the current reader tasks
> > > ^ Period.
> > >
> > > > have a higher priority than boost kthreads(the boost kthreads priority
> > > > not always 1, if the kthread_prio is set),
> > >
> > > >>This confuses me:
> > > >>- Why does this matter
> > >
> > > In preempt-rt system, if the kthread_prio is not set, it prio is 1.
> > > the boost kthreads can preempt almost rt task, It will affect
> > > the real-time performance of some user rt tasks. In preempt-rt systems,
> > > in most scenarios, this kthread_prio will be configured.
> > >
> > >Just following up... These questions might have been answered, but
> > >I am not seeing those answers right off-hand.
> > >
> > >Is the grace-period latency effect of choosing not to boost high-priority
> > >tasks visible at the system level in any actual workload?
> > >
> > >Suppose that a SCHED_DEADLINE task has exhausted its time quantum,
> > >and has thus been preempted within an RCU read-side critical section.
> > >Can priority boosting from a SCHED_FIFO prio-1 task cause it to start
> > >running?
> > >
> > >Do delays in RCU priority boosting cause excessive grace-period
> > >latencies on real workloads, even when all the to-be-boosted
> > >tasks are SCHED_OTHER?
> > >
> > >Thoughts?
> >
> > I have tested this modification these days, I originally planned to generate a Kconfig option to control
> > whether to skip tasks with higher priority than boost kthreads. but it doesn't seem necessary
> > because I find it's optimization is not particularly
> > obvious in the actual scene, I find that tasks with higher priority than boost kthreads
> > will quickly exit the rcu critical area , even if be preempted in the rcu critical area.
> > sorry for the noise.
>
> Thank you for getting back with this information, and no need to
> apologize. We all get excited about a potential change from time to time.
> Part of us maintainers' jobs is to ask hard questions when that appears
> to be happening. ;-)
>
> If you have continued interest in this area, it would be good to keep
> looking. After all, neither RCU expedited grace periods nor RCU priority
> boosting were designed with these new use cases in mind, so it is quite
> likely that there is a useful change to be made in there somewhere.
>
> You see, RCU expedited grace periods were designed for throughput rather
> than latency. The original use case was an old networking API that
> needed to wait for a grace period on each and every one of a series of
> some tens of thousands of system calls. If one or two of those system
> calls took a few hundred milliseconds, but the rest completed in less than
> a millisecond, no harm done. (Yes, there are now newer APIs that allow
> many changes to be made with only the one grace-period wait. But the
> kernel must continue to support the old API: Never Break Userspace.)
>
> For its part, RCU priority boosting was originally designed for
> debugging. The point was to avoid OOMing the system when someone
> misconfigured their application's real-time priorities. As you know,
> such misconfiguration can easily prevent low-priority RCU readers from
> ever completing.
>
> So it is reasonably likely that some change or another is needed. After
> all, new use cases require new functionality and new fixes. The trick
> is figuring out which change makes sense amongst the huge group of other
> possible changes that each add much more complexity than improvement.
> But part of the process of finding that change that makes sense is trying
> out quite a few changes that don't help all that much. ;-)
>
Sorry for the late response, but I think I should comment on it since I
have tried to simulate and test this patch on an Android device. Basically
we do have RT tasks in Android, and I do not see that the patch in
question makes any difference. Actually, I was not able to trigger its
functionality at all.

On the other hand, I have tried to simulate it by creating an RT environment
with SCHED_FIFO tasks and some synchronize_rcu_expedited() users. Indeed
I can trigger it, but it is a very specific environment, and the number of
times it triggers (high-prio tasks being bypassed) is almost zero.

--
Uladzislau Rezki
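
For anyone wanting to reproduce a similar RT environment, the userspace half (putting a task into SCHED_FIFO) might look like the sketch below. It is not the test setup used above, which was not posted: clamp_fifo_prio() and become_fifo() are hypothetical helper names, and the synchronize_rcu_expedited() callers would have to live in kernel code that is not shown here. Switching to SCHED_FIFO normally requires CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, so the call is expected to fail with EPERM for an unprivileged process.

```c
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
static int clamp_fifo_prio(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/*
 * Switch the calling process to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value on failure.
 */
static int become_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = clamp_fifo_prio(prio) };

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return -errno;
	return 0;
}
```

Running several such tasks at priorities above and below the configured rcutree.kthread_prio is one way to get readers that the patch would skip alongside readers it would still boost.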

2022-03-31 02:49:44

by Paul E. McKenney

Subject: Re: [PATCH v3] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Wed, Mar 30, 2022 at 09:35:26PM +0200, Uladzislau Rezki wrote:
> Sorry for the late response, but i think i should comment on it since i
> have tried to simulate and test this patch on Android device. Basically
> we do have RT tasks in Android and i do not see that the patch that is
> in question makes any difference. Actually i was not able to trigger its
> functionality at all.
>
> From the other hand, i have tried to simulate it making an RT environment
> with SCHED_FIFO tasks and some synchronize_rcu_expedited() users. Indeed
> i can trigger it but it is very specific env. and number of triggering or
> tasks bypassing(high prio) is almost zero.

Thank you both!

I will set this aside for the time being. I am sure that further
adjustments will be needed, but time will tell.

Thanx, Paul