LinuxLists.cc - [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

2022-03-04 12:30:29

Subject: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

When RCU_BOOST is enabled, the boost kthreads boost readers that are
blocking a given grace period. If the current reader task has a higher
priority than the boost kthread (the boost kthread's priority is not
always 1, since kthread_prio may be set), boosting it is useless. Skip
that task and select the next task to boost, which reduces the time
needed for a given grace period.

Signed-off-by: Zqiang <[email protected]>
---
kernel/rcu/tree_plugin.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c3d212bc5338..d35b6da66bbd 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -12,6 +12,7 @@
  */
 
 #include "../locking/rtmutex_common.h"
+#include <linux/sched/deadline.h>
 
 static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
 {
@@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
 	 * section.
 	 */
 	t = container_of(tb, struct task_struct, rcu_node_entry);
+	if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
+		tb = rcu_next_node_entry(t, rnp);
+		WRITE_ONCE(rnp->boost_tasks, tb);
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		goto end;
+	}
+
 	rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	/* Lock only for side effect: boosts task t's priority. */
 	rt_mutex_lock(&rnp->boost_mtx);
 	rt_mutex_unlock(&rnp->boost_mtx);  /* Then keep lockdep happy. */
 	rnp->n_boosts++;
-
+end:
 	return READ_ONCE(rnp->exp_tasks) != NULL ||
 	       READ_ONCE(rnp->boost_tasks) != NULL;
 }
--
2.25.1
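
For readers following the thread, the check added to rcu_boost() can be modelled in user space. The sketch below is not kernel code. All model_* names are made up for illustration, and it assumes the usual kernel convention that a smaller ->prio value means a higher priority, that a SCHED_FIFO thread with rt_priority N gets prio 99 - N, and that SCHED_DEADLINE tasks carry a negative prio.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's MAX_RT_PRIO; the model does not include kernel headers. */
#define MODEL_MAX_RT_PRIO 100

/* Hypothetical stand-in for the one task_struct field the check reads. */
struct model_task {
	int prio;	/* kernel-style priority: lower value is more urgent */
};

/* Kernel-style prio of a SCHED_FIFO thread with the given rt_priority (1..99). */
int model_fifo_prio(int rt_priority)
{
	return MODEL_MAX_RT_PRIO - 1 - rt_priority;
}

/* Stand-in for dl_task(): deadline tasks are assumed to sit below prio 0. */
bool model_dl_task(const struct model_task *t)
{
	return t->prio < 0;
}

/*
 * Mirrors the condition added by the patch: skip the reader when no
 * expedited grace period is pending and the reader already runs at the
 * booster's priority or better.
 */
bool model_skip_boost(const struct model_task *reader,
		      const struct model_task *booster,
		      bool exp_tasks_pending)
{
	return !exp_tasks_pending &&
	       (model_dl_task(reader) || reader->prio <= booster->prio);
}

/* A few cases, with the boost kthread running at kthread_prio=2. */
void model_skip_boost_examples(void)
{
	struct model_task booster = { model_fifo_prio(2) };
	struct model_task fifo50 = { model_fifo_prio(50) };
	struct model_task cfs = { 120 };	/* a typical non-RT task */

	assert(model_skip_boost(&fifo50, &booster, false));	/* skipped */
	assert(!model_skip_boost(&cfs, &booster, false));	/* boosted */
	assert(!model_skip_boost(&fifo50, &booster, true));	/* expedited GP: boosted */
}
```

Note that the last case is the one questioned later in the thread: with rnp->exp_tasks set, the patch as posted still boosts a reader that outranks the boost kthread.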

2022-03-04 20:44:53

by Neeraj Upadhyay

Subject: Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On 3/4/2022 2:56 PM, Zqiang wrote:
> When RCU_BOOST is enabled, the boost kthreads will boosting readers
> who are blocking a given grace period, if the current reader tasks
> have a higher priority than boost kthreads(the boost kthreads priority
> not always 1, if the kthread_prio is set), boosting is useless, skip
> current task and select next task to boosting, reduce the time for a
> given grace period.
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree_plugin.h | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c3d212bc5338..d35b6da66bbd 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -12,6 +12,7 @@
> */
>
> #include "../locking/rtmutex_common.h"
> +#include <linux/sched/deadline.h>
>
> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> {
> @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> * section.
> */
> t = container_of(tb, struct task_struct, rcu_node_entry);
> + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> + tb = rcu_next_node_entry(t, rnp);
> + WRITE_ONCE(rnp->boost_tasks, tb);
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + goto end;
> + }
> +
> rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> /* Lock only for side effect: boosts task t's priority. */
> rt_mutex_lock(&rnp->boost_mtx);
> rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. */
> rnp->n_boosts++;
> -
> +end:

Nit: maybe rename the label to "skip_boost:" ?

The code looks fine. Out of curiosity, though: given that higher-priority
tasks would, in general, exit their read-side critical sections quickly
and boost the next blocking reader when they do, do you see a noticeable
reduction in grace-period timings with this change for certain types of
workloads?

Thanks
Neeraj

> return READ_ONCE(rnp->exp_tasks) != NULL ||
> READ_ONCE(rnp->boost_tasks) != NULL;
> }

2022-03-07 09:53:57

by Zqiang

Subject: RE: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On 3/4/2022 2:56 PM, Zqiang wrote:
> When RCU_BOOST is enabled, the boost kthreads will boosting readers
> who are blocking a given grace period, if the current reader tasks
> have a higher priority than boost kthreads(the boost kthreads priority
> not always 1, if the kthread_prio is set), boosting is useless, skip
> current task and select next task to boosting, reduce the time for a
> given grace period.
>
> Signed-off-by: Zqiang <[email protected]>
> ---
> kernel/rcu/tree_plugin.h | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index c3d212bc5338..d35b6da66bbd 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -12,6 +12,7 @@
> */
>
> #include "../locking/rtmutex_common.h"
> +#include <linux/sched/deadline.h>
>
> static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> {
> @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> * section.
> */
> t = container_of(tb, struct task_struct, rcu_node_entry);
> + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> + tb = rcu_next_node_entry(t, rnp);
> + WRITE_ONCE(rnp->boost_tasks, tb);
> + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> + goto end;
> + }
> +
> rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> /* Lock only for side effect: boosts task t's priority. */
> rt_mutex_lock(&rnp->boost_mtx);
> rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. */
> rnp->n_boosts++;
> -
> +end:
>>
>>Nit: maybe rename the label to "skip_boost:" ?
>>
>>Code looks fine; however, out of curiosity; given that the higher
>>priority tasks, in general, would exit their read side critical section
>>quickly and boost the next blocking reader on exiting their read side
>>section; do you see noticeable reduction in grace period timings with
>>the change for certain type of workloads?

Thanks for the feedback. On PREEMPT_RT systems there are many real-time threads (most
of them created by users), and their priority may be higher or lower than that of the boost
kthreads (when kthread_prio is set). An RT task with a higher priority than the boost kthreads
may exit its read-side critical section quickly, or it may not if it is preempted by an even
higher-priority task. Attempting to boost it anyway increases the time the boost kthread spends
waiting, and as a result the next blocked tasks cannot be boosted in time. Of course, I don't
deny that an inappropriate user priority setting is also part of the cause.

Thanks
Zqiang

>>
>>
>>Thanks
>>Neeraj

> return READ_ONCE(rnp->exp_tasks) != NULL ||
> READ_ONCE(rnp->boost_tasks) != NULL;
> }
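
The effect described in this reply, where the boost kthread moves on to the next blocked reader instead of waiting on one it cannot help, can be sketched with a small user-space model. Everything here is hypothetical: the model_* names only echo the kernel's, the blocked-task list is a plain array, and only the normal grace-period path is covered. As in the kernel, a smaller prio value is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model; field names are illustrative, not the kernel's layout. */
struct model_node {
	const int *blkd_prio;	/* kernel-style prio of each blocked reader */
	size_t n_blkd;		/* number of blocked readers */
	size_t boost_idx;	/* stand-in for rnp->boost_tasks */
	unsigned long n_boosts;	/* stand-in for rnp->n_boosts */
};

/*
 * One pass of the boost kthread. Returns true while more readers
 * remain to be examined, so the caller should invoke it again.
 */
bool model_rcu_boost(struct model_node *rnp, int booster_prio)
{
	if (rnp->boost_idx >= rnp->n_blkd)
		return false;	/* Nothing left to boost. */

	if (rnp->blkd_prio[rnp->boost_idx] <= booster_prio) {
		/* Reader already outranks the booster: skip it. */
		rnp->boost_idx++;
		return rnp->boost_idx < rnp->n_blkd;
	}

	/*
	 * Simplification: the kernel blocks on an rt_mutex here, and the
	 * cursor advances when the boosted reader leaves its critical
	 * section. The model performs both steps inline.
	 */
	rnp->n_boosts++;
	rnp->boost_idx++;
	return rnp->boost_idx < rnp->n_blkd;
}

/* Drive the model until the list is exhausted; return the boost count. */
unsigned long model_boost_all(struct model_node *rnp, int booster_prio)
{
	while (model_rcu_boost(rnp, booster_prio))
		;
	return rnp->n_boosts;
}

/* Five blocked readers, booster at kernel-style prio 97 (kthread_prio=2). */
void model_boost_all_example(void)
{
	const int prios[] = { 49, 120, 10, 98, 130 };
	struct model_node rnp = { prios, 5, 0, 0 };

	/* The readers at prio 49 and 10 are skipped; the other three are boosted. */
	assert(model_boost_all(&rnp, 97) == 3);
}
```

Without the skip branch, the model would count five boosts, two of which could not raise the reader's priority at all.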

2022-03-07 20:55:46

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> On 3/4/2022 2:56 PM, Zqiang wrote:
> > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > who are blocking a given grace period, if the current reader tasks
> > have a higher priority than boost kthreads(the boost kthreads priority
> > not always 1, if the kthread_prio is set), boosting is useless, skip
> > current task and select next task to boosting, reduce the time for a
> > given grace period.
> >
> > Signed-off-by: Zqiang <[email protected]>

Adding to CC to get more eyes on this. I am not necessarily opposed to
it, but I don't do that much RT work myself these days.

Thanx, Paul

> > ---
> > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > 1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index c3d212bc5338..d35b6da66bbd 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -12,6 +12,7 @@
> > */
> >
> > #include "../locking/rtmutex_common.h"
> > +#include <linux/sched/deadline.h>
> >
> > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > {
> > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > * section.
> > */
> > t = container_of(tb, struct task_struct, rcu_node_entry);
> > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > + tb = rcu_next_node_entry(t, rnp);
> > + WRITE_ONCE(rnp->boost_tasks, tb);
> > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > + goto end;
> > + }
> > +
> > rt_mutex_init_proxy_locked(&rnp->boost_mtx.rtmutex, t);
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > /* Lock only for side effect: boosts task t's priority. */
> > rt_mutex_lock(&rnp->boost_mtx);
> > rt_mutex_unlock(&rnp->boost_mtx); /* Then keep lockdep happy. */
> > rnp->n_boosts++;
> > -
> > +end:
> >>
> >>Nit: maybe rename the label to "skip_boost:" ?
> >>
> >>Code looks fine; however, out of curiosity; given that the higher
> >>priority tasks, in general, would exit their read side critical section
> >>quickly and boost the next blocking reader on exiting their read side
> >>section; do you see noticeable reduction in grace period timings with
> >>the change for certain type of workloads?
>
> Thanks for feedback , In preempt-RT systems, there will be many real-time threads (most
> of them are created by users themselves ), their priority is higher or lower than boost kthreads
> (kthread_prio is set), for rt tasks with higher priority than boost kthreads, maybe it will exit
> read side critical quickly, maybe not, if it is preempted by a higher priority task, If try to boost operation,
> this increases the boosts kthread waiting time, as a result, the next blkd tasks cannot be
> boosted in time. of course, I don't deny it, there are also reasons that user priority setting is inappropriate.
>
> Thanks
> Zqiang
>
> >>
> >>
> >>Thanks
> >>Neeraj
>
> > return READ_ONCE(rnp->exp_tasks) != NULL ||
> > READ_ONCE(rnp->boost_tasks) != NULL;
> > }

2022-03-08 21:43:00

by Uladzislau Rezki

Subject: Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

> On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> > On 3/4/2022 2:56 PM, Zqiang wrote:
> > > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > > who are blocking a given grace period, if the current reader tasks
> > > have a higher priority than boost kthreads(the boost kthreads priority
> > > not always 1, if the kthread_prio is set), boosting is useless, skip
> > > current task and select next task to boosting, reduce the time for a
> > > given grace period.
> > >
> > > Signed-off-by: Zqiang <[email protected]>
>
> Adding to CC to get more eyes on this. I am not necessarily opposed to
> it, but I don't do that much RT work myself these days.
>
> Thanx, Paul
>
> > > ---
> > > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index c3d212bc5338..d35b6da66bbd 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -12,6 +12,7 @@
> > > */
> > >
> > > #include "../locking/rtmutex_common.h"
> > > +#include <linux/sched/deadline.h>
> > >
> > > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > > {
> > > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > > * section.
> > > */
> > > t = container_of(tb, struct task_struct, rcu_node_entry);
> > > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > > + tb = rcu_next_node_entry(t, rnp);
> > > + WRITE_ONCE(rnp->boost_tasks, tb);
> > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > + goto end;
> > > + }
> > > +
Why do you exempt the expedited grace period and boost every task in that
case? In the same way, an expedited GP can be blocked by higher-priority
SCHED_DEADLINE or SCHED_FIFO tasks.

Thanks!

--
Vlad Rezki

2022-03-08 23:57:57

by Paul E. McKenney

Subject: Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Tue, Mar 08, 2022 at 07:04:21PM +0100, Uladzislau Rezki wrote:
> > On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> > > On 3/4/2022 2:56 PM, Zqiang wrote:
> > > > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > > > who are blocking a given grace period, if the current reader tasks
> > > > have a higher priority than boost kthreads(the boost kthreads priority
> > > > not always 1, if the kthread_prio is set), boosting is useless, skip
> > > > current task and select next task to boosting, reduce the time for a
> > > > given grace period.
> > > >
> > > > Signed-off-by: Zqiang <[email protected]>
> >
> > Adding to CC to get more eyes on this. I am not necessarily opposed to
> > it, but I don't do that much RT work myself these days.
> >
> > Thanx, Paul
> >
> > > > ---
> > > > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > index c3d212bc5338..d35b6da66bbd 100644
> > > > --- a/kernel/rcu/tree_plugin.h
> > > > +++ b/kernel/rcu/tree_plugin.h
> > > > @@ -12,6 +12,7 @@
> > > > */
> > > >
> > > > #include "../locking/rtmutex_common.h"
> > > > +#include <linux/sched/deadline.h>
> > > >
> > > > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > > > {
> > > > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > > > * section.
> > > > */
> > > > t = container_of(tb, struct task_struct, rcu_node_entry);
> > > > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > > > + tb = rcu_next_node_entry(t, rnp);
> > > > + WRITE_ONCE(rnp->boost_tasks, tb);
> > > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > + goto end;
> > > > + }
> > > > +
> Why do you bypass the expedited grace period and boost any tasks anyway?
> Same way the expedited gp can be blocked by higher prior tasks SCHED_DEADLINE
> or SCHED_FIFO.

Just to make sure that I understand...

Are you pointing out that a SCHED_DEADLINE task might have exhausted
its budget, so that boosting might nonetheless be helpful?

Me, I honestly don't know what happens in that case, so I am just asking
the question. And adding Juri on CC. ;-)

Thanx, Paul

2022-03-09 01:28:38

by Uladzislau Rezki

Subject: Re: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

On Tue, Mar 08, 2022 at 10:13:55AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 08, 2022 at 07:04:21PM +0100, Uladzislau Rezki wrote:
> > > On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> > > > On 3/4/2022 2:56 PM, Zqiang wrote:
> > > > > When RCU_BOOST is enabled, the boost kthreads will boosting readers
> > > > > who are blocking a given grace period, if the current reader tasks
> > > > > have a higher priority than boost kthreads(the boost kthreads priority
> > > > > not always 1, if the kthread_prio is set), boosting is useless, skip
> > > > > current task and select next task to boosting, reduce the time for a
> > > > > given grace period.
> > > > >
> > > > > Signed-off-by: Zqiang <[email protected]>
> > >
> > > Adding to CC to get more eyes on this. I am not necessarily opposed to
> > > it, but I don't do that much RT work myself these days.
> > >
> > > Thanx, Paul
> > >
> > > > > ---
> > > > > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > > > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > > > index c3d212bc5338..d35b6da66bbd 100644
> > > > > --- a/kernel/rcu/tree_plugin.h
> > > > > +++ b/kernel/rcu/tree_plugin.h
> > > > > @@ -12,6 +12,7 @@
> > > > > */
> > > > >
> > > > > #include "../locking/rtmutex_common.h"
> > > > > +#include <linux/sched/deadline.h>
> > > > >
> > > > > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > > > > {
> > > > > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > > > > * section.
> > > > > */
> > > > > t = container_of(tb, struct task_struct, rcu_node_entry);
> > > > > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > > > > + tb = rcu_next_node_entry(t, rnp);
> > > > > + WRITE_ONCE(rnp->boost_tasks, tb);
> > > > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > > > + goto end;
> > > > > + }
> > > > > +
> > Why do you bypass the expedited grace period and boost any tasks anyway?
> > Same way the expedited gp can be blocked by higher prior tasks SCHED_DEADLINE
> > or SCHED_FIFO.
>
> Just to make sure that I understand...
>
> Are you pointing out that a SCHED_DEADLINE task might have exhausted
> its budget, so that boosting might nonetheless be helpful?
>
We cannot preempt or otherwise stop a SCHED_DEADLINE task (it is the
highest-priority class); it has a budget that it makes use of. If it is in
a critical section, it will leave ASAP (I am not taking IRQs and so on
into account here). I do not see a reason to boost it.

>
> Me, I honestly don't know what happens in that case, so I am just asking
> the question. And adding Juri on CC. ;-)
>
Juri should know more :)

--
Vlad Rezki

2022-03-09 03:13:44

by Zqiang

Subject: RE: [PATCH] rcu: Only boost rcu reader tasks with lower priority than boost kthreads

> On Mon, Mar 07, 2022 at 02:03:17AM +0000, Zhang, Qiang1 wrote:
> > On 3/4/2022 2:56 PM, Zqiang wrote:
> > > When RCU_BOOST is enabled, the boost kthreads will boosting
> > > readers who are blocking a given grace period, if the current
> > > reader tasks have a higher priority than boost kthreads(the boost
> > > kthreads priority not always 1, if the kthread_prio is set),
> > > boosting is useless, skip current task and select next task to
> > > boosting, reduce the time for a given grace period.
> > >
> > > Signed-off-by: Zqiang <[email protected]>
>
> Adding to CC to get more eyes on this. I am not necessarily opposed
> to it, but I don't do that much RT work myself these days.
>
> Thanx, Paul
>
> > > ---
> > > kernel/rcu/tree_plugin.h | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index c3d212bc5338..d35b6da66bbd 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -12,6 +12,7 @@
> > > */
> > >
> > > #include "../locking/rtmutex_common.h"
> > > +#include <linux/sched/deadline.h>
> > >
> > > static bool rcu_rdp_is_offloaded(struct rcu_data *rdp)
> > > {
> > > @@ -1065,13 +1066,20 @@ static int rcu_boost(struct rcu_node *rnp)
> > > * section.
> > > */
> > > t = container_of(tb, struct task_struct, rcu_node_entry);
> > > + if (!rnp->exp_tasks && (dl_task(t) || t->prio <= current->prio)) {
> > > + tb = rcu_next_node_entry(t, rnp);
> > > + WRITE_ONCE(rnp->boost_tasks, tb);
> > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > + goto end;
> > > + }
> > > +
>>>>Why do you bypass the expedited grace period and boost any tasks anyway?
>>>>Same way the expedited gp can be blocked by higher prior tasks SCHED_DEADLINE or SCHED_FIFO.

Thanks for the advice. The patch does indeed ignore the expedited grace period. Maybe it
should add a 'boost_exp_tasks' pointer that points to 'rnp->exp_tasks' and do the same thing
as for a normal grace period.

Thanks
Zqiang

>>>>
>>>>Thanks!
>>>>
>>>>--
>>>>Vlad Rezki
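
One way to picture the 'boost_exp_tasks' idea from this last reply is a second cursor that gets the same priority check as the normal one. The thread does not spell out a design, so the sketch below is purely hypothetical: it is a user-space model, the two reader lists are separate arrays for simplicity, and every name in it is invented for illustration. A smaller prio value is again assumed to mean a higher priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not kernel code. */
struct model_cursor {
	const int *prio;	/* kernel-style prio of each blocked reader */
	size_t len;
	size_t idx;		/* next reader to examine */
};

struct model_rnp {
	struct model_cursor boost_exp_tasks;	/* hypothetical, expedited GPs */
	struct model_cursor boost_tasks;	/* normal GPs */
	unsigned long n_boosts;
};

/* Advance past readers the booster cannot help; true if one remains. */
static bool model_next_boostable(struct model_cursor *c, int booster_prio)
{
	while (c->idx < c->len && c->prio[c->idx] <= booster_prio)
		c->idx++;
	return c->idx < c->len;
}

/*
 * One boost attempt. Expedited readers are served first, but the same
 * priority check applies to both cursors. Returns true if a reader
 * was boosted.
 */
bool model_boost_once(struct model_rnp *rnp, int booster_prio)
{
	struct model_cursor *c = &rnp->boost_exp_tasks;

	if (!model_next_boostable(c, booster_prio)) {
		c = &rnp->boost_tasks;
		if (!model_next_boostable(c, booster_prio))
			return false;
	}
	c->idx++;	/* Model: the reader is boosted and later dequeued. */
	rnp->n_boosts++;
	return true;
}

/* Two expedited and three normal readers, booster at prio 97. */
void model_boost_once_example(void)
{
	const int exp_prio[] = { 10, 120 };
	const int gp_prio[] = { 49, 130, 98 };
	struct model_rnp rnp = {
		{ exp_prio, 2, 0 }, { gp_prio, 3, 0 }, 0
	};

	while (model_boost_once(&rnp, 97))
		;
	/* The readers at prio 10 and 49 are skipped on both lists. */
	assert(rnp.n_boosts == 3);
}
```

Whether skipping is appropriate for SCHED_DEADLINE readers that have exhausted their budget is the open question raised earlier in the thread, and this model does not attempt to answer it.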