2022-05-09 09:30:05

by Uladzislau Rezki

[permalink] [raw]
Subject: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

Introduce a RCU_NOCB_CPU_CB_BOOST kernel option. So a user can
decide if an offloading has to be done in a high-prio context or
not. Please note an option depends on RCU_NOCB_CPU and RCU_BOOST
parameters and by default it is off.

This patch splits the boosting preempted RCU readers and those
kthreads which directly responsible for driving expedited grace
periods forward with enabling/disabling the offloading from/to
SCHED_FIFO/SCHED_OTHER contexts.

The main reason of such split is, for example on Android there
are some workloads which require fast expedited grace period to
be done whereas offloading in RT context can lead to starvation
and hogging a CPU for a long time what is not acceptable for
latency sensitive environment. For instance:

<snip>
<...>-60 [006] d..1 2979.028717: rcu_batch_start: rcu_preempt CBs=34619 bl=270
<snip>

invoking 34 619 callbacks will take time thus making other CFS
tasks waiting in run-queue to be starved due to such behaviour.

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
---
kernel/rcu/Kconfig | 14 ++++++++++++++
kernel/rcu/tree.c | 5 ++++-
kernel/rcu/tree_nocb.h | 3 ++-
3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 27aab870ae4c..074630b94902 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -275,6 +275,20 @@ config RCU_NOCB_CPU_DEFAULT_ALL
Say Y here if you want offload all CPUs by default on boot.
Say N here if you are unsure.

+config RCU_NOCB_CPU_CB_BOOST
+ bool "Perform offloading from real-time kthread"
+ depends on RCU_NOCB_CPU && RCU_BOOST
+ default n
+ help
+ Use this option to offload callbacks from the SCHED_FIFO context
+ to make the process faster. As a side effect of this approach is
+ a latency especially for the SCHED_OTHER tasks which will not be
+ able to preempt an offloading kthread. That latency depends on a
+ number of callbacks to be invoked.
+
+ Say Y here if you want to set RT priority for offloading kthreads.
+ Say N here if you are unsure.
+
config TASKS_TRACE_RCU_READ_MB
bool "Tasks Trace RCU readers use memory barriers in user and idle"
depends on RCU_EXPERT && TASKS_TRACE_RCU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9dc4c4e82db6..d769a15bc0e3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -154,7 +154,10 @@ static void sync_sched_exp_online_cleanup(int cpu);
static void check_cb_ovld_locked(struct rcu_data *rdp, struct rcu_node *rnp);
static bool rcu_rdp_is_offloaded(struct rcu_data *rdp);

-/* rcuc/rcub/rcuop kthread realtime priority */
+/*
+ * rcuc/rcub/rcuop kthread realtime priority. The former
+ * depends on if CONFIG_RCU_NOCB_CPU_CB_BOOST is set.
+ */
static int kthread_prio = IS_ENABLED(CONFIG_RCU_BOOST) ? 1 : 0;
module_param(kthread_prio, int, 0444);

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 60cc92cc6655..a2823be9b1d0 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1315,8 +1315,9 @@ static void rcu_spawn_cpu_nocb_kthread(int cpu)
if (WARN_ONCE(IS_ERR(t), "%s: Could not start rcuo CB kthread, OOM is now expected behavior\n", __func__))
goto end;

- if (kthread_prio)
+ if (IS_ENABLED(CONFIG_RCU_NOCB_CPU_CB_BOOST))
sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
+
WRITE_ONCE(rdp->nocb_cb_kthread, t);
WRITE_ONCE(rdp->nocb_gp_kthread, rdp_gp->nocb_gp_kthread);
return;
--
2.30.2



2022-05-09 10:08:05

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
> [...]
> > > > > > > > One easy way to make this work would be to invert the sense of this
> > > > > > > > Kconfig option ("RCU_NOCB_CB_NO_BOOST"?), continue having it default to
> > > > > > > > "n", but then select it somewhere in drivers/android/Kconfig. But I
> > > > > > > > would not be surprised if there is a better way.
> > > > >
> > > > > In that situation probably we should just enable it by default.
> > > >
> > > > You are within your rights to cause it to be enabled by default -within-
> > > > -Android-. You are -not- within your rights to break other workloads.
> > > >
> > > > If ChromeOS needs it too, they too can enable it -within- -ChromeOS-.
> > > >
> > > > It is not -that- hard, guys! ;-)
> > >
> > > I think on the topic of RT, +Steven Rostedt should chime in as well
> > > considering he wrote a good chunk of the RT scheduler ;-). Personally,
> > > I feel the issue of "rcu callback offload" threads running as RT or
> > > not should not be a matter of CONFIG option or the system in concern.
> > > Instead it should be a function of how many callbacks there are to
> > > run. The reason I say this is, RT threads should not be doing a lot
> > > of work anyway, lest they cause RT throttling and starvation of other
> > > threads.
> >
> > This gets complicated surprisingly quickly. For but one example, you
> > would find yourself wanting time-based boosting, most likely before you
> > wanted boosting based on numbers of callbacks. And it is all too easy
> > to drive considerably complexity into the mix before proving that it is
> > really needed. Especially given how rare the need for RCU priority
> > boosting is to begin with.
>
> I think this patch does not deal with or change the behavior of
> dynamic priority boosting preempted RCU readers, but rather it makes
> it such that the no-cb offload threads that execute the callbacks. So
> I am not sure why you are talking about the boosting behavior of
> preempted RCU readers? I was referring only to the nocb offload
> kthreads which as I understand, Vlad *does not* want to run at RT
> priority.

OK. Exactly what is the problem that you are trying to solve? ;-)

And yes, I fully understand that Uladzislau does not want to run the rcuo
kthreads at RT priority, even in kernels built with CONFIG_RCU_BOOST=y.
Which makes sense, given that he is looking to solve a very different
problem than CONFIG_RCU_BOOST was designed to solve. So adjustments must
be made. The discussion is the exact form of the next set of adjustments,
which I expect to be quite straightforward.

> > > Also, I think it is wrong to assume that a certain kind of system will
> > > always have a certain number of callbacks to process at a time. That
> > > seems prone to poor design due to assumptions which may not always be
> > > true.
> >
> > Who was assuming that? Uladzislau was measuring rather than assuming,
> > if that was what you were getting at. Or if you are thinking about
> > things like qhimark, your point is exactly why there is both a default
> > (which has worked quite well for a very long time) and the ability to
> > adjust based on the needs of your specific system.
>
> I was merely saying that based on measurements make assumptions, but
> in the real world the assumption may not be true, then everything
> falls apart. Instead I feel, callback threads should be RT only if 1.
> As you mentioned, the time based thing. 2. If the CB list is long and
> there's lot of processing. But instead, if it is made a CONFIG option,
> then that forces a fixed behavior which may fall apart in the real
> world. I think adding more CONFIGs and special cases is more complex
> but that's my opinion.

Again, exactly what problem are you trying to solve?

From what I can see, Uladzislau's issue can be addressed by statically
setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
is on exactly how RCU is to be informed of this, at kernel build time.

> > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > and only process few callbacks at a time, while another which is the
> > > lower priority CFS offload thread - executes whenever there is a lot
> > > of CBs pending? Just a thought.
> >
> > How about if we start by solving the problems we know that we have?
>
> I don't know why you would say that, because we are talking about
> solving the specific problem Vlad's patch addresses, not random
> problems. Which is that, Android wants to run expedited GPs, but when
> the callback list is large, the RT nocb thread can starve other
> things. Did I misunderstand the patch? If so, sorry about that but
> that's what my email was discussing. i.e. running of CBs in RT
> threads. I suck at writing well as I clearly miscommunicated.

OK.

Why do you believe that this needs anything other than small adjustments
the defaults of existing Kconfig options? Or am I completely missing
the point of your proposal?

> > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > and increasing burden on the user to get it the CONFIG right.
> >
> > I bet that we will solve this without adding any new Kconfig options.
> > And I bet that the burden is at worst on the device designer, not on
> > the user. Plus it is entirely possible that there might be a way to
> > automatically configure things to handle what we know about today,
> > again without adding Kconfig options.
>
> Yes, agreed.

If I change my last sentence to read as follows, are we still in
agreement?

Plus it is entirely possible that there might be a way to
automatically configure things to handle what we know about today,
again without adding Kconfig options and without changing runtime
code beyond that covered by Uladzislau's patch.

Thanx, Paul

2022-05-09 17:24:59

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <[email protected]> wrote:
>
> On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
[...]
> > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > always have a certain number of callbacks to process at a time. That
> > > > seems prone to poor design due to assumptions which may not always be
> > > > true.
> > >
> > > Who was assuming that? Uladzislau was measuring rather than assuming,
> > > if that was what you were getting at. Or if you are thinking about
> > > things like qhimark, your point is exactly why there is both a default
> > > (which has worked quite well for a very long time) and the ability to
> > > adjust based on the needs of your specific system.
> >
> > I was merely saying that based on measurements make assumptions, but
> > in the real world the assumption may not be true, then everything
> > falls apart. Instead I feel, callback threads should be RT only if 1.
> > As you mentioned, the time based thing. 2. If the CB list is long and
> > there's lot of processing. But instead, if it is made a CONFIG option,
> > then that forces a fixed behavior which may fall apart in the real
> > world. I think adding more CONFIGs and special cases is more complex
> > but that's my opinion.
>
> Again, exactly what problem are you trying to solve?
>
> From what I can see, Uladzislau's issue can be addressed by statically
> setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
> is on exactly how RCU is to be informed of this, at kernel build time.
>
> > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > and only process few callbacks at a time, while another which is the
> > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > of CBs pending? Just a thought.
> > >
> > > How about if we start by solving the problems we know that we have?
> >
> > I don't know why you would say that, because we are talking about
> > solving the specific problem Vlad's patch addresses, not random
> > problems. Which is that, Android wants to run expedited GPs, but when
> > the callback list is large, the RT nocb thread can starve other
> > things. Did I misunderstand the patch? If so, sorry about that but
> > that's what my email was discussing. i.e. running of CBs in RT
> > threads. I suck at writing well as I clearly miscommunicated.
>
> OK.
>
> Why do you believe that this needs anything other than small adjustments
> the defaults of existing Kconfig options? Or am I completely missing
> the point of your proposal?
>
> > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > and increasing burden on the user to get it the CONFIG right.
> > >
> > > I bet that we will solve this without adding any new Kconfig options.
> > > And I bet that the burden is at worst on the device designer, not on
> > > the user. Plus it is entirely possible that there might be a way to
> > > automatically configure things to handle what we know about today,
> > > again without adding Kconfig options.
> >
> > Yes, agreed.
>
> If I change my last sentence to read as follows, are we still in
> agreement?
>
> Plus it is entirely possible that there might be a way to
> automatically configure things to handle what we know about today,
> again without adding Kconfig options and without changing runtime
> code beyond that covered by Uladzislau's patch.

Yes, actually the automatic configuration of things is what I meant,
that's the "problem" I was referring to, where the system does the
right thing for a broader range of systems, without requiring the
users to find RCU issues and hand-tune them (that requires said users
to have tracing and debugging skills and get lucky finding a problem).
To be fair, I did not propose any solutions to such problems either,
it is just some ideas. I don't like knobs too much and I don't trust
users or system designers to get them right most of the time.

In that sense, I don't think making rcuo threads run as RT or not
(which this patch does) is really fixing the problems. In one case,
you might have priority inversion, in another case you might cause
starvation. Probably what is needed is best of both worlds. That said,
I don't have better solutions right now than what I mentioned, which
is to assign priorities to the callbacks themselves and run them in
threads of different priorities.

For the record, I am not against the patch or anything like that (and
even if I was, I am not sure that it matters for merging :P)

Thanks,

- Joel

2022-05-09 18:19:25

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <[email protected]> wrote:
> >
> > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
> [...]
> > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > always have a certain number of callbacks to process at a time. That
> > > > > seems prone to poor design due to assumptions which may not always be
> > > > > true.
> > > >
> > > > Who was assuming that? Uladzislau was measuring rather than assuming,
> > > > if that was what you were getting at. Or if you are thinking about
> > > > things like qhimark, your point is exactly why there is both a default
> > > > (which has worked quite well for a very long time) and the ability to
> > > > adjust based on the needs of your specific system.
> > >
> > > I was merely saying that based on measurements make assumptions, but
> > > in the real world the assumption may not be true, then everything
> > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > then that forces a fixed behavior which may fall apart in the real
> > > world. I think adding more CONFIGs and special cases is more complex
> > > but that's my opinion.
> >
> > Again, exactly what problem are you trying to solve?
> >
> > From what I can see, Uladzislau's issue can be addressed by statically
> > setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
> > is on exactly how RCU is to be informed of this, at kernel build time.
> >
> > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > and only process few callbacks at a time, while another which is the
> > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > of CBs pending? Just a thought.
> > > >
> > > > How about if we start by solving the problems we know that we have?
> > >
> > > I don't know why you would say that, because we are talking about
> > > solving the specific problem Vlad's patch addresses, not random
> > > problems. Which is that, Android wants to run expedited GPs, but when
> > > the callback list is large, the RT nocb thread can starve other
> > > things. Did I misunderstand the patch? If so, sorry about that but
> > > that's what my email was discussing. i.e. running of CBs in RT
> > > threads. I suck at writing well as I clearly miscommunicated.
> >
> > OK.
> >
> > Why do you believe that this needs anything other than small adjustments
> > the defaults of existing Kconfig options? Or am I completely missing
> > the point of your proposal?
> >
> > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > and increasing burden on the user to get it the CONFIG right.
> > > >
> > > > I bet that we will solve this without adding any new Kconfig options.
> > > > And I bet that the burden is at worst on the device designer, not on
> > > > the user. Plus it is entirely possible that there might be a way to
> > > > automatically configure things to handle what we know about today,
> > > > again without adding Kconfig options.
> > >
> > > Yes, agreed.
> >
> > If I change my last sentence to read as follows, are we still in
> > agreement?
> >
> > Plus it is entirely possible that there might be a way to
> > automatically configure things to handle what we know about today,
> > again without adding Kconfig options and without changing runtime
> > code beyond that covered by Uladzislau's patch.
>
> Yes, actually the automatic configuration of things is what I meant,
> that's the "problem" I was referring to, where the system does the
> right thing for a broader range of systems, without requiring the
> users to find RCU issues and hand-tune them (that requires said users
> to have tracing and debugging skills and get lucky finding a problem).
> To be fair, I did not propose any solutions to such problems either,
> it is just some ideas. I don't like knobs too much and I don't trust
> users or system designers to get them right most of the time.
>
> In that sense, I don't think making rcuo threads run as RT or not
> (which this patch does) is really fixing the problems. In one case,
> you might have priority inversion, in another case you might cause
> starvation. Probably what is needed is best of both worlds. That said,
> I don't have better solutions right now than what I mentioned, which
> is to assign priorities to the callbacks themselves and run them in
> threads of different priorities.
>
> For the record, I am not against the patch or anything like that (and
> even if I was, I am not sure that it matters for merging :P)

Fair enough!

And for the record at this end, I would not be surprised if in 2032
RCU offloaded callback invocation has sophisticated dynamic tuning of
priorities and much else besides. But one step at a time! ;-)

Thanx, Paul

2022-05-09 18:34:38

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <[email protected]> wrote:
> > >
> > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
> > [...]
> > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > true.
> > > > >
> > > > > Who was assuming that? Uladzislau was measuring rather than assuming,
> > > > > if that was what you were getting at. Or if you are thinking about
> > > > > things like qhimark, your point is exactly why there is both a default
> > > > > (which has worked quite well for a very long time) and the ability to
> > > > > adjust based on the needs of your specific system.
> > > >
> > > > I was merely saying that based on measurements make assumptions, but
> > > > in the real world the assumption may not be true, then everything
> > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > then that forces a fixed behavior which may fall apart in the real
> > > > world. I think adding more CONFIGs and special cases is more complex
> > > > but that's my opinion.
> > >
> > > Again, exactly what problem are you trying to solve?
> > >
> > > From what I can see, Uladzislau's issue can be addressed by statically
> > > setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
> > > is on exactly how RCU is to be informed of this, at kernel build time.
> > >
> > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > and only process few callbacks at a time, while another which is the
> > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > of CBs pending? Just a thought.
> > > > >
> > > > > How about if we start by solving the problems we know that we have?
> > > >
> > > > I don't know why you would say that, because we are talking about
> > > > solving the specific problem Vlad's patch addresses, not random
> > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > the callback list is large, the RT nocb thread can starve other
> > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > threads. I suck at writing well as I clearly miscommunicated.
> > >
> > > OK.
> > >
> > > Why do you believe that this needs anything other than small adjustments
> > > the defaults of existing Kconfig options? Or am I completely missing
> > > the point of your proposal?
> > >
> > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > >
> > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > the user. Plus it is entirely possible that there might be a way to
> > > > > automatically configure things to handle what we know about today,
> > > > > again without adding Kconfig options.
> > > >
> > > > Yes, agreed.
> > >
> > > If I change my last sentence to read as follows, are we still in
> > > agreement?
> > >
> > > Plus it is entirely possible that there might be a way to
> > > automatically configure things to handle what we know about today,
> > > again without adding Kconfig options and without changing runtime
> > > code beyond that covered by Uladzislau's patch.
> >
> > Yes, actually the automatic configuration of things is what I meant,
> > that's the "problem" I was referring to, where the system does the
> > right thing for a broader range of systems, without requiring the
> > users to find RCU issues and hand-tune them (that requires said users
> > to have tracing and debugging skills and get lucky finding a problem).
> > To be fair, I did not propose any solutions to such problems either,
> > it is just some ideas. I don't like knobs too much and I don't trust
> > users or system designers to get them right most of the time.
> >
> > In that sense, I don't think making rcuo threads run as RT or not
> > (which this patch does) is really fixing the problems. In one case,
> > you might have priority inversion, in another case you might cause
> > starvation. Probably what is needed is best of both worlds. That said,
> > I don't have better solutions right now than what I mentioned, which
> > is to assign priorities to the callbacks themselves and run them in
> > threads of different priorities.
> >
> > For the record, I am not against the patch or anything like that (and
> > even if I was, I am not sure that it matters for merging :P)
>
> Fair enough!
>
> And for the record at this end, I would not be surprised if in 2032
> RCU offloaded callback invocation has sophisticated dynamic tuning of
> priorities and much else besides. But one step at a time! ;-)
>
hh... It is hard to comment because i am a bit lost in this big conversation :)

What i have got so far. Joel does not like adding extra *_CONFIG
options, actually me too since it becomes more complicated thus
it requires more specific attention from users. I prefer to make
the code common but it is not possible sometimes to make it common,
because we have different kind of kernels and workloads.

From the other hand the patch splits the BOOSTING logic into two peaces
because driving the grace periods kthreads in RT priority is not a big
issue because their run-times are short. Whereas running the "kthreads-callbacks"
in the RT context can be long so we end up in throttled situation for
other workloads.

I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
was mainly designed for that kind of kernels. So we can align with Alison
patch and her decision, so i do not see any issues. So far RT folk seems
does not mind in having "callback-kthreads" as SCHED_FIFO :)

Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
by default and OFF for other cases?

--
Uladzislau Rezki

2022-05-09 18:44:47

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Mon, May 09, 2022 at 08:28:26PM +0200, Uladzislau Rezki wrote:
> > On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <[email protected]> wrote:
> > > >
> > > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
> > > [...]
> > > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > > true.
> > > > > >
> > > > > > Who was assuming that? Uladzislau was measuring rather than assuming,
> > > > > > if that was what you were getting at. Or if you are thinking about
> > > > > > things like qhimark, your point is exactly why there is both a default
> > > > > > (which has worked quite well for a very long time) and the ability to
> > > > > > adjust based on the needs of your specific system.
> > > > >
> > > > > I was merely saying that based on measurements make assumptions, but
> > > > > in the real world the assumption may not be true, then everything
> > > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > > then that forces a fixed behavior which may fall apart in the real
> > > > > world. I think adding more CONFIGs and special cases is more complex
> > > > > but that's my opinion.
> > > >
> > > > Again, exactly what problem are you trying to solve?
> > > >
> > > > From what I can see, Uladzislau's issue can be addressed by statically
> > > > setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
> > > > is on exactly how RCU is to be informed of this, at kernel build time.
> > > >
> > > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > > and only process few callbacks at a time, while another which is the
> > > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > > of CBs pending? Just a thought.
> > > > > >
> > > > > > How about if we start by solving the problems we know that we have?
> > > > >
> > > > > I don't know why you would say that, because we are talking about
> > > > > solving the specific problem Vlad's patch addresses, not random
> > > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > > the callback list is large, the RT nocb thread can starve other
> > > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > > threads. I suck at writing well as I clearly miscommunicated.
> > > >
> > > > OK.
> > > >
> > > > Why do you believe that this needs anything other than small adjustments
> > > > the defaults of existing Kconfig options? Or am I completely missing
> > > > the point of your proposal?
> > > >
> > > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > > >
> > > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > > the user. Plus it is entirely possible that there might be a way to
> > > > > > automatically configure things to handle what we know about today,
> > > > > > again without adding Kconfig options.
> > > > >
> > > > > Yes, agreed.
> > > >
> > > > If I change my last sentence to read as follows, are we still in
> > > > agreement?
> > > >
> > > > Plus it is entirely possible that there might be a way to
> > > > automatically configure things to handle what we know about today,
> > > > again without adding Kconfig options and without changing runtime
> > > > code beyond that covered by Uladzislau's patch.
> > >
> > > Yes, actually the automatic configuration of things is what I meant,
> > > that's the "problem" I was referring to, where the system does the
> > > right thing for a broader range of systems, without requiring the
> > > users to find RCU issues and hand-tune them (that requires said users
> > > to have tracing and debugging skills and get lucky finding a problem).
> > > To be fair, I did not propose any solutions to such problems either,
> > > it is just some ideas. I don't like knobs too much and I don't trust
> > > users or system designers to get them right most of the time.
> > >
> > > In that sense, I don't think making rcuo threads run as RT or not
> > > (which this patch does) is really fixing the problems. In one case,
> > > you might have priority inversion, in another case you might cause
> > > starvation. Probably what is needed is best of both worlds. That said,
> > > I don't have better solutions right now than what I mentioned, which
> > > is to assign priorities to the callbacks themselves and run them in
> > > threads of different priorities.
> > >
> > > For the record, I am not against the patch or anything like that (and
> > > even if I was, I am not sure that it matters for merging :P)
> >
> > Fair enough!
> >
> > And for the record at this end, I would not be surprised if in 2032
> > RCU offloaded callback invocation has sophisticated dynamic tuning of
> > priorities and much else besides. But one step at a time! ;-)
> >
> hh... It is hard to comment because i am a bit lost in this big conversation :)
>
> What i have got so far. Joel does not like adding extra *_CONFIG
> options, actually me too since it becomes more complicated thus
> it requires more specific attention from users. I prefer to make
> the code common but it is not possible sometimes to make it common,
> because we have different kind of kernels and workloads.
>
> >From the other hand the patch splits the BOOSTING logic into two peaces
> because driving the grace periods kthreads in RT priority is not a big
> issue because their run-times are short. Whereas running the "kthreads-callbacks"
> in the RT context can be long so we end up in throttled situation for
> other workloads.
>
> I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> was mainly designed for that kind of kernels. So we can align with Alison
> patch and her decision, so i do not see any issues. So far RT folk seems
> does not mind in having "callback-kthreads" as SCHED_FIFO :)
>
> Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
> by default and OFF for other cases?

Yes, please!

This allows your current RCU_NOCB_CPU_CB_BOOST with something like
this in place of the "default n":

default y if PREEMPT_RT
default n if !PREEMPT_RT

There might be a simpler way of doing this, but this would work.

Could you please send a v2 with the requested updates?

Thanx, Paul

2022-05-09 18:48:59

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> On Mon, May 09, 2022 at 08:28:26PM +0200, Uladzislau Rezki wrote:
> > > On Mon, May 09, 2022 at 01:17:00PM -0400, Joel Fernandes wrote:
> > > > On Sun, May 8, 2022 at 11:37 PM Paul E. McKenney <[email protected]> wrote:
> > > > >
> > > > > On Sun, May 08, 2022 at 08:17:49PM -0400, Joel Fernandes wrote:
> > > > > > On Sun, May 8, 2022 at 5:32 PM Paul E. McKenney <[email protected]> wrote:
> > > > [...]
> > > > > > > > Also, I think it is wrong to assume that a certain kind of system will
> > > > > > > > always have a certain number of callbacks to process at a time. That
> > > > > > > > seems prone to poor design due to assumptions which may not always be
> > > > > > > > true.
> > > > > > >
> > > > > > > Who was assuming that? Uladzislau was measuring rather than assuming,
> > > > > > > if that was what you were getting at. Or if you are thinking about
> > > > > > > things like qhimark, your point is exactly why there is both a default
> > > > > > > (which has worked quite well for a very long time) and the ability to
> > > > > > > adjust based on the needs of your specific system.
> > > > > >
> > > > > > I was merely saying that based on measurements make assumptions, but
> > > > > > in the real world the assumption may not be true, then everything
> > > > > > falls apart. Instead I feel, callback threads should be RT only if 1.
> > > > > > As you mentioned, the time based thing. 2. If the CB list is long and
> > > > > > there's lot of processing. But instead, if it is made a CONFIG option,
> > > > > > then that forces a fixed behavior which may fall apart in the real
> > > > > > world. I think adding more CONFIGs and special cases is more complex
> > > > > > but that's my opinion.
> > > > >
> > > > > Again, exactly what problem are you trying to solve?
> > > > >
> > > > > From what I can see, Uladzislau's issue can be addressed by statically
> > > > > setting the rcuo kthreads to SCHED_OTHER at boot time. The discussion
> > > > > is on exactly how RCU is to be informed of this, at kernel build time.
> > > > >
> > > > > > > > Can we not have 2 sets of RCU offload threads, one which operate at RT
> > > > > > > > and only process few callbacks at a time, while another which is the
> > > > > > > > lower priority CFS offload thread - executes whenever there is a lot
> > > > > > > > of CBs pending? Just a thought.
> > > > > > >
> > > > > > > How about if we start by solving the problems we know that we have?
> > > > > >
> > > > > > I don't know why you would say that, because we are talking about
> > > > > > solving the specific problem Vlad's patch addresses, not random
> > > > > > problems. Which is that, Android wants to run expedited GPs, but when
> > > > > > the callback list is large, the RT nocb thread can starve other
> > > > > > things. Did I misunderstand the patch? If so, sorry about that but
> > > > > > that's what my email was discussing. i.e. running of CBs in RT
> > > > > > threads. I suck at writing well as I clearly miscommunicated.
> > > > >
> > > > > OK.
> > > > >
> > > > > Why do you believe that this needs anything other than small adjustments
> > > > > the defaults of existing Kconfig options? Or am I completely missing
> > > > > the point of your proposal?
> > > > >
> > > > > > > > Otherwise, I feel like we might be again proliferating CONFIG options
> > > > > > > > and increasing burden on the user to get it the CONFIG right.
> > > > > > >
> > > > > > > I bet that we will solve this without adding any new Kconfig options.
> > > > > > > And I bet that the burden is at worst on the device designer, not on
> > > > > > > the user. Plus it is entirely possible that there might be a way to
> > > > > > > automatically configure things to handle what we know about today,
> > > > > > > again without adding Kconfig options.
> > > > > >
> > > > > > Yes, agreed.
> > > > >
> > > > > If I change my last sentence to read as follows, are we still in
> > > > > agreement?
> > > > >
> > > > > Plus it is entirely possible that there might be a way to
> > > > > automatically configure things to handle what we know about today,
> > > > > again without adding Kconfig options and without changing runtime
> > > > > code beyond that covered by Uladzislau's patch.
> > > >
> > > > Yes, actually the automatic configuration of things is what I meant,
> > > > that's the "problem" I was referring to, where the system does the
> > > > right thing for a broader range of systems, without requiring the
> > > > users to find RCU issues and hand-tune them (that requires said users
> > > > to have tracing and debugging skills and get lucky finding a problem).
> > > > To be fair, I did not propose any solutions to such problems either,
> > > > it is just some ideas. I don't like knobs too much and I don't trust
> > > > users or system designers to get them right most of the time.
> > > >
> > > > In that sense, I don't think making rcuo threads run as RT or not
> > > > (which this patch does) is really fixing the problems. In one case,
> > > > you might have priority inversion, in another case you might cause
> > > > starvation. Probably what is needed is best of both worlds. That said,
> > > > I don't have better solutions right now than what I mentioned, which
> > > > is to assign priorities to the callbacks themselves and run them in
> > > > threads of different priorities.
> > > >
> > > > For the record, I am not against the patch or anything like that (and
> > > > even if I was, I am not sure that it matters for merging :P)
> > >
> > > Fair enough!
> > >
> > > And for the record at this end, I would not be surprised if in 2032
> > > RCU offloaded callback invocation has sophisticated dynamic tuning of
> > > priorities and much else besides. But one step at a time! ;-)
> > >
> > hh... It is hard to comment because i am a bit lost in this big conversation :)
> >
> > What i have got so far. Joel does not like adding extra *_CONFIG
> > options, actually me too since it becomes more complicated thus
> > it requires more specific attention from users. I prefer to make
> > the code common but it is not possible sometimes to make it common,
> > because we have different kind of kernels and workloads.
> >
> > >From the other hand the patch splits the BOOSTING logic into two peaces
> > because driving the grace periods kthreads in RT priority is not a big
> > issue because their run-times are short. Whereas running the "kthreads-callbacks"
> > in the RT context can be long so we end up in throttled situation for
> > other workloads.
> >
> > I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> > was mainly designed for that kind of kernels. So we can align with Alison
> > patch and her decision, so i do not see any issues. So far RT folk seems
> > does not mind in having "callback-kthreads" as SCHED_FIFO :)
> >
> > Do you agree with start from keeping it ON for CONFIG_PREEMPT_RT conf.
> > by default and OFF for other cases?
>
> Yes, please!
>
> This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> this in place of the "default n":
>
> default y if PREEMPT_RT
> default n if !PREEMPT_RT
>
> There might be a simpler way of doing this, but this would work.
>
> Could you please send a v2 with the requested updates?
>
No problem, will send an updated version.

--
Uladzislau Rezki

2022-05-10 19:16:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Tue, May 10, 2022 at 04:07:23PM +0200, Sebastian Andrzej Siewior wrote:
> On 2022-05-05 12:09:15 [-0700], Paul E. McKenney wrote:
> > All good points!
> >
> > Some questions and comments below.
> >
> > Adding Sebastian on CC for his perspective.
>
> Thank you.
> I may missing things, I tried to digest the thread…
>
> In my understanding: The boosting option is used to allow a SCHED_OTHER
> task within a RCU section to allow to leave the RCU section while tasks
> with higher priority occupy the CPU.
> As far as the RCU callbacks are concerned, I'm not aware that it would
> be beneficial to run them with an elevated priority. On SMP systems,
> there is the suggestion to have a housekeeping CPU and to offload the
> RCU callbacks to this CPU and no to bother the CPU with the RT workload.

Agreed, even within RT, there are multiple ways to get this job done.

Thanx, Paul

2022-05-10 19:27:10

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Tue, May 10, 2022 at 10:09:46AM -0400, Steven Rostedt wrote:
> On Mon, 9 May 2022 11:39:34 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> > this in place of the "default n":
> >
> > default y if PREEMPT_RT
> > default n if !PREEMPT_RT
>
> BTW, I don't think you need the !PREEMPT_RT, because all configs are
> 'n' by default. That is:
>
> default y if PREEMPT_RT
>
> should be good enough.

Good point, thank you!

That said, there is a lot of "default n" in a lot of Kconfig files.
And I am OK making this explicit. So Uladzislau's choice. ;-)

Thanx, Paul

2022-05-10 20:06:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Mon, 9 May 2022 11:39:34 -0700
"Paul E. McKenney" <[email protected]> wrote:

> This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> this in place of the "default n":
>
> default y if PREEMPT_RT
> default n if !PREEMPT_RT

BTW, I don't think you need the !PREEMPT_RT, because all configs are
'n' by default. That is:

default y if PREEMPT_RT

should be good enough.

-- Steve

2022-05-10 20:35:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Mon, 9 May 2022 20:28:26 +0200
Uladzislau Rezki <[email protected]> wrote:

> I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> was mainly designed for that kind of kernels. So we can align with Alison
> patch and her decision, so i do not see any issues. So far RT folk seems
> does not mind in having "callback-kthreads" as SCHED_FIFO :)

That's because RT folks set the threads they care about to a higher RT
priority than the kthreads. ;-)


-- Steve

2022-05-10 21:30:51

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Tue, May 10, 2022 at 10:09:46AM -0400, Steven Rostedt wrote:
> On Mon, 9 May 2022 11:39:34 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > This allows your current RCU_NOCB_CPU_CB_BOOST with something like
> > this in place of the "default n":
> >
> > default y if PREEMPT_RT
> > default n if !PREEMPT_RT
>
> BTW, I don't think you need the !PREEMPT_RT, because all configs are
> 'n' by default. That is:
>
> default y if PREEMPT_RT
>
> should be good enough.
>
Thank you! This is what i have in the next v2 :)

--
Uladzislau Rezki

2022-05-11 15:41:54

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> On Wed, 11 May 2022 15:39:56 +0200
> Uladzislau Rezki <[email protected]> wrote:
>
> > <snip>
> > rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> > rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> > rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> > ...
> > rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> > rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> > rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
> > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
> > <snip>
> >
> > i spent some time in order to understand why the context was not switched,
> > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > that a context has not disabled any preemption. So everything should be fine.
> >
> > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > so a task becomes non preemtable but the ftrace does not provide any signal about
> > it. So i was fooled for some time by my tracer logs.
> >
> > Do you have any thoughts about it? Should it be solved or signaled
> > somehow that a task in fact is not preemtable if a counter > 0?
>
> Hmm, it should show it in the first part (where the 'd' is). Is this a
> snapshot from the kernel or from trace-cmd?
>
I do both and the behavior is the same. But the above one looks like a
kernel trace output, the trace-cmd snapshot looks differently. So you
mean "s" has to be there then?

<snip>
entry->preempt_count = pc & 0xff;
entry->pid = (tsk) ? tsk->pid : 0;
entry->type = type;
entry->flags =
#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
(irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
#else
TRACE_FLAG_IRQS_NOSUPPORT |
#endif
((pc & NMI_MASK ) ? TRACE_FLAG_NMI : 0) |
((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
(tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
(test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
<snip>

BTW, i am not the 5.10 kernel. I have not checked the latest kernel
and what ftrace reports under holding local_bh_disable().

--
Uladzislau Rezki

2022-05-11 17:55:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Wed, 11 May 2022 15:39:56 +0200
Uladzislau Rezki <[email protected]> wrote:

> <snip>
> rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> ...
> rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
> rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
> <snip>
>
> i spent some time in order to understand why the context was not switched,
> even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> was sent to the CPU_0 to reschedule. The last "." in latency field shows
> that a context has not disabled any preemption. So everything should be fine.
>
> An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> so a task becomes non preemtable but the ftrace does not provide any signal about
> it. So i was fooled for some time by my tracer logs.
>
> Do you have any thoughts about it? Should it be solved or signaled
> somehow that a task in fact is not preemtable if a counter > 0?

Hmm, it should show it in the first part (where the 'd' is). Is this a
snapshot from the kernel or from trace-cmd?

-- Steve

2022-05-11 19:56:20

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> > On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> > > On Wed, 11 May 2022 15:39:56 +0200
> > > Uladzislau Rezki <[email protected]> wrote:
> > >
> > > > <snip>
> > > > rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> > > > rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> > > > rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> > > > ...
> > > > rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> > > > rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> > > > rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> > > > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
> > > > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
> > > > <snip>
> > > >
> > > > i spent some time in order to understand why the context was not switched,
> > > > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > > > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > > > that a context has not disabled any preemption. So everything should be fine.
> > > >
> > > > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > > > so a task becomes non preemtable but the ftrace does not provide any signal about
> > > > it. So i was fooled for some time by my tracer logs.
> > > >
> > > > Do you have any thoughts about it? Should it be solved or signaled
> > > > somehow that a task in fact is not preemtable if a counter > 0?
> > >
> > > Hmm, it should show it in the first part (where the 'd' is). Is this a
> > > snapshot from the kernel or from trace-cmd?
> > >
> > I do both and the behavior is the same. But the above one looks like a
> > kernel trace output, the trace-cmd snapshot looks differently. So you
> > mean "s" has to be there then?
> >
> > <snip>
> > entry->preempt_count = pc & 0xff;
> > entry->pid = (tsk) ? tsk->pid : 0;
> > entry->type = type;
> > entry->flags =
> > #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
> > (irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
> > #else
> > TRACE_FLAG_IRQS_NOSUPPORT |
> > #endif
> > ((pc & NMI_MASK ) ? TRACE_FLAG_NMI : 0) |
> > ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
> > ((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
> > (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
> > (test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
> > <snip>
> >
> > BTW, i am not the 5.10 kernel. I have not checked the latest kernel
> > and what ftrace reports under holding local_bh_disable().
> >
> Sorry, the was a typo. I am checking 5.10 kernel and the trace was taken
> on that kernel.
>
OK. It was added on the latest kernel:

root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
vmalloc_test/0-1296 [062] b.... 18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()

root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:64
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
root@pc638:/home/urezki# uname -a
Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
root@pc638:/home/urezki#

so it shows *bh* disabled sections.

--
Uladzislau Rezki

2022-05-12 06:29:32

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> On Mon, 9 May 2022 20:28:26 +0200
> Uladzislau Rezki <[email protected]> wrote:
>
> > I see that Paul would like to keep it for CONFIG_PREEMPT_RT, because it
> > was mainly designed for that kind of kernels. So we can align with Alison
> > patch and her decision, so i do not see any issues. So far RT folk seems
> > does not mind in having "callback-kthreads" as SCHED_FIFO :)
>
> That's because RT folks set the threads they care about to a higher RT
> priority than the kthreads. ;-)
>
That explains many things :)

I have one question, it is partly related to the topic that is in question
and to this thread also. I was tracing a "long" duration of the offloading
kthreads which actually invoke them one by one. And the picture was like
below from ftrace point of view:

<snip>
rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
...
rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
<snip>

i spent some time in order to understand why the context was not switched,
even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
was sent to the CPU_0 to reschedule. The last "." in latency field shows
that a context has not disabled any preemption. So everything should be fine.

An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
so a task becomes non preemtable but the ftrace does not provide any signal about
it. So i was fooled for some time by my tracer logs.

Do you have any thoughts about it? Should it be solved or signaled
somehow that a task in fact is not preemtable if a counter > 0?

Thanks!

--
Uladzislau Rezki

2022-05-12 09:19:04

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> On Wed, May 11, 2022 at 10:29:57AM -0400, Steven Rostedt wrote:
> > On Wed, 11 May 2022 15:39:56 +0200
> > Uladzislau Rezki <[email protected]> wrote:
> >
> > > <snip>
> > > rcuop/6-54 [000] .N.. 183.753018: rcu_invoke_callback: rcu_preempt rhp=0xffffff88ffd440b0 func=__d_free.cfi_jt
> > > rcuop/6-54 [000] .N.. 183.753020: rcu_invoke_callback: rcu_preempt rhp=0xffffff892ffd8400 func=inode_free_by_rcu.cfi_jt
> > > rcuop/6-54 [000] .N.. 183.753021: rcu_invoke_callback: rcu_preempt rhp=0xffffff89327cd708 func=i_callback.cfi_jt
> > > ...
> > > rcuop/6-54 [000] .N.. 183.755941: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c5a968 func=i_callback.cfi_jt
> > > rcuop/6-54 [000] .N.. 183.755942: rcu_invoke_callback: rcu_preempt rhp=0xffffff8993c4bd20 func=__d_free.cfi_jt
> > > rcuop/6-54 [000] dN.. 183.755944: rcu_batch_end: rcu_preempt CBs-invoked=2112 idle=>c<>c<>c<>c<
> > > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: Start context switch
> > > rcuop/6-54 [000] dN.. 183.755946: rcu_utilization: End context switch
> > > <snip>
> > >
> > > i spent some time in order to understand why the context was not switched,
> > > even though the "rcuop" kthread was marked as TIF_NEED_RESCHED and an IPI
> > > was sent to the CPU_0 to reschedule. The last "." in latency field shows
> > > that a context has not disabled any preemption. So everything should be fine.
> > >
> > > An explanation is that a local_bh_disable() modifies the current_thread_info()->preempt.count
> > > so a task becomes non preemtable but the ftrace does not provide any signal about
> > > it. So i was fooled for some time by my tracer logs.
> > >
> > > Do you have any thoughts about it? Should it be solved or signaled
> > > somehow that a task in fact is not preemtable if a counter > 0?
> >
> > Hmm, it should show it in the first part (where the 'd' is). Is this a
> > snapshot from the kernel or from trace-cmd?
> >
> I do both and the behavior is the same. But the above one looks like a
> kernel trace output, the trace-cmd snapshot looks differently. So you
> mean "s" has to be there then?
>
> <snip>
> entry->preempt_count = pc & 0xff;
> entry->pid = (tsk) ? tsk->pid : 0;
> entry->type = type;
> entry->flags =
> #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
> (irqs_disabled_flags(flags) ? TRACE_FLAG_IRQS_OFF : 0) |
> #else
> TRACE_FLAG_IRQS_NOSUPPORT |
> #endif
> ((pc & NMI_MASK ) ? TRACE_FLAG_NMI : 0) |
> ((pc & HARDIRQ_MASK) ? TRACE_FLAG_HARDIRQ : 0) |
> ((pc & SOFTIRQ_OFFSET) ? TRACE_FLAG_SOFTIRQ : 0) |
> (tif_need_resched() ? TRACE_FLAG_NEED_RESCHED : 0) |
> (test_preempt_need_resched() ? TRACE_FLAG_PREEMPT_RESCHED : 0);
> <snip>
>
> BTW, i am not the 5.10 kernel. I have not checked the latest kernel
> and what ftrace reports under holding local_bh_disable().
>
Sorry, the was a typo. I am checking 5.10 kernel and the trace was taken
on that kernel.


2022-05-17 01:51:23

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

> On Wed, 11 May 2022 19:17:42 +0200
> Uladzislau Rezki <[email protected]> wrote:
>
> > OK. It was added on the latest kernel:
> >
> > root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
> > vmalloc_test/0-1296 [062] b.... 18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()
> >
> > root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 0/0 #P:64
> > #
> > # _-----=> irqs-off/BH-disabled
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / _-=> migrate-disable
> > # |||| / delay
> > # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> > # | | | ||||| | |
> > root@pc638:/home/urezki# uname -a
> > Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
> > root@pc638:/home/urezki#
> >
> > so it shows *bh* disabled sections.
>
> Ah yes. That was added with commit 289e7b0f7eb47 ("tracing: Account bottom
> half disabled sections."). Maybe I should mark that as stable?
>
BTW, a commit message of the "289e7b0f7eb47" change exactly describes
my first thought why it behaves like that. So if it is possible it would
be great to have it in the 5.10 and 5.15 stable kernels.

Thanks!

--
Uladzislau Rezki

2022-05-17 07:14:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] rcu/nocb: Add an option to ON/OFF an offloading from RT context

On Wed, 11 May 2022 19:17:42 +0200
Uladzislau Rezki <[email protected]> wrote:

> OK. It was added on the latest kernel:
>
> root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace_pipe
> vmalloc_test/0-1296 [062] b.... 18.157470: 0xffffffffc044e5dc: -> in the local_bh_disable()
>
> root@pc638:/home/urezki# cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0 #P:64
> #
> # _-----=> irqs-off/BH-disabled
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / _-=> migrate-disable
> # |||| / delay
> # TASK-PID CPU# ||||| TIMESTAMP FUNCTION
> # | | | ||||| | |
> root@pc638:/home/urezki# uname -a
> Linux pc638 5.17.0-rc2-next-20220201 #63 SMP PREEMPT Tue May 10 20:39:08 CEST 2022 x86_64 GNU/Linux
> root@pc638:/home/urezki#
>
> so it shows *bh* disabled sections.

Ah yes. That was added with commit 289e7b0f7eb47 ("tracing: Account bottom
half disabled sections."). Maybe I should mark that as stable?

-- Steve