2020-02-05 10:52:31

by Steven Rostedt

[permalink] [raw]
Subject: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

From: "Steven Rostedt (VMware)" <[email protected]>

Because the function graph tracer can execute in sections where RCU is not
"watching", the rcu_dereference_sched() for the has needs to be open coded.
This is fine because the RCU "flavor" of the ftrace hash is protected by
its own RCU handling (it does its own little synchronization on every CPU
and does not rely on RCU sched).

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/trace/trace.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 022def96d307..8c52f5de9384 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -975,6 +975,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)

preempt_disable_notrace();

+ /*
+ * Have to open code "rcu_dereference_sched()" because the
+ * function graph tracer can be called when RCU is not
+ * "watching".
+ */
hash = rcu_dereference_protected(ftrace_graph_hash, !preemptible());

if (ftrace_hash_empty(hash)) {
@@ -1022,6 +1027,11 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)

preempt_disable_notrace();

+ /*
+ * Have to open code "rcu_dereference_sched()" because the
+ * function graph tracer can be called when RCU is not
+ * "watching".
+ */
notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
!preemptible());

--
2.24.1



2020-02-05 11:36:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded


Paul and Joel,

Care to ack this patch (or complain about it ;-) ?

-- Steve


On Wed, 05 Feb 2020 05:49:33 -0500
Steven Rostedt <[email protected]> wrote:

> From: "Steven Rostedt (VMware)" <[email protected]>
>
> Because the function graph tracer can execute in sections where RCU is not
> "watching", the rcu_dereference_sched() for the has needs to be open coded.
> This is fine because the RCU "flavor" of the ftrace hash is protected by
> its own RCU handling (it does its own little synchronization on every CPU
> and does not rely on RCU sched).
>
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/trace/trace.h | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 022def96d307..8c52f5de9384 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -975,6 +975,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
>
> preempt_disable_notrace();
>
> + /*
> + * Have to open code "rcu_dereference_sched()" because the
> + * function graph tracer can be called when RCU is not
> + * "watching".
> + */
> hash = rcu_dereference_protected(ftrace_graph_hash, !preemptible());
>
> if (ftrace_hash_empty(hash)) {
> @@ -1022,6 +1027,11 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
>
> preempt_disable_notrace();
>
> + /*
> + * Have to open code "rcu_dereference_sched()" because the
> + * function graph tracer can be called when RCU is not
> + * "watching".
> + */
> notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> !preemptible());
>

2020-02-05 14:20:59

by Joel Fernandes

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, Feb 05, 2020 at 06:33:49AM -0500, Steven Rostedt wrote:
>
> Paul and Joel,
>
> Care to ack this patch (or complain about it ;-) ?
>
> -- Steve
>
>
> On Wed, 05 Feb 2020 05:49:33 -0500
> Steven Rostedt <[email protected]> wrote:
>
> > From: "Steven Rostedt (VMware)" <[email protected]>
> >
> > Because the function graph tracer can execute in sections where RCU is not
> > "watching", the rcu_dereference_sched() for the has needs to be open coded.
> > This is fine because the RCU "flavor" of the ftrace hash is protected by
> > its own RCU handling (it does its own little synchronization on every CPU
> > and does not rely on RCU sched).
> >
> > Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> > ---
> > kernel/trace/trace.h | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 022def96d307..8c52f5de9384 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -975,6 +975,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
> >
> > preempt_disable_notrace();
> >
> > + /*
> > + * Have to open code "rcu_dereference_sched()" because the
> > + * function graph tracer can be called when RCU is not
> > + * "watching".
> > + */
> > hash = rcu_dereference_protected(ftrace_graph_hash, !preemptible());
> >
> > if (ftrace_hash_empty(hash)) {
> > @@ -1022,6 +1027,11 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
> >
> > preempt_disable_notrace();
> >
> > + /*
> > + * Have to open code "rcu_dereference_sched()" because the
> > + * function graph tracer can be called when RCU is not
> > + * "watching".
> > + */
> > notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> > !preemptible());
> >

Could you paste the stack here when RCU is not watching? In trace event code
IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
code can be called from idle loop. Should we doing the same here as well?

thanks,

- Joel

2020-02-05 14:31:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, 5 Feb 2020 09:19:15 -0500
Joel Fernandes <[email protected]> wrote:

> Could you paste the stack here when RCU is not watching? In trace event code
> IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
> code can be called from idle loop. Should we doing the same here as well?

Unfortunately I lost the stack trace. And the last time we tried to use
rcu_enter_irqs_on() for ftrace, we couldn't find a way to do this
properly. Ftrace is much more invasive then going into idle. The
problem is that ftrace traces RCU itself, and calling
"rcu_enter_irqs_on()" in pretty much any place in the RCU code caused
lots of bugs ;-)

This is why we have the schedule_on_each_cpu(ftrace_sync) hack.

-- Steve

2020-02-05 15:43:18

by Joel Fernandes

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, Feb 05, 2020 at 09:28:47AM -0500, Steven Rostedt wrote:
> On Wed, 5 Feb 2020 09:19:15 -0500
> Joel Fernandes <[email protected]> wrote:
>
> > Could you paste the stack here when RCU is not watching? In trace event code
> > IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
> > code can be called from idle loop. Should we doing the same here as well?
>
> Unfortunately I lost the stack trace. And the last time we tried to use
> rcu_enter_irqs_on() for ftrace, we couldn't find a way to do this
> properly. Ftrace is much more invasive then going into idle. The
> problem is that ftrace traces RCU itself, and calling
> "rcu_enter_irqs_on()" in pretty much any place in the RCU code caused
> lots of bugs ;-)
>
> This is why we have the schedule_on_each_cpu(ftrace_sync) hack.

The "schedule a task on each CPU" trick works on !PREEMPT though right?

Because it is possible in PREEMPT=y to get preempted in the middle of a
read-side critical section, switch to the worker thread executing the
ftrace_sync() and then switch back. But RCU still has to watch that CPU since
the read-side critical section was not completed.

Or is there a subtlety here with ftrace that I missed?

thanks,

- Joel


>
> -- Steve

2020-02-05 15:50:54

by Steven Rostedt

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, 5 Feb 2020 10:42:12 -0500
Joel Fernandes <[email protected]> wrote:

> On Wed, Feb 05, 2020 at 09:28:47AM -0500, Steven Rostedt wrote:
> > On Wed, 5 Feb 2020 09:19:15 -0500
> > Joel Fernandes <[email protected]> wrote:
> >
> > > Could you paste the stack here when RCU is not watching? In trace event code
> > > IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
> > > code can be called from idle loop. Should we doing the same here as well?
> >
> > Unfortunately I lost the stack trace. And the last time we tried to use
> > rcu_enter_irqs_on() for ftrace, we couldn't find a way to do this
> > properly. Ftrace is much more invasive then going into idle. The
> > problem is that ftrace traces RCU itself, and calling
> > "rcu_enter_irqs_on()" in pretty much any place in the RCU code caused
> > lots of bugs ;-)
> >
> > This is why we have the schedule_on_each_cpu(ftrace_sync) hack.
>
> The "schedule a task on each CPU" trick works on !PREEMPT though right?

It works on both, as I care more about the PREEMPT=y case then
the !PREEMPT, and the PREEMPT_RT which is even more preemptive than
PREEMPT!

>
> Because it is possible in PREEMPT=y to get preempted in the middle of a
> read-side critical section, switch to the worker thread executing the
> ftrace_sync() and then switch back. But RCU still has to watch that CPU since
> the read-side critical section was not completed.
>
> Or is there a subtlety here with ftrace that I missed?
>

Hence Amol's patch:

> + notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> + !preemptible());

It checks to make sure preemption is off. There is no chance of being
preempted in the read side critical section.

-- Steve

2020-02-05 16:11:19

by Joel Fernandes

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, Feb 05, 2020 at 10:49:45AM -0500, Steven Rostedt wrote:
> On Wed, 5 Feb 2020 10:42:12 -0500
> Joel Fernandes <[email protected]> wrote:
>
> > On Wed, Feb 05, 2020 at 09:28:47AM -0500, Steven Rostedt wrote:
> > > On Wed, 5 Feb 2020 09:19:15 -0500
> > > Joel Fernandes <[email protected]> wrote:
> > >
> > > > Could you paste the stack here when RCU is not watching? In trace event code
> > > > IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
> > > > code can be called from idle loop. Should we doing the same here as well?
> > >
> > > Unfortunately I lost the stack trace. And the last time we tried to use
> > > rcu_enter_irqs_on() for ftrace, we couldn't find a way to do this
> > > properly. Ftrace is much more invasive then going into idle. The
> > > problem is that ftrace traces RCU itself, and calling
> > > "rcu_enter_irqs_on()" in pretty much any place in the RCU code caused
> > > lots of bugs ;-)
> > >
> > > This is why we have the schedule_on_each_cpu(ftrace_sync) hack.
> >
> > The "schedule a task on each CPU" trick works on !PREEMPT though right?
>
> It works on both, as I care more about the PREEMPT=y case then
> the !PREEMPT, and the PREEMPT_RT which is even more preemptive than
> PREEMPT!
>
> >
> > Because it is possible in PREEMPT=y to get preempted in the middle of a
> > read-side critical section, switch to the worker thread executing the
> > ftrace_sync() and then switch back. But RCU still has to watch that CPU since
> > the read-side critical section was not completed.
> >
> > Or is there a subtlety here with ftrace that I missed?
> >
>
> Hence Amol's patch:
>
> > + notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> > + !preemptible());
>
> It checks to make sure preemption is off. There is no chance of being
> preempted in the read side critical section.

Yes, this makes sense. Sorry for the noise. For "sched" RCU cases,
scheduling on each CPU would work regardless of PREEMPT configuration.

( I guess I was confusing this case with the non-sched RCU usages (such as using
rcu_read_lock()) where scheduling a task on each CPU obviously would not work
with PREEMPT=y. )

By the way would SRCU not work instead of the ftrace_sync() technique? Or is
the concern that SRCU cannot be used from NMI?

thanks,

- Joel

2020-02-05 21:56:55

by Joel Fernandes

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, Feb 05, 2020 at 11:08:24AM -0500, Joel Fernandes wrote:
> On Wed, Feb 05, 2020 at 10:49:45AM -0500, Steven Rostedt wrote:
> > On Wed, 5 Feb 2020 10:42:12 -0500
> > Joel Fernandes <[email protected]> wrote:
> >
> > > On Wed, Feb 05, 2020 at 09:28:47AM -0500, Steven Rostedt wrote:
> > > > On Wed, 5 Feb 2020 09:19:15 -0500
> > > > Joel Fernandes <[email protected]> wrote:
> > > >
> > > > > Could you paste the stack here when RCU is not watching? In trace event code
> > > > > IIRC we call rcu_enter_irqs_on() to have RCU temporarily watch, since that
> > > > > code can be called from idle loop. Should we doing the same here as well?
> > > >
> > > > Unfortunately I lost the stack trace. And the last time we tried to use
> > > > rcu_enter_irqs_on() for ftrace, we couldn't find a way to do this
> > > > properly. Ftrace is much more invasive then going into idle. The
> > > > problem is that ftrace traces RCU itself, and calling
> > > > "rcu_enter_irqs_on()" in pretty much any place in the RCU code caused
> > > > lots of bugs ;-)
> > > >
> > > > This is why we have the schedule_on_each_cpu(ftrace_sync) hack.
> > >
> > > The "schedule a task on each CPU" trick works on !PREEMPT though right?
> >
> > It works on both, as I care more about the PREEMPT=y case then
> > the !PREEMPT, and the PREEMPT_RT which is even more preemptive than
> > PREEMPT!
> >
> > >
> > > Because it is possible in PREEMPT=y to get preempted in the middle of a
> > > read-side critical section, switch to the worker thread executing the
> > > ftrace_sync() and then switch back. But RCU still has to watch that CPU since
> > > the read-side critical section was not completed.
> > >
> > > Or is there a subtlety here with ftrace that I missed?
> > >
> >
> > Hence Amol's patch:
> >
> > > + notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> > > + !preemptible());
> >
> > It checks to make sure preemption is off. There is no chance of being
> > preempted in the read side critical section.
>
> Yes, this makes sense. Sorry for the noise. For "sched" RCU cases,
> scheduling on each CPU would work regardless of PREEMPT configuration.
>
> ( I guess I was confusing this case with the non-sched RCU usages (such as using
> rcu_read_lock()) where scheduling a task on each CPU obviously would not work
> with PREEMPT=y. )
>
> By the way would SRCU not work instead of the ftrace_sync() technique? Or is
> the concern that SRCU cannot be used from NMI?

Answering my own question, SRCU would likely slow down ftrace_graph_addr()
unnecessarily so is probably not worth doing so in this path (especially
because ftrace_graph_addr() already starts an implict read-side critical
section anyway via preempt_disable()).

thanks,

- Joel

2020-02-05 22:01:31

by Joel Fernandes

[permalink] [raw]
Subject: Re: [for-next][PATCH 4/4] ftrace: Add comment to why rcu_dereference_sched() is open coded

On Wed, Feb 05, 2020 at 06:33:49AM -0500, Steven Rostedt wrote:
>
> Paul and Joel,
>
> Care to ack this patch (or complain about it ;-) ?
>
> -- Steve
>
>
> On Wed, 05 Feb 2020 05:49:33 -0500
> Steven Rostedt <[email protected]> wrote:
>
> > From: "Steven Rostedt (VMware)" <[email protected]>
> >
> > Because the function graph tracer can execute in sections where RCU is not
> > "watching", the rcu_dereference_sched() for the has needs to be open coded.
> > This is fine because the RCU "flavor" of the ftrace hash is protected by
> > its own RCU handling (it does its own little synchronization on every CPU
> > and does not rely on RCU sched).
> >
> > Signed-off-by: Steven Rostedt (VMware) <[email protected]>

Acked-by: Joel Fernandes (Google) <[email protected]>

thanks,

- Joel


> > ---
> > kernel/trace/trace.h | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 022def96d307..8c52f5de9384 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -975,6 +975,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
> >
> > preempt_disable_notrace();
> >
> > + /*
> > + * Have to open code "rcu_dereference_sched()" because the
> > + * function graph tracer can be called when RCU is not
> > + * "watching".
> > + */
> > hash = rcu_dereference_protected(ftrace_graph_hash, !preemptible());
> >
> > if (ftrace_hash_empty(hash)) {
> > @@ -1022,6 +1027,11 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
> >
> > preempt_disable_notrace();
> >
> > + /*
> > + * Have to open code "rcu_dereference_sched()" because the
> > + * function graph tracer can be called when RCU is not
> > + * "watching".
> > + */
> > notrace_hash = rcu_dereference_protected(ftrace_graph_notrace_hash,
> > !preemptible());
> >
>