2018-04-27 04:29:47

by Joel Fernandes

Subject: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

In recent tests with the IRQ on/off tracepoints, a large performance
overhead of ~10% was noticed when running hackbench. This was root-caused
to the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
tracepoint code. Following a long discussion on the list [1] about this,
we concluded that SRCU is a better alternative for use during RCU idle.
Although it does involve extra barriers, it is lighter than the sched-RCU
version, which has to make additional calls to notify RCU of entry into
read-side sections from the idle path.

In this patch, we change the underlying implementation of the
trace_*_rcuidle API to use SRCU. This has been shown to improve
performance a lot for the high-frequency irq enable/disable tracepoints.

In the future, we can add a new may_sleep API that uses this
infrastructure for callbacks that can actually sleep, which will support
Mathieu's use case of blocking probes.

Test: Tested idle and preempt/irq tracepoints.

[1] https://patchwork.kernel.org/patch/10344297/

Cc: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Tom Zanussi <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Paul McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Baohong Liu <[email protected]>
Cc: Vedang Patel <[email protected]>
Cc: [email protected]
Signed-off-by: Joel Fernandes <[email protected]>
---
include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
kernel/tracepoint.c | 10 +++++++++-
2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c94f466d57ef..a1c1987de423 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,6 +15,7 @@
*/

#include <linux/smp.h>
+#include <linux/srcu.h>
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/cpumask.h>
@@ -33,6 +34,8 @@ struct trace_eval_map {

#define TRACEPOINT_DEFAULT_PRIO 10

+extern struct srcu_struct tracepoint_srcu;
+
extern int
tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
extern int
@@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
*/
static inline void tracepoint_synchronize_unregister(void)
{
+ synchronize_srcu(&tracepoint_srcu);
synchronize_sched();
}

@@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
* as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
* "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
*/
-#define __DO_TRACE(tp, proto, args, cond, rcucheck) \
+#define __DO_TRACE(tp, proto, args, cond, preempt_on) \
do { \
struct tracepoint_func *it_func_ptr; \
void *it_func; \
void *__data; \
+ int __maybe_unused idx = 0; \
\
if (!(cond)) \
return; \
- if (rcucheck) \
- rcu_irq_enter_irqson(); \
- rcu_read_lock_sched_notrace(); \
- it_func_ptr = rcu_dereference_sched((tp)->funcs); \
+ if (preempt_on) { \
+ WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
+ idx = srcu_read_lock(&tracepoint_srcu); \
+ it_func_ptr = srcu_dereference((tp)->funcs, \
+ &tracepoint_srcu); \
+ } else { \
+ rcu_read_lock_sched_notrace(); \
+ it_func_ptr = \
+ rcu_dereference_sched((tp)->funcs); \
+ } \
+ \
if (it_func_ptr) { \
do { \
it_func = (it_func_ptr)->func; \
@@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
((void(*)(proto))(it_func))(args); \
} while ((++it_func_ptr)->func); \
} \
- rcu_read_unlock_sched_notrace(); \
- if (rcucheck) \
- rcu_irq_exit_irqson(); \
+ \
+ if (preempt_on) \
+ srcu_read_unlock(&tracepoint_srcu, idx); \
+ else \
+ rcu_read_unlock_sched_notrace(); \
} while (0)

#ifndef MODULE
+/*
+ * This is for tracepoints that may be called from an RCU idle path. To make
+ * sure it is safe in the idle path, use SRCU instead of sched-RCU by
+ * specifying the preempt_on flag below. This has the obvious effect that any
+ * callback expecting preemption to be disabled must explicitly disable it,
+ * since with SRCU it would run with preemption on.
+ */
#define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
static inline void trace_##name##_rcuidle(proto) \
{ \
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 671b13457387..b3b1d65a2460 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -31,6 +31,9 @@
extern struct tracepoint * const __start___tracepoints_ptrs[];
extern struct tracepoint * const __stop___tracepoints_ptrs[];

+DEFINE_SRCU(tracepoint_srcu);
+EXPORT_SYMBOL_GPL(tracepoint_srcu);
+
/* Set to 1 to enable tracepoint debug output */
static const int tracepoint_debug;

@@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
return p == NULL ? NULL : p->probes;
}

-static void rcu_free_old_probes(struct rcu_head *head)
+static void srcu_free_old_probes(struct rcu_head *head)
{
kfree(container_of(head, struct tp_probes, rcu));
}

+static void rcu_free_old_probes(struct rcu_head *head)
+{
+ call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
+}
+
static inline void release_probes(struct tracepoint_func *old)
{
if (old) {
--
2.17.0.441.gb46fe60e1d-goog



2018-04-27 14:28:56

by Mathieu Desnoyers

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

----- On Apr 27, 2018, at 12:26 AM, Joel Fernandes [email protected] wrote:

> In recent tests with the IRQ on/off tracepoints, a large performance
> overhead of ~10% was noticed when running hackbench. This was root-caused
> to the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that SRCU is a better alternative for use during RCU idle.
> Although it does involve extra barriers, it is lighter than the sched-RCU
> version, which has to make additional calls to notify RCU of entry into
> read-side sections from the idle path.
>
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has been shown to improve
> performance a lot for the high-frequency irq enable/disable tracepoints.

The general approach and the implementation look fine, except for
one small detail: I would be tempted to explicitly disable preemption
around the call to the tracepoint callback for the rcuidle variant,
unless we plan to audit every tracer right away to remove any assumption
that preemption is disabled in the callback implementation.

That would be the main difference between an eventual "may_sleep" tracepoint
and a rcuidle tracepoint: "may_sleep" would use SRCU and leave preemption
enabled when invoking the callback. rcuidle uses SRCU too, but would
disable preemption when invoking the callback.
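
Concretely, the kind of change I have in mind is the small sketch below
(untested; the preempt_disable_notrace()/preempt_enable_notrace() pair
around the callback loop is my assumed addition, not something taken
from your patch):

		if (it_func_ptr) {					\
			if (preempt_on)					\
				preempt_disable_notrace();		\
			do {						\
				it_func = (it_func_ptr)->func;		\
				__data = (it_func_ptr)->data;		\
				((void(*)(proto))(it_func))(args);	\
			} while ((++it_func_ptr)->func);		\
			if (preempt_on)					\
				preempt_enable_notrace();		\
		}							\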

Thoughts ?

Thanks,

Mathieu


>
> In the future, we can add a new may_sleep API that uses this
> infrastructure for callbacks that can actually sleep, which will support
> Mathieu's use case of blocking probes.
>
> Test: Tested idle and preempt/irq tracepoints.
>
> [1] https://patchwork.kernel.org/patch/10344297/
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Tom Zanussi <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Paul McKenney <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Fengguang Wu <[email protected]>
> Cc: Baohong Liu <[email protected]>
> Cc: Vedang Patel <[email protected]>
> Cc: [email protected]
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
> kernel/tracepoint.c | 10 +++++++++-
> 2 files changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
> */
>
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
>
> #define TRACEPOINT_DEFAULT_PRIO 10
>
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> */
> static inline void tracepoint_synchronize_unregister(void)
> {
> + synchronize_srcu(&tracepoint_srcu);
> synchronize_sched();
> }
>
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
> * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
> * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
> */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck) \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on) \
> do { \
> struct tracepoint_func *it_func_ptr; \
> void *it_func; \
> void *__data; \
> + int __maybe_unused idx = 0; \
> \
> if (!(cond)) \
> return; \
> - if (rcucheck) \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace(); \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs); \
> + if (preempt_on) { \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
> + idx = srcu_read_lock(&tracepoint_srcu); \
> + it_func_ptr = srcu_dereference((tp)->funcs, \
> + &tracepoint_srcu); \
> + } else { \
> + rcu_read_lock_sched_notrace(); \
> + it_func_ptr = \
> + rcu_dereference_sched((tp)->funcs); \
> + } \
> + \
> if (it_func_ptr) { \
> do { \
> it_func = (it_func_ptr)->func; \
> @@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
> ((void(*)(proto))(it_func))(args); \
> } while ((++it_func_ptr)->func); \
> } \
> - rcu_read_unlock_sched_notrace(); \
> - if (rcucheck) \
> - rcu_irq_exit_irqson(); \
> + \
> + if (preempt_on) \
> + srcu_read_unlock(&tracepoint_srcu, idx); \
> + else \
> + rcu_read_unlock_sched_notrace(); \
> } while (0)
>
> #ifndef MODULE
> +/*
> + * This is for tracepoints that may be called from an RCU idle path. To make
> + * sure it is safe in the idle path, use SRCU instead of sched-RCU by
> + * specifying the preempt_on flag below. This has the obvious effect that any
> + * callback expecting preemption to be disabled must explicitly disable it,
> + * since with SRCU it would run with preemption on.
> + */
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> static inline void trace_##name##_rcuidle(proto) \
> { \
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 671b13457387..b3b1d65a2460 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
>
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
>
> @@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
> return p == NULL ? NULL : p->probes;
> }
>
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> kfree(container_of(head, struct tp_probes, rcu));
> }
>
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> + call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> if (old) {
> --
> 2.17.0.441.gb46fe60e1d-goog

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-04-27 14:49:10

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> The general approach and the implementation look fine, except for
> one small detail: I would be tempted to explicitly disable preemption
> around the call to the tracepoint callback for the rcuidle variant,
> unless we plan to audit every tracer right away to remove any assumption
> that preemption is disabled in the callback implementation.

I'm thinking that we do that audit. There shouldn't be many instances
of it. I like the idea that a tracepoint callback gets called with
preemption enabled.

-- Steve

2018-04-27 15:39:41

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
>
> > The general approach and the implementation look fine, except for
> > one small detail: I would be tempted to explicitly disable preemption
> > around the call to the tracepoint callback for the rcuidle variant,
> > unless we plan to audit every tracer right away to remove any assumption
> > that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Are you really sure you want to increase your state space that much?

Thanx, Paul


2018-04-27 15:41:29

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 08:38:26 -0700
"Paul E. McKenney" <[email protected]> wrote:

> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <[email protected]> wrote:
> >
> > > The general approach and the implementation look fine, except for
> > > one small detail: I would be tempted to explicitly disable preemption
> > > around the call to the tracepoint callback for the rcuidle variant,
> > > unless we plan to audit every tracer right away to remove any assumption
> > > that preemption is disabled in the callback implementation.
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.
>
> Are you really sure you want to increase your state space that much?

Why not? The code I have in callbacks already deals with all sorts of
context - normal, softirq, irq, NMI, preemption disabled, irq
disabled.

-- Steve

2018-04-27 15:43:41

by Mathieu Desnoyers

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

----- On Apr 27, 2018, at 10:47 AM, rostedt [email protected] wrote:

> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
>
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

I see that ftrace explicitly disables preemption in its ring buffer
code. FWIW, this is redundant when called from sched-rcu tracepoints
and from kprobes which adds unnecessary performance overhead.

LTTng expects preemption to be disabled when invoked. I can adapt on my
side as needed, but would prefer not to have redundant preemption disabling
for probes hooking on sched-rcu tracepoints (which is the common case).

Do perf callbacks expect preemption to be disabled ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-04-27 15:45:26

by Mathieu Desnoyers

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

----- On Apr 27, 2018, at 11:40 AM, rostedt [email protected] wrote:

> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>> On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers <[email protected]> wrote:
>> >
>> > > The general approach and the implementation look fine, except for
>> > > one small detail: I would be tempted to explicitly disable preemption
>> > > around the call to the tracepoint callback for the rcuidle variant,
>> > > unless we plan to audit every tracer right away to remove any assumption
>> > > that preemption is disabled in the callback implementation.
>> >
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>>
>> Are you really sure you want to increase your state space that much?
>
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

It does so by disabling preemption in the callbacks, even when it's
redundant with the guarantees already provided by tracepoint-sched-rcu
and by kprobes. It's not that great for a fast-path.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-04-27 15:57:30

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
> In recent tests with the IRQ on/off tracepoints, a large performance
> overhead of ~10% was noticed when running hackbench. This was root-caused
> to the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
> tracepoint code. Following a long discussion on the list [1] about this,
> we concluded that SRCU is a better alternative for use during RCU idle.
> Although it does involve extra barriers, it is lighter than the sched-RCU
> version, which has to make additional calls to notify RCU of entry into
> read-side sections from the idle path.
>
> In this patch, we change the underlying implementation of the
> trace_*_rcuidle API to use SRCU. This has been shown to improve
> performance a lot for the high-frequency irq enable/disable tracepoints.
>
> In the future, we can add a new may_sleep API that uses this
> infrastructure for callbacks that can actually sleep, which will support
> Mathieu's use case of blocking probes.
>
> Test: Tested idle and preempt/irq tracepoints.

Looks good overall! One question and a few comments below.

Thanx, Paul

> [1] https://patchwork.kernel.org/patch/10344297/
>
> Cc: Steven Rostedt <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Tom Zanussi <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Boqun Feng <[email protected]>
> Cc: Paul McKenney <[email protected]>
> Cc: Frederic Weisbecker <[email protected]>
> Cc: Randy Dunlap <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Fengguang Wu <[email protected]>
> Cc: Baohong Liu <[email protected]>
> Cc: Vedang Patel <[email protected]>
> Cc: [email protected]
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
> kernel/tracepoint.c | 10 +++++++++-
> 2 files changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index c94f466d57ef..a1c1987de423 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -15,6 +15,7 @@
> */
>
> #include <linux/smp.h>
> +#include <linux/srcu.h>
> #include <linux/errno.h>
> #include <linux/types.h>
> #include <linux/cpumask.h>
> @@ -33,6 +34,8 @@ struct trace_eval_map {
>
> #define TRACEPOINT_DEFAULT_PRIO 10
>
> +extern struct srcu_struct tracepoint_srcu;
> +
> extern int
> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
> extern int
> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
> */
> static inline void tracepoint_synchronize_unregister(void)
> {
> + synchronize_srcu(&tracepoint_srcu);
> synchronize_sched();
> }
>
> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
> * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
> * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
> */
> -#define __DO_TRACE(tp, proto, args, cond, rcucheck) \
> +#define __DO_TRACE(tp, proto, args, cond, preempt_on) \
> do { \
> struct tracepoint_func *it_func_ptr; \
> void *it_func; \
> void *__data; \
> + int __maybe_unused idx = 0; \
> \
> if (!(cond)) \
> return; \
> - if (rcucheck) \
> - rcu_irq_enter_irqson(); \
> - rcu_read_lock_sched_notrace(); \
> - it_func_ptr = rcu_dereference_sched((tp)->funcs); \
> + if (preempt_on) { \
> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \

Very good on this check, thank you!

> + idx = srcu_read_lock(&tracepoint_srcu); \

Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
and srcu_read_unlock()?

> + it_func_ptr = srcu_dereference((tp)->funcs, \
> + &tracepoint_srcu); \
> + } else { \
> + rcu_read_lock_sched_notrace(); \
> + it_func_ptr = \
> + rcu_dereference_sched((tp)->funcs); \
> + } \
> + \
> if (it_func_ptr) { \
> do { \
> it_func = (it_func_ptr)->func; \
> @@ -148,12 +160,21 @@ extern void syscall_unregfunc(void);
> ((void(*)(proto))(it_func))(args); \
> } while ((++it_func_ptr)->func); \
> } \
> - rcu_read_unlock_sched_notrace(); \
> - if (rcucheck) \
> - rcu_irq_exit_irqson(); \
> + \
> + if (preempt_on) \
> + srcu_read_unlock(&tracepoint_srcu, idx); \
> + else \
> + rcu_read_unlock_sched_notrace(); \
> } while (0)
>
> #ifndef MODULE
> +/*
> + * This is for tracepoints that may be called from an RCU idle path. To make
> + * sure it is safe in the idle path, use SRCU instead of sched-RCU by
> + * specifying the preempt_on flag below. This has the obvious effect that any
> + * callback expecting preemption to be disabled must explicitly disable it,
> + * since with SRCU it would run with preemption on.
> + */
> #define __DECLARE_TRACE_RCU(name, proto, args, cond, data_proto, data_args) \
> static inline void trace_##name##_rcuidle(proto) \
> { \
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 671b13457387..b3b1d65a2460 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -31,6 +31,9 @@
> extern struct tracepoint * const __start___tracepoints_ptrs[];
> extern struct tracepoint * const __stop___tracepoints_ptrs[];
>
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
> /* Set to 1 to enable tracepoint debug output */
> static const int tracepoint_debug;
>
> @@ -67,11 +70,16 @@ static inline void *allocate_probes(int count)
> return p == NULL ? NULL : p->probes;
> }
>
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
> {
> kfree(container_of(head, struct tp_probes, rcu));
> }
>
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> + call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);

Chaining them through the two callbacks works, good!

> +}
> +
> static inline void release_probes(struct tracepoint_func *old)
> {
> if (old) {
> --
> 2.17.0.441.gb46fe60e1d-goog
>


2018-04-27 15:58:59

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 11:40:05AM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:38:26 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > On Fri, Apr 27, 2018 at 10:47:47AM -0400, Steven Rostedt wrote:
> > > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > > Mathieu Desnoyers <[email protected]> wrote:
> > >
> > > > The general approach and the implementation look fine, except for
> > > > one small detail: I would be tempted to explicitly disable preemption
> > > > around the call to the tracepoint callback for the rcuidle variant,
> > > > unless we plan to audit every tracer right away to remove any assumption
> > > > that preemption is disabled in the callback implementation.
> > >
> > > I'm thinking that we do that audit. There shouldn't be many instances
> > > of it. I like the idea that a tracepoint callback gets called with
> > > preemption enabled.
> >
> > Are you really sure you want to increase your state space that much?
>
> Why not? The code I have in callbacks already deals with all sorts of
> context - normal, softirq, irq, NMI, preemption disabled, irq
> disabled.

But why? Do people really expect good real-time response on systems
instrumented with lots of tracing?

Thanx, Paul


2018-04-27 16:10:03

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 11:42:15 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> ----- On Apr 27, 2018, at 10:47 AM, rostedt [email protected] wrote:
>
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <[email protected]> wrote:
> >
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.
>
> I see that ftrace explicitly disables preemption in its ring buffer
> code. FWIW, this is redundant when called from sched-rcu tracepoints
> and from kprobes which adds unnecessary performance overhead.

Sure, but that code is called from other locations that do not have
preemption disabled. Calling preempt_disable() is far from the biggest
overhead of that code path.

>
> LTTng expects preemption to be disabled when invoked. I can adapt on my
> side as needed, but would prefer not to have redundant preemption disabling
> for probes hooking on sched-rcu tracepoints (which is the common case).

Why not? Really, preempt_disable() is just a per-CPU counter increment
plus a compiler barrier.
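
On a CONFIG_PREEMPT kernel the pair boils down to roughly the following
(simplified sketch, not the exact kernel source):

#define preempt_disable() \
do { \
	preempt_count_inc();	/* bump the per-CPU preempt counter */ \
	barrier();		/* compiler barrier only, no CPU fence */ \
} while (0)

#define preempt_enable() \
do { \
	barrier(); \
	if (unlikely(preempt_count_dec_and_test())) \
		__preempt_schedule();	/* resched if preemption was requested */ \
} while (0)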

>
> Do perf callbacks expect preemption to be disabled ?

I'll have to look, but wouldn't be hard to change.

-- Steve


2018-04-27 16:11:20

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 11:43:41 -0400 (EDT)
Mathieu Desnoyers <[email protected]> wrote:

> It does so by disabling preemption in the callbacks, even when it's
> redundant with the guarantees already provided by tracepoint-sched-rcu
> and by kprobes. It's not that great for a fast-path.

Really, preempt_disable() is not bad for a fast path. It's far better
than a local_irq_disable().

-- Steve

2018-04-27 16:15:55

by Joel Fernandes

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

Hi Paul,

On Fri, Apr 27, 2018 at 8:57 AM, Paul E. McKenney
<[email protected]> wrote:
> On Thu, Apr 26, 2018 at 09:26:56PM -0700, Joel Fernandes wrote:
>> In recent tests with the IRQ on/off tracepoints, a large performance
>> overhead of ~10% was noticed when running hackbench. This was root-caused
>> to the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the
>> tracepoint code. Following a long discussion on the list [1] about this,
>> we concluded that SRCU is a better alternative for use during RCU idle.
>> Although it does involve extra barriers, it is lighter than the sched-RCU
>> version, which has to make additional calls to notify RCU of entry into
>> read-side sections from the idle path.
>>
>> In this patch, we change the underlying implementation of the
>> trace_*_rcuidle API to use SRCU. This has been shown to improve
>> performance a lot for the high-frequency irq enable/disable tracepoints.
>>
>> In the future, we can add a new may_sleep API that uses this
>> infrastructure for callbacks that can actually sleep, which will support
>> Mathieu's use case of blocking probes.
>>
>> Test: Tested idle and preempt/irq tracepoints.
>
> Looks good overall! One question and a few comments below.
>
> Thanx, Paul
>
>> [1] https://patchwork.kernel.org/patch/10344297/
>>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Mathieu Desnoyers <[email protected]>
>> Cc: Tom Zanussi <[email protected]>
>> Cc: Namhyung Kim <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Boqun Feng <[email protected]>
>> Cc: Paul McKenney <[email protected]>
>> Cc: Frederic Weisbecker <[email protected]>
>> Cc: Randy Dunlap <[email protected]>
>> Cc: Masami Hiramatsu <[email protected]>
>> Cc: Fengguang Wu <[email protected]>
>> Cc: Baohong Liu <[email protected]>
>> Cc: Vedang Patel <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: Joel Fernandes <[email protected]>
>> ---
>> include/linux/tracepoint.h | 37 +++++++++++++++++++++++++++++--------
>> kernel/tracepoint.c | 10 +++++++++-
>> 2 files changed, 38 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
>> index c94f466d57ef..a1c1987de423 100644
>> --- a/include/linux/tracepoint.h
>> +++ b/include/linux/tracepoint.h
>> @@ -15,6 +15,7 @@
>> */
>>
>> #include <linux/smp.h>
>> +#include <linux/srcu.h>
>> #include <linux/errno.h>
>> #include <linux/types.h>
>> #include <linux/cpumask.h>
>> @@ -33,6 +34,8 @@ struct trace_eval_map {
>>
>> #define TRACEPOINT_DEFAULT_PRIO 10
>>
>> +extern struct srcu_struct tracepoint_srcu;
>> +
>> extern int
>> tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>> extern int
>> @@ -77,6 +80,7 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb)
>> */
>> static inline void tracepoint_synchronize_unregister(void)
>> {
>> + synchronize_srcu(&tracepoint_srcu);
>> synchronize_sched();
>> }
>>
>> @@ -129,18 +133,26 @@ extern void syscall_unregfunc(void);
>> * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
>> * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
>> */
>> -#define __DO_TRACE(tp, proto, args, cond, rcucheck) \
>> +#define __DO_TRACE(tp, proto, args, cond, preempt_on) \
>> do { \
>> struct tracepoint_func *it_func_ptr; \
>> void *it_func; \
>> void *__data; \
>> + int __maybe_unused idx = 0; \
>> \
>> if (!(cond)) \
>> return; \
>> - if (rcucheck) \
>> - rcu_irq_enter_irqson(); \
>> - rcu_read_lock_sched_notrace(); \
>> - it_func_ptr = rcu_dereference_sched((tp)->funcs); \
>> + if (preempt_on) { \
>> + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
>
> Very good on this check, thank you!

Sure thing :-)

>
>> + idx = srcu_read_lock(&tracepoint_srcu); \
>
> Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

That shouldn't be needed. For the rcu_read_lock_sched case, there is a
preempt_disable which needs to be a notrace, but for the srcu one,
since we don't do that, I think it should be fine.

Thanks!

- Joel

2018-04-27 16:16:05

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 08:57:01 -0700
"Paul E. McKenney" <[email protected]> wrote:

> > + if (preempt_on) { \
> > + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
>
> Very good on this check, thank you!

I think you need to return and not call the read lock.

if (WARN_ON_ONCE(in_nmi()))
return;
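
I.e. the preempt_on branch would become something like this (untested
sketch):

	if (preempt_on) {						\
		if (WARN_ON_ONCE(in_nmi())) /* no srcu from nmi */	\
			return;						\
		idx = srcu_read_lock(&tracepoint_srcu);			\
		it_func_ptr = srcu_dereference((tp)->funcs,		\
					       &tracepoint_srcu);	\
	}								\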

>
> > + idx = srcu_read_lock(&tracepoint_srcu); \
>
> Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
> and srcu_read_unlock()?

I think so.

-- Steve

>
> > + it_func_ptr = srcu_dereference((tp)->funcs, \
> > + &tracepoint_srcu); \
> > + } else { \
> > + rcu_read_lock_sched_notrace(); \
> > + it_func_ptr = \
> > + rcu_dereference_sched((tp)->funcs); \
> > + } \
> > + \

2018-04-27 16:23:40

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 09:14:30 -0700
Joel Fernandes <[email protected]> wrote:

> > Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?
>
> That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> preempt_disable which needs to be a notrace, but for the srcu one,
> since we don't do that, I think it should be fine.

Actually, I think I may agree here too. Because the _notrace is for
function tracing, and it shouldn't affect it. If people don't want it
traced, they could add those functions to the list in the notrace file.

-- Steve

2018-04-27 16:24:38

by Joel Fernandes

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 9:13 AM, Steven Rostedt <[email protected]> wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
>> > + if (preempt_on) { \
>> > + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
>>
>> Very good on this check, thank you!
>
> I think you need to return and not call the read lock.
>
> if (WARN_ON_ONCE(in_nmi()))
> return;

Cool, I'll do that.

>>
>> > + idx = srcu_read_lock(&tracepoint_srcu); \
>>
>> Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
>> and srcu_read_unlock()?
>
> I think so.

Oh yes, since otherwise we call into lockdep.

Paul, then I think it's true that we'd need SRCU _notrace variants that
don't do the rcu_lock_acquire. Sorry for my earlier email saying it's
not needed. Thanks,

- Joel

2018-04-27 16:31:50

by Joel Fernandes

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <[email protected]> wrote:
> On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> Mathieu Desnoyers <[email protected]> wrote:
>
>> The general approach and the implementation look fine, except for
>> one small detail: I would be tempted to explicitly disable preemption
>> around the call to the tracepoint callback for the rcuidle variant,
>> unless we plan to audit every tracer right away to remove any assumption
>> that preemption is disabled in the callback implementation.
>
> I'm thinking that we do that audit. There shouldn't be many instances
> of it. I like the idea that a tracepoint callback gets called with
> preemption enabled.

Here is the list of all callers of the _rcuidle :

trace_clk_disable_complete_rcuidle
trace_clk_disable_rcuidle
trace_clk_enable_complete_rcuidle
trace_clk_enable_rcuidle
trace_console_rcuidle
trace_cpu_idle_rcuidle
trace_ipi_entry_rcuidle
trace_ipi_exit_rcuidle
trace_ipi_raise_rcuidle
trace_irq_disable_rcuidle
trace_irq_enable_rcuidle
trace_power_domain_target_rcuidle
trace_preempt_disable_rcuidle
trace_preempt_enable_rcuidle
trace_rpm_idle_rcuidle
trace_rpm_resume_rcuidle
trace_rpm_return_int_rcuidle
trace_rpm_suspend_rcuidle
trace_tlb_flush_rcuidle

All of these are called either from irq context or with preemption
already disabled. So I think it should be fine to keep preemption on.
But I'm OK with disabling it before callback execution if we agree that
we want that.

(And the ring buffer code is able to cope anyway, as Steven pointed out.)

thanks,

- Joel

2018-04-27 16:39:36

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 09:30:05 -0700
Joel Fernandes <[email protected]> wrote:

> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <[email protected]> wrote:
> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
> > Mathieu Desnoyers <[email protected]> wrote:
> >
> >> The general approach and the implementation look fine, except for
> >> one small detail: I would be tempted to explicitly disable preemption
> >> around the call to the tracepoint callback for the rcuidle variant,
> >> unless we plan to audit every tracer right away to remove any assumption
> >> that preemption is disabled in the callback implementation.
> >
> > I'm thinking that we do that audit. There shouldn't be many instances
> > of it. I like the idea that a tracepoint callback gets called with
> > preemption enabled.
>
> Here is the list of all callers of the _rcuidle :

I was thinking of auditing who registers callbacks to any tracepoints.

-- Steve

>
> trace_clk_disable_complete_rcuidle
> trace_clk_disable_rcuidle
> trace_clk_enable_complete_rcuidle
> trace_clk_enable_rcuidle
> trace_console_rcuidle
> trace_cpu_idle_rcuidle
> trace_ipi_entry_rcuidle
> trace_ipi_exit_rcuidle
> trace_ipi_raise_rcuidle
> trace_irq_disable_rcuidle
> trace_irq_enable_rcuidle
> trace_power_domain_target_rcuidle
> trace_preempt_disable_rcuidle
> trace_preempt_enable_rcuidle
> trace_rpm_idle_rcuidle
> trace_rpm_resume_rcuidle
> trace_rpm_return_int_rcuidle
> trace_rpm_suspend_rcuidle
> trace_tlb_flush_rcuidle
>
> All of these are called either from irq context or with preemption
> already disabled. So I think it should be fine to keep preemption on.
> But I'm OK with disabling it before callback execution if we agree that
> we want that.
>
> (And the ring buffer code is able to cope anyway, as Steven pointed out.)
>
> thanks,
>
> - Joel


2018-04-27 16:44:43

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 12:13:30PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 08:57:01 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > > + if (preempt_on) { \
> > > + WARN_ON_ONCE(in_nmi()); /* no srcu from nmi */ \
> >
> > Very good on this check, thank you!
>
> I think you need to return and not call the read lock.

Works for me either way, at least assuming that the splat actually gets
printed. ;-)

> if (WARN_ON_ONCE(in_nmi()))
> return;
>
> >
> > > + idx = srcu_read_lock(&tracepoint_srcu); \
> >
> > Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
> > and srcu_read_unlock()?
>
> I think so.

OK, please see the (untested) patch below. Of course,
srcu_read_lock_notrace() invokes __srcu_read_lock(), which looks as
follows:

int __srcu_read_lock(struct srcu_struct *sp)
{
int idx;

idx = READ_ONCE(sp->srcu_idx) & 0x1;
this_cpu_inc(sp->sda->srcu_lock_count[idx]);
smp_mb(); /* B */ /* Avoid leaking the critical section. */
return idx;
}

Do I also need to make a notrace version of __srcu_read_lock()?
Same question for __srcu_read_unlock(), which is similar. If so,
assuming that there is no need for a notrace variant of this_cpu_inc()
and smp_mb(), I suppose I could simply macro-ize the internals in both
cases, but perhaps you have a better approach.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..e2e2cf05a6eb 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
return retval;
}

+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+ int retval;
+
+ retval = __srcu_read_lock(sp);
+ return retval;
+}
+
/**
* srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
* @sp: srcu_struct in which to unregister the old reader.
@@ -205,6 +215,13 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__releases(sp)
{
+ __srcu_read_unlock(sp, idx);
+}
+
+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
rcu_lock_release(&(sp)->dep_map);
__srcu_read_unlock(sp, idx);
}


2018-04-27 16:46:20

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 12:22:01PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:14:30 -0700
> Joel Fernandes <[email protected]> wrote:
>
> > > Hmmm... Do I need to create a _notrace variant of srcu_read_lock()
> > > and srcu_read_unlock()?
> >
> > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > preempt_disable which needs to be a notrace, but for the srcu one,
> > since we don't do that, I think it should be fine.
>
> Actually, I think I may agree here too. Because the _notrace is for
> function tracing, and it shouldn't affect it. If people don't want it
> traced, they could add those functions to the list in the notrace file.

OK, feel free to ignore my notrace srcu_read_lock() patch, then. ;-)

Thanx, Paul


2018-04-27 16:48:16

by Steven Rostedt

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, 27 Apr 2018 09:45:54 -0700
"Paul E. McKenney" <[email protected]> wrote:

> > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > since we don't do that, I think it should be fine.
> >
> > Actually, I think I may agree here too. Because the _notrace is for
> > function tracing, and it shouldn't affect it. If people don't want it
> > traced, they could add those functions to the list in the notrace file.
>
> OK, feel free to ignore my notrace srcu_read_lock() patch, then. ;-)

Of course I wasn't thinking about the lockdep tracepoints that Joel
mentioned, which happens to be the reason for all this discussion in
the first place :-) Now I think we do need it. (OK, I can keep
changing my mind, can't I?).

-- Steve

2018-04-27 17:01:23

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> On Fri, 27 Apr 2018 09:45:54 -0700
> "Paul E. McKenney" <[email protected]> wrote:
>
> > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > since we don't do that, I think it should be fine.
> > >
> > > Actually, I think I may agree here too. Because the _notrace is for
> > > function tracing, and it shouldn't affect it. If people don't want it
> > > traced, they could add those functions to the list in the notrace file.
> >
> > OK, feel free to ignore my notrace srcu_read_lock() patch, then. ;-)
>
> Of course I wasn't thinking about the lockdep tracepoints that Joel
> mentioned, which happens to be the reason for all this discussion in
> the first place :-) Now I think we do need it. (OK, I can keep
> changing my mind, can't I?).

You can, but at some point I start applying heavy-duty hysteresis. ;-)

So the current thought (as of your having sent the above email) is that
we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

Thanx, Paul


2018-04-27 17:05:44

by Paul E. McKenney

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 10:00:48AM -0700, Paul E. McKenney wrote:
> On Fri, Apr 27, 2018 at 12:46:41PM -0400, Steven Rostedt wrote:
> > On Fri, 27 Apr 2018 09:45:54 -0700
> > "Paul E. McKenney" <[email protected]> wrote:
> >
> > > > > That shouldn't be needed. For the rcu_read_lock_sched case, there is a
> > > > > preempt_disable which needs to be a notrace, but for the srcu one,
> > > > > since we don't do that, I think it should be fine.
> > > >
> > > > Actually, I think I may agree here too. Because the _notrace is for
> > > > function tracing, and it shouldn't affect it. If people don't want it
> > > > traced, they could add those functions to the list in the notrace file.
> > >
> > > OK, feel free to ignore my notrace srcu_read_lock() patch, then. ;-)
> >
> > Of course I wasn't thinking about the lockdep tracepoints that Joel
> > mentioned, which happens to be the reason for all this discussion in
> > the first place :-) Now I think we do need it. (OK, I can keep
> > changing my mind, can't I?).
>
> You can, but at some point I start applying heavy-duty hysteresis. ;-)
>
> So the current thought (as of your having sent the above email) is that
> we need notrace versions of srcu_read_lock() and srcu_read_unlock(),
> but not for __srcu_read_lock() and __srcu_read_unlock(), correct?

And Joel noted offline that I messed up srcu_read_unlock_notrace(),
so here is an updated patch with that fixed.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 91494d7e8e41..3e72a291c401 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -195,6 +195,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
return retval;
}

+/* Used by tracing, cannot be traced and cannot invoke lockdep. */
+static inline notrace int
+srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp)
+{
+ int retval;
+
+ retval = __srcu_read_lock(sp);
+ return retval;
+}
+
/**
* srcu_read_unlock - unregister a old reader from an SRCU-protected structure.
* @sp: srcu_struct in which to unregister the old reader.
@@ -209,6 +219,13 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx)
__srcu_read_unlock(sp, idx);
}

+/* Used by tracing, cannot be traced and cannot call lockdep. */
+static inline notrace void
+srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp)
+{
+ __srcu_read_unlock(sp, idx);
+}
+
/**
* smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock
*
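
With these in place, the SRCU leg of __DO_TRACE() would presumably switch
over to the _notrace variants, something along the lines of this sketch
(illustration of the intended use only, not a posted patch):

	idx = srcu_read_lock_notrace(&tracepoint_srcu);			\
	it_func_ptr = srcu_dereference((tp)->funcs,			\
				       &tracepoint_srcu);		\
	/* ... callback loop runs here ... */				\
	srcu_read_unlock_notrace(&tracepoint_srcu, idx);		\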


2018-04-27 18:13:30

by Joel Fernandes

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt <[email protected]> wrote:
> On Fri, 27 Apr 2018 09:30:05 -0700
> Joel Fernandes <[email protected]> wrote:
>
>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <[email protected]> wrote:
>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>> > Mathieu Desnoyers <[email protected]> wrote:
>> >
>> >> The general approach and the implementation look fine, except for
>> >> one small detail: I would be tempted to explicitly disable preemption
>> >> around the call to the tracepoint callback for the rcuidle variant,
>> >> unless we plan to audit every tracer right away to remove any assumption
>> >> that preemption is disabled in the callback implementation.
>> >
>> > I'm thinking that we do that audit. There shouldn't be many instances
>> > of it. I like the idea that a tracepoint callback gets called with
>> > preemption enabled.
>>
>> Here is the list of all callers of the _rcuidle :
>
> I was thinking of auditing who registers callbacks to any tracepoints.

OK. If you feel strongly about this, I think for now I could also just
wrap the callback execution with preempt_disable_notrace. And, when/if
we get to doing the blocking-callbacks work, we can consider keeping
preemption on.

thanks,

- Joel

2018-04-27 18:43:55

by Mathieu Desnoyers

Subject: Re: [PATCH RFC] tracepoint: Introduce tracepoint callbacks executing with preempt on

----- On Apr 27, 2018, at 2:11 PM, Joel Fernandes [email protected] wrote:

> On Fri, Apr 27, 2018 at 9:37 AM, Steven Rostedt <[email protected]> wrote:
>> On Fri, 27 Apr 2018 09:30:05 -0700
>> Joel Fernandes <[email protected]> wrote:
>>
>>> On Fri, Apr 27, 2018 at 7:47 AM, Steven Rostedt <[email protected]> wrote:
>>> > On Fri, 27 Apr 2018 10:26:29 -0400 (EDT)
>>> > Mathieu Desnoyers <[email protected]> wrote:
>>> >
>>> >> The general approach and the implementation look fine, except for
>>> >> one small detail: I would be tempted to explicitly disable preemption
>>> >> around the call to the tracepoint callback for the rcuidle variant,
>>> >> unless we plan to audit every tracer right away to remove any assumption
>>> >> that preemption is disabled in the callback implementation.
>>> >
>>> > I'm thinking that we do that audit. There shouldn't be many instances
>>> > of it. I like the idea that a tracepoint callback gets called with
>>> > preemption enabled.
>>>
>>> Here is the list of all callers of the _rcuidle :
>>
>> I was thinking of auditing who registers callbacks to any tracepoints.
>
> OK. If you feel strongly about this, I think for now I could also just
> wrap the callback execution with preempt_disable_notrace. And, when/if
> we get to doing the blocking-callbacks work, we can consider keeping
> preemption on.

My main point here is to introduce the minimal change (keeping preemption
disabled) needed for the rcuidle variant, and only tackle the work of
dealing with preemptible callbacks when we really need it and when we can
properly test it (e.g. by using it for syscall entry/exit tracing).

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
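
Putting the pieces of this thread together, the SRCU path of __DO_TRACE()
would presumably end up looking roughly like the sketch below. The early
return on NMI is Steven's suggestion, srcu_read_lock_notrace() and
srcu_read_unlock_notrace() are the helpers Paul posted above, and the
preempt_disable_notrace()/preempt_enable_notrace() pair reflects Mathieu's
request to keep preemption disabled across the callbacks for now. This is
an illustration only, not the eventually merged code:

#define __DO_TRACE(tp, proto, args, cond, preempt_on)			\
	do {								\
		struct tracepoint_func *it_func_ptr;			\
		void *it_func;						\
		void *__data;						\
		int __maybe_unused idx = 0;				\
									\
		if (!(cond))						\
			return;						\
									\
		if (preempt_on) {					\
			/* SRCU cannot be used from NMI context */	\
			if (WARN_ON_ONCE(in_nmi()))			\
				return;					\
			idx = srcu_read_lock_notrace(&tracepoint_srcu);	\
			it_func_ptr = srcu_dereference((tp)->funcs,	\
						       &tracepoint_srcu); \
			/* keep callbacks' non-preemptible assumption */ \
			preempt_disable_notrace();			\
		} else {						\
			rcu_read_lock_sched_notrace();			\
			it_func_ptr =					\
				rcu_dereference_sched((tp)->funcs);	\
		}							\
									\
		if (it_func_ptr) {					\
			do {						\
				it_func = (it_func_ptr)->func;		\
				__data = (it_func_ptr)->data;		\
				((void(*)(proto))(it_func))(args);	\
			} while ((++it_func_ptr)->func);		\
		}							\
									\
		if (preempt_on) {					\
			preempt_enable_notrace();			\
			srcu_read_unlock_notrace(&tracepoint_srcu, idx); \
		} else {						\
			rcu_read_unlock_sched_notrace();		\
		}							\
	} while (0)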