2010-12-03 04:08:27

by Steven Rostedt

[permalink] [raw]
Subject: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

From: Steven Rostedt <[email protected]>

There are instances in the kernel that we only want to trace
a tracepoint when a certain condition is set. But we do not
want to test for that condition in the core kernel.
If we test for that condition before calling the tracepoin, then
we will be performing that test even when tracing is not enabled.
This is 99.99% of the time.

We currently can just filter out on that condition, but that happens
after we write to the trace buffer. We just wasted time writing to
the ring buffer for an event we never cared about.

This patch adds:

TRACE_EVENT_CONDITION() and DEFINE_EVENT_CONDITION()

These have a new TP_CONDITION() argument that comes right after
the TP_ARGS(). This condition can use the parameters of TP_ARGS()
in the TRACE_EVENT() to determine if the tracepoint should be traced
or not. The TP_CONDITION() will be placed in a if (cond) trace;

For example, for the tracepoint sched_wakeup, it is useless to
trace a wakeup event where the caller never actually wakes
anything up (where success == 0). So adding:

TP_CONDITION(success),

which uses the "success" parameter of the wakeup tracepoint
will have it only trace when we have successfully woken up a
task.

Cc: Mathieu Desnoyers <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/tracepoint.h | 29 +++++++++++++++++++++++------
include/trace/define_trace.h | 15 +++++++++++++++
2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 5a6074f..d3e4f87 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -106,6 +106,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,

#define TP_PROTO(args...) args
#define TP_ARGS(args...) args
+#define TP_CONDITION(args...) args

#ifdef CONFIG_TRACEPOINTS

@@ -119,12 +120,14 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
* as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just
* "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto".
*/
-#define __DO_TRACE(tp, proto, args) \
+#define __DO_TRACE(tp, proto, args, cond) \
do { \
struct tracepoint_func *it_func_ptr; \
void *it_func; \
void *__data; \
\
+ if (!(cond)) \
+ return; \
rcu_read_lock_sched_notrace(); \
it_func_ptr = rcu_dereference_sched((tp)->funcs); \
if (it_func_ptr) { \
@@ -142,7 +145,7 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
* not add unwanted padding between the beginning of the section and the
* structure. Force alignment to the same alignment as the section start.
*/
-#define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \
{ \
@@ -151,7 +154,8 @@ static inline void tracepoint_update_probe_range(struct tracepoint *begin,
do_trace: \
__DO_TRACE(&__tracepoint_##name, \
TP_PROTO(data_proto), \
- TP_ARGS(data_args)); \
+ TP_ARGS(data_args), \
+ TP_CONDITION(cond)); \
} \
static inline int \
register_trace_##name(void (*probe)(data_proto), void *data) \
@@ -186,7 +190,7 @@ do_trace: \
EXPORT_SYMBOL(__tracepoint_##name)

#else /* !CONFIG_TRACEPOINTS */
-#define __DECLARE_TRACE(name, proto, args, data_proto, data_args) \
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto, data_args) \
static inline void trace_##name(proto) \
{ } \
static inline int \
@@ -227,13 +231,18 @@ do_trace: \
* "void *__data, proto" as the callback prototype.
*/
#define DECLARE_TRACE_NOARGS(name) \
- __DECLARE_TRACE(name, void, , void *__data, __data)
+ __DECLARE_TRACE(name, void, , 1, void *__data, __data)

#define DECLARE_TRACE(name, proto, args) \
- __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
+ __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), 1, \
PARAMS(void *__data, proto), \
PARAMS(__data, args))

+#define DECLARE_TRACE_CONDITION(name, proto, args, cond) \
+ __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), PARAMS(cond), \
+ PARAMS(void *__data, proto), \
+ PARAMS(__data, args))
+
#define TRACE_EVENT_FLAGS(event, flag)

#endif /* DECLARE_TRACE */
@@ -349,12 +358,20 @@ do_trace: \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
+#define DEFINE_EVENT_CONDITION(template, name, proto, \
+ args, cond) \
+ DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
+ PARAMS(args), PARAMS(cond))

#define TRACE_EVENT(name, proto, args, struct, assign, print) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
#define TRACE_EVENT_FN(name, proto, args, struct, \
assign, print, reg, unreg) \
DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
+#define TRACE_EVENT_CONDITION(name, proto, args, cond, \
+ struct, assign, print) \
+ DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
+ PARAMS(args), PARAMS(cond))

#define TRACE_EVENT_FLAGS(event, flag)

diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index 1dfab54..b0b4eb2 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -26,6 +26,15 @@
#define TRACE_EVENT(name, proto, args, tstruct, assign, print) \
DEFINE_TRACE(name)

+#undef TRACE_EVENT_CONDITION
+#define TRACE_EVENT_CONDITION(name, proto, args, cond, tstruct, assign, print) \
+ TRACE_EVENT(name, \
+ PARAMS(proto), \
+ PARAMS(args), \
+ PARAMS(tstruct), \
+ PARAMS(assign), \
+ PARAMS(print))
+
#undef TRACE_EVENT_FN
#define TRACE_EVENT_FN(name, proto, args, tstruct, \
assign, print, reg, unreg) \
@@ -39,6 +48,10 @@
#define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
DEFINE_TRACE(name)

+#undef DEFINE_EVENT_CONDITION
+#define DEFINE_EVENT_CONDITION(template, name, proto, args, cond) \
+ DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
+
#undef DECLARE_TRACE
#define DECLARE_TRACE(name, proto, args) \
DEFINE_TRACE(name)
@@ -75,9 +88,11 @@

#undef TRACE_EVENT
#undef TRACE_EVENT_FN
+#undef TRACE_EVENT_CONDITION
#undef DECLARE_EVENT_CLASS
#undef DEFINE_EVENT
#undef DEFINE_EVENT_PRINT
+#undef DEFINE_EVENT_CONDITION
#undef TRACE_HEADER_MULTI_READ
#undef DECLARE_TRACE

--
1.7.2.3


2010-12-03 04:54:09

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

* Steven Rostedt ([email protected]) wrote:
[...]
> -#define __DO_TRACE(tp, proto, args) \
> +#define __DO_TRACE(tp, proto, args, cond) \
> do { \
> struct tracepoint_func *it_func_ptr; \
> void *it_func; \
> void *__data; \
> \
> + if (!(cond)) \

One small documentation-related detail: my guess is that you are leaving
"cond" without likely/unlikely builtin expect purposefully so that we
can write, in TP_CONDITION:

TP_CONDITION(unlikely(someparam)),

when we expect the condition to be usually false (and likely() for the
reverse). Maybe it could be worth documenting that expressions like the
following are valid :

TP_CONDITION((likely(param1) && unlikely(param2)) || likely(param3))

It's fair to assume that kernel developers know this already, but given
we plan to re-use TRACE_EVENT() for the user-space tracer soon enough,
documenting this kind of use-case now can save us the trouble in the
future.

Other than that,

Acked-by: Mathieu Desnoyers <[email protected]>

Thanks!

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-12-03 14:09:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

On Thu, 2010-12-02 at 23:54 -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> [...]
> > -#define __DO_TRACE(tp, proto, args) \
> > +#define __DO_TRACE(tp, proto, args, cond) \
> > do { \
> > struct tracepoint_func *it_func_ptr; \
> > void *it_func; \
> > void *__data; \
> > \
> > + if (!(cond)) \
>
> One small documentation-related detail: my guess is that you are leaving
> "cond" without likely/unlikely builtin expect purposefully so that we
> can write, in TP_CONDITION:
>
> TP_CONDITION(unlikely(someparam)),

I actually think this is an abuse of "unlikely".

>
> when we expect the condition to be usually false (and likely() for the
> reverse). Maybe it could be worth documenting that expressions like the
> following are valid :
>
> TP_CONDITION((likely(param1) && unlikely(param2)) || likely(param3))
>
> It's fair to assume that kernel developers know this already, but given
> we plan to re-use TRACE_EVENT() for the user-space tracer soon enough,
> documenting this kind of use-case now can save us the trouble in the
> future.

I would frown on someone using unlikely here. But a TRACE_EVENT()
belongs to the maintainer, not me.

>
> Other than that,
>
> Acked-by: Mathieu Desnoyers <[email protected]>

Thanks!

-- Steve


2010-12-03 15:27:38

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

* Steven Rostedt ([email protected]) wrote:
> On Thu, 2010-12-02 at 23:54 -0500, Mathieu Desnoyers wrote:
[...]
> >
> > One small documentation-related detail: my guess is that you are leaving
> > "cond" without likely/unlikely builtin expect purposefully so that we
> > can write, in TP_CONDITION:
> >
> > TP_CONDITION(unlikely(someparam)),
>
> I actually think this is an abuse of "unlikely".

Why are you considering this an abuse ?

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-12-03 15:38:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

On Fri, 2010-12-03 at 10:27 -0500, Mathieu Desnoyers wrote:

> > > TP_CONDITION(unlikely(someparam)),
> >
> > I actually think this is an abuse of "unlikely".
>
> Why are you considering this an abuse ?

Because it is overused. I would rather get rid of most unlikely()'s
because they are mostly meaningless. Just run the unlikely profiler, and
you will see a large number of them are just plain incorrect.

Adding them here probably doesn't do any good. The only reason for this
TP_CONDITION() is to ignore those cases that it just does not make sense
to trace. Like a wake up tracepoint that does not wake anything up. No
need for "unlikely" or "likely", by trying to do that, you will most
likely get it wrong.

unlikely(use_likely_correctly)


-- Steve

2010-12-03 15:46:36

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/2 v2] tracing: Add TRACE_EVENT_CONDITIONAL()

* Steven Rostedt ([email protected]) wrote:
> On Fri, 2010-12-03 at 10:27 -0500, Mathieu Desnoyers wrote:
>
> > > > TP_CONDITION(unlikely(someparam)),
> > >
> > > I actually think this is an abuse of "unlikely".
> >
> > Why are you considering this an abuse ?
>
> Because it is overused. I would rather get rid of most unlikely()'s
> because they are mostly meaningless. Just run the unlikely profiler, and
> you will see a large number of them are just plain incorrect.
>
> Adding them here probably doesn't do any good. The only reason for this
> TP_CONDITION() is to ignore those cases that it just does not make sense
> to trace. Like a wake up tracepoint that does not wake anything up. No
> need for "unlikely" or "likely", by trying to do that, you will most
> likely get it wrong.
>
> unlikely(use_likely_correctly)

Ah OK. You are afraid that people will misuse it, not saying that it would be
technically incorrect. Fair enough. It sounds like a good enough reason for not
documenting this use-case.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com