Hi,
On Sat, 2009-02-28 at 04:06 -0500, Steven Rostedt wrote:
> plain text document attachment
> (0007-tracing-add-raw-trace-point-recording-infrastructur.patch)
> From: Steven Rostedt <[email protected]>
>
> Impact: lower overhead tracing
>
> The current event tracer can automatically pick up trace points
> that are registered with the TRACE_FORMAT macro. But it required
> a printf format string and parsing. Although, this adds the ability
> to get guaranteed information like task names and such, it took
> a hit in overhead processing. This processing can add about 500-1000
> nanoseconds overhead, but in some cases that too is considered
> too much and we want to shave off as much from this overhead as
> possible.
>
> Tom Zanussi recently posted tracing patches to lkml that are based
> on a nice idea about capturing the data via C structs using
> STRUCT_ENTER, STRUCT_EXIT type of macros.
>
> I liked that method very much, but did not like the implementation
> that required a developer to add data/code in several disjoint
> locations.
>
> This patch extends the event_tracer macros to do a similar "raw C"
> approach that Tom Zanussi did. But instead of having the developers
> needing to tweak a bunch of code all over the place, they can do it
> all in one macro - preferably placed near the code that it is
> tracing. That makes it much more likely that tracepoints will be
> maintained on an ongoing basis by the code they modify.
>
> The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
> a developer may still utilize the more low level DECLARE_TRACE macros
> if they don't care about getting their traces automatically in the event
> tracer.)
>
> They can also use the existing TRACE_FORMAT if they don't need to code
> the tracepoint in C, but just want to use the convenience of printf.
>
> So if the developer wants to "hardwire" a tracepoint in the fastest
> possible way, and wants to acquire their data via a user space utility
> in a raw binary format, or wants to see it in the trace output but not
> sacrifice any performance, then they can implement the faster but
> more complex TRACE_EVENT_FORMAT macro.
>
> Here's what usage looks like:
>
> TRACE_EVENT_FORMAT(name,
> TPPROTO(proto),
> TPARGS(args),
> TPFMT(fmt, fmt_args),
> TRACE_STUCT(
> TRACE_FIELD(type1, item1, assign1)
> TRACE_FIELD(type2, item2, assign2)
> [...]
> ),
> TPRAWFMT(raw_fmt)
> );
>
> Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
> uses.
>
> name: is the unique identifier of the trace point
> proto: The proto type that the trace point uses
> args: the args in the proto type
> fmt: printf format to use with the event printf tracer
> fmt_args: the printf argments to match fmt
>
> TRACE_STRUCT starts the ability to create a structure.
> Each item in the structure is defined with a TRACE_FIELD
>
> TRACE_FIELD(type, item, assign)
>
> type: the C type of item.
> item: the name of the item in the stucture
> assign: what to assign the item in the trace point callback
>
> raw_fmt is a way to pretty print the struct. It must match
> the order of the items are added in TRACE_STUCT
>
> An example of this would be:
>
> TRACE_EVENT_FORMAT(sched_wakeup,
> TPPROTO(struct rq *rq, struct task_struct *p, int success),
> TPARGS(rq, p, success),
> TPFMT("task %s:%d %s",
> p->comm, p->pid, success?"succeeded":"failed"),
> TRACE_STRUCT(
> TRACE_FIELD(pid_t, pid, p->pid)
> TRACE_FIELD(int, success, success)
> ),
> TPRAWFMT("task %d success=%d")
> );
>
> This creates us a unique struct of:
>
> struct {
> pid_t pid;
> int success;
> };
>
> And the way the call back would assign these values would be:
>
> entry->pid = p->pid;
> entry->success = success;
>
> The nice part about this is that the creation of the assignent is done
> via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
> created, the developer will then have a faster method to record
> into the ring buffer. They do not need to worry about the tracer itself.
>
Nice improvements - I definitely was unhappy about having things spread
around in different files unnecessarily. And I like the fact that your
macros generate assignments too but am curious about what to do if you
need to do something more complicated than an assignment e.g. in the
block tracepoints I had to assign fields differently based on the value
of blk_pc_request():
if (blk_pc_request(rq)) {
zed_event->sector = 0;
zed_event->bytes = rq->data_len;
zed_event->pdu_len = pdu_len;
memcpy(zed_event->pdu, rq->cmd, pdu_len);
} else {
zed_event->sector = rq->hard_sector;
zed_event->bytes = rq->hard_nr_sectors << 9;
zed_event->pdu_len = 0;
}
Is there a way to define some fields but without the assignments, and do
them manually somewhere else? I guess it would be nice to be able to
define all events using TRACE_EVENT_FORMAT but have a way to special
case certain events/fields.
Anyway, sorry if it's already handled in the code - haven't had a chance
to really peruse it.
Tom
2009/3/2 Tom Zanussi <[email protected]>:
> Hi,
>
> On Sat, 2009-02-28 at 04:06 -0500, Steven Rostedt wrote:
>> plain text document attachment
>> (0007-tracing-add-raw-trace-point-recording-infrastructur.patch)
>> From: Steven Rostedt <[email protected]>
>>
>> Impact: lower overhead tracing
>>
>> The current event tracer can automatically pick up trace points
>> that are registered with the TRACE_FORMAT macro. But it required
>> a printf format string and parsing. Although, this adds the ability
>> to get guaranteed information like task names and such, it took
>> a hit in overhead processing. This processing can add about 500-1000
>> nanoseconds overhead, but in some cases that too is considered
>> too much and we want to shave off as much from this overhead as
>> possible.
>>
>> Tom Zanussi recently posted tracing patches to lkml that are based
>> on a nice idea about capturing the data via C structs using
>> STRUCT_ENTER, STRUCT_EXIT type of macros.
>>
>> I liked that method very much, but did not like the implementation
>> that required a developer to add data/code in several disjoint
>> locations.
>>
>> This patch extends the event_tracer macros to do a similar "raw C"
>> approach that Tom Zanussi did. But instead of having the developers
>> needing to tweak a bunch of code all over the place, they can do it
>> all in one macro - preferably placed near the code that it is
>> tracing. That makes it much more likely that tracepoints will be
>> maintained on an ongoing basis by the code they modify.
>>
>> The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
>> a developer may still utilize the more low level DECLARE_TRACE macros
>> if they don't care about getting their traces automatically in the event
>> tracer.)
>>
>> They can also use the existing TRACE_FORMAT if they don't need to code
>> the tracepoint in C, but just want to use the convenience of printf.
>>
>> So if the developer wants to "hardwire" a tracepoint in the fastest
>> possible way, and wants to acquire their data via a user space utility
>> in a raw binary format, or wants to see it in the trace output but not
>> sacrifice any performance, then they can implement the faster but
>> more complex TRACE_EVENT_FORMAT macro.
>>
>> Here's what usage looks like:
>>
>> ? TRACE_EVENT_FORMAT(name,
>> ? ? ? TPPROTO(proto),
>> ? ? ? TPARGS(args),
>> ? ? ? TPFMT(fmt, fmt_args),
>> ? ? ? TRACE_STUCT(
>> ? ? ? ? ? ? ? TRACE_FIELD(type1, item1, assign1)
>> ? ? ? ? ? ? ? TRACE_FIELD(type2, item2, assign2)
>> ? ? ? ? ? ? ? ? ? ? ? [...]
>> ? ? ? ),
>> ? ? ? TPRAWFMT(raw_fmt)
>> ? ? ? );
>>
>> Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
>> uses.
>>
>> ?name: is the unique identifier of the trace point
>> ?proto: The proto type that the trace point uses
>> ?args: the args in the proto type
>> ?fmt: printf format to use with the event printf tracer
>> ?fmt_args: the printf argments to match fmt
>>
>> ?TRACE_STRUCT starts the ability to create a structure.
>> ?Each item in the structure is defined with a TRACE_FIELD
>>
>> ? TRACE_FIELD(type, item, assign)
>>
>> ?type: the C type of item.
>> ?item: the name of the item in the stucture
>> ?assign: what to assign the item in the trace point callback
>>
>> ?raw_fmt is a way to pretty print the struct. It must match
>> ? the order of the items are added in TRACE_STUCT
>>
>> ?An example of this would be:
>>
>> ?TRACE_EVENT_FORMAT(sched_wakeup,
>> ? ? ? TPPROTO(struct rq *rq, struct task_struct *p, int success),
>> ? ? ? TPARGS(rq, p, success),
>> ? ? ? TPFMT("task %s:%d %s",
>> ? ? ? ? ? ? p->comm, p->pid, success?"succeeded":"failed"),
>> ? ? ? TRACE_STRUCT(
>> ? ? ? ? ? ? ? TRACE_FIELD(pid_t, pid, p->pid)
>> ? ? ? ? ? ? ? TRACE_FIELD(int, success, success)
>> ? ? ? ),
>> ? ? ? TPRAWFMT("task %d success=%d")
>> ? ? ? );
>>
>> ?This creates us a unique struct of:
>>
>> ?struct {
>> ? ? ? pid_t ? ? ? ? ? pid;
>> ? ? ? int ? ? ? ? ? ? success;
>> ?};
>>
>> ?And the way the call back would assign these values would be:
>>
>> ? ? ? entry->pid = p->pid;
>> ? ? ? entry->success = success;
>>
>> The nice part about this is that the creation of the assignent is done
>> via macro magic in the event tracer. ?Once the TRACE_EVENT_FORMAT is
>> created, the developer will then have a faster method to record
>> into the ring buffer. They do not need to worry about the tracer itself.
>>
>
> Nice improvements - I definitely was unhappy about having things spread
> around in different files unnecessarily. ?And I like the fact that your
> macros generate assignments too but am curious about what to do if you
> need to do something more complicated than an assignment e.g. in the
> block tracepoints I had to assign fields differently based on the value
> of blk_pc_request():
>
> ? ? ? if (blk_pc_request(rq)) {
> ? ? ? ? ? ? ? zed_event->sector = 0;
> ? ? ? ? ? ? ? zed_event->bytes = rq->data_len;
> ? ? ? ? ? ? ? zed_event->pdu_len = pdu_len;
> ? ? ? ? ? ? ? memcpy(zed_event->pdu, rq->cmd, pdu_len);
> ? ? ? } else {
> ? ? ? ? ? ? ? zed_event->sector = rq->hard_sector;
> ? ? ? ? ? ? ? zed_event->bytes = rq->hard_nr_sectors << 9;
> ? ? ? ? ? ? ? zed_event->pdu_len = 0;
> ? ? ? }
>
> Is there a way to define some fields but without the assignments, and do
> them manually somewhere else? ?I guess it would be nice to be able to
> define all events using TRACE_EVENT_FORMAT but have a way to special
> case certain events/fields.
Note that on such case you can do a conditional assignment:
TRACE_FIELD(int, bytes, blk_pc_request(rq) ? rq->data_len :
rq->hard_nr_sectors << 9);
The drawback here is that you'll have to repeat this conditional for
each field, which is annoying
and a loss of performance.
Perhaps it would be interesting to allow something more low level if
the user wishes.
> Anyway, sorry if it's already handled in the code - haven't had a chance
> to really peruse it.
>
> Tom
>
>
>
>
On Mon, 2 Mar 2009, Tom Zanussi wrote:
> > An example of this would be:
> >
> > TRACE_EVENT_FORMAT(sched_wakeup,
> > TPPROTO(struct rq *rq, struct task_struct *p, int success),
> > TPARGS(rq, p, success),
> > TPFMT("task %s:%d %s",
> > p->comm, p->pid, success?"succeeded":"failed"),
> > TRACE_STRUCT(
> > TRACE_FIELD(pid_t, pid, p->pid)
> > TRACE_FIELD(int, success, success)
> > ),
> > TPRAWFMT("task %d success=%d")
> > );
> >
> > This creates us a unique struct of:
> >
> > struct {
> > pid_t pid;
> > int success;
> > };
> >
> > And the way the call back would assign these values would be:
> >
> > entry->pid = p->pid;
> > entry->success = success;
> >
> > The nice part about this is that the creation of the assignent is done
> > via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
> > created, the developer will then have a faster method to record
> > into the ring buffer. They do not need to worry about the tracer itself.
> >
>
> Nice improvements - I definitely was unhappy about having things spread
> around in different files unnecessarily. And I like the fact that your
> macros generate assignments too but am curious about what to do if you
> need to do something more complicated than an assignment e.g. in the
> block tracepoints I had to assign fields differently based on the value
> of blk_pc_request():
>
> if (blk_pc_request(rq)) {
> zed_event->sector = 0;
> zed_event->bytes = rq->data_len;
> zed_event->pdu_len = pdu_len;
> memcpy(zed_event->pdu, rq->cmd, pdu_len);
> } else {
> zed_event->sector = rq->hard_sector;
> zed_event->bytes = rq->hard_nr_sectors << 9;
> zed_event->pdu_len = 0;
> }
>
> Is there a way to define some fields but without the assignments, and do
> them manually somewhere else? I guess it would be nice to be able to
> define all events using TRACE_EVENT_FORMAT but have a way to special
> case certain events/fields.
>
> Anyway, sorry if it's already handled in the code - haven't had a chance
> to really peruse it.
Nope, you are right, it is not handled... yet ;-)
I was thinking about adding TRACE_FIELD_SPECIAL() that would allow for a
different means to copy the field.
TRACE_FIELD_SPECIAL(char*, pdu, rec,
TPCMD(memcpy((rec)->pdu, rq->cmd, pdu_len));
Where, rec would have your "zed_event", and the fourth argument would have
the way to handle that field.
I have not tried that yet, but I think this, or something similar, could
work.
Thanks,
-- Steve