Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

On 08/20/2014 09:14 PM, Alexandre Montplaisir wrote:
> On 08/20/2014 05:28 AM, Jiri Olsa wrote:
>>
>> ok, easy enough ;-) so I'm guessing this governs the expected
>> CTF layout for event/stream headers/contexts, right?
>
> Correct, if the domain is "kernel" we then assume that the rest of the
> trace contains the expected elements of a kernel trace.

So that is one thing that I had to add.

> Of course, one could craft a CTF trace to advertize itself as "kernel"
> or "ust", and not actually have the layout of that trace type, in which
> case it would fail parsing later on.
>
>> Also judging from the trace display, you have hardcoded specific
>> displays/actions for specific events? That's all connected/specific
>> under trace type?
>
> Yes the trace type is the main "provider" of functionality. I could go
> into more details, but we use Eclipse extension points to define which
> columns to put in the event table, which views are available, etc. for
> each supported trace type.
>
>>> Once we have some views or analysis specific to perf CTF traces, we
>>> could
>>> definitely add a separate trace type for those too.
>> I guess tracepoints and breakpoints should display something like
>> the standard kernel trace. As for HW events it's usual to display
>> profile infomation as the perf report does:
>>
>> https://perf.wiki.kernel.org/index.php/Tutorial#Sampling_with_perf_record
>
> Interesting, I haven't tried the perf CTF output yet, but I could see it
> using the Statistics view (which by default just prints the % of events,
> per event type) to print the results of the different "perf reports",
> calculated from the CTF events. Eventually with pie charts!

I did on the linux side:

|perf record \
| -e sched:sched_switch \
| -a

This gave me perf.data trace. No with the new extension I converted it
into CTF data stream (perf data convert -i perf.data --to-ctf ctf) and
imported it into eclipse as a new trace. By setting 'domain = "kernel"'
I managed to get it accepted as a kernel trace.

Additionally I had to:
- rename sched:sched_switch -> sched_switch (I dropped the other events)
- rename "perf_cpu" to "cpu_id" and move it within "packet context"
(had a patch for that but we wanted to merge this later)
- I added "prev_tid" with the content of "prev_pid" and I added
"next_tid" with the content of "next_pid". Now exclipse does not
"freeze" after loading the first entry
- I prefixed every entry with _ (so "prev_tid" becomes "_prev_tid") and
now it seems to be recognized by the tracing plugin.

puh. Now I have something in "cpu usage", "control flow" windows.
The cpu_id field change will be addressed soon on our side.
Now, the remaining things:
The "domain = kernel" thingy (or another identifier if desired) is
something we could add.
What do we do on the naming convention?

The converter takes basically the event names the way they come from
perf. That is why we have a "system" prefix.
For the member fields, we take all the specific ones as perf serves
them. The only exception is for the generic fields which get a "perf_"
prefix in order not to clash with the specific ones.
And there is this _ prefix in front of every data member.

Now that I identified the differences between the CTF from lttng and
perf, any suggestions / ideas how this could be solved?

> Cheers,
> Alexandre

Sebastian


2014-11-04 01:30:07

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

Hi Sebastian,

On 11/03/2014 06:58 PM, Sebastian Andrzej Siewior wrote:
> [...]
> I did on the linux side:
>
> |perf record \
> | -e sched:sched_switch \
> | -a
>
> This gave me perf.data trace. No with the new extension I converted it
> into CTF data stream (perf data convert -i perf.data --to-ctf ctf) and
> imported it into eclipse as a new trace. By setting 'domain = "kernel"'
> I managed to get it accepted as a kernel trace.
>
> Additionally I had to:
> - rename sched:sched_switch -> sched_switch (I dropped the other events)
> - rename "perf_cpu" to "cpu_id" and move it within "packet context"
> (had a patch for that but we wanted to merge this later)
> - I added "prev_tid" with the content of "prev_pid" and I added
> "next_tid" with the content of "next_pid". Now exclipse does not
> "freeze" after loading the first entry
> - I prefixed every entry with _ (so "prev_tid" becomes "_prev_tid") and
> now it seems to be recognized by the tracing plugin.
>
> puh. Now I have something in "cpu usage", "control flow" windows.

This is really great! Initially, I had believed that we would have
needed to add a separate parser plugin, and to consider "perf traces" as
a completely different beast from LTTng traces. However if you can get
this close to they way LTTng presents its data, then we can probably
re-use most of the existing code. In which case we could rename the
"LTTng Kernel Trace" type in the UI to simply "Linux Kernel Trace". And
that would cover both LTTng kernel traces and CTF-perf traces.

> The cpu_id field change will be addressed soon on our side.
> Now, the remaining things:
> The "domain = kernel" thingy (or another identifier if desired) is
> something we could add.

Unless the event data is exactly the same, it would be easier to use a
different name. Like "kernel-perf" for instance?
From the user's point of view, both would still be Linux Kernel Traces,
but we could use the domain internally to determine which event/field
layout to use.

Mathieu, any thoughts on how CTF domains should be namespaced?

> What do we do on the naming convention?
>
> The converter takes basically the event names the way they come from
> perf. That is why we have a "system" prefix.
> For the member fields, we take all the specific ones as perf serves
> them. The only exception is for the generic fields which get a "perf_"
> prefix in order not to clash with the specific ones.
> And there is this _ prefix in front of every data member.
>
> Now that I identified the differences between the CTF from lttng and
> perf, any suggestions / ideas how this could be solved?

I suppose it would be better/cleaner if the event and field names would
remain the same, or at least be similar, in the perf.data and perf-CTF
formats.

If the trace events from both LTTng and perf represent the same thing
(and I assume they should, since they come from the same tracepoints,
right?), then we could just add a wrapper on the viewer side to decide
which event/field names to use, depending on the trace type.

Right now, we only define LTTng event and field names:
http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java

But if you could for example tell me the perf equivalents of all the
strings in that file, I could hack together such wrapper. With that, in
theory, perf traces should behave exactly the same as LTTng traces in
the viewer!


Cheers,
Alexandre

ps. fyi, we have recently moved the trace viewer code to its own
project, now called "Trace Compass". We will still support the Linux
Tools plugins for the time being, but all new developments are done in
Trace Compass. If you want to check it out: http://eclipse.org/tracecompass

Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

* Alexandre Montplaisir | 2014-11-04 02:20:10 [+0100]:

>Hi Sebastian,
Hi Alexandre,

>On 11/03/2014 06:58 PM, Sebastian Andrzej Siewior wrote:
>This is really great! Initially, I had believed that we would have
>needed to add a separate parser plugin, and to consider "perf traces"
>as a completely different beast from LTTng traces. However if you can
>get this close to they way LTTng presents its data, then we can
>probably re-use most of the existing code. In which case we could
>rename the "LTTng Kernel Trace" type in the UI to simply "Linux
>Kernel Trace". And that would cover both LTTng kernel traces and
>CTF-perf traces.

we have now CTF here. So lets see what we do about the naming
convention.

>>The cpu_id field change will be addressed soon on our side.
>>Now, the remaining things:
>>The "domain = kernel" thingy (or another identifier if desired) is
>>something we could add.
>
>Unless the event data is exactly the same, it would be easier to use
>a different name. Like "kernel-perf" for instance?

Some kind of a namespace / identifier is probably not wrong. The lttng
tracer added a tracer version probably in case the format changes
between version for some reason. Perf comes with the kernel so for this
the kernel version should sufficient.

>From the user's point of view, both would still be Linux Kernel
>Traces, but we could use the domain internally to determine which
>event/field layout to use.
>
>Mathieu, any thoughts on how CTF domains should be namespaced?
>
>>Now that I identified the differences between the CTF from lttng and
>>perf, any suggestions / ideas how this could be solved?
>
>I suppose it would be better/cleaner if the event and field names
>would remain the same, or at least be similar, in the perf.data and
>perf-CTF formats.

Yes, that would be cool. Especially if we teach perf to record straight
to CTF.

>If the trace events from both LTTng and perf represent the same thing
>(and I assume they should, since they come from the same tracepoints,
>right?), then we could just add a wrapper on the viewer side to
>decide which event/field names to use, depending on the trace type.
>
>Right now, we only define LTTng event and field names:
>http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java

Okay. So I found this file for linuxtools now let me try tracecompass.
The basic renaming should do the job. Then I have to figure out how to
compile this thingy…

There is this one thing where you go for "tid" while perf says "pid". I
guess I could figure that out once I have the rename done.
We don't have lttng_statedump_process_state, this look lttng specific. I
would have to look if there is a replacement event in perf.

I have no idea what we could do about the "unknown" events, say someone
enbales skb tracing. But this is probably something for once we are
done with the basic integration.

>But if you could for example tell me the perf equivalents of all the
>strings in that file, I could hack together such wrapper. With that,
>in theory, perf traces should behave exactly the same as LTTng traces
>in the viewer!

Oooh, that would be awesome. So I installed maven but didn't get much
further. Let me gather this for you.

So first the renaming:
diff --git a/LttngStrings.java b/LttngStrings.java
--- a/LttngStrings.java
+++ b/LttngStrings.java
@@ -27,17 +27,17 @@ public interface LttngStrings {

/* Event names */
static final String EXIT_SYSCALL = "exit_syscall";
- static final String IRQ_HANDLER_ENTRY = "irq_handler_entry";
- static final String IRQ_HANDLER_EXIT = "irq_handler_exit";
- static final String SOFTIRQ_ENTRY = "softirq_entry";
- static final String SOFTIRQ_EXIT = "softirq_exit";
- static final String SOFTIRQ_RAISE = "softirq_raise";
- static final String SCHED_SWITCH = "sched_switch";
- static final String SCHED_WAKEUP = "sched_wakeup";
- static final String SCHED_WAKEUP_NEW = "sched_wakeup_new";
- static final String SCHED_PROCESS_FORK = "sched_process_fork";
- static final String SCHED_PROCESS_EXIT = "sched_process_exit";
- static final String SCHED_PROCESS_FREE = "sched_process_free";
+ static final String IRQ_HANDLER_ENTRY = "irq:irq_handler_entry";
+ static final String IRQ_HANDLER_EXIT = "irq:irq_handler_exit";
+ static final String SOFTIRQ_ENTRY = "irq:softirq_entry";
+ static final String SOFTIRQ_EXIT = "irq:softirq_exit";
+ static final String SOFTIRQ_RAISE = "irq:softirq_raise";
+ static final String SCHED_SWITCH = "sched:sched_switch";
+ static final String SCHED_WAKEUP = "sched:sched_wakeup";
+ static final String SCHED_WAKEUP_NEW = "sched:sched_wakeup_new";
+ static final String SCHED_PROCESS_FORK = "sched:sched_process_fork";
+ static final String SCHED_PROCESS_EXIT = "sched:sched_process_exit";
+ static final String SCHED_PROCESS_FREE = "sched:sched_process_free";
static final String STATEDUMP_PROCESS_STATE = "lttng_statedump_process_state";

/* System call names */

I have no idea how exit_syscall is different from irq_handler_exit and I
think we have to skip for now on STATEDUMP_PROCESS_STATE.

For the syscalls:

- static final String SYSCALL_PREFIX = "sys_";
It is bassicaly:
syscalls:sys_enter_
syscalls:sys_exit_
depending what you are looking for.

- static final String COMPAT_SYSCALL_PREFIX = "compat_sys_";
I haven't found this. Could it be that we don't record the compat_sys
at all?

- static final String SYS_CLONE = "sys_clone";
here we have
syscalls:sys_enter_clone
syscalls:sys_exit_clone
I guess the enter is what you are looking for.

For the fields, this is one event with alle the members we have. Please
note that lttng saves the members with the _ prefix and I haven't seen
that prefix in that .java file. The members of each event:

event {
name = "raw_syscalls:sys_enter";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 64; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } id;
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } args[6];
};

event {
name = "raw_syscalls:sys_exit";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 64; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } id;
integer { size = 64; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } ret;
};

event {
name = "irq:irq_handler_entry";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } irq;
string { encoding = UTF8; } name;
};

event {
name = "irq:irq_handler_exit";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } irq;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } ret;
};

event {
name = "irq:softirq_entry";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } vec;
};

event {
name = "irq:softirq_exit";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } vec;
};

event {
name = "irq:softirq_raise";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } vec;
};

event {
name = "sched:sched_switch";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } prev_comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prev_pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prev_prio;
integer { size = 64; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prev_state;
string { encoding = UTF8; } next_comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } next_pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } next_prio;
};

event {
name = "sched:sched_wakeup";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prio;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } success;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } target_cpu;
};

event {
name = "sched:sched_wakeup_new";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prio;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } success;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } target_cpu;
};

event {
name = "sched:sched_process_fork";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } parent_comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } parent_pid;
string { encoding = UTF8; } child_comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } child_pid;
};

event {
name = "sched:sched_process_exit";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prio;
};

event {
name = "sched:sched_process_free";
integer { size = 64; align = 1; signed = false; encoding = none; base = hexadecimal; byte_order = le; } perf_ip;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_tid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } perf_pid;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_id;
integer { size = 64; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } perf_period;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_type;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_flags;
integer { size = 32; align = 1; signed = false; encoding = none; base = decimal; byte_order = le; } common_preempt_count;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } common_pid;
string { encoding = UTF8; } comm;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } pid;
integer { size = 32; align = 1; signed = true; encoding = none; base = decimal; byte_order = le; } prio;
};

and here are some example from babeltrace output. One example per event:

raw_syscalls:sys_enter
[03:37:07.579969498] (+?.?????????) raw_syscalls:sys_enter: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF81020EBC, perf_tid = 30004, perf_pid = 30004, perf_id = 382, perf_period = 1, common_type = 76, common_flags = 0, common_preempt_count = 0, common_pid = 30004, id = 16, args = [ [0] = 0xE, [1] = 0x2400, [2] = 0x0, [3] = 0x0, [4] = 0xA20F00, [5] = 0xA1FDA0 ] }

raw_syscalls:sys_exit
[03:37:07.579971763] (+0.000002265) raw_syscalls:sys_exit: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF81020FAD, perf_tid = 30004, perf_pid = 30004, perf_id = 390, perf_period = 1, common_type = 75, common_flags = 0, common_preempt_count = 0, common_pid = 30004, id = 16, ret = 0 }

irq:irq_handler_entry
[03:37:08.088041486] (+0.000000211) irq:irq_handler_entry: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810B8C30, perf_tid = 0, perf_pid = 0, perf_id = 396, perf_period = 1, common_type = 112, common_flags = 9, common_preempt_count = 0, common_pid = 0, irq = 47, name = "ahci" }

irq:irq_handler_exit
[03:37:08.088049506] (+0.000000508) irq:irq_handler_exit: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF810B8C03, perf_tid = 0, perf_pid = 0, perf_id = 404, perf_period = 1, common_type = 111, common_flags = 9, common_preempt_count = 0, common_pid = 0, irq = 47, ret = 1 }

irq:softirq_entry
[03:37:07.583335617] (+0.000001100) irq:softirq_entry: { cpu_id = 3 }, { perf_ip = 0xFFFFFFFF8106A6DC, perf_tid = 0, perf_pid = 0, perf_id = 415, perf_period = 1, common_type = 110, common_flags = 16, common_preempt_count = 0, common_pid = 0, vec = 1 }

irq:softirq_exit
[03:37:07.583336555] (+0.000000619) irq:softirq_exit: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF8106A6AC, perf_tid = 30004, perf_pid = 30004, perf_id = 422, perf_period = 1, common_type = 109, common_flags = 16, common_preempt_count = 0, common_pid = 30004, vec = 1 }

irq:softirq_raise
[03:37:07.583333374] (+0.000002067) irq:softirq_raise: { cpu_id = 3 }, { perf_ip = 0xFFFFFFFF8106AA7C, perf_tid = 0, perf_pid = 0, perf_id = 431, perf_period = 1, common_type = 108, common_flags = 9, common_preempt_count = 0, common_pid = 0, vec = 1 }

sched:sched_switch
[03:37:07.583339259] (+0.000000046) sched:sched_switch: { cpu_id = 3 }, { perf_ip = 0xFFFFFFFF81508B48, perf_tid = 0, perf_pid = 0, perf_id = 439, perf_period = 1, common_type = 312, common_flags = 1, common_preempt_count = 0, common_pid = 0, prev_comm = "swapper/3", prev_pid = 0, prev_prio = 120, prev_state = 0, next_comm = "rcu_sched", next_pid = 7, next_prio = 120 }

sched:sched_wakeup
[03:37:07.583336717] (+0.000000162) sched:sched_wakeup: { cpu_id = 3 }, { perf_ip = 0xFFFFFFFF81092112, perf_tid = 0, perf_pid = 0, perf_id = 447, perf_period = 1, common_type = 314, common_flags = 53, common_preempt_count = 0, common_pid = 0, comm = "rcu_sched", pid = 7, prio = 120, success = 1, target_cpu = 3 }

sched:sched_wakeup_new
[04:16:12.328543282] (+0.000005791) sched:sched_wakeup_new: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF810944B2, perf_tid = 2819, perf_pid = 2819 , perf_id = 486, perf_period = 1, common_type = 313, common_flags = 1, common_preempt_count = 0, common_pid = 2819, comm = "bash", pid = 32604, prio = 120, success = 1, target_cpu = 7 }

sched:sched_process_fork
[04:16:12.328537491] (+?.?????????) sched:sched_process_fork: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF810649D0, perf_tid = 2819, perf_pid = 28 19, perf_id = 494, perf_period = 1, common_type = 306, common_flags = 0, common_preempt_count = 0, common_pid = 2819, parent_comm = "bash", parent_pid = 2819, child_comm = "bash", child_pid = 32604 }

sched:sched_process_exit
[04:16:12.350957386] (+0.022414104) sched:sched_process_exit: { cpu_id = 0 }, { perf_ip = 0xFFFFFFFF81067FED, perf_tid = 32604, perf_pid = 32604, perf_id = 500, perf_period = 1, common_type = 309, common_flags = 0, common_preempt_count = 0, common_pid = 32604, comm = "ps", pid = 32604, prio = 120 }

sched:sched_process_free
[04:16:12.382175653] (+0.031218267) sched:sched_process_free: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF810663FD, perf_tid = 18, perf_pid = 18, perf_id = 510, perf_period = 1, common_type = 310, common_flags = 16, common_preempt_count = 0, common_pid = 18, comm = "ps", pid = 32604, prio = 120 }

The complete babeltrace output from my sample trace is at
https://breakpoint.cc/perf-ctf

>Cheers,
>Alexandre

Sebastian

2014-11-05 17:21:13

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC 0/5] perf tools: Add perf data CTF conversion

----- Original Message -----
> From: "Sebastian Andrzej Siewior" <[email protected]>
> To: "Alexandre Montplaisir" <[email protected]>
> Cc: "Jiri Olsa" <[email protected]>, [email protected], "Dominique Toupin" <[email protected]>,
> "Mathieu Desnoyers" <[email protected]>, "Tom Zanussi" <[email protected]>, "Jeremie Galarneau"
> <[email protected]>, "David Ahern" <[email protected]>, "Arnaldo Carvalho de Melo" <[email protected]>
> Sent: Wednesday, November 5, 2014 7:50:28 AM
> Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion
>
> * Alexandre Montplaisir | 2014-11-04 02:20:10 [+0100]:
>
> >Hi Sebastian,
> Hi Alexandre,

Hi!

Sorry for jumping in late in the discussion. I really wanted to
consider the various impact of tracepoint semantic before answering.

>
> >On 11/03/2014 06:58 PM, Sebastian Andrzej Siewior wrote:
> >This is really great! Initially, I had believed that we would have
> >needed to add a separate parser plugin, and to consider "perf traces"
> >as a completely different beast from LTTng traces. However if you can
> >get this close to they way LTTng presents its data, then we can
> >probably re-use most of the existing code. In which case we could
> >rename the "LTTng Kernel Trace" type in the UI to simply "Linux
> >Kernel Trace". And that would cover both LTTng kernel traces and
> >CTF-perf traces.
>
> we have now CTF here. So lets see what we do about the naming
> convention.
>
> >>The cpu_id field change will be addressed soon on our side.
> >>Now, the remaining things:
> >>The "domain = kernel" thingy (or another identifier if desired) is
> >>something we could add.
> >
> >Unless the event data is exactly the same, it would be easier to use
> >a different name. Like "kernel-perf" for instance?
>
> Some kind of a namespace / identifier is probably not wrong. The lttng
> tracer added a tracer version probably in case the format changes
> between version for some reason. Perf comes with the kernel so for this
> the kernel version should sufficient.

Yes, using the kernel version for Perf makes sense. I reach a similar
conclusion for LTTng: we should add tracepoint semantic versioning
somewhere in the CTF metadata, because the semantic of an event can
change based on the LTTng version, and based on which kernel version
LTTng is tracing.

A very good example is the semantic of the sched_wakeup event. It has
changed due to scheduler code modification, and is now called from an
IPI context, which changes its semantic (not called from the same
PID). Unfortunately, there is little we can do besides checking the
kernel version to detect the semantic change from the trace viewer
side, because neither the event nor the field names have changed.

The trace viewer could therefore care about the following information
to identify the semantic of a trace:

- Tracer name (e.g. lttng or perf),
- Domain (e.g. kernel or userspace),
- Tracepoint versioning (e.g. kernel version for Perf).

Because CTF supports both kernel and userspace tracing, we also want
to solve this semantic detection problem both for the kernel and
userspace. Therefore, we should consider how the userspace
tracepoints could save version information in the user-space metadata
too.

Since we have traces shared across applications (per user-ID buffers)
in lttng-ust, the semantic info, and therefore the versioning, should
be done on a per-provider (or per-event) basis, rather than trace-wide,
because a single trace could contain events from various applications,
each with their own set of providers, therefore each with their
versioning info.

So if we apply this description scheme to the kernel tracing case,
this would mean that each event in the CTF metadata would have
version information. For Perf, this could very well be the kernel
version that we simply repeat for each event metadata entry. For
LTTng-modules, we would have our own versioning that is independent
of the kernel version, since the semantic of the events we expose
can change for a given kernel version as lttng-modules evolves.

In summary, for perf it would be really easy: just repeat the
kernel version in a new attribute attached to each event in the
metadata. For LTTng we would have the flexibility to have our own
version numbers in there. This would also cover the case of
userspace tracing, allowing each application to advertise their
tracepoint provider semantic changes through versioning.

>
> >From the user's point of view, both would still be Linux Kernel
> >Traces, but we could use the domain internally to determine which
> >event/field layout to use.
> >
> >Mathieu, any thoughts on how CTF domains should be namespaced?

(see above)

> >
> >>Now that I identified the differences between the CTF from lttng and
> >>perf, any suggestions / ideas how this could be solved?
> >
> >I suppose it would be better/cleaner if the event and field names
> >would remain the same, or at least be similar, in the perf.data and
> >perf-CTF formats.
>
> Yes, that would be cool. Especially if we teach perf to record straight
> to CTF.
>
> >If the trace events from both LTTng and perf represent the same thing
> >(and I assume they should, since they come from the same tracepoints,
> >right?), then we could just add a wrapper on the viewer side to
> >decide which event/field names to use, depending on the trace type.

I think we might want to keep a different semantic namespace for
perf and lttng, because LTTng has the luxury to change event semantic
mapping between minor LTTng versions in order to add/remove/tweak event
content as necessary, and Perf is really tied to each kernel version
it is shipped with.

> >
> >Right now, we only define LTTng event and field names:
> >http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java
>
> Okay. So I found this file for linuxtools now let me try tracecompass.
> The basic renaming should do the job. Then I have to figure out how to
> compile this thingy…
>
> There is this one thing where you go for "tid" while perf says "pid". I
> guess I could figure that out once I have the rename done.

LTTng uses the semantic presented to user-space to identify threads and
processes. What you find in /proc is what you find in a LTTng trace. The
tracepoint semantic used by perf and ftrace uses the kernel-internal
meaning of pid = thread ID, pgid = process ID, which differs from what is
visible from user-space.

I guess it's up to you to decide if you want to stick to the kernel-internal
semantic, or switch to the user-visible (/proc) semantic for perf traces.

> We don't have lttng_statedump_process_state, this look lttng specific. I
> would have to look if there is a replacement event in perf.

Not that I am aware of. Perf tends to add fields to each records to keep
track of extra state. LTTng can also do that by dynamically attaching
context information, but it also supports dumping the initial system
state, thus allowing trace viewers to reconstruct the system state by
reading the trace, starting with the state dump events at the beginning.

>
> I have no idea what we could do about the "unknown" events, say someone
> enbales skb tracing. But this is probably something for once we are
> done with the basic integration.
>
> >But if you could for example tell me the perf equivalents of all the
> >strings in that file, I could hack together such wrapper. With that,
> >in theory, perf traces should behave exactly the same as LTTng traces
> >in the viewer!

Ideally, the Trace Compass views should only care about a model of the OS.
Populating this model can be done by various "state gathering" plugins,
e.g. one for lttng, one for perf, which know about versioning and semantic
of the events contained in each trace.

[...]

> For the fields, this is one event with alle the members we have. Please
> note that lttng saves the members with the _ prefix and I haven't seen
> that prefix in that .java file. The members of each event:

Yeah, the _ prefix for event names. This is one decision I would like to
find a way to revert, but we'll have to live with it unfortunately for
CTF 1.8. The issue it's trying to fix is to allow having fields named
"event" that don't clash with the "event" reserved keyword. When I added
the _ prefix, I did it like this in the CTF spec:

"Replacing reserved keywords with underscore-prefixed field names is
recommended. Fields starting with an underscore should have their leading
underscore removed by the CTF trace readers."

Unfortunately, this introduces semantic corner-cases for event names that
would indeed start with an underscore, unless they are prefixed with
double-underscore in the metadata.

So far, the only fix I see to this situation is to eventually do a
CTF 1.9, and add the notion of a $ prefix to the grammar (which is not
part of the symbols accepted for an identifier) to be used as a field
name prefix that ensures there is no clash with reserved keywords. I'm
very open to suggestions there through, and I'm really not in a hurry
to release a new CTF spec version (we should only do so when we have
a batch of changes that are required, because it will require all trace
readers to be updated).

Thanks!

Mathieu

> >Cheers,
> >Alexandre
>
> Sebastian
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2014-11-06 03:25:56

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion


On 11/05/2014 01:50 PM, Sebastian Andrzej Siewior wrote:
> [...]
>> If the trace events from both LTTng and perf represent the same thing
>> (and I assume they should, since they come from the same tracepoints,
>> right?), then we could just add a wrapper on the viewer side to
>> decide which event/field names to use, depending on the trace type.
>>
>> Right now, we only define LTTng event and field names:
>> http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java
> Okay. So I found this file for linuxtools now let me try tracecompass.
> The basic renaming should do the job. Then I have to figure out how to
> compile this thingy…

"mvn clean install". It is the Maven equivalent of "./configure && make" ;)

Or if you want to build a standalone application (RCP):
mvn clean install -Pbuild-rcp -Dmaven.test.skip=true

see the README file in the git tree for details.

>
> There is this one thing where you go for "tid" while perf says "pid". I
> guess I could figure that out once I have the rename done.
> We don't have lttng_statedump_process_state, this look lttng specific. I
> would have to look if there is a replacement event in perf.

Yes, the state dump is something specific to LTTng. It allows us to know
about processes that exist on the system, even if they are sleeping for
the whole duration of the trace (and thus, would not show up in the
trace at all).

But even if these events are not present, we can still know about active
processes when they do shed_switch's for example.

> I have no idea what we could do about the "unknown" events, say someone
> enbales skb tracing. But this is probably something for once we are
> done with the basic integration.
>
>> But if you could for example tell me the perf equivalents of all the
>> strings in that file, I could hack together such wrapper. With that,
>> in theory, perf traces should behave exactly the same as LTTng traces
>> in the viewer!
> Oooh, that would be awesome. So I installed maven but didn't get much
> further. Let me gather this for you.

Awesome, thanks!

I am travelling this week, so I'm a bit busy, but I will try to
prototype a "wrapper" for the kernel analysis, and adding support for
the perf events, whenever I have a chance. I'll keep you posted.

>
> So first the renaming:
> diff --git a/LttngStrings.java b/LttngStrings.java
> --- a/LttngStrings.java
> +++ b/LttngStrings.java
> @@ -27,17 +27,17 @@ public interface LttngStrings {
>
> /* Event names */
> static final String EXIT_SYSCALL = "exit_syscall";
> - static final String IRQ_HANDLER_ENTRY = "irq_handler_entry";
> - static final String IRQ_HANDLER_EXIT = "irq_handler_exit";
> - static final String SOFTIRQ_ENTRY = "softirq_entry";
> - static final String SOFTIRQ_EXIT = "softirq_exit";
> - static final String SOFTIRQ_RAISE = "softirq_raise";
> - static final String SCHED_SWITCH = "sched_switch";
> - static final String SCHED_WAKEUP = "sched_wakeup";
> - static final String SCHED_WAKEUP_NEW = "sched_wakeup_new";
> - static final String SCHED_PROCESS_FORK = "sched_process_fork";
> - static final String SCHED_PROCESS_EXIT = "sched_process_exit";
> - static final String SCHED_PROCESS_FREE = "sched_process_free";
> + static final String IRQ_HANDLER_ENTRY = "irq:irq_handler_entry";
> + static final String IRQ_HANDLER_EXIT = "irq:irq_handler_exit";
> + static final String SOFTIRQ_ENTRY = "irq:softirq_entry";
> + static final String SOFTIRQ_EXIT = "irq:softirq_exit";
> + static final String SOFTIRQ_RAISE = "irq:softirq_raise";
> + static final String SCHED_SWITCH = "sched:sched_switch";
> + static final String SCHED_WAKEUP = "sched:sched_wakeup";
> + static final String SCHED_WAKEUP_NEW = "sched:sched_wakeup_new";
> + static final String SCHED_PROCESS_FORK = "sched:sched_process_fork";
> + static final String SCHED_PROCESS_EXIT = "sched:sched_process_exit";
> + static final String SCHED_PROCESS_FREE = "sched:sched_process_free";
> static final String STATEDUMP_PROCESS_STATE = "lttng_statedump_process_state";
>
> /* System call names */
>
> I have no idea how exit_syscall is different from irq_handler_exit and I
> think we have to skip for now on STATEDUMP_PROCESS_STATE.
>
> For the syscalls:
>
> - static final String SYSCALL_PREFIX = "sys_";
> It is bassicaly:
> syscalls:sys_enter_
> syscalls:sys_exit_
> depending what you are looking for.
>
> - static final String COMPAT_SYSCALL_PREFIX = "compat_sys_";
> I haven't found this. Could it be that we don't record the compat_sys
> at all?

IIRC, compat_sys is for instance for 32-bit system calls on a 64-bit
kernel. Perhaps the "compat" system calls are recorded as standard
system call events with perf? We could test it once we get the base
things working.

> - static final String SYS_CLONE = "sys_clone";
> here we have
> syscalls:sys_enter_clone
> syscalls:sys_exit_clone
> I guess the enter is what you are looking for.
>
> For the fields, this is one event with alle the members we have. Please
> note that lttng saves the members with the _ prefix and I haven't seen
> that prefix in that .java file.

As Mathieu explained in his reply, in LTTng-CTF they have a _ before
field names. In our parser, we take out the first character if it is an
underscore. So it should still work with underscore-less fields.


Cheers,
Alexandre

2014-11-06 04:53:36

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: [RFC 0/5] perf tools: Add perf data CTF conversion

Hi Mathieu,


On 11/05/2014 06:21 PM, Mathieu Desnoyers wrote:
> [...]
>>>> The cpu_id field change will be addressed soon on our side.
>>>> Now, the remaining things:
>>>> The "domain = kernel" thingy (or another identifier if desired) is
>>>> something we could add.
>>> Unless the event data is exactly the same, it would be easier to use
>>> a different name. Like "kernel-perf" for instance?
>> Some kind of a namespace / identifier is probably not wrong. The lttng
>> tracer added a tracer version probably in case the format changes
>> between version for some reason. Perf comes with the kernel so for this
>> the kernel version should sufficient.
> Yes, using the kernel version for Perf makes sense. I reach a similar
> conclusion for LTTng: we should add tracepoint semantic versioning
> somewhere in the CTF metadata, because the semantic of an event can
> change based on the LTTng version, and based on which kernel version
> LTTng is tracing.
>
> A very good example is the semantic of the sched_wakeup event. It has
> changed due to scheduler code modification, and is now called from an
> IPI context, which changes its semantic (not called from the same
> PID). Unfortunately, there is little we can do besides checking the
> kernel version to detect the semantic change from the trace viewer
> side, because neither the event nor the field names have changed.
>
> The trace viewer could therefore care about the following information
> to identify the semantic of a trace:
>
> - Tracer name (e.g. lttng or perf),
> - Domain (e.g. kernel or userspace),
> - Tracepoint versioning (e.g. kernel version for Perf).

Sounds good. So perf-CTF traces could still use the "kernel" domain, but
the CTF environment metadata would also mention the tracer, which could
be so far either lttng or perf. For now we only look at the domain to
infer the trace type, but we could also look at the tracer, and tracer
version, to determine which event and field naming to use for the analysis.

I can also see how in general, versioning the "instrumentation" of an
instrumented program could be useful. For example, LTTng changed the
name of their syscall events in 2.6. The event still represents the same
thing from an analysis's point of view, only the name changed.

> Because CTF supports both kernel and userspace tracing, we also want
> to solve this semantic detection problem both for the kernel and
> userspace. Therefore, we should consider how the userspace
> tracepoints could save version information in the user-space metadata
> too.
>
> Since we have traces shared across applications (per user-ID buffers)
> in lttng-ust, the semantic info, and therefore the versioning, should
> be done on a per-provider (or per-event) basis, rather than trace-wide,
> because a single trace could contain events from various applications,
> each with their own set of providers, therefore each with their
> versioning info.

Hmm, where would this per-tracepoint version come from? From the version
of the application? From a new "instrumentation version" defined
somewhere? Or would the maintainers of the application have to manually
version every single tracepoint in their program?

Per-tracepoint versioning, at first glance, seems a bit heavy. I'd have
to understand more about it to make an informed opinion though ;) But
this seems to be a problem for userspace traces only, right? Because
with kernel traces
1) the tracers put the kernel version in the environment metadata and
2) you can't have more than one kernel provider in the same CTF trace
(can you?)

But from a trace viewer's analysis point of view, I think it would make
sense. If events in the trace supply a version (in addition to its
name/type), then the analysis may decide to handle different versions of
an event in different ways.


>
> So if we apply this description scheme to the kernel tracing case,
> this would mean that each event in the CTF metadata would have
> version information. For Perf, this could very well be the kernel
> version that we simply repeat for each event metadata entry. For
> LTTng-modules, we would have our own versioning that is independent
> of the kernel version, since the semantic of the events we expose
> can change for a given kernel version as lttng-modules evolves.
>
> In summary, for perf it would be really easy: just repeat the
> kernel version in a new attribute attached to each event in the
> metadata. For LTTng we would have the flexibility to have our own
> version numbers in there. This would also cover the case of
> userspace tracing, allowing each application to advertise their
> tracepoint provider semantic changes through versioning.
>
>> >From the user's point of view, both would still be Linux Kernel
>>> Traces, but we could use the domain internally to determine which
>>> event/field layout to use.
>>>
>>> Mathieu, any thoughts on how CTF domains should be namespaced?
> (see above)
>
>>>> Now that I identified the differences between the CTF from lttng and
>>>> perf, any suggestions / ideas how this could be solved?
>>> I suppose it would be better/cleaner if the event and field names
>>> would remain the same, or at least be similar, in the perf.data and
>>> perf-CTF formats.
>> Yes, that would be cool. Especially if we teach perf to record straight
>> to CTF.
>>
>>> If the trace events from both LTTng and perf represent the same thing
>>> (and I assume they should, since they come from the same tracepoints,
>>> right?), then we could just add a wrapper on the viewer side to
>>> decide which event/field names to use, depending on the trace type.
> I think we might want to keep a different semantic namespace for
> perf and lttng, because LTTng has the luxury to change event semantic
> mapping between minor LTTng versions in order to add/remove/tweak event
> content as necessary, and Perf is really tied to each kernel version
> it is shipped with.
>
>>> Right now, we only define LTTng event and field names:
>>> http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java
>> Okay. So I found this file for linuxtools now let me try tracecompass.
>> The basic renaming should do the job. Then I have to figure out how to
>> compile this thingy…
>>
>> There is this one thing where you go for "tid" while perf says "pid". I
>> guess I could figure that out once I have the rename done.
> LTTng uses the semantic presented to user-space to identify threads and
> processes. What you find in /proc is what you find in a LTTng trace. The
> tracepoint semantic used by perf and ftrace uses the kernel-internal
> meaning of pid = thread ID, pgid = process ID, which differs from what is
> visible from user-space.
>
> I guess it's up to you to decide if you want to stick to the kernel-internal
> semantic, or switch to the user-visible (/proc) semantic for perf traces.

This is something I will have to look more into. We do use TIDs for most
of the kernel analysis, because that is what LTTng is usually providing,
but we also track PID's, with events like the statedump and fork's. We
just need to make sure we match the field values to the right thing.

>
>> We don't have lttng_statedump_process_state, this look lttng specific. I
>> would have to look if there is a replacement event in perf.
> Not that I am aware of. Perf tends to add fields to each records to keep
> track of extra state. LTTng can also do that by dynamically attaching
> context information, but it also supports dumping the initial system
> state, thus allowing trace viewers to reconstruct the system state by
> reading the trace, starting with the state dump events at the beginning.
>
>> I have no idea what we could do about the "unknown" events, say someone
>> enbales skb tracing. But this is probably something for once we are
>> done with the basic integration.
>>
>>> But if you could for example tell me the perf equivalents of all the
>>> strings in that file, I could hack together such wrapper. With that,
>>> in theory, perf traces should behave exactly the same as LTTng traces
>>> in the viewer!
> Ideally, the Trace Compass views should only care about a model of the OS.
> Populating this model can be done by various "state gathering" plugins,
> e.g. one for lttng, one for perf, which know about versioning and semantic
> of the events contained in each trace.

Exactly, the "wrapper" I was talking about previously would be something
like an interface that only exposes the *concepts* present in the
application, in this case the Linux kernel. It would then be up to the
support of each tracer (or tracer version) to provide which events and
fields to use for each of those concepts.


Cheers!
Alexandre

>
> [...]
>
>> For the fields, this is one event with alle the members we have. Please
>> note that lttng saves the members with the _ prefix and I haven't seen
>> that prefix in that .java file. The members of each event:
> Yeah, the _ prefix for event names. This is one decision I would like to
> find a way to revert, but we'll have to live with it unfortunately for
> CTF 1.8. The issue it's trying to fix is to allow having fields named
> "event" that don't clash with the "event" reserved keyword. When I added
> the _ prefix, I did it like this in the CTF spec:
>
> "Replacing reserved keywords with underscore-prefixed field names is
> recommended. Fields starting with an underscore should have their leading
> underscore removed by the CTF trace readers."
>
> Unfortunately, this introduces semantic corner-cases for event names that
> would indeed start with an underscore, unless they are prefixed with
> double-underscore in the metadata.
>
> So far, the only fix I see to this situation is to eventually do a
> CTF 1.9, and add the notion of a $ prefix to the grammar (which is not
> part of the symbols accepted for an identifier) to be used as a field
> name prefix that ensures there is no clash with reserved keywords. I'm
> very open to suggestions there through, and I'm really not in a hurry
> to release a new CTF spec version (we should only do so when we have
> a batch of changes that are required, because it will require all trace
> readers to be updated).
>
> Thanks!
>
> Mathieu
>
>>> Cheers,
>>> Alexandre
>> Sebastian
>>

2014-11-10 01:31:39

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

On 2014-11-05 10:25 PM, Alexandre Montplaisir wrote:
>
>>
>>> But if you could for example tell me the perf equivalents of all the
>>> strings in that file, I could hack together such wrapper. With that,
>>> in theory, perf traces should behave exactly the same as LTTng traces
>>> in the viewer!
>> Oooh, that would be awesome. So I installed maven but didn't get much
>> further. Let me gather this for you.
>
> Awesome, thanks!
>
> I am travelling this week, so I'm a bit busy, but I will try to
> prototype a "wrapper" for the kernel analysis, and adding support for
> the perf events, whenever I have a chance. I'll keep you posted.

Ok, some good news!

I managed to get the CTF traces from perf working in Trace Compass! See
attached screenshots. This is showing the "ctf-out2" trace from your
previous email. The other trace seems to have less events enabled, so it
would only show some WAIT_FOR_CPU states in the view.

If anybody wishes to try it, you can grab the whole branch ending at
https://git.eclipse.org/r/#/c/36200/ . Or run:
$ git fetch
git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass
refs/changes/00/36200/3 && git checkout FETCH_HEAD

It reuses much of the code from the LTTng analysis, which is why it was
relatively quick to do. For now, it looks for the domain in the CTF
environment to be "kernel-perf". But this will be easy to update, if
needed, once the final format is decided.

Maybe I missed it, but I couldn't find the system call names in the
trace. Using the sys_enter and sys_exit events, the viewer is able to
determine the kernel-mode states (in blue), but we cannot show the exact
system call names like we do with LTTng.
There is also something weird with the arrows in the Control Flow View
(disabled in the screenshot), I don't know if it's due to the
particularity of the trace or to a bug in the view. We'll investigate.

Feedback is very welcome.

Cheers,
Alexandre


Attachments:
ss1.png (168.04 kB)
ss2.png (126.12 kB)
Download all attachments

2014-11-12 22:14:51

by Alexandre Montplaisir

[permalink] [raw]
Subject: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)


On 11/09/2014 08:31 PM, Alexandre Montplaisir wrote:
> On 2014-11-05 10:25 PM, Alexandre Montplaisir wrote:
>>
>>>
>>>> But if you could for example tell me the perf equivalents of all the
>>>> strings in that file, I could hack together such wrapper. With that,
>>>> in theory, perf traces should behave exactly the same as LTTng traces
>>>> in the viewer!
>>> Oooh, that would be awesome. So I installed maven but didn't get much
>>> further. Let me gather this for you.
>>
>> Awesome, thanks!
>>
>> I am travelling this week, so I'm a bit busy, but I will try to
>> prototype a "wrapper" for the kernel analysis, and adding support for
>> the perf events, whenever I have a chance. I'll keep you posted.
>
> Ok, some good news!
>
> I managed to get the CTF traces from perf working in Trace Compass!
> See attached screenshots. This is showing the "ctf-out2" trace from
> your previous email. The other trace seems to have less events
> enabled, so it would only show some WAIT_FOR_CPU states in the view.
>
> If anybody wishes to try it, you can grab the whole branch ending at
> https://git.eclipse.org/r/#/c/36200/ . Or run:
> $ git fetch
> git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass
> refs/changes/00/36200/3 && git checkout FETCH_HEAD

Just a quick note, this branch is now merged to master. So anyone who
pulls the code from the master branch at
git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass.git
should be able to load perf-CTF traces in the viewer. The trace type is
now called "Common Trace Format -> Linux Kernel Trace" and should
support both LTTng kernel and perf traces in CTF format (although
auto-detection should work in most cases).

This was based on the most recent file format I was aware of, we will
update it accordingly if required.

Testing welcome!

Cheers,
Alexandre

>
> It reuses much of the code from the LTTng analysis, which is why it
> was relatively quick to do. For now, it looks for the domain in the
> CTF environment to be "kernel-perf". But this will be easy to update,
> if needed, once the final format is decided.
>
> Maybe I missed it, but I couldn't find the system call names in the
> trace. Using the sys_enter and sys_exit events, the viewer is able to
> determine the kernel-mode states (in blue), but we cannot show the
> exact system call names like we do with LTTng.
> There is also something weird with the arrows in the Control Flow View
> (disabled in the screenshot), I don't know if it's due to the
> particularity of the trace or to a bug in the view. We'll investigate.
>
> Feedback is very welcome.
>
> Cheers,
> Alexandre

Subject: Re: [RFC 0/5] perf tools: Add perf data CTF conversion

On 11/05/2014 06:21 PM, Mathieu Desnoyers wrote:
> A very good example is the semantic of the sched_wakeup event. It has
> changed due to scheduler code modification, and is now called from an
> IPI context, which changes its semantic (not called from the same
> PID). Unfortunately, there is little we can do besides checking the
> kernel version to detect the semantic change from the trace viewer
> side, because neither the event nor the field names have changed.
>
> The trace viewer could therefore care about the following information
> to identify the semantic of a trace:
>
> - Tracer name (e.g. lttng or perf),
> - Domain (e.g. kernel or userspace),
> - Tracepoint versioning (e.g. kernel version for Perf).

this sounds reasonable. That means for "domain" I switch to kernel from
kernel-perf that I am using now. And then I need to add tracer_name.

> In summary, for perf it would be really easy: just repeat the
> kernel version in a new attribute attached to each event in the
> metadata. For LTTng we would have the flexibility to have our own
> version numbers in there. This would also cover the case of
> userspace tracing, allowing each application to advertise their
> tracepoint provider semantic changes through versioning.

So what you are saying is that I need something like:

event {
name = "sched:sched_process_fork";
id = 1;
stream_id = 0;
=> version = "3.16";
fields := struct {
integer { … } perf_ip;
integer { … } perf_tid;

} align(8);
};

where the line marked "=>" is that one I should add.

>>> Right now, we only define LTTng event and field names:
>>> http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java
>>
>> Okay. So I found this file for linuxtools now let me try tracecompass.
>> The basic renaming should do the job. Then I have to figure out how to
>> compile this thingy…
>>
>> There is this one thing where you go for "tid" while perf says "pid". I
>> guess I could figure that out once I have the rename done.
>
> LTTng uses the semantic presented to user-space to identify threads and
> processes. What you find in /proc is what you find in a LTTng trace. The
> tracepoint semantic used by perf and ftrace uses the kernel-internal
> meaning of pid = thread ID, pgid = process ID, which differs from what is
> visible from user-space.
>
> I guess it's up to you to decide if you want to stick to the kernel-internal
> semantic, or switch to the user-visible (/proc) semantic for perf traces.

I am happy if I can record and pass unchanged perf data :)

>> We don't have lttng_statedump_process_state, this look lttng specific. I
>> would have to look if there is a replacement event in perf.
>
> Not that I am aware of. Perf tends to add fields to each records to keep
> track of extra state. LTTng can also do that by dynamically attaching
> context information, but it also supports dumping the initial system
> state, thus allowing trace viewers to reconstruct the system state by
> reading the trace, starting with the state dump events at the beginning.

I see. So if this is really a must-have for trace compass there would
need to be a similar event added once we start perf. But from what I
read in Alexandre's email it is not that tragic.

>> For the fields, this is one event with alle the members we have. Please
>> note that lttng saves the members with the _ prefix and I haven't seen
>> that prefix in that .java file. The members of each event:
>
> Yeah, the _ prefix for event names. This is one decision I would like to
> find a way to revert, but we'll have to live with it unfortunately for
> CTF 1.8. The issue it's trying to fix is to allow having fields named
> "event" that don't clash with the "event" reserved keyword. When I added
> the _ prefix, I did it like this in the CTF spec:
>
> "Replacing reserved keywords with underscore-prefixed field names is
> recommended. Fields starting with an underscore should have their leading
> underscore removed by the CTF trace readers."
>
> Unfortunately, this introduces semantic corner-cases for event names that
> would indeed start with an underscore, unless they are prefixed with
> double-underscore in the metadata.
>
> So far, the only fix I see to this situation is to eventually do a
> CTF 1.9, and add the notion of a $ prefix to the grammar (which is not
> part of the symbols accepted for an identifier) to be used as a field
> name prefix that ensures there is no clash with reserved keywords. I'm
> very open to suggestions there through, and I'm really not in a hurry
> to release a new CTF spec version (we should only do so when we have
> a batch of changes that are required, because it will require all trace
> readers to be updated).

Aha. I haven't seen this underscore prefix in babeltrace examples so I
wasn't aware for this. Thanks for explaining. Now should I add the
prefix to perf by all means or is okay keep it as is?

> Thanks!
>
> Mathieu
>
>>> Cheers,
>>> Alexandre

Sebastian

Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

I try to get through my ctf mailbox and I hoped I can finish it today
but I don't make it completely…

On 11/06/2014 04:25 AM, Alexandre Montplaisir wrote:
> "mvn clean install". It is the Maven equivalent of "./configure && make" ;)
>
> Or if you want to build a standalone application (RCP):
> mvn clean install -Pbuild-rcp -Dmaven.test.skip=true

thanks.

> Yes, the state dump is something specific to LTTng. It allows us to know
> about processes that exist on the system, even if they are sleeping for
> the whole duration of the trace (and thus, would not show up in the
> trace at all).
>
> But even if these events are not present, we can still know about active
> processes when they do shed_switch's for example.

good to know.

> IIRC, compat_sys is for instance for 32-bit system calls on a 64-bit
> kernel. Perhaps the "compat" system calls are recorded as standard
> system call events with perf? We could test it once we get the base
> things working.

I booted a x86-64 in a 32bit userland and I expected to see them
somewhere but nothing. However if the task has a compat flag then it
would be the same information, right?

>> - static final String SYS_CLONE = "sys_clone";
>> here we have
>> syscalls:sys_enter_clone
>> syscalls:sys_exit_clone
>> I guess the enter is what you are looking for.
>>
>> For the fields, this is one event with alle the members we have. Please
>> note that lttng saves the members with the _ prefix and I haven't seen
>> that prefix in that .java file.
>
> As Mathieu explained in his reply, in LTTng-CTF they have a _ before
> field names. In our parser, we take out the first character if it is an
> underscore. So it should still work with underscore-less fields.

I see. I think it started working once I added the underline prefix but
I might be wrong. Let me see what Mathieu says if I may leave that
prefix out.

> Cheers,
> Alexandre
Sebastian

2014-11-14 11:51:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion

adding Matthew Khouzam to the loop

jirka

On Thu, Nov 13, 2014 at 08:24:48PM +0100, Sebastian Andrzej Siewior wrote:
> I try to get through my ctf mailbox and I hoped I can finish it today
> but I don't make it completely…
>
> On 11/06/2014 04:25 AM, Alexandre Montplaisir wrote:
> > "mvn clean install". It is the Maven equivalent of "./configure && make" ;)
> >
> > Or if you want to build a standalone application (RCP):
> > mvn clean install -Pbuild-rcp -Dmaven.test.skip=true
>
> thanks.
>
> > Yes, the state dump is something specific to LTTng. It allows us to know
> > about processes that exist on the system, even if they are sleeping for
> > the whole duration of the trace (and thus, would not show up in the
> > trace at all).
> >
> > But even if these events are not present, we can still know about active
> > processes when they do shed_switch's for example.
>
> good to know.
>
> > IIRC, compat_sys is for instance for 32-bit system calls on a 64-bit
> > kernel. Perhaps the "compat" system calls are recorded as standard
> > system call events with perf? We could test it once we get the base
> > things working.
>
> I booted a x86-64 in a 32bit userland and I expected to see them
> somewhere but nothing. However if the task has a compat flag then it
> would be the same information, right?
>
> >> - static final String SYS_CLONE = "sys_clone";
> >> here we have
> >> syscalls:sys_enter_clone
> >> syscalls:sys_exit_clone
> >> I guess the enter is what you are looking for.
> >>
> >> For the fields, this is one event with alle the members we have. Please
> >> note that lttng saves the members with the _ prefix and I haven't seen
> >> that prefix in that .java file.
> >
> > As Mathieu explained in his reply, in LTTng-CTF they have a _ before
> > field names. In our parser, we take out the first character if it is an
> > underscore. So it should still work with underscore-less fields.
>
> I see. I think it started working once I added the underline prefix but
> I might be wrong. Let me see what Mathieu says if I may leave that
> prefix out.
>
> > Cheers,
> > Alexandre
> Sebastian

2014-11-14 15:51:24

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC 0/5] perf tools: Add perf data CTF conversion

----- Original Message -----
> From: "Sebastian Andrzej Siewior" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Alexandre Montplaisir" <[email protected]>, "Jiri Olsa" <[email protected]>, [email protected],
> "Dominique Toupin" <[email protected]>, "Tom Zanussi" <[email protected]>, "Jeremie Galarneau"
> <[email protected]>, "David Ahern" <[email protected]>, "Arnaldo Carvalho de Melo" <[email protected]>
> Sent: Thursday, November 13, 2014 2:24:51 PM
> Subject: Re: [RFC 0/5] perf tools: Add perf data CTF conversion
>
> On 11/05/2014 06:21 PM, Mathieu Desnoyers wrote:
> > A very good example is the semantic of the sched_wakeup event. It has
> > changed due to scheduler code modification, and is now called from an
> > IPI context, which changes its semantic (not called from the same
> > PID). Unfortunately, there is little we can do besides checking the
> > kernel version to detect the semantic change from the trace viewer
> > side, because neither the event nor the field names have changed.
> >
> > The trace viewer could therefore care about the following information
> > to identify the semantic of a trace:
> >
> > - Tracer name (e.g. lttng or perf),
> > - Domain (e.g. kernel or userspace),
> > - Tracepoint versioning (e.g. kernel version for Perf).
>
> this sounds reasonable. That means for "domain" I switch to kernel from
> kernel-perf that I am using now. And then I need to add tracer_name.

Yes,

>
> > In summary, for perf it would be really easy: just repeat the
> > kernel version in a new attribute attached to each event in the
> > metadata. For LTTng we would have the flexibility to have our own
> > version numbers in there. This would also cover the case of
> > userspace tracing, allowing each application to advertise their
> > tracepoint provider semantic changes through versioning.
>
> So what you are saying is that I need something like:
>
> event {
> name = "sched:sched_process_fork";
> id = 1;
> stream_id = 0;
> => version = "3.16";
> fields := struct {
> integer { … } perf_ip;
> integer { … } perf_tid;
> …
> } align(8);
> };
>
> where the line marked "=>" is that one I should add.

Typically we don't use strings for this. This makes it
easier for trace analysis to check on version ranges.
We should also define a clear semantic for what constitutes
compatible versions.

We do have an issue here through. We've had various cases
in the past where commits that changed the event layout
or semantic were backported to kernel stable versions
(e.g. between a x.y.0 and x.y.1 kernel).

There is also the question of distribution vendor kernels
to consider, where some backport commits without kernel
patchlevel increments.

The more I look into this problem, the more I start thinking
that we might want to add fields to TRACE_EVENT that specify
the major/minor version of the event per se. If the content
of existing event fields change, we bump the major number. If
new fields are added to the event, but the semantic of all
existing fields stay the same, we bump the minor number.

This would make it really easy for trace viewers to track
event semantic changes, without ending up with a mess of
incompatible traces generated by kernels with same version
but behaving differently due to stable kernels and
distribution backports.

>
> >>> Right now, we only define LTTng event and field names:
> >>> http://git.eclipse.org/c/tracecompass/org.eclipse.tracecompass.git/tree/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/internal/lttng2/kernel/core/LttngStrings.java
> >>
> >> Okay. So I found this file for linuxtools now let me try tracecompass.
> >> The basic renaming should do the job. Then I have to figure out how to
> >> compile this thingy…
> >>
> >> There is this one thing where you go for "tid" while perf says "pid". I
> >> guess I could figure that out once I have the rename done.
> >
> > LTTng uses the semantic presented to user-space to identify threads and
> > processes. What you find in /proc is what you find in a LTTng trace. The
> > tracepoint semantic used by perf and ftrace uses the kernel-internal
> > meaning of pid = thread ID, pgid = process ID, which differs from what is
> > visible from user-space.
> >
> > I guess it's up to you to decide if you want to stick to the
> > kernel-internal
> > semantic, or switch to the user-visible (/proc) semantic for perf traces.
>
> I am happy if I can record and pass unchanged perf data :)
>
> >> We don't have lttng_statedump_process_state, this look lttng specific. I
> >> would have to look if there is a replacement event in perf.
> >
> > Not that I am aware of. Perf tends to add fields to each records to keep
> > track of extra state. LTTng can also do that by dynamically attaching
> > context information, but it also supports dumping the initial system
> > state, thus allowing trace viewers to reconstruct the system state by
> > reading the trace, starting with the state dump events at the beginning.
>
> I see. So if this is really a must-have for trace compass there would
> need to be a similar event added once we start perf. But from what I
> read in Alexandre's email it is not that tragic.

Indeed, trace compass should be able to deal with "missing" info.

>
> >> For the fields, this is one event with alle the members we have. Please
> >> note that lttng saves the members with the _ prefix and I haven't seen
> >> that prefix in that .java file. The members of each event:
> >
> > Yeah, the _ prefix for event names. This is one decision I would like to
> > find a way to revert, but we'll have to live with it unfortunately for
> > CTF 1.8. The issue it's trying to fix is to allow having fields named
> > "event" that don't clash with the "event" reserved keyword. When I added
> > the _ prefix, I did it like this in the CTF spec:
> >
> > "Replacing reserved keywords with underscore-prefixed field names is
> > recommended. Fields starting with an underscore should have their leading
> > underscore removed by the CTF trace readers."
> >
> > Unfortunately, this introduces semantic corner-cases for event names that
> > would indeed start with an underscore, unless they are prefixed with
> > double-underscore in the metadata.
> >
> > So far, the only fix I see to this situation is to eventually do a
> > CTF 1.9, and add the notion of a $ prefix to the grammar (which is not
> > part of the symbols accepted for an identifier) to be used as a field
> > name prefix that ensures there is no clash with reserved keywords. I'm
> > very open to suggestions there through, and I'm really not in a hurry
> > to release a new CTF spec version (we should only do so when we have
> > a batch of changes that are required, because it will require all trace
> > readers to be updated).
>
> Aha. I haven't seen this underscore prefix in babeltrace examples so I
> wasn't aware for this. Thanks for explaining. Now should I add the
> prefix to perf by all means or is okay keep it as is?

If you can eventually have field names such as "event", "trace", or such
names that clash with existing keywords, then you should prefix at least
those field names with underscore. In LTTng, we simply prefix every field
name with underscore.

Thanks,

Mathieu

>
> > Thanks!
> >
> > Mathieu
> >
> >>> Cheers,
> >>> Alexandre
>
> Sebastian
>

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

* Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:

>Just a quick note, this branch is now merged to master. So anyone who
>pulls the code from the master branch at
>git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass.git
>should be able to load perf-CTF traces in the viewer. The trace type
>is now called "Common Trace Format -> Linux Kernel Trace" and should
>support both LTTng kernel and perf traces in CTF format (although
>auto-detection should work in most cases).

Thank you for all the work.
Let me try to reply to the emails at once here:
- I added to the environment metadata the following (comparing to the
last version):
domain => kernel
tracer_name => perf

There is no tracer_major + minor. Instead I added
version => perf's version
On my system I have:
release = "3.16.0-4-amd64";
version = "3.18.rc3.g91405a";

Because I run Debian's v3.16 and recorded the trace with perf from the
kernel 3.18.rc3.
There is no version of perf doing the convert of perf.data => ctf.
Any objections, rename of the version member?

- Mathieu decided that it makes no sense to add the kernel version to
each event we trace. Instead each event should have its own version
with a major/minor member. Once the event is changed the "ABI"
version should be adjusted. I second this since it makes sense.
Therefore there are no changes that were made to the converter.

- Alexandre (you) noticed that there are syscall names in the events
recorded via "sys_enter and sys_exit". This is true, but there is a
hint. There is an event for instance:

[03:37:07.579969498] (+?.?????????) raw_syscalls:sys_enter: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF81020EBC, perf_tid = 30004, perf_pid = 30004, perf_id = 382, perf_period = 1, common_type = 76, common_flags = 0, common_preempt_count = 0, common_pid = 30004, id = 16, args = [ [0] = 0xE, [1] = 0x2400, [2] = 0x0, [3] = 0x0, [4] = 0xA20F00, [5] = 0xA1FDA0 ] }

By the end you notice id=16 and args. args are the Arguments passed
to syscall and id is the syscall number. Together with machine =
x86_64 you know which architecture you need to lookup the number 16.
The numbers are from unistd.h (and may be different between architectures,
even between i386 & x86_64). strace has for instance the following [0] table.

{ 3, TD, sys_read, "read" }, /* 0 */

{ 3, TD, sys_ioctl, "ioctl" }, /* 16 */


So 16 is ioctl. strace has those tables for a bunch of architectures
so it might be helpful to suck them in. I know no other way to ease
things here.

[0] https://github.com/bnoordhuis/strace/blob/master/linux/x86_64/syscallent.h

The same thing is true for softirq_entry events for instance. This event
will give you you only vec=9 and you need to lookup that 9 => RCU. That
one is easy however:

const char * const softirq_to_name[NR_SOFTIRQS] = {
"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
"TASKLET", "SCHED", "HRTIMER", "RCU"
};

this has been taken from kernel/softirq.c.

>This was based on the most recent file format I was aware of, we will
>update it accordingly if required.
>
>Testing welcome!

I pushed the perf changes I mentioned to

git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7

It is now based on Arnaldo's perf/core. If everything goes well from
the compass side and nobody complains here in any way, the next step
would be to present the patches on the mailing list and compass as a
user.

I took you tree and added the patch below. I uploaded the following
files to https://breakpoint.cc/perf-ctf/:
- ctf-out6.tar.xz from perf6.data.xz
shows nothing

- ctf-out7.tar.xz from perf7.data.xz
shows something

The only obvious difference is the size of the CTF data. The size of out6
is almost 300MiB and it contains 3,259,929 events. The out7 is has only
15MiB and contains 152,900 events.

>Cheers,
>Alexandre



>From 7ffa619d918f2010046b391ae29063ffc5329468 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <[email protected]>
Date: Wed, 26 Nov 2014 18:04:53 +0100
Subject: [PATCH] CTF: use tracer_name for perf-CTF traces

domain will be set to kernel for both, perf and lttng traces. The
tracer_name will be set to perf if the trace is generated by perf and
otherwise lttng-modules if created by thet lttng tool.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
.../tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
index a58269f..03a09b9 100644
--- a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
+++ b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
@@ -96,12 +96,11 @@ public class LttngKernelTrace extends CtfTmfTrace {
* metadata
*/
Map<String, String> traceEnv = this.getCTFTrace().getEnvironment();
- String domain = traceEnv.get("domain"); //$NON-NLS-1$
String tracerName = traceEnv.get("tracer_name"); //$NON-NLS-1$
String tracerMajor = traceEnv.get("tracer_major"); //$NON-NLS-1$
String tracerMinor = traceEnv.get("tracer_minor"); //$NON-NLS-1$

- if ("\"kernel-perf\"".equals(domain)) { //$NON-NLS-1$
+ if ("\"perf\"".equals(tracerName)) { //$NON-NLS-1$
fOriginTracer = OriginTracer.PERF;

} else if ("\"lttng-modules\"".equals(tracerName) && //$NON-NLS-1$
@@ -130,7 +129,7 @@ public class LttngKernelTrace extends CtfTmfTrace {
CTFTrace temp = new CTFTrace(path);
/* Make sure the domain is "kernel" in the trace's env vars */
String dom = temp.getEnvironment().get("domain"); //$NON-NLS-1$
- if (dom != null && dom.startsWith("\"kernel")) { //$NON-NLS-1$
+ if (dom != null && dom.equals("\"kernel\"")) { //$NON-NLS-1$
return new TraceValidationStatus(CONFIDENCE, Activator.PLUGIN_ID);
}
return new Status(IStatus.ERROR, Activator.PLUGIN_ID, Messages.LttngKernelTrace_DomainError);
--
2.1.3

Sebastian

2014-11-27 04:28:05

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On 2014-11-26 12:37 PM, Sebastian Andrzej Siewior wrote:
> * Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:
>
>> Just a quick note, this branch is now merged to master. So anyone who
>> pulls the code from the master branch at
>> git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass.git
>> should be able to load perf-CTF traces in the viewer. The trace type
>> is now called "Common Trace Format -> Linux Kernel Trace" and should
>> support both LTTng kernel and perf traces in CTF format (although
>> auto-detection should work in most cases).
> Thank you for all the work.
> Let me try to reply to the emails at once here:
> - I added to the environment metadata the following (comparing to the
> last version):
> domain => kernel
> tracer_name => perf
>
> There is no tracer_major + minor. Instead I added
> version => perf's version
> On my system I have:
> release = "3.16.0-4-amd64";
> version = "3.18.rc3.g91405a";
>
> Because I run Debian's v3.16 and recorded the trace with perf from the
> kernel 3.18.rc3.
> There is no version of perf doing the convert of perf.data => ctf.
> Any objections, rename of the version member?
>
> - Mathieu decided that it makes no sense to add the kernel version to
> each event we trace. Instead each event should have its own version
> with a major/minor member. Once the event is changed the "ABI"
> version should be adjusted. I second this since it makes sense.
> Therefore there are no changes that were made to the converter.
>
> - Alexandre (you) noticed that there are syscall names in the events
> recorded via "sys_enter and sys_exit". This is true, but there is a
> hint. There is an event for instance:
>
> [03:37:07.579969498] (+?.?????????) raw_syscalls:sys_enter: { cpu_id = 2 }, { perf_ip = 0xFFFFFFFF81020EBC, perf_tid = 30004, perf_pid = 30004, perf_id = 382, perf_period = 1, common_type = 76, common_flags = 0, common_preempt_count = 0, common_pid = 30004, id = 16, args = [ [0] = 0xE, [1] = 0x2400, [2] = 0x0, [3] = 0x0, [4] = 0xA20F00, [5] = 0xA1FDA0 ] }

Oh ok, so this "id" field is really the system call ID then. Good to
know, thanks!

>
> By the end you notice id=16 and args. args are the Arguments passed
> to syscall and id is the syscall number. Together with machine =
> x86_64 you know which architecture you need to lookup the number 16.
> The numbers are from unistd.h (and may be different between architectures,
> even between i386 & x86_64). strace has for instance the following [0] table.
>
> { 3, TD, sys_read, "read" }, /* 0 */
> …
> { 3, TD, sys_ioctl, "ioctl" }, /* 16 */
> …
>
> So 16 is ioctl. strace has those tables for a bunch of architectures
> so it might be helpful to suck them in. I know no other way to ease
> things here.

Indeed. Well this information could be part of the trace metadata too,
but I guess that wouldn't be very practical.
We'll just need to add a way for each supported tracer to advertize how
it gets its system call names.

>
> [0] https://github.com/bnoordhuis/strace/blob/master/linux/x86_64/syscallent.h
>
> The same thing is true for softirq_entry events for instance. This event
> will give you you only vec=9 and you need to lookup that 9 => RCU. That
> one is easy however:
>
> const char * const softirq_to_name[NR_SOFTIRQS] = {
> "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
> "TASKLET", "SCHED", "HRTIMER", "RCU"
> };
>
> this has been taken from kernel/softirq.c.

Oh, that's right, we never got around to getting/showing the actual
names of the soft IRQs. Thanks for reminding us. ;)

>> This was based on the most recent file format I was aware of, we will
>> update it accordingly if required.
>>
>> Testing welcome!
> I pushed the perf changes I mentioned to
>
> git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7
>
> It is now based on Arnaldo's perf/core. If everything goes well from
> the compass side and nobody complains here in any way, the next step
> would be to present the patches on the mailing list and compass as a
> user.
>
> I took you tree and added the patch below. I uploaded the following
> files to https://breakpoint.cc/perf-ctf/:
> - ctf-out6.tar.xz from perf6.data.xz
> shows nothing

Hmm, indeed it throws exceptions in the console when trying to validate
the trace. It seems to read a packet size in perf_stream_0 as a negative
value. Babeltrace handles it fine though, so we're probably reading it
wrong. We'll investigate.

Cheers,
Alexandre

>
> - ctf-out7.tar.xz from perf7.data.xz
> shows something
>
> The only obvious difference is the size of the CTF data. The size of out6
> is almost 300MiB and it contains 3,259,929 events. The out7 is has only
> 15MiB and contains 152,900 events.
>
>> Cheers,
>> Alexandre
> ✂
>
> From 7ffa619d918f2010046b391ae29063ffc5329468 Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <[email protected]>
> Date: Wed, 26 Nov 2014 18:04:53 +0100
> Subject: [PATCH] CTF: use tracer_name for perf-CTF traces
>
> domain will be set to kernel for both, perf and lttng traces. The
> tracer_name will be set to perf if the trace is generated by perf and
> otherwise lttng-modules if created by thet lttng tool.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> .../tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> index a58269f..03a09b9 100644
> --- a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> +++ b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> @@ -96,12 +96,11 @@ public class LttngKernelTrace extends CtfTmfTrace {
> * metadata
> */
> Map<String, String> traceEnv = this.getCTFTrace().getEnvironment();
> - String domain = traceEnv.get("domain"); //$NON-NLS-1$
> String tracerName = traceEnv.get("tracer_name"); //$NON-NLS-1$
> String tracerMajor = traceEnv.get("tracer_major"); //$NON-NLS-1$
> String tracerMinor = traceEnv.get("tracer_minor"); //$NON-NLS-1$
>
> - if ("\"kernel-perf\"".equals(domain)) { //$NON-NLS-1$
> + if ("\"perf\"".equals(tracerName)) { //$NON-NLS-1$
> fOriginTracer = OriginTracer.PERF;
>
> } else if ("\"lttng-modules\"".equals(tracerName) && //$NON-NLS-1$
> @@ -130,7 +129,7 @@ public class LttngKernelTrace extends CtfTmfTrace {
> CTFTrace temp = new CTFTrace(path);
> /* Make sure the domain is "kernel" in the trace's env vars */
> String dom = temp.getEnvironment().get("domain"); //$NON-NLS-1$
> - if (dom != null && dom.startsWith("\"kernel")) { //$NON-NLS-1$
> + if (dom != null && dom.equals("\"kernel\"")) { //$NON-NLS-1$
> return new TraceValidationStatus(CONFIDENCE, Activator.PLUGIN_ID);
> }
> return new Status(IStatus.ERROR, Activator.PLUGIN_ID, Messages.LttngKernelTrace_DomainError);

2014-11-27 15:43:41

by Jiri Olsa

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On Wed, Nov 12, 2014 at 05:14:45PM -0500, Alexandre Montplaisir wrote:
>
> On 11/09/2014 08:31 PM, Alexandre Montplaisir wrote:
> >On 2014-11-05 10:25 PM, Alexandre Montplaisir wrote:
> >>
> >>>
> >>>>But if you could for example tell me the perf equivalents of all the
> >>>>strings in that file, I could hack together such wrapper. With that,
> >>>>in theory, perf traces should behave exactly the same as LTTng traces
> >>>>in the viewer!
> >>>Oooh, that would be awesome. So I installed maven but didn't get much
> >>>further. Let me gather this for you.
> >>
> >>Awesome, thanks!
> >>
> >>I am travelling this week, so I'm a bit busy, but I will try to
> >>prototype a "wrapper" for the kernel analysis, and adding support for
> >>the perf events, whenever I have a chance. I'll keep you posted.
> >
> >Ok, some good news!
> >
> >I managed to get the CTF traces from perf working in Trace Compass! See
> >attached screenshots. This is showing the "ctf-out2" trace from your
> >previous email. The other trace seems to have less events enabled, so it
> >would only show some WAIT_FOR_CPU states in the view.
> >
> >If anybody wishes to try it, you can grab the whole branch ending at
> >https://git.eclipse.org/r/#/c/36200/ . Or run:
> >$ git fetch
> >git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass
> >refs/changes/00/36200/3 && git checkout FETCH_HEAD
>
> Just a quick note, this branch is now merged to master. So anyone who pulls
> the code from the master branch at
> git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass.git
> should be able to load perf-CTF traces in the viewer. The trace type is now
> called "Common Trace Format -> Linux Kernel Trace" and should support both
> LTTng kernel and perf traces in CTF format (although auto-detection should
> work in most cases).
>
> This was based on the most recent file format I was aware of, we will update
> it accordingly if required.
>
> Testing welcome!

hi,
any other way besides compiling eclipse to test this? For pure mortals
with Fedora eclipse rpm.. ;-)

or any instructions for the compilation.. I actually haven't checked yet

thanks,
jirka

Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On 11/27/2014 04:43 PM, Jiri Olsa wrote:

> hi,
> any other way besides compiling eclipse to test this? For pure mortals
> with Fedora eclipse rpm.. ;-)
>
> or any instructions for the compilation.. I actually haven't checked yet

| mvn clean install -Pbuild-rcp -Dmaven.test.skip=true

does the trick. It took a while to complete especially since it sucked
some jars at 30KiB/sec

That archive at
https://breakpoint.cc/perf-ctf/trace-compass-0.1.0-20141126-1744-linux.gtk.x86_64.tar.xz

contains the standalone application "tracecompass" without eclipse. It
contains also the patch I quoted in the thread.

> thanks,
> jirka

Sebastian

2014-11-27 18:31:39

by Alexandre Montplaisir

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)


On 11/27/2014 10:43 AM, Jiri Olsa wrote:
> On Wed, Nov 12, 2014 at 05:14:45PM -0500, Alexandre Montplaisir wrote:
>>
>> Testing welcome!
> hi,
> any other way besides compiling eclipse to test this? For pure mortals
> with Fedora eclipse rpm.. ;-)

If you already have an Eclipse installation, you can use the update site:
http://download.eclipse.org/tracecompass/master/nightly/
This would install the plugins into that Eclipse install.

We will also put nightly builds of the stand-alone version on the
download page Very Soon(TM). We just have a few issues left to figure out.

Cheers,
Alexandre

2014-11-28 09:33:12

by Jiri Olsa

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On Thu, Nov 27, 2014 at 01:31:38PM -0500, Alexandre Montplaisir wrote:
>
> On 11/27/2014 10:43 AM, Jiri Olsa wrote:
> >On Wed, Nov 12, 2014 at 05:14:45PM -0500, Alexandre Montplaisir wrote:
> >>
> >>Testing welcome!
> >hi,
> >any other way besides compiling eclipse to test this? For pure mortals
> >with Fedora eclipse rpm.. ;-)
>
> If you already have an Eclipse installation, you can use the update site:
> http://download.eclipse.org/tracecompass/master/nightly/
> This would install the plugins into that Eclipse install.
>
> We will also put nightly builds of the stand-alone version on the download
> page Very Soon(TM). We just have a few issues left to figure out.

cool, I'll check

thanks,
jirka

2014-11-28 11:27:23

by Jiri Olsa

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On Wed, Nov 26, 2014 at 06:37:21PM +0100, Sebastian Andrzej Siewior wrote:
> * Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:

SNIP

>
> >This was based on the most recent file format I was aware of, we will
> >update it accordingly if required.
> >
> >Testing welcome!
>
> I pushed the perf changes I mentioned to
>
> git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7
>

from perf data convert point of view (and libbabeltrace).. ;-)

I tried to convert big perf.data (2GB) and my laptop got stuck
for few minutes.. the reason is that perf allocated too much memory

I needed to add occasional bt_ctf_stream_flush into the
processing (patch attached)

What I do now is checking if we processed given amount of events
for the stream and once the value is crossed I flush it.

My question is if there's a way to find out the allocated memory
for the stream? It'd be nicer to setup maximum allocation size
rather than the number of events.

thanks,
jirka



diff --git a/tools/perf/util/data-bt.c b/tools/perf/util/data-bt.c
index c5720e13d8f1..981e8ff2c32a 100644
--- a/tools/perf/util/data-bt.c
+++ b/tools/perf/util/data-bt.c
@@ -39,10 +39,15 @@ struct evsel_priv {

#define MAX_CPUS 4096

+struct ctf_stream {
+ struct bt_ctf_stream *stream;
+ u64 count;
+};
+
struct ctf_writer {
/* writer primitives */
struct bt_ctf_writer *writer;
- struct bt_ctf_stream *stream[MAX_CPUS];
+ struct ctf_stream *stream[MAX_CPUS];
struct bt_ctf_stream_class *stream_class;
struct bt_ctf_clock *clock;
u32 last_cpu;
@@ -377,6 +382,73 @@ static int add_generic_values(struct ctf_writer *cw,
return 0;
}

+#define STREAM_FLUSH_COUNT 1000
+
+static bool is_flush_needed(struct ctf_stream *cstream)
+{
+ return cstream && (cstream->count >= STREAM_FLUSH_COUNT);
+}
+
+static int flush_stream(struct ctf_writer *cw, int cpu)
+{
+ struct ctf_stream *cstream = cw->stream[cpu];
+ int err = 0;
+
+ if (cstream) {
+ err = bt_ctf_stream_flush(cstream->stream);
+ if (err)
+ pr_err("CTF stream %d flush failed\n", cpu);
+
+ cstream->count = 0;
+ }
+
+ return err;
+}
+
+static int create_stream(struct ctf_writer *cw, int cpu)
+{
+ struct ctf_stream *cstream;
+ struct bt_ctf_field *pkt_ctx;
+ struct bt_ctf_field *cpu_field;
+ struct bt_ctf_stream *stream;
+ u32 i = cpu;
+ int ret;
+
+ cstream = zalloc(sizeof(*cstream));
+ if (!cstream) {
+ pr_err("Failed to allocate ctf stream\n");
+ return -1;
+ }
+
+ stream = bt_ctf_writer_create_stream(cw->writer, cw->stream_class);
+ if (!stream)
+ return -1;
+
+ pkt_ctx = bt_ctf_stream_get_packet_context(stream);
+ if (!pkt_ctx) {
+ pr_err("Failed to obtain packet context\n");
+ return -1;
+ }
+
+ cpu_field = bt_ctf_field_structure_get_field(pkt_ctx, "cpu_id");
+ bt_ctf_field_put(pkt_ctx);
+ if (!cpu_field) {
+ pr_err("Failed to obtain cpu field\n");
+ return -1;
+ }
+
+ ret = bt_ctf_field_unsigned_integer_set_value(cpu_field, i);
+ if (ret) {
+ pr_err("Failed to update CPU number\n");
+ return -1;
+ }
+ bt_ctf_field_put(cpu_field);
+
+ cstream->stream = stream;
+ cw->stream[i] = cstream;
+ return 0;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *_event __maybe_unused,
struct perf_sample *sample,
@@ -431,40 +503,14 @@ static int process_sample_event(struct perf_tool *tool,
cpu = 0;
}

- if (!cw->stream[cpu]) {
- struct bt_ctf_field *pkt_ctx;
- struct bt_ctf_field *cpu_field;
- struct bt_ctf_stream *stream;
- u32 i = sample->cpu;
-
- stream = bt_ctf_writer_create_stream(cw->writer, cw->stream_class);
- if (!stream)
- return -1;
-
- pkt_ctx = bt_ctf_stream_get_packet_context(stream);
- if (!pkt_ctx) {
- pr_err("Failed to obtain packet context\n");
- return -1;
- }
-
- cpu_field = bt_ctf_field_structure_get_field(pkt_ctx, "cpu_id");
- bt_ctf_field_put(pkt_ctx);
- if (!cpu_field) {
- pr_err("Failed to obtain cpu field\n");
- return -1;
- }
+ if (!cw->stream[cpu] && create_stream(cw, cpu))
+ return -1;

- ret = bt_ctf_field_unsigned_integer_set_value(cpu_field, i);
- if (ret) {
- pr_err("Failed to update CPU number\n");
- return -1;
- }
- bt_ctf_field_put(cpu_field);
+ if (is_flush_needed(cw->stream[cpu]))
+ flush_stream(cw, cpu);

- cw->stream[i] = stream;
- }
-
- bt_ctf_stream_append_event(cw->stream[cpu], event);
+ cw->stream[cpu]->count++;
+ bt_ctf_stream_append_event(cw->stream[cpu]->stream, event);
bt_ctf_event_put(event);
return 0;
}
@@ -745,8 +791,12 @@ static void ctf_writer__cleanup(struct ctf_writer *cw)
ctf_writer__cleanup_data(cw);

bt_ctf_clock_put(cw->clock);
- for (i = 0; i < MAX_CPUS; i++)
- bt_ctf_stream_put(cw->stream[i]);
+ for (i = 0; i < MAX_CPUS; i++) {
+ if (cw->stream[i]) {
+ bt_ctf_stream_put(cw->stream[i]->stream);
+ free(cw->stream[i]);
+ }
+ }
bt_ctf_stream_class_put(cw->stream_class);
bt_ctf_writer_put(cw->writer);

@@ -862,6 +912,9 @@ int bt_convert__perf2ctf(const char *input, const char *path)
if (!session)
goto free_writer;

+ ordered_events__set_alloc_size(&session->ordered_events,
+ 100*1024*1024);
+
/* CTF writer env/clock setup */
if (ctf_writer__setup_env(cw, session))
goto free_session;
@@ -872,15 +925,10 @@ int bt_convert__perf2ctf(const char *input, const char *path)

err = perf_session__process_events(session, &c.tool);
if (!err) {
- int i;
+ int cpu;

- for (i = 0; i < MAX_CPUS; i++) {
- if (cw->stream[i]) {
- err = bt_ctf_stream_flush(cw->stream[i]);
- if (err)
- pr_err("CTF stream %d flush failed\n", i);
- }
- }
+ for (cpu = 0; cpu < MAX_CPUS; cpu++)
+ flush_stream(cw, cpu);
}

fprintf(stderr,

2014-11-29 09:36:19

by Jerome CORRENOZ

[permalink] [raw]
Subject: RE: [tracecompass-dev] Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

Hi Alexandre,

Catching with this thread of email: could you please share some links to documentation/screenshot about perf support with trace compass ?

Regards,
Jerome

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Alexandre Montplaisir
Sent: Thursday, November 27, 2014 5:28 AM
To: Sebastian Andrzej Siewior
Cc: Jeremie Galarneau; [email protected]; Arnaldo Carvalho de Melo; Tom Zanussi; Mathieu Desnoyers; David Ahern; Dominique Toupin; Trace Compass Developer Discussions; Jiri Olsa; [email protected]
Subject: Re: [tracecompass-dev] Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On 2014-11-26 12:37 PM, Sebastian Andrzej Siewior wrote:
> * Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:
>
>> Just a quick note, this branch is now merged to master. So anyone who
>> pulls the code from the master branch at
>> git://git.eclipse.org/gitroot/tracecompass/org.eclipse.tracecompass.g
>> it should be able to load perf-CTF traces in the viewer. The trace
>> type is now called "Common Trace Format -> Linux Kernel Trace" and
>> should support both LTTng kernel and perf traces in CTF format
>> (although auto-detection should work in most cases).
> Thank you for all the work.
> Let me try to reply to the emails at once here:
> - I added to the environment metadata the following (comparing to the
> last version):
> domain => kernel
> tracer_name => perf
>
> There is no tracer_major + minor. Instead I added
> version => perf's version
> On my system I have:
> release = "3.16.0-4-amd64";
> version = "3.18.rc3.g91405a";
>
> Because I run Debian's v3.16 and recorded the trace with perf from the
> kernel 3.18.rc3.
> There is no version of perf doing the convert of perf.data => ctf.
> Any objections, rename of the version member?
>
> - Mathieu decided that it makes no sense to add the kernel version to
> each event we trace. Instead each event should have its own version
> with a major/minor member. Once the event is changed the "ABI"
> version should be adjusted. I second this since it makes sense.
> Therefore there are no changes that were made to the converter.
>
> - Alexandre (you) noticed that there are syscall names in the events
> recorded via "sys_enter and sys_exit". This is true, but there is a
> hint. There is an event for instance:
>
> [03:37:07.579969498] (+?.?????????) raw_syscalls:sys_enter: { cpu_id =
> 2 }, { perf_ip = 0xFFFFFFFF81020EBC, perf_tid = 30004, perf_pid =
> 30004, perf_id = 382, perf_period = 1, common_type = 76, common_flags
> = 0, common_preempt_count = 0, common_pid = 30004, id = 16, args = [
> [0] = 0xE, [1] = 0x2400, [2] = 0x0, [3] = 0x0, [4] = 0xA20F00, [5] =
> 0xA1FDA0 ] }

Oh ok, so this "id" field is really the system call ID then. Good to know, thanks!

>
> By the end you notice id=16 and args. args are the Arguments passed
> to syscall and id is the syscall number. Together with machine =
> x86_64 you know which architecture you need to lookup the number 16.
> The numbers are from unistd.h (and may be different between architectures,
> even between i386 & x86_64). strace has for instance the following [0] table.
>
> { 3, TD, sys_read, "read" }, /* 0 */
> …
> { 3, TD, sys_ioctl, "ioctl" }, /* 16 */
> …
>
> So 16 is ioctl. strace has those tables for a bunch of architectures
> so it might be helpful to suck them in. I know no other way to ease
> things here.

Indeed. Well this information could be part of the trace metadata too, but I guess that wouldn't be very practical.
We'll just need to add a way for each supported tracer to advertize how it gets its system call names.

>
> [0] https://github.com/bnoordhuis/strace/blob/master/linux/x86_64/syscallent.h
>
> The same thing is true for softirq_entry events for instance. This event
> will give you you only vec=9 and you need to lookup that 9 => RCU. That
> one is easy however:
>
> const char * const softirq_to_name[NR_SOFTIRQS] = {
> "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
> "TASKLET", "SCHED", "HRTIMER", "RCU"
> };
>
> this has been taken from kernel/softirq.c.

Oh, that's right, we never got around to getting/showing the actual
names of the soft IRQs. Thanks for reminding us. ;)

>> This was based on the most recent file format I was aware of, we will
>> update it accordingly if required.
>>
>> Testing welcome!
> I pushed the perf changes I mentioned to
>
> git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7
>
> It is now based on Arnaldo's perf/core. If everything goes well from
> the compass side and nobody complains here in any way, the next step
> would be to present the patches on the mailing list and compass as a
> user.
>
> I took you tree and added the patch below. I uploaded the following
> files to https://breakpoint.cc/perf-ctf/:
> - ctf-out6.tar.xz from perf6.data.xz
> shows nothing

Hmm, indeed it throws exceptions in the console when trying to validate
the trace. It seems to read a packet size in perf_stream_0 as a negative
value. Babeltrace handles it fine though, so we're probably reading it
wrong. We'll investigate.

Cheers,
Alexandre

>
> - ctf-out7.tar.xz from perf7.data.xz
> shows something
>
> The only obvious difference is the size of the CTF data. The size of out6
> is almost 300MiB and it contains 3,259,929 events. The out7 is has only
> 15MiB and contains 152,900 events.
>
>> Cheers,
>> Alexandre
> ✂
>
> From 7ffa619d918f2010046b391ae29063ffc5329468 Mon Sep 17 00:00:00 2001
> From: Sebastian Andrzej Siewior <[email protected]>
> Date: Wed, 26 Nov 2014 18:04:53 +0100
> Subject: [PATCH] CTF: use tracer_name for perf-CTF traces
>
> domain will be set to kernel for both, perf and lttng traces. The
> tracer_name will be set to perf if the trace is generated by perf and
> otherwise lttng-modules if created by thet lttng tool.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> .../tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> index a58269f..03a09b9 100644
> --- a/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> +++ b/org.eclipse.tracecompass.lttng2.kernel.core/src/org/eclipse/tracecompass/lttng2/kernel/core/trace/LttngKernelTrace.java
> @@ -96,12 +96,11 @@ public class LttngKernelTrace extends CtfTmfTrace {
> * metadata
> */
> Map<String, String> traceEnv = this.getCTFTrace().getEnvironment();
> - String domain = traceEnv.get("domain"); //$NON-NLS-1$
> String tracerName = traceEnv.get("tracer_name"); //$NON-NLS-1$
> String tracerMajor = traceEnv.get("tracer_major"); //$NON-NLS-1$
> String tracerMinor = traceEnv.get("tracer_minor"); //$NON-NLS-1$
>
> - if ("\"kernel-perf\"".equals(domain)) { //$NON-NLS-1$
> + if ("\"perf\"".equals(tracerName)) { //$NON-NLS-1$
> fOriginTracer = OriginTracer.PERF;
>
> } else if ("\"lttng-modules\"".equals(tracerName) && //$NON-NLS-1$
> @@ -130,7 +129,7 @@ public class LttngKernelTrace extends CtfTmfTrace {
> CTFTrace temp = new CTFTrace(path);
> /* Make sure the domain is "kernel" in the trace's env vars */
> String dom = temp.getEnvironment().get("domain"); //$NON-NLS-1$
> - if (dom != null && dom.startsWith("\"kernel")) { //$NON-NLS-1$
> + if (dom != null && dom.equals("\"kernel\"")) { //$NON-NLS-1$
> return new TraceValidationStatus(CONFIDENCE, Activator.PLUGIN_ID);
> }
> return new Status(IStatus.ERROR, Activator.PLUGIN_ID, Messages.LttngKernelTrace_DomainError);

_______________________________________________
tracecompass-dev mailing list
[email protected]
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/tracecompass-dev
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-12-01 17:28:14

by Jérémie Galarneau

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On Fri, Nov 28, 2014 at 6:26 AM, Jiri Olsa <[email protected]> wrote:
> On Wed, Nov 26, 2014 at 06:37:21PM +0100, Sebastian Andrzej Siewior wrote:
>> * Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:
>
> SNIP
>
>>
>> >This was based on the most recent file format I was aware of, we will
>> >update it accordingly if required.
>> >
>> >Testing welcome!
>>
>> I pushed the perf changes I mentioned to
>>
>> git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7
>>
>
> from perf data convert point of view (and libbabeltrace).. ;-)
>
> I tried to convert big perf.data (2GB) and my laptop got stuck
> for few minutes.. the reason is that perf allocated too much memory
>
> I needed to add occasional bt_ctf_stream_flush into the
> processing (patch attached)
>
> What I do now is checking if we processed given amount of events
> for the stream and once the value is crossed I flush it.
>
> My question is if there's a way to find out the allocated memory
> for the stream? It'd be nicer to setup maximum allocation size
> rather than the number of events.
>

Not currently, although I agree that this would be a nice feature.
Perhaps by setting a maximal packet size, CTF Writer could handle the
automatic flushing.

Thoughts?

Jérémie

> thanks,
> jirka
>
>
>
> diff --git a/tools/perf/util/data-bt.c b/tools/perf/util/data-bt.c
> index c5720e13d8f1..981e8ff2c32a 100644
> --- a/tools/perf/util/data-bt.c
> +++ b/tools/perf/util/data-bt.c
> @@ -39,10 +39,15 @@ struct evsel_priv {
>
> #define MAX_CPUS 4096
>
> +struct ctf_stream {
> + struct bt_ctf_stream *stream;
> + u64 count;
> +};
> +
> struct ctf_writer {
> /* writer primitives */
> struct bt_ctf_writer *writer;
> - struct bt_ctf_stream *stream[MAX_CPUS];
> + struct ctf_stream *stream[MAX_CPUS];
> struct bt_ctf_stream_class *stream_class;
> struct bt_ctf_clock *clock;
> u32 last_cpu;
> @@ -377,6 +382,73 @@ static int add_generic_values(struct ctf_writer *cw,
> return 0;
> }
>
> +#define STREAM_FLUSH_COUNT 1000
> +
> +static bool is_flush_needed(struct ctf_stream *cstream)
> +{
> + return cstream && (cstream->count >= STREAM_FLUSH_COUNT);
> +}
> +
> +static int flush_stream(struct ctf_writer *cw, int cpu)
> +{
> + struct ctf_stream *cstream = cw->stream[cpu];
> + int err = 0;
> +
> + if (cstream) {
> + err = bt_ctf_stream_flush(cstream->stream);
> + if (err)
> + pr_err("CTF stream %d flush failed\n", cpu);
> +
> + cstream->count = 0;
> + }
> +
> + return err;
> +}
> +
> +static int create_stream(struct ctf_writer *cw, int cpu)
> +{
> + struct ctf_stream *cstream;
> + struct bt_ctf_field *pkt_ctx;
> + struct bt_ctf_field *cpu_field;
> + struct bt_ctf_stream *stream;
> + u32 i = cpu;
> + int ret;
> +
> + cstream = zalloc(sizeof(*cstream));
> + if (!cstream) {
> + pr_err("Failed to allocate ctf stream\n");
> + return -1;
> + }
> +
> + stream = bt_ctf_writer_create_stream(cw->writer, cw->stream_class);
> + if (!stream)
> + return -1;
> +
> + pkt_ctx = bt_ctf_stream_get_packet_context(stream);
> + if (!pkt_ctx) {
> + pr_err("Failed to obtain packet context\n");
> + return -1;
> + }
> +
> + cpu_field = bt_ctf_field_structure_get_field(pkt_ctx, "cpu_id");
> + bt_ctf_field_put(pkt_ctx);
> + if (!cpu_field) {
> + pr_err("Failed to obtain cpu field\n");
> + return -1;
> + }
> +
> + ret = bt_ctf_field_unsigned_integer_set_value(cpu_field, i);
> + if (ret) {
> + pr_err("Failed to update CPU number\n");
> + return -1;
> + }
> + bt_ctf_field_put(cpu_field);
> +
> + cstream->stream = stream;
> + cw->stream[i] = cstream;
> + return 0;
> +}
> +
> static int process_sample_event(struct perf_tool *tool,
> union perf_event *_event __maybe_unused,
> struct perf_sample *sample,
> @@ -431,40 +503,14 @@ static int process_sample_event(struct perf_tool *tool,
> cpu = 0;
> }
>
> - if (!cw->stream[cpu]) {
> - struct bt_ctf_field *pkt_ctx;
> - struct bt_ctf_field *cpu_field;
> - struct bt_ctf_stream *stream;
> - u32 i = sample->cpu;
> -
> - stream = bt_ctf_writer_create_stream(cw->writer, cw->stream_class);
> - if (!stream)
> - return -1;
> -
> - pkt_ctx = bt_ctf_stream_get_packet_context(stream);
> - if (!pkt_ctx) {
> - pr_err("Failed to obtain packet context\n");
> - return -1;
> - }
> -
> - cpu_field = bt_ctf_field_structure_get_field(pkt_ctx, "cpu_id");
> - bt_ctf_field_put(pkt_ctx);
> - if (!cpu_field) {
> - pr_err("Failed to obtain cpu field\n");
> - return -1;
> - }
> + if (!cw->stream[cpu] && create_stream(cw, cpu))
> + return -1;
>
> - ret = bt_ctf_field_unsigned_integer_set_value(cpu_field, i);
> - if (ret) {
> - pr_err("Failed to update CPU number\n");
> - return -1;
> - }
> - bt_ctf_field_put(cpu_field);
> + if (is_flush_needed(cw->stream[cpu]))
> + flush_stream(cw, cpu);
>
> - cw->stream[i] = stream;
> - }
> -
> - bt_ctf_stream_append_event(cw->stream[cpu], event);
> + cw->stream[cpu]->count++;
> + bt_ctf_stream_append_event(cw->stream[cpu]->stream, event);
> bt_ctf_event_put(event);
> return 0;
> }
> @@ -745,8 +791,12 @@ static void ctf_writer__cleanup(struct ctf_writer *cw)
> ctf_writer__cleanup_data(cw);
>
> bt_ctf_clock_put(cw->clock);
> - for (i = 0; i < MAX_CPUS; i++)
> - bt_ctf_stream_put(cw->stream[i]);
> + for (i = 0; i < MAX_CPUS; i++) {
> + if (cw->stream[i]) {
> + bt_ctf_stream_put(cw->stream[i]->stream);
> + free(cw->stream[i]);
> + }
> + }
> bt_ctf_stream_class_put(cw->stream_class);
> bt_ctf_writer_put(cw->writer);
>
> @@ -862,6 +912,9 @@ int bt_convert__perf2ctf(const char *input, const char *path)
> if (!session)
> goto free_writer;
>
> + ordered_events__set_alloc_size(&session->ordered_events,
> + 100*1024*1024);
> +
> /* CTF writer env/clock setup */
> if (ctf_writer__setup_env(cw, session))
> goto free_session;
> @@ -872,15 +925,10 @@ int bt_convert__perf2ctf(const char *input, const char *path)
>
> err = perf_session__process_events(session, &c.tool);
> if (!err) {
> - int i;
> + int cpu;
>
> - for (i = 0; i < MAX_CPUS; i++) {
> - if (cw->stream[i]) {
> - err = bt_ctf_stream_flush(cw->stream[i]);
> - if (err)
> - pr_err("CTF stream %d flush failed\n", i);
> - }
> - }
> + for (cpu = 0; cpu < MAX_CPUS; cpu++)
> + flush_stream(cw, cpu);
> }
>
> fprintf(stderr,



--
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com

2014-12-01 20:45:08

by Jiri Olsa

[permalink] [raw]
Subject: Re: Support for Perf CTF traces now in master (was Re: FW: [RFC 0/5] perf tools: Add perf data CTF conversion)

On Mon, Dec 01, 2014 at 12:28:08PM -0500, J?r?mie Galarneau wrote:
> On Fri, Nov 28, 2014 at 6:26 AM, Jiri Olsa <[email protected]> wrote:
> > On Wed, Nov 26, 2014 at 06:37:21PM +0100, Sebastian Andrzej Siewior wrote:
> >> * Alexandre Montplaisir | 2014-11-12 17:14:45 [-0500]:
> >
> > SNIP
> >
> >>
> >> >This was based on the most recent file format I was aware of, we will
> >> >update it accordingly if required.
> >> >
> >> >Testing welcome!
> >>
> >> I pushed the perf changes I mentioned to
> >>
> >> git://git.breakpoint.cc/bigeasy/linux.git ctf_convert_7
> >>
> >
> > from perf data convert point of view (and libbabeltrace).. ;-)
> >
> > I tried to convert big perf.data (2GB) and my laptop got stuck
> > for few minutes.. the reason is that perf allocated too much memory
> >
> > I needed to add occasional bt_ctf_stream_flush into the
> > processing (patch attached)
> >
> > What I do now is checking if we processed given amount of events
> > for the stream and once the value is crossed I flush it.
> >
> > My question is if there's a way to find out the allocated memory
> > for the stream? It'd be nicer to setup maximum allocation size
> > rather than the number of events.
> >
>
> Not currently, although I agree that this would be a nice feature.
> Perhaps by setting a maximal packet size, CTF Writer could handle the
> automatic flushing.
>
> Thoughts?

hum, it looks like 2 separated things to me.. the packet size is
result format related size.. I'm more interested in the size of
the 'struct ctf_stream' within the writer during the processing

it seems it wont be trivial to track this AFAICS from sources, but
I guess I can live so far with tracking the the number of events,
until we find something else

thanks,
jirka