LinuxLists.cc - [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

2011-02-08 01:59:35

Subject: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

Ingo,

I finally finished and tested all these patches. Minor fixes were
made since the last email. I also added a couple of patches to
allow a switch between filters instead of having a small period that
the filter would either filter out all or none as it changes the
filtering.

Please pull the latest tip/perf/filter tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/perf/filter

Steven Rostedt (14):
tracing/filter: Have no filter return a match
tracing/filter: Move OR and AND logic out of fn() method
tracing/filter: Dynamically allocate preds
tracing/filter: Call synchronize_sched() just once for system filters
tracing/filter: Allocate the preds in an array
tracing/filter: Free pred array on disabling of filter
tracing/filter: Use a tree instead of stack for filter_match_preds()
tracing/filter: Optimize short ciruit check
tracing/filter: Check the created pred tree
tracing/filter: Optimize filter by folding the tree
tracing/filter: Move MAX_FILTER_PRED to local tracing directory
tracing/filter: Increase the max preds to 2^14
tracing/filter: Swap entire filter of events
tracing/filter: Remove synchronize_sched() from __alloc_preds()

----
include/linux/ftrace_event.h | 1 -
kernel/trace/trace.h | 38 ++-
kernel/trace/trace_events_filter.c | 885 +++++++++++++++++++++++++++++-------
3 files changed, 754 insertions(+), 170 deletions(-)

2011-02-15 04:45:01

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

hi Steve,

* Steven Rostedt <[email protected]> wrote:

> Steven Rostedt (14):
> tracing/filter: Have no filter return a match
> tracing/filter: Move OR and AND logic out of fn() method
> tracing/filter: Dynamically allocate preds
> tracing/filter: Call synchronize_sched() just once for system filters
> tracing/filter: Allocate the preds in an array
> tracing/filter: Free pred array on disabling of filter
> tracing/filter: Use a tree instead of stack for filter_match_preds()
> tracing/filter: Optimize short ciruit check
> tracing/filter: Check the created pred tree
> tracing/filter: Optimize filter by folding the tree
> tracing/filter: Move MAX_FILTER_PRED to local tracing directory
> tracing/filter: Increase the max preds to 2^14
> tracing/filter: Swap entire filter of events
> tracing/filter: Remove synchronize_sched() from __alloc_preds()

I finally got around testing the various tracing filter features we have.

Here is what i've done. Firstly, i have put my 'naive but curious user trying to
make use of filters' hat on.

I did:

perf list | grep Tracepoint | less

to get a list of tracepoints.

#
# Btw., unrelated feature request, it would be nice if the following shorcut did the
# obvious thing:
#
# perf list Tracepoint
#

I picked one of the interesting looking tracepoints:

syscalls:sys_enter_close [Tracepoint event]

first roadblock:

I had absolutely no idea how to proceed from here any further. I knew it from 'perf
list --help' that I could stick 'syscalls:sys_enter_close' into -e expressions, but
i had no idea how to utilize filter expressions at all.

I could stick it into perf stat though, and it gave me:

aldebaran:~> perf stat -a -e syscalls:sys_enter_close sleep 1

Performance counter stats for 'sleep 1':

12,240 syscalls:sys_enter_close

1.004366276 seconds time elapsed

Which is fine enough and just about right. Trying something more complex like
'firefox' seems to mis-count the number of close() syscalls though, when compared
to 'strace -c':

aldebaran:~> strace -f -q -c firefox 2>&1 | head
Error: cannot open display: :0
% time seconds usecs/call calls errors syscall
...
32.35 0.000011 0 271 4 close
...

aldebaran:~> perf stat -e syscalls:sys_enter_close firefox
Error: cannot open display: :0

Performance counter stats for 'firefox':

146 syscalls:sys_enter_close

0.048227674 seconds time elapsed

Note the 271 calls by strace, versus 146 calls by perf. One of them must be wrong -
or they are counting different things.

Using the syscalls:sys_enter_close event to filter i had to put my kernel hacker hat
on and do:

aldebaran:~> cat /debug/tracing/events/syscalls/sys_enter_close/format
name: sys_enter_close
ID: 404
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int common_lock_depth; offset:8; size:4; signed:1;

field:int nr; offset:12; size:4; signed:1;
field:unsigned int fd; offset:16; size:8; signed:0;

print fmt: "fd: 0x%08lx", ((unsigned long)(REC->fd))

And putting my kernel tracing hacker hat on i knew that the only interesting piece
of information for a filter would be the 'fd' word.

Putting my naive user hat back on and knowing the 'fd' trick, i tried to filter out
all close(10) calls from a 'hackbench' run. I knew that they existed:

aldebaran:~> strace -f ./hackbench 1 2>&1 | grep 'close(10)'
[pid 15277] close(10) = 0

So i first tried to use them in 'perf stat':

aldebaran:~> perf stat -e syscalls:sys_enter_close --filter 'fd == 10' ./hackbench 1
Error: unknown option `filter'

Ouch, no go - how come this does not work for perf stat? It makes quite a bit of
sense to use it there as well.

Anyway, knowing that it might work for 'perf record', i tried it there:

aldebaran:~> perf record -e syscalls:sys_enter_close --filter 'fd == 10' ./hackbench 1
Time: 0.079
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB perf.data (~325 samples) ]

Cool i thought, it has 325 samples, success!

Lets look at it via 'perf report' i thought:

aldebaran:~> perf report
aldebaran:~>

ouch. The TUI flickered something which went away immediately (feature request:
don't do that - at least try to mumble something about non-existent data or so).

As a naive user i had no idea whether this was a perf report bug or whether i did
something wrong.

As a kernel hacker i knew that this meant that the events in the perf.data werent
enough to display anything. As a kernel tracing hacker i knew i had to look at 'perf
report -D' output:

aldebaran:~> perf report -D | tail -15
TOTAL events: 107
MMAP events: 59
LOST events: 0
COMM events: 2
EXIT events: 3
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 40
READ events: 0
SAMPLE events: 0
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 3

ouch - no SAMPLE or TRACING_DATA events at all!

Lets try it without the filter:

aldebaran:~> perf record -e syscalls:sys_enter_close ./hackbench 1
Time: 0.043
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (~448 samples) ]
aldebaran:~> perf report -D | tail -15
TOTAL events: 148
MMAP events: 59
LOST events: 0
COMM events: 2
EXIT events: 3
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 40
READ events: 0
SAMPLE events: 41
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 3

Got 41 SAMPLE events there.

So the --filter stuff does not seem to work at all for syscall events.

I tried my luck with another event: irq:irq_handler_entry. This seemed to work better:

aldebaran:~> perf record -a -e irq:irq_handler_entry sleep 1; perf report -D | tail -15[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.335 MB perf.data (~14630 samples) ]
TOTAL events: 4635
MMAP events: 2689
LOST events: 0
COMM events: 548
EXIT events: 64
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 31
READ events: 0
SAMPLE events: 1299
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 4

and produced a real trace:

swapper-0 [000] 49985.641036: irq_handler_entry: irq=16 name=uhci_hcd:usb3
swapper-0 [000] 49985.641038: irq_handler_entry: irq=16 name=eth2
distcc-20461 [000] 49985.641088: irq_handler_entry: irq=16 name=uhci_hcd:usb3
distcc-20461 [000] 49985.641090: irq_handler_entry: irq=16 name=eth2
swapper-0 [000] 49985.641140: irq_handler_entry: irq=16 name=uhci_hcd:usb3
swapper-0 [000] 49985.641141: irq_handler_entry: irq=16 name=eth2
swapper-0 [000] 49985.641186: irq_handler_entry: irq=16 name=uhci_hcd:usb3

So i thought i could finally test all the cool --filter stuff!

I wanted to trace the timer irq, so i tried:

aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'irq==0' sleep 1; perf report -D | tail -15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.744 MB perf.data (~32519 samples) ]
TOTAL events: 10360
MMAP events: 6870
LOST events: 0
COMM events: 1205
EXIT events: 1332
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 656
READ events: 0
SAMPLE events: 293
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 4

I had to do the -D trick because otherwise I had no idea whether there were any REAL
samples recorded...

But 'perf script' output gave me:

cc1-27479 [000] 50100.959394: irq_handler_entry: irq=16 name=eth2
cc1-27479 [000] 50100.959636: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-27479 [000] 50100.959638: irq_handler_entry: irq=16 name=eth2
cc1-27479 [000] 50100.959711: irq_handler_entry: irq=0 name=timer
cc1-27479 [000] 50100.959880: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-27479 [000] 50100.959883: irq_handler_entry: irq=16 name=eth2

Hm, filtering didnt work. Lets try to filter only that irq 16 thing then:

aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'irq==16' sleep 1; perf report -D | tail -15
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.299 MB perf.data (~13053 samples) ]
TOTAL events: 4118
MMAP events: 2993
LOST events: 0
COMM events: 589
EXIT events: 144
THROTTLE events: 0
UNTHROTTLE events: 0
FORK events: 73
READ events: 0
SAMPLE events: 314
ATTR events: 0
EVENT_TYPE events: 0
TRACING_DATA events: 0
BUILD_ID events: 0
FINISHED_ROUND events: 5

Ok, this finally seems to have worked! I saw:

cc1-3833 [000] 50250.934705: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.934708: irq_handler_entry: irq=16 name=eth2
as-3834 [000] 50250.943321: irq_handler_entry: irq=16 name=uhci_hcd:usb3
as-3834 [000] 50250.943325: irq_handler_entry: irq=16 name=eth2
as-3834 [000] 50250.943367: irq_handler_entry: irq=16 name=uhci_hcd:usb3

Except that it didnt - sometimes the trace started with:

cc1-3833 [000] 50250.916784: irq_handler_entry: irq=19 name=ahci
cc1-3833 [000] 50250.916790: irq_handler_entry: irq=19 name=uhci_hcd:usb7
cc1-3833 [000] 50250.917680: irq_handler_entry: irq=0 name=timer
cc1-3833 [000] 50250.918112: irq_handler_entry: irq=19 name=ahci
cc1-3833 [000] 50250.918119: irq_handler_entry: irq=19 name=uhci_hcd:usb7
cc1-3833 [000] 50250.930152: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.930156: irq_handler_entry: irq=16 name=eth2
cc1-3833 [000] 50250.930367: irq_handler_entry: irq=16 name=uhci_hcd:usb3
cc1-3833 [000] 50250.930369: irq_handler_entry: irq=16 name=eth2

ouch - both irq 19 and irq 0 entries there - clearly not what i expected to see.

Anyway, i ignored the extra lines i did not want to see and tried to use the filter
stuff some more, to match on the 'name' field. Wanting to see both usb3 and usb7
interrupts i typed the obvious:

aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'name==uhci_hcd:usb' sleep 1

But got nothing but a (not so obvious) timer irq:

aldebaran:~> perf script
:32749-32749 [000] 50514.340198: irq_handler_entry: irq=0 name=timer

Ok, i thought "these guys must be doing this via wildcards, lets try that":

aldebaran:~> perf record -a -e irq:irq_handler_entry --filter 'name==uhci_hcd:usb*'

That didnt work either.

So ... i have to say that this tracing filter business is unusable crap from a user
POV right now and i see no reason to pull *anything* in this area until it does not
get improved to *much* better levels of usability and utility.

Nobody could *ever* have tested this with a 'naive but curious user' hat on and this
is really sad. We need to do much better!

Thanks,

Ingo

2011-02-15 13:33:13

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

On Tue, 2011-02-15 at 05:44 +0100, Ingo Molnar wrote:
> hi Steve,
>
> * Steven Rostedt <[email protected]> wrote:
>
> > Steven Rostedt (14):
> > tracing/filter: Have no filter return a match
> > tracing/filter: Move OR and AND logic out of fn() method
> > tracing/filter: Dynamically allocate preds
> > tracing/filter: Call synchronize_sched() just once for system filters
> > tracing/filter: Allocate the preds in an array
> > tracing/filter: Free pred array on disabling of filter
> > tracing/filter: Use a tree instead of stack for filter_match_preds()
> > tracing/filter: Optimize short ciruit check
> > tracing/filter: Check the created pred tree
> > tracing/filter: Optimize filter by folding the tree
> > tracing/filter: Move MAX_FILTER_PRED to local tracing directory
> > tracing/filter: Increase the max preds to 2^14
> > tracing/filter: Swap entire filter of events
> > tracing/filter: Remove synchronize_sched() from __alloc_preds()
>
> I finally got around testing the various tracing filter features we have.
>

[ snip all the perf output ]

> So ... i have to say that this tracing filter business is unusable crap from a user
> POV right now and i see no reason to pull *anything* in this area until it does not
> get improved to *much* better levels of usability and utility.
>
> Nobody could *ever* have tested this with a 'naive but curious user' hat on and this
> is really sad. We need to do much better!

Sorry I did not work with perf in writing this code. I was using the
debugfs directly. I figured that any improvement I made there would also
improve perf as I tried to make sure the perf hooks into that code were
updated too.

My question is, did this patch set cause any of the perf problems or did
these problems always exist?

I'm just saying that perf is not the only user. And to deny improvements
in the code because one user does not currently work well with them is
just hindering progress.

There happens to be real users out in the world that are still using
ftrace. I see no reason to stop improving it because your goal is to
have everyone move to perf.

Thanks for letting me waste three days on developing this. I even posted
an RFC a while back, and no one complained then.

-- Steve

2011-02-15 13:44:35

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

Addressing the tools/perf/ suggestions:

Em Tue, Feb 15, 2011 at 05:44:25AM +0100, Ingo Molnar escreveu:
> Here is what i've done. Firstly, i have put my 'naive but curious user trying to
> make use of filters' hat on.
>
> I did:
>
> perf list | grep Tracepoint | less
>
> to get a list of tracepoints.
>
> #
> # Btw., unrelated feature request, it would be nice if the following shorcut did the
> # obvious thing:
> #
> # perf list Tracepoint
> #

Will do

> I picked one of the interesting looking tracepoints:
>
> syscalls:sys_enter_close [Tracepoint event]
>
> first roadblock:
>
> I had absolutely no idea how to proceed from here any further. I knew it from 'perf
> list --help' that I could stick 'syscalls:sys_enter_close' into -e expressions, but
> i had no idea how to utilize filter expressions at all.

<SNIP>

> aldebaran:~> cat /debug/tracing/events/syscalls/sys_enter_close/format
> name: sys_enter_close
> ID: 404
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
> field:int common_lock_depth; offset:8; size:4; signed:1;
>
> field:int nr; offset:12; size:4; signed:1;
> field:unsigned int fd; offset:16; size:8; signed:0;
>
> print fmt: "fd: 0x%08lx", ((unsigned long)(REC->fd))
>
> And putting my kernel tracing hacker hat on i knew that the only interesting piece
> of information for a filter would be the 'fd' word.

Something in the TUI that allows the user to navigate thru the
tracepoints, grouping them and allowing something like:

+ Hardware
+ Dynamic Probes
+ Software
+ Scheduler
+ Block I/O
- System Calls
- close
entry
exit

And at any point in the tree allow enabling the tree branch and allowing
filters for fields that are common from that point down, i.e. fd should
be usable as a filter for file system events, allowing to see all events
that have an fd for a given pid.

Worth some experimentation.

<SNIP>

> aldebaran:~> perf record -e syscalls:sys_enter_close --filter 'fd == 10' ./hackbench 1
> Time: 0.079
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.007 MB perf.data (~325 samples) ]
>
> Cool i thought, it has 325 samples, success!
>
> Lets look at it via 'perf report' i thought:
>
> aldebaran:~> perf report
> aldebaran:~>
>
> ouch. The TUI flickered something which went away immediately (feature request:
> don't do that - at least try to mumble something about non-existent data or so).

Right, as the tools do when it finds LOST events, they should emit a
warning stating "No EVENT_NAME samples found in file foo.data", will do.

- Arnaldo

2011-02-15 16:29:32

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

On Tue, 2011-02-15 at 08:33 -0500, Steven Rostedt wrote:

> Thanks for letting me waste three days on developing this. I even posted
> an RFC a while back, and no one complained then.

Sorry about being a bit bitchy in my reply here. I need to make a note
not to reply to LKML before my first cup of coffee. ;)

Arnaldo,

Thanks for the post, I'll help you out where you need it. trace-cmd has
some features that reports back to the user on failed filter usage. We
can incorporate that into perf.

-- Steve

2011-02-15 16:53:17

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

On Tue, Feb 15, 2011 at 11:29:22AM -0500, Steven Rostedt wrote:
> On Tue, 2011-02-15 at 08:33 -0500, Steven Rostedt wrote:
>
> > Thanks for letting me waste three days on developing this. I even posted
> > an RFC a while back, and no one complained then.
>
> Sorry about being a bit bitchy in my reply here. I need to make a note
> not to reply to LKML before my first cup of coffee. ;)
>
> Arnaldo,
>
> Thanks for the post, I'll help you out where you need it. trace-cmd has
> some features that reports back to the user on failed filter usage. We
> can incorporate that into perf.

Cool!

That said I agree that we should not block improvements in the generic
filtering code because of issues in perf uses of filters.

I believe it used to work better in perf by the past, but I saw similar
issues lately like those Ingo noticed. So probably something
broke and we need to investigate. But until then your patches are
still nice improvements: lesser memory usage, lesser kernel stack usage in the
fast path, lesser limitation, faster and smarter filter evaluation...

2011-02-15 18:36:58

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

Em Tue, Feb 15, 2011 at 05:53:06PM +0100, Frederic Weisbecker escreveu:
> On Tue, Feb 15, 2011 at 11:29:22AM -0500, Steven Rostedt wrote:
> > On Tue, 2011-02-15 at 08:33 -0500, Steven Rostedt wrote:
> >
> > > Thanks for letting me waste three days on developing this. I even posted
> > > an RFC a while back, and no one complained then.
> >
> > Sorry about being a bit bitchy in my reply here. I need to make a note
> > not to reply to LKML before my first cup of coffee. ;)
> >
> > Arnaldo,
> >
> > Thanks for the post, I'll help you out where you need it. trace-cmd has
> > some features that reports back to the user on failed filter usage. We
> > can incorporate that into perf.

Thanks!

> Cool!
>
> That said I agree that we should not block improvements in the generic
> filtering code because of issues in perf uses of filters.
>
> I believe it used to work better in perf by the past, but I saw similar
> issues lately like those Ingo noticed. So probably something
> broke and we need to investigate. But until then your patches are
> still nice improvements: lesser memory usage, lesser kernel stack usage in the
> fast path, lesser limitation, faster and smarter filter evaluation...

Yeah, usability of the --filter parameter in perf is a bit
(understatement) lacking, one has to look at the /format thing in
/sys/kernel/debug/tracing... and not commit any mistake, else a generic
invalid 22 message is spit out.

I tried it and after some back and forth changing hats and scratching my
head while doing so, it got to work.

I talked with Steven and the same operation using trace-cmd would
produce a better error report, stating that the field used in the filter
expression was not valid.

I'll try to get that code from trace-cmd and glue that into
tools/perf/util/, that eventually will get moved to tools/lib/ or
something like that, as Borislav has been experimenting with for some
time already.

The location in the source tree is not the most important thing at this
point, usability improvements are, so I'm not rushing to moving code
around all the time (even doing it more than I would like), lets try to
improve usability first and then we can move it to tools/lib.

- Arnaldo

2011-02-15 18:42:39

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

* Steven Rostedt <[email protected]> wrote:

> On Tue, 2011-02-15 at 05:44 +0100, Ingo Molnar wrote:

> > So ... i have to say that this tracing filter business is unusable crap from a
> > user POV right now and i see no reason to pull *anything* in this area until it
> > does not get improved to *much* better levels of usability and utility.
> >
> > Nobody could *ever* have tested this with a 'naive but curious user' hat on and
> > this is really sad. We need to do much better!
>
> Sorry I did not work with perf in writing this code. I was using the debugfs
> directly. I figured that any improvement I made there would also improve perf as I
> tried to make sure the perf hooks into that code were updated too.
>
> My question is, did this patch set cause any of the perf problems or did these
> problems always exist?
>
> I'm just saying that perf is not the only user. And to deny improvements in the
> code because one user does not currently work well with them is just hindering
> progress.
>
> There happens to be real users out in the world that are still using ftrace. I see
> no reason to stop improving it because your goal is to have everyone move to perf.
>
> Thanks for letting me waste three days on developing this. I even posted an RFC a
> while back, and no one complained then.

I initially pulled your bits with the intention of merging them, tested them as the
final line of defense, gave you my feedback in my mail in a very detailed way, with
suggestions of what to improve.

A few lines I would normally not worry about, but I refuse to pull such a massive
diffstat:

3 files changed, 754 insertions(+), 170 deletions(-)

That ignores a major usecase. I do not pull bits that are arcane to begin with which
improve something that we don't even know whether it works in all cases - in fact
which we know does not work at all in a major usecase, as my testing has shown.

My point is that you guys need to work this out with the 'other side' *before* it
goes upstream. The tracing and perf code needs to stop doing this kind of
self-serving improvements *when basic utility sucks so much*.

And yes, it sucks both on the perf and the ftrace tracing 'side' - in no small part
because there's two sides.

We had huge churn in the tracing code in the last 2 years and frankly i do not see
the results and i do not see it getting cleaned up and i do not see it getting
unified.

I find this kind of 'the other side does not exist' schizm quite harmful to the
'generic' code in question and am pushing back on you, as i'm expected to. I don't
care whether it's "perf's fault" or "ftrace's fault" - i find the whole artificial
division harmful and refuse to elongate/deepen it.

Anyway, there's certainly encouraging responses in this thread so i'm hopeful that
it's getting fixed and improved and we can push the generic bits upstream.

Thanks,

Ingo

2011-02-15 18:59:41

by Steven Rostedt

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

On Tue, 2011-02-15 at 19:42 +0100, Ingo Molnar wrote:
> * Steven Rostedt <[email protected]> wrote:

> I find this kind of 'the other side does not exist' schizm quite harmful to the
> 'generic' code in question and am pushing back on you, as i'm expected to. I don't
> care whether it's "perf's fault" or "ftrace's fault" - i find the whole artificial
> division harmful and refuse to elongate/deepen it.

Let me apologize again. I did wake up on the wrong side of the bed this
morning, I didn't have my coffee and I was just in a bad mood. That was
not the proper response. I read it as you were not going to take any
more infrastructure updates until the perf side was fixed.

You're right, neither perf or ftrace is intuitive on the filter front
for the novice. But there are advance users that do use it and I was
focused on improving the infrastructure not the interface.

I guess if you started off saying, "Look, I just tried to work with the
perf interface, and the filtering sucks. Can we work to fix that next".
I would have been in a much more collaborating mood.

>
> Anyway, there's certainly encouraging responses in this thread so i'm hopeful that
> it's getting fixed and improved and we can push the generic bits upstream.

Yes, I'm hopeful too ;)

-- Steve

2011-02-16 09:10:35

by Ingo Molnar

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

* Steven Rostedt <[email protected]> wrote:

> Let me apologize again. [...]

No need to apologize - you raised valid questions that have come up in the past and
i am pretty good at ignoring early-in-the-morning aspect of mails ;-)

> > Anyway, there's certainly encouraging responses in this thread so i'm hopeful
> > that it's getting fixed and improved and we can push the generic bits upstream.
>
> Yes, I'm hopeful too ;)

Great! :-)

If someone wants to dust off the 'trace' utility patches that are still in
tip:tmp.perf/trace that would be fantastic. Thomas and Peter prototyped an
ftrace-esque buffering model there, to have all events associated with a
single fd (and hence a single [per cpu] buffer).

Warning, they conflict left and right with the current code:

kernel/sched.c
kernel/trace/trace.c
tools/perf/Makefile
tools/perf/builtin-record.c
tools/perf/util/event.c
tools/perf/util/header.c
tools/perf/util/parse-events.c
tools/perf/util/session.c
tools/perf/util/session.h

So it's quite a bit of work - and of course it was all very unfinished, not even
reaching prototype stage really.

Thanks,

Ingo

2011-02-16 13:34:15

by 平松雅巳 / HIRAMATU，MASAMI

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

(2011/02/16 3:35), Arnaldo Carvalho de Melo wrote:
> Em Tue, Feb 15, 2011 at 05:53:06PM +0100, Frederic Weisbecker escreveu:
>> On Tue, Feb 15, 2011 at 11:29:22AM -0500, Steven Rostedt wrote:
>>> On Tue, 2011-02-15 at 08:33 -0500, Steven Rostedt wrote:
>>>
>>>> Thanks for letting me waste three days on developing this. I even posted
>>>> an RFC a while back, and no one complained then.
>>>
>>> Sorry about being a bit bitchy in my reply here. I need to make a note
>>> not to reply to LKML before my first cup of coffee. ;)
>>>
>>> Arnaldo,
>>>
>>> Thanks for the post, I'll help you out where you need it. trace-cmd has
>>> some features that reports back to the user on failed filter usage. We
>>> can incorporate that into perf.
>
> Thanks!

Sounds good :)

>
>> Cool!
>>
>> That said I agree that we should not block improvements in the generic
>> filtering code because of issues in perf uses of filters.
>>
>> I believe it used to work better in perf by the past, but I saw similar
>> issues lately like those Ingo noticed. So probably something
>> broke and we need to investigate. But until then your patches are
>> still nice improvements: lesser memory usage, lesser kernel stack usage in the
>> fast path, lesser limitation, faster and smarter filter evaluation...
>
> Yeah, usability of the --filter parameter in perf is a bit
> (understatement) lacking, one has to look at the /format thing in
> /sys/kernel/debug/tracing... and not commit any mistake, else a generic
> invalid 22 message is spit out.

And also, there seems very less comment in perf-record.txt.

--filter=<filter>::
Event filter.

Hmm, I think, at least, there should be a comment where user should
refer to and if possible, how he can use it...

> I tried it and after some back and forth changing hats and scratching my
> head while doing so, it got to work.
>
> I talked with Steven and the same operation using trace-cmd would
> produce a better error report, stating that the field used in the filter
> expression was not valid.

If possible, please show us what parameters we can use on that event too.
perf list gives us a list of events, but not show us what parameters
we can use on those events. I think we can update perf list for that
purpose.

e.g.)
perf list --field kvm:kvm_entry
unsigned short common_type;
unsigned char common_flags;
unsigned char common_preempt_count;
int common_pid;
int common_lock_depth;
unsigned int vcpu_id;

>
> I'll try to get that code from trace-cmd and glue that into
> tools/perf/util/, that eventually will get moved to tools/lib/ or
> something like that, as Borislav has been experimenting with for some
> time already.
>
> The location in the source tree is not the most important thing at this
> point, usability improvements are, so I'm not rushing to moving code
> around all the time (even doing it more than I would like), lets try to
> improve usability first and then we can move it to tools/lib.
>
> - Arnaldo

Thank you,

--
Masami HIRAMATSU
2nd Dept. Linux Technology Center
Hitachi, Ltd., Systems Development Laboratory
E-mail: [email protected]

2011-02-16 14:53:13

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [PATCH 00/14] [GIT PULL][v2.6.39] tracing/filter: More robust filtering

Em Wed, Feb 16, 2011 at 10:34:07PM +0900, Masami Hiramatsu escreveu:
> (2011/02/16 3:35), Arnaldo Carvalho de Melo wrote:

> > Yeah, usability of the --filter parameter in perf is a bit
> > (understatement) lacking, one has to look at the /format thing in
> > /sys/kernel/debug/tracing... and not commit any mistake, else a generic
> > invalid 22 message is spit out.

> And also, there seems very less comment in perf-record.txt.

> --filter=<filter>::
> Event filter.

> Hmm, I think, at least, there should be a comment where user should
> refer to and if possible, how he can use it...

> > I tried it and after some back and forth changing hats and scratching my
> > head while doing so, it got to work.

> > I talked with Steven and the same operation using trace-cmd would
> > produce a better error report, stating that the field used in the filter
> > expression was not valid.

> If possible, please show us what parameters we can use on that event too.
> perf list gives us a list of events, but not show us what parameters
> we can use on those events. I think we can update perf list for that
> purpose.

> e.g.)
> perf list --field kvm:kvm_entry
> unsigned short common_type;
> unsigned char common_flags;
> unsigned char common_preempt_count;
> int common_pid;
> int common_lock_depth;
> unsigned int vcpu_id;

That is a cool idea, I would accept a patch implementing this :-)

- Arnaldo