2013-08-27 19:40:29

by Tom Zanussi

Subject: [PATCH v7 00/10] tracing: trace event triggers (repost)

This is a repost of the v7 patchset - I inadvertently used the wrong
branch in the previous posting, though the branch URL was correct in
both cases.

Hi,

This is v7 of the trace event triggers patchset. This version mainly
moves some code between patches to fix some bisectability problems, but
also adds a couple of minor cleanups and variable naming changes
mentioned by Masami Hiramatsu.

v7:
- moved find_event_file() extern declaration to patch 06.
- moved helper functions from patch 02 to 03, where they're first
used.
- removed copies of cmd_ops fields from trigger_data and changed to
use cmd_ops directly instead.
- renamed trigger_mode to trigger_type to avoid confusion with the
FTRACE_EVENT_FL_TRIGGER_MODE_BIT bitflag, and fixed up
usage/documentation, etc.

v6:
- fixed up the conflicts in trace_events.c related to the actual
creation of the per-event 'trigger' files.

v5:
- got rid of the trigger_iterator, a vestige of the first patchset,
which attempted to abstract the ftrace_iterator for triggers, and
cleaned up the related code, which was simplified as a result.
- replaced the void *cmd_data everywhere with ftrace_event_file *,
another vestige of the initial patchset.
- updated the patchset to use event_file_data() to grab the i_private
ftrace_event_files where appropriate (this was a separate patch in
the previous patchset, but was merged into the basic framework
patch as suggested by Masami. The only interesting part about this
is that it moved event_file_data() from kernel/trace/trace_events.c
to kernel/trace/trace.h so it can be used in
e.g. trace_events_trigger.c as well.)
- added a missing grab of event_mutex in event_trigger_regex_write().
- realized when making the above changes that the trigger filters
weren't being freed when the trigger was freed, so added a
trigger_data_free() to do that. It also ensures that trigger_data
won't be freed until nothing is using it.
- added clear_event_triggers(), which clears all triggers in a trace
array (and soft-disable associated with event_enable/disable
events).
- added a comment to ftrace_syscall_enter/exit to document the use of
rcu_dereference_raw() there.

v4:
- made some changes to the soft-disable for syscall patch, according
to Masami's suggestions. Actually, since there's now an array of
ftrace_files for syscalls that can serve the same purpose, the
enabled_enter/exit_syscalls bit arrays became redundant and were
removed.
- moved all the remaining common functions out of the
traceon/traceoff patch and into the basic trigger framework patch
and added comments to all the common functions.
- extensively commented the event_trigger_ops and event_command ops.
- made the register/unregister_command functions __init. Since that
code was originally inspired by similar ftrace code, a new patch
was added to do the same thing for the register/unregister of the
ftrace commands (patch 10/11).
- fixed the event_trigger_regex_open i_private problem noted by
Masami that's currently being addressed by Oleg Nesterov's fixes
for this. Note that that patchset also affects patch 8/11 (update
filters for multi-buffer), since it touches event filters as well.
Patch 11/11 depends on that patchset and also moves
event_file_data() to trace.h.

v3:
- added a new patch to the series (patch 8/9 - update event filters
for multibuffer) to bring the event filters up-to-date wrt the
multibuffer changes - without this patch, the same filter is
applied to all buffers regardless of which instance sets it; this
patch allows you to set per-instance filters as you'd expect. The
one exception to this is the 'ftrace subsystem' events, which are
special and retain their current behavior.
- changed the syscall soft enabling to keep a per-trace-array array
of trace_event_files alongside the 'enabled' bitmaps there. This
keeps them in a place where they're only allocated for tracing
and which I think addresses all the previous comments for that
patch.

v2:
- removed all changes to __ftrace_event_enable_disable() (except
for patch 04/11 which clears the soft_disabled bit as discussed)
and created a separate trace_event_trigger_enable_disable() that
calls it after setting/clearing the TRIGGER_MODE_BIT.
- added a trigger_mode enum for future patches that break up the
trigger calls for filtering, but that's also now used as a command
id for registering/unregistering commands.
- removed the enter_file/exit_file members that were added to
syscall_metadata after realizing that they were unnecessary if
ftrace_syscall_enter/exit() were modified to receive a pointer
to the ftrace_file instead of the pointer to the trace_array in
the ftrace_file.
- broke up the trigger invocation into two parts so that triggers
like 'stacktrace' that themselves log into the trace buffer can
defer the actual trigger invocation until after the current
record is closed, which is needed for the filter check that
in turn determines whether the trigger gets invoked.
- other minor cleanup


This patchset implements 'trace event triggers', which are similar to
the function triggers implemented for 'ftrace filter commands' (see
'Filter commands' in Documentation/trace/ftrace.txt), but instead of
being invoked from function calls are invoked by trace events.
Basically the patchset allows 'commands' to be triggered whenever a
given trace event is hit. The set of commands implemented by this
patchset are:

- enable/disable_event - enable or disable another event whenever
the trigger event is hit

- stacktrace - dump a stacktrace to the trace buffer whenever the
trigger event is hit

- snapshot - create a snapshot of the current trace buffer whenever
the trigger event is hit

- traceon/traceoff - turn tracing on or off whenever the trigger
event is hit

Triggers can also be conditionally invoked by associating a standard
trace event filter with them - if the given event passes the filter,
the trigger is invoked, otherwise it's not. (see 'Event filtering' in
Documentation/trace/events.txt for info on event filters).

See the last patch in the series for more complete documentation on
event triggers and the available trigger commands, and below for some
simple examples of each of the above commands along with conditional
filtering.

The first four patches are bugfix patches or minor improvements which
can be applied regardless; the rest contain the basic framework and
implementations for each command.
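
In all cases, a trigger is set by echoing a command string into the
per-event 'trigger' file. As a rough sketch (placeholder names in
angle brackets, following the syntax used in the examples below):

```shell
# Set a trigger: the optional :N limits how many times it can fire,
# and an optional 'if' clause attaches a standard event filter to it.
echo '<command>[:N] [if <filter>]' > \
    /sys/kernel/debug/tracing/events/<subsys>/<event>/trigger

# Remove it again by echoing the same command prefixed with '!':
echo '!<command>' > \
    /sys/kernel/debug/tracing/events/<subsys>/<event>/trigger
```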

This patchset was based on some ideas from Steve Rostedt, which he
outlined during a couple discussions at ELC and follow-on e-mails.
Code- and interface-wise, it's also partially based on the existing
function triggers implementation and essentially works on top of the
SOFT_DISABLE mode introduced for that. Both Steve and Masami
Hiramatsu took a look at a couple early versions of this patchset, and
offered some very useful suggestions reflected in this patchset -
thanks to them both for the ideas and for taking the time to do some
basic sanity reviews!

Below are a few concrete examples demonstrating each of the available
commands.

The first example attempts to capture all the kmalloc events that
happen as a result of reading a particular file.

The first part of the set of commands below adds a kmalloc
'enable_event' trigger to the sys_enter_read trace event - as a
result, when the sys_enter_read event occurs, kmalloc events are
enabled, resulting in those kmalloc events getting logged into the
trace buffer. The :1 at the end of the kmalloc enable_event specifies
that the enabling of kmalloc events on sys_enter_read will only happen
once - subsequent reads won't trigger the kmalloc logging. The next
part of the example reads a test file, which triggers the
sys_enter_read tracepoint and thus turns on the kmalloc events, and
once done, adds a trigger to sys_exit_read that disables kmalloc
events. The disable_event doesn't have a :1 appended, which means it
happens on every sys_exit_read.

# echo 'enable_event:kmem:kmalloc:1' > \
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger; \
cat ~/junk.txt > /dev/null; \
echo 'disable_event:kmem:kmalloc' > \
/sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

Just to show a bit of what happens under the covers, if we display the
kmalloc 'enable' file, we see that it's 'soft disabled' (the asterisk
after the enable flag). This means that it's actually enabled but is
in the SOFT_DISABLED state, and is essentially held back from actually
logging anything to the trace buffer, but can be made to log into the
buffer by simply flipping a bit:

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
0*

If we look at the 'enable' file for the triggering sys_enter_read
trace event, we can see that it also has the 'soft disable' flag set.
This is because in the case of the triggering event, we also need to
have the trace event invoked regardless of whether or not it's actually
being logged, so we can process the triggers. This functionality is
also built on top of the SOFT_DISABLE flag and is reflected in the
enable state as well:

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/enable
0*

To find out which triggers are set for a particular event, we can look
at the 'trigger' file for the event. Here's what the 'trigger' file
for the sys_enter_read event looks like:

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
enable_event:kmem:kmalloc:count=0

The 'count=0' field at the end shows that this trigger has no more
triggering ability left - it's essentially fired all its shots - if
it was still active, it would have a non-zero count.

Looking at the sys_exit_read trigger, we see that since we didn't
specify a number at the end, the number of times it can fire is
unlimited:

# cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
disable_event:kmem:kmalloc:unlimited

# cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/enable
0*

Finally, let's look at the results of the above set of commands by
cat'ing the 'trace' file:

# cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
# entries-in-buffer/entries-written: 85/85 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cat-2596 [001] .... 374.518849: kmalloc: call_site=ffffffff812de707 ptr=ffff8800306b9290 bytes_req=2 bytes_alloc=8 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518956: kmalloc: call_site=ffffffff81182a12 ptr=ffff88010c8e1500 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518959: kmalloc: call_site=ffffffff812d8e49 ptr=ffff88003002a200 bytes_req=32 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518960: kmalloc: call_site=ffffffff812de707 ptr=ffff8800306b9088 bytes_req=2 bytes_alloc=8 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [003] .... 374.519063: kmalloc: call_site=ffffffff812d9f50 ptr=ffff8800b793fd00 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519119: kmalloc: call_site=ffffffff811cc3bc ptr=ffff8800b7918900 bytes_req=128 bytes_alloc=128 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519122: kmalloc: call_site=ffffffff811cc4d2 ptr=ffff880030404800 bytes_req=504 bytes_alloc=512 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519125: kmalloc: call_site=ffffffff811cc64e ptr=ffff88003039d8a0 bytes_req=28 bytes_alloc=32 gfp_flags=GFP_KERNEL
.
.
.
Xorg-1194 [000] .... 374.543956: kmalloc: call_site=ffffffffa03a8599 ptr=ffff8800ba23b700 bytes_req=112 bytes_alloc=128 gfp_flags=GFP_TEMPORARY|GFP_NOWARN|GFP_NORETRY
Xorg-1194 [000] .... 374.543961: kmalloc: call_site=ffffffffa03a7639 ptr=ffff8800b7905b40 bytes_req=56 bytes_alloc=64 gfp_flags=GFP_TEMPORARY|GFP_ZERO
Xorg-1194 [000] .... 374.543973: kmalloc: call_site=ffffffffa039f716 ptr=ffff8800b7905ac0 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL
.
.
.
compiz-1769 [002] .... 374.547586: kmalloc: call_site=ffffffffa03a8599 ptr=ffff8800ba320400 bytes_req=952 bytes_alloc=1024 gfp_flags=GFP_TEMPORARY|GFP_NOWARN|GFP_NORETRY
compiz-1769 [002] .... 374.547592: kmalloc: call_site=ffffffffa03a7639 ptr=ffff8800bd5f7400 bytes_req=280 bytes_alloc=512 gfp_flags=GFP_TEMPORARY|GFP_ZERO
compiz-1769 [002] .... 374.547623: kmalloc: call_site=ffffffffa039f716 ptr=ffff8800b792d580 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL
.
.
.
cat-2596 [000] .... 374.646019: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
cat-2596 [000] .... 374.648263: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
cat-2596 [000] .... 374.650503: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
.
.
.
bash-2425 [002] .... 374.654923: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800b7a28780 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
rsyslogd-974 [002] .... 374.655163: kmalloc: call_site=ffffffff81046ae6 ptr=ffff8800ba320400 bytes_req=1024 bytes_alloc=1024 gfp_flags=GFP_KERNEL

As you can see, we captured all the kmallocs from our 'cat' reads, but
also any other kmallocs that happened for other processes between the
time we turned on kmalloc events and turned them off. Future work
should add a way to screen out unwanted events, e.g. the ability to
capture the triggering pid in a simple variable and use that variable
in event filters to screen out other pids.

To turn off the events we turned on, simply reinvoke the commands
prefixed by '!':

# echo '!enable_event:kmem:kmalloc:1' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# echo '!disable_event:kmem:kmalloc' > /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger

You can verify that the events have been turned off by again examining
the 'enable' and 'trigger' files:

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/enable
0
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
0


The next example shows how to use the 'stacktrace' command. To have a
stacktrace logged every time a particular event occurs, simply echo
'stacktrace' into the 'trigger' file for that event:

# echo 'stacktrace' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:unlimited

Looking at the 'trace' output, we indeed see stack traces for every
kmalloc:

# cat /sys/kernel/debug/tracing/trace

compiz-1769 [003] .... 2422.614630: <stack trace>
=> i915_add_request
=> i915_gem_do_execbuffer.isra.15
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [002] .... 2422.619076: <stack trace>
=> drm_wait_vblank
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [000] .... 2422.625823: <stack trace>
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
.
.
.
bash-2842 [001] .... 2423.002059: <stack trace>
=> __tracing_open
=> tracing_open
=> do_dentry_open
=> finish_open
=> do_last
=> path_openat
=> do_filp_open
=> do_sys_open
=> SyS_open
=> system_call_fastpath
bash-2842 [001] .... 2423.002070: <stack trace>
=> __tracing_open
=> tracing_open
=> do_dentry_open
=> finish_open
=> do_last
=> path_openat
=> do_filp_open
=> do_sys_open
=> SyS_open
=> system_call_fastpath

For an event like kmalloc, however, we don't typically want to see a
stack trace for every single event, since the amount of data produced
is overwhelming. What we'd typically want to do is only log a stack
trace for particular events of interest. We can accomplish that by
appending an 'event filter' to the trigger. The event filters used to
filter triggers are exactly the same as those implemented for the
existing trace event 'filter' files - see the trace event
documentation for details.

First, let's turn off the existing stacktrace event, and clear the
trace buffer:

# echo '!stacktrace' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# echo > /sys/kernel/debug/tracing/trace

Now, we can add a new stacktrace trigger which will fire 5 times, but
only if the number of bytes requested by the caller was greater than
or equal to 512:

# echo 'stacktrace:5 if bytes_req >= 512' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:count=0 if bytes_req >= 512

From looking at the trigger, we can see the event fired 5 times
(count=0), and looking at the 'trace' file, we can verify that:

# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 5/5 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
rsyslogd-974 [000] .... 1796.412997: <stack trace>
=> kmem_cache_alloc_trace
=> do_syslog
=> kmsg_read
=> proc_reg_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
compiz-1769 [000] .... 1796.427342: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441251: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441392: <stack trace>
=> __kmalloc
=> sg_kmalloc
=> __sg_alloc_table
=> sg_alloc_table
=> i915_gem_object_get_pages_gtt
=> i915_gem_object_get_pages
=> i915_gem_object_pin
=> i915_gem_execbuffer_reserve_object.isra.11
=> i915_gem_execbuffer_reserve
=> i915_gem_do_execbuffer.isra.15
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441672: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath

So the trace output shows exactly 5 stacktraces, as expected.

Just for comparison, let's look at an event that's harder to trigger,
to see a count that isn't 0 in the trigger description:

# echo 'stacktrace:5 if bytes_req >= 65536' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger

# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:count=5 if bytes_req >= 65536


The next example shows how to use the 'snapshot' command to capture a
snapshot of the trace buffer when an 'interesting' event occurs.

In this case, we'll first enable tracing for the entire block subsystem:

# echo 1 > /sys/kernel/debug/tracing/events/block/enable

Next, we add a 'snapshot' trigger that will take a snapshot of all the
events leading up to the particular event we're interested in, which
is a block queue unplug with a depth > 1. In this case we're
interested in capturing the snapshot just one time, the first time it
occurs:

# echo 'snapshot:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger

It may take a while for the condition to occur, but once it does, we
can see the entire sequence of block events leading up to it in the
'snapshot' file:

# cat /sys/kernel/debug/tracing/snapshot

jbd2/sdb1-8-278 [001] .... 382.075012: block_bio_queue: 8,16 WS 629429976 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] .... 382.075012: block_bio_backmerge: 8,16 WS 629429976 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075015: block_rq_insert: 8,16 WS 0 () 629429912 + 72 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075030: block_rq_issue: 8,16 WS 0 () 629429912 + 72 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075044: block_unplug: [jbd2/sdb1-8] 1
<idle>-0 [000] ..s. 382.075310: block_rq_complete: 8,16 WS () 629429912 + 72 [0]
jbd2/sdb1-8-278 [000] .... 382.075407: block_touch_buffer: 8,17 sector=78678492 size=4096
jbd2/sdb1-8-278 [000] .... 382.075413: block_bio_remap: 8,16 FWFS 629429984 + 8 <- (8,17) 629427936
jbd2/sdb1-8-278 [000] .... 382.075415: block_bio_queue: 8,16 FWFS 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] .... 382.075418: block_getrq: 8,16 FWFS 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] d... 382.075421: block_rq_insert: 8,16 FWFS 0 () 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] d... 382.075424: block_rq_issue: 8,16 FWS 0 () 18446744073709551615 + 0 [jbd2/sdb1-8]
<idle>-0 [000] dNs. 382.115912: block_rq_issue: 8,16 WS 0 () 629429984 + 8 [swapper/0]
<idle>-0 [000] ..s. 382.116059: block_rq_complete: 8,16 WS () 629429984 + 8 [0]
<idle>-0 [000] dNs. 382.116079: block_rq_issue: 8,16 FWS 0 () 18446744073709551615 + 0 [swapper/0]
<idle>-0 [000] d.s. 382.131030: block_rq_complete: 8,16 WS () 629429984 + 0 [0]
jbd2/sdb1-8-278 [000] .... 382.131106: block_dirty_buffer: 8,17 sector=26 size=4096
jbd2/sdb1-8-278 [000] .... 382.131111: block_dirty_buffer: 8,17 sector=106954757 size=4096
.
.
.
kworker/u16:3-66 [002] .... 387.144505: block_bio_remap: 8,16 WM 2208 + 8 <- (8,17) 160
kworker/u16:3-66 [002] .... 387.144512: block_bio_queue: 8,16 WM 2208 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144522: block_getrq: 8,16 WM 2208 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144525: block_plug: [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144530: block_bio_remap: 8,16 WM 2216 + 8 <- (8,17) 168
kworker/u16:3-66 [002] .... 387.144531: block_bio_queue: 8,16 WM 2216 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144533: block_bio_backmerge: 8,16 WM 2216 + 8 [kworker/u16:3]
.
.
.
kworker/u16:3-66 [002] d... 387.144631: block_rq_insert: 8,16 WM 0 () 2208 + 16 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144636: block_rq_insert: 8,16 WM 0 () 2256 + 16 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144638: block_rq_insert: 8,16 WM 0 () 662702080 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144640: block_rq_insert: 8,16 WM 0 () 683673680 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144641: block_rq_insert: 8,16 WM 0 () 729812344 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144642: block_rq_insert: 8,16 WM 0 () 729828896 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144643: block_rq_insert: 8,16 WM 0 () 730599480 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144644: block_rq_insert: 8,16 WM 0 () 855640104 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144645: block_rq_insert: 8,16 WM 0 () 880805984 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144646: block_rq_insert: 8,16 WM 0 () 1186990400 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144649: block_unplug: [kworker/u16:3] 10


The final example shows something very similar, but using the
'traceoff' command to stop tracing when an 'interesting' event occurs.
The traceon and traceoff commands can be used together to toggle
tracing on and off in creative ways to capture different traces in the
'trace' buffer, but this example shows essentially the same use case
as the previous one, using 'traceoff' to capture trace data of
interest in the standard 'trace' buffer.
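
As an aside, a hypothetical traceon/traceoff pairing (the events and
condition here are illustrative, not taken from the example below)
might look like:

```shell
# Stop tracing the first time a multi-request unplug is seen...
echo 'traceoff:1 if nr_rq > 1' > \
    /sys/kernel/debug/tracing/events/block/block_unplug/trigger
# ...and turn tracing back on (once) when a request later completes.
echo 'traceon:1' > \
    /sys/kernel/debug/tracing/events/block/block_rq_complete/trigger
```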

Again, we'll enable tracing for the entire block subsystem:

# echo 1 > /sys/kernel/debug/tracing/events/block/enable

# echo 'traceoff:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger

# cat /sys/kernel/debug/tracing/trace

kworker/u16:4-67 [000] .... 803.003670: block_bio_remap: 8,16 WM 2208 + 8 <- (8,17) 160
kworker/u16:4-67 [000] .... 803.003670: block_bio_queue: 8,16 WM 2208 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003672: block_getrq: 8,16 WM 2208 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003674: block_bio_remap: 8,16 WM 2216 + 8 <- (8,17) 168
kworker/u16:4-67 [000] .... 803.003675: block_bio_queue: 8,16 WM 2216 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003676: block_bio_backmerge: 8,16 WM 2216 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003678: block_bio_remap: 8,16 WM 2232 + 8 <- (8,17) 184
kworker/u16:4-67 [000] .... 803.003678: block_bio_queue: 8,16 WM 2232 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003680: block_getrq: 8,16 WM 2232 + 8 [kworker/u16:4]
.
.
.
kworker/u16:4-67 [000] d... 803.003720: block_rq_insert: 8,16 WM 0 () 285223776 + 16 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003721: block_rq_insert: 8,16 WM 0 () 662702080 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003722: block_rq_insert: 8,16 WM 0 () 683673680 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003723: block_rq_insert: 8,16 WM 0 () 730599480 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003724: block_rq_insert: 8,16 WM 0 () 763365384 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003725: block_rq_insert: 8,16 WM 0 () 880805984 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003726: block_rq_insert: 8,16 WM 0 () 1186990872 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003727: block_rq_insert: 8,16 WM 0 () 1187057608 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003729: block_unplug: [kworker/u16:4] 14

The following changes since commit fc30f13b7c1b87b44ee364462c3408c913f01439:

Merge branch 'trace/ftrace/core-tpstring' into trace/for-next (2013-08-22 12:30:55 -0400)

are available in the git repository at:


git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/event-triggers-v7
http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/event-triggers-v7

Tom Zanussi (10):
tracing: Add support for SOFT_DISABLE to syscall events
tracing: add basic event trigger framework
tracing: add 'traceon' and 'traceoff' event trigger commands
tracing: add 'snapshot' event trigger command
tracing: add 'stacktrace' event trigger command
tracing: add 'enable_event' and 'disable_event' event trigger
commands
tracing: add and use generic set_trigger_filter() implementation
tracing: update event filters for multibuffer
tracing: add documentation for trace event triggers
tracing: make register/unregister_ftrace_command __init

Documentation/trace/events.txt | 207 +++++
include/linux/ftrace.h | 4 +-
include/linux/ftrace_event.h | 53 +-
include/trace/ftrace.h | 39 +-
kernel/trace/Makefile | 1 +
kernel/trace/ftrace.c | 12 +-
kernel/trace/trace.c | 31 +-
kernel/trace/trace.h | 194 ++++-
kernel/trace/trace_branch.c | 2 +-
kernel/trace/trace_events.c | 49 +-
kernel/trace/trace_events_filter.c | 181 ++++-
kernel/trace/trace_events_trigger.c | 1387 ++++++++++++++++++++++++++++++++++
kernel/trace/trace_export.c | 2 +-
kernel/trace/trace_functions_graph.c | 4 +-
kernel/trace/trace_kprobe.c | 4 +-
kernel/trace/trace_mmiotrace.c | 4 +-
kernel/trace/trace_sched_switch.c | 4 +-
kernel/trace/trace_syscalls.c | 62 +-
kernel/trace/trace_uprobe.c | 3 +-
19 files changed, 2128 insertions(+), 115 deletions(-)
create mode 100644 kernel/trace/trace_events_trigger.c

--
1.7.11.4


2013-08-27 19:40:38

by Tom Zanussi

Subject: [PATCH v7 02/10] tracing: add basic event trigger framework

Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.

'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.

The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.

The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is what's already implemented by SOFT_DISABLE mode, so trigger
mode directly reuses that. Event triggers essentially inherit the soft
disable logic in __ftrace_event_enable_disable() while adding a bit of
logic and trigger reference counting via tm_ref on top of that in a
new trace_event_trigger_enable_disable() function. Because the base
__ftrace_event_enable_disable() code now needs to be invoked from
outside trace_events.c, a wrapper is also added for those usages.

The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.

The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.

The standard open, read, write, and release file operations are
implemented here.

The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If opened for reading, it sets up the trigger
seq_ops.

The write() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.

The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.

A couple of functions for event command registration and
unregistration are added, along with a list to add them to and a mutex
to protect them, as well as an (initially empty) registration function
to add the set of commands that will be added by future commits, and a
call to it from the trace event initialization code.

Also added are a couple of trigger-specific data structures needed
for these implementations, such as a trigger iterator and a struct for
trigger-specific data.

A couple of structs consisting mostly of functions meant to be
implemented in command-specific ways, event_command and
event_trigger_ops, are used by the generic event trigger command
implementations. They're being put into trace.h alongside the other
trace_event data structures and functions, in the expectation that
they'll be needed in several trace_event-related files such as
trace_events_trigger.c and trace_events.c.

The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event, and arming the triggering event.

Every event_command func() implementation essentially does the
same thing for any command:

- choose ops - use the value of param to choose either a normal or a
count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event

The reg() and unreg() ops allow command-specific implementations for
event_trigger_op registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.

Each command has an associated trigger_type, which serves double duty,
both as a unique identifier for the command as well as a value that
can be used for setting a trigger mode bit during trigger invocation.

The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.

The event_trigger_ops.func() function corresponds to the trigger
'probe' function that gets called when the triggering event is
actually invoked. The other functions are used to list the trigger
when needed, along with a couple of mundane bookkeeping functions.

This also moves event_file_data() into trace.h so it can be used
outside of trace_events.c.

Signed-off-by: Tom Zanussi <[email protected]>
Idea-by: Steve Rostedt <[email protected]>
---
include/linux/ftrace_event.h | 13 +-
include/trace/ftrace.h | 4 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 171 ++++++++++++++++++++++
kernel/trace/trace_events.c | 21 ++-
kernel/trace/trace_events_trigger.c | 280 ++++++++++++++++++++++++++++++++++++
kernel/trace/trace_syscalls.c | 4 +
7 files changed, 488 insertions(+), 6 deletions(-)
create mode 100644 kernel/trace/trace_events_trigger.c

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 5eaa746..4b4fa62 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -255,6 +255,7 @@ enum {
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
};

/*
@@ -263,13 +264,15 @@ enum {
* RECORDED_CMD - The comms should be recorded at sched_switch
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
* SOFT_DISABLED - When set, do not trace the event (even though its
- * tracepoint may be enabled)
+ * tracepoint may be enabled)
+ * TRIGGER_MODE - When set, invoke the triggers associated with the event
*/
enum {
FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
+ FTRACE_EVENT_FL_TRIGGER_MODE = (1 << FTRACE_EVENT_FL_TRIGGER_MODE_BIT),
};

struct ftrace_event_file {
@@ -278,6 +281,7 @@ struct ftrace_event_file {
struct dentry *dir;
struct trace_array *tr;
struct ftrace_subsystem_dir *system;
+ struct list_head triggers;

/*
* 32 bit flags:
@@ -285,6 +289,7 @@ struct ftrace_event_file {
* bit 1: enabled cmd record
* bit 2: enable/disable with the soft disable bit
* bit 3: soft disabled
+ * bit 4: trigger enabled
*
* Note: The bits must be set atomically to prevent races
* from other writers. Reads of flags do not need to be in
@@ -296,6 +301,7 @@ struct ftrace_event_file {
*/
unsigned long flags;
atomic_t sm_ref; /* soft-mode reference counter */
+ atomic_t tm_ref; /* trigger-mode reference counter */
};

#define __TRACE_EVENT_FLAGS(name, value) \
@@ -310,12 +316,17 @@ struct ftrace_event_file {

#define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */

+enum event_trigger_type {
+ ETT_NONE = (0),
+};
+
extern void destroy_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct event_filter *filter, void *rec);
extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
+extern void event_triggers_call(struct ftrace_event_file *file);

enum {
FILTER_OTHER = 0,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 41a6643..326ba32 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -526,6 +526,10 @@ ftrace_raw_event_##call(void *__data, proto) \
int __data_size; \
int pc; \
\
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
+ &ftrace_file->flags)) \
+ event_triggers_call(ftrace_file); \
+ \
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
&ftrace_file->flags)) \
return; \
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index d7e2068..1378e84 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -50,6 +50,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
+obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
obj-$(CONFIG_TRACEPOINTS) += power-traces.o
ifeq ($(CONFIG_PM_RUNTIME),y)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index b1227b9..f8a18e5 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1016,9 +1016,180 @@ extern void trace_event_enable_cmd_record(bool enable);
extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
extern int event_trace_del_tracer(struct trace_array *tr);

+static inline void *event_file_data(struct file *filp)
+{
+ return ACCESS_ONCE(file_inode(filp)->i_private);
+}
+
extern struct mutex event_mutex;
extern struct list_head ftrace_events;

+extern const struct file_operations event_trigger_fops;
+
+extern int register_trigger_cmds(void);
+extern void clear_event_triggers(struct trace_array *tr);
+
+/**
+ * struct event_trigger_ops - callbacks for trace event triggers
+ *
+ * The methods in this structure provide per-event trigger hooks for
+ * various trigger operations.
+ *
+ * All the methods below, except for @init() and @free(), must be
+ * implemented.
+ *
+ * @func: The trigger 'probe' function called when the triggering
+ * event occurs. The data passed into this callback is the data
+ * that was supplied to the event_command @reg() function that
+ * registered the trigger (see struct event_command).
+ *
+ * @init: An optional initialization function called for the trigger
+ * when the trigger is registered (via the event_command reg()
+ * function). This can be used to perform per-trigger
+ * initialization such as incrementing a per-trigger reference
+ * count, for instance. This is usually implemented by the
+ * generic utility function @event_trigger_init() (see
+ * trace_events_trigger.c).
+ *
+ * @free: An optional de-initialization function called for the
+ * trigger when the trigger is unregistered (via the
+ * event_command @reg() function). This can be used to perform
+ * per-trigger de-initialization such as decrementing a
+ * per-trigger reference count and freeing corresponding trigger
+ * data, for instance. This is usually implemented by the
+ * generic utility function @event_trigger_free() (see
+ * trace_events_trigger.c).
+ *
+ * @print: The callback function invoked to have the trigger print
+ * itself. This is usually implemented by a wrapper function
+ * that calls the generic utility function @event_trigger_print()
+ * (see trace_events_trigger.c).
+ */
+struct event_trigger_ops {
+ void (*func)(void **data);
+ int (*init)(struct event_trigger_ops *ops,
+ void **data);
+ void (*free)(struct event_trigger_ops *ops,
+ void **data);
+ int (*print)(struct seq_file *m,
+ struct event_trigger_ops *ops,
+ void *data);
+};
+
+/**
+ * struct event_command - callbacks and data members for event commands
+ *
+ * Event commands are invoked by users by writing the command name
+ * into the 'trigger' file associated with a trace event. The
+ * parameters associated with a specific invocation of an event
+ * command are used to create an event trigger instance, which is
+ * added to the list of trigger instances associated with that trace
+ * event. When the event is hit, the set of triggers associated with
+ * that event is invoked.
+ *
+ * The data members in this structure provide per-event command data
+ * for various event commands.
+ *
+ * All the data members below, except for @post_trigger, must be set
+ * for each event command.
+ *
+ * @name: The unique name that identifies the event command. This is
+ * the name used when setting triggers via trigger files.
+ *
+ * @trigger_type: A unique id that identifies the event command
+ * 'type'. This value has two purposes, the first to ensure that
+ * only one trigger of the same type can be set at a given time
+ * for a particular event e.g. it doesn't make sense to have both
+ * a traceon and traceoff trigger attached to a single event at
+ * the same time, so traceon and traceoff have the same type
+ * though they have different names. The @trigger_type value is
+ * also used as a bit value for deferring the actual trigger
+ * action until after the current event is finished. Some
+ * commands need to do this if they themselves log to the trace
+ * buffer (see the @post_trigger() member below). @trigger_type
+ * values are defined by adding new values to the trigger_type
+ * enum in include/linux/ftrace_event.h.
+ *
+ * @post_trigger: A flag that says whether or not this command needs
+ * to have its action delayed until after the current event has
+ * been closed. Some triggers need to avoid being invoked while
+ * an event is currently in the process of being logged, since
+ * the trigger may itself log data into the trace buffer. Thus
+ * we make sure the current event is committed before invoking
+ * those triggers. To do that, the trigger invocation is split
+ * in two - the first part checks the filter using the current
+ * trace record; if a command has the @post_trigger flag set, it
+ * sets a bit for itself in the return value, otherwise it
+ * directly invokes the trigger. Once all commands have been
+ * either invoked or set their return flag, the current record is
+ * either committed or discarded. At that point, if any commands
+ * have deferred their triggers, those commands are finally
+ * invoked following the close of the current event. In other
+ * words, if the event_trigger_ops @func() probe implementation
+ * itself logs to the trace buffer, this flag should be set,
+ * otherwise it can be left unspecified.
+ *
+ * All the methods below, except for @set_filter(), must be
+ * implemented.
+ *
+ * @func: The callback function responsible for parsing and
+ * registering the trigger written to the 'trigger' file by the
+ * user. It allocates the trigger instance and registers it with
+ * the appropriate trace event. It makes use of the other
+ * event_command callback functions to orchestrate this, and is
+ * usually implemented by the generic utility function
+ * @event_trigger_callback() (see trace_events_trigger.c).
+ *
+ * @reg: Adds the trigger to the list of triggers associated with the
+ * event, and enables the event trigger itself, after
+ * initializing it (via the event_trigger_ops @init() function).
+ * This is also where commands can use the @trigger_type value to
+ * make the decision as to whether or not multiple instances of
+ * the trigger should be allowed. This is usually implemented by
+ * the generic utility function @register_trigger() (see
+ * trace_events_trigger.c).
+ *
+ * @unreg: Removes the trigger from the list of triggers associated
+ * with the event, and disables the event trigger itself, after
+ * releasing it (via the event_trigger_ops @free() function).
+ * This is usually implemented by the generic utility function
+ * @unregister_trigger() (see trace_events_trigger.c).
+ *
+ * @set_filter: An optional function called to parse and set a filter
+ * for the trigger. If no @set_filter() method is set for the
+ * event command, filters set by the user for the command will be
+ * ignored. This is usually implemented by the generic utility
+ * function @set_trigger_filter() (see trace_events_trigger.c).
+ *
+ * @get_trigger_ops: The callback function invoked to retrieve the
+ * event_trigger_ops implementation associated with the command.
+ */
+struct event_command {
+ struct list_head list;
+ char *name;
+ enum event_trigger_type trigger_type;
+ bool post_trigger;
+ int (*func)(struct event_command *cmd_ops,
+ struct ftrace_event_file *file,
+ char *glob, char *cmd,
+ char *params, int enable);
+ int (*reg)(char *glob,
+ struct event_trigger_ops *trigger_ops,
+ void *trigger_data,
+ struct ftrace_event_file *file);
+ void (*unreg)(char *glob,
+ struct event_trigger_ops *trigger_ops,
+ void *trigger_data,
+ struct ftrace_event_file *file);
+ int (*set_filter)(char *filter_str,
+ void *trigger_data,
+ struct ftrace_event_file *file);
+ struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param);
+};
+
+extern int trace_event_enable_disable(struct ftrace_event_file *file,
+ int enable, int soft_disable);
+
extern const char *__start___trace_bprintk_fmt[];
extern const char *__stop___trace_bprintk_fmt[];

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 368a4d5..7d8eb8a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -342,6 +342,12 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
return ret;
}

+int trace_event_enable_disable(struct ftrace_event_file *file,
+ int enable, int soft_disable)
+{
+ return __ftrace_event_enable_disable(file, enable, soft_disable);
+}
+
static int ftrace_event_enable_disable(struct ftrace_event_file *file,
int enable)
{
@@ -421,11 +427,6 @@ static void remove_subsystem(struct ftrace_subsystem_dir *dir)
}
}

-static void *event_file_data(struct file *filp)
-{
- return ACCESS_ONCE(file_inode(filp)->i_private);
-}
-
static void remove_event_file_dir(struct ftrace_event_file *file)
{
struct dentry *dir = file->dir;
@@ -1542,6 +1543,9 @@ event_create_dir(struct dentry *parent, struct ftrace_event_file *file)
trace_create_file("filter", 0644, file->dir, call,
&ftrace_event_filter_fops);

+ trace_create_file("trigger", 0644, file->dir, file,
+ &event_trigger_fops);
+
trace_create_file("format", 0444, file->dir, call,
&ftrace_event_format_fops);

@@ -1637,6 +1641,8 @@ trace_create_new_event(struct ftrace_event_call *call,
file->event_call = call;
file->tr = tr;
atomic_set(&file->sm_ref, 0);
+ atomic_set(&file->tm_ref, 0);
+ INIT_LIST_HEAD(&file->triggers);
list_add(&file->list, &tr->events);

return file;
@@ -2303,6 +2309,9 @@ int event_trace_del_tracer(struct trace_array *tr)
{
mutex_lock(&event_mutex);

+ /* Disable any event triggers and associated soft-disabled events */
+ clear_event_triggers(tr);
+
/* Disable any running events */
__ftrace_set_clr_event_nolock(tr, NULL, NULL, NULL, 0);

@@ -2366,6 +2375,8 @@ static __init int event_trace_enable(void)

register_event_cmds();

+ register_trigger_cmds();
+
return 0;
}

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
new file mode 100644
index 0000000..5ec8336
--- /dev/null
+++ b/kernel/trace/trace_events_trigger.c
@@ -0,0 +1,280 @@
+/*
+ * trace_events_trigger - trace event triggers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 Tom Zanussi <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+
+#include "trace.h"
+
+static LIST_HEAD(trigger_commands);
+static DEFINE_MUTEX(trigger_cmd_mutex);
+
+struct event_trigger_data {
+ struct ftrace_event_file *file;
+ unsigned long count;
+ int ref;
+ bool enable;
+ struct event_trigger_ops *ops;
+ struct event_command *cmd_ops;
+ struct event_filter *filter;
+ char *filter_str;
+ struct list_head list;
+};
+
+void event_triggers_call(struct ftrace_event_file *file)
+{
+ struct event_trigger_data *data;
+
+ if (list_empty(&file->triggers))
+ return;
+
+ preempt_disable_notrace();
+ list_for_each_entry_rcu(data, &file->triggers, list)
+ data->ops->func((void **)&data);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(event_triggers_call);
+
+static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
+{
+ struct ftrace_event_file *event_file = event_file_data(m->private);
+
+ return seq_list_next(t, &event_file->triggers, pos);
+}
+
+static void *trigger_start(struct seq_file *m, loff_t *pos)
+{
+ struct ftrace_event_file *event_file;
+
+ /* ->stop() is called even if ->start() fails */
+ mutex_lock(&event_mutex);
+ event_file = event_file_data(m->private);
+ if (unlikely(!event_file))
+ return ERR_PTR(-ENODEV);
+
+ return seq_list_start(&event_file->triggers, *pos);
+}
+
+static void trigger_stop(struct seq_file *m, void *t)
+{
+ mutex_unlock(&event_mutex);
+}
+
+static int trigger_show(struct seq_file *m, void *v)
+{
+ struct event_trigger_data *data;
+
+ data = list_entry(v, struct event_trigger_data, list);
+ data->ops->print(m, data->ops, data);
+
+ return 0;
+}
+
+static const struct seq_operations event_triggers_seq_ops = {
+ .start = trigger_start,
+ .next = trigger_next,
+ .stop = trigger_stop,
+ .show = trigger_show,
+};
+
+static int event_trigger_regex_open(struct inode *inode, struct file *file)
+{
+ int ret = 0;
+
+ mutex_lock(&event_mutex);
+
+ if (unlikely(!event_file_data(file))) {
+ mutex_unlock(&event_mutex);
+ return -ENODEV;
+ }
+
+ if (file->f_mode & FMODE_READ) {
+ ret = seq_open(file, &event_triggers_seq_ops);
+ if (!ret) {
+ struct seq_file *m = file->private_data;
+ m->private = file;
+ }
+ }
+
+ mutex_unlock(&event_mutex);
+
+ return ret;
+}
+
+static int trigger_process_regex(struct ftrace_event_file *file,
+ char *buff, int enable)
+{
+ char *command, *next = buff;
+ struct event_command *p;
+ int ret = -EINVAL;
+
+ command = strsep(&next, ": \t");
+ command = (command[0] != '!') ? command : command + 1;
+
+ mutex_lock(&trigger_cmd_mutex);
+ list_for_each_entry(p, &trigger_commands, list) {
+ if (strcmp(p->name, command) == 0) {
+ ret = p->func(p, file, buff, command, next, enable);
+ goto out_unlock;
+ }
+ }
+ out_unlock:
+ mutex_unlock(&trigger_cmd_mutex);
+
+ return ret;
+}
+
+static ssize_t event_trigger_regex_write(struct file *file,
+ const char __user *ubuf,
+ size_t cnt, loff_t *ppos, int enable)
+{
+ struct ftrace_event_file *event_file;
+ ssize_t ret;
+ char *buf;
+
+ if (!cnt)
+ return 0;
+
+ if (cnt >= PAGE_SIZE)
+ return -EINVAL;
+
+ buf = (char *)__get_free_page(GFP_TEMPORARY);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, ubuf, cnt)) {
+ free_page((unsigned long) buf);
+ return -EFAULT;
+ }
+ buf[cnt] = '\0';
+ strim(buf);
+
+ mutex_lock(&event_mutex);
+ event_file = event_file_data(file);
+ if (unlikely(!event_file)) {
+ mutex_unlock(&event_mutex);
+ free_page((unsigned long) buf);
+ return -ENODEV;
+ }
+ ret = trigger_process_regex(event_file, buf, enable);
+ mutex_unlock(&event_mutex);
+
+ free_page((unsigned long) buf);
+ if (ret < 0)
+ goto out;
+
+ *ppos += cnt;
+ ret = cnt;
+ out:
+ return ret;
+}
+
+static int event_trigger_regex_release(struct inode *inode, struct file *file)
+{
+ mutex_lock(&event_mutex);
+
+ if (file->f_mode & FMODE_READ)
+ seq_release(inode, file);
+
+ mutex_unlock(&event_mutex);
+
+ return 0;
+}
+
+static ssize_t
+event_trigger_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ return event_trigger_regex_write(filp, ubuf, cnt, ppos, 1);
+}
+
+static int
+event_trigger_open(struct inode *inode, struct file *filp)
+{
+ return event_trigger_regex_open(inode, filp);
+}
+
+static int
+event_trigger_release(struct inode *inode, struct file *file)
+{
+ return event_trigger_regex_release(inode, file);
+}
+
+const struct file_operations event_trigger_fops = {
+ .open = event_trigger_open,
+ .read = seq_read,
+ .write = event_trigger_write,
+ .llseek = ftrace_filter_lseek,
+ .release = event_trigger_release,
+};
+
+static int trace_event_trigger_enable_disable(struct ftrace_event_file *file,
+ int trigger_enable)
+{
+ int ret = 0;
+
+ if (trigger_enable) {
+ if (atomic_inc_return(&file->tm_ref) > 1)
+ return ret;
+ set_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
+ ret = trace_event_enable_disable(file, 1, 1);
+ } else {
+ if (atomic_dec_return(&file->tm_ref) > 0)
+ return ret;
+ clear_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
+ ret = trace_event_enable_disable(file, 0, 1);
+ }
+
+ return ret;
+}
+
+/**
+ * clear_event_triggers - clear all triggers associated with a trace array.
+ *
+ * For each trigger, the triggering event has its tm_ref decremented
+ * via trace_event_trigger_enable_disable(), and any associated event
+ * (in the case of enable/disable_event triggers) will have its sm_ref
+ * decremented via free()->trace_event_enable_disable(). That
+ * combination effectively reverses the soft-mode/trigger state added
+ * by trigger registration.
+ *
+ * Must be called with event_mutex held.
+ */
+void
+clear_event_triggers(struct trace_array *tr)
+{
+ struct ftrace_event_file *file;
+
+ list_for_each_entry(file, &tr->events, list) {
+ struct event_trigger_data *data;
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ trace_event_trigger_enable_disable(file, 0);
+ if (data->ops->free)
+ data->ops->free(data->ops, (void **)&data);
+ }
+ }
+}
+
+__init int register_trigger_cmds(void)
+{
+ return 0;
+}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 230cdb6..4f56d54 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -321,6 +321,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (!ftrace_file)
return;

+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
+ event_triggers_call(ftrace_file);
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;

@@ -370,6 +372,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
if (!ftrace_file)
return;

+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
+ event_triggers_call(ftrace_file);
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;

--
1.7.11.4

2013-08-27 19:40:50

by Tom Zanussi

Subject: [PATCH v7 08/10] tracing: update event filters for multibuffer

The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

# echo 'count > 65536' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
count > 65536
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'count > 4096' > /sys/kernel/debug/tracing/instances/test1/events/syscalls/sys_enter_read/filter
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
count > 4096

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

# echo 'count > 65536' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
count > 65536
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'count > 4096' > /sys/kernel/debug/tracing/instances/test1/events/syscalls/sys_enter_read/filter
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/filter
count > 65536

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple of cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters. For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 34 ++++++-
include/trace/ftrace.h | 6 +-
kernel/trace/trace.c | 18 ++--
kernel/trace/trace.h | 10 +--
kernel/trace/trace_branch.c | 2 +-
kernel/trace/trace_events.c | 26 +++---
kernel/trace/trace_events_filter.c | 168 +++++++++++++++++++++++++++--------
kernel/trace/trace_export.c | 2 +-
kernel/trace/trace_functions_graph.c | 4 +-
kernel/trace/trace_kprobe.c | 4 +-
kernel/trace/trace_mmiotrace.c | 4 +-
kernel/trace/trace_sched_switch.c | 4 +-
kernel/trace/trace_syscalls.c | 8 +-
kernel/trace/trace_uprobe.c | 3 +-
14 files changed, 201 insertions(+), 92 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cccfad3..8c2e842 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -202,6 +202,7 @@ enum {
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
TRACE_EVENT_FL_WAS_ENABLED_BIT,
+ TRACE_EVENT_FL_USE_CALL_FILTER_BIT,
};

/*
@@ -213,6 +214,7 @@ enum {
* WAS_ENABLED - Set and stays set when an event was ever enabled
* (used for module unloading, if a module event is enabled,
* it is best to clear the buffers that used it).
+ * USE_CALL_FILTER - For ftrace internal events, don't use file filter
*/
enum {
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -220,6 +222,7 @@ enum {
TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
TRACE_EVENT_FL_WAS_ENABLED = (1 << TRACE_EVENT_FL_WAS_ENABLED_BIT),
+ TRACE_EVENT_FL_USE_CALL_FILTER = (1 << TRACE_EVENT_FL_USE_CALL_FILTER_BIT),
};

struct ftrace_event_call {
@@ -238,6 +241,7 @@ struct ftrace_event_call {
* bit 2: failed to apply filter
* bit 3: ftrace internal event (do not enable)
* bit 4: Event was enabled by module
+ * bit 5: use call filter rather than file filter
*/
int flags; /* static flags of different events */

@@ -253,6 +257,8 @@ struct ftrace_subsystem_dir;
enum {
FTRACE_EVENT_FL_ENABLED_BIT,
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
+ FTRACE_EVENT_FL_FILTERED_BIT,
+ FTRACE_EVENT_FL_NO_SET_FILTER_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
@@ -262,6 +268,8 @@ enum {
* Ftrace event file flags:
* ENABLED - The event is enabled
* RECORDED_CMD - The comms should be recorded at sched_switch
+ * FILTERED - The event has a filter attached
+ * NO_SET_FILTER - Set when filter has error and is to be ignored
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
* SOFT_DISABLED - When set, do not trace the event (even though its
* tracepoint may be enabled)
@@ -270,6 +278,8 @@ enum {
enum {
FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
+ FTRACE_EVENT_FL_FILTERED = (1 << FTRACE_EVENT_FL_FILTERED_BIT),
+ FTRACE_EVENT_FL_NO_SET_FILTER = (1 << FTRACE_EVENT_FL_NO_SET_FILTER_BIT),
FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
FTRACE_EVENT_FL_TRIGGER_MODE = (1 << FTRACE_EVENT_FL_TRIGGER_MODE_BIT),
@@ -278,6 +288,7 @@ enum {
struct ftrace_event_file {
struct list_head list;
struct ftrace_event_call *event_call;
+ struct event_filter *filter;
struct dentry *dir;
struct trace_array *tr;
struct ftrace_subsystem_dir *system;
@@ -288,8 +299,10 @@ struct ftrace_event_file {
* bit 0: enabled
* bit 1: enabled cmd record
* bit 2: enable/disable with the soft disable bit
- * bit 3: soft disabled
- * bit 4: trigger enabled
+ * bit 3: filter_active
+ * bit 4: failed to apply filter
+ * bit 5: soft disabled
+ * bit 6: trigger enabled
*
* Note: The bits must be set atomically to prevent races
* from other writers. Reads of flags do not need to be in
@@ -324,7 +337,8 @@ enum event_trigger_type {
ETT_EVENT_ENABLE = (1 << 3),
};

-extern void destroy_preds(struct ftrace_event_call *call);
+extern void destroy_preds(struct ftrace_event_file *file);
+extern void destroy_call_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct event_filter *filter, void *rec);

extern int filter_current_check_discard(struct ring_buffer *buffer,
@@ -336,6 +350,20 @@ extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *fil
extern void event_triggers_post_call(struct ftrace_event_file *file,
enum event_trigger_type tt);

+static inline int
+filter_check_discard(struct ftrace_event_file *file, void *rec,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event)
+{
+ if (unlikely(file->flags & FTRACE_EVENT_FL_FILTERED) &&
+ !filter_match_preds(file->filter, rec)) {
+ ring_buffer_discard_commit(buffer, event);
+ return 1;
+ }
+
+ return 0;
+}
+
enum {
FILTER_OTHER = 0,
FILTER_STATIC_STRING,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 6c701c3..0de03fd 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -446,8 +446,7 @@ static inline notrace int ftrace_get_offsets_##call( \
* if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
* &ftrace_file->flags))
* ring_buffer_discard_commit(buffer, event);
- * else if (!filter_current_check_discard(buffer, event_call,
- * entry, event))
+ * else if (!filter_check_discard(ftrace_file, entry, buffer, event))
* trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
*
* if (__tt)
@@ -568,8 +567,7 @@ ftrace_raw_event_##call(void *__data, proto) \
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
&ftrace_file->flags)) \
ring_buffer_discard_commit(buffer, event); \
- else if (!filter_current_check_discard(buffer, event_call, \
- entry, event)) \
+ else if (!filter_check_discard(ftrace_file, entry, buffer, event)) \
trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
\
if (__tt) \
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5a61dbe..2aabd34 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -235,14 +235,6 @@ void trace_array_put(struct trace_array *this_tr)
mutex_unlock(&trace_types_lock);
}

-int filter_current_check_discard(struct ring_buffer *buffer,
- struct ftrace_event_call *call, void *rec,
- struct ring_buffer_event *event)
-{
- return filter_check_discard(call, rec, buffer, event);
-}
-EXPORT_SYMBOL_GPL(filter_current_check_discard);
-
cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu)
{
u64 ts;
@@ -1630,7 +1622,7 @@ trace_function(struct trace_array *tr,
entry->ip = ip;
entry->parent_ip = parent_ip;

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);
}

@@ -1714,7 +1706,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,

entry->size = trace.nr_entries;

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);

out:
@@ -1816,7 +1808,7 @@ ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
trace.entries = entry->caller;

save_stack_trace_user(&trace);
- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);

out_drop_count:
@@ -2008,7 +2000,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
entry->fmt = fmt;

memcpy(entry->buf, tbuffer, sizeof(u32) * len);
- if (!filter_check_discard(call, entry, buffer, event)) {
+ if (!call_filter_check_discard(call, entry, buffer, event)) {
__buffer_unlock_commit(buffer, event);
ftrace_trace_stack(buffer, flags, 6, pc);
}
@@ -2063,7 +2055,7 @@ __trace_array_vprintk(struct ring_buffer *buffer,

memcpy(&entry->buf, tbuffer, len);
entry->buf[len] = '\0';
- if (!filter_check_discard(call, entry, buffer, event)) {
+ if (!call_filter_check_discard(call, entry, buffer, event)) {
__buffer_unlock_commit(buffer, event);
ftrace_trace_stack(buffer, flags, 6, pc);
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index af5f3b6..a588ca8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -985,9 +985,9 @@ struct filter_pred {

extern enum regex_type
filter_parse_regex(char *buff, int len, char **search, int *not);
-extern void print_event_filter(struct ftrace_event_call *call,
+extern void print_event_filter(struct ftrace_event_file *file,
struct trace_seq *s);
-extern int apply_event_filter(struct ftrace_event_call *call,
+extern int apply_event_filter(struct ftrace_event_file *file,
char *filter_string);
extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
char *filter_string);
@@ -1003,9 +1003,9 @@ struct ftrace_event_field *
trace_find_event_field(struct ftrace_event_call *call, char *name);

static inline int
-filter_check_discard(struct ftrace_event_call *call, void *rec,
- struct ring_buffer *buffer,
- struct ring_buffer_event *event)
+call_filter_check_discard(struct ftrace_event_call *call, void *rec,
+ struct ring_buffer *buffer,
+ struct ring_buffer_event *event)
{
if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) &&
!filter_match_preds(call->filter, rec)) {
diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
index d594da0..697fb9b 100644
--- a/kernel/trace/trace_branch.c
+++ b/kernel/trace/trace_branch.c
@@ -78,7 +78,7 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
entry->line = f->line;
entry->correct = val == expect;

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);

out:
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 25b2c86..7dacbd1 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,7 +990,7 @@ static ssize_t
event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_call *call;
+ struct ftrace_event_file *file;
struct trace_seq *s;
int r = -ENODEV;

@@ -1005,12 +1005,12 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
trace_seq_init(s);

mutex_lock(&event_mutex);
- call = event_file_data(filp);
- if (call)
- print_event_filter(call, s);
+ file = event_file_data(filp);
+ if (file)
+ print_event_filter(file, s);
mutex_unlock(&event_mutex);

- if (call)
+ if (file)
r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);

kfree(s);
@@ -1022,7 +1022,7 @@ static ssize_t
event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
loff_t *ppos)
{
- struct ftrace_event_call *call;
+ struct ftrace_event_file *file;
char *buf;
int err = -ENODEV;

@@ -1040,9 +1040,9 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
buf[cnt] = '\0';

mutex_lock(&event_mutex);
- call = event_file_data(filp);
- if (call)
- err = apply_event_filter(call, buf);
+ file = event_file_data(filp);
+ if (file)
+ err = apply_event_filter(file, buf);
mutex_unlock(&event_mutex);

free_page((unsigned long) buf);
@@ -1540,7 +1540,7 @@ event_create_dir(struct dentry *parent, struct ftrace_event_file *file)
return -1;
}
}
- trace_create_file("filter", 0644, file->dir, call,
+ trace_create_file("filter", 0644, file->dir, file,
&ftrace_event_filter_fops);

trace_create_file("trigger", 0644, file->dir, file,
@@ -1581,6 +1581,10 @@ static void event_remove(struct ftrace_event_call *call)
if (file->event_call != call)
continue;
ftrace_event_enable_disable(file, 0);
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
+ destroy_call_preds(call);
+ else
+ destroy_preds(file);
/*
* The do_for_each_event_file() is
* a double loop. After finding the call for this
@@ -1706,7 +1710,7 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
{
event_remove(call);
trace_destroy_fields(call);
- destroy_preds(call);
+ destroy_call_preds(call);
}

static int probe_remove_event_call(struct ftrace_event_call *call)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 0c45aa1..af55a84 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -638,9 +638,14 @@ static void append_filter_err(struct filter_parse_state *ps,
}

/* caller must hold event_mutex */
-void print_event_filter(struct ftrace_event_call *call, struct trace_seq *s)
+void print_event_filter(struct ftrace_event_file *file, struct trace_seq *s)
{
- struct event_filter *filter = call->filter;
+ struct event_filter *filter;
+
+ if (file->event_call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
+ filter = file->event_call->filter;
+ else
+ filter = file->filter;

if (filter && filter->filter_string)
trace_seq_printf(s, "%s\n", filter->filter_string);
@@ -766,7 +771,12 @@ static void __free_preds(struct event_filter *filter)
filter->n_preds = 0;
}

-static void filter_disable(struct ftrace_event_call *call)
+static void filter_disable(struct ftrace_event_file *file)
+{
+ file->flags &= ~FTRACE_EVENT_FL_FILTERED;
+}
+
+static void call_filter_disable(struct ftrace_event_call *call)
{
call->flags &= ~TRACE_EVENT_FL_FILTERED;
}
@@ -787,12 +797,24 @@ void free_event_filter(struct event_filter *filter)
}

/*
+ * Called when destroying the ftrace_event_file.
+ * The file is being freed, so we do not need to worry about
+ * the file being currently used. This is for module code removing
+ * the tracepoints from within it.
+ */
+void destroy_preds(struct ftrace_event_file *file)
+{
+ __free_filter(file->filter);
+ file->filter = NULL;
+}
+
+/*
* Called when destroying the ftrace_event_call.
* The call is being freed, so we do not need to worry about
* the call being currently used. This is for module code removing
* the tracepoints from within it.
*/
-void destroy_preds(struct ftrace_event_call *call)
+void destroy_call_preds(struct ftrace_event_call *call)
{
__free_filter(call->filter);
call->filter = NULL;
@@ -830,28 +852,44 @@ static int __alloc_preds(struct event_filter *filter, int n_preds)
return 0;
}

-static void filter_free_subsystem_preds(struct event_subsystem *system)
+static void filter_free_subsystem_preds(struct event_subsystem *system,
+ struct trace_array *tr)
{
+ struct ftrace_event_file *file;
struct ftrace_event_call *call;

- list_for_each_entry(call, &ftrace_events, list) {
+ list_for_each_entry(file, &tr->events, list) {
+ call = file->event_call;
if (strcmp(call->class->system, system->name) != 0)
continue;

- filter_disable(call);
- remove_filter_string(call->filter);
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
+ call_filter_disable(call);
+ remove_filter_string(call->filter);
+ } else {
+ filter_disable(file);
+ remove_filter_string(file->filter);
+ }
}
}

-static void filter_free_subsystem_filters(struct event_subsystem *system)
+static void filter_free_subsystem_filters(struct event_subsystem *system,
+ struct trace_array *tr)
{
+ struct ftrace_event_file *file;
struct ftrace_event_call *call;

- list_for_each_entry(call, &ftrace_events, list) {
+ list_for_each_entry(file, &tr->events, list) {
+ call = file->event_call;
if (strcmp(call->class->system, system->name) != 0)
continue;
- __free_filter(call->filter);
- call->filter = NULL;
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
+ __free_filter(call->filter);
+ call->filter = NULL;
+ } else {
+ __free_filter(file->filter);
+ file->filter = NULL;
+ }
}
}

@@ -1628,9 +1666,11 @@ struct filter_list {
};

static int replace_system_preds(struct event_subsystem *system,
+ struct trace_array *tr,
struct filter_parse_state *ps,
char *filter_string)
{
+ struct ftrace_event_file *file;
struct ftrace_event_call *call;
struct filter_list *filter_item;
struct filter_list *tmp;
@@ -1638,8 +1678,8 @@ static int replace_system_preds(struct event_subsystem *system,
bool fail = true;
int err;

- list_for_each_entry(call, &ftrace_events, list) {
-
+ list_for_each_entry(file, &tr->events, list) {
+ call = file->event_call;
if (strcmp(call->class->system, system->name) != 0)
continue;

@@ -1648,21 +1688,34 @@ static int replace_system_preds(struct event_subsystem *system,
* (filter arg is ignored on dry_run)
*/
err = replace_preds(call, NULL, ps, filter_string, true);
- if (err)
- call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
- else
- call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
+ if (err)
+ call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
+ else
+ call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
+ } else {
+ if (err)
+ file->flags |= FTRACE_EVENT_FL_NO_SET_FILTER;
+ else
+ file->flags &= ~FTRACE_EVENT_FL_NO_SET_FILTER;
+ }
}

- list_for_each_entry(call, &ftrace_events, list) {
+ list_for_each_entry(file, &tr->events, list) {
struct event_filter *filter;

+ call = file->event_call;
+
if (strcmp(call->class->system, system->name) != 0)
continue;

- if (call->flags & TRACE_EVENT_FL_NO_SET_FILTER)
+ if (file->flags & FTRACE_EVENT_FL_NO_SET_FILTER)
continue;

+ if ((call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) &&
+ (call->flags & TRACE_EVENT_FL_NO_SET_FILTER))
+ continue;
+
filter_item = kzalloc(sizeof(*filter_item), GFP_KERNEL);
if (!filter_item)
goto fail_mem;
@@ -1681,17 +1734,29 @@ static int replace_system_preds(struct event_subsystem *system,

err = replace_preds(call, filter, ps, filter_string, false);
if (err) {
- filter_disable(call);
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
+ call_filter_disable(call);
+ else
+ filter_disable(file);
parse_error(ps, FILT_ERR_BAD_SUBSYS_FILTER, 0);
append_filter_err(ps, filter);
- } else
- call->flags |= TRACE_EVENT_FL_FILTERED;
+ } else {
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
+ call->flags |= TRACE_EVENT_FL_FILTERED;
+ else
+ file->flags |= FTRACE_EVENT_FL_FILTERED;
+ }
/*
* Regardless of if this returned an error, we still
* replace the filter for the call.
*/
- filter = call->filter;
- rcu_assign_pointer(call->filter, filter_item->filter);
+ if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
+ filter = call->filter;
+ rcu_assign_pointer(call->filter, filter_item->filter);
+ } else {
+ filter = file->filter;
+ rcu_assign_pointer(file->filter, filter_item->filter);
+ }
filter_item->filter = filter;

fail = false;
@@ -1829,6 +1894,7 @@ int create_event_filter(struct ftrace_event_call *call,
* and always remembers @filter_str.
*/
static int create_system_filter(struct event_subsystem *system,
+ struct trace_array *tr,
char *filter_str, struct event_filter **filterp)
{
struct event_filter *filter = NULL;
@@ -1837,7 +1903,7 @@ static int create_system_filter(struct event_subsystem *system,

err = create_filter_start(filter_str, true, &ps, &filter);
if (!err) {
- err = replace_system_preds(system, ps, filter_str);
+ err = replace_system_preds(system, tr, ps, filter_str);
if (!err) {
/* System filters just show a default message */
kfree(filter->filter_string);
@@ -1853,17 +1919,29 @@ static int create_system_filter(struct event_subsystem *system,
}

/* caller must hold event_mutex */
-int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
+int apply_event_filter(struct ftrace_event_file *file, char *filter_string)
{
+ struct ftrace_event_call *call = file->event_call;
struct event_filter *filter;
+ bool use_call_filter;
int err;

+ use_call_filter = call->flags & TRACE_EVENT_FL_USE_CALL_FILTER;
+
if (!strcmp(strstrip(filter_string), "0")) {
- filter_disable(call);
- filter = call->filter;
+ if (use_call_filter) {
+ call_filter_disable(call);
+ filter = call->filter;
+ } else {
+ filter_disable(file);
+ filter = file->filter;
+ }
if (!filter)
return 0;
- RCU_INIT_POINTER(call->filter, NULL);
+ if (use_call_filter)
+ RCU_INIT_POINTER(call->filter, NULL);
+ else
+ RCU_INIT_POINTER(file->filter, NULL);
/* Make sure the filter is not being used */
synchronize_sched();
__free_filter(filter);
@@ -1879,14 +1957,25 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
* string
*/
if (filter) {
- struct event_filter *tmp = call->filter;
+ struct event_filter *tmp;

- if (!err)
- call->flags |= TRACE_EVENT_FL_FILTERED;
- else
- filter_disable(call);
+ if (use_call_filter) {
+ tmp = call->filter;
+ if (!err)
+ call->flags |= TRACE_EVENT_FL_FILTERED;
+ else
+ call_filter_disable(call);
+
+ rcu_assign_pointer(call->filter, filter);
+ } else {
+ tmp = file->filter;
+ if (!err)
+ file->flags |= FTRACE_EVENT_FL_FILTERED;
+ else
+ filter_disable(file);

- rcu_assign_pointer(call->filter, filter);
+ rcu_assign_pointer(file->filter, filter);
+ }

if (tmp) {
/* Make sure the call is done with the filter */
@@ -1902,6 +1991,7 @@ int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
char *filter_string)
{
struct event_subsystem *system = dir->subsystem;
+ struct trace_array *tr = dir->tr;
struct event_filter *filter;
int err = 0;

@@ -1914,18 +2004,18 @@ int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
}

if (!strcmp(strstrip(filter_string), "0")) {
- filter_free_subsystem_preds(system);
+ filter_free_subsystem_preds(system, tr);
remove_filter_string(system->filter);
filter = system->filter;
system->filter = NULL;
/* Ensure all filters are no longer used */
synchronize_sched();
- filter_free_subsystem_filters(system);
+ filter_free_subsystem_filters(system, tr);
__free_filter(filter);
goto out_unlock;
}

- err = create_system_filter(system, filter_string, &filter);
+ err = create_system_filter(system, tr, filter_string, &filter);
if (filter) {
/*
* No event actually uses the system filter
diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
index d21a746..7c3e3e7 100644
--- a/kernel/trace/trace_export.c
+++ b/kernel/trace/trace_export.c
@@ -180,7 +180,7 @@ struct ftrace_event_call __used event_##call = { \
.event.type = etype, \
.class = &event_class_ftrace_##call, \
.print_fmt = print, \
- .flags = TRACE_EVENT_FL_IGNORE_ENABLE, \
+ .flags = TRACE_EVENT_FL_IGNORE_ENABLE | TRACE_EVENT_FL_USE_CALL_FILTER, \
}; \
struct ftrace_event_call __used \
__attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index b5c0924..7d2fcd7 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -230,7 +230,7 @@ int __trace_graph_entry(struct trace_array *tr,
return 0;
entry = ring_buffer_event_data(event);
entry->graph_ent = *trace;
- if (!filter_current_check_discard(buffer, call, entry, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);

return 1;
@@ -335,7 +335,7 @@ void __trace_graph_return(struct trace_array *tr,
return;
entry = ring_buffer_event_data(event);
entry->ret = *trace;
- if (!filter_current_check_discard(buffer, call, entry, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
__buffer_unlock_commit(buffer, event);
}

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 243f683..dae9541 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -835,7 +835,7 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
entry->ip = (unsigned long)tp->rp.kp.addr;
store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);

- if (!filter_current_check_discard(buffer, call, entry, event))
+ if (!filter_check_discard(ftrace_file, entry, buffer, event))
trace_buffer_unlock_commit_regs(buffer, event,
irq_flags, pc, regs);
}
@@ -884,7 +884,7 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
entry->ret_ip = (unsigned long)ri->ret_addr;
store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);

- if (!filter_current_check_discard(buffer, call, entry, event))
+ if (!filter_check_discard(ftrace_file, entry, buffer, event))
trace_buffer_unlock_commit_regs(buffer, event,
irq_flags, pc, regs);
}
diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
index b3dcfb2..0abd9b8 100644
--- a/kernel/trace/trace_mmiotrace.c
+++ b/kernel/trace/trace_mmiotrace.c
@@ -323,7 +323,7 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
entry = ring_buffer_event_data(event);
entry->rw = *rw;

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, pc);
}

@@ -353,7 +353,7 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
entry = ring_buffer_event_data(event);
entry->map = *map;

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, pc);
}

diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 4e98e3b..3f34dc9 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -45,7 +45,7 @@ tracing_sched_switch_trace(struct trace_array *tr,
entry->next_state = next->state;
entry->next_cpu = task_cpu(next);

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, flags, pc);
}

@@ -101,7 +101,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
entry->next_state = wakee->state;
entry->next_cpu = task_cpu(wakee);

- if (!filter_check_discard(call, entry, buffer, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, flags, pc);
}

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 84cdbce..655bcf8 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -326,11 +326,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
(FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
FTRACE_EVENT_FL_SOFT_DISABLED)
return;
-
sys_data = syscall_nr_to_meta(syscall_nr);
if (!sys_data)
return;
-
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;

local_save_flags(irq_flags);
@@ -351,8 +349,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)

if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
ring_buffer_discard_commit(buffer, event);
- else if (!filter_current_check_discard(buffer, sys_data->enter_event,
- entry, event))
+ else if (!filter_check_discard(ftrace_file, entry, buffer, event))
trace_current_buffer_unlock_commit(buffer, event,
irq_flags, pc);
if (__tt)
@@ -409,8 +406,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)

if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
ring_buffer_discard_commit(buffer, event);
- else if (!filter_current_check_discard(buffer, sys_data->exit_event,
- entry, event))
+ else if (!filter_check_discard(ftrace_file, entry, buffer, event))
trace_current_buffer_unlock_commit(buffer, event,
irq_flags, pc);
if (__tt)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 272261b..b6dcc42 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -128,6 +128,7 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
if (is_ret)
tu->consumer.ret_handler = uretprobe_dispatcher;
init_trace_uprobe_filter(&tu->filter);
+ tu->call.flags |= TRACE_EVENT_FL_USE_CALL_FILTER;
return tu;

error:
@@ -561,7 +562,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
for (i = 0; i < tu->nr_args; i++)
call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);

- if (!filter_current_check_discard(buffer, call, entry, event))
+ if (!call_filter_check_discard(call, entry, buffer, event))
trace_buffer_unlock_commit(buffer, event, 0, 0);
}

--
1.7.11.4

2013-08-27 19:40:59

by Tom Zanussi

Subject: [PATCH v7 10/10] tracing: make register/unregister_ftrace_command __init

register/unregister_ftrace_command() are only ever called from __init
functions, so they can themselves be made __init.

Also make register_snapshot_cmd() __init for the same reason.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace.h | 4 ++--
kernel/trace/ftrace.c | 12 ++++++++++--
kernel/trace/trace.c | 4 ++--
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9f15c00..6062491 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -533,11 +533,11 @@ static inline int ftrace_force_update(void) { return 0; }
static inline void ftrace_disable_daemon(void) { }
static inline void ftrace_enable_daemon(void) { }
static inline void ftrace_release_mod(struct module *mod) {}
-static inline int register_ftrace_command(struct ftrace_func_command *cmd)
+static inline __init int register_ftrace_command(struct ftrace_func_command *cmd)
{
return -EINVAL;
}
-static inline int unregister_ftrace_command(char *cmd_name)
+static inline __init int unregister_ftrace_command(char *cmd_name)
{
return -EINVAL;
}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index a6d098c..64f7f39 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3292,7 +3292,11 @@ void unregister_ftrace_function_probe_all(char *glob)
static LIST_HEAD(ftrace_commands);
static DEFINE_MUTEX(ftrace_cmd_mutex);

-int register_ftrace_command(struct ftrace_func_command *cmd)
+/*
+ * Currently we only register ftrace commands from __init, so mark this
+ * __init too.
+ */
+__init int register_ftrace_command(struct ftrace_func_command *cmd)
{
struct ftrace_func_command *p;
int ret = 0;
@@ -3311,7 +3315,11 @@ int register_ftrace_command(struct ftrace_func_command *cmd)
return ret;
}

-int unregister_ftrace_command(struct ftrace_func_command *cmd)
+/*
+ * Currently we only unregister ftrace commands from __init, so mark
+ * this __init too.
+ */
+__init int unregister_ftrace_command(struct ftrace_func_command *cmd)
{
struct ftrace_func_command *p, *n;
int ret = -ENODEV;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2aabd34..4222c6a 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5458,12 +5458,12 @@ static struct ftrace_func_command ftrace_snapshot_cmd = {
.func = ftrace_trace_snapshot_callback,
};

-static int register_snapshot_cmd(void)
+static __init int register_snapshot_cmd(void)
{
return register_ftrace_command(&ftrace_snapshot_cmd);
}
#else
-static inline int register_snapshot_cmd(void) { return 0; }
+static inline __init int register_snapshot_cmd(void) { return 0; }
#endif /* defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE) */

struct dentry *tracing_init_dentry_tr(struct trace_array *tr)
--
1.7.11.4

2013-08-27 19:40:41

by Tom Zanussi

Subject: [PATCH v7 07/10] tracing: add and use generic set_trigger_filter() implementation

Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.

Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:

echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger

The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.

As another example, to add a filter to a stacktrace command:

echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger

The above command will only trigger a stacktrace if the common_pid
field in the event is 999.

The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.

Because triggers can now use filters, the trigger-invoking logic needs
to be moved - for ftrace_raw_event_calls, trigger invocation now needs
to happen after the { assign; } part of the call.

Also, because triggers need to be invoked even for soft-disabled
events, the SOFT_DISABLED check and return needs to be moved from the
top of the call to a point following the trigger check, which means
that soft-disabled events actually get discarded instead of simply
skipped. There's still a SOFT_DISABLED-only check at the top of the
function, so when an event is soft disabled but not because of the
presence of a trigger, the original SOFT_DISABLED behavior remains
unchanged.

There's also a bit of trickiness in that some triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking those
triggers. To do that, we split the trigger invocation in two - the
first part (event_triggers_call()) checks the filter using the current
trace record; if a command has the post_trigger flag set, it sets a
bit for itself in the return value, otherwise it directly invokes the
trigger. Once all commands have been either invoked or set their
return flag, event_triggers_call() returns. The current record is
then either committed or discarded; if any commands have deferred
their triggers, those commands are finally invoked following the close
of the current event by event_triggers_post_call().

The syscall event invocation code is also changed in analogous ways.

Because event triggers need to be able to create and free filters,
this also adds a couple external wrappers for the existing
create_filter and free_filter functions, which are too generic to be
made extern functions themselves.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 6 ++-
include/trace/ftrace.h | 45 +++++++++++-----
kernel/trace/trace.h | 4 ++
kernel/trace/trace_events_filter.c | 13 +++++
kernel/trace/trace_events_trigger.c | 101 ++++++++++++++++++++++++++++++++++--
kernel/trace/trace_syscalls.c | 36 +++++++++----
6 files changed, 179 insertions(+), 26 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index cac0954..cccfad3 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -326,11 +326,15 @@ enum event_trigger_type {

extern void destroy_preds(struct ftrace_event_call *call);
extern int filter_match_preds(struct event_filter *filter, void *rec);
+
extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
-extern void event_triggers_call(struct ftrace_event_file *file);
+extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *file,
+ void *rec);
+extern void event_triggers_post_call(struct ftrace_event_file *file,
+ enum event_trigger_type tt);

enum {
FILTER_OTHER = 0,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 326ba32..6c701c3 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -412,13 +412,15 @@ static inline notrace int ftrace_get_offsets_##call( \
* struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
* struct ring_buffer_event *event;
* struct ftrace_raw_<call> *entry; <-- defined in stage 1
+ * enum event_trigger_type __tt = ETT_NONE;
* struct ring_buffer *buffer;
* unsigned long irq_flags;
* int __data_size;
* int pc;
*
- * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
- * &ftrace_file->flags))
+ * if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED |
+ * FTRACE_EVENT_FL_TRIGGER_MODE)) ==
+ * FTRACE_EVENT_FL_SOFT_DISABLED)
* return;
*
* local_save_flags(irq_flags);
@@ -437,9 +439,19 @@ static inline notrace int ftrace_get_offsets_##call( \
* { <assign>; } <-- Here we assign the entries by the __field and
* __array macros.
*
- * if (!filter_current_check_discard(buffer, event_call, entry, event))
- * trace_nowake_buffer_unlock_commit(buffer,
- * event, irq_flags, pc);
+ * if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ * &ftrace_file->flags))
+ * __tt = event_triggers_call(ftrace_file, entry);
+ *
+ * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ * &ftrace_file->flags))
+ * ring_buffer_discard_commit(buffer, event);
+ * else if (!filter_current_check_discard(buffer, event_call,
+ * entry, event))
+ * trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
+ *
+ * if (__tt)
+ * event_triggers_post_call(ftrace_file, __tt);
* }
*
* static struct trace_event ftrace_event_type_<call> = {
@@ -521,17 +533,15 @@ ftrace_raw_event_##call(void *__data, proto) \
struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
struct ring_buffer_event *event; \
struct ftrace_raw_##call *entry; \
+ enum event_trigger_type __tt = ETT_NONE; \
struct ring_buffer *buffer; \
unsigned long irq_flags; \
int __data_size; \
int pc; \
\
- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
- &ftrace_file->flags)) \
- event_triggers_call(ftrace_file); \
- \
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
- &ftrace_file->flags)) \
+ if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED | \
+ FTRACE_EVENT_FL_TRIGGER_MODE)) == \
+ FTRACE_EVENT_FL_SOFT_DISABLED) \
return; \
\
local_save_flags(irq_flags); \
@@ -551,8 +561,19 @@ ftrace_raw_event_##call(void *__data, proto) \
\
{ assign; } \
\
- if (!filter_current_check_discard(buffer, event_call, entry, event)) \
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
+ &ftrace_file->flags)) \
+ __tt = event_triggers_call(ftrace_file, entry); \
+ \
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
+ &ftrace_file->flags)) \
+ ring_buffer_discard_commit(buffer, event); \
+ else if (!filter_current_check_discard(buffer, event_call, \
+ entry, event)) \
trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
+ \
+ if (__tt) \
+ event_triggers_post_call(ftrace_file, __tt); \
}
/*
* The ftrace_test_probe is compiled out, it is only here as a build time check
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3cb846e..af5f3b6 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -994,6 +994,10 @@ extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
extern void print_subsystem_event_filter(struct event_subsystem *system,
struct trace_seq *s);
extern int filter_assign_type(const char *type);
+extern int create_event_filter(struct ftrace_event_call *call,
+ char *filter_str, bool set_str,
+ struct event_filter **filterp);
+extern void free_event_filter(struct event_filter *filter);

struct ftrace_event_field *
trace_find_event_field(struct ftrace_event_call *call, char *name);
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 97daa8c..0c45aa1 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -781,6 +781,11 @@ static void __free_filter(struct event_filter *filter)
kfree(filter);
}

+void free_event_filter(struct event_filter *filter)
+{
+ __free_filter(filter);
+}
+
/*
* Called when destroying the ftrace_event_call.
* The call is being freed, so we do not need to worry about
@@ -1806,6 +1811,14 @@ static int create_filter(struct ftrace_event_call *call,
return err;
}

+int create_event_filter(struct ftrace_event_call *call,
+ char *filter_str, bool set_str,
+ struct event_filter **filterp)
+{
+ return create_filter(call, filter_str, set_str, filterp);
+}
+
+
/**
* create_system_filter - create a filter for an event_subsystem
* @system: event_subsystem to create a filter for
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 54678e2..b5e7ca7 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -43,24 +43,53 @@ struct event_trigger_data {
static void
trigger_data_free(struct event_trigger_data *data)
{
+ if (data->cmd_ops->set_filter)
+ data->cmd_ops->set_filter(NULL, data, NULL);
+
synchronize_sched(); /* make sure current triggers exit before free */
kfree(data);
}

-void event_triggers_call(struct ftrace_event_file *file)
+enum event_trigger_type
+event_triggers_call(struct ftrace_event_file *file, void *rec)
{
struct event_trigger_data *data;
+ enum event_trigger_type tt = ETT_NONE;

if (list_empty(&file->triggers))
- return;
+ return tt;

preempt_disable_notrace();
- list_for_each_entry_rcu(data, &file->triggers, list)
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ if (data->filter && !filter_match_preds(data->filter, rec))
+ continue;
+ if (data->cmd_ops->post_trigger) {
+ tt |= data->cmd_ops->trigger_type;
+ continue;
+ }
data->ops->func((void **)&data);
+ }
preempt_enable_notrace();
+
+ return tt;
}
EXPORT_SYMBOL_GPL(event_triggers_call);

+void
+event_triggers_post_call(struct ftrace_event_file *file,
+ enum event_trigger_type tt)
+{
+ struct event_trigger_data *data;
+
+ preempt_disable_notrace();
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ if (data->cmd_ops->trigger_type & tt)
+ data->ops->func((void **)&data);
+ }
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(event_triggers_post_call);
+
static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
{
struct ftrace_event_file *event_file = event_file_data(m->private);
@@ -561,6 +590,66 @@ event_trigger_callback(struct event_command *cmd_ops,
goto out;
}

+/**
+ * set_trigger_filter - generic event_command @set_filter
+ * implementation
+ *
+ * Common implementation for event command filter parsing and filter
+ * instantiation.
+ *
+ * Usually used directly as the @set_filter method in event command
+ * implementations.
+ *
+ * Also used to remove a filter (if filter_str = NULL).
+ */
+static int set_trigger_filter(char *filter_str, void *trigger_data,
+ struct ftrace_event_file *file)
+{
+ struct event_trigger_data *data = trigger_data;
+ struct event_filter *filter = NULL, *tmp;
+ int ret = -EINVAL;
+ char *s;
+
+ if (!filter_str) /* clear the current filter */
+ goto assign;
+
+ s = strsep(&filter_str, " \t");
+
+ if (!strlen(s) || strcmp(s, "if") != 0)
+ goto out;
+
+ if (!filter_str)
+ goto out;
+
+ /* The filter is for the 'trigger' event, not the triggered event */
+ ret = create_event_filter(file->event_call, filter_str, false, &filter);
+ if (ret)
+ goto out;
+ assign:
+ tmp = data->filter;
+
+ rcu_assign_pointer(data->filter, filter);
+
+ if (tmp) {
+ /* Make sure the call is done with the filter */
+ synchronize_sched();
+ free_event_filter(tmp);
+ }
+
+ kfree(data->filter_str);
+
+ if (filter_str) {
+ data->filter_str = kstrdup(filter_str, GFP_KERNEL);
+ if (!data->filter_str) {
+ free_event_filter(data->filter);
+ data->filter = NULL;
+ ret = -ENOMEM;
+ }
+ }
+ out:
+ return ret;
+}
+
static void
traceon_trigger(void **_data)
{
@@ -698,6 +787,7 @@ static struct event_command trigger_traceon_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

static struct event_command trigger_traceoff_cmd = {
@@ -707,6 +797,7 @@ static struct event_command trigger_traceoff_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

static void
@@ -788,6 +879,7 @@ static struct event_command trigger_snapshot_cmd = {
.reg = register_snapshot_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = snapshot_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

/*
@@ -867,6 +959,7 @@ static struct event_command trigger_stacktrace_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = stacktrace_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

static __init void unregister_trigger_traceon_traceoff_cmds(void)
@@ -1194,6 +1287,7 @@ static struct event_command trigger_enable_cmd = {
.reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

static struct event_command trigger_disable_cmd = {
@@ -1203,6 +1297,7 @@ static struct event_command trigger_disable_cmd = {
.reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};

static __init void unregister_trigger_enable_disable_cmds(void)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 4f56d54..84cdbce 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -306,6 +306,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
+ enum event_trigger_type __tt = ETT_NONE;
struct ring_buffer *buffer;
unsigned long irq_flags;
int pc;
@@ -321,9 +322,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (!ftrace_file)
return;

- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
- event_triggers_call(ftrace_file);
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
+ if ((ftrace_file->flags &
+ (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
+ FTRACE_EVENT_FL_SOFT_DISABLED)
return;

sys_data = syscall_nr_to_meta(syscall_nr);
@@ -345,10 +346,17 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
entry->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);

- if (!filter_current_check_discard(buffer, sys_data->enter_event,
- entry, event))
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
+ __tt = event_triggers_call(ftrace_file, entry);
+
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
+ ring_buffer_discard_commit(buffer, event);
+ else if (!filter_current_check_discard(buffer, sys_data->enter_event,
+ entry, event))
trace_current_buffer_unlock_commit(buffer, event,
irq_flags, pc);
+ if (__tt)
+ event_triggers_post_call(ftrace_file, __tt);
}

static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
@@ -358,6 +366,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
+ enum event_trigger_type __tt = ETT_NONE;
struct ring_buffer *buffer;
unsigned long irq_flags;
int pc;
@@ -372,9 +381,9 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
if (!ftrace_file)
return;

- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
- event_triggers_call(ftrace_file);
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
+ if ((ftrace_file->flags &
+ (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
+ FTRACE_EVENT_FL_SOFT_DISABLED)
return;

sys_data = syscall_nr_to_meta(syscall_nr);
@@ -395,10 +404,17 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
entry->nr = syscall_nr;
entry->ret = syscall_get_return_value(current, regs);

- if (!filter_current_check_discard(buffer, sys_data->exit_event,
- entry, event))
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
+ __tt = event_triggers_call(ftrace_file, entry);
+
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
+ ring_buffer_discard_commit(buffer, event);
+ else if (!filter_current_check_discard(buffer, sys_data->exit_event,
+ entry, event))
trace_current_buffer_unlock_commit(buffer, event,
irq_flags, pc);
+ if (__tt)
+ event_triggers_post_call(ftrace_file, __tt);
}

static int reg_event_syscall_enter(struct ftrace_event_file *file,
--
1.7.11.4

2013-08-27 19:40:53

by Tom Zanussi


Subject: [PATCH v7 06/10] tracing: add 'enable_event' and 'disable_event' event trigger commands

Add 'enable_event' and 'disable_event' event_command commands.

enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analogous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:

echo 'enable_event:system:event' > .../othersys/otherevent/trigger
echo 'disable_event:system:event' > .../othersys/otherevent/trigger

The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger

Where N is the number of times the command will be invoked.

The above commands will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.

This also makes the find_event_file() helper function extern, since
it's useful from other places, such as the event triggers code.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace.h | 4 +
kernel/trace/trace_events.c | 2 +-
kernel/trace/trace_events_trigger.c | 364 ++++++++++++++++++++++++++++++++++++
4 files changed, 370 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index df5eae9..cac0954 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -321,6 +321,7 @@ enum event_trigger_type {
ETT_TRACE_ONOFF = (1 << 0),
ETT_SNAPSHOT = (1 << 1),
ETT_STACKTRACE = (1 << 2),
+ ETT_EVENT_ENABLE = (1 << 3),
};

extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 11fb2ea..3cb846e 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1016,6 +1016,10 @@ extern void trace_event_enable_cmd_record(bool enable);
extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
extern int event_trace_del_tracer(struct trace_array *tr);

+extern struct ftrace_event_file *find_event_file(struct trace_array *tr,
+ const char *system,
+ const char *event);
+
static inline void *event_file_data(struct file *filp)
{
return ACCESS_ONCE(file_inode(filp)->i_private);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7d8eb8a..25b2c86 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1860,7 +1860,7 @@ struct event_probe_data {
bool enable;
};

-static struct ftrace_event_file *
+struct ftrace_event_file *
find_event_file(struct trace_array *tr, const char *system, const char *event)
{
struct ftrace_event_file *file;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index c83f56f..54678e2 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -879,6 +879,358 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
&trigger_cmd_mutex);
}

+/* Avoid typos */
+#define ENABLE_EVENT_STR "enable_event"
+#define DISABLE_EVENT_STR "disable_event"
+
+static void
+event_enable_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (data->enable)
+ clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+ else
+ set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+}
+
+static void
+event_enable_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ /* Skip if the event is in a state we want to switch to */
+ if (data->enable == !(data->file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ event_enable_trigger(_data);
+}
+
+static int
+event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ seq_printf(m, "%s:%s:%s",
+ data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR,
+ data->file->event_call->class->system,
+ data->file->event_call->name);
+
+ if (data->count == -1)
+ seq_puts(m, ":unlimited");
+ else
+ seq_printf(m, ":count=%ld", data->count);
+
+ if (data->filter_str)
+ seq_printf(m, " if %s\n", data->filter_str);
+ else
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static void
+event_enable_trigger_free(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref) {
+ /* Remove the SOFT_MODE flag */
+ trace_event_enable_disable(data->file, 0, 1);
+ module_put(data->file->event_call->mod);
+ trigger_data_free(data);
+ }
+}
+
+static struct event_trigger_ops event_enable_trigger_ops = {
+ .func = event_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_enable_count_trigger_ops = {
+ .func = event_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_disable_trigger_ops = {
+ .func = event_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_disable_count_trigger_ops = {
+ .func = event_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static int
+event_enable_trigger_func(struct event_command *cmd_ops,
+ struct ftrace_event_file *file,
+ char *glob, char *cmd, char *param, int enabled)
+{
+ struct ftrace_event_file *event_enable_file;
+ struct event_trigger_data *trigger_data;
+ struct event_trigger_ops *trigger_ops;
+ struct trace_array *tr = file->tr;
+ const char *system;
+ const char *event;
+ char *trigger;
+ char *number;
+ bool enable;
+ int ret;
+
+ if (!enabled)
+ return -EINVAL;
+
+ if (!param)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (s:e:n [if filter]) */
+ trigger = strsep(&param, " \t");
+ if (!trigger)
+ return -EINVAL;
+
+ system = strsep(&trigger, ":");
+ if (!trigger)
+ return -EINVAL;
+
+ event = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ event_enable_file = find_event_file(tr, system, event);
+ if (!event_enable_file)
+ goto out;
+
+ enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->enable = enable;
+ trigger_data->count = -1;
+ trigger_data->file = event_enable_file;
+ trigger_data->ops = trigger_ops;
+ trigger_data->cmd_ops = cmd_ops;
+ INIT_LIST_HEAD(&trigger_data->list);
+ RCU_INIT_POINTER(trigger_data->filter, NULL);
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ kfree(trigger_data);
+ ret = 0;
+ goto out;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (!strlen(number))
+ goto out_free;
+
+ /*
+ * We use the callback data field (which is a pointer)
+ * as our counter.
+ */
+ ret = kstrtoul(number, 0, &trigger_data->count);
+ if (ret)
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, file);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ /* Don't let event modules unload while probe registered */
+ ret = try_module_get(event_enable_file->event_call->mod);
+ if (!ret) {
+ ret = -EBUSY;
+ goto out_free;
+ }
+
+ ret = trace_event_enable_disable(event_enable_file, 1, 1);
+ if (ret < 0)
+ goto out_put;
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ ret = -ENOENT;
+ goto out_disable;
+ } else if (ret < 0)
+ goto out_disable;
+ /* Just return zero, not the number of enabled functions */
+ ret = 0;
+ out:
+ return ret;
+
+ out_disable:
+ trace_event_enable_disable(event_enable_file, 0, 1);
+ out_put:
+ module_put(event_enable_file->event_call->mod);
+ out_free:
+ kfree(trigger_data);
+ goto out;
+}
+
+static int event_enable_register_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ void *trigger_data,
+ struct ftrace_event_file *file)
+{
+ struct event_trigger_data *data = trigger_data;
+ struct event_trigger_data *test;
+ int ret = 0;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->file == data->file) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, (void **)&data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &file->triggers);
+ ret++;
+
+ if (trace_event_trigger_enable_disable(file, 1) < 0) {
+ list_del_rcu(&data->list);
+ ret--;
+ }
+out:
+ return ret;
+}
+
+static void event_enable_unregister_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ void *trigger_data,
+ struct ftrace_event_file *file)
+{
+ struct event_trigger_data *test = trigger_data;
+ struct event_trigger_data *data;
+ bool unregistered = false;
+
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ if (data->file == test->file) {
+ unregistered = true;
+ list_del_rcu(&data->list);
+ trace_event_trigger_enable_disable(file, 0);
+ break;
+ }
+ }
+
+ if (unregistered && data->ops->free)
+ data->ops->free(data->ops, (void **)&data);
+}
+
+static struct event_trigger_ops *
+event_enable_get_trigger_ops(char *cmd, char *param)
+{
+ struct event_trigger_ops *ops;
+ bool enable;
+
+ enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+
+ if (enable)
+ ops = param ? &event_enable_count_trigger_ops :
+ &event_enable_trigger_ops;
+ else
+ ops = param ? &event_disable_count_trigger_ops :
+ &event_disable_trigger_ops;
+
+ return ops;
+}
+
+static struct event_command trigger_enable_cmd = {
+ .name = ENABLE_EVENT_STR,
+ .trigger_type = ETT_EVENT_ENABLE,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .get_trigger_ops = event_enable_get_trigger_ops,
+};
+
+static struct event_command trigger_disable_cmd = {
+ .name = DISABLE_EVENT_STR,
+ .trigger_type = ETT_EVENT_ENABLE,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .get_trigger_ops = event_enable_get_trigger_ops,
+};
+
+static __init void unregister_trigger_enable_disable_cmds(void)
+{
+ unregister_event_command(&trigger_enable_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_disable_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+}
+
+static __init int register_trigger_enable_disable_cmds(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_enable_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ return ret;
+ ret = register_event_command(&trigger_disable_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ unregister_trigger_enable_disable_cmds();
+
+ return ret;
+}
+
static __init int register_trigger_traceon_traceoff_cmds(void)
{
int ret;
@@ -924,5 +1276,17 @@ __init int register_trigger_cmds(void)
return ret;
}

+ ret = register_trigger_enable_disable_cmds();
+ if (ret) {
+ unregister_trigger_traceon_traceoff_cmds();
+ unregister_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_stacktrace_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4

2013-08-27 19:40:48

by Tom Zanussi

Subject: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it and paves the way for future patches
allowing triggers to be added to syscall events, since triggers are
built on top of SOFT_DISABLE.

Add an array of ftrace_event_file pointers indexed by syscall number
to the trace array and remove the existing enabled bitmaps, which as a
result are now redundant. The ftrace_event_file structs in turn
contain the soft disable flags we need for per-syscall soft disable
accounting; later patches add additional 'trigger' flags and
per-syscall triggers and filters.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.h | 4 ++--
kernel/trace/trace_syscalls.c | 36 ++++++++++++++++++++++++++++++------
2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fe39acd..b1227b9 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -192,8 +192,8 @@ struct trace_array {
#ifdef CONFIG_FTRACE_SYSCALLS
int sys_refcount_enter;
int sys_refcount_exit;
- DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
- DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
+ struct ftrace_event_file *enter_syscall_files[NR_syscalls];
+ struct ftrace_event_file *exit_syscall_files[NR_syscalls];
#endif
int stop_count;
int clock_id;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 559329d..230cdb6 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -302,6 +302,7 @@ static int __init syscall_exit_define_fields(struct ftrace_event_call *call)
static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
{
struct trace_array *tr = data;
+ struct ftrace_event_file *ftrace_file;
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
@@ -314,7 +315,13 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0)
return;
- if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
+
+ /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
+ ftrace_file = rcu_dereference_raw(tr->enter_syscall_files[syscall_nr]);
+ if (!ftrace_file)
+ return;
+
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;

sys_data = syscall_nr_to_meta(syscall_nr);
@@ -345,6 +352,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
{
struct trace_array *tr = data;
+ struct ftrace_event_file *ftrace_file;
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
@@ -356,7 +364,13 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
syscall_nr = trace_get_syscall_nr(current, regs);
if (syscall_nr < 0)
return;
- if (!test_bit(syscall_nr, tr->enabled_exit_syscalls))
+
+ /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
+ ftrace_file = rcu_dereference_raw(tr->exit_syscall_files[syscall_nr]);
+ if (!ftrace_file)
+ return;
+
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
return;

sys_data = syscall_nr_to_meta(syscall_nr);
@@ -397,7 +411,7 @@ static int reg_event_syscall_enter(struct ftrace_event_file *file,
if (!tr->sys_refcount_enter)
ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
if (!ret) {
- set_bit(num, tr->enabled_enter_syscalls);
+ rcu_assign_pointer(tr->enter_syscall_files[num], file);
tr->sys_refcount_enter++;
}
mutex_unlock(&syscall_trace_lock);
@@ -415,9 +429,14 @@ static void unreg_event_syscall_enter(struct ftrace_event_file *file,
return;
mutex_lock(&syscall_trace_lock);
tr->sys_refcount_enter--;
- clear_bit(num, tr->enabled_enter_syscalls);
+ rcu_assign_pointer(tr->enter_syscall_files[num], NULL);
if (!tr->sys_refcount_enter)
unregister_trace_sys_enter(ftrace_syscall_enter, tr);
+ /*
+ * Callers expect the event to be completely disabled on
+ * return, so wait for current handlers to finish.
+ */
+ synchronize_sched();
mutex_unlock(&syscall_trace_lock);
}

@@ -435,7 +454,7 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
if (!tr->sys_refcount_exit)
ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
if (!ret) {
- set_bit(num, tr->enabled_exit_syscalls);
+ rcu_assign_pointer(tr->exit_syscall_files[num], file);
tr->sys_refcount_exit++;
}
mutex_unlock(&syscall_trace_lock);
@@ -453,9 +472,14 @@ static void unreg_event_syscall_exit(struct ftrace_event_file *file,
return;
mutex_lock(&syscall_trace_lock);
tr->sys_refcount_exit--;
- clear_bit(num, tr->enabled_exit_syscalls);
+ rcu_assign_pointer(tr->exit_syscall_files[num], NULL);
if (!tr->sys_refcount_exit)
unregister_trace_sys_exit(ftrace_syscall_exit, tr);
+ /*
+ * Callers expect the event to be completely disabled on
+ * return, so wait for current handlers to finish.
+ */
+ synchronize_sched();
mutex_unlock(&syscall_trace_lock);
}

--
1.7.11.4

2013-08-27 19:40:47

by Tom Zanussi

Subject: [PATCH v7 09/10] tracing: add documentation for trace event triggers

Provide a basic overview of trace event triggers and document the
available trigger commands, along with a few simple examples.

Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.txt | 207 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 207 insertions(+)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 37732a2..c94435d 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -287,3 +287,210 @@ their old filters):
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0
+
+6. Event triggers
+=================
+
+Trace events can be made to conditionally invoke trigger 'commands'
+which can take various forms and are described in detail below;
+examples would be enabling or disabling other trace events or invoking
+a stack trace whenever the trace event is hit. Whenever a trace event
+with attached triggers is invoked, the set of trigger commands
+associated with that event is invoked. Any given trigger can
+additionally have an event filter of the same form as described in
+section 5 (Event filtering) associated with it - the command will only
+be invoked if the event being invoked passes the associated filter.
+If no filter is associated with the trigger, it always passes.
+
+Triggers are added to and removed from a particular event by writing
+trigger expressions to the 'trigger' file for the given event.
+
+A given event can have any number of triggers associated with it,
+subject to any restrictions that individual commands may have in that
+regard.
+
+Event triggers are implemented on top of "soft" mode, which means that
+whenever a trace event has one or more triggers associated with it,
+the event is activated even if it isn't actually enabled, but is
+disabled in a "soft" mode. That is, the tracepoint will be called,
+but just will not be traced, unless of course it's actually enabled.
+This scheme allows triggers to be invoked even for events that aren't
+enabled, and also allows the current event filter implementation to be
+used for conditionally invoking triggers.
+
+The syntax for event triggers is roughly based on the syntax for
+set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
+section of Documentation/trace/ftrace.txt), but there are major
+differences and the implementation isn't currently tied to it in any
+way, so beware about making generalizations between the two.
+
+6.1 Expression syntax
+---------------------
+
+Triggers are added by echoing the command to the 'trigger' file:
+
+ # echo 'command[:count] [if filter]' > trigger
+
+Triggers are removed by echoing the same command but starting with '!'
+to the 'trigger' file:
+
+ # echo '!command[:count] [if filter]' > trigger
+
+The [if filter] part isn't used in matching commands when removing, so
+leaving that off in a '!' command will accomplish the same thing as
+having it in.
+
+The filter syntax is the same as that described in the 'Event
+filtering' section above.
+
+For ease of use, writing to the trigger file using '>' currently just
+adds or removes a single trigger and there's no explicit '>>' support
+('>' actually behaves like '>>') or truncation support to remove all
+triggers (you have to use '!' for each one added.)
+
+6.2 Supported trigger commands
+------------------------------
+
+The following commands are supported:
+
+- enable_event/disable_event
+
+ These commands can enable or disable another trace event whenever
+ the triggering event is hit. When these commands are registered,
+ the other trace event is activated, but disabled in a "soft" mode.
+ That is, the tracepoint will be called, but just will not be traced.
+ The event tracepoint stays in this mode as long as there's a trigger
+ in effect that can trigger it.
+
+ For example, the following trigger causes kmalloc events to be
+ traced when a read system call is entered, and the :1 at the end
+ specifies that this enablement happens only once:
+
+ # echo 'enable_event:kmem:kmalloc:1' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ The following trigger causes kmalloc events to stop being traced
+ when a read system call exits. This disablement happens on every
+ read system call exit:
+
+ # echo 'disable_event:kmem:kmalloc' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+ The format is:
+
+ enable_event:<system>:<event>[:count]
+ disable_event:<system>:<event>[:count]
+
+ To remove the above commands:
+
+ # echo '!enable_event:kmem:kmalloc:1' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ # echo '!disable_event:kmem:kmalloc' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+ Note that there can be any number of enable/disable_event triggers
+ per triggering event, but there can only be one trigger per
+ triggered event. e.g. sys_enter_read can have triggers enabling both
+ kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
+ versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
+ bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
+ could be combined into a single filter on kmem:kmalloc though).
+
+- stacktrace
+
+ This command dumps a stacktrace in the trace buffer whenever the
+ triggering event occurs.
+
+ For example, the following trigger dumps a stacktrace every time the
+ kmalloc tracepoint is hit:
+
+ # echo 'stacktrace' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The following trigger dumps a stacktrace the first 5 times a kmalloc
+ request happens with a size >= 64K:
+
+ # echo 'stacktrace:5 if bytes_req >= 65536' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The format is:
+
+ stacktrace[:count]
+
+ To remove the above commands:
+
+ # echo '!stacktrace' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # echo '!stacktrace:5 if bytes_req >= 65536' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The latter can also be removed more simply by the following (without
+ the filter):
+
+ # echo '!stacktrace:5' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ Note that there can be only one stacktrace trigger per triggering
+ event.
+
+- snapshot
+
+ This command causes a snapshot to be triggered whenever the
+ triggering event occurs.
+
+ The following command creates a snapshot every time a block request
+ queue is unplugged with a depth > 1. If you were tracing a set of
+ events or functions at the time, the snapshot trace buffer would
+ capture those events when the trigger event occurred:
+
+ # echo 'snapshot if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To only snapshot once:
+
+ # echo 'snapshot:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To remove the above commands:
+
+ # echo '!snapshot if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ # echo '!snapshot:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ Note that there can be only one snapshot trigger per triggering
+ event.
+
+- traceon/traceoff
+
+ These commands turn tracing on and off when the specified events are
+ hit. The parameter determines how many times the tracing system is
+ turned on and off. If unspecified, there is no limit.
+
+ The following command turns tracing off the first time a block
+ request queue is unplugged with a depth > 1. If you were tracing a
+ set of events or functions at the time, you could then examine the
+ trace buffer to see the sequence of events that led up to the
+ trigger event:
+
+ # echo 'traceoff:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To always disable tracing when nr_rq > 1:
+
+ # echo 'traceoff if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To remove the above commands:
+
+ # echo '!traceoff:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ # echo '!traceoff if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ Note that there can be only one traceon or traceoff trigger per
+ triggering event.
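The count semantics shared by all of the above commands (':N' fires the
action N times, no count means unlimited) can be modeled in a few lines of
user-space C. This is a hypothetical sketch for illustration only, not the
kernel code; the struct and function names are invented, but the check and
decrement mirror what the count-trigger variants in the patches below do:

```c
#include <assert.h>

/* Hypothetical stand-in for the patches' trigger_data count field:
 * -1 means "unlimited"; a positive N fires the action N times. */
struct trigger_count {
	long count;	/* -1 = unlimited, otherwise remaining firings */
};

/* Returns 1 if the trigger action should fire for this event, 0 once
 * the count is exhausted -- the same test-and-decrement the
 * *_count_trigger() functions perform. */
int trigger_should_fire(struct trigger_count *data)
{
	if (!data->count)
		return 0;	/* the N firings are used up */
	if (data->count != -1)
		data->count--;	/* finite budget: spend one */
	return 1;
}
```

Note that an unlimited trigger (count == -1) never decrements, so it fires
on every event.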
--
1.7.11.4

2013-08-27 19:40:37

by Tom Zanussi

Subject: [PATCH v7 03/10] tracing: add 'traceon' and 'traceoff' event trigger commands

Add 'traceon' and 'traceoff' ftrace_func_command commands. traceon
and traceoff event triggers are added by the user via these commands
in a similar way and using practically the same syntax as the
analogous 'traceon' and 'traceoff' ftrace function commands, but
instead of writing to the set_ftrace_filter file, the traceon and
traceoff triggers are written to the per-event 'trigger' files:

echo 'traceon' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff' > .../tracing/events/somesys/someevent/trigger

The above command will turn tracing on or off whenever someevent is
hit.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above commands will turn tracing on or off whenever someevent
is hit, but only N times.

Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided; these add
trigger instances to and remove them from the per-event list of
triggers, arming/disarming them as appropriate. event_trigger_callback() is a
general-purpose event_command func() implementation that orchestrates
command parsing and registration for most normal commands.

Most event commands will use these, but some will override and
possibly reuse them.
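The parsing that event_trigger_callback() orchestrates - splitting a
trigger string of the form '<cmd>[:N] [if <filter>]' into a command name,
an optional count, and an optional filter - can be sketched as a small
user-space model. This is an illustrative approximation, not the kernel's
actual parser (which receives the command and parameters pre-split and
uses kstrtoul for the count); all names here are invented:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a trigger string like
 * "stacktrace:5 if bytes_req >= 65536". */
struct parsed_trigger {
	char cmd[32];
	long count;		/* -1 if no :N suffix (unlimited) */
	const char *filter;	/* NULL if no "if" clause */
};

/* Destructively parse 'input' (modified in place, like strsep-based
 * kernel parsing) into its command, count, and filter parts. */
int parse_trigger(char *input, struct parsed_trigger *out)
{
	char *iff, *colon;

	out->count = -1;
	out->filter = NULL;

	iff = strstr(input, " if ");	/* optional filter clause */
	if (iff) {
		*iff = '\0';
		out->filter = iff + 4;	/* text after " if " */
	}

	colon = strchr(input, ':');	/* optional :N count */
	if (colon) {
		*colon = '\0';
		out->count = strtol(colon + 1, NULL, 0);
	}

	if (!*input)
		return -1;		/* empty command name */
	snprintf(out->cmd, sizeof(out->cmd), "%s", input);
	return 0;
}
```

A command with neither suffix nor clause, e.g. "snapshot", parses to an
unlimited count and no filter.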

The event_trigger_init(), event_trigger_free(), and
event_trigger_print() functions are meant to be common implementations
of the event_trigger_ops init(), free(), and print() ops,
respectively.

Most trigger_ops implementations will use these, but some will
override and possibly reuse them.
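The reference-counting contract between event_trigger_init() and
event_trigger_free() - init() takes a reference, free() drops one and
releases the trigger data only when the last reference is gone - can be
modeled in user space. This is a hedged sketch with invented names; the
real code additionally WARNs on underflow and defers the actual kfree()
through trigger_data_free()'s synchronize_sched():

```c
#include <assert.h>

/* Hypothetical model of the shared trigger lifecycle. */
struct trigger_ref {
	int ref;	/* reference count, as in trigger_data->ref */
	int freed;	/* stand-in for trigger_data_free() having run */
};

/* Mirror of event_trigger_init(): take a reference. */
void trigger_ref_init(struct trigger_ref *data)
{
	data->ref++;
}

/* Mirror of event_trigger_free(): drop a reference; release the data
 * only when the count reaches zero. */
void trigger_ref_free(struct trigger_ref *data)
{
	if (data->ref <= 0)
		return;		/* kernel code WARN_ON_ONCEs here */
	if (--data->ref == 0)
		data->freed = 1; /* kernel: synchronize_sched() + kfree() */
}
```

The underflow guard makes a double free() harmless, matching the
WARN_ON_ONCE(data->ref <= 0) check in the patch below.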

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace_events_trigger.c | 469 ++++++++++++++++++++++++++++++++++++
2 files changed, 470 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4b4fa62..59354a0 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -318,6 +318,7 @@ struct ftrace_event_file {

enum event_trigger_type {
ETT_NONE = (0),
+ ETT_TRACE_ONOFF = (1 << 0),
};

extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 5ec8336..91ad86b 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -40,6 +40,13 @@ struct event_trigger_data {
struct list_head list;
};

+static void
+trigger_data_free(struct event_trigger_data *data)
+{
+ synchronize_sched(); /* make sure current triggers exit before free */
+ kfree(data);
+}
+
void event_triggers_call(struct ftrace_event_file *file)
{
struct event_trigger_data *data;
@@ -227,6 +234,125 @@ const struct file_operations event_trigger_fops = {
.release = event_trigger_release,
};

+/*
+ * Currently we only register event commands from __init, so mark this
+ * __init too.
+ */
+static __init int register_event_command(struct event_command *cmd,
+ struct list_head *cmd_list,
+ struct mutex *cmd_list_mutex)
+{
+ struct event_command *p;
+ int ret = 0;
+
+ mutex_lock(cmd_list_mutex);
+ list_for_each_entry(p, cmd_list, list) {
+ if (strcmp(cmd->name, p->name) == 0) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ }
+ list_add(&cmd->list, cmd_list);
+ out_unlock:
+ mutex_unlock(cmd_list_mutex);
+
+ return ret;
+}
+
+/*
+ * Currently we only unregister event commands from __init, so mark
+ * this __init too.
+ */
+static __init int unregister_event_command(struct event_command *cmd,
+ struct list_head *cmd_list,
+ struct mutex *cmd_list_mutex)
+{
+ struct event_command *p, *n;
+ int ret = -ENODEV;
+
+ mutex_lock(cmd_list_mutex);
+ list_for_each_entry_safe(p, n, cmd_list, list) {
+ if (strcmp(cmd->name, p->name) == 0) {
+ ret = 0;
+ list_del_init(&p->list);
+ goto out_unlock;
+ }
+ }
+ out_unlock:
+ mutex_unlock(cmd_list_mutex);
+
+ return ret;
+}
+
+/**
+ * event_trigger_print - generic event_trigger_ops @print implementation
+ *
+ * Common implementation for event triggers to print themselves.
+ *
+ * Usually wrapped by a function that simply sets the @name of the
+ * trigger command and then invokes this.
+ */
+static int
+event_trigger_print(const char *name, struct seq_file *m,
+ void *data, char *filter_str)
+{
+ long count = (long)data;
+
+ seq_printf(m, "%s", name);
+
+ if (count == -1)
+ seq_puts(m, ":unlimited");
+ else
+ seq_printf(m, ":count=%ld", count);
+
+ if (filter_str)
+ seq_printf(m, " if %s\n", filter_str);
+ else
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+/**
+ * event_trigger_init - generic event_trigger_ops @init implementation
+ *
+ * Common implementation of event trigger initialization.
+ *
+ * Usually used directly as the @init method in event trigger
+ * implementations.
+ */
+static int
+event_trigger_init(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ data->ref++;
+ return 0;
+}
+
+/**
+ * event_trigger_free - generic event_trigger_ops @free implementation
+ *
+ * Common implementation of event trigger de-initialization.
+ *
+ * Usually used directly as the @free method in event trigger
+ * implementations.
+ */
+static void
+event_trigger_free(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref)
+ trigger_data_free(data);
+}
+
static int trace_event_trigger_enable_disable(struct ftrace_event_file *file,
int trigger_enable)
{
@@ -274,7 +400,350 @@ clear_event_triggers(struct trace_array *tr)
}
}

+/**
+ * register_trigger - generic event_command @reg implementation
+ *
+ * Common implementation for event trigger registration.
+ *
+ * Usually used directly as the @reg method in event command
+ * implementations.
+ */
+static int register_trigger(char *glob, struct event_trigger_ops *ops,
+ void *trigger_data, struct ftrace_event_file *file)
+{
+ struct event_trigger_data *data = trigger_data;
+ struct event_trigger_data *test;
+ int ret = 0;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == data->cmd_ops->trigger_type) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, (void **)&data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &file->triggers);
+ ret++;
+
+ if (trace_event_trigger_enable_disable(file, 1) < 0) {
+ list_del_rcu(&data->list);
+ ret--;
+ }
+out:
+ return ret;
+}
+
+/**
+ * unregister_trigger - generic event_command @unreg implementation
+ *
+ * Common implementation for event trigger unregistration.
+ *
+ * Usually used directly as the @unreg method in event command
+ * implementations.
+ */
+static void unregister_trigger(char *glob, struct event_trigger_ops *ops,
+ void *trigger_data,
+ struct ftrace_event_file *file)
+{
+ struct event_trigger_data *test = trigger_data;
+ struct event_trigger_data *data;
+ bool unregistered = false;
+
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ if (data->cmd_ops->trigger_type == test->cmd_ops->trigger_type) {
+ unregistered = true;
+ list_del_rcu(&data->list);
+ trace_event_trigger_enable_disable(file, 0);
+ break;
+ }
+ }
+
+ if (unregistered && data->ops->free)
+ data->ops->free(data->ops, (void **)&data);
+}
+
+/**
+ * event_trigger_callback - generic event_command @func implementation
+ *
+ * Common implementation for event command parsing and trigger
+ * instantiation.
+ *
+ * Usually used directly as the @func method in event command
+ * implementations.
+ */
+static int
+event_trigger_callback(struct event_command *cmd_ops,
+ struct ftrace_event_file *file,
+ char *glob, char *cmd, char *param, int enabled)
+{
+ struct event_trigger_data *trigger_data;
+ struct event_trigger_ops *trigger_ops;
+ char *trigger = NULL;
+ char *number;
+ int ret;
+
+ if (!enabled)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (t:n [if filter]) */
+ if (param && isdigit(param[0]))
+ trigger = strsep(&param, " \t");
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->count = -1;
+ trigger_data->ops = trigger_ops;
+ trigger_data->cmd_ops = cmd_ops;
+ INIT_LIST_HEAD(&trigger_data->list);
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+ kfree(trigger_data);
+ ret = 0;
+ goto out;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (!strlen(number))
+ goto out_free;
+
+ /*
+ * We use the callback data field (which is a pointer)
+ * as our counter.
+ */
+ ret = kstrtoul(number, 0, &trigger_data->count);
+ if (ret)
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, file);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ ret = -ENOENT;
+ goto out_free;
+ } else if (ret < 0)
+ goto out_free;
+ ret = 0;
+ out:
+ return ret;
+
+ out_free:
+ kfree(trigger_data);
+ goto out;
+}
+
+static void
+traceon_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (tracing_is_on())
+ return;
+
+ tracing_on();
+}
+
+static void
+traceon_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ traceon_trigger(_data);
+}
+
+static void
+traceoff_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!tracing_is_on())
+ return;
+
+ tracing_off();
+}
+
+static void
+traceoff_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ traceoff_trigger(_data);
+}
+
+static int
+traceon_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("traceon", m, (void *)data->count,
+ data->filter_str);
+}
+
+static int
+traceoff_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("traceoff", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops traceon_trigger_ops = {
+ .func = traceon_trigger,
+ .print = traceon_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceon_count_trigger_ops = {
+ .func = traceon_count_trigger,
+ .print = traceon_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_trigger_ops = {
+ .func = traceoff_trigger,
+ .print = traceoff_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_count_trigger_ops = {
+ .func = traceoff_count_trigger,
+ .print = traceoff_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+onoff_get_trigger_ops(char *cmd, char *param)
+{
+ struct event_trigger_ops *ops;
+
+ /* we register both traceon and traceoff to this callback */
+ if (strcmp(cmd, "traceon") == 0)
+ ops = param ? &traceon_count_trigger_ops :
+ &traceon_trigger_ops;
+ else
+ ops = param ? &traceoff_count_trigger_ops :
+ &traceoff_trigger_ops;
+
+ return ops;
+}
+
+static struct event_command trigger_traceon_cmd = {
+ .name = "traceon",
+ .trigger_type = ETT_TRACE_ONOFF,
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = onoff_get_trigger_ops,
+};
+
+static struct event_command trigger_traceoff_cmd = {
+ .name = "traceoff",
+ .trigger_type = ETT_TRACE_ONOFF,
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = onoff_get_trigger_ops,
+};
+
+static __init void unregister_trigger_traceon_traceoff_cmds(void)
+{
+ unregister_event_command(&trigger_traceon_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_traceoff_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+}
+
+static __init int register_trigger_traceon_traceoff_cmds(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_traceon_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ return ret;
+ ret = register_event_command(&trigger_traceoff_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ unregister_trigger_traceon_traceoff_cmds();
+
+ return ret;
+}
+
__init int register_trigger_cmds(void)
{
+ int ret;
+
+ ret = register_trigger_traceon_traceoff_cmds();
+ if (ret) {
+ unregister_trigger_traceon_traceoff_cmds();
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4

2013-08-27 19:40:36

by Tom Zanussi

Subject: [PATCH v7 04/10] tracing: add 'snapshot' event trigger command

Add 'snapshot' ftrace_func_command. snapshot event triggers are added
by the user via this command in a similar way and using practically
the same syntax as the analogous 'snapshot' ftrace function command,
but instead of writing to the set_ftrace_filter file, the snapshot
event trigger is written to the per-event 'trigger' files:

echo 'snapshot' > .../somesys/someevent/trigger

The above command will turn on snapshots for someevent, i.e. whenever
someevent is hit, a snapshot will be done.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'snapshot:N' > .../somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will snapshot someevent N times, i.e. a snapshot
will be done on each of the first N hits of someevent.

Also adds a new ftrace_alloc_snapshot() function - the ftrace snapshot
command already defines code that allocates a snapshot, and this
function makes that code reusable.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace.c | 9 ++++
kernel/trace/trace.h | 1 +
kernel/trace/trace_events_trigger.c | 89 +++++++++++++++++++++++++++++++++++++
4 files changed, 100 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 59354a0..21fae3d 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -319,6 +319,7 @@ struct ftrace_event_file {
enum event_trigger_type {
ETT_NONE = (0),
ETT_TRACE_ONOFF = (1 << 0),
+ ETT_SNAPSHOT = (1 << 1),
};

extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 496f94d..5a61dbe 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5358,6 +5358,15 @@ static const struct file_operations tracing_dyn_info_fops = {
};
#endif /* CONFIG_DYNAMIC_FTRACE */

+#if defined(CONFIG_TRACER_SNAPSHOT)
+int ftrace_alloc_snapshot(void)
+{
+ return alloc_snapshot(&global_trace);
+}
+#else
+int ftrace_alloc_snapshot(void) { return -ENOSYS; }
+#endif
+
#if defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE)
static void
ftrace_snapshot(unsigned long ip, unsigned long parent_ip, void **data)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f8a18e5..11fb2ea 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1189,6 +1189,7 @@ struct event_command {

extern int trace_event_enable_disable(struct ftrace_event_file *file,
int enable, int soft_disable);
+extern int ftrace_alloc_snapshot(void);

extern const char *__start___trace_bprintk_fmt[];
extern const char *__stop___trace_bprintk_fmt[];
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 91ad86b..1c266f9 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -709,6 +709,87 @@ static struct event_command trigger_traceoff_cmd = {
.get_trigger_ops = onoff_get_trigger_ops,
};

+static void
+snapshot_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ tracing_snapshot();
+}
+
+static void
+snapshot_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ snapshot_trigger(_data);
+}
+
+static int
+register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
+ void *data, struct ftrace_event_file *file)
+{
+ int ret = register_trigger(glob, ops, data, file);
+
+ if (ret > 0)
+ ftrace_alloc_snapshot();
+
+ return ret;
+}
+
+static int
+snapshot_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("snapshot", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops snapshot_trigger_ops = {
+ .func = snapshot_trigger,
+ .print = snapshot_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops snapshot_count_trigger_ops = {
+ .func = snapshot_count_trigger,
+ .print = snapshot_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+snapshot_get_trigger_ops(char *cmd, char *param)
+{
+ return param ? &snapshot_count_trigger_ops : &snapshot_trigger_ops;
+}
+
+static struct event_command trigger_snapshot_cmd = {
+ .name = "snapshot",
+ .trigger_type = ETT_SNAPSHOT,
+ .func = event_trigger_callback,
+ .reg = register_snapshot_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = snapshot_get_trigger_ops,
+};
+
static __init void unregister_trigger_traceon_traceoff_cmds(void)
{
unregister_event_command(&trigger_traceon_cmd,
@@ -745,5 +826,13 @@ __init int register_trigger_cmds(void)
return ret;
}

+ ret = register_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0)) {
+ unregister_trigger_traceon_traceoff_cmds();
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4

2013-08-27 19:42:41

by Tom Zanussi

Subject: [PATCH v7 05/10] tracing: add 'stacktrace' event trigger command

Add 'stacktrace' ftrace_func_command. stacktrace event triggers are
added by the user via this command in a similar way and using
practically the same syntax as the analogous 'stacktrace' ftrace
function command, but instead of writing to the set_ftrace_filter
file, the stacktrace event trigger is written to the per-event
'trigger' files:

echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger

The above command will turn on stacktraces for someevent, i.e. whenever
someevent is hit, a stacktrace will be logged.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will log N stacktraces for someevent, i.e. a
stacktrace will be logged on each of the first N hits of someevent.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 1 +
kernel/trace/trace_events_trigger.c | 90 +++++++++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 21fae3d..df5eae9 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -320,6 +320,7 @@ enum event_trigger_type {
ETT_NONE = (0),
ETT_TRACE_ONOFF = (1 << 0),
ETT_SNAPSHOT = (1 << 1),
+ ETT_STACKTRACE = (1 << 2),
};

extern void destroy_preds(struct ftrace_event_call *call);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 1c266f9..c83f56f 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -790,6 +790,85 @@ static struct event_command trigger_snapshot_cmd = {
.get_trigger_ops = snapshot_get_trigger_ops,
};

+/*
+ * Skip 4:
+ * ftrace_stacktrace()
+ * function_trace_probe_call()
+ * ftrace_ops_list_func()
+ * ftrace_call()
+ */
+#define STACK_SKIP 4
+
+static void
+stacktrace_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ trace_dump_stack(STACK_SKIP);
+}
+
+static void
+stacktrace_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ stacktrace_trigger(_data);
+}
+
+static int
+stacktrace_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("stacktrace", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops stacktrace_trigger_ops = {
+ .func = stacktrace_trigger,
+ .print = stacktrace_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops stacktrace_count_trigger_ops = {
+ .func = stacktrace_count_trigger,
+ .print = stacktrace_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+stacktrace_get_trigger_ops(char *cmd, char *param)
+{
+ return param ? &stacktrace_count_trigger_ops : &stacktrace_trigger_ops;
+}
+
+static struct event_command trigger_stacktrace_cmd = {
+ .name = "stacktrace",
+ .trigger_type = ETT_STACKTRACE,
+ .post_trigger = true,
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = stacktrace_get_trigger_ops,
+};
+
static __init void unregister_trigger_traceon_traceoff_cmds(void)
{
unregister_event_command(&trigger_traceon_cmd,
@@ -834,5 +913,16 @@ __init int register_trigger_cmds(void)
return ret;
}

+ ret = register_event_command(&trigger_stacktrace_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0)) {
+ unregister_trigger_traceon_traceoff_cmds();
+ unregister_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4

2013-08-27 20:01:53

by Steven Rostedt

Subject: Re: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

On Tue, 27 Aug 2013 14:40:13 -0500
Tom Zanussi <[email protected]> wrote:

return;
> - if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
> +
> + /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
> + ftrace_file = rcu_dereference_raw(tr->enter_syscall_files[syscall_nr]);

What's the reason for using rcu_dereference_raw() and not normal
rcu_dereference?

-- Steve

> + if (!ftrace_file)
> + return;
> +
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> return;
>

2013-08-27 20:08:30

by Steven Rostedt

Subject: Re: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

On Tue, 27 Aug 2013 14:40:13 -0500
Tom Zanussi <[email protected]> wrote:

> @@ -415,9 +429,14 @@ static void unreg_event_syscall_enter(struct ftrace_event_file *file,
> return;
> mutex_lock(&syscall_trace_lock);
> tr->sys_refcount_enter--;
> - clear_bit(num, tr->enabled_enter_syscalls);
> + rcu_assign_pointer(tr->enter_syscall_files[num], NULL);
> if (!tr->sys_refcount_enter)
> unregister_trace_sys_enter(ftrace_syscall_enter, tr);
> + /*
> + * Callers expect the event to be completely disabled on
> + * return, so wait for current handlers to finish.
> + */
> + synchronize_sched();
> mutex_unlock(&syscall_trace_lock);
> }
>
> @@ -435,7 +454,7 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
> if (!tr->sys_refcount_exit)
> ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
> if (!ret) {
> - set_bit(num, tr->enabled_exit_syscalls);
> + rcu_assign_pointer(tr->exit_syscall_files[num], file);
> tr->sys_refcount_exit++;
> }
> mutex_unlock(&syscall_trace_lock);
> @@ -453,9 +472,14 @@ static void unreg_event_syscall_exit(struct ftrace_event_file *file,
> return;
> mutex_lock(&syscall_trace_lock);
> tr->sys_refcount_exit--;
> - clear_bit(num, tr->enabled_exit_syscalls);
> + rcu_assign_pointer(tr->exit_syscall_files[num], NULL);
> if (!tr->sys_refcount_exit)
> unregister_trace_sys_exit(ftrace_syscall_exit, tr);
> + /*
> + * Callers expect the event to be completely disabled on
> + * return, so wait for current handlers to finish.
> + */
> + synchronize_sched();
> mutex_unlock(&syscall_trace_lock);

Can we do the synchronize_sched() after the mutex unlock in these two
places?

-- Steve


> }
>

2013-08-27 20:15:29

by Steven Rostedt

Subject: Re: [PATCH v7 02/10] tracing: add basic event trigger framework

On Tue, 27 Aug 2013 14:40:14 -0500
Tom Zanussi <[email protected]> wrote:


> Signed-off-by: Tom Zanussi <[email protected]>
> Idea-by: Steve Rostedt <[email protected]>
> ---
> include/linux/ftrace_event.h | 13 +-
> include/trace/ftrace.h | 4 +
> kernel/trace/Makefile | 1 +
> kernel/trace/trace.h | 171 ++++++++++++++++++++++
> kernel/trace/trace_events.c | 21 ++-
> kernel/trace/trace_events_trigger.c | 280 ++++++++++++++++++++++++++++++++++++
> kernel/trace/trace_syscalls.c | 4 +
> 7 files changed, 488 insertions(+), 6 deletions(-)
> create mode 100644 kernel/trace/trace_events_trigger.c
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 5eaa746..4b4fa62 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -255,6 +255,7 @@ enum {
> FTRACE_EVENT_FL_RECORDED_CMD_BIT,
> FTRACE_EVENT_FL_SOFT_MODE_BIT,
> FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> + FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> };
>
> /*
> @@ -263,13 +264,15 @@ enum {
> * RECORDED_CMD - The comms should be recorded at sched_switch
> * SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
> * SOFT_DISABLED - When set, do not trace the event (even though its
> - * tracepoint may be enabled)
> + * tracepoint may be enabled)

Nitpick... I like the extra space in the indentation. It makes the
continuation of the line obvious. Otherwise I start reading the next
line as part of the original line.


> + * TRIGGER_MODE - The event is enabled/disabled by SOFT_DISABLED
> */
> enum {
> FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
> FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
> FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
> FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
> + FTRACE_EVENT_FL_TRIGGER_MODE = (1 << FTRACE_EVENT_FL_TRIGGER_MODE_BIT),
> };
>
> struct ftrace_event_file {
> @@ -278,6 +281,7 @@ struct ftrace_event_file {
> struct dentry *dir;
> struct trace_array *tr;
> struct ftrace_subsystem_dir *system;
> + struct list_head triggers;
>
> /*
> * 32 bit flags:
> @@ -285,6 +289,7 @@ struct ftrace_event_file {
> * bit 1: enabled cmd record
> * bit 2: enable/disable with the soft disable bit
> * bit 3: soft disabled
> + * bit 4: trigger enabled
> *
> * Note: The bits must be set atomically to prevent races
> * from other writers. Reads of flags do not need to be in
> @@ -296,6 +301,7 @@ struct ftrace_event_file {
> */
> unsigned long flags;
> atomic_t sm_ref; /* soft-mode reference counter */
> + atomic_t tm_ref; /* trigger-mode reference counter */
> };
>
> #define __TRACE_EVENT_FLAGS(name, value) \
> @@ -310,12 +316,17 @@ struct ftrace_event_file {
>
> #define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
>
> +enum event_trigger_type {
> + ETT_NONE = (0),
> +};
> +
> extern void destroy_preds(struct ftrace_event_call *call);
> extern int filter_match_preds(struct event_filter *filter, void *rec);
> extern int filter_current_check_discard(struct ring_buffer *buffer,
> struct ftrace_event_call *call,
> void *rec,
> struct ring_buffer_event *event);
> +extern void event_triggers_call(struct ftrace_event_file *file);
>
> enum {
> FILTER_OTHER = 0,
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 41a6643..326ba32 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -526,6 +526,10 @@ ftrace_raw_event_##call(void *__data, proto) \
> int __data_size; \
> int pc; \
> \
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> + &ftrace_file->flags)) \
> + event_triggers_call(ftrace_file); \
> + \
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> &ftrace_file->flags)) \
> return; \
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index d7e2068..1378e84 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -50,6 +50,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
> obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
> endif
> obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
> +obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
> obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
> obj-$(CONFIG_TRACEPOINTS) += power-traces.o
> ifeq ($(CONFIG_PM_RUNTIME),y)
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index b1227b9..f8a18e5 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -1016,9 +1016,180 @@ extern void trace_event_enable_cmd_record(bool enable);
> extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
> extern int event_trace_del_tracer(struct trace_array *tr);
>
> +static inline void *event_file_data(struct file *filp)
> +{
> + return ACCESS_ONCE(file_inode(filp)->i_private);
> +}
> +
> extern struct mutex event_mutex;
> extern struct list_head ftrace_events;
>
> +extern const struct file_operations event_trigger_fops;
> +
> +extern int register_trigger_cmds(void);
> +extern void clear_event_triggers(struct trace_array *tr);
> +
> +/**
> + * struct event_trigger_ops - callbacks for trace event triggers
> + *
> + * The methods in this structure provide per-event trigger hooks for
> + * various trigger operations.
> + *
> + * All the methods below, except for @init() and @free(), must be
> + * implemented.
> + *
> + * @func: The trigger 'probe' function called when the triggering
> + * event occurs. The data passed into this callback is the data
> + * that was supplied to the event_command @reg() function that
> + * registered the trigger (see struct event_command).
> + *
> + * @init: An optional initialization function called for the trigger
> + * when the trigger is registered (via the event_command reg()
> + * function). This can be used to perform per-trigger
> + * initialization such as incrementing a per-trigger reference
> + * count, for instance. This is usually implemented by the
> + * generic utility function @event_trigger_init() (see
> + * trace_event_triggers.c).
> + *
> + * @free: An optional de-initialization function called for the
> + * trigger when the trigger is unregistered (via the
> + * event_command @reg() function). This can be used to perform
> + * per-trigger de-initialization such as decrementing a
> + * per-trigger reference count and freeing corresponding trigger
> + * data, for instance. This is usually implemented by the
> + * generic utility function @event_trigger_free() (see
> + * trace_event_triggers.c).
> + *
> + * @print: The callback function invoked to have the trigger print
> + * itself. This is usually implemented by a wrapper function
> + * that calls the generic utility function @event_trigger_print()
> + * (see trace_event_triggers.c).
> + */
> +struct event_trigger_ops {
> + void (*func)(void **data);

Why the void **?

> + int (*init)(struct event_trigger_ops *ops,
> + void **data);
> + void (*free)(struct event_trigger_ops *ops,
> + void **data);
> + int (*print)(struct seq_file *m,
> + struct event_trigger_ops *ops,
> + void *data);
> +};
> +
> +/**
> + * struct event_command - callbacks and data members for event commands
> + *
> + * Event commands are invoked by users by writing the command name
> + * into the 'trigger' file associated with a trace event. The
> + * parameters associated with a specific invocation of an event
> + * command are used to create an event trigger instance, which is
> + * added to the list of trigger instances associated with that trace
> + * event. When the event is hit, the set of triggers associated with
> + * that event is invoked.
> + *
> + * The data members in this structure provide per-event command data
> + * for various event commands.
> + *
> + * All the data members below, except for @post_trigger, must be set
> + * for each event command.
> + *
> + * @name: The unique name that identifies the event command. This is
> + * the name used when setting triggers via trigger files.
> + *
> + * @trigger_type: A unique id that identifies the event command
> + * 'type'. This value has two purposes, the first to ensure that
> + * only one trigger of the same type can be set at a given time
> + * for a particular event e.g. it doesn't make sense to have both
> + * a traceon and traceoff trigger attached to a single event at
> + * the same time, so traceon and traceoff have the same type
> + * though they have different names. The @trigger_type value is

Heh, I didn't bother with that in the ftrace triggers. If the root user
wants to start and stop tracing on the same function, so be it. They
get what they ask for.

I'm not against this, I was just pointing out how you care more than I
do about keeping root from stepping on his/her own toes ;-)

> + * also used as a bit value for deferring the actual trigger
> + * action until after the current event is finished. Some
> + * commands need to do this if they themselves log to the trace
> + * buffer (see the @post_trigger() member below). @trigger_type
> + * values are defined by adding new values to the trigger_type
> + * enum in include/linux/ftrace_event.h.
> + *
> + * @post_trigger: A flag that says whether or not this command needs
> + * to have its action delayed until after the current event has
> + * been closed. Some triggers need to avoid being invoked while
> + * an event is currently in the process of being logged, since
> + * the trigger may itself log data into the trace buffer. Thus
> + * we make sure the current event is committed before invoking
> + * those triggers. To do that, the trigger invocation is split
> + * in two - the first part checks the filter using the current
> + * trace record; if a command has the @post_trigger flag set, it
> + * sets a bit for itself in the return value, otherwise it
> + * directly invokes the trigger. Once all commands have been
> + * either invoked or set their return flag, the current record is
> + * either committed or discarded. At that point, if any commands
> + * have deferred their triggers, those commands are finally
> + * invoked following the close of the current event. In other
> + * words, if the event_trigger_ops @func() probe implementation
> + * itself logs to the trace buffer, this flag should be set,
> + * otherwise it can be left unspecified.

Is this just for consistency of output layout? So the normal event gets
printed first before this prints anything, like a stack trace?

> + *
> + * All the methods below, except for @set_filter(), must be
> + * implemented.
> + *
> + * @func: The callback function responsible for parsing and
> + * registering the trigger written to the 'trigger' file by the
> + * user. It allocates the trigger instance and registers it with
> + * the appropriate trace event. It makes use of the other
> + * event_command callback functions to orchestrate this, and is
> + * usually implemented by the generic utility function
> + * @event_trigger_callback() (see trace_event_triggers.c).
> + *
> + * @reg: Adds the trigger to the list of triggers associated with the
> + * event, and enables the event trigger itself, after
> + * initializing it (via the event_trigger_ops @init() function).
> + * This is also where commands can use the @trigger_type value to
> + * make the decision as to whether or not multiple instances of
> + * the trigger should be allowed. This is usually implemented by
> + * the generic utility function @register_trigger() (see
> + * trace_event_triggers.c).
> + *
> + * @unreg: Removes the trigger from the list of triggers associated
> + * with the event, and disables the event trigger itself, after
> + * initializing it (via the event_trigger_ops @free() function).
> + * This is usually implemented by the generic utility function
> + * @unregister_trigger() (see trace_event_triggers.c).
> + *
> + * @set_filter: An optional function called to parse and set a filter
> + * for the trigger. If no @set_filter() method is set for the
> + * event command, filters set by the user for the command will be
> + * ignored. This is usually implemented by the generic utility
> + * function @set_trigger_filter() (see trace_event_triggers.c).
> + *
> + * @get_trigger_ops: The callback function invoked to retrieve the
> + * event_trigger_ops implementation associated with the command.
> + */
> +struct event_command {
> + struct list_head list;
> + char *name;
> + enum event_trigger_type trigger_type;
> + bool post_trigger;
> + int (*func)(struct event_command *cmd_ops,
> + struct ftrace_event_file *file,
> + char *glob, char *cmd,
> + char *params, int enable);
> + int (*reg)(char *glob,
> + struct event_trigger_ops *trigger_ops,
> + void *trigger_data,
> + struct ftrace_event_file *file);
> + void (*unreg)(char *glob,
> + struct event_trigger_ops *trigger_ops,
> + void *trigger_data,
> + struct ftrace_event_file *file);
> + int (*set_filter)(char *filter_str,
> + void *trigger_data,
> + struct ftrace_event_file *file);
> + struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param);
> +};
> +
> +extern int trace_event_enable_disable(struct ftrace_event_file *file,
> + int enable, int soft_disable);
> +
> extern const char *__start___trace_bprintk_fmt[];
> extern const char *__stop___trace_bprintk_fmt[];
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 368a4d5..7d8eb8a 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -342,6 +342,12 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> return ret;
> }
>
> +int trace_event_enable_disable(struct ftrace_event_file *file,
> + int enable, int soft_disable)
> +{
> + return __ftrace_event_enable_disable(file, enable, soft_disable);
> +}
> +
> static int ftrace_event_enable_disable(struct ftrace_event_file *file,
> int enable)
> {
> @@ -421,11 +427,6 @@ static void remove_subsystem(struct ftrace_subsystem_dir *dir)
> }
> }
>
> -static void *event_file_data(struct file *filp)
> -{
> - return ACCESS_ONCE(file_inode(filp)->i_private);
> -}
> -
> static void remove_event_file_dir(struct ftrace_event_file *file)
> {
> struct dentry *dir = file->dir;
> @@ -1542,6 +1543,9 @@ event_create_dir(struct dentry *parent, struct ftrace_event_file *file)
> trace_create_file("filter", 0644, file->dir, call,
> &ftrace_event_filter_fops);
>
> + trace_create_file("trigger", 0644, file->dir, file,
> + &event_trigger_fops);
> +
> trace_create_file("format", 0444, file->dir, call,
> &ftrace_event_format_fops);
>
> @@ -1637,6 +1641,8 @@ trace_create_new_event(struct ftrace_event_call *call,
> file->event_call = call;
> file->tr = tr;
> atomic_set(&file->sm_ref, 0);
> + atomic_set(&file->tm_ref, 0);
> + INIT_LIST_HEAD(&file->triggers);
> list_add(&file->list, &tr->events);
>
> return file;
> @@ -2303,6 +2309,9 @@ int event_trace_del_tracer(struct trace_array *tr)
> {
> mutex_lock(&event_mutex);
>
> + /* Disable any event triggers and associated soft-disabled events */
> + clear_event_triggers(tr);
> +
> /* Disable any running events */
> __ftrace_set_clr_event_nolock(tr, NULL, NULL, NULL, 0);
>
> @@ -2366,6 +2375,8 @@ static __init int event_trace_enable(void)
>
> register_event_cmds();
>
> + register_trigger_cmds();
> +
> return 0;
> }
>
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> new file mode 100644
> index 0000000..5ec8336
> --- /dev/null
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -0,0 +1,280 @@
> +/*
> + * trace_events_trigger - trace event triggers
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) 2013 Tom Zanussi <[email protected]>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/ctype.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +
> +#include "trace.h"
> +
> +static LIST_HEAD(trigger_commands);
> +static DEFINE_MUTEX(trigger_cmd_mutex);
> +
> +struct event_trigger_data {
> + struct ftrace_event_file *file;
> + unsigned long count;
> + int ref;
> + bool enable;
> + struct event_trigger_ops *ops;
> + struct event_command * cmd_ops;
> + struct event_filter *filter;
> + char *filter_str;
> + struct list_head list;
> +};
> +
> +void event_triggers_call(struct ftrace_event_file *file)
> +{
> + struct event_trigger_data *data;
> +
> + if (list_empty(&file->triggers))
> + return;
> +
> + preempt_disable_notrace();
> + list_for_each_entry_rcu(data, &file->triggers, list)
> + data->ops->func((void **)&data);

Why is this a void ** and why are you passing the address of the
variable that holds the pointer to the data that was registered and not
the pointer to the data itself?

Do you plan on modifying the local data variable on return?

> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL_GPL(event_triggers_call);
> +
> +static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
> +{
> + struct ftrace_event_file *event_file = event_file_data(m->private);
> +
> + return seq_list_next(t, &event_file->triggers, pos);
> +}
> +
> +static void *trigger_start(struct seq_file *m, loff_t *pos)
> +{
> + struct ftrace_event_file *event_file;
> +
> + /* ->stop() is called even if ->start() fails */
> + mutex_lock(&event_mutex);
> + event_file = event_file_data(m->private);
> + if (unlikely(!event_file))
> + return ERR_PTR(-ENODEV);
> +
> + return seq_list_start(&event_file->triggers, *pos);
> +}
> +
> +static void trigger_stop(struct seq_file *m, void *t)
> +{
> + mutex_unlock(&event_mutex);
> +}
> +
> +static int trigger_show(struct seq_file *m, void *v)
> +{
> + struct event_trigger_data *data;
> +
> + data = list_entry(v, struct event_trigger_data, list);
> + data->ops->print(m, data->ops, data);
> +
> + return 0;
> +}
> +
> +static const struct seq_operations event_triggers_seq_ops = {
> + .start = trigger_start,
> + .next = trigger_next,
> + .stop = trigger_stop,
> + .show = trigger_show,
> +};
> +
> +static int event_trigger_regex_open(struct inode *inode, struct file *file)
> +{
> + int ret = 0;
> +
> + mutex_lock(&event_mutex);
> +
> + if (unlikely(!event_file_data(file))) {
> + mutex_unlock(&event_mutex);
> + return -ENODEV;
> + }
> +
> + if (file->f_mode & FMODE_READ) {
> + ret = seq_open(file, &event_triggers_seq_ops);
> + if (!ret) {
> + struct seq_file *m = file->private_data;
> + m->private = file;
> + }
> + }
> +
> + mutex_unlock(&event_mutex);
> +
> + return ret;
> +}
> +
> +static int trigger_process_regex(struct ftrace_event_file *file,
> + char *buff, int enable)
> +{
> + char *command, *next = buff;
> + struct event_command *p;
> + int ret = -EINVAL;
> +
> + command = strsep(&next, ": \t");
> + command = (command[0] != '!') ? command : command + 1;
> +
> + mutex_lock(&trigger_cmd_mutex);
> + list_for_each_entry(p, &trigger_commands, list) {
> + if (strcmp(p->name, command) == 0) {
> + ret = p->func(p, file, buff, command, next, enable);
> + goto out_unlock;
> + }
> + }
> + out_unlock:
> + mutex_unlock(&trigger_cmd_mutex);
> +
> + return ret;
> +}
> +
> +static ssize_t event_trigger_regex_write(struct file *file,
> + const char __user *ubuf,
> + size_t cnt, loff_t *ppos, int enable)
> +{
> + struct ftrace_event_file *event_file;
> + ssize_t ret;
> + char *buf;
> +
> + if (!cnt)
> + return 0;
> +
> + if (cnt >= PAGE_SIZE)
> + return -EINVAL;
> +
> + buf = (char *)__get_free_page(GFP_TEMPORARY);
> + if (!buf)
> + return -ENOMEM;
> +
> + if (copy_from_user(buf, ubuf, cnt)) {
> + free_page((unsigned long) buf);
> + return -EFAULT;
> + }
> + buf[cnt] = '\0';
> + strim(buf);
> +
> + mutex_lock(&event_mutex);
> + event_file = event_file_data(file);
> + if (unlikely(!event_file)) {
> + mutex_unlock(&event_mutex);
> + free_page((unsigned long) buf);
> + return -ENODEV;
> + }
> + ret = trigger_process_regex(event_file, buf, enable);
> + mutex_unlock(&event_mutex);
> +
> + free_page((unsigned long) buf);
> + if (ret < 0)
> + goto out;
> +
> + *ppos += cnt;
> + ret = cnt;
> + out:
> + return ret;
> +}
> +
> +static int event_trigger_regex_release(struct inode *inode, struct file *file)
> +{
> + mutex_lock(&event_mutex);
> +
> + if (file->f_mode & FMODE_READ)
> + seq_release(inode, file);
> +
> + mutex_unlock(&event_mutex);
> +
> + return 0;
> +}
> +
> +static ssize_t
> +event_trigger_write(struct file *filp, const char __user *ubuf,
> + size_t cnt, loff_t *ppos)
> +{
> + return event_trigger_regex_write(filp, ubuf, cnt, ppos, 1);
> +}
> +
> +static int
> +event_trigger_open(struct inode *inode, struct file *filp)
> +{
> + return event_trigger_regex_open(inode, filp);
> +}
> +
> +static int
> +event_trigger_release(struct inode *inode, struct file *file)
> +{
> + return event_trigger_regex_release(inode, file);
> +}
> +
> +const struct file_operations event_trigger_fops = {
> + .open = event_trigger_open,
> + .read = seq_read,
> + .write = event_trigger_write,
> + .llseek = ftrace_filter_lseek,
> + .release = event_trigger_release,
> +};
> +
> +static int trace_event_trigger_enable_disable(struct ftrace_event_file *file,
> + int trigger_enable)
> +{
> + int ret = 0;
> +
> + if (trigger_enable) {
> + if (atomic_inc_return(&file->tm_ref) > 1)
> + return ret;
> + set_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> + ret = trace_event_enable_disable(file, 1, 1);
> + } else {
> + if (atomic_dec_return(&file->tm_ref) > 0)
> + return ret;
> + clear_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> + ret = trace_event_enable_disable(file, 0, 1);
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * clear_event_triggers - clear all triggers associated with a trace array.
> + *
> + * For each trigger, the triggering event has its tm_ref decremented
> + * via trace_event_trigger_enable_disable(), and any associated event
> + * (in the case of enable/disable_event triggers) will have its sm_ref
> + * decremented via free()->trace_event_enable_disable(). That
> + * combination effectively reverses the soft-mode/trigger state added
> + * by trigger registration.
> + *
> + * Must be called with event_mutex held.
> + */
> +void
> +clear_event_triggers(struct trace_array *tr)
> +{
> + struct ftrace_event_file *file;
> +
> + list_for_each_entry(file, &tr->events, list) {
> + struct event_trigger_data *data;
> + list_for_each_entry_rcu(data, &file->triggers, list) {
> + trace_event_trigger_enable_disable(file, 0);
> + if (data->ops->free)
> + data->ops->free(data->ops, (void **)&data);

ditto here.

-- Steve

> + }
> + }
> +}
> +
> +__init int register_trigger_cmds(void)
> +{
> + return 0;
> +}
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 230cdb6..4f56d54 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -321,6 +321,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> if (!ftrace_file)
> return;
>
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> + event_triggers_call(ftrace_file);
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> return;
>
> @@ -370,6 +372,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> if (!ftrace_file)
> return;
>
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> + event_triggers_call(ftrace_file);
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> return;
>
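
[Editorial note: the patch above hooks every event probe so that triggers run
before the soft-disable check. The following is a userspace sketch of that
control flow, not kernel code — all names and structures are illustrative
stand-ins for `ftrace_event_file`, `event_triggers_call()` and the generated
probe body, under the assumption that triggers fire even on soft-disabled
events.]

```c
/* Userspace sketch (not kernel code) of the hook this patch adds to
 * ftrace_raw_event_##call(): if TRIGGER_MODE is set, run the event's
 * triggers first; then the existing SOFT_DISABLED check decides
 * whether the event itself is logged.  All names are illustrative. */
#include <assert.h>
#include <stddef.h>

enum { FL_TRIGGER_MODE = 1 << 0, FL_SOFT_DISABLED = 1 << 1 };

struct trigger {
	void (*func)(void *data);   /* trigger 'probe' */
	void *data;                 /* data supplied at registration */
	struct trigger *next;
};

struct event_file {
	unsigned long flags;
	struct trigger *triggers;   /* list walked on every hit */
};

static int trigger_hits;            /* counts trigger invocations */

static void count_hits(void *data)
{
	(void)data;
	trigger_hits++;
}

/* Model of event_triggers_call(): walk the trigger list. */
static void event_triggers_call(struct event_file *file)
{
	struct trigger *t;

	for (t = file->triggers; t; t = t->next)
		t->func(t->data);
}

/* Shape of the generated probe body: triggers fire even when the
 * event is soft-disabled, which is what lets a trigger sit on an
 * otherwise disabled event.  Returns 1 if the event was logged. */
static int probe(struct event_file *file)
{
	if (file->flags & FL_TRIGGER_MODE)
		event_triggers_call(file);
	if (file->flags & FL_SOFT_DISABLED)
		return 0;               /* event not recorded */
	return 1;                       /* event recorded */
}
```

The key property the sketch demonstrates is the ordering: a soft-disabled
event still runs its triggers, it just skips logging itself.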

2013-08-27 20:17:22

by Steven Rostedt

Subject: Re: [PATCH v7 02/10] tracing: add basic event trigger framework

On Tue, 27 Aug 2013 14:40:14 -0500
Tom Zanussi <[email protected]> wrote:

> +
> + if (copy_from_user(buf, ubuf, cnt)) {
> + free_page((unsigned long) buf);
> + return -EFAULT;
> + }
> + buf[cnt] = '\0';
> + strim(buf);
> +
> + mutex_lock(&event_mutex);
> + event_file = event_file_data(file);
> + if (unlikely(!event_file)) {
> + mutex_unlock(&event_mutex);
> + free_page((unsigned long) buf);

Nitpick... Nuke the space before "buf".

-- Steve

> + return -ENODEV;
> + }
> + ret = trigger_process_regex(event_file, buf, enable);
> + mutex_unlock(&event_mutex);
> +
> + free_page((unsigned long) buf);
> + if (ret < 0)
> + goto out;
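
[Editorial note: beyond the whitespace nitpick, the write path quoted above
does a bounded copy, NUL-terminates, and calls strim() on the buffer. Below
is a userspace stand-in for that buffer prep; `strim_model()` is our own
approximation of the kernel helper, which trims trailing whitespace in place
and returns a pointer past leading whitespace — note the patch ignores the
return value, so only trailing whitespace is actually removed from `buf`.]

```c
/* Userspace stand-in (illustrative only) for the trigger-file write
 * path's buffer preparation.  strim_model() approximates the kernel's
 * strim(): strip trailing whitespace in place, return a pointer past
 * any leading whitespace. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

static char *strim_model(char *s)
{
	size_t len = strlen(s);

	/* strip trailing whitespace in place */
	while (len && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';

	/* skip (but do not remove) leading whitespace */
	while (*s && isspace((unsigned char)*s))
		s++;
	return s;
}
```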

2013-08-27 23:29:12

by Tom Zanussi

Subject: Re: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

On Tue, 2013-08-27 at 16:01 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 14:40:13 -0500
> Tom Zanussi <[email protected]> wrote:
>
> return;
> > - if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
> > +
> > + /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
> > + ftrace_file = rcu_dereference_raw(tr->enter_syscall_files[syscall_nr]);
>
> What's the reason for using rcu_dereference_raw() and not normal
> rcu_dereference?
>

This is because we know the tracepoint handler runs with rcu_read_lock()
held (rcu_read_lock_held() is true there), so we don't need the
rcu_dereference_check().

Tom

> -- Steve
>
> > + if (!ftrace_file)
> > + return;
> > +
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > return;
> >

2013-08-27 23:32:25

by Tom Zanussi

Subject: Re: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

On Tue, 2013-08-27 at 16:08 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 14:40:13 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > @@ -415,9 +429,14 @@ static void unreg_event_syscall_enter(struct ftrace_event_file *file,
> > return;
> > mutex_lock(&syscall_trace_lock);
> > tr->sys_refcount_enter--;
> > - clear_bit(num, tr->enabled_enter_syscalls);
> > + rcu_assign_pointer(tr->enter_syscall_files[num], NULL);
> > if (!tr->sys_refcount_enter)
> > unregister_trace_sys_enter(ftrace_syscall_enter, tr);
> > + /*
> > + * Callers expect the event to be completely disabled on
> > + * return, so wait for current handlers to finish.
> > + */
> > + synchronize_sched();
> > mutex_unlock(&syscall_trace_lock);
> > }
> >
> > @@ -435,7 +454,7 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
> > if (!tr->sys_refcount_exit)
> > ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
> > if (!ret) {
> > - set_bit(num, tr->enabled_exit_syscalls);
> > + rcu_assign_pointer(tr->exit_syscall_files[num], file);
> > tr->sys_refcount_exit++;
> > }
> > mutex_unlock(&syscall_trace_lock);
> > @@ -453,9 +472,14 @@ static void unreg_event_syscall_exit(struct ftrace_event_file *file,
> > return;
> > mutex_lock(&syscall_trace_lock);
> > tr->sys_refcount_exit--;
> > - clear_bit(num, tr->enabled_exit_syscalls);
> > + rcu_assign_pointer(tr->exit_syscall_files[num], NULL);
> > if (!tr->sys_refcount_exit)
> > unregister_trace_sys_exit(ftrace_syscall_exit, tr);
> > + /*
> > + * Callers expect the event to be completely disabled on
> > + * return, so wait for current handlers to finish.
> > + */
> > + synchronize_sched();
> > mutex_unlock(&syscall_trace_lock);
>
> Can we do the synchronize_sched() after the mutex unlock in these two
> places?
>

Yeah, I think that should be ok and there should be no need to delay
waiters for that mutex - I'll look at moving it out for the next
revision.

Thanks,

Tom

> -- Steve
>
>
> > }
> >
>

2013-08-27 23:39:00

by Tom Zanussi

Subject: Re: [PATCH v7 02/10] tracing: add basic event trigger framework

On Tue, 2013-08-27 at 16:15 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 14:40:14 -0500
> Tom Zanussi <[email protected]> wrote:
>
>
> > Signed-off-by: Tom Zanussi <[email protected]>
> > Idea-by: Steve Rostedt <[email protected]>
> > ---
> > include/linux/ftrace_event.h | 13 +-
> > include/trace/ftrace.h | 4 +
> > kernel/trace/Makefile | 1 +
> > kernel/trace/trace.h | 171 ++++++++++++++++++++++
> > kernel/trace/trace_events.c | 21 ++-
> > kernel/trace/trace_events_trigger.c | 280 ++++++++++++++++++++++++++++++++++++
> > kernel/trace/trace_syscalls.c | 4 +
> > 7 files changed, 488 insertions(+), 6 deletions(-)
> > create mode 100644 kernel/trace/trace_events_trigger.c
> >
> > diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> > index 5eaa746..4b4fa62 100644
> > --- a/include/linux/ftrace_event.h
> > +++ b/include/linux/ftrace_event.h
> > @@ -255,6 +255,7 @@ enum {
> > FTRACE_EVENT_FL_RECORDED_CMD_BIT,
> > FTRACE_EVENT_FL_SOFT_MODE_BIT,
> > FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > + FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > };
> >
> > /*
> > @@ -263,13 +264,15 @@ enum {
> > * RECORDED_CMD - The comms should be recorded at sched_switch
> > * SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
> > * SOFT_DISABLED - When set, do not trace the event (even though its
> > - * tracepoint may be enabled)
> > + * tracepoint may be enabled)
>
> Nitpick... I like the extra space in the indentation. It makes the
> continuation of the line obvious. Otherwise I start reading the next
> line as part of the original line.
>

OK, I can put that back.

>
> > + * TRIGGER_MODE - The event is enabled/disabled by SOFT_DISABLED
> > */
> > enum {
> > FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
> > FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
> > FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
> > FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
> > + FTRACE_EVENT_FL_TRIGGER_MODE = (1 << FTRACE_EVENT_FL_TRIGGER_MODE_BIT),
> > };
> >
> > struct ftrace_event_file {
> > @@ -278,6 +281,7 @@ struct ftrace_event_file {
> > struct dentry *dir;
> > struct trace_array *tr;
> > struct ftrace_subsystem_dir *system;
> > + struct list_head triggers;
> >
> > /*
> > * 32 bit flags:
> > @@ -285,6 +289,7 @@ struct ftrace_event_file {
> > * bit 1: enabled cmd record
> > * bit 2: enable/disable with the soft disable bit
> > * bit 3: soft disabled
> > + * bit 4: trigger enabled
> > *
> > * Note: The bits must be set atomically to prevent races
> > * from other writers. Reads of flags do not need to be in
> > @@ -296,6 +301,7 @@ struct ftrace_event_file {
> > */
> > unsigned long flags;
> > atomic_t sm_ref; /* soft-mode reference counter */
> > + atomic_t tm_ref; /* trigger-mode reference counter */
> > };
> >
> > #define __TRACE_EVENT_FLAGS(name, value) \
> > @@ -310,12 +316,17 @@ struct ftrace_event_file {
> >
> > #define MAX_FILTER_STR_VAL 256 /* Should handle KSYM_SYMBOL_LEN */
> >
> > +enum event_trigger_type {
> > + ETT_NONE = (0),
> > +};
> > +
> > extern void destroy_preds(struct ftrace_event_call *call);
> > extern int filter_match_preds(struct event_filter *filter, void *rec);
> > extern int filter_current_check_discard(struct ring_buffer *buffer,
> > struct ftrace_event_call *call,
> > void *rec,
> > struct ring_buffer_event *event);
> > +extern void event_triggers_call(struct ftrace_event_file *file);
> >
> > enum {
> > FILTER_OTHER = 0,
> > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > index 41a6643..326ba32 100644
> > --- a/include/trace/ftrace.h
> > +++ b/include/trace/ftrace.h
> > @@ -526,6 +526,10 @@ ftrace_raw_event_##call(void *__data, proto) \
> > int __data_size; \
> > int pc; \
> > \
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > + &ftrace_file->flags)) \
> > + event_triggers_call(ftrace_file); \
> > + \
> > if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > &ftrace_file->flags)) \
> > return; \
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index d7e2068..1378e84 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -50,6 +50,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
> > obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
> > endif
> > obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
> > +obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
> > obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
> > obj-$(CONFIG_TRACEPOINTS) += power-traces.o
> > ifeq ($(CONFIG_PM_RUNTIME),y)
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index b1227b9..f8a18e5 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -1016,9 +1016,180 @@ extern void trace_event_enable_cmd_record(bool enable);
> > extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
> > extern int event_trace_del_tracer(struct trace_array *tr);
> >
> > +static inline void *event_file_data(struct file *filp)
> > +{
> > + return ACCESS_ONCE(file_inode(filp)->i_private);
> > +}
> > +
> > extern struct mutex event_mutex;
> > extern struct list_head ftrace_events;
> >
> > +extern const struct file_operations event_trigger_fops;
> > +
> > +extern int register_trigger_cmds(void);
> > +extern void clear_event_triggers(struct trace_array *tr);
> > +
> > +/**
> > + * struct event_trigger_ops - callbacks for trace event triggers
> > + *
> > + * The methods in this structure provide per-event trigger hooks for
> > + * various trigger operations.
> > + *
> > + * All the methods below, except for @init() and @free(), must be
> > + * implemented.
> > + *
> > + * @func: The trigger 'probe' function called when the triggering
> > + * event occurs. The data passed into this callback is the data
> > + * that was supplied to the event_command @reg() function that
> > + * registered the trigger (see struct event_command).
> > + *
> > + * @init: An optional initialization function called for the trigger
> > + * when the trigger is registered (via the event_command reg()
> > + * function). This can be used to perform per-trigger
> > + * initialization such as incrementing a per-trigger reference
> > + * count, for instance. This is usually implemented by the
> > + * generic utility function @event_trigger_init() (see
> > + * trace_event_triggers.c).
> > + *
> > + * @free: An optional de-initialization function called for the
> > + * trigger when the trigger is unregistered (via the
> > + * event_command @reg() function). This can be used to perform
> > + * per-trigger de-initialization such as decrementing a
> > + * per-trigger reference count and freeing corresponding trigger
> > + * data, for instance. This is usually implemented by the
> > + * generic utility function @event_trigger_free() (see
> > + * trace_event_triggers.c).
> > + *
> > + * @print: The callback function invoked to have the trigger print
> > + * itself. This is usually implemented by a wrapper function
> > + * that calls the generic utility function @event_trigger_print()
> > + * (see trace_event_triggers.c).
> > + */
> > +struct event_trigger_ops {
> > + void (*func)(void **data);
>
> Why the void **?
>

This is another vestige of the original patchset, where I tried to keep
the event triggers and the ftrace triggers unified. One of the things
that I attempted to keep common was this struct and the
ftrace_probe_ops, which you can see have essentially the same signatures
for all the functions (minus the ip params). Anyway, since that's not
a goal of this patchset, I'll just simplify and not use the double
pointers.
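
[Editorial note: the simplification Tom describes — ops methods taking the
registered data directly as `void *` rather than the address of the list-walk
variable as `void **` — looks roughly like the sketch below. All names and
fields are illustrative; the init/free pair models the per-trigger reference
counting mentioned in the kernel-doc comments, not the final kernel API.]

```c
/* Sketch of event_trigger_ops with single-pointer data, as proposed
 * in this exchange.  Illustrative only. */
#include <assert.h>

struct trigger_data {
	int hits;   /* times the trigger probe fired */
	int ref;    /* per-trigger reference count */
};

struct trigger_ops {
	void (*func)(void *data);                       /* was void ** */
	int  (*init)(struct trigger_ops *ops, void *data);
	void (*free)(struct trigger_ops *ops, void *data);
};

static void probe_hit(void *data)
{
	((struct trigger_data *)data)->hits++;
}

/* init/free model the generic event_trigger_init()/_free() helpers:
 * take and drop a reference so data outlives in-flight users. */
static int take_ref(struct trigger_ops *ops, void *data)
{
	(void)ops;
	((struct trigger_data *)data)->ref++;
	return 0;
}

static void drop_ref(struct trigger_ops *ops, void *data)
{
	(void)ops;
	((struct trigger_data *)data)->ref--;   /* would free at zero */
}
```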

> > + int (*init)(struct event_trigger_ops *ops,
> > + void **data);
> > + void (*free)(struct event_trigger_ops *ops,
> > + void **data);
> > + int (*print)(struct seq_file *m,
> > + struct event_trigger_ops *ops,
> > + void *data);
> > +};
> > +
> > +/**
> > + * struct event_command - callbacks and data members for event commands
> > + *
> > + * Event commands are invoked by users by writing the command name
> > + * into the 'trigger' file associated with a trace event. The
> > + * parameters associated with a specific invocation of an event
> > + * command are used to create an event trigger instance, which is
> > + * added to the list of trigger instances associated with that trace
> > + * event. When the event is hit, the set of triggers associated with
> > + * that event is invoked.
> > + *
> > + * The data members in this structure provide per-event command data
> > + * for various event commands.
> > + *
> > + * All the data members below, except for @post_trigger, must be set
> > + * for each event command.
> > + *
> > + * @name: The unique name that identifies the event command. This is
> > + * the name used when setting triggers via trigger files.
> > + *
> > + * @trigger_type: A unique id that identifies the event command
> > + * 'type'. This value has two purposes, the first to ensure that
> > + * only one trigger of the same type can be set at a given time
> > + * for a particular event e.g. it doesn't make sense to have both
> > + * a traceon and traceoff trigger attached to a single event at
> > + * the same time, so traceon and traceoff have the same type
> > + * though they have different names. The @trigger_type value is
>
> Heh, I didn't bother with that in the ftrace triggers. If the root user
> wants to start and stop tracing on the same function, so be it. They
> get what they ask for.
>
> I'm not against this, I was just pointing out how you care more than I
> do about keeping root from stepping on his/her own toes ;-)
>
> > + * also used as a bit value for deferring the actual trigger
> > + * action until after the current event is finished. Some
> > + * commands need to do this if they themselves log to the trace
> > + * buffer (see the @post_trigger() member below). @trigger_type
> > + * values are defined by adding new values to the trigger_type
> > + * enum in include/linux/ftrace_event.h.
> > + *
> > + * @post_trigger: A flag that says whether or not this command needs
> > + * to have its action delayed until after the current event has
> > + * been closed. Some triggers need to avoid being invoked while
> > + * an event is currently in the process of being logged, since
> > + * the trigger may itself log data into the trace buffer. Thus
> > + * we make sure the current event is committed before invoking
> > + * those triggers. To do that, the trigger invocation is split
> > + * in two - the first part checks the filter using the current
> > + * trace record; if a command has the @post_trigger flag set, it
> > + * sets a bit for itself in the return value, otherwise it
> > + * directly invokes the trigger. Once all commands have been
> > + * either invoked or set their return flag, the current record is
> > + * either committed or discarded. At that point, if any commands
> > + * have deferred their triggers, those commands are finally
> > + * invoked following the close of the current event. In other
> > + * words, if the event_trigger_ops @func() probe implementation
> > + * itself logs to the trace buffer, this flag should be set,
> > + * otherwise it can be left unspecified.
>
> Is this just for consistency of output layout? So the normal event gets
> printed first before this prints anything, like a stack trace?
>

Yeah, that's one of the reasons - it makes sense to me to always have
the triggering event print first, but also, this is essentially the
implementation of the post triggers we discussed here:

https://lkml.org/lkml/2013/6/22/37

From there, basically the problem is that the trace_recursive_lock()
check in ring_buffer_lock_reserve() prevents a trigger that itself logs
to the ring buffer from reserving a slot in the buffer, since it's being
done from the same context as the triggering event. The stacktrace
trigger, for example, calls trace_dump_stack().

> > + *
> > + * All the methods below, except for @set_filter(), must be
> > + * implemented.
> > + *
> > + * @func: The callback function responsible for parsing and
> > + * registering the trigger written to the 'trigger' file by the
> > + * user. It allocates the trigger instance and registers it with
> > + * the appropriate trace event. It makes use of the other
> > + * event_command callback functions to orchestrate this, and is
> > + * usually implemented by the generic utility function
> > + * @event_trigger_callback() (see trace_event_triggers.c).
> > + *
> > + * @reg: Adds the trigger to the list of triggers associated with the
> > + * event, and enables the event trigger itself, after
> > + * initializing it (via the event_trigger_ops @init() function).
> > + * This is also where commands can use the @trigger_type value to
> > + * make the decision as to whether or not multiple instances of
> > + * the trigger should be allowed. This is usually implemented by
> > + * the generic utility function @register_trigger() (see
> > + * trace_event_triggers.c).
> > + *
> > + * @unreg: Removes the trigger from the list of triggers associated
> > + * with the event, and disables the event trigger itself, after
> > + * initializing it (via the event_trigger_ops @free() function).
> > + * This is usually implemented by the generic utility function
> > + * @unregister_trigger() (see trace_event_triggers.c).
> > + *
> > + * @set_filter: An optional function called to parse and set a filter
> > + * for the trigger. If no @set_filter() method is set for the
> > + * event command, filters set by the user for the command will be
> > + * ignored. This is usually implemented by the generic utility
> > + * function @set_trigger_filter() (see trace_event_triggers.c).
> > + *
> > + * @get_trigger_ops: The callback function invoked to retrieve the
> > + * event_trigger_ops implementation associated with the command.
> > + */
> > +struct event_command {
> > + struct list_head list;
> > + char *name;
> > + enum event_trigger_type trigger_type;
> > + bool post_trigger;
> > + int (*func)(struct event_command *cmd_ops,
> > + struct ftrace_event_file *file,
> > + char *glob, char *cmd,
> > + char *params, int enable);
> > + int (*reg)(char *glob,
> > + struct event_trigger_ops *trigger_ops,
> > + void *trigger_data,
> > + struct ftrace_event_file *file);
> > + void (*unreg)(char *glob,
> > + struct event_trigger_ops *trigger_ops,
> > + void *trigger_data,
> > + struct ftrace_event_file *file);
> > + int (*set_filter)(char *filter_str,
> > + void *trigger_data,
> > + struct ftrace_event_file *file);
> > + struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param);
> > +};
> > +
> > +extern int trace_event_enable_disable(struct ftrace_event_file *file,
> > + int enable, int soft_disable);
> > +
> > extern const char *__start___trace_bprintk_fmt[];
> > extern const char *__stop___trace_bprintk_fmt[];
> >
> > diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> > index 368a4d5..7d8eb8a 100644
> > --- a/kernel/trace/trace_events.c
> > +++ b/kernel/trace/trace_events.c
> > @@ -342,6 +342,12 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> > return ret;
> > }
> >
> > +int trace_event_enable_disable(struct ftrace_event_file *file,
> > + int enable, int soft_disable)
> > +{
> > + return __ftrace_event_enable_disable(file, enable, soft_disable);
> > +}
> > +
> > static int ftrace_event_enable_disable(struct ftrace_event_file *file,
> > int enable)
> > {
> > @@ -421,11 +427,6 @@ static void remove_subsystem(struct ftrace_subsystem_dir *dir)
> > }
> > }
> >
> > -static void *event_file_data(struct file *filp)
> > -{
> > - return ACCESS_ONCE(file_inode(filp)->i_private);
> > -}
> > -
> > static void remove_event_file_dir(struct ftrace_event_file *file)
> > {
> > struct dentry *dir = file->dir;
> > @@ -1542,6 +1543,9 @@ event_create_dir(struct dentry *parent, struct ftrace_event_file *file)
> > trace_create_file("filter", 0644, file->dir, call,
> > &ftrace_event_filter_fops);
> >
> > + trace_create_file("trigger", 0644, file->dir, file,
> > + &event_trigger_fops);
> > +
> > trace_create_file("format", 0444, file->dir, call,
> > &ftrace_event_format_fops);
> >
> > @@ -1637,6 +1641,8 @@ trace_create_new_event(struct ftrace_event_call *call,
> > file->event_call = call;
> > file->tr = tr;
> > atomic_set(&file->sm_ref, 0);
> > + atomic_set(&file->tm_ref, 0);
> > + INIT_LIST_HEAD(&file->triggers);
> > list_add(&file->list, &tr->events);
> >
> > return file;
> > @@ -2303,6 +2309,9 @@ int event_trace_del_tracer(struct trace_array *tr)
> > {
> > mutex_lock(&event_mutex);
> >
> > + /* Disable any event triggers and associated soft-disabled events */
> > + clear_event_triggers(tr);
> > +
> > /* Disable any running events */
> > __ftrace_set_clr_event_nolock(tr, NULL, NULL, NULL, 0);
> >
> > @@ -2366,6 +2375,8 @@ static __init int event_trace_enable(void)
> >
> > register_event_cmds();
> >
> > + register_trigger_cmds();
> > +
> > return 0;
> > }
> >
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > new file mode 100644
> > index 0000000..5ec8336
> > --- /dev/null
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -0,0 +1,280 @@
> > +/*
> > + * trace_events_trigger - trace event triggers
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright (C) 2013 Tom Zanussi <[email protected]>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/ctype.h>
> > +#include <linux/mutex.h>
> > +#include <linux/slab.h>
> > +
> > +#include "trace.h"
> > +
> > +static LIST_HEAD(trigger_commands);
> > +static DEFINE_MUTEX(trigger_cmd_mutex);
> > +
> > +struct event_trigger_data {
> > + struct ftrace_event_file *file;
> > + unsigned long count;
> > + int ref;
> > + bool enable;
> > + struct event_trigger_ops *ops;
> > + struct event_command *cmd_ops;
> > + struct event_filter *filter;
> > + char *filter_str;
> > + struct list_head list;
> > +};
> > +
> > +void event_triggers_call(struct ftrace_event_file *file)
> > +{
> > + struct event_trigger_data *data;
> > +
> > + if (list_empty(&file->triggers))
> > + return;
> > +
> > + preempt_disable_notrace();
> > + list_for_each_entry_rcu(data, &file->triggers, list)
> > + data->ops->func((void **)&data);
>
> Why is this a void ** and why are you passing the address of the
> variable that holds the pointer to the data that was registered and not
> the pointer to the data itself?
>
> Do you plan on modifying the local data variable on return?
>

No, as above, another vestige of the original patchset, where I tried to
keep the event triggers and the ftrace triggers unified. It'll be gone
in the next revision.

> > + preempt_enable_notrace();
> > +}
> > +EXPORT_SYMBOL_GPL(event_triggers_call);
> > +
> > +static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
> > +{
> > + struct ftrace_event_file *event_file = event_file_data(m->private);
> > +
> > + return seq_list_next(t, &event_file->triggers, pos);
> > +}
> > +
> > +static void *trigger_start(struct seq_file *m, loff_t *pos)
> > +{
> > + struct ftrace_event_file *event_file;
> > +
> > + /* ->stop() is called even if ->start() fails */
> > + mutex_lock(&event_mutex);
> > + event_file = event_file_data(m->private);
> > + if (unlikely(!event_file))
> > + return ERR_PTR(-ENODEV);
> > +
> > + return seq_list_start(&event_file->triggers, *pos);
> > +}
> > +
> > +static void trigger_stop(struct seq_file *m, void *t)
> > +{
> > + mutex_unlock(&event_mutex);
> > +}
> > +
> > +static int trigger_show(struct seq_file *m, void *v)
> > +{
> > + struct event_trigger_data *data;
> > +
> > + data = list_entry(v, struct event_trigger_data, list);
> > + data->ops->print(m, data->ops, data);
> > +
> > + return 0;
> > +}
> > +
> > +static const struct seq_operations event_triggers_seq_ops = {
> > + .start = trigger_start,
> > + .next = trigger_next,
> > + .stop = trigger_stop,
> > + .show = trigger_show,
> > +};
> > +
> > +static int event_trigger_regex_open(struct inode *inode, struct file *file)
> > +{
> > + int ret = 0;
> > +
> > + mutex_lock(&event_mutex);
> > +
> > + if (unlikely(!event_file_data(file))) {
> > + mutex_unlock(&event_mutex);
> > + return -ENODEV;
> > + }
> > +
> > + if (file->f_mode & FMODE_READ) {
> > + ret = seq_open(file, &event_triggers_seq_ops);
> > + if (!ret) {
> > + struct seq_file *m = file->private_data;
> > + m->private = file;
> > + }
> > + }
> > +
> > + mutex_unlock(&event_mutex);
> > +
> > + return ret;
> > +}
> > +
> > +static int trigger_process_regex(struct ftrace_event_file *file,
> > + char *buff, int enable)
> > +{
> > + char *command, *next = buff;
> > + struct event_command *p;
> > + int ret = -EINVAL;
> > +
> > + command = strsep(&next, ": \t");
> > + command = (command[0] != '!') ? command : command + 1;
> > +
> > + mutex_lock(&trigger_cmd_mutex);
> > + list_for_each_entry(p, &trigger_commands, list) {
> > + if (strcmp(p->name, command) == 0) {
> > + ret = p->func(p, file, buff, command, next, enable);
> > + goto out_unlock;
> > + }
> > + }
> > + out_unlock:
> > + mutex_unlock(&trigger_cmd_mutex);
> > +
> > + return ret;
> > +}
> > +
> > +static ssize_t event_trigger_regex_write(struct file *file,
> > + const char __user *ubuf,
> > + size_t cnt, loff_t *ppos, int enable)
> > +{
> > + struct ftrace_event_file *event_file;
> > + ssize_t ret;
> > + char *buf;
> > +
> > + if (!cnt)
> > + return 0;
> > +
> > + if (cnt >= PAGE_SIZE)
> > + return -EINVAL;
> > +
> > + buf = (char *)__get_free_page(GFP_TEMPORARY);
> > + if (!buf)
> > + return -ENOMEM;
> > +
> > + if (copy_from_user(buf, ubuf, cnt)) {
> > + free_page((unsigned long) buf);
> > + return -EFAULT;
> > + }
> > + buf[cnt] = '\0';
> > + strim(buf);
> > +
> > + mutex_lock(&event_mutex);
> > + event_file = event_file_data(file);
> > + if (unlikely(!event_file)) {
> > + mutex_unlock(&event_mutex);
> > + free_page((unsigned long) buf);
> > + return -ENODEV;
> > + }
> > + ret = trigger_process_regex(event_file, buf, enable);
> > + mutex_unlock(&event_mutex);
> > +
> > + free_page((unsigned long) buf);
> > + if (ret < 0)
> > + goto out;
> > +
> > + *ppos += cnt;
> > + ret = cnt;
> > + out:
> > + return ret;
> > +}
> > +
> > +static int event_trigger_regex_release(struct inode *inode, struct file *file)
> > +{
> > + mutex_lock(&event_mutex);
> > +
> > + if (file->f_mode & FMODE_READ)
> > + seq_release(inode, file);
> > +
> > + mutex_unlock(&event_mutex);
> > +
> > + return 0;
> > +}
> > +
> > +static ssize_t
> > +event_trigger_write(struct file *filp, const char __user *ubuf,
> > + size_t cnt, loff_t *ppos)
> > +{
> > + return event_trigger_regex_write(filp, ubuf, cnt, ppos, 1);
> > +}
> > +
> > +static int
> > +event_trigger_open(struct inode *inode, struct file *filp)
> > +{
> > + return event_trigger_regex_open(inode, filp);
> > +}
> > +
> > +static int
> > +event_trigger_release(struct inode *inode, struct file *file)
> > +{
> > + return event_trigger_regex_release(inode, file);
> > +}
> > +
> > +const struct file_operations event_trigger_fops = {
> > + .open = event_trigger_open,
> > + .read = seq_read,
> > + .write = event_trigger_write,
> > + .llseek = ftrace_filter_lseek,
> > + .release = event_trigger_release,
> > +};
> > +
> > +static int trace_event_trigger_enable_disable(struct ftrace_event_file *file,
> > + int trigger_enable)
> > +{
> > + int ret = 0;
> > +
> > + if (trigger_enable) {
> > + if (atomic_inc_return(&file->tm_ref) > 1)
> > + return ret;
> > + set_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> > + ret = trace_event_enable_disable(file, 1, 1);
> > + } else {
> > + if (atomic_dec_return(&file->tm_ref) > 0)
> > + return ret;
> > + clear_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> > + ret = trace_event_enable_disable(file, 0, 1);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * clear_event_triggers - clear all triggers associated with a trace array.
> > + *
> > + * For each trigger, the triggering event has its tm_ref decremented
> > + * via trace_event_trigger_enable_disable(), and any associated event
> > + * (in the case of enable/disable_event triggers) will have its sm_ref
> > + * decremented via free()->trace_event_enable_disable(). That
> > + * combination effectively reverses the soft-mode/trigger state added
> > + * by trigger registration.
> > + *
> > + * Must be called with event_mutex held.
> > + */
> > +void
> > +clear_event_triggers(struct trace_array *tr)
> > +{
> > + struct ftrace_event_file *file;
> > +
> > + list_for_each_entry(file, &tr->events, list) {
> > + struct event_trigger_data *data;
> > + list_for_each_entry_rcu(data, &file->triggers, list) {
> > + trace_event_trigger_enable_disable(file, 0);
> > + if (data->ops->free)
> > + data->ops->free(data->ops, (void **)&data);
>
> ditto here.
>

Same here, that'll get fixed up too, as above.

Thanks,

Tom

> -- Steve
>
> > + }
> > + }
> > +}
> > +
> > +__init int register_trigger_cmds(void)
> > +{
> > + return 0;
> > +}
> > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> > index 230cdb6..4f56d54 100644
> > --- a/kernel/trace/trace_syscalls.c
> > +++ b/kernel/trace/trace_syscalls.c
> > @@ -321,6 +321,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > if (!ftrace_file)
> > return;
> >
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > + event_triggers_call(ftrace_file);
> > if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > return;
> >
> > @@ -370,6 +372,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > if (!ftrace_file)
> > return;
> >
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > + event_triggers_call(ftrace_file);
> > if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > return;
> >
>

2013-08-27 23:41:15

by Steven Rostedt

Subject: Re: [PATCH v7 01/10] tracing: Add support for SOFT_DISABLE to syscall events

On Tue, 27 Aug 2013 18:29:06 -0500
Tom Zanussi <[email protected]> wrote:

> On Tue, 2013-08-27 at 16:01 -0400, Steven Rostedt wrote:
> > On Tue, 27 Aug 2013 14:40:13 -0500
> > Tom Zanussi <[email protected]> wrote:
> >
> > return;
> > > - if (!test_bit(syscall_nr, tr->enabled_enter_syscalls))
> > > +
> > > + /* Here we're inside the tp handler's rcu_read_lock (__DO_TRACE()) */
> > > + ftrace_file = rcu_dereference_raw(tr->enter_syscall_files[syscall_nr]);
> >
> > What's the reason for using rcu_dereference_raw() and not normal
> > rcu_dereference?
> >
>
> This is because we know the tracepoint handler has rcu_read_lock_held()
> and so we don't need to do the rcu_dereference_check().
>

That's not the point of raw(). What happens if in the future we modify
the code and this is called without rcu_read_lock held? Then we just
took away the debug check. Please do not circumvent any debug unless
there's a reason for it. The function tracer does circumvent these
checks because the debug checks can cause the function tracer to hang
the box. Kind of killing the point of debug checks ;-)

Anyway, this should be rcu_dereference(), not rcu_dereference_raw().

-- Steve

Subject: Re: [PATCH v7 00/10] tracing: trace event triggers (repost)

(2013/08/28 4:40), Tom Zanussi wrote:
> This is a repost of the v7 patchset - I inadvertently used the wrong
> branch in the previous posting, thought the branch URL was correct in
> both cases..

Ah, I directly pulled v7 from your git repository for test & review...

Thanks,

--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2013-08-28 16:38:23

by Steven Rostedt

Subject: Re: [PATCH v7 07/10] tracing: add and use generic set_trigger_filter() implementation

On Tue, 27 Aug 2013 14:40:19 -0500
Tom Zanussi <[email protected]> wrote:

> enum {
> FILTER_OTHER = 0,
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 326ba32..6c701c3 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -412,13 +412,15 @@ static inline notrace int ftrace_get_offsets_##call( \
> * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
> * struct ring_buffer_event *event;
> * struct ftrace_raw_<call> *entry; <-- defined in stage 1
> + * enum event_trigger_type __tt = ETT_NONE;
> * struct ring_buffer *buffer;
> * unsigned long irq_flags;
> * int __data_size;
> * int pc;
> *
> - * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> - * &ftrace_file->flags))
> + * if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED |
> + * FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> + * FTRACE_EVENT_FL_SOFT_DISABLED)

Don't worry too much about 80 character limit here. Move the
FL_SOFT_DISABLED up.

> * return;
> *
> * local_save_flags(irq_flags);
> @@ -437,9 +439,19 @@ static inline notrace int ftrace_get_offsets_##call( \
> * { <assign>; } <-- Here we assign the entries by the __field and
> * __array macros.
> *
> - * if (!filter_current_check_discard(buffer, event_call, entry, event))
> - * trace_nowake_buffer_unlock_commit(buffer,
> - * event, irq_flags, pc);
> + * if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> + * &ftrace_file->flags))
> + * __tt = event_triggers_call(ftrace_file, entry);
> + *
> + * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> + * &ftrace_file->flags))
> + * ring_buffer_discard_commit(buffer, event);
> + * else if (!filter_current_check_discard(buffer, event_call,
> + * entry, event))
> + * trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> + *
> + * if (__tt)
> + * event_triggers_post_call(ftrace_file, __tt);
> * }
> *
> * static struct trace_event ftrace_event_type_<call> = {
> @@ -521,17 +533,15 @@ ftrace_raw_event_##call(void *__data, proto) \
> struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> struct ring_buffer_event *event; \
> struct ftrace_raw_##call *entry; \
> + enum event_trigger_type __tt = ETT_NONE; \
> struct ring_buffer *buffer; \
> unsigned long irq_flags; \
> int __data_size; \
> int pc; \
> \
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> - &ftrace_file->flags)) \
> - event_triggers_call(ftrace_file); \
> - \
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> - &ftrace_file->flags)) \
> + if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED | \
> + FTRACE_EVENT_FL_TRIGGER_MODE)) == \
> + FTRACE_EVENT_FL_SOFT_DISABLED) \

Ditto.

Also, I don't think we need to worry about the flags changing, so we
should be able to just save it.

unsigned long eflags = ftrace_file->flags;

And then we can delay the event triggers only if they have a
condition. What about adding a FTRACE_EVENT_FL_TRIGGER_COND_BIT, and doing:

if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
	if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
		event_triggers_call(ftrace_file, NULL);

	if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
		return;
}

> return; \
> \
> local_save_flags(irq_flags); \
> @@ -551,8 +561,19 @@ ftrace_raw_event_##call(void *__data, proto) \
> \
> { assign; } \
> \
> - if (!filter_current_check_discard(buffer, event_call, entry, event)) \
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> + &ftrace_file->flags)) \
> + __tt = event_triggers_call(ftrace_file, entry); \

Then here we test just the TRIGGER_COND bit.


> + \
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> + &ftrace_file->flags)) \
> + ring_buffer_discard_commit(buffer, event); \
> + else if (!filter_current_check_discard(buffer, event_call, \
> + entry, event)) \
> trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> + \
> + if (__tt) \
> + event_triggers_post_call(ftrace_file, __tt); \

This part is fine.

> }
> /*
> * The ftrace_test_probe is compiled out, it is only here as a build time check
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 3cb846e..af5f3b6 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -994,6 +994,10 @@ extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
> extern void print_subsystem_event_filter(struct event_subsystem *system,
> struct trace_seq *s);
> extern int filter_assign_type(const char *type);
> +extern int create_event_filter(struct ftrace_event_call *call,
> + char *filter_str, bool set_str,
> + struct event_filter **filterp);
> +extern void free_event_filter(struct event_filter *filter);
>
> struct ftrace_event_field *
> trace_find_event_field(struct ftrace_event_call *call, char *name);
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 97daa8c..0c45aa1 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -781,6 +781,11 @@ static void __free_filter(struct event_filter *filter)
> kfree(filter);
> }
>
> +void free_event_filter(struct event_filter *filter)
> +{
> + __free_filter(filter);
> +}
> +
> /*
> * Called when destroying the ftrace_event_call.
> * The call is being freed, so we do not need to worry about
> @@ -1806,6 +1811,14 @@ static int create_filter(struct ftrace_event_call *call,
> return err;
> }
>
> +int create_event_filter(struct ftrace_event_call *call,
> + char *filter_str, bool set_str,
> + struct event_filter **filterp)
> +{
> + return create_filter(call, filter_str, set_str, filterp);
> +}
> +
> +
> /**
> * create_system_filter - create a filter for an event_subsystem
> * @system: event_subsystem to create a filter for
> diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> index 54678e2..b5e7ca7 100644
> --- a/kernel/trace/trace_events_trigger.c
> +++ b/kernel/trace/trace_events_trigger.c
> @@ -43,24 +43,53 @@ struct event_trigger_data {
> static void
> trigger_data_free(struct event_trigger_data *data)
> {
> + if (data->cmd_ops->set_filter)
> + data->cmd_ops->set_filter(NULL, data, NULL);
> +
> synchronize_sched(); /* make sure current triggers exit before free */
> kfree(data);
> }
>
> -void event_triggers_call(struct ftrace_event_file *file)
> +enum event_trigger_type
> +event_triggers_call(struct ftrace_event_file *file, void *rec)
> {
> struct event_trigger_data *data;
> + enum event_trigger_type tt = ETT_NONE;
>
> if (list_empty(&file->triggers))
> - return;
> + return tt;
>
> preempt_disable_notrace();
> - list_for_each_entry_rcu(data, &file->triggers, list)
> + list_for_each_entry_rcu(data, &file->triggers, list) {
> + if (data->filter && !filter_match_preds(data->filter, rec))

We would need a check for !rec, just to be safe (with the mods I talked
about).

> + continue;
> + if (data->cmd_ops->post_trigger) {
> + tt |= data->cmd_ops->trigger_type;
> + continue;
> + }
> data->ops->func((void **)&data);
> + }
> preempt_enable_notrace();
> +
> + return tt;
> }
> EXPORT_SYMBOL_GPL(event_triggers_call);
>
> +void
> +event_triggers_post_call(struct ftrace_event_file *file,
> + enum event_trigger_type tt)
> +{
> + struct event_trigger_data *data;
> +
> + preempt_disable_notrace();
> + list_for_each_entry_rcu(data, &file->triggers, list) {
> + if (data->cmd_ops->trigger_type & tt)
> + data->ops->func((void **)&data);
> + }
> + preempt_enable_notrace();
> +}
> +EXPORT_SYMBOL_GPL(event_triggers_post_call);
> +
> static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
> {
> struct ftrace_event_file *event_file = event_file_data(m->private);
> @@ -561,6 +590,66 @@ event_trigger_callback(struct event_command *cmd_ops,
> goto out;
> }
>
> +/**
> + * set_trigger_filter - generic event_command @set_filter
> + * implementation
> + *
> + * Common implementation for event command filter parsing and filter
> + * instantiation.
> + *
> + * Usually used directly as the @set_filter method in event command
> + * implementations.
> + *
> + * Also used to remove a filter (if filter_str = NULL).
> + */
> +static int set_trigger_filter(char *filter_str, void *trigger_data,
> + struct ftrace_event_file *file)
> +{
> + struct event_trigger_data *data = trigger_data;
> + struct event_filter *filter = NULL, *tmp;
> + int ret = -EINVAL;
> + char *s;
> +
> + if (!filter_str) /* clear the current filter */
> + goto assign;
> +
> + s = strsep(&filter_str, " \t");
> +
> + if (!strlen(s) || strcmp(s, "if") != 0)
> + goto out;
> +
> + if (!filter_str)
> + goto out;
> +
> + /* The filter is for the 'trigger' event, not the triggered event */
> + ret = create_event_filter(file->event_call, filter_str, false, &filter);
> + if (ret)
> + goto out;
> + assign:
> + tmp = data->filter;
> +
> + rcu_assign_pointer(data->filter, filter);
> +
> + if (tmp) {
> + /* Make sure the call is done with the filter */
> + synchronize_sched();
> + free_event_filter(tmp);
> + }
> +
> + kfree(data->filter_str);
> +
> + if (filter_str) {
> + data->filter_str = kstrdup(filter_str, GFP_KERNEL);
> + if (!data->filter_str) {
> + free_event_filter(data->filter);
> + data->filter = NULL;
> + ret = -ENOMEM;
> + }
> + }
> + out:
> + return ret;
> +}
> +
> static void
> traceon_trigger(void **_data)
> {
> @@ -698,6 +787,7 @@ static struct event_command trigger_traceon_cmd = {
> .reg = register_trigger,
> .unreg = unregister_trigger,
> .get_trigger_ops = onoff_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> static struct event_command trigger_traceoff_cmd = {
> @@ -707,6 +797,7 @@ static struct event_command trigger_traceoff_cmd = {
> .reg = register_trigger,
> .unreg = unregister_trigger,
> .get_trigger_ops = onoff_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> static void
> @@ -788,6 +879,7 @@ static struct event_command trigger_snapshot_cmd = {
> .reg = register_snapshot_trigger,
> .unreg = unregister_trigger,
> .get_trigger_ops = snapshot_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> /*
> @@ -867,6 +959,7 @@ static struct event_command trigger_stacktrace_cmd = {
> .reg = register_trigger,
> .unreg = unregister_trigger,
> .get_trigger_ops = stacktrace_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> static __init void unregister_trigger_traceon_traceoff_cmds(void)
> @@ -1194,6 +1287,7 @@ static struct event_command trigger_enable_cmd = {
> .reg = event_enable_register_trigger,
> .unreg = event_enable_unregister_trigger,
> .get_trigger_ops = event_enable_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> static struct event_command trigger_disable_cmd = {
> @@ -1203,6 +1297,7 @@ static struct event_command trigger_disable_cmd = {
> .reg = event_enable_register_trigger,
> .unreg = event_enable_unregister_trigger,
> .get_trigger_ops = event_enable_get_trigger_ops,
> + .set_filter = set_trigger_filter,
> };
>
> static __init void unregister_trigger_enable_disable_cmds(void)
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 4f56d54..84cdbce 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -306,6 +306,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> struct syscall_trace_enter *entry;
> struct syscall_metadata *sys_data;
> struct ring_buffer_event *event;
> + enum event_trigger_type __tt = ETT_NONE;
> struct ring_buffer *buffer;
> unsigned long irq_flags;
> int pc;
> @@ -321,9 +322,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> if (!ftrace_file)
> return;
>
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> - event_triggers_call(ftrace_file);
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> + if ((ftrace_file->flags &
> + (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> + FTRACE_EVENT_FL_SOFT_DISABLED)

We would need the same changes here too.

-- Steve

> return;
>
> sys_data = syscall_nr_to_meta(syscall_nr);
> @@ -345,10 +346,17 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> entry->nr = syscall_nr;
> syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
>
> - if (!filter_current_check_discard(buffer, sys_data->enter_event,
> - entry, event))
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> + __tt = event_triggers_call(ftrace_file, entry);
> +
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> + ring_buffer_discard_commit(buffer, event);
> + else if (!filter_current_check_discard(buffer, sys_data->enter_event,
> + entry, event))
> trace_current_buffer_unlock_commit(buffer, event,
> irq_flags, pc);
> + if (__tt)
> + event_triggers_post_call(ftrace_file, __tt);
> }
>
> static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> @@ -358,6 +366,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> struct syscall_trace_exit *entry;
> struct syscall_metadata *sys_data;
> struct ring_buffer_event *event;
> + enum event_trigger_type __tt = ETT_NONE;
> struct ring_buffer *buffer;
> unsigned long irq_flags;
> int pc;
> @@ -372,9 +381,9 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> if (!ftrace_file)
> return;
>
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> - event_triggers_call(ftrace_file);
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> + if ((ftrace_file->flags &
> + (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> + FTRACE_EVENT_FL_SOFT_DISABLED)
> return;
>
> sys_data = syscall_nr_to_meta(syscall_nr);
> @@ -395,10 +404,17 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> entry->nr = syscall_nr;
> entry->ret = syscall_get_return_value(current, regs);
>
> - if (!filter_current_check_discard(buffer, sys_data->exit_event,
> - entry, event))
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> + __tt = event_triggers_call(ftrace_file, entry);
> +
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> + ring_buffer_discard_commit(buffer, event);
> + else if (!filter_current_check_discard(buffer, sys_data->exit_event,
> + entry, event))
> trace_current_buffer_unlock_commit(buffer, event,
> irq_flags, pc);
> + if (__tt)
> + event_triggers_post_call(ftrace_file, __tt);
> }
>
> static int reg_event_syscall_enter(struct ftrace_event_file *file,

2013-08-28 18:00:37

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v7 08/10] tracing: update event filters for multibuffer

On Tue, 27 Aug 2013 14:40:20 -0500
Tom Zanussi <[email protected]> wrote:


> extern int filter_current_check_discard(struct ring_buffer *buffer,
> @@ -336,6 +350,20 @@ extern enum event_trigger_type event_triggers_call(struct ftrace_event_file *fil
> extern void event_triggers_post_call(struct ftrace_event_file *file,
> enum event_trigger_type tt);
>
> +static inline int
> +filter_check_discard(struct ftrace_event_file *file, void *rec,
> + struct ring_buffer *buffer,
> + struct ring_buffer_event *event)
> +{
> + if (unlikely(file->flags & FTRACE_EVENT_FL_FILTERED) &&
> + !filter_match_preds(file->filter, rec)) {
> + ring_buffer_discard_commit(buffer, event);
> + return 1;
> + }
> +
> + return 0;
> +}
> +

Keep this as a function, don't make it inlined. Note, anything
added to ftrace_raw_event_##call increases the kernel by quite a bit.
We have almost 1000 tracepoints, which means we have 1000 versions of
this function. If anything, we need to remove code from it, not add to
it.
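The code-size concern comes from how `TRACE_EVENT()` works: each tracepoint's macro expansion stamps out its own `ftrace_raw_event_##call` function, so a `static inline` helper called from that body is duplicated into every expansion, while an out-of-line function exists once. A toy model of the expansion (macro and flag value are illustrative):

```c
#include <assert.h>

/* If this is inline, its body is duplicated into every function the
 * macro below generates -- roughly 1000 times in a real kernel image.
 * As an out-of-line function it would exist exactly once. */
static inline int helper_filtered(unsigned long flags)
{
	return (flags & 0x4UL) != 0;
}

/* Toy stand-in for the TRACE_EVENT()/ftrace_raw_event_##call expansion:
 * each use defines a separate per-event function. */
#define DEFINE_RAW_EVENT(name)						\
	static int ftrace_raw_event_##name(unsigned long flags)	\
	{								\
		return helper_filtered(flags);				\
	}

DEFINE_RAW_EVENT(sched_switch)
DEFINE_RAW_EVENT(sys_enter)
```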

> enum {
> FILTER_OTHER = 0,
> FILTER_STATIC_STRING,
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 6c701c3..0de03fd 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -446,8 +446,7 @@ static inline notrace int ftrace_get_offsets_##call( \
> * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> * &ftrace_file->flags))
> * ring_buffer_discard_commit(buffer, event);
> - * else if (!filter_current_check_discard(buffer, event_call,
> - * entry, event))
> + * else if (!filter_check_discard(ftrace_file, entry, buffer, event))
> * trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> *
> * if (__tt)
> @@ -568,8 +567,7 @@ ftrace_raw_event_##call(void *__data, proto) \
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> &ftrace_file->flags)) \
> ring_buffer_discard_commit(buffer, event); \
> - else if (!filter_current_check_discard(buffer, event_call, \
> - entry, event)) \
> + else if (!filter_check_discard(ftrace_file, entry, buffer, event)) \
> trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> \
> if (__tt) \
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 5a61dbe..2aabd34 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -235,14 +235,6 @@ void trace_array_put(struct trace_array *this_tr)
> mutex_unlock(&trace_types_lock);
> }
>
> -int filter_current_check_discard(struct ring_buffer *buffer,
> - struct ftrace_event_call *call, void *rec,
> - struct ring_buffer_event *event)
> -{
> - return filter_check_discard(call, rec, buffer, event);
> -}
> -EXPORT_SYMBOL_GPL(filter_current_check_discard);
> -
> cycle_t buffer_ftrace_now(struct trace_buffer *buf, int cpu)
> {
> u64 ts;
> @@ -1630,7 +1622,7 @@ trace_function(struct trace_array *tr,
> entry->ip = ip;
> entry->parent_ip = parent_ip;
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
> }
>
> @@ -1714,7 +1706,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
>
> entry->size = trace.nr_entries;
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
>
> out:
> @@ -1816,7 +1808,7 @@ ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc)
> trace.entries = entry->caller;
>
> save_stack_trace_user(&trace);
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
>
> out_drop_count:
> @@ -2008,7 +2000,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
> entry->fmt = fmt;
>
> memcpy(entry->buf, tbuffer, sizeof(u32) * len);
> - if (!filter_check_discard(call, entry, buffer, event)) {
> + if (!call_filter_check_discard(call, entry, buffer, event)) {
> __buffer_unlock_commit(buffer, event);
> ftrace_trace_stack(buffer, flags, 6, pc);
> }
> @@ -2063,7 +2055,7 @@ __trace_array_vprintk(struct ring_buffer *buffer,
>
> memcpy(&entry->buf, tbuffer, len);
> entry->buf[len] = '\0';
> - if (!filter_check_discard(call, entry, buffer, event)) {
> + if (!call_filter_check_discard(call, entry, buffer, event)) {
> __buffer_unlock_commit(buffer, event);
> ftrace_trace_stack(buffer, flags, 6, pc);
> }
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index af5f3b6..a588ca8 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -985,9 +985,9 @@ struct filter_pred {
>
> extern enum regex_type
> filter_parse_regex(char *buff, int len, char **search, int *not);
> -extern void print_event_filter(struct ftrace_event_call *call,
> +extern void print_event_filter(struct ftrace_event_file *file,
> struct trace_seq *s);
> -extern int apply_event_filter(struct ftrace_event_call *call,
> +extern int apply_event_filter(struct ftrace_event_file *file,
> char *filter_string);
> extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
> char *filter_string);
> @@ -1003,9 +1003,9 @@ struct ftrace_event_field *
> trace_find_event_field(struct ftrace_event_call *call, char *name);
>
> static inline int
> -filter_check_discard(struct ftrace_event_call *call, void *rec,
> - struct ring_buffer *buffer,
> - struct ring_buffer_event *event)
> +call_filter_check_discard(struct ftrace_event_call *call, void *rec,
> + struct ring_buffer *buffer,
> + struct ring_buffer_event *event)
> {
> if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) &&
> !filter_match_preds(call->filter, rec)) {
> diff --git a/kernel/trace/trace_branch.c b/kernel/trace/trace_branch.c
> index d594da0..697fb9b 100644
> --- a/kernel/trace/trace_branch.c
> +++ b/kernel/trace/trace_branch.c
> @@ -78,7 +78,7 @@ probe_likely_condition(struct ftrace_branch_data *f, int val, int expect)
> entry->line = f->line;
> entry->correct = val == expect;
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
>
> out:
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 25b2c86..7dacbd1 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -990,7 +990,7 @@ static ssize_t
> event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_call *call;
> + struct ftrace_event_file *file;
> struct trace_seq *s;
> int r = -ENODEV;
>
> @@ -1005,12 +1005,12 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
> trace_seq_init(s);
>
> mutex_lock(&event_mutex);
> - call = event_file_data(filp);
> - if (call)
> - print_event_filter(call, s);
> + file = event_file_data(filp);
> + if (file)
> + print_event_filter(file, s);
> mutex_unlock(&event_mutex);
>
> - if (call)
> + if (file)
> r = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, s->len);
>
> kfree(s);
> @@ -1022,7 +1022,7 @@ static ssize_t
> event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> - struct ftrace_event_call *call;
> + struct ftrace_event_file *file;
> char *buf;
> int err = -ENODEV;
>
> @@ -1040,9 +1040,9 @@ event_filter_write(struct file *filp, const char __user *ubuf, size_t cnt,
> buf[cnt] = '\0';
>
> mutex_lock(&event_mutex);
> - call = event_file_data(filp);
> - if (call)
> - err = apply_event_filter(call, buf);
> + file = event_file_data(filp);
> + if (file)
> + err = apply_event_filter(file, buf);
> mutex_unlock(&event_mutex);
>
> free_page((unsigned long) buf);
> @@ -1540,7 +1540,7 @@ event_create_dir(struct dentry *parent, struct ftrace_event_file *file)
> return -1;
> }
> }
> - trace_create_file("filter", 0644, file->dir, call,
> + trace_create_file("filter", 0644, file->dir, file,
> &ftrace_event_filter_fops);
>
> trace_create_file("trigger", 0644, file->dir, file,
> @@ -1581,6 +1581,10 @@ static void event_remove(struct ftrace_event_call *call)
> if (file->event_call != call)
> continue;
> ftrace_event_enable_disable(file, 0);
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
> + destroy_call_preds(call);
> + else
> + destroy_preds(file);

Why not just use destroy_preds(file), and then do the check inside that
call, instead of open coding that check all over the place?

Have both a destroy_file_preds() and destroy_call_preds() and have:

destroy_preds(file)
{
	call = file->event_call;
	if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
		destroy_call_preds(call);
	else
		destroy_file_preds(file);
}
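Steve's suggested wrapper can be modeled in userspace C to show the pattern: the `USE_CALL_FILTER` check lives in exactly one place, and callers never see it. Struct layouts and the flag value below are illustrative stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stddef.h>

#define FL_USE_CALL_FILTER 0x1U	/* stand-in for TRACE_EVENT_FL_USE_CALL_FILTER */

struct event_call { unsigned int flags; void *filter; };
struct event_file { struct event_call *event_call; void *filter; };

static void destroy_call_preds(struct event_call *call)
{
	call->filter = NULL;	/* __free_filter(call->filter) in the kernel */
}

static void destroy_file_preds(struct event_file *file)
{
	file->filter = NULL;	/* __free_filter(file->filter) in the kernel */
}

/* Single entry point: the flag check is contained here instead of
 * being open coded at every call site. */
static void destroy_preds(struct event_file *file)
{
	struct event_call *call = file->event_call;

	if (call->flags & FL_USE_CALL_FILTER)
		destroy_call_preds(call);
	else
		destroy_file_preds(file);
}
```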

> }
> /*
> * The do_for_each_event_file() is
> * a double loop. After finding the call for this
> @@ -1706,7 +1710,7 @@ static void __trace_remove_event_call(struct ftrace_event_call *call)
> {
> event_remove(call);
> trace_destroy_fields(call);
> - destroy_preds(call);
> + destroy_call_preds(call);
> }
>
> static int probe_remove_event_call(struct ftrace_event_call *call)
> diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> index 0c45aa1..af55a84 100644
> --- a/kernel/trace/trace_events_filter.c
> +++ b/kernel/trace/trace_events_filter.c
> @@ -638,9 +638,14 @@ static void append_filter_err(struct filter_parse_state *ps,
> }
>
> /* caller must hold event_mutex */
> -void print_event_filter(struct ftrace_event_call *call, struct trace_seq *s)
> +void print_event_filter(struct ftrace_event_file *file, struct trace_seq *s)
> {
> - struct event_filter *filter = call->filter;
> + struct event_filter *filter;
> +
> + if (file->event_call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
> + filter = file->event_call->filter;
> + else
> + filter = file->filter;
>
> if (filter && filter->filter_string)
> trace_seq_printf(s, "%s\n", filter->filter_string);
> @@ -766,7 +771,12 @@ static void __free_preds(struct event_filter *filter)
> filter->n_preds = 0;
> }
>
> -static void filter_disable(struct ftrace_event_call *call)
> +static void filter_disable(struct ftrace_event_file *file)
> +{
> + file->flags &= ~FTRACE_EVENT_FL_FILTERED;
> +}
> +
> +static void call_filter_disable(struct ftrace_event_call *call)
> {
> call->flags &= ~TRACE_EVENT_FL_FILTERED;
> }
> @@ -787,12 +797,24 @@ void free_event_filter(struct event_filter *filter)
> }
>
> /*
> + * Called when destroying the ftrace_event_file.
> + * The file is being freed, so we do not need to worry about
> + * the file being currently used. This is for module code removing
> + * the tracepoints from within it.
> + */
> +void destroy_preds(struct ftrace_event_file *file)
> +{
> + __free_filter(file->filter);
> + file->filter = NULL;
> +}
> +
> +/*
> * Called when destroying the ftrace_event_call.
> * The call is being freed, so we do not need to worry about
> * the call being currently used. This is for module code removing
> * the tracepoints from within it.
> */
> -void destroy_preds(struct ftrace_event_call *call)
> +void destroy_call_preds(struct ftrace_event_call *call)
> {
> __free_filter(call->filter);
> call->filter = NULL;
> @@ -830,28 +852,44 @@ static int __alloc_preds(struct event_filter *filter, int n_preds)
> return 0;
> }
>
> -static void filter_free_subsystem_preds(struct event_subsystem *system)
> +static void filter_free_subsystem_preds(struct event_subsystem *system,
> + struct trace_array *tr)
> {
> + struct ftrace_event_file *file;
> struct ftrace_event_call *call;
>
> - list_for_each_entry(call, &ftrace_events, list) {
> + list_for_each_entry(file, &tr->events, list) {
> + call = file->event_call;
> if (strcmp(call->class->system, system->name) != 0)
> continue;
>
> - filter_disable(call);
> - remove_filter_string(call->filter);
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
> + call_filter_disable(call);
> + remove_filter_string(call->filter);
> + } else {
> + filter_disable(file);
> + remove_filter_string(file->filter);
> + }
> }
> }
>
> -static void filter_free_subsystem_filters(struct event_subsystem *system)
> +static void filter_free_subsystem_filters(struct event_subsystem *system,
> + struct trace_array *tr)
> {
> + struct ftrace_event_file *file;
> struct ftrace_event_call *call;
>
> - list_for_each_entry(call, &ftrace_events, list) {
> + list_for_each_entry(file, &tr->events, list) {
> + call = file->event_call;
> if (strcmp(call->class->system, system->name) != 0)
> continue;
> - __free_filter(call->filter);
> - call->filter = NULL;
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
> + __free_filter(call->filter);
> + call->filter = NULL;
> + } else {
> + __free_filter(file->filter);
> + file->filter = NULL;
> + }

I'm thinking it will be cleaner to encapsulate all of these into
functions like:

static inline void __event_free_filter(file)
{
	call = file->event_call;
	if (call->flags ...) {
		__free_filter(call->filter);
		call->filter = NULL;
	} else {
		__free_filter(file->filter);
		file->filter = NULL;
	}
}

That way, we get the ugly flag check contained, and not spread out in
open-coded functions.

> }
> }
>
> @@ -1628,9 +1666,11 @@ struct filter_list {
> };
>
> static int replace_system_preds(struct event_subsystem *system,
> + struct trace_array *tr,
> struct filter_parse_state *ps,
> char *filter_string)
> {
> + struct ftrace_event_file *file;
> struct ftrace_event_call *call;
> struct filter_list *filter_item;
> struct filter_list *tmp;
> @@ -1638,8 +1678,8 @@ static int replace_system_preds(struct event_subsystem *system,
> bool fail = true;
> int err;
>
> - list_for_each_entry(call, &ftrace_events, list) {
> -
> + list_for_each_entry(file, &tr->events, list) {
> + call = file->event_call;
> if (strcmp(call->class->system, system->name) != 0)
> continue;
>
> @@ -1648,21 +1688,34 @@ static int replace_system_preds(struct event_subsystem *system,
> * (filter arg is ignored on dry_run)
> */
> err = replace_preds(call, NULL, ps, filter_string, true);
> - if (err)
> - call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
> - else
> - call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
> + if (err)
> + call->flags |= TRACE_EVENT_FL_NO_SET_FILTER;
> + else
> + call->flags &= ~TRACE_EVENT_FL_NO_SET_FILTER;
> + } else {
> + if (err)
> + file->flags |= FTRACE_EVENT_FL_NO_SET_FILTER;
> + else
> + file->flags &= ~FTRACE_EVENT_FL_NO_SET_FILTER;
> + }

ditto

> }
>
> - list_for_each_entry(call, &ftrace_events, list) {
> + list_for_each_entry(file, &tr->events, list) {
> struct event_filter *filter;
>
> + call = file->event_call;
> +
> if (strcmp(call->class->system, system->name) != 0)
> continue;
>
> - if (call->flags & TRACE_EVENT_FL_NO_SET_FILTER)
> + if (file->flags & FTRACE_EVENT_FL_NO_SET_FILTER)
> continue;
>
> + if ((call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) &&
> + (call->flags & TRACE_EVENT_FL_NO_SET_FILTER))
> + continue;
> +
> filter_item = kzalloc(sizeof(*filter_item), GFP_KERNEL);
> if (!filter_item)
> goto fail_mem;
> @@ -1681,17 +1734,29 @@ static int replace_system_preds(struct event_subsystem *system,
>
> err = replace_preds(call, filter, ps, filter_string, false);
> if (err) {
> - filter_disable(call);
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
> + call_filter_disable(call);
> + else
> + filter_disable(file);
> parse_error(ps, FILT_ERR_BAD_SUBSYS_FILTER, 0);
> append_filter_err(ps, filter);
> - } else
> - call->flags |= TRACE_EVENT_FL_FILTERED;
> + } else {
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER)
> + call->flags |= TRACE_EVENT_FL_FILTERED;
> + else
> + file->flags |= FTRACE_EVENT_FL_FILTERED;

We can have an event_set_filter_flag(file) that does this check?

> + }
> /*
> * Regardless of if this returned an error, we still
> * replace the filter for the call.
> */
> - filter = call->filter;
> - rcu_assign_pointer(call->filter, filter_item->filter);
> + if (call->flags & TRACE_EVENT_FL_USE_CALL_FILTER) {
> + filter = call->filter;
> + rcu_assign_pointer(call->filter, filter_item->filter);
> + } else {
> + filter = file->filter;
> + rcu_assign_pointer(file->filter, filter_item->filter);
> + }

Again, encapsulate.

> filter_item->filter = filter;
>
> fail = false;
> @@ -1829,6 +1894,7 @@ int create_event_filter(struct ftrace_event_call *call,
> * and always remembers @filter_str.
> */
> static int create_system_filter(struct event_subsystem *system,
> + struct trace_array *tr,
> char *filter_str, struct event_filter **filterp)
> {
> struct event_filter *filter = NULL;
> @@ -1837,7 +1903,7 @@ static int create_system_filter(struct event_subsystem *system,
>
> err = create_filter_start(filter_str, true, &ps, &filter);
> if (!err) {
> - err = replace_system_preds(system, ps, filter_str);
> + err = replace_system_preds(system, tr, ps, filter_str);
> if (!err) {
> /* System filters just show a default message */
> kfree(filter->filter_string);
> @@ -1853,17 +1919,29 @@ static int create_system_filter(struct event_subsystem *system,
> }
>
> /* caller must hold event_mutex */
> -int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
> +int apply_event_filter(struct ftrace_event_file *file, char *filter_string)
> {
> + struct ftrace_event_call *call = file->event_call;
> struct event_filter *filter;
> + bool use_call_filter;
> int err;
>
> + use_call_filter = call->flags & TRACE_EVENT_FL_USE_CALL_FILTER;
> +
> if (!strcmp(strstrip(filter_string), "0")) {
> - filter_disable(call);
> - filter = call->filter;
> + if (use_call_filter) {
> + call_filter_disable(call);
> + filter = call->filter;
> + } else {
> + filter_disable(file);
> + filter = file->filter;
> + }
> if (!filter)
> return 0;
> - RCU_INIT_POINTER(call->filter, NULL);
> + if (use_call_filter)
> + RCU_INIT_POINTER(call->filter, NULL);
> + else
> + RCU_INIT_POINTER(file->filter, NULL);

Again encapsulate. Hopefully gcc will store the result.

Try to keep that check out as much as possible, including places that I
may have missed in this email.

-- Steve

> /* Make sure the filter is not being used */
> synchronize_sched();
> __free_filter(filter);
> @@ -1879,14 +1957,25 @@ int apply_event_filter(struct ftrace_event_call *call, char *filter_string)
> * string
> */
> if (filter) {
> - struct event_filter *tmp = call->filter;
> + struct event_filter *tmp;
>
> - if (!err)
> - call->flags |= TRACE_EVENT_FL_FILTERED;
> - else
> - filter_disable(call);
> + if (use_call_filter) {
> + tmp = call->filter;
> + if (!err)
> + call->flags |= TRACE_EVENT_FL_FILTERED;
> + else
> + call_filter_disable(call);
> +
> + rcu_assign_pointer(call->filter, filter);
> + } else {
> + tmp = file->filter;
> + if (!err)
> + file->flags |= FTRACE_EVENT_FL_FILTERED;
> + else
> + filter_disable(file);
>
> - rcu_assign_pointer(call->filter, filter);
> + rcu_assign_pointer(file->filter, filter);
> + }
>
> if (tmp) {
> /* Make sure the call is done with the filter */
> @@ -1902,6 +1991,7 @@ int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
> char *filter_string)
> {
> struct event_subsystem *system = dir->subsystem;
> + struct trace_array *tr = dir->tr;
> struct event_filter *filter;
> int err = 0;
>
> @@ -1914,18 +2004,18 @@ int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
> }
>
> if (!strcmp(strstrip(filter_string), "0")) {
> - filter_free_subsystem_preds(system);
> + filter_free_subsystem_preds(system, tr);
> remove_filter_string(system->filter);
> filter = system->filter;
> system->filter = NULL;
> /* Ensure all filters are no longer used */
> synchronize_sched();
> - filter_free_subsystem_filters(system);
> + filter_free_subsystem_filters(system, tr);
> __free_filter(filter);
> goto out_unlock;
> }
>
> - err = create_system_filter(system, filter_string, &filter);
> + err = create_system_filter(system, tr, filter_string, &filter);
> if (filter) {
> /*
> * No event actually uses the system filter
> diff --git a/kernel/trace/trace_export.c b/kernel/trace/trace_export.c
> index d21a746..7c3e3e7 100644
> --- a/kernel/trace/trace_export.c
> +++ b/kernel/trace/trace_export.c
> @@ -180,7 +180,7 @@ struct ftrace_event_call __used event_##call = { \
> .event.type = etype, \
> .class = &event_class_ftrace_##call, \
> .print_fmt = print, \
> - .flags = TRACE_EVENT_FL_IGNORE_ENABLE, \
> + .flags = TRACE_EVENT_FL_IGNORE_ENABLE | TRACE_EVENT_FL_USE_CALL_FILTER, \
> }; \
> struct ftrace_event_call __used \
> __attribute__((section("_ftrace_events"))) *__event_##call = &event_##call;
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index b5c0924..7d2fcd7 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -230,7 +230,7 @@ int __trace_graph_entry(struct trace_array *tr,
> return 0;
> entry = ring_buffer_event_data(event);
> entry->graph_ent = *trace;
> - if (!filter_current_check_discard(buffer, call, entry, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
>
> return 1;
> @@ -335,7 +335,7 @@ void __trace_graph_return(struct trace_array *tr,
> return;
> entry = ring_buffer_event_data(event);
> entry->ret = *trace;
> - if (!filter_current_check_discard(buffer, call, entry, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> __buffer_unlock_commit(buffer, event);
> }
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 243f683..dae9541 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -835,7 +835,7 @@ __kprobe_trace_func(struct trace_probe *tp, struct pt_regs *regs,
> entry->ip = (unsigned long)tp->rp.kp.addr;
> store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
>
> - if (!filter_current_check_discard(buffer, call, entry, event))
> + if (!filter_check_discard(ftrace_file, entry, buffer, event))
> trace_buffer_unlock_commit_regs(buffer, event,
> irq_flags, pc, regs);
> }
> @@ -884,7 +884,7 @@ __kretprobe_trace_func(struct trace_probe *tp, struct kretprobe_instance *ri,
> entry->ret_ip = (unsigned long)ri->ret_addr;
> store_trace_args(sizeof(*entry), tp, regs, (u8 *)&entry[1], dsize);
>
> - if (!filter_current_check_discard(buffer, call, entry, event))
> + if (!filter_check_discard(ftrace_file, entry, buffer, event))
> trace_buffer_unlock_commit_regs(buffer, event,
> irq_flags, pc, regs);
> }
> diff --git a/kernel/trace/trace_mmiotrace.c b/kernel/trace/trace_mmiotrace.c
> index b3dcfb2..0abd9b8 100644
> --- a/kernel/trace/trace_mmiotrace.c
> +++ b/kernel/trace/trace_mmiotrace.c
> @@ -323,7 +323,7 @@ static void __trace_mmiotrace_rw(struct trace_array *tr,
> entry = ring_buffer_event_data(event);
> entry->rw = *rw;
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> trace_buffer_unlock_commit(buffer, event, 0, pc);
> }
>
> @@ -353,7 +353,7 @@ static void __trace_mmiotrace_map(struct trace_array *tr,
> entry = ring_buffer_event_data(event);
> entry->map = *map;
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> trace_buffer_unlock_commit(buffer, event, 0, pc);
> }
>
> diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
> index 4e98e3b..3f34dc9 100644
> --- a/kernel/trace/trace_sched_switch.c
> +++ b/kernel/trace/trace_sched_switch.c
> @@ -45,7 +45,7 @@ tracing_sched_switch_trace(struct trace_array *tr,
> entry->next_state = next->state;
> entry->next_cpu = task_cpu(next);
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> trace_buffer_unlock_commit(buffer, event, flags, pc);
> }
>
> @@ -101,7 +101,7 @@ tracing_sched_wakeup_trace(struct trace_array *tr,
> entry->next_state = wakee->state;
> entry->next_cpu = task_cpu(wakee);
>
> - if (!filter_check_discard(call, entry, buffer, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> trace_buffer_unlock_commit(buffer, event, flags, pc);
> }
>
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index 84cdbce..655bcf8 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -326,11 +326,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> FTRACE_EVENT_FL_SOFT_DISABLED)
> return;
> -
> sys_data = syscall_nr_to_meta(syscall_nr);
> if (!sys_data)
> return;
> -
> size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
>
> local_save_flags(irq_flags);
> @@ -351,8 +349,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
>
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> ring_buffer_discard_commit(buffer, event);
> - else if (!filter_current_check_discard(buffer, sys_data->enter_event,
> - entry, event))
> + else if (!filter_check_discard(ftrace_file, entry, buffer, event))
> trace_current_buffer_unlock_commit(buffer, event,
> irq_flags, pc);
> if (__tt)
> @@ -409,8 +406,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
>
> if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> ring_buffer_discard_commit(buffer, event);
> - else if (!filter_current_check_discard(buffer, sys_data->exit_event,
> - entry, event))
> + else if (!filter_check_discard(ftrace_file, entry, buffer, event))
> trace_current_buffer_unlock_commit(buffer, event,
> irq_flags, pc);
> if (__tt)
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 272261b..b6dcc42 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -128,6 +128,7 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
> if (is_ret)
> tu->consumer.ret_handler = uretprobe_dispatcher;
> init_trace_uprobe_filter(&tu->filter);
> + tu->call.flags |= TRACE_EVENT_FL_USE_CALL_FILTER;
> return tu;
>
> error:
> @@ -561,7 +562,7 @@ static void uprobe_trace_print(struct trace_uprobe *tu,
> for (i = 0; i < tu->nr_args; i++)
> call_fetch(&tu->args[i].fetch, regs, data + tu->args[i].offset);
>
> - if (!filter_current_check_discard(buffer, call, entry, event))
> + if (!call_filter_check_discard(call, entry, buffer, event))
> trace_buffer_unlock_commit(buffer, event, 0, 0);
> }
>

2013-08-28 19:51:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v7 00/10] tracing: trace event triggers (repost)

On Tue, 27 Aug 2013 14:40:12 -0500
Tom Zanussi <[email protected]> wrote:

> Tom Zanussi (10):
> tracing: Add support for SOFT_DISABLE to syscall events
> tracing: add basic event trigger framework
> tracing: add 'traceon' and 'traceoff' event trigger commands
> tracing: add 'snapshot' event trigger command
> tracing: add 'stacktrace' event trigger command
> tracing: add 'enable_event' and 'disable_event' event trigger
> commands
> tracing: add and use generic set_trigger_filter() implementation
> tracing: update event filters for multibuffer
> tracing: add documentation for trace event triggers
> tracing: make register/unregister_ftrace_command __init

In your next series, can you please fix the subjects. The lines should
start with a capital letter. You did it correctly on your first patch,
but failed to do it for the others.

Thanks!

-- Steve

2013-08-29 02:52:47

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v7 07/10] tracing: add and use generic set_trigger_filter() implementation

On Wed, 2013-08-28 at 12:38 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 14:40:19 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > enum {
> > FILTER_OTHER = 0,
> > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > index 326ba32..6c701c3 100644
> > --- a/include/trace/ftrace.h
> > +++ b/include/trace/ftrace.h
> > @@ -412,13 +412,15 @@ static inline notrace int ftrace_get_offsets_##call( \
> > * struct ftrace_data_offsets_<call> __maybe_unused __data_offsets;
> > * struct ring_buffer_event *event;
> > * struct ftrace_raw_<call> *entry; <-- defined in stage 1
> > + * enum event_trigger_type __tt = ETT_NONE;
> > * struct ring_buffer *buffer;
> > * unsigned long irq_flags;
> > * int __data_size;
> > * int pc;
> > *
> > - * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > - * &ftrace_file->flags))
> > + * if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED |
> > + * FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> > + * FTRACE_EVENT_FL_SOFT_DISABLED)
>
> Don't worry too much about 80 character limit here. Move the
> FL_SOFT_DISABLED up.
>
> > * return;
> > *
> > * local_save_flags(irq_flags);
> > @@ -437,9 +439,19 @@ static inline notrace int ftrace_get_offsets_##call( \
> > * { <assign>; } <-- Here we assign the entries by the __field and
> > * __array macros.
> > *
> > - * if (!filter_current_check_discard(buffer, event_call, entry, event))
> > - * trace_nowake_buffer_unlock_commit(buffer,
> > - * event, irq_flags, pc);
> > + * if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > + * &ftrace_file->flags))
> > + * __tt = event_triggers_call(ftrace_file, entry);
> > + *
> > + * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > + * &ftrace_file->flags))
> > + * ring_buffer_discard_commit(buffer, event);
> > + * else if (!filter_current_check_discard(buffer, event_call,
> > + * entry, event))
> > + * trace_buffer_unlock_commit(buffer, event, irq_flags, pc);
> > + *
> > + * if (__tt)
> > + * event_triggers_post_call(ftrace_file, __tt);
> > * }
> > *
> > * static struct trace_event ftrace_event_type_<call> = {
> > @@ -521,17 +533,15 @@ ftrace_raw_event_##call(void *__data, proto) \
> > struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
> > struct ring_buffer_event *event; \
> > struct ftrace_raw_##call *entry; \
> > + enum event_trigger_type __tt = ETT_NONE; \
> > struct ring_buffer *buffer; \
> > unsigned long irq_flags; \
> > int __data_size; \
> > int pc; \
> > \
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > - &ftrace_file->flags)) \
> > - event_triggers_call(ftrace_file); \
> > - \
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > - &ftrace_file->flags)) \
> > + if ((ftrace_file->flags & (FTRACE_EVENT_FL_SOFT_DISABLED | \
> > + FTRACE_EVENT_FL_TRIGGER_MODE)) == \
> > + FTRACE_EVENT_FL_SOFT_DISABLED) \
>
> Ditto.
>
> Also, I don't think we need to worry about the flags changing, so we
> should be able to just save it.
>
> unsigned long eflags = ftrace_file_flags;
>
> And then we can also only delay the event triggers if it has a
> condition. What about adding a FTRACE_EVENT_FL_TRIGGER_COND_BIT, and do:
>
> 	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND)) {
> 		if (eflags & FTRACE_EVENT_FL_TRIGGER_MODE)
> 			event_triggers_call(ftrace_file, NULL);
>
> 		if (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)
> 			return;
> 	}
>

I like the additional TRIGGER_COND flag, but as written I think the
above leads to different output in the non-SOFT_DISABLED trigger cases
depending on whether it's set or not. For instance, a stacktrace
trigger invoked with TRIGGER_COND clear will print its
trace_dump_stack() output before the triggering event (the code above
falls through and logs the triggering event after the stacktrace when
not SOFT_DISABLED), but the same trigger with TRIGGER_COND set will
print the stacktrace after the triggering event, since
event_triggers_call()/event_triggers_post_call() aren't invoked until
after the event is logged.

So above I think we want to call the triggers and then return only if
there are no filters on any of the triggers and the triggering event
is soft disabled, e.g.:

	if (!(eflags & FTRACE_EVENT_FL_TRIGGER_COND))
		if ((eflags & FTRACE_EVENT_FL_TRIGGER_MODE) &&
		    (eflags & FTRACE_EVENT_FL_SOFT_DISABLED)) {
			event_triggers_call(ftrace_file, NULL);
			return;
		}

Otherwise, fall through and call the triggers after the current event is
logged. Of course, none of this matters for the non-stacktrace
(non-logging) triggers - they can be called anytime without loss of
consistency in the output.
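To make the resulting rule concrete, here is a minimal user-space
sketch of the early-return decision being argued for above. This is an
illustration only, not the actual handler: the FL_* flag values below
are stand-ins for the real FTRACE_EVENT_FL_* bits, and the real
event_triggers_call() is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits; the real ones are the FTRACE_EVENT_FL_* values. */
#define FL_SOFT_DISABLED (1u << 0)
#define FL_TRIGGER_MODE  (1u << 1)
#define FL_TRIGGER_COND  (1u << 2)

/*
 * Returns true when the handler may fire the triggers immediately and
 * return without logging: only when no trigger has a filter (no
 * TRIGGER_COND bit) and the triggering event itself is soft disabled.
 * In every other case the handler falls through, logs the event, and
 * runs the triggers around it, so that e.g. a stacktrace trigger's
 * output always lands in the same position relative to the triggering
 * event regardless of whether TRIGGER_COND is set.
 */
static bool fire_triggers_early_and_return(unsigned long eflags)
{
	if (!(eflags & FL_TRIGGER_COND) &&
	    (eflags & FL_TRIGGER_MODE) &&
	    (eflags & FL_SOFT_DISABLED)) {
		/* event_triggers_call(ftrace_file, NULL); */
		return true;
	}
	return false;
}
```

The only fast path, in other words, is "unconditional triggers on a
soft-disabled event"; everything else takes the log-then-trigger slow
path.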

Tom

> > return; \
> > \
> > local_save_flags(irq_flags); \
> > @@ -551,8 +561,19 @@ ftrace_raw_event_##call(void *__data, proto) \
> > \
> > { assign; } \
> > \
> > - if (!filter_current_check_discard(buffer, event_call, entry, event)) \
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > + &ftrace_file->flags)) \
> > + __tt = event_triggers_call(ftrace_file, entry); \
>
> Then here we test just the TRIGGER_COND bit.
>

>
> > + \
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > + &ftrace_file->flags)) \
> > + ring_buffer_discard_commit(buffer, event); \
> > + else if (!filter_current_check_discard(buffer, event_call, \
> > + entry, event)) \
> > trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> > + \
> > + if (__tt) \
> > + event_triggers_post_call(ftrace_file, __tt); \
>
> This part is fine.
>
> > }
> > /*
> > * The ftrace_test_probe is compiled out, it is only here as a build time check
> > diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> > index 3cb846e..af5f3b6 100644
> > --- a/kernel/trace/trace.h
> > +++ b/kernel/trace/trace.h
> > @@ -994,6 +994,10 @@ extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
> > extern void print_subsystem_event_filter(struct event_subsystem *system,
> > struct trace_seq *s);
> > extern int filter_assign_type(const char *type);
> > +extern int create_event_filter(struct ftrace_event_call *call,
> > + char *filter_str, bool set_str,
> > + struct event_filter **filterp);
> > +extern void free_event_filter(struct event_filter *filter);
> >
> > struct ftrace_event_field *
> > trace_find_event_field(struct ftrace_event_call *call, char *name);
> > diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
> > index 97daa8c..0c45aa1 100644
> > --- a/kernel/trace/trace_events_filter.c
> > +++ b/kernel/trace/trace_events_filter.c
> > @@ -781,6 +781,11 @@ static void __free_filter(struct event_filter *filter)
> > kfree(filter);
> > }
> >
> > +void free_event_filter(struct event_filter *filter)
> > +{
> > + __free_filter(filter);
> > +}
> > +
> > /*
> > * Called when destroying the ftrace_event_call.
> > * The call is being freed, so we do not need to worry about
> > @@ -1806,6 +1811,14 @@ static int create_filter(struct ftrace_event_call *call,
> > return err;
> > }
> >
> > +int create_event_filter(struct ftrace_event_call *call,
> > + char *filter_str, bool set_str,
> > + struct event_filter **filterp)
> > +{
> > + return create_filter(call, filter_str, set_str, filterp);
> > +}
> > +
> > +
> > /**
> > * create_system_filter - create a filter for an event_subsystem
> > * @system: event_subsystem to create a filter for
> > diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
> > index 54678e2..b5e7ca7 100644
> > --- a/kernel/trace/trace_events_trigger.c
> > +++ b/kernel/trace/trace_events_trigger.c
> > @@ -43,24 +43,53 @@ struct event_trigger_data {
> > static void
> > trigger_data_free(struct event_trigger_data *data)
> > {
> > + if (data->cmd_ops->set_filter)
> > + data->cmd_ops->set_filter(NULL, data, NULL);
> > +
> > synchronize_sched(); /* make sure current triggers exit before free */
> > kfree(data);
> > }
> >
> > -void event_triggers_call(struct ftrace_event_file *file)
> > +enum event_trigger_type
> > +event_triggers_call(struct ftrace_event_file *file, void *rec)
> > {
> > struct event_trigger_data *data;
> > + enum event_trigger_type tt = ETT_NONE;
> >
> > if (list_empty(&file->triggers))
> > - return;
> > + return tt;
> >
> > preempt_disable_notrace();
> > - list_for_each_entry_rcu(data, &file->triggers, list)
> > + list_for_each_entry_rcu(data, &file->triggers, list) {
> > + if (data->filter && !filter_match_preds(data->filter, rec))
>
> We would need a check for !rec, just to be safe (with the mods I talked
> about).
>
> > + continue;
> > + if (data->cmd_ops->post_trigger) {
> > + tt |= data->cmd_ops->trigger_type;
> > + continue;
> > + }
> > data->ops->func((void **)&data);
> > + }
> > preempt_enable_notrace();
> > +
> > + return tt;
> > }
> > EXPORT_SYMBOL_GPL(event_triggers_call);
> >
> > +void
> > +event_triggers_post_call(struct ftrace_event_file *file,
> > + enum event_trigger_type tt)
> > +{
> > + struct event_trigger_data *data;
> > +
> > + preempt_disable_notrace();
> > + list_for_each_entry_rcu(data, &file->triggers, list) {
> > + if (data->cmd_ops->trigger_type & tt)
> > + data->ops->func((void **)&data);
> > + }
> > + preempt_enable_notrace();
> > +}
> > +EXPORT_SYMBOL_GPL(event_triggers_post_call);
> > +
> > static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
> > {
> > struct ftrace_event_file *event_file = event_file_data(m->private);
> > @@ -561,6 +590,66 @@ event_trigger_callback(struct event_command *cmd_ops,
> > goto out;
> > }
> >
> > +/**
> > + * set_trigger_filter - generic event_command @set_filter
> > + * implementation
> > + *
> > + * Common implementation for event command filter parsing and filter
> > + * instantiation.
> > + *
> > + * Usually used directly as the @set_filter method in event command
> > + * implementations.
> > + *
> > + * Also used to remove a filter (if filter_str = NULL).
> > + */
> > +static int set_trigger_filter(char *filter_str, void *trigger_data,
> > + struct ftrace_event_file *file)
> > +{
> > + struct event_trigger_data *data = trigger_data;
> > + struct event_filter *filter = NULL, *tmp;
> > + int ret = -EINVAL;
> > + char *s;
> > +
> > + if (!filter_str) /* clear the current filter */
> > + goto assign;
> > +
> > + s = strsep(&filter_str, " \t");
> > +
> > + if (!strlen(s) || strcmp(s, "if") != 0)
> > + goto out;
> > +
> > + if (!filter_str)
> > + goto out;
> > +
> > + /* The filter is for the 'trigger' event, not the triggered event */
> > + ret = create_event_filter(file->event_call, filter_str, false, &filter);
> > + if (ret)
> > + goto out;
> > + assign:
> > + tmp = data->filter;
> > +
> > + rcu_assign_pointer(data->filter, filter);
> > +
> > + if (tmp) {
> > + /* Make sure the call is done with the filter */
> > + synchronize_sched();
> > + free_event_filter(tmp);
> > + }
> > +
> > + kfree(data->filter_str);
> > +
> > + if (filter_str) {
> > + data->filter_str = kstrdup(filter_str, GFP_KERNEL);
> > + if (!data->filter_str) {
> > + free_event_filter(data->filter);
> > + data->filter = NULL;
> > + ret = -ENOMEM;
> > + }
> > + }
> > + out:
> > + return ret;
> > +}
> > +
> > static void
> > traceon_trigger(void **_data)
> > {
> > @@ -698,6 +787,7 @@ static struct event_command trigger_traceon_cmd = {
> > .reg = register_trigger,
> > .unreg = unregister_trigger,
> > .get_trigger_ops = onoff_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > static struct event_command trigger_traceoff_cmd = {
> > @@ -707,6 +797,7 @@ static struct event_command trigger_traceoff_cmd = {
> > .reg = register_trigger,
> > .unreg = unregister_trigger,
> > .get_trigger_ops = onoff_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > static void
> > @@ -788,6 +879,7 @@ static struct event_command trigger_snapshot_cmd = {
> > .reg = register_snapshot_trigger,
> > .unreg = unregister_trigger,
> > .get_trigger_ops = snapshot_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > /*
> > @@ -867,6 +959,7 @@ static struct event_command trigger_stacktrace_cmd = {
> > .reg = register_trigger,
> > .unreg = unregister_trigger,
> > .get_trigger_ops = stacktrace_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > static __init void unregister_trigger_traceon_traceoff_cmds(void)
> > @@ -1194,6 +1287,7 @@ static struct event_command trigger_enable_cmd = {
> > .reg = event_enable_register_trigger,
> > .unreg = event_enable_unregister_trigger,
> > .get_trigger_ops = event_enable_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > static struct event_command trigger_disable_cmd = {
> > @@ -1203,6 +1297,7 @@ static struct event_command trigger_disable_cmd = {
> > .reg = event_enable_register_trigger,
> > .unreg = event_enable_unregister_trigger,
> > .get_trigger_ops = event_enable_get_trigger_ops,
> > + .set_filter = set_trigger_filter,
> > };
> >
> > static __init void unregister_trigger_enable_disable_cmds(void)
> > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> > index 4f56d54..84cdbce 100644
> > --- a/kernel/trace/trace_syscalls.c
> > +++ b/kernel/trace/trace_syscalls.c
> > @@ -306,6 +306,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > struct syscall_trace_enter *entry;
> > struct syscall_metadata *sys_data;
> > struct ring_buffer_event *event;
> > + enum event_trigger_type __tt = ETT_NONE;
> > struct ring_buffer *buffer;
> > unsigned long irq_flags;
> > int pc;
> > @@ -321,9 +322,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > if (!ftrace_file)
> > return;
> >
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > - event_triggers_call(ftrace_file);
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > + if ((ftrace_file->flags &
> > + (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> > + FTRACE_EVENT_FL_SOFT_DISABLED)
>
> We would need the same changes here too.
>
> -- Steve
>
> > return;
> >
> > sys_data = syscall_nr_to_meta(syscall_nr);
> > @@ -345,10 +346,17 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > entry->nr = syscall_nr;
> > syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
> >
> > - if (!filter_current_check_discard(buffer, sys_data->enter_event,
> > - entry, event))
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > + __tt = event_triggers_call(ftrace_file, entry);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > + ring_buffer_discard_commit(buffer, event);
> > + else if (!filter_current_check_discard(buffer, sys_data->enter_event,
> > + entry, event))
> > trace_current_buffer_unlock_commit(buffer, event,
> > irq_flags, pc);
> > + if (__tt)
> > + event_triggers_post_call(ftrace_file, __tt);
> > }
> >
> > static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > @@ -358,6 +366,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > struct syscall_trace_exit *entry;
> > struct syscall_metadata *sys_data;
> > struct ring_buffer_event *event;
> > + enum event_trigger_type __tt = ETT_NONE;
> > struct ring_buffer *buffer;
> > unsigned long irq_flags;
> > int pc;
> > @@ -372,9 +381,9 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > if (!ftrace_file)
> > return;
> >
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > - event_triggers_call(ftrace_file);
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > + if ((ftrace_file->flags &
> > + (FTRACE_EVENT_FL_SOFT_DISABLED | FTRACE_EVENT_FL_TRIGGER_MODE)) ==
> > + FTRACE_EVENT_FL_SOFT_DISABLED)
> > return;
> >
> > sys_data = syscall_nr_to_meta(syscall_nr);
> > @@ -395,10 +404,17 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > entry->nr = syscall_nr;
> > entry->ret = syscall_get_return_value(current, regs);
> >
> > - if (!filter_current_check_discard(buffer, sys_data->exit_event,
> > - entry, event))
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &ftrace_file->flags))
> > + __tt = event_triggers_call(ftrace_file, entry);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &ftrace_file->flags))
> > + ring_buffer_discard_commit(buffer, event);
> > + else if (!filter_current_check_discard(buffer, sys_data->exit_event,
> > + entry, event))
> > trace_current_buffer_unlock_commit(buffer, event,
> > irq_flags, pc);
> > + if (__tt)
> > + event_triggers_post_call(ftrace_file, __tt);
> > }
> >
> > static int reg_event_syscall_enter(struct ftrace_event_file *file,
>

2013-08-29 02:54:44

by Tom Zanussi

Subject: Re: [PATCH v7 00/10] tracing: trace event triggers (repost)

On Wed, 2013-08-28 at 15:51 -0400, Steven Rostedt wrote:
> On Tue, 27 Aug 2013 14:40:12 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Tom Zanussi (10):
> > tracing: Add support for SOFT_DISABLE to syscall events
> > tracing: add basic event trigger framework
> > tracing: add 'traceon' and 'traceoff' event trigger commands
> > tracing: add 'snapshot' event trigger command
> > tracing: add 'stacktrace' event trigger command
> > tracing: add 'enable_event' and 'disable_event' event trigger
> > commands
> > tracing: add and use generic set_trigger_filter() implementation
> > tracing: update event filters for multibuffer
> > tracing: add documentation for trace event triggers
> > tracing: make register/unregister_ftrace_command __init
>
> In your next series, can you please fix the subjects. The lines should
> start with a capital letter. You did it correctly on your first patch,
> but failed to do it for the others.
>
> Thanks!
>

Yeah, will do, and thanks for the review.

Tom

> -- Steve
>