Hi,
This patchset implements 'trace event triggers', which are similar to
the function triggers implemented for 'ftrace filter commands' (see
'Filter commands' in Documentation/trace/ftrace.txt), but are invoked
by trace events rather than by function calls.
Basically the patchset allows 'commands' to be triggered whenever a
given trace event is hit. The commands implemented by this patchset
are:
- enable/disable_event - enable or disable another event whenever
the trigger event is hit
- stacktrace - dump a stacktrace to the trace buffer whenever the
trigger event is hit
- snapshot - create a snapshot of the current trace buffer whenever
the trigger event is hit
- traceon/traceoff - turn tracing on or off whenever the trigger
event is hit
Triggers can also be conditionally invoked by associating a standard
trace event filter with them - if the given event passes the filter,
the trigger is invoked, otherwise it's not. (see 'Event filtering' in
Documentation/trace/events.txt for info on event filters).
See the last patch in the series for more complete documentation on
event triggers and the available trigger commands, and below for some
simple examples of each of the above commands along with conditional
filtering.
The first four patches are bugfix patches or minor improvements which
can be applied regardless; the rest contain the basic framework and
implementations for each command.
This patchset was based on some ideas from Steve Rostedt, which he
outlined during a couple of discussions at ELC and in follow-on e-mails.
Code- and interface-wise, it's also partially based on the existing
function triggers implementation and essentially works on top of the
SOFT_DISABLE mode introduced for that. Both Steve and Masami
Hiramatsu took a look at a couple early versions of this patchset, and
offered some very useful suggestions reflected in this patchset -
thanks to them both for the ideas and for taking the time to do some
basic sanity reviews!
Below are a few concrete examples demonstrating each of the available
commands.
The first example attempts to capture all the kmalloc events that
happen as a result of reading a particular file.
The first part of the set of commands below adds a kmalloc
'enable_event' trigger to the sys_enter_read trace event - as a
result, when the sys_enter_read event occurs, kmalloc events are
enabled, resulting in those kmalloc events getting logged into the
trace buffer. The :1 at the end of the kmalloc enable_event specifies
that the enabling of kmalloc events on sys_enter_read will only happen
once - subsequent reads won't trigger the kmalloc logging. The next
part of the example reads a test file, which triggers the
sys_enter_read tracepoint and thus turns on the kmalloc events, and
once done, adds a trigger to sys_exit_read that disables kmalloc
events. The disable_event doesn't have a :1 appended, which means it
happens on every sys_exit_read.
# echo 'enable_event:kmem:kmalloc:1' > \
/sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger; \
cat ~/junk.txt > /dev/null; \
echo 'disable_event:kmem:kmalloc' > \
/sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
Just to show a bit of what happens under the covers, if we display the
kmalloc 'enable' file, we see that it's 'soft disabled' (the asterisk
after the enable flag). This means that it's actually enabled but is
in the SOFT_DISABLED state, and is essentially held back from actually
logging anything to the trace buffer, but can be made to log into the
buffer by simply flipping a bit:
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
0*
If we look at the 'enable' file for the triggering sys_enter_read
trace event, we can see that it also has the 'soft disable' flag set.
This is because in the case of the triggering event, we also need to
have the trace event invoked regardless of whether or not it's actually
being logged, so we can process the triggers. This functionality is
also built on top of the SOFT_DISABLE flag and is reflected in the
enable state as well:
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/enable
0*
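As a rough mental model of how the 'enable' file contents above come
about, here is a small user-space C sketch. The flag names
(EV_ENABLED, EV_SOFT_MODE, EV_SOFT_DISABLED) and render_enable() are
invented for illustration and are not the kernel's identifiers, and the
"1*" case is an assumption about how an event that is both logging and
held by a trigger would be shown:

```c
#include <string.h>

/* Hypothetical flag bits; the kernel's real names and values may differ. */
enum {
	EV_ENABLED       = 1 << 0, /* the event's callback is hooked up */
	EV_SOFT_MODE     = 1 << 1, /* a trigger holds a reference on the event */
	EV_SOFT_DISABLED = 1 << 2, /* callback runs but must not log */
};

/* Render flags as "0", "1", "0*" or "1*"; buf must hold at least 3 bytes. */
static void render_enable(unsigned int flags, char *buf)
{
	strcpy(buf, "0");
	if ((flags & EV_ENABLED) && !(flags & EV_SOFT_DISABLED))
		strcpy(buf, "1");
	if (flags & (EV_SOFT_DISABLED | EV_SOFT_MODE))
		strcat(buf, "*");
}
```

Under this model both kmalloc and sys_enter_read in the example are
hooked up and soft disabled, which is what yields "0*".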
To find out which triggers are set for a particular event, we can look
at the 'trigger' file for the event. Here's what the 'trigger' file
for the sys_enter_read event looks like:
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
enable_event:kmem:kmalloc:count=0
The 'count=0' field at the end shows that this trigger has no more
triggering ability left - it's essentially fired all its shots - if
it was still active, it would have a non-zero count.
Looking at the sys_exit_read trigger, we see that since we didn't
specify a number at the end, the number of times it can fire is unlimited:
# cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
disable_event:kmem:kmalloc:unlimited
# cat /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/enable
0*
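The two 'trigger' listings above follow one pattern: the command
string, then either ':unlimited' or ':count=N', then an optional
' if <filter>'. A minimal user-space sketch of that formatting rule
follows; format_trigger() is a made-up helper for illustration, not a
function from the patchset:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format one 'trigger' line into out. A count of -1 stands for
 * "unlimited"; filter may be NULL. Returns 0 on success, -1 if the
 * buffer is too small.
 */
static int format_trigger(char *out, size_t len, const char *name,
			  long count, const char *filter)
{
	int n, m;

	if (count == -1)
		n = snprintf(out, len, "%s:unlimited", name);
	else
		n = snprintf(out, len, "%s:count=%ld", name, count);
	if (n < 0 || (size_t)n >= len)
		return -1;

	if (filter) {
		m = snprintf(out + n, len - n, " if %s", filter);
		if (m < 0 || (size_t)m >= len - n)
			return -1;
	}
	return 0;
}
```

Calling it with "disable_event:kmem:kmalloc", -1 and no filter
reproduces the sys_exit_read line shown above.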
Finally, let's look at the results of the above set of commands by
cat'ing the 'trace' file:
# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 85/85 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cat-2596 [001] .... 374.518849: kmalloc: call_site=ffffffff812de707 ptr=ffff8800306b9290 bytes_req=2 bytes_alloc=8 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518956: kmalloc: call_site=ffffffff81182a12 ptr=ffff88010c8e1500 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518959: kmalloc: call_site=ffffffff812d8e49 ptr=ffff88003002a200 bytes_req=32 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [001] .... 374.518960: kmalloc: call_site=ffffffff812de707 ptr=ffff8800306b9088 bytes_req=2 bytes_alloc=8 gfp_flags=GFP_KERNEL|GFP_ZERO
cat-2596 [003] .... 374.519063: kmalloc: call_site=ffffffff812d9f50 ptr=ffff8800b793fd00 bytes_req=256 bytes_alloc=256 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519119: kmalloc: call_site=ffffffff811cc3bc ptr=ffff8800b7918900 bytes_req=128 bytes_alloc=128 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519122: kmalloc: call_site=ffffffff811cc4d2 ptr=ffff880030404800 bytes_req=504 bytes_alloc=512 gfp_flags=GFP_KERNEL
cat-2596 [003] .... 374.519125: kmalloc: call_site=ffffffff811cc64e ptr=ffff88003039d8a0 bytes_req=28 bytes_alloc=32 gfp_flags=GFP_KERNEL
.
.
.
Xorg-1194 [000] .... 374.543956: kmalloc: call_site=ffffffffa03a8599 ptr=ffff8800ba23b700 bytes_req=112 bytes_alloc=128 gfp_flags=GFP_TEMPORARY|GFP_NOWARN|GFP_NORETRY
Xorg-1194 [000] .... 374.543961: kmalloc: call_site=ffffffffa03a7639 ptr=ffff8800b7905b40 bytes_req=56 bytes_alloc=64 gfp_flags=GFP_TEMPORARY|GFP_ZERO
Xorg-1194 [000] .... 374.543973: kmalloc: call_site=ffffffffa039f716 ptr=ffff8800b7905ac0 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL
.
.
.
compiz-1769 [002] .... 374.547586: kmalloc: call_site=ffffffffa03a8599 ptr=ffff8800ba320400 bytes_req=952 bytes_alloc=1024 gfp_flags=GFP_TEMPORARY|GFP_NOWARN|GFP_NORETRY
compiz-1769 [002] .... 374.547592: kmalloc: call_site=ffffffffa03a7639 ptr=ffff8800bd5f7400 bytes_req=280 bytes_alloc=512 gfp_flags=GFP_TEMPORARY|GFP_ZERO
compiz-1769 [002] .... 374.547623: kmalloc: call_site=ffffffffa039f716 ptr=ffff8800b792d580 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL
.
.
.
cat-2596 [000] .... 374.646019: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
cat-2596 [000] .... 374.648263: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
cat-2596 [000] .... 374.650503: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800ba2f2900 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
.
.
.
bash-2425 [002] .... 374.654923: kmalloc: call_site=ffffffff8123df9f ptr=ffff8800b7a28780 bytes_req=96 bytes_alloc=96 gfp_flags=GFP_NOFS|GFP_ZERO
rsyslogd-974 [002] .... 374.655163: kmalloc: call_site=ffffffff81046ae6 ptr=ffff8800ba320400 bytes_req=1024 bytes_alloc=1024 gfp_flags=GFP_KERNEL
As you can see, we captured all the kmallocs from our 'cat' reads, but
also any other kmallocs that happened for other processes between the
time we turned on kmalloc events and turned them off. Future work
should add a way to screen out unwanted events, e.g. the ability to
capture the triggering pid in a simple variable and use that variable
in event filters to screen out other pids.
To turn off the events we turned on, simply reinvoke the commands
prefixed by '!':
# echo '!enable_event:kmem:kmalloc:1' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# echo '!disable_event:kmem:kmalloc' > /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
You can verify that the events have been turned off by again examining
the 'enable' and 'trigger' files:
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/enable
0
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
0
The next example shows how to use the 'stacktrace' command. To have a
stacktrace logged every time a particular event occurs, simply echo
'stacktrace' into the 'trigger' file for that event:
# echo 'stacktrace' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:unlimited
Looking at the 'trace' output, we indeed see stack traces for every
kmalloc:
# cat /sys/kernel/debug/tracing/trace
compiz-1769 [003] .... 2422.614630: <stack trace>
=> i915_add_request
=> i915_gem_do_execbuffer.isra.15
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [002] .... 2422.619076: <stack trace>
=> drm_wait_vblank
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [000] .... 2422.625823: <stack trace>
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
.
.
.
bash-2842 [001] .... 2423.002059: <stack trace>
=> __tracing_open
=> tracing_open
=> do_dentry_open
=> finish_open
=> do_last
=> path_openat
=> do_filp_open
=> do_sys_open
=> SyS_open
=> system_call_fastpath
bash-2842 [001] .... 2423.002070: <stack trace>
=> __tracing_open
=> tracing_open
=> do_dentry_open
=> finish_open
=> do_last
=> path_openat
=> do_filp_open
=> do_sys_open
=> SyS_open
=> system_call_fastpath
For an event like kmalloc, however, we don't typically want to see a
stack trace for every single event, since the amount of data produced
is overwhelming. What we'd typically want to do is only log a stack
trace for particular events of interest. We can accomplish that by
appending an 'event filter' to the trigger. The event filters used to
filter triggers are exactly the same as those implemented for the
existing trace event 'filter' files - see the trace event
documentation for details.
First, let's turn off the existing stacktrace trigger, and clear the
trace buffer:
# echo '!stacktrace' > /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# echo > /sys/kernel/debug/tracing/trace
Now, we can add a new stacktrace trigger which will fire 5 times, but
only if the number of bytes requested by the caller was greater than
or equal to 512:
# echo 'stacktrace:5 if bytes_req >= 512' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:count=0 if bytes_req >= 512
From looking at the trigger, we can see the event fired 5 times
(count=0) and looking at the 'trace' file, we can verify that:
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 5/5 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
rsyslogd-974 [000] .... 1796.412997: <stack trace>
=> kmem_cache_alloc_trace
=> do_syslog
=> kmsg_read
=> proc_reg_read
=> vfs_read
=> SyS_read
=> system_call_fastpath
compiz-1769 [000] .... 1796.427342: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441251: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441392: <stack trace>
=> __kmalloc
=> sg_kmalloc
=> __sg_alloc_table
=> sg_alloc_table
=> i915_gem_object_get_pages_gtt
=> i915_gem_object_get_pages
=> i915_gem_object_pin
=> i915_gem_execbuffer_reserve_object.isra.11
=> i915_gem_execbuffer_reserve
=> i915_gem_do_execbuffer.isra.15
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
Xorg-1194 [003] .... 1796.441672: <stack trace>
=> __kmalloc
=> i915_gem_execbuffer2
=> drm_ioctl
=> do_vfs_ioctl
=> SyS_ioctl
=> system_call_fastpath
So the trace output shows exactly 5 stacktraces, as expected.
Just for comparison, let's look at an event that's harder to trigger,
to see a count that isn't 0 in the trigger description:
# echo 'stacktrace:5 if bytes_req >= 65536' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
stacktrace:count=5 if bytes_req >= 65536
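To make the 'command[:N] [if filter]' syntax used in these examples
concrete, here is a hedged user-space sketch that splits such a string
into its parts. parse_trigger() and struct parsed_trigger are
hypothetical names; the sketch treats a trailing ':<digits>' as the
count and parses it in base 10, which is a simplification of what the
kernel parser does:

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct parsed_trigger {
	char name[64];      /* command plus any ':'-separated arguments */
	long count;         /* -1 when no ":N" suffix was given */
	const char *filter; /* points into the input string, or NULL */
};

/* Returns 0 on success, -1 on malformed or oversized input. */
static int parse_trigger(const char *s, struct parsed_trigger *out)
{
	const char *sep = strstr(s, " if ");
	size_t len = sep ? (size_t)(sep - s) : strlen(s);
	char *colon, *end;

	if (len == 0 || len >= sizeof(out->name))
		return -1;
	memcpy(out->name, s, len);
	out->name[len] = '\0';
	out->count = -1;
	out->filter = sep ? sep + 4 : NULL;

	/* A trailing ":<digits>" is taken as the count. */
	colon = strrchr(out->name, ':');
	if (colon && isdigit((unsigned char)colon[1])) {
		errno = 0;
		out->count = strtol(colon + 1, &end, 10);
		if (errno || *end != '\0')
			return -1;
		*colon = '\0';
	}
	return out->name[0] ? 0 : -1;
}
```

With this, 'stacktrace:5 if bytes_req >= 512' yields the name
'stacktrace', a count of 5 and the filter 'bytes_req >= 512', while a
string without a numeric suffix keeps the count at -1 (unlimited).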
The next example shows how to use the 'snapshot' command to capture a
snapshot of the trace buffer when an 'interesting' event occurs.
In this case, we'll first start the entire block subsystem tracing:
# echo 1 > /sys/kernel/debug/tracing/events/block/enable
Next, we add a 'snapshot' trigger that will take a snapshot of all the
events leading up to the particular event we're interested in, which
is a block queue unplug with a depth > 1. In this case we're
interested in capturing the snapshot just one time, the first time it
occurs:
# echo 'snapshot:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
It may take a while for the condition to occur, but once it does, we
can see the entire sequence of block events leading up to it in the
'snapshot' file:
# cat /sys/kernel/debug/tracing/snapshot
jbd2/sdb1-8-278 [001] .... 382.075012: block_bio_queue: 8,16 WS 629429976 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] .... 382.075012: block_bio_backmerge: 8,16 WS 629429976 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075015: block_rq_insert: 8,16 WS 0 () 629429912 + 72 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075030: block_rq_issue: 8,16 WS 0 () 629429912 + 72 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [001] d... 382.075044: block_unplug: [jbd2/sdb1-8] 1
<idle>-0 [000] ..s. 382.075310: block_rq_complete: 8,16 WS () 629429912 + 72 [0]
jbd2/sdb1-8-278 [000] .... 382.075407: block_touch_buffer: 8,17 sector=78678492 size=4096
jbd2/sdb1-8-278 [000] .... 382.075413: block_bio_remap: 8,16 FWFS 629429984 + 8 <- (8,17) 629427936
jbd2/sdb1-8-278 [000] .... 382.075415: block_bio_queue: 8,16 FWFS 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] .... 382.075418: block_getrq: 8,16 FWFS 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] d... 382.075421: block_rq_insert: 8,16 FWFS 0 () 629429984 + 8 [jbd2/sdb1-8]
jbd2/sdb1-8-278 [000] d... 382.075424: block_rq_issue: 8,16 FWS 0 () 18446744073709551615 + 0 [jbd2/sdb1-8]
<idle>-0 [000] dNs. 382.115912: block_rq_issue: 8,16 WS 0 () 629429984 + 8 [swapper/0]
<idle>-0 [000] ..s. 382.116059: block_rq_complete: 8,16 WS () 629429984 + 8 [0]
<idle>-0 [000] dNs. 382.116079: block_rq_issue: 8,16 FWS 0 () 18446744073709551615 + 0 [swapper/0]
<idle>-0 [000] d.s. 382.131030: block_rq_complete: 8,16 WS () 629429984 + 0 [0]
jbd2/sdb1-8-278 [000] .... 382.131106: block_dirty_buffer: 8,17 sector=26 size=4096
jbd2/sdb1-8-278 [000] .... 382.131111: block_dirty_buffer: 8,17 sector=106954757 size=4096
.
.
.
kworker/u16:3-66 [002] .... 387.144505: block_bio_remap: 8,16 WM 2208 + 8 <- (8,17) 160
kworker/u16:3-66 [002] .... 387.144512: block_bio_queue: 8,16 WM 2208 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144522: block_getrq: 8,16 WM 2208 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144525: block_plug: [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144530: block_bio_remap: 8,16 WM 2216 + 8 <- (8,17) 168
kworker/u16:3-66 [002] .... 387.144531: block_bio_queue: 8,16 WM 2216 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] .... 387.144533: block_bio_backmerge: 8,16 WM 2216 + 8 [kworker/u16:3]
.
.
.
kworker/u16:3-66 [002] d... 387.144631: block_rq_insert: 8,16 WM 0 () 2208 + 16 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144636: block_rq_insert: 8,16 WM 0 () 2256 + 16 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144638: block_rq_insert: 8,16 WM 0 () 662702080 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144640: block_rq_insert: 8,16 WM 0 () 683673680 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144641: block_rq_insert: 8,16 WM 0 () 729812344 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144642: block_rq_insert: 8,16 WM 0 () 729828896 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144643: block_rq_insert: 8,16 WM 0 () 730599480 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144644: block_rq_insert: 8,16 WM 0 () 855640104 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144645: block_rq_insert: 8,16 WM 0 () 880805984 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144646: block_rq_insert: 8,16 WM 0 () 1186990400 + 8 [kworker/u16:3]
kworker/u16:3-66 [002] d... 387.144649: block_unplug: [kworker/u16:3] 10
The final example shows something very similar but using the
'traceoff' command to stop tracing when an 'interesting' event occurs.
The traceon and traceoff commands can be used together to toggle
tracing on and off in creative ways to capture different traces in the
'trace' buffer, but this example just shows essentially the same use
case as the previous example but using 'traceoff' to capture trace
data of interest in the standard 'trace' buffer.
Again, we'll start the entire block subsystem tracing:
# echo 1 > /sys/kernel/debug/tracing/events/block/enable
# echo 'traceoff:1 if nr_rq > 1' > \
/sys/kernel/debug/tracing/events/block/block_unplug/trigger
# cat /sys/kernel/debug/tracing/trace
kworker/u16:4-67 [000] .... 803.003670: block_bio_remap: 8,16 WM 2208 + 8 <- (8,17) 160
kworker/u16:4-67 [000] .... 803.003670: block_bio_queue: 8,16 WM 2208 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003672: block_getrq: 8,16 WM 2208 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003674: block_bio_remap: 8,16 WM 2216 + 8 <- (8,17) 168
kworker/u16:4-67 [000] .... 803.003675: block_bio_queue: 8,16 WM 2216 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003676: block_bio_backmerge: 8,16 WM 2216 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003678: block_bio_remap: 8,16 WM 2232 + 8 <- (8,17) 184
kworker/u16:4-67 [000] .... 803.003678: block_bio_queue: 8,16 WM 2232 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] .... 803.003680: block_getrq: 8,16 WM 2232 + 8 [kworker/u16:4]
.
.
.
kworker/u16:4-67 [000] d... 803.003720: block_rq_insert: 8,16 WM 0 () 285223776 + 16 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003721: block_rq_insert: 8,16 WM 0 () 662702080 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003722: block_rq_insert: 8,16 WM 0 () 683673680 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003723: block_rq_insert: 8,16 WM 0 () 730599480 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003724: block_rq_insert: 8,16 WM 0 () 763365384 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003725: block_rq_insert: 8,16 WM 0 () 880805984 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003726: block_rq_insert: 8,16 WM 0 () 1186990872 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003727: block_rq_insert: 8,16 WM 0 () 1187057608 + 8 [kworker/u16:4]
kworker/u16:4-67 [000] d... 803.003729: block_unplug: [kworker/u16:4] 14
The following changes since commit 8177a9d79c0e942dcac3312f15585d0344d505a5:
lseek(fd, n, SEEK_END) does *not* go to eof - n (2013-06-16 08:10:53 -1000)
are available in the git repository at:
git://git.yoctoproject.org/linux-yocto-contrib.git tzanussi/event-triggers-v1
http://git.yoctoproject.org/cgit/cgit.cgi/linux-yocto-contrib/log/?h=tzanussi/event-triggers-v1
Tom Zanussi (11):
tracing: simplify event_enable_read()
tracing: add missing syscall_metadata comment
tracing: add soft disable for syscall events
tracing: fix disabling of soft disable
tracing: add basic event trigger framework
tracing: add 'traceon' and 'traceoff' event trigger commands
tracing: add 'snapshot' event trigger command
tracing: add 'stacktrace' event trigger command
tracing: add 'enable_event' and 'disable_event' event trigger
commands
tracing: add and use generic set_trigger_filter() implementation
tracing: add documentation for trace event triggers
Documentation/trace/events.txt | 207 ++++++
include/linux/ftrace_event.h | 9 +-
include/linux/syscalls.h | 2 +
include/trace/ftrace.h | 18 +-
include/trace/syscall.h | 6 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.c | 9 +
kernel/trace/trace.h | 46 ++
kernel/trace/trace_events.c | 82 ++-
kernel/trace/trace_events_filter.c | 13 +
kernel/trace/trace_events_trigger.c | 1244 +++++++++++++++++++++++++++++++++++
kernel/trace/trace_syscalls.c | 48 +-
12 files changed, 1655 insertions(+), 30 deletions(-)
create mode 100644 kernel/trace/trace_events_trigger.c
--
1.7.11.4
Add 'traceon' and 'traceoff' event_command commands. traceon
and traceoff event triggers are added by the user via these commands
in a similar way and using practically the same syntax as the
analogous 'traceon' and 'traceoff' ftrace function commands, but
instead of writing to the set_ftrace_filter file, the traceon and
traceoff triggers are written to the per-event 'trigger' files:
echo 'traceon' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff' > .../tracing/events/somesys/someevent/trigger
The above commands will turn tracing on or off whenever someevent is
hit.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger
Where N is the number of times the command will be invoked.
The above commands will turn tracing on or off whenever someevent
is hit, but only N times.
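The count behaviour can be modelled in a few lines of user-space C.
This is only a sketch of the semantics described above (-1 means
unlimited, 0 means exhausted); struct count_model and
traceon_count_fire() are illustrative names, and a plain bool stands in
for the global tracing state:

```c
#include <stdbool.h>

struct count_model {
	long count;      /* -1 = unlimited, 0 = exhausted, N = shots left */
	bool tracing_on; /* stand-in for the global tracing on/off state */
};

/* Returns true if the trigger was still armed and acted on this hit. */
static bool traceon_count_fire(struct count_model *t)
{
	if (t->count == 0)
		return false;
	if (t->count != -1)
		t->count--;
	t->tracing_on = true;
	return true;
}
```

A trigger created with N = 2 acts on the first two hits and ignores
every later one; an unlimited trigger never changes its count.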
The event_trigger_init() and event_trigger_free() functions are meant to be
common implementations of the event_trigger_ops init() and free() ops.
Most trigger_ops implementations will use these, but some will
override and possibly reuse them.
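The reference counting that these common init() and free()
implementations perform can be sketched in user space as follows. The
names are hypothetical, and unlike the kernel code in this patch the
model clears the caller's pointer once the data is freed, purely to
make the end of the lifetime visible:

```c
#include <stdlib.h>

struct trigger_data_model {
	int ref;    /* number of users holding this trigger data */
	long count;
};

/* Take a reference, as the common init() op does. */
static int model_init(struct trigger_data_model *d)
{
	d->ref++;
	return 0;
}

/*
 * Drop a reference. Returns 1 if this was the last reference and the
 * data was freed, 0 if references remain, -1 on a NULL pointer or a
 * refcount underflow.
 */
static int model_free(struct trigger_data_model **p)
{
	struct trigger_data_model *d = *p;

	if (!d || d->ref <= 0)
		return -1;
	if (--d->ref)
		return 0;
	free(d);
	*p = NULL;
	return 1;
}
```

Each successful model_init() must be balanced by one model_free(); the
memory goes away only when the last user drops its reference.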
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_trigger.c | 310 ++++++++++++++++++++++++++++++++++++
1 file changed, 310 insertions(+)
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 3a1b46b..edc1c01 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -334,7 +334,317 @@ static void unregister_trigger(char *glob, struct event_trigger_ops *ops,
data->ops->free(data->ops, (void **)&data);
}
+int
+event_trigger_print(const char *name, struct seq_file *m,
+ void *data, char *filter_str)
+{
+ long count = (long)data;
+
+ seq_printf(m, "%s", name);
+
+ if (count == -1)
+ seq_puts(m, ":unlimited");
+ else
+ seq_printf(m, ":count=%ld", count);
+
+ if (filter_str)
+ seq_printf(m, " if %s\n", filter_str);
+ else
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static int
+event_trigger_init(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ data->ref++;
+ return 0;
+}
+
+static void
+event_trigger_free(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref)
+ kfree(data);
+}
+
+static int
+event_trigger_callback(struct event_command *cmd_ops, void *cmd_data,
+ char *glob, char *cmd, char *param, int enabled)
+{
+ struct event_trigger_ops *trigger_ops;
+ struct event_trigger_data *trigger_data;
+ char *trigger = NULL;
+ char *number;
+ int ret;
+
+ if (!enabled)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (t:n [if filter]) */
+ if (param && isdigit(param[0]))
+ trigger = strsep(&param, " \t");
+
+ mutex_lock(&event_mutex);
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->count = -1;
+ trigger_data->ops = trigger_ops;
+ INIT_LIST_HEAD(&trigger_data->list);
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, cmd_data);
+ kfree(trigger_data);
+ ret = 0;
+ goto out;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (!strlen(number))
+ goto out_free;
+
+ /*
+ * We use the callback data field (which is a pointer)
+ * as our counter.
+ */
+ ret = kstrtoul(number, 0, &trigger_data->count);
+ if (ret)
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, cmd_data);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, cmd_data);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ ret = -ENOENT;
+ goto out_free;
+ } else if (ret < 0)
+ goto out_free;
+ ret = 0;
+ out:
+ mutex_unlock(&event_mutex);
+ return ret;
+
+ out_free:
+ kfree(trigger_data);
+ goto out;
+}
+
+static void
+traceon_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (tracing_is_on())
+ return;
+
+ tracing_on();
+}
+
+static void
+traceon_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ traceon_trigger(_data);
+}
+
+static void
+traceoff_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!tracing_is_on())
+ return;
+
+ tracing_off();
+}
+
+static void
+traceoff_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ traceoff_trigger(_data);
+}
+
+static int
+traceon_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("traceon", m, (void *)data->count,
+ data->filter_str);
+}
+
+static int
+traceoff_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("traceoff", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops traceon_trigger_ops = {
+ .func = traceon_trigger,
+ .print = traceon_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceon_count_trigger_ops = {
+ .func = traceon_count_trigger,
+ .print = traceon_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_trigger_ops = {
+ .func = traceoff_trigger,
+ .print = traceoff_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops traceoff_count_trigger_ops = {
+ .func = traceoff_count_trigger,
+ .print = traceoff_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+onoff_get_trigger_ops(char *cmd, char *param)
+{
+ struct event_trigger_ops *ops;
+
+ /* we register both traceon and traceoff to this callback */
+ if (strcmp(cmd, "traceon") == 0)
+ ops = param ? &traceon_count_trigger_ops :
+ &traceon_trigger_ops;
+ else
+ ops = param ? &traceoff_count_trigger_ops :
+ &traceoff_trigger_ops;
+
+ return ops;
+}
+
+static struct event_command trigger_traceon_cmd = {
+ .name = "traceon",
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = onoff_get_trigger_ops,
+};
+
+static struct event_command trigger_traceoff_cmd = {
+ .name = "traceoff",
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = onoff_get_trigger_ops,
+};
+
+static __init void unregister_trigger_traceon_traceoff_cmds(void)
+{
+ unregister_event_command(&trigger_traceon_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_traceoff_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+}
+
+static __init int register_trigger_traceon_traceoff_cmds(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_traceon_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ return ret;
+ ret = register_event_command(&trigger_traceoff_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ unregister_trigger_traceon_traceoff_cmds();
+
+ return ret;
+}
+
__init int register_trigger_cmds(void)
{
+ int ret;
+
+ ret = register_trigger_traceon_traceoff_cmds();
+ if (ret) {
+ unregister_trigger_traceon_traceoff_cmds();
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4
Add a 'trigger' file for each trace event, enabling 'trace event
triggers' to be set for trace events.
'trace event triggers' are patterned after the existing 'ftrace
function triggers' implementation except that triggers are written to
per-event 'trigger' files instead of to a single file such as the
'set_ftrace_filter' used for ftrace function triggers.
The implementation is meant to be entirely separate from ftrace
function triggers, in order to keep the respective implementations
relatively simple and to allow them to diverge.
The event trigger functionality is built on top of SOFT_DISABLE
functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
flags which is checked when any trace event fires. Triggers set for a
particular event need to be checked regardless of whether that event
is actually enabled or not - getting an event to fire even if it's not
enabled is essentially what's already implemented by SOFT_DISABLE
mode, so trigger mode directly reuses that. It essentially inherits
the soft disable logic in __ftrace_event_enable_disable() while adding
a bit of logic and trigger reference counting via tm_ref on top of
that. Because the enable_disable code needs to now be invoked from
outside trace_events.c, a wrapper is also added for those usages.
The triggers for an event are actually invoked via a new function,
event_triggers_call(), and code is also added to invoke them for
ftrace_raw_event calls as well as syscall events.
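Conceptually, invoking the triggers for an event means walking the
triggers attached to it and running each one whose filter, if any,
accepts the current event record. The following user-space sketch
illustrates that idea only; it is not the event_triggers_call()
implementation, and every name in it is hypothetical:

```c
#include <stddef.h>

struct event_rec {
	unsigned long bytes_req; /* one sample field of an event record */
};

struct trigger_model {
	int (*filter)(const struct event_rec *rec); /* NULL: always passes */
	void (*func)(struct trigger_model *t);      /* the trigger action */
	int hits;
};

/* Sample action: just count how often the trigger ran. */
static void count_hit(struct trigger_model *t)
{
	t->hits++;
}

/* Sample filter, equivalent to 'bytes_req >= 512'. */
static int big_request(const struct event_rec *rec)
{
	return rec->bytes_req >= 512;
}

/* Run every trigger attached to the event whose filter passes. */
static void triggers_call_model(struct trigger_model *trigs, size_t n,
				const struct event_rec *rec)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (trigs[i].filter && !trigs[i].filter(rec))
			continue;
		trigs[i].func(&trigs[i]);
	}
}
```

The important property is that this walk happens whenever the event
fires, whether or not the event itself is being logged.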
The main part of the patch creates a new trace_events_trigger.c file
to contain the trace event triggers implementation.
The standard open, read, write, and release file operations are
implemented here.
The open() implementation sets up for the various open modes of the
'trigger' file. It creates and attaches the trigger iterator and sets
up the command parser. If the file is opened for reading, it also sets
up the trigger seq_ops.
The write() implementation parses the event trigger written to the
'trigger' file, looks up the trigger command, and passes it along to
that event_command's func() implementation for command-specific
processing.
The release() implementation does whatever cleanup is needed to
release the 'trigger' file, like releasing the parser and trigger
iterator, etc.
A couple of functions for event command registration and
unregistration are added, along with a list to hold the commands and a
mutex to protect it. An (initially empty) registration function,
register_trigger_cmds(), is added for the commands introduced by
later commits, along with a call to it from the trace event
initialization code.
Also added are a couple of trigger-specific data structures needed for
these implementations, such as a trigger iterator and a struct for
trigger-specific data.
A couple of structs consisting mostly of function pointers meant to be
implemented in command-specific ways, event_command and
event_trigger_ops, are used by the generic event trigger command
implementations. They're being put into trace.h alongside the other
trace_event data structures and functions, in the expectation that
they'll be needed in several trace_event-related files such as
trace_events_trigger.c and trace_events.c.
The event_command.func() function is meant to be called by the trigger
parsing code in order to add a trigger instance to the corresponding
event. It essentially coordinates adding a live trigger instance to
the event and arming the triggering event.
Every event_command func() implementation essentially does the
same thing for any command:
- choose ops - use the value of param to choose either the plain or
the count version of event_trigger_ops specific to the command
- do the register or unregister of those ops
- associate a filter, if specified, with the triggering event
The reg() and unreg() ops allow command-specific implementations for
event_trigger_ops registration and unregistration, and the
get_trigger_ops() op allows command-specific event_trigger_ops
selection to be parameterized. When a trigger instance is added, the
reg() op essentially adds that trigger to the triggering event and
arms it, while unreg() does the opposite. The set_filter() function
is used to associate a filter with the trigger - if the command
doesn't specify a set_filter() implementation, the command will ignore
filters.
The signature of func() adds a pointer to the event_command struct,
used to invoke those functions, along with a command_data param that
can be passed to the reg/unreg functions. This allows func()
implementations to use command-specific blobs and supports code
re-use.
The event_trigger_ops.func() function corresponds to the trigger
'probe' function that gets called when the triggering event is
actually invoked. The other functions are used to list the trigger
when needed, along with a couple of mundane book-keeping functions.
Some common register/unregister_trigger() implementations of the
event_command reg()/unreg() callbacks are also provided, which add
trigger instances to, and remove them from, the per-event list of
triggers, and arm/disarm them as appropriate.
Most event commands will use these, but some will override them,
possibly calling the common versions from within the override.
Signed-off-by: Tom Zanussi <[email protected]>
Idea-by: Steve Rostedt <[email protected]>
---
include/linux/ftrace_event.h | 9 +-
include/trace/ftrace.h | 4 +
kernel/trace/Makefile | 1 +
kernel/trace/trace.h | 37 ++++
kernel/trace/trace_events.c | 58 ++++--
kernel/trace/trace_events_trigger.c | 340 ++++++++++++++++++++++++++++++++++++
kernel/trace/trace_syscalls.c | 8 +
7 files changed, 446 insertions(+), 11 deletions(-)
create mode 100644 kernel/trace/trace_events_trigger.c
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4372658..10a5fa4 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -253,6 +253,7 @@ enum {
FTRACE_EVENT_FL_RECORDED_CMD_BIT,
FTRACE_EVENT_FL_SOFT_MODE_BIT,
FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
};
/*
@@ -261,13 +262,15 @@ enum {
* RECORDED_CMD - The comms should be recorded at sched_switch
* SOFT_MODE - The event is enabled/disabled by SOFT_DISABLED
* SOFT_DISABLED - When set, do not trace the event (even though its
- * tracepoint may be enabled)
+ * tracepoint may be enabled)
+ * TRIGGER_MODE - When set, invoke the triggers associated with the event
*/
enum {
FTRACE_EVENT_FL_ENABLED = (1 << FTRACE_EVENT_FL_ENABLED_BIT),
FTRACE_EVENT_FL_RECORDED_CMD = (1 << FTRACE_EVENT_FL_RECORDED_CMD_BIT),
FTRACE_EVENT_FL_SOFT_MODE = (1 << FTRACE_EVENT_FL_SOFT_MODE_BIT),
FTRACE_EVENT_FL_SOFT_DISABLED = (1 << FTRACE_EVENT_FL_SOFT_DISABLED_BIT),
+ FTRACE_EVENT_FL_TRIGGER_MODE = (1 << FTRACE_EVENT_FL_TRIGGER_MODE_BIT),
};
struct ftrace_event_file {
@@ -276,6 +279,7 @@ struct ftrace_event_file {
struct dentry *dir;
struct trace_array *tr;
struct ftrace_subsystem_dir *system;
+ struct list_head triggers;
/*
* 32 bit flags:
@@ -283,6 +287,7 @@ struct ftrace_event_file {
* bit 1: enabled cmd record
* bit 2: enable/disable with the soft disable bit
* bit 3: soft disabled
+ * bit 4: trigger enabled
*
* Note: The bits must be set atomically to prevent races
* from other writers. Reads of flags do not need to be in
@@ -294,6 +299,7 @@ struct ftrace_event_file {
*/
unsigned long flags;
atomic_t sm_ref; /* soft-mode reference counter */
+ atomic_t tm_ref; /* trigger-mode reference counter */
};
#define __TRACE_EVENT_FLAGS(name, value) \
@@ -314,6 +320,7 @@ extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
+extern void event_triggers_call(struct ftrace_event_file *file);
enum {
FILTER_OTHER = 0,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 19edd7f..88ac7da 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -522,6 +522,10 @@ ftrace_raw_event_##call(void *__data, proto) \
int __data_size; \
int pc; \
\
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
+ &ftrace_file->flags)) \
+ event_triggers_call(ftrace_file); \
+ \
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
&ftrace_file->flags)) \
return; \
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index d7e2068..1378e84 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -50,6 +50,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
obj-$(CONFIG_EVENT_TRACING) += trace_event_perf.o
endif
obj-$(CONFIG_EVENT_TRACING) += trace_events_filter.o
+obj-$(CONFIG_EVENT_TRACING) += trace_events_trigger.o
obj-$(CONFIG_KPROBE_EVENT) += trace_kprobe.o
obj-$(CONFIG_TRACEPOINTS) += power-traces.o
ifeq ($(CONFIG_PM_RUNTIME),y)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 20572ed..3738d65 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1024,6 +1024,43 @@ extern int event_trace_del_tracer(struct trace_array *tr);
extern struct mutex event_mutex;
extern struct list_head ftrace_events;
+extern const struct file_operations event_trigger_fops;
+
+extern int register_trigger_cmds(void);
+
+struct event_trigger_ops {
+ void (*func)(void **data);
+ int (*init)(struct event_trigger_ops *ops,
+ void **data);
+ void (*free)(struct event_trigger_ops *ops,
+ void **data);
+ int (*print)(struct seq_file *m,
+ struct event_trigger_ops *ops,
+ void *data);
+};
+
+struct event_command {
+ struct list_head list;
+ char *name;
+ int (*func)(struct event_command *cmd_ops,
+ void *cmd_data, char *glob, char *cmd,
+ char *params, int enable);
+ int (*reg)(char *glob,
+ struct event_trigger_ops *trigger_ops,
+ void *trigger_data, void *cmd_data);
+ void (*unreg)(char *glob,
+ struct event_trigger_ops *trigger_ops,
+ void *trigger_data, void *cmd_data);
+ int (*set_filter)(char *filter_str,
+ void *trigger_data,
+ void *cmd_data);
+ struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param);
+};
+
+extern int trace_event_enable_disable(struct ftrace_event_file *file,
+ int enable, int soft_disable,
+ int trigger_enable);
+
extern const char *__start___trace_bprintk_fmt[];
extern const char *__stop___trace_bprintk_fmt[];
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f9738dc..1fc1602 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -242,7 +242,8 @@ void trace_event_enable_cmd_record(bool enable)
}
static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
- int enable, int soft_disable)
+ int enable, int soft_disable,
+ int trigger_enable)
{
struct ftrace_event_call *call = file->event_call;
int ret = 0;
@@ -263,7 +264,13 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
* we do nothing. Do not disable the tracepoint, otherwise
* "soft enable"s (clearing the SOFT_DISABLED bit) wont work.
*/
- if (soft_disable) {
+ if (trigger_enable) {
+ if (atomic_dec_return(&file->tm_ref) > 0)
+ break;
+ clear_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
+ ret = __ftrace_event_enable_disable(file, enable, 1, 0);
+ break;
+ } else if (soft_disable) {
if (atomic_dec_return(&file->sm_ref) > 0)
break;
disable = file->flags & FTRACE_EVENT_FL_SOFT_DISABLED;
@@ -279,7 +286,7 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
}
call->class->reg(call, TRACE_REG_UNREGISTER, file);
}
- /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT */
+ /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT, else clear it */
if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
else
@@ -294,7 +301,13 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
* set SOFT_DISABLED before enabling the event tracepoint, so
* it still seems to be disabled.
*/
- if (!soft_disable)
+ if (trigger_enable) {
+ if (atomic_inc_return(&file->tm_ref) > 1)
+ break;
+ set_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
+ ret = __ftrace_event_enable_disable(file, enable, 1, 0);
+ break;
+ } else if (!soft_disable)
clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
else {
if (atomic_inc_return(&file->sm_ref) > 1)
@@ -330,10 +343,18 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
return ret;
}
+int trace_event_enable_disable(struct ftrace_event_file *file,
+ int enable, int soft_disable,
+ int trigger_enable)
+{
+ return __ftrace_event_enable_disable(file, enable, soft_disable,
+ trigger_enable);
+}
+
static int ftrace_event_enable_disable(struct ftrace_event_file *file,
int enable)
{
- return __ftrace_event_enable_disable(file, enable, 0);
+ return __ftrace_event_enable_disable(file, enable, 0, 0);
}
static void ftrace_clear_events(struct trace_array *tr)
@@ -1380,6 +1401,7 @@ event_create_dir(struct dentry *parent,
const struct file_operations *id,
const struct file_operations *enable,
const struct file_operations *filter,
+ const struct file_operations *trigger,
const struct file_operations *format)
{
struct ftrace_event_call *call = file->event_call;
@@ -1432,6 +1454,9 @@ event_create_dir(struct dentry *parent,
trace_create_file("filter", 0644, file->dir, call,
filter);
+ trace_create_file("trigger", 0644, file->dir, file,
+ trigger);
+
trace_create_file("format", 0444, file->dir, call,
format);
@@ -1544,6 +1569,8 @@ trace_create_new_event(struct ftrace_event_call *call,
file->event_call = call;
file->tr = tr;
atomic_set(&file->sm_ref, 0);
+ atomic_set(&file->tm_ref, 0);
+ INIT_LIST_HEAD(&file->triggers);
list_add(&file->list, &tr->events);
return file;
@@ -1556,6 +1583,7 @@ __trace_add_new_event(struct ftrace_event_call *call,
const struct file_operations *id,
const struct file_operations *enable,
const struct file_operations *filter,
+ const struct file_operations *trigger,
const struct file_operations *format)
{
struct ftrace_event_file *file;
@@ -1564,7 +1592,7 @@ __trace_add_new_event(struct ftrace_event_call *call,
if (!file)
return -ENOMEM;
- return event_create_dir(tr->event_dir, file, id, enable, filter, format);
+ return event_create_dir(tr->event_dir, file, id, enable, filter, trigger, format);
}
/*
@@ -1643,6 +1671,7 @@ struct ftrace_module_file_ops {
struct file_operations enable;
struct file_operations format;
struct file_operations filter;
+ struct file_operations trigger;
};
static struct ftrace_module_file_ops *
@@ -1692,6 +1721,9 @@ trace_create_file_ops(struct module *mod)
file_ops->filter = ftrace_event_filter_fops;
file_ops->filter.owner = mod;
+ file_ops->trigger = event_trigger_fops;
+ file_ops->trigger.owner = mod;
+
file_ops->format = ftrace_event_format_fops;
file_ops->format.owner = mod;
@@ -1785,7 +1817,8 @@ __trace_add_new_mod_event(struct ftrace_event_call *call,
{
return __trace_add_new_event(call, tr,
&file_ops->id, &file_ops->enable,
- &file_ops->filter, &file_ops->format);
+ &file_ops->filter, &file_ops->trigger,
+ &file_ops->format);
}
#else
@@ -1838,6 +1871,7 @@ __trace_add_event_dirs(struct trace_array *tr)
&ftrace_event_id_fops,
&ftrace_enable_fops,
&ftrace_event_filter_fops,
+ &event_trigger_fops,
&ftrace_event_format_fops);
if (ret < 0)
pr_warning("Could not create directory for event %s\n",
@@ -1963,7 +1997,7 @@ event_enable_free(struct ftrace_probe_ops *ops, unsigned long ip,
data->ref--;
if (!data->ref) {
/* Remove the SOFT_MODE flag */
- __ftrace_event_enable_disable(data->file, 0, 1);
+ __ftrace_event_enable_disable(data->file, 0, 1, 0);
module_put(data->file->event_call->mod);
kfree(data);
}
@@ -2079,7 +2113,7 @@ event_enable_func(struct ftrace_hash *hash,
goto out_free;
}
- ret = __ftrace_event_enable_disable(file, 1, 1);
+ ret = __ftrace_event_enable_disable(file, 1, 1, 0);
if (ret < 0)
goto out_put;
ret = register_ftrace_function_probe(glob, ops, data);
@@ -2100,7 +2134,7 @@ event_enable_func(struct ftrace_hash *hash,
return ret;
out_disable:
- __ftrace_event_enable_disable(file, 0, 1);
+ __ftrace_event_enable_disable(file, 0, 1, 0);
out_put:
module_put(file->event_call->mod);
out_free:
@@ -2153,6 +2187,7 @@ __trace_early_add_event_dirs(struct trace_array *tr)
&ftrace_event_id_fops,
&ftrace_enable_fops,
&ftrace_event_filter_fops,
+ &event_trigger_fops,
&ftrace_event_format_fops);
if (ret < 0)
pr_warning("Could not create directory for event %s\n",
@@ -2212,6 +2247,7 @@ __add_event_to_tracers(struct ftrace_event_call *call,
&ftrace_event_id_fops,
&ftrace_enable_fops,
&ftrace_event_filter_fops,
+ &event_trigger_fops,
&ftrace_event_format_fops);
}
}
@@ -2396,6 +2432,8 @@ static __init int event_trace_enable(void)
register_event_cmds();
+ register_trigger_cmds();
+
return 0;
}
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
new file mode 100644
index 0000000..3a1b46b
--- /dev/null
+++ b/kernel/trace/trace_events_trigger.c
@@ -0,0 +1,340 @@
+/*
+ * trace_events_trigger - trace event triggers
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 Tom Zanussi <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+
+#include "trace.h"
+
+static LIST_HEAD(trigger_commands);
+static DEFINE_MUTEX(trigger_cmd_mutex);
+
+struct event_trigger_data {
+ struct ftrace_event_file *file;
+ unsigned long count;
+ int ref;
+ bool enable;
+ struct event_trigger_ops *ops;
+ struct event_filter *filter;
+ char *filter_str;
+ struct list_head list;
+};
+
+struct trigger_iterator {
+ struct ftrace_event_file *file;
+};
+
+void event_triggers_call(struct ftrace_event_file *file)
+{
+ struct event_trigger_data *data;
+
+ if (list_empty(&file->triggers))
+ return;
+
+ preempt_disable_notrace();
+ list_for_each_entry_rcu(data, &file->triggers, list)
+ data->ops->func((void **)&data);
+ preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(event_triggers_call);
+
+static void *trigger_next(struct seq_file *m, void *t, loff_t *pos)
+{
+ struct trigger_iterator *iter = m->private;
+
+ return seq_list_next(t, &iter->file->triggers, pos);
+}
+
+static void *trigger_start(struct seq_file *m, loff_t *pos)
+{
+ struct trigger_iterator *iter = m->private;
+
+ mutex_lock(&event_mutex);
+
+ return seq_list_start(&iter->file->triggers, *pos);
+}
+
+static void trigger_stop(struct seq_file *m, void *t)
+{
+ mutex_unlock(&event_mutex);
+}
+
+static int trigger_show(struct seq_file *m, void *v)
+{
+ struct event_trigger_data *data;
+
+ data = list_entry(v, struct event_trigger_data, list);
+ data->ops->print(m, data->ops, data);
+
+ return 0;
+}
+
+static const struct seq_operations event_triggers_seq_ops = {
+ .start = trigger_start,
+ .next = trigger_next,
+ .stop = trigger_stop,
+ .show = trigger_show,
+};
+
+static int event_trigger_regex_open(struct inode *inode, struct file *file)
+{
+ struct trigger_iterator *iter;
+ int ret = 0;
+
+ iter = kzalloc(sizeof(*iter), GFP_KERNEL);
+ if (!iter)
+ return -ENOMEM;
+
+ iter->file = inode->i_private;
+
+ mutex_lock(&event_mutex);
+
+ if (file->f_mode & FMODE_READ) {
+ ret = seq_open(file, &event_triggers_seq_ops);
+ if (!ret) {
+ struct seq_file *m = file->private_data;
+ m->private = iter;
+ } else {
+ /* Failed */
+ kfree(iter);
+ }
+ } else
+ file->private_data = iter;
+
+ mutex_unlock(&event_mutex);
+
+ return ret;
+}
+
+static int trigger_process_regex(struct trigger_iterator *iter,
+ char *buff, int enable)
+{
+ struct event_command *p;
+ char *command, *next = buff;
+ int ret = -EINVAL;
+
+ command = strsep(&next, ": \t");
+ command = (command[0] != '!') ? command : command + 1;
+
+ mutex_lock(&trigger_cmd_mutex);
+ list_for_each_entry(p, &trigger_commands, list) {
+ if (strcmp(p->name, command) == 0) {
+ ret = p->func(p, iter, buff, command, next, enable);
+ goto out_unlock;
+ }
+ }
+ out_unlock:
+ mutex_unlock(&trigger_cmd_mutex);
+
+ return ret;
+}
+
+static ssize_t event_trigger_regex_write(struct file *file,
+ const char __user *ubuf,
+ size_t cnt, loff_t *ppos, int enable)
+{
+ struct trigger_iterator *iter = file->private_data;
+ ssize_t ret;
+ char *buf;
+
+ if (!cnt)
+ return 0;
+
+ if (cnt >= PAGE_SIZE)
+ return -EINVAL;
+
+ if (file->f_mode & FMODE_READ) {
+ struct seq_file *m = file->private_data;
+ iter = m->private;
+ } else
+ iter = file->private_data;
+
+ buf = (char *)__get_free_page(GFP_TEMPORARY);
+ if (!buf)
+ return -ENOMEM;
+
+ if (copy_from_user(buf, ubuf, cnt)) {
+ free_page((unsigned long) buf);
+ return -EFAULT;
+ }
+ buf[cnt] = '\0';
+ strim(buf);
+
+ ret = trigger_process_regex(iter, buf, enable);
+
+ free_page((unsigned long) buf);
+ if (ret < 0)
+ goto out;
+
+ *ppos += cnt;
+ ret = cnt;
+ out:
+ return ret;
+}
+
+static int event_trigger_regex_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *m = (struct seq_file *)file->private_data;
+ struct trigger_iterator *iter;
+
+ mutex_lock(&event_mutex);
+
+ if (file->f_mode & FMODE_READ) {
+ iter = m->private;
+
+ seq_release(inode, file);
+ } else
+ iter = file->private_data;
+
+ kfree(iter);
+
+ mutex_unlock(&event_mutex);
+
+ return 0;
+}
+
+static ssize_t
+event_trigger_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ return event_trigger_regex_write(filp, ubuf, cnt, ppos, 1);
+}
+
+static int
+event_trigger_open(struct inode *inode, struct file *filp)
+{
+ return event_trigger_regex_open(inode, filp);
+}
+
+static int
+event_trigger_release(struct inode *inode, struct file *file)
+{
+ return event_trigger_regex_release(inode, file);
+}
+
+const struct file_operations event_trigger_fops = {
+ .open = event_trigger_open,
+ .read = seq_read,
+ .write = event_trigger_write,
+ .llseek = ftrace_filter_lseek,
+ .release = event_trigger_release,
+};
+
+static int register_event_command(struct event_command *cmd,
+ struct list_head *cmd_list,
+ struct mutex *cmd_list_mutex)
+{
+ struct event_command *p;
+ int ret = 0;
+
+ mutex_lock(cmd_list_mutex);
+ list_for_each_entry(p, cmd_list, list) {
+ if (strcmp(cmd->name, p->name) == 0) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+ }
+ list_add(&cmd->list, cmd_list);
+ out_unlock:
+ mutex_unlock(cmd_list_mutex);
+
+ return ret;
+}
+
+static int unregister_event_command(struct event_command *cmd,
+ struct list_head *cmd_list,
+ struct mutex *cmd_list_mutex)
+{
+ struct event_command *p, *n;
+ int ret = -ENODEV;
+
+ mutex_lock(cmd_list_mutex);
+ list_for_each_entry_safe(p, n, cmd_list, list) {
+ if (strcmp(cmd->name, p->name) == 0) {
+ ret = 0;
+ list_del_init(&p->list);
+ goto out_unlock;
+ }
+ }
+ out_unlock:
+ mutex_unlock(cmd_list_mutex);
+
+ return ret;
+}
+
+static int register_trigger(char *glob, struct event_trigger_ops *ops,
+ void *trigger_data, void *cmd_data)
+{
+ struct trigger_iterator *iter = cmd_data;
+ struct event_trigger_data *data = trigger_data;
+ struct event_trigger_data *test;
+ int ret = 0;
+
+ list_for_each_entry_rcu(test, &iter->file->triggers, list) {
+ if (test->ops == data->ops) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, (void **)&data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &iter->file->triggers);
+ ret++;
+
+ if (trace_event_enable_disable(iter->file, 1, 0, 1) < 0) {
+ list_del_rcu(&data->list);
+ ret--;
+ }
+out:
+ return ret;
+}
+
+static void unregister_trigger(char *glob, struct event_trigger_ops *ops,
+ void *trigger_data, void *cmd_data)
+{
+ struct trigger_iterator *iter = cmd_data;
+ struct event_trigger_data *test = trigger_data;
+ struct event_trigger_data *data;
+ bool unregistered = false;
+
+ list_for_each_entry_rcu(data, &iter->file->triggers, list) {
+ if (data->ops == test->ops) {
+ unregistered = true;
+ list_del_rcu(&data->list);
+ trace_event_enable_disable(iter->file, 0, 0, 1);
+ break;
+ }
+ }
+
+ if (unregistered && data->ops->free)
+ data->ops->free(data->ops, (void **)&data);
+}
+
+__init int register_trigger_cmds(void)
+{
+ return 0;
+}
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 1d81881..e287011 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -319,6 +319,10 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (!sys_data)
return;
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->enter_file->flags))
+ event_triggers_call(sys_data->enter_file);
+
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
&sys_data->enter_file->flags))
return;
@@ -359,6 +363,10 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
if (!sys_data)
return;
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->exit_file->flags))
+ event_triggers_call(sys_data->exit_file);
+
if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
&sys_data->exit_file->flags))
return;
--
1.7.11.4
Rather than enumerating each permutation, build the enable state
string up from the combination of states. This also allows for the
simpler addition of more states.
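The composition can be sketched in user space as below. The
MODEL_FL_* values and model_enable_string() are hypothetical stand-ins
for the kernel flags and for the logic in event_enable_read(); the
string-building steps follow the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the FTRACE_EVENT_FL_* flags. */
enum {
	MODEL_FL_ENABLED       = 1 << 0,
	MODEL_FL_SOFT_MODE     = 1 << 1,
	MODEL_FL_SOFT_DISABLED = 1 << 2,
};

/* buf needs 4 bytes: state digit, optional '*', newline, and NUL. */
static void model_enable_string(unsigned long flags, char buf[4])
{
	assert(buf);

	/* Start from "disabled" and upgrade if really tracing. */
	strcpy(buf, "0");
	if ((flags & MODEL_FL_ENABLED) && !(flags & MODEL_FL_SOFT_DISABLED))
		strcpy(buf, "1");

	/* Either soft state is marked with a trailing '*'. */
	if (flags & (MODEL_FL_SOFT_DISABLED | MODEL_FL_SOFT_MODE))
		strcat(buf, "*");

	strcat(buf, "\n");
}
```

Each state contributes its own piece of the string, so a further state
would only need one more conditional append (and a larger buffer)
rather than a new branch for every combination.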
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 27963e2..ecb2609 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -624,17 +624,17 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
loff_t *ppos)
{
struct ftrace_event_file *file = filp->private_data;
- char *buf;
+ char buf[4] = "0";
- if (file->flags & FTRACE_EVENT_FL_ENABLED) {
- if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED)
- buf = "0*\n";
- else if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
- buf = "1*\n";
- else
- buf = "1\n";
- } else
- buf = "0\n";
+ if (file->flags & FTRACE_EVENT_FL_ENABLED &&
+ !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+ strcpy(buf, "1");
+
+ if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
+ file->flags & FTRACE_EVENT_FL_SOFT_MODE)
+ strcat(buf, "*");
+
+ strcat(buf, "\n");
return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
}
--
1.7.11.4
Add support for SOFT_DISABLE to syscall events.
The original SOFT_DISABLE patches didn't add support for soft disable
of syscall events; this adds it and paves the way for future patches
allowing triggers to be added to syscall events, since triggers are
built on top of SOFT_DISABLE.
Because the trigger and SOFT_DISABLE bits are attached to the
ftrace_event_file associated with the event, pointers to the
ftrace_event_files associated with the event are added to the syscall
metadata entry for the event.
Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/syscalls.h | 2 ++
include/trace/syscall.h | 5 +++++
kernel/trace/trace_syscalls.c | 28 ++++++++++++++++++++++++----
3 files changed, 31 insertions(+), 4 deletions(-)
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 4147d70..b4c2afa 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -158,6 +158,8 @@ extern struct trace_event_functions exit_syscall_print_funcs;
.args = nb ? args_##sname : NULL, \
.enter_event = &event_enter_##sname, \
.exit_event = &event_exit_##sname, \
+ .enter_file = NULL, /* Filled in at boot */ \
+ .exit_file = NULL, /* Filled in at boot */ \
.enter_fields = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
}; \
static struct syscall_metadata __used \
diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index fed853f..ba24d3a 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -19,6 +19,8 @@
* @enter_fields: list of fields for syscall_enter trace event
* @enter_event: associated syscall_enter trace event
* @exit_event: associated syscall_exit trace event
+ * @enter_file: associated syscall_enter ftrace event file
+ * @exit_file: associated syscall_exit ftrace event file
*/
struct syscall_metadata {
const char *name;
@@ -30,6 +32,9 @@ struct syscall_metadata {
struct ftrace_event_call *enter_event;
struct ftrace_event_call *exit_event;
+
+ struct ftrace_event_file *enter_file;
+ struct ftrace_event_file *exit_file;
};
#endif /* _TRACE_SYSCALL_H */
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8f2ac73..1d81881 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -319,6 +319,10 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (!sys_data)
return;
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ &sys_data->enter_file->flags))
+ return;
+
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
buffer = tr->trace_buffer.buffer;
@@ -355,6 +359,10 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
if (!sys_data)
return;
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ &sys_data->exit_file->flags))
+ return;
+
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer,
sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
@@ -374,10 +382,12 @@ static int reg_event_syscall_enter(struct ftrace_event_file *file,
struct ftrace_event_call *call)
{
struct trace_array *tr = file->tr;
+ struct syscall_metadata *meta;
int ret = 0;
int num;
- num = ((struct syscall_metadata *)call->data)->syscall_nr;
+ meta = (struct syscall_metadata *)call->data;
+ num = meta->syscall_nr;
if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
@@ -386,6 +396,7 @@ static int reg_event_syscall_enter(struct ftrace_event_file *file,
if (!ret) {
set_bit(num, tr->enabled_enter_syscalls);
tr->sys_refcount_enter++;
+ meta->enter_file = file;
}
mutex_unlock(&syscall_trace_lock);
return ret;
@@ -395,14 +406,17 @@ static void unreg_event_syscall_enter(struct ftrace_event_file *file,
struct ftrace_event_call *call)
{
struct trace_array *tr = file->tr;
+ struct syscall_metadata *meta;
int num;
- num = ((struct syscall_metadata *)call->data)->syscall_nr;
+ meta = (struct syscall_metadata *)call->data;
+ num = meta->syscall_nr;
if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
return;
mutex_lock(&syscall_trace_lock);
tr->sys_refcount_enter--;
clear_bit(num, tr->enabled_enter_syscalls);
+ meta->enter_file = NULL;
if (!tr->sys_refcount_enter)
unregister_trace_sys_enter(ftrace_syscall_enter, tr);
mutex_unlock(&syscall_trace_lock);
@@ -412,10 +426,12 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
struct ftrace_event_call *call)
{
struct trace_array *tr = file->tr;
+ struct syscall_metadata *meta;
int ret = 0;
int num;
- num = ((struct syscall_metadata *)call->data)->syscall_nr;
+ meta = (struct syscall_metadata *)call->data;
+ num = meta->syscall_nr;
if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
return -ENOSYS;
mutex_lock(&syscall_trace_lock);
@@ -424,6 +440,7 @@ static int reg_event_syscall_exit(struct ftrace_event_file *file,
if (!ret) {
set_bit(num, tr->enabled_exit_syscalls);
tr->sys_refcount_exit++;
+ meta->exit_file = file;
}
mutex_unlock(&syscall_trace_lock);
return ret;
@@ -433,14 +450,17 @@ static void unreg_event_syscall_exit(struct ftrace_event_file *file,
struct ftrace_event_call *call)
{
struct trace_array *tr = file->tr;
+ struct syscall_metadata *meta;
int num;
- num = ((struct syscall_metadata *)call->data)->syscall_nr;
+ meta = (struct syscall_metadata *)call->data;
+ num = meta->syscall_nr;
if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
return;
mutex_lock(&syscall_trace_lock);
tr->sys_refcount_exit--;
clear_bit(num, tr->enabled_exit_syscalls);
+ meta->exit_file = NULL;
if (!tr->sys_refcount_exit)
unregister_trace_sys_exit(ftrace_syscall_exit, tr);
mutex_unlock(&syscall_trace_lock);
--
1.7.11.4
Add 'stacktrace' event_command. stacktrace event triggers are
added by the user via this command in a similar way and using
practically the same syntax as the analogous 'stacktrace' ftrace
function command, but instead of writing to the set_ftrace_filter
file, the stacktrace event trigger is written to the per-event
'trigger' files:
echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger
The above command will turn on stacktraces for someevent, i.e. whenever
someevent is hit, a stacktrace will be logged.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger
Where N is the number of times the command will be invoked.
The above command will log at most N stacktraces for someevent, i.e.
a stacktrace will be logged for each of the first N times someevent
is hit.
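The count handling can be modelled in user space as follows. This is a
sketch only: struct model_trigger and model_count_trigger() are
hypothetical names, and a counter stands in for the actual
trace_dump_stack() call. As in stacktrace_count_trigger(), a count of
-1 is treated as unlimited.

```c
#include <assert.h>
#include <stdbool.h>

/* All-ones count, i.e. (unsigned long)-1, means "no limit". */
#define MODEL_COUNT_UNLIMITED (~0UL)

struct model_trigger {
	unsigned long count;	/* remaining invocations, or unlimited */
	unsigned long fired;	/* how often the action actually ran */
};

/* Returns true if the action ran for this hit of the event. */
static bool model_count_trigger(struct model_trigger *t)
{
	assert(t);

	if (!t->count)
		return false;	/* budget used up: do nothing */

	if (t->count != MODEL_COUNT_UNLIMITED)
		t->count--;

	t->fired++;		/* stands in for trace_dump_stack() */
	return true;
}
```

With a count of 2, the first two hits run the action and every later
hit is ignored.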
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_trigger.c | 88 +++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 3e8dfe1..ee456f3 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -690,6 +690,83 @@ static struct event_command trigger_snapshot_cmd = {
.get_trigger_ops = snapshot_get_trigger_ops,
};
+/*
+ * Skip 4:
+ * ftrace_stacktrace()
+ * function_trace_probe_call()
+ * ftrace_ops_list_func()
+ * ftrace_call()
+ */
+#define STACK_SKIP 4
+
+static void
+stacktrace_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ trace_dump_stack(STACK_SKIP);
+}
+
+static void
+stacktrace_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ stacktrace_trigger(_data);
+}
+
+static int
+stacktrace_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("stacktrace", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops stacktrace_trigger_ops = {
+ .func = stacktrace_trigger,
+ .print = stacktrace_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops stacktrace_count_trigger_ops = {
+ .func = stacktrace_count_trigger,
+ .print = stacktrace_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+stacktrace_get_trigger_ops(char *cmd, char *param)
+{
+ return param ? &stacktrace_count_trigger_ops : &stacktrace_trigger_ops;
+}
+
+static struct event_command trigger_stacktrace_cmd = {
+ .name = "stacktrace",
+ .func = event_trigger_callback,
+ .reg = register_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = stacktrace_get_trigger_ops,
+};
+
static __init void unregister_trigger_traceon_traceoff_cmds(void)
{
unregister_event_command(&trigger_traceon_cmd,
@@ -734,5 +811,16 @@ __init int register_trigger_cmds(void)
return ret;
}
+ ret = register_event_command(&trigger_stacktrace_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0)) {
+ unregister_trigger_traceon_traceoff_cmds();
+ unregister_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4
Add 'snapshot' event_command. snapshot event triggers are added
by the user via this command in a similar way and using practically
the same syntax as the analogous 'snapshot' ftrace function command,
but instead of writing to the set_ftrace_filter file, the snapshot
event trigger is written to the per-event 'trigger' files:
echo 'snapshot' > .../somesys/someevent/trigger
The above command will turn on snapshots for someevent i.e. whenever
someevent is hit, a snapshot will be done.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'snapshot:N' > .../somesys/someevent/trigger
Where N is the number of times the command will be invoked.
The above command will take a snapshot on each of the first N hits of
someevent; once the count is used up, further hits do nothing.
This also adds a new ftrace_alloc_snapshot() function. The ftrace
snapshot command already contains code that allocates a snapshot;
ftrace_alloc_snapshot() wraps that allocation so the event trigger
can reuse it.
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 9 ++++
kernel/trace/trace.h | 1 +
kernel/trace/trace_events_trigger.c | 88 +++++++++++++++++++++++++++++++++++++
3 files changed, 98 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index e71a8be..96f3cdc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5169,6 +5169,15 @@ static const struct file_operations tracing_dyn_info_fops = {
};
#endif /* CONFIG_DYNAMIC_FTRACE */
+#if defined(CONFIG_TRACER_SNAPSHOT)
+int ftrace_alloc_snapshot(void)
+{
+ return alloc_snapshot(&global_trace);
+}
+#else
+int ftrace_alloc_snapshot(void) { return -ENOSYS; }
+#endif
+
#if defined(CONFIG_TRACER_SNAPSHOT) && defined(CONFIG_DYNAMIC_FTRACE)
static void
ftrace_snapshot(unsigned long ip, unsigned long parent_ip, void **data)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3738d65..7df20e8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1060,6 +1060,7 @@ struct event_command {
extern int trace_event_enable_disable(struct ftrace_event_file *file,
int enable, int soft_disable,
int trigger_enable);
+extern int ftrace_alloc_snapshot(void);
extern const char *__start___trace_bprintk_fmt[];
extern const char *__stop___trace_bprintk_fmt[];
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index edc1c01..3e8dfe1 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -610,6 +610,86 @@ static struct event_command trigger_traceoff_cmd = {
.get_trigger_ops = onoff_get_trigger_ops,
};
+static void
+snapshot_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ tracing_snapshot();
+}
+
+static void
+snapshot_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ snapshot_trigger(_data);
+}
+
+static int
+register_snapshot_trigger(char *glob, struct event_trigger_ops *ops,
+ void *data, void *cmd_data)
+{
+ int ret = register_trigger(glob, ops, data, cmd_data);
+
+ if (ret > 0)
+ ftrace_alloc_snapshot();
+
+ return ret;
+}
+
+static int
+snapshot_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ return event_trigger_print("snapshot", m, (void *)data->count,
+ data->filter_str);
+}
+
+static struct event_trigger_ops snapshot_trigger_ops = {
+ .func = snapshot_trigger,
+ .print = snapshot_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops snapshot_count_trigger_ops = {
+ .func = snapshot_count_trigger,
+ .print = snapshot_trigger_print,
+ .init = event_trigger_init,
+ .free = event_trigger_free,
+};
+
+static struct event_trigger_ops *
+snapshot_get_trigger_ops(char *cmd, char *param)
+{
+ return param ? &snapshot_count_trigger_ops : &snapshot_trigger_ops;
+}
+
+static struct event_command trigger_snapshot_cmd = {
+ .name = "snapshot",
+ .func = event_trigger_callback,
+ .reg = register_snapshot_trigger,
+ .unreg = unregister_trigger,
+ .get_trigger_ops = snapshot_get_trigger_ops,
+};
+
static __init void unregister_trigger_traceon_traceoff_cmds(void)
{
unregister_event_command(&trigger_traceon_cmd,
@@ -646,5 +726,13 @@ __init int register_trigger_cmds(void)
return ret;
}
+ ret = register_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0)) {
+ unregister_trigger_traceon_traceoff_cmds();
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4
The comment on the soft disable 'disable' case of
__ftrace_event_enable_disable() states that the soft disable bit
should be cleared in that case, but currently only the soft mode bit
is actually cleared.
This essentially leaves the standard non-soft-enable enable/disable
paths as the only way to clear the soft disable flag, but the soft
disable bit should also be cleared when removing a trigger with '!'.
Also, the SOFT_DISABLED bit should never be set if SOFT_MODE is
cleared.
This fixes the above discrepancies.
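
As a rough user-space illustration of the intended invariant (SOFT_DISABLED
may only be set while SOFT_MODE is set), the end of the 'disable' case can
be modelled as below. The FL_* bit values and the function names are
hypothetical; the kernel uses its own FTRACE_EVENT_FL_* bits with
set_bit()/clear_bit().

```c
#include <assert.h>

/* Hypothetical bit values, for this sketch only. */
#define FL_SOFT_MODE		(1UL << 0)
#define FL_SOFT_DISABLED	(1UL << 1)

/* The invariant described above: SOFT_DISABLED implies SOFT_MODE. */
static int flags_consistent(unsigned long flags)
{
	return !(flags & FL_SOFT_DISABLED) || (flags & FL_SOFT_MODE);
}

/*
 * Model of the end of the 'disable' case after this patch: in soft
 * mode the event is only soft-disabled, otherwise any stale
 * SOFT_DISABLED bit is cleared.
 */
static unsigned long disable_tail(unsigned long flags)
{
	if (flags & FL_SOFT_MODE)
		flags |= FL_SOFT_DISABLED;
	else
		flags &= ~FL_SOFT_DISABLED;

	assert(flags_consistent(flags));
	return flags;
}
```

Without the added else branch, an input of FL_SOFT_DISABLED alone would
pass through unchanged and violate the invariant.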
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index ecb2609..f9738dc 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -282,6 +282,8 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
/* If in SOFT_MODE, just set the SOFT_DISABLE_BIT */
if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
+ else
+ clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
break;
case 1:
/*
--
1.7.11.4
Add a generic event_command.set_trigger_filter() op implementation and
have the current set of trigger commands use it - this essentially
gives them all support for filters.
Syntactically, filters are supported by adding 'if <filter>' just
after the command, in which case only events matching the filter will
invoke the trigger. For example, to add a filter to an
enable/disable_event command:
echo 'enable_event:system:event if common_pid == 999' > \
.../othersys/otherevent/trigger
The above command will only enable the system:event event if the
common_pid field in the othersys:otherevent event is 999.
As another example, to add a filter to a stacktrace command:
echo 'stacktrace if common_pid == 999' > \
.../somesys/someevent/trigger
The above command will only trigger a stacktrace if the common_pid
field in the event is 999.
The filter syntax is the same as that described in the 'Event
filtering' section of Documentation/trace/events.txt.
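
A minimal user-space sketch of how the 'if <filter>' clause is recognised,
assuming the text handed to the parser starts right after the command. The
helper name filter_clause_expr() is hypothetical; the patch itself does the
equivalent split with strsep() in set_trigger_filter().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Given the text following the command (e.g. "if common_pid == 999"),
 * return a pointer to the filter expression, or NULL if the clause
 * does not start with the keyword "if" followed by a separator.
 */
static const char *filter_clause_expr(const char *clause)
{
	size_t kw_len;

	assert(clause != NULL);

	/* The first token runs up to the first space or tab. */
	kw_len = strcspn(clause, " \t");

	/* It must be exactly "if". */
	if (kw_len != 2 || strncmp(clause, "if", 2) != 0)
		return NULL;

	/* A bare "if" with nothing after it is rejected. */
	if (clause[kw_len] == '\0')
		return NULL;

	/* Everything after the separator is the filter expression. */
	return clause + kw_len + 1;
}
```

The returned string is what would then be handed to the filter creation
code; validating the expression itself is left to that code.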
Because triggers can now use filters, the trigger-invoking logic needs
to be moved - for ftrace_raw_event_calls, trigger invocation now needs
to happen after the { assign; } part of the call.
Also, because triggers need to be invoked even for soft-disabled
events, the SOFT_DISABLED check and return needs to be moved from the
top of the call to a point following the trigger check, which means
that soft-disabled events actually get discarded instead of simply
skipped.
There's also a bit of trickiness in that the triggers need to avoid
being invoked while an event is currently in the process of being
logged, since the trigger may itself log data into the trace buffer.
Thus we make sure the current event is committed before invoking
triggers.
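
The ordering and the filter check can be sketched in user space as follows.
This is only a model under stated assumptions: record_model, trigger_entry,
pid_is_999() and handle_event() are hypothetical names, and the in_progress
flag stands in for an event that has been reserved in the ring buffer but
not yet committed or discarded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the trace entry. */
struct record_model {
	int common_pid;
	int in_progress;	/* nonzero while reserved but not committed */
};

/* Hypothetical stand-in for struct event_trigger_data. */
struct trigger_entry {
	int (*filter)(const struct record_model *rec);	/* NULL: always passes */
	int fired;					/* counts invocations */
};

static int pid_is_999(const struct record_model *rec)
{
	return rec->common_pid == 999;
}

/*
 * Same shape as the filtered loop in event_triggers_call(): skip any
 * trigger whose filter rejects the record, run the rest.
 */
static void triggers_call(struct trigger_entry *trigs, size_t n,
			  const struct record_model *rec)
{
	size_t i;

	assert(trigs != NULL && rec != NULL);
	/* Triggers may log data themselves, so the event must be done. */
	assert(!rec->in_progress);

	for (i = 0; i < n; i++) {
		if (trigs[i].filter && !trigs[i].filter(rec))
			continue;
		trigs[i].fired++;
	}
}

/* Reserve and assign, then commit (or discard), and only then trigger. */
static void handle_event(struct trigger_entry *trigs, size_t n,
			 struct record_model *rec)
{
	rec->in_progress = 1;	/* reserve + { assign; } */
	rec->in_progress = 0;	/* commit or discard */
	triggers_call(trigs, n, rec);
}
```

A trigger with a NULL filter fires on every event, while one using
pid_is_999() fires only for records whose common_pid is 999.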
The syscall event invocation code is also changed in analogous ways.
Because event triggers need to be able to create and free filters,
this also adds a couple of external wrappers for the existing
create_filter and free_filter functions, whose names are too generic
to be made extern themselves.
Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ftrace_event.h | 2 +-
include/trace/ftrace.h | 22 +++++++++-----
kernel/trace/trace.h | 4 +++
kernel/trace/trace_events_filter.c | 13 ++++++++
kernel/trace/trace_events_trigger.c | 59 +++++++++++++++++++++++++++++++++++--
kernel/trace/trace_syscalls.c | 44 +++++++++++++++++----------
6 files changed, 117 insertions(+), 27 deletions(-)
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 10a5fa4..ebaa48a 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -320,7 +320,7 @@ extern int filter_current_check_discard(struct ring_buffer *buffer,
struct ftrace_event_call *call,
void *rec,
struct ring_buffer_event *event);
-extern void event_triggers_call(struct ftrace_event_file *file);
+extern void event_triggers_call(struct ftrace_event_file *file, void *rec);
enum {
FILTER_OTHER = 0,
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 88ac7da..7c5627f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -522,14 +522,6 @@ ftrace_raw_event_##call(void *__data, proto) \
int __data_size; \
int pc; \
\
- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
- &ftrace_file->flags)) \
- event_triggers_call(ftrace_file); \
- \
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
- &ftrace_file->flags)) \
- return; \
- \
local_save_flags(irq_flags); \
pc = preempt_count(); \
\
@@ -547,8 +539,22 @@ ftrace_raw_event_##call(void *__data, proto) \
\
{ assign; } \
\
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
+ &ftrace_file->flags)) { \
+ ring_buffer_discard_commit(buffer, event); \
+ \
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
+ &ftrace_file->flags)) \
+ event_triggers_call(ftrace_file, entry); \
+ return; \
+ } \
+ \
if (!filter_current_check_discard(buffer, event_call, entry, event)) \
trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
+ \
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
+ &ftrace_file->flags)) \
+ event_triggers_call(ftrace_file, entry); \
}
/*
* The ftrace_test_probe is compiled out, it is only here as a build time check
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 94b33ec..36d03bd 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -999,6 +999,10 @@ extern int apply_subsystem_event_filter(struct ftrace_subsystem_dir *dir,
extern void print_subsystem_event_filter(struct event_subsystem *system,
struct trace_seq *s);
extern int filter_assign_type(const char *type);
+extern int create_event_filter(struct ftrace_event_call *call,
+ char *filter_str, bool set_str,
+ struct event_filter **filterp);
+extern void free_event_filter(struct event_filter *filter);
struct ftrace_event_field *
trace_find_event_field(struct ftrace_event_call *call, char *name);
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index e1b653f..06fa00c 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -777,6 +777,11 @@ static void __free_filter(struct event_filter *filter)
kfree(filter);
}
+void free_event_filter(struct event_filter *filter)
+{
+ __free_filter(filter);
+}
+
/*
* Called when destroying the ftrace_event_call.
* The call is being freed, so we do not need to worry about
@@ -1802,6 +1807,14 @@ static int create_filter(struct ftrace_event_call *call,
return err;
}
+int create_event_filter(struct ftrace_event_call *call,
+ char *filter_str, bool set_str,
+ struct event_filter **filterp)
+{
+ return create_filter(call, filter_str, set_str, filterp);
+}
+
+
/**
* create_system_filter - create a filter for an event_subsystem
* @system: event_subsystem to create a filter for
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 9a2daa75..a972c82 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -43,7 +43,7 @@ struct trigger_iterator {
struct ftrace_event_file *file;
};
-void event_triggers_call(struct ftrace_event_file *file)
+void event_triggers_call(struct ftrace_event_file *file, void *rec)
{
struct event_trigger_data *data;
@@ -51,8 +51,11 @@ void event_triggers_call(struct ftrace_event_file *file)
return;
preempt_disable_notrace();
- list_for_each_entry_rcu(data, &file->triggers, list)
+ list_for_each_entry_rcu(data, &file->triggers, list) {
+ if (data->filter && !filter_match_preds(data->filter, rec))
+ continue;
data->ops->func((void **)&data);
+ }
preempt_enable_notrace();
}
EXPORT_SYMBOL_GPL(event_triggers_call);
@@ -379,6 +382,52 @@ event_trigger_free(struct event_trigger_ops *ops, void **_data)
kfree(data);
}
+static int set_trigger_filter(char *filter_str, void *trigger_data,
+ void *cmd_data)
+{
+ struct trigger_iterator *iter = cmd_data;
+ struct event_trigger_data *data = trigger_data;
+ struct event_filter *filter, *tmp;
+ int ret = -EINVAL;
+ char *s;
+
+ s = strsep(&filter_str, " \t");
+
+ if (!strlen(s) || strcmp(s, "if") != 0)
+ goto out;
+
+ if (!filter_str)
+ goto out;
+
+ /* The filter is for the 'trigger' event, not the triggered event */
+ ret = create_event_filter(iter->file->event_call,
+ filter_str, false, &filter);
+ if (ret)
+ goto out;
+
+ tmp = data->filter;
+
+ rcu_assign_pointer(data->filter, filter);
+
+ if (tmp) {
+ /* Make sure the call is done with the filter */
+ synchronize_sched();
+ free_event_filter(tmp);
+ }
+
+ kfree(data->filter_str);
+
+ data->filter_str = kstrdup(filter_str, GFP_KERNEL);
+ if (!data->filter_str) {
+ free_event_filter(data->filter);
+ data->filter = NULL;
+ ret = -ENOMEM;
+ }
+
+ out:
+ return ret;
+}
+
static int
event_trigger_callback(struct event_command *cmd_ops, void *cmd_data,
char *glob, char *cmd, char *param, int enabled)
@@ -600,6 +649,7 @@ static struct event_command trigger_traceon_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
static struct event_command trigger_traceoff_cmd = {
@@ -608,6 +658,7 @@ static struct event_command trigger_traceoff_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
static void
@@ -688,6 +739,7 @@ static struct event_command trigger_snapshot_cmd = {
.reg = register_snapshot_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = snapshot_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
/*
@@ -765,6 +817,7 @@ static struct event_command trigger_stacktrace_cmd = {
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = stacktrace_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
static __init void unregister_trigger_traceon_traceoff_cmds(void)
@@ -1092,6 +1145,7 @@ static struct event_command trigger_enable_cmd = {
.reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
static struct event_command trigger_disable_cmd = {
@@ -1100,6 +1154,7 @@ static struct event_command trigger_disable_cmd = {
.reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
+ .set_filter = set_trigger_filter,
};
static __init void unregister_trigger_enable_disable_cmds(void)
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index e287011..47fa712 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -319,14 +319,6 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
if (!sys_data)
return;
- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
- &sys_data->enter_file->flags))
- event_triggers_call(sys_data->enter_file);
-
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
- &sys_data->enter_file->flags))
- return;
-
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
buffer = tr->trace_buffer.buffer;
@@ -339,9 +331,23 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
entry->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ &sys_data->enter_file->flags)) {
+ ring_buffer_discard_commit(buffer, event);
+
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->enter_file->flags))
+ event_triggers_call(sys_data->enter_file, entry);
+ return;
+ }
+
if (!filter_current_check_discard(buffer, sys_data->enter_event,
entry, event))
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
+
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->enter_file->flags))
+ event_triggers_call(sys_data->enter_file, entry);
}
static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
@@ -363,14 +369,6 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
if (!sys_data)
return;
- if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
- &sys_data->exit_file->flags))
- event_triggers_call(sys_data->exit_file);
-
- if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
- &sys_data->exit_file->flags))
- return;
-
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer,
sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
@@ -381,9 +379,23 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
entry->nr = syscall_nr;
entry->ret = syscall_get_return_value(current, regs);
+ if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
+ &sys_data->exit_file->flags)) {
+ ring_buffer_discard_commit(buffer, event);
+
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->exit_file->flags))
+ event_triggers_call(sys_data->exit_file, entry);
+ return;
+ }
+
if (!filter_current_check_discard(buffer, sys_data->exit_event,
entry, event))
trace_current_buffer_unlock_commit(buffer, event, 0, 0);
+
+ if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
+ &sys_data->exit_file->flags))
+ event_triggers_call(sys_data->exit_file, entry);
}
static int reg_event_syscall_enter(struct ftrace_event_file *file,
--
1.7.11.4
From: Tom Zanussi <[email protected]>
Add the missing syscall_metadata description for the enter_fields
struct member.
Signed-off-by: Tom Zanussi <[email protected]>
---
include/trace/syscall.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 84bc419..fed853f 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -16,6 +16,7 @@
* @nb_args: number of parameters it takes
* @types: list of types as strings
* @args: list of args as strings (args[i] matches types[i])
+ * @enter_fields: list of fields for syscall_enter trace event
* @enter_event: associated syscall_enter trace event
* @exit_event: associated syscall_exit trace event
*/
--
1.7.11.4
Add 'enable_event' and 'disable_event' event_command commands.
enable_event and disable_event event triggers are added by the user
via these commands in a similar way and using practically the same
syntax as the analogous 'enable_event' and 'disable_event' ftrace
function commands, but instead of writing to the set_ftrace_filter
file, the enable_event and disable_event triggers are written to the
per-event 'trigger' files:
echo 'enable_event:system:event' > .../othersys/otherevent/trigger
echo 'disable_event:system:event' > .../othersys/otherevent/trigger
The above commands will enable or disable the 'system:event' trace
events whenever the othersys:otherevent events are hit.
This also adds a 'count' version that limits the number of times the
command will be invoked:
echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger
Where N is the number of times the command will be invoked.
The above commands will enable or disable the 'system:event'
trace events whenever the othersys:otherevent events are hit, but only
N times.
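
As a hedged illustration of the 'system:event[:N]' argument format, a
user-space parser might look like the sketch below. The names enable_spec
and parse_enable_spec() are hypothetical, the buffer sizes are arbitrary,
and the sketch is stricter than the patch in that it also rejects empty
system or event names; the patch itself splits the string with strsep()
and converts the count with kstrtoul().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct enable_spec {
	char system[64];
	char event[64];
	long count;	/* -1 when no count was given, i.e. unlimited */
};

/* Returns 0 on success, -1 on malformed input. */
static int parse_enable_spec(const char *s, struct enable_spec *out)
{
	const char *c1, *c2, *num;
	char *end;
	size_t len;

	assert(s != NULL && out != NULL);

	c1 = strchr(s, ':');
	if (!c1)
		return -1;	/* "system" with no ":event" */

	len = (size_t)(c1 - s);
	if (len == 0 || len >= sizeof(out->system))
		return -1;
	memcpy(out->system, s, len);
	out->system[len] = '\0';

	/* The event name ends at the next ':' or at the end of the string. */
	c2 = strchr(c1 + 1, ':');
	len = c2 ? (size_t)(c2 - (c1 + 1)) : strlen(c1 + 1);
	if (len == 0 || len >= sizeof(out->event))
		return -1;
	memcpy(out->event, c1 + 1, len);
	out->event[len] = '\0';

	out->count = -1;
	if (c2) {
		num = c2 + 1;
		if (*num == '\0')
			return -1;	/* trailing ':' with no number */
		out->count = strtol(num, &end, 0);
		if (*end != '\0' || out->count < 0)
			return -1;
	}

	return 0;
}
```

Parsing "sched:sched_switch:5" yields system "sched", event "sched_switch"
and a count of 5; leaving off ":5" keeps the count at -1, the same
"unlimited" marker used by the trigger data.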
This also makes the find_event_file() helper function extern, since
it's needed from other places, such as the event triggers code.
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.h | 4 +
kernel/trace/trace_events.c | 2 +-
kernel/trace/trace_events_trigger.c | 363 ++++++++++++++++++++++++++++++++++++
3 files changed, 368 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 7df20e8..94b33ec 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1021,6 +1021,10 @@ extern void trace_event_enable_cmd_record(bool enable);
extern int event_trace_add_tracer(struct dentry *parent, struct trace_array *tr);
extern int event_trace_del_tracer(struct trace_array *tr);
+extern struct ftrace_event_file *find_event_file(struct trace_array *tr,
+ const char *system,
+ const char *event);
+
extern struct mutex event_mutex;
extern struct list_head ftrace_events;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 1fc1602..41e23ff 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1892,7 +1892,7 @@ struct event_probe_data {
bool enable;
};
-static struct ftrace_event_file *
+struct ftrace_event_file *
find_event_file(struct trace_array *tr, const char *system, const char *event)
{
struct ftrace_event_file *file;
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index ee456f3..9a2daa75 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -777,6 +777,357 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
&trigger_cmd_mutex);
}
+/* Avoid typos */
+#define ENABLE_EVENT_STR "enable_event"
+#define DISABLE_EVENT_STR "disable_event"
+
+static void
+event_enable_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (data->enable)
+ clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+ else
+ set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &data->file->flags);
+}
+
+static void
+event_enable_count_trigger(void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (!data)
+ return;
+
+ if (!data->count)
+ return;
+
+ /* Skip if the event is in a state we want to switch to */
+ if (data->enable == !(data->file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
+ return;
+
+ if (data->count != -1)
+ (data->count)--;
+
+ event_enable_trigger(_data);
+}
+
+static int
+event_enable_trigger_print(struct seq_file *m, struct event_trigger_ops *ops,
+ void *_data)
+{
+ struct event_trigger_data *data = _data;
+
+ seq_printf(m, "%s:%s:%s",
+ data->enable ? ENABLE_EVENT_STR : DISABLE_EVENT_STR,
+ data->file->event_call->class->system,
+ data->file->event_call->name);
+
+ if (data->count == -1)
+ seq_puts(m, ":unlimited");
+ else
+ seq_printf(m, ":count=%ld", data->count);
+
+ if (data->filter_str)
+ seq_printf(m, " if %s\n", data->filter_str);
+ else
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static void
+event_enable_trigger_free(struct event_trigger_ops *ops, void **_data)
+{
+ struct event_trigger_data **p = (struct event_trigger_data **)_data;
+ struct event_trigger_data *data = *p;
+
+ if (WARN_ON_ONCE(data->ref <= 0))
+ return;
+
+ data->ref--;
+ if (!data->ref) {
+ /* Remove the TRIGGER_MODE flag */
+ trace_event_enable_disable(data->file, 0, 0, 1);
+ module_put(data->file->event_call->mod);
+ kfree(data);
+ }
+}
+
+static struct event_trigger_ops event_enable_trigger_ops = {
+ .func = event_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_enable_count_trigger_ops = {
+ .func = event_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_disable_trigger_ops = {
+ .func = event_enable_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static struct event_trigger_ops event_disable_count_trigger_ops = {
+ .func = event_enable_count_trigger,
+ .print = event_enable_trigger_print,
+ .init = event_trigger_init,
+ .free = event_enable_trigger_free,
+};
+
+static int
+event_enable_trigger_func(struct event_command *cmd_ops, void *cmd_data,
+ char *glob, char *cmd, char *param, int enabled)
+{
+ struct trace_array *tr = top_trace_array();
+ struct ftrace_event_file *file;
+ struct event_trigger_ops *trigger_ops;
+ struct event_trigger_data *trigger_data;
+ const char *system;
+ const char *event;
+ char *trigger;
+ char *number;
+ bool enable;
+ int ret;
+
+ if (!enabled)
+ return -EINVAL;
+
+ if (!param)
+ return -EINVAL;
+
+ /* separate the trigger from the filter (s:e:n [if filter]) */
+ trigger = strsep(&param, " \t");
+ if (!trigger)
+ return -EINVAL;
+
+ system = strsep(&trigger, ":");
+ if (!trigger)
+ return -EINVAL;
+
+ event = strsep(&trigger, ":");
+
+ mutex_lock(&event_mutex);
+
+ ret = -EINVAL;
+ file = find_event_file(tr, system, event);
+ if (!file)
+ goto out;
+
+ enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+
+ trigger_ops = cmd_ops->get_trigger_ops(cmd, trigger);
+
+ ret = -ENOMEM;
+ trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
+ if (!trigger_data)
+ goto out;
+
+ trigger_data->enable = enable;
+ trigger_data->count = -1;
+ trigger_data->file = file;
+ trigger_data->ops = trigger_ops;
+ INIT_LIST_HEAD(&trigger_data->list);
+ RCU_INIT_POINTER(trigger_data->filter, NULL);
+
+ if (glob[0] == '!') {
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, cmd_data);
+ kfree(trigger_data);
+ ret = 0;
+ goto out;
+ }
+
+ if (trigger) {
+ number = strsep(&trigger, ":");
+
+ ret = -EINVAL;
+ if (!strlen(number))
+ goto out_free;
+
+ /*
+ * We use the callback data field (which is a pointer)
+ * as our counter.
+ */
+ ret = kstrtoul(number, 0, &trigger_data->count);
+ if (ret)
+ goto out_free;
+ }
+
+ if (!param) /* if param is non-empty, it's supposed to be a filter */
+ goto out_reg;
+
+ if (!cmd_ops->set_filter)
+ goto out_reg;
+
+ ret = cmd_ops->set_filter(param, trigger_data, cmd_data);
+ if (ret < 0)
+ goto out_free;
+
+ out_reg:
+ /* Don't let event modules unload while probe registered */
+ ret = try_module_get(file->event_call->mod);
+ if (!ret) {
+ ret = -EBUSY;
+ goto out_free;
+ }
+
+ ret = trace_event_enable_disable(file, 1, 1, 0);
+ if (ret < 0)
+ goto out_put;
+ ret = cmd_ops->reg(glob, trigger_ops, trigger_data, cmd_data);
+ /*
+ * The above returns on success the # of functions enabled,
+ * but if it didn't find any functions it returns zero.
+ * Consider no functions a failure too.
+ */
+ if (!ret) {
+ ret = -ENOENT;
+ goto out_disable;
+ } else if (ret < 0)
+ goto out_disable;
+ /* Just return zero, not the number of enabled functions */
+ ret = 0;
+ out:
+ mutex_unlock(&event_mutex);
+ return ret;
+
+ out_disable:
+ trace_event_enable_disable(file, 0, 1, 0);
+ out_put:
+ module_put(file->event_call->mod);
+ out_free:
+ kfree(trigger_data);
+ goto out;
+}
+
+static int event_enable_register_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ void *trigger_data, void *cmd_data)
+{
+ struct trigger_iterator *iter = cmd_data;
+ struct event_trigger_data *data = trigger_data;
+ struct event_trigger_data *test;
+ int ret = 0;
+
+ list_for_each_entry_rcu(test, &iter->file->triggers, list) {
+ if (test->file == data->file) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (data->ops->init) {
+ ret = data->ops->init(data->ops, (void **)&data);
+ if (ret < 0)
+ goto out;
+ }
+
+ list_add_rcu(&data->list, &iter->file->triggers);
+ ret++;
+
+ if (trace_event_enable_disable(iter->file, 1, 0, 1) < 0) {
+ list_del_rcu(&data->list);
+ ret--;
+ }
+out:
+ return ret;
+}
+
+static void event_enable_unregister_trigger(char *glob,
+ struct event_trigger_ops *ops,
+ void *trigger_data, void *cmd_data)
+{
+ struct trigger_iterator *iter = cmd_data;
+ struct event_trigger_data *test = trigger_data;
+ struct event_trigger_data *data;
+ bool unregistered = false;
+
+ list_for_each_entry_rcu(data, &iter->file->triggers, list) {
+ if (data->file == test->file) {
+ unregistered = true;
+ list_del_rcu(&data->list);
+ trace_event_enable_disable(iter->file, 0, 0, 1);
+ break;
+ }
+ }
+
+ if (unregistered && data->ops->free)
+ data->ops->free(data->ops, (void **)&data);
+}
+
+static struct event_trigger_ops *
+event_enable_get_trigger_ops(char *cmd, char *param)
+{
+ struct event_trigger_ops *ops;
+ bool enable;
+
+ enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
+
+ if (enable)
+ ops = param ? &event_enable_count_trigger_ops :
+ &event_enable_trigger_ops;
+ else
+ ops = param ? &event_disable_count_trigger_ops :
+ &event_disable_trigger_ops;
+
+ return ops;
+}
+
+static struct event_command trigger_enable_cmd = {
+ .name = ENABLE_EVENT_STR,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .get_trigger_ops = event_enable_get_trigger_ops,
+};
+
+static struct event_command trigger_disable_cmd = {
+ .name = DISABLE_EVENT_STR,
+ .func = event_enable_trigger_func,
+ .reg = event_enable_register_trigger,
+ .unreg = event_enable_unregister_trigger,
+ .get_trigger_ops = event_enable_get_trigger_ops,
+};
+
+static __init void unregister_trigger_enable_disable_cmds(void)
+{
+ unregister_event_command(&trigger_enable_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_disable_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+}
+
+static __init int register_trigger_enable_disable_cmds(void)
+{
+ int ret;
+
+ ret = register_event_command(&trigger_enable_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ return ret;
+ ret = register_event_command(&trigger_disable_cmd, &trigger_commands,
+ &trigger_cmd_mutex);
+ if (WARN_ON(ret < 0))
+ unregister_trigger_enable_disable_cmds();
+
+ return ret;
+}
+
static __init int register_trigger_traceon_traceoff_cmds(void)
{
int ret;
@@ -822,5 +1173,17 @@ __init int register_trigger_cmds(void)
return ret;
}
+ ret = register_trigger_enable_disable_cmds();
+ if (ret) {
+ unregister_trigger_traceon_traceoff_cmds();
+ unregister_event_command(&trigger_snapshot_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ unregister_event_command(&trigger_stacktrace_cmd,
+ &trigger_commands,
+ &trigger_cmd_mutex);
+ return ret;
+ }
+
return 0;
}
--
1.7.11.4
Provide a basic overview of trace event triggers and document the
available trigger commands, along with a few simple examples.
Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.txt | 207 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 207 insertions(+)
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index bb24c2a..b39610f 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -278,3 +278,210 @@ their old filters):
prev_pid == 0
# cat sched_wakeup/filter
common_pid == 0
+
+6. Event triggers
+=================
+
+Trace events can be made to conditionally invoke trigger 'commands'
+which can take various forms and are described in detail below;
+examples would be enabling or disabling other trace events or invoking
+a stack trace whenever the trace event is hit. Whenever a trace event
+with attached triggers is invoked, the set of trigger commands
+associated with that event is invoked. Any given trigger can
+additionally have an event filter of the same form as described in
+section 5 (Event filtering) associated with it - the command will only
+be invoked if the event being invoked passes the associated filter.
+If no filter is associated with the trigger, it always passes.
+
+Triggers are added to and removed from a particular event by writing
+trigger expressions to the 'trigger' file for the given event.
+
+A given event can have any number of triggers associated with it,
+subject to any restrictions that individual commands may have in that
+regard.
+
+Event triggers are implemented on top of "soft" mode, which means that
+whenever a trace event has one or more triggers associated with it,
+the event is activated even if it isn't actually enabled, but is
+disabled in a "soft" mode. That is, the tracepoint will be called,
+but just will not be traced, unless of course it's actually enabled.
+This scheme allows triggers to be invoked even for events that aren't
+enabled, and also allows the current event filter implementation to be
+used for conditionally invoking triggers.
+
+The syntax for event triggers is roughly based on the syntax for
+set_ftrace_filter 'ftrace filter commands' (see the 'Filter commands'
+section of Documentation/trace/ftrace.txt), but there are major
+differences and the implementation isn't currently tied to it in any
+way, so beware about making generalizations between the two.
+
+6.1 Expression syntax
+---------------------
+
+Triggers are added by echoing the command to the 'trigger' file:
+
+ # echo 'command[:count] [if filter]' > trigger
+
+Triggers are removed by echoing the same command but starting with '!'
+to the 'trigger' file:
+
+ # echo '!command[:count] [if filter]' > trigger
+
+The [if filter] part isn't used in matching commands when removing, so
+leaving that off in a '!' command will accomplish the same thing as
+having it in.
+
+The filter syntax is the same as that described in the 'Event
+filtering' section above.
+
+For ease of use, writing to the trigger file using '>' currently just
+adds or removes a single trigger and there's no explicit '>>' support
+('>' actually behaves like '>>') or truncation support to remove all
+triggers (you have to use '!' for each one added.)
+
+6.2 Supported trigger commands
+------------------------------
+
+The following commands are supported:
+
+- enable_event/disable_event
+
+ These commands can enable or disable another trace event whenever
+ the triggering event is hit. When these commands are registered,
+ the other trace event is activated, but disabled in a "soft" mode.
+ That is, the tracepoint will be called, but just will not be traced.
+ The event tracepoint stays in this mode as long as there's a trigger
+ in effect that can trigger it.
+
+ For example, the following trigger causes kmalloc events to be
+ traced when a read system call is entered, and the :1 at the end
+ specifies that this enablement happens only once:
+
+ # echo 'enable_event:kmem:kmalloc:1' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ The following trigger causes kmalloc events to stop being traced
+ when a read system call exits. This disablement happens on every
+ read system call exit:
+
+ # echo 'disable_event:kmem:kmalloc' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+ The format is:
+
+ enable_event:<system>:<event>[:count]
+ disable_event:<system>:<event>[:count]
+
+ To remove the above commands:
+
+ # echo '!enable_event:kmem:kmalloc:1' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ # echo '!disable_event:kmem:kmalloc' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_exit_read/trigger
+
+ Note that there can be any number of enable/disable_event triggers
+ per triggering event, but there can only be one trigger per
+ triggered event. e.g. sys_enter_read can have triggers enabling both
+ kmem:kmalloc and sched:sched_switch, but can't have two kmem:kmalloc
+ versions such as kmem:kmalloc and kmem:kmalloc:1 or 'kmem:kmalloc if
+ bytes_req == 256' and 'kmem:kmalloc if bytes_alloc == 256' (they
+ could be combined into a single filter on kmem:kmalloc though).
+
+- stacktrace
+
+ This command dumps a stacktrace in the trace buffer whenever the
+ triggering event occurs.
+
+ For example, the following trigger dumps a stacktrace every time the
+ kmalloc tracepoint is hit:
+
+ # echo 'stacktrace' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The following trigger dumps a stacktrace the first 5 times a kmalloc
+ request happens with a size >= 64K:
+
+ # echo 'stacktrace:5 if bytes_req >= 65536' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The format is:
+
+ stacktrace[:count]
+
+ To remove the above commands:
+
+ # echo '!stacktrace' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # echo '!stacktrace:5 if bytes_req >= 65536' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The latter can also be removed more simply by the following (without
+ the filter):
+
+ # echo '!stacktrace:5' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ Note that there can be only one stacktrace trigger per triggering
+ event.
+
+- snapshot
+
+ This command causes a snapshot to be triggered whenever the
+ triggering event occurs.
+
+ The following command creates a snapshot every time a block request
+ queue is unplugged with a depth > 1. If you were tracing a set of
+ events or functions at the time, the snapshot trace buffer would
+ capture those events when the trigger event occurred:
+
+ # echo 'snapshot if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To only snapshot once:
+
+ # echo 'snapshot:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To remove the above commands:
+
+ # echo '!snapshot if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ # echo '!snapshot:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ Note that there can be only one snapshot trigger per triggering
+ event.
+
+- traceon/traceoff
+
+ These commands turn tracing on and off when the specified events are
+ hit. The parameter determines how many times the tracing system is
+ turned on and off. If unspecified, there is no limit.
+
+ The following command turns tracing off the first time a block
+ request queue is unplugged with a depth > 1. If you were tracing a
+ set of events or functions at the time, you could then examine the
+ trace buffer to see the sequence of events that led up to the
+ trigger event:
+
+ # echo 'traceoff:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To always disable tracing when nr_rq > 1 :
+
+ # echo 'traceoff if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ To remove the above commands:
+
+ # echo '!traceoff:1 if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ # echo '!traceoff if nr_rq > 1' > \
+ /sys/kernel/debug/tracing/events/block/block_unplug/trigger
+
+ Note that there can be only one traceon or traceoff trigger per
+ triggering event.
--
1.7.11.4
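For anyone driving the 'trigger' file from a program rather than from the shell, the expression syntax in section 6.1 of the documentation patch above can be assembled mechanically. The following is only a rough userspace sketch: build_trigger_expr() is a hypothetical helper name, not part of this patchset or of any library, and it does nothing beyond formatting the '[!]command[:count] [if filter]' string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a trigger expression of the form described in section 6.1:
 *
 *     [!]command[:count] [if filter]
 *
 * count <= 0 means "no count"; filter may be NULL or "" for "no filter";
 * a non-zero 'remove' prepends '!'.
 *
 * Returns the length of the expression, or -1 if buf is too small.
 * build_trigger_expr() is a hypothetical helper used for illustration only.
 */
static int build_trigger_expr(char *buf, size_t len, const char *command,
			      long count, const char *filter, int remove)
{
	size_t used;
	int n;

	assert(buf != NULL && command != NULL);

	n = snprintf(buf, len, "%s%s", remove ? "!" : "", command);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;

	if (count > 0) {
		n = snprintf(buf + used, len - used, ":%ld", count);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	if (filter != NULL && filter[0] != '\0') {
		n = snprintf(buf + used, len - used, " if %s", filter);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += (size_t)n;
	}

	return (int)used;
}
```

For enable_event/disable_event the '<system>:<event>' part is simply passed as part of the command string, e.g. "enable_event:kmem:kmalloc" with a count of 1. The resulting string is what would be written to the event's 'trigger' file.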
(2013/06/21 3:31), Tom Zanussi wrote:
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 88ac7da..7c5627f 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -522,14 +522,6 @@ ftrace_raw_event_##call(void *__data, proto) \
> int __data_size; \
> int pc; \
> \
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> - &ftrace_file->flags)) \
> - event_triggers_call(ftrace_file); \
> - \
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> - &ftrace_file->flags)) \
> - return; \
> - \
> local_save_flags(irq_flags); \
> pc = preempt_count(); \
> \
> @@ -547,8 +539,22 @@ ftrace_raw_event_##call(void *__data, proto) \
> \
> { assign; } \
> \
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> + &ftrace_file->flags)) { \
> + ring_buffer_discard_commit(buffer, event); \
> + \
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> + &ftrace_file->flags)) \
> + event_triggers_call(ftrace_file, entry); \
> + return; \
> + } \
> + \
> if (!filter_current_check_discard(buffer, event_call, entry, event)) \
> trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> + \
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> + &ftrace_file->flags)) \
> + event_triggers_call(ftrace_file, entry); \
Actually, since "entry" is part of "event", which may already have been discarded,
I think we should not access it here. It may not cause a real problem, because
even if it is discarded, that does NOT mean it is freed. However, that depends
on the ring-buffer implementation.
I recommend calling the event triggers before committing the event. That will also
make the code simpler :)
> diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> index e287011..47fa712 100644
> --- a/kernel/trace/trace_syscalls.c
> +++ b/kernel/trace/trace_syscalls.c
> @@ -319,14 +319,6 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> if (!sys_data)
> return;
>
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> - &sys_data->enter_file->flags))
> - event_triggers_call(sys_data->enter_file);
> -
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> - &sys_data->enter_file->flags))
> - return;
> -
> size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
>
> buffer = tr->trace_buffer.buffer;
> @@ -339,9 +331,23 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> entry->nr = syscall_nr;
> syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
>
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> + &sys_data->enter_file->flags)) {
> + ring_buffer_discard_commit(buffer, event);
> +
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> + &sys_data->enter_file->flags))
> + event_triggers_call(sys_data->enter_file, entry);
> + return;
> + }
> +
> if (!filter_current_check_discard(buffer, sys_data->enter_event,
> entry, event))
> trace_current_buffer_unlock_commit(buffer, event, 0, 0);
> +
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> + &sys_data->enter_file->flags))
> + event_triggers_call(sys_data->enter_file, entry);
> }
>
> static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> @@ -363,14 +369,6 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> if (!sys_data)
> return;
>
> - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> - &sys_data->exit_file->flags))
> - event_triggers_call(sys_data->exit_file);
> -
> - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> - &sys_data->exit_file->flags))
> - return;
> -
> buffer = tr->trace_buffer.buffer;
> event = trace_buffer_lock_reserve(buffer,
> sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
> @@ -381,9 +379,23 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> entry->nr = syscall_nr;
> entry->ret = syscall_get_return_value(current, regs);
>
> + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> + &sys_data->exit_file->flags)) {
> + ring_buffer_discard_commit(buffer, event);
> +
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> + &sys_data->exit_file->flags))
> + event_triggers_call(sys_data->exit_file, entry);
> + return;
> + }
> +
> if (!filter_current_check_discard(buffer, sys_data->exit_event,
> entry, event))
> trace_current_buffer_unlock_commit(buffer, event, 0, 0);
> +
> + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> + &sys_data->exit_file->flags))
> + event_triggers_call(sys_data->exit_file, entry);
> }
>
> static int reg_event_syscall_enter(struct ftrace_event_file *file,
>
Same changes are needed here.
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2013/06/21 3:31), Tom Zanussi wrote:
> Rather than enumerating each permutation, build the enable state
> string up from the combination of states. This also allows for the
> simpler addition of more states.
>
> Signed-off-by: Tom Zanussi <[email protected]>
Looks nicer to me :)
Reviewed-by: Masami Hiramatsu <[email protected]>
Thanks,
> ---
> kernel/trace/trace_events.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 27963e2..ecb2609 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -624,17 +624,17 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
> loff_t *ppos)
> {
> struct ftrace_event_file *file = filp->private_data;
> - char *buf;
> + char buf[4] = "0";
>
> - if (file->flags & FTRACE_EVENT_FL_ENABLED) {
> - if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED)
> - buf = "0*\n";
> - else if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
> - buf = "1*\n";
> - else
> - buf = "1\n";
> - } else
> - buf = "0\n";
> + if (file->flags & FTRACE_EVENT_FL_ENABLED &&
> + !(file->flags & FTRACE_EVENT_FL_SOFT_DISABLED))
> + strcpy(buf, "1");
> +
> + if (file->flags & FTRACE_EVENT_FL_SOFT_DISABLED ||
> + file->flags & FTRACE_EVENT_FL_SOFT_MODE)
> + strcat(buf, "*");
> +
> + strcat(buf, "\n");
>
> return simple_read_from_buffer(ubuf, cnt, ppos, buf, strlen(buf));
> }
>
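A quick sanity check on the buf[4] sizing in the quoted patch: the longest string it can build is "0*\n" or "1*\n", i.e. three characters plus the terminating NUL, so four bytes is exactly enough. Below is a small userspace sketch of the same string-building logic; the FL_* values are stand-ins chosen for illustration, not the real FTRACE_EVENT_FL_* definitions, and enable_state_str() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Stand-in flag values; the real FTRACE_EVENT_FL_* bits differ. */
enum {
	FL_ENABLED       = 1 << 0,
	FL_SOFT_MODE     = 1 << 1,
	FL_SOFT_DISABLED = 1 << 2,
};

/*
 * Mirror of the string-building logic in the quoted event_enable_read()
 * change: start from "0", switch to "1" if really enabled, append "*" if
 * any soft state is set, then terminate with a newline.
 */
static void enable_state_str(unsigned long flags, char buf[4])
{
	strcpy(buf, "0");

	if ((flags & FL_ENABLED) && !(flags & FL_SOFT_DISABLED))
		strcpy(buf, "1");

	if (flags & (FL_SOFT_DISABLED | FL_SOFT_MODE))
		strcat(buf, "*");

	strcat(buf, "\n");

	/* At most one digit, one '*', one newline: always fits in 4 bytes. */
	assert(strlen(buf) < 4);
}
```

Adding another state character later would need the buffer to grow accordingly.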
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
On 2013/6/21 2:31, Tom Zanussi wrote:
> Add support for SOFT_DISABLE to syscall events.
>
> The original SOFT_DISABLE patches didn't add support for soft disable
> of syscall events; this adds it and paves the way for future patches
> allowing triggers to be added to syscall events, since triggers are
> built on top of SOFT_DISABLE.
>
> Because the trigger and SOFT_DISABLE bits are attached to the
> ftrace_event_file associated with the event, pointers to the
> ftrace_event_files associated with the event are added to the syscall
> metadata entry for the event.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> include/linux/syscalls.h | 2 ++
> include/trace/syscall.h | 5 +++++
> kernel/trace/trace_syscalls.c | 28 ++++++++++++++++++++++++----
> 3 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 4147d70..b4c2afa 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -158,6 +158,8 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> .args = nb ? args_##sname : NULL, \
> .enter_event = &event_enter_##sname, \
> .exit_event = &event_exit_##sname, \
> + .enter_file = NULL, /* Filled in at boot */ \
> + .exit_file = NULL, /* Filled in at boot */ \
> .enter_fields = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
> }; \
> static struct syscall_metadata __used \
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index fed853f..ba24d3a 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -19,6 +19,8 @@
> * @enter_fields: list of fields for syscall_enter trace event
> * @enter_event: associated syscall_enter trace event
> * @exit_event: associated syscall_exit trace event
> + * @enter_file: associated syscall_enter ftrace event file
> + * @exit_file: associated syscall_exit ftrace event file
> */
> struct syscall_metadata {
> const char *name;
> @@ -30,6 +32,9 @@ struct syscall_metadata {
>
> struct ftrace_event_call *enter_event;
> struct ftrace_event_call *exit_event;
> +
> + struct ftrace_event_file *enter_file;
> + struct ftrace_event_file *exit_file;
> };
I doubt this will work correctly.
struct ftrace_event_file is allocated dynamically, and there can be
many ftrace_event_files, each associated with a different trace_array.
That means many ftrace_event_files may be linked to the same syscall_metadata.
The enter_file/exit_file pointers in syscall_metadata will be overwritten if another
ftrace_event_file is registered in some other trace_array.
Perhaps we could use a simple list_head here, linking all of the registered ftrace_event_files.
Oleg uses "struct event_file_link" for trace_kprobe; that structure could be reused here.
struct event_file_link {
struct ftrace_event_file *file;
struct list_head list;
};
struct syscall_metadata {
const char *name;
int syscall_nr;
int nb_args;
const char **types;
const char **args;
struct list_head enter_fields;
struct ftrace_event_call *enter_event;
struct ftrace_event_call *exit_event;
struct list_head enter_files;
struct list_head exit_files;
};
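To illustrate the idea, here is a minimal userspace sketch of per-instance lookup through such a list. It is not kernel code: a plain 'next' pointer stands in for struct list_head, the struct and function names ending in _sketch (and link_enter_file()/find_enter_file()) are hypothetical, and locking/RCU is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures named above. */
struct trace_array {
	const char *name;
};

struct ftrace_event_file {
	struct trace_array *tr;
	unsigned long flags;
};

/* A plain 'next' pointer stands in for struct list_head here. */
struct event_file_link {
	struct ftrace_event_file *file;
	struct event_file_link *next;
};

struct syscall_metadata_sketch {
	struct event_file_link *enter_files;
};

/* Register one more event file for this syscall; nothing is overwritten. */
static void link_enter_file(struct syscall_metadata_sketch *meta,
			    struct event_file_link *link,
			    struct ftrace_event_file *file)
{
	assert(meta != NULL && link != NULL && file != NULL);

	link->file = file;
	link->next = meta->enter_files;
	meta->enter_files = link;
}

/* Find the event file that belongs to a given trace_array, if any. */
static struct ftrace_event_file *
find_enter_file(const struct syscall_metadata_sketch *meta,
		const struct trace_array *tr)
{
	const struct event_file_link *link;

	for (link = meta->enter_files; link != NULL; link = link->next)
		if (link->file->tr == tr)
			return link->file;

	return NULL;
}
```

With a single enter_file pointer, registering the file for a second trace_array would replace the first; with the list, both remain reachable and the syscall handler can pick the one matching the trace_array it was called for.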
.jovi
(2013/06/21 3:31), Tom Zanussi wrote:
> From: Tom Zanussi <[email protected]>
>
> Add the missing syscall_metadata description for the enter_fields
> struct member.
>
> Signed-off-by: Tom Zanussi <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Steven, I think the first two patches would be better merged separately
as cleanups.
Thank you,
> ---
> include/trace/syscall.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index 84bc419..fed853f 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -16,6 +16,7 @@
> * @nb_args: number of parameters it takes
> * @types: list of types as strings
> * @args: list of args as strings (args[i] matches types[i])
> + * @enter_fields: list of fields for syscall_enter trace event
> * @enter_event: associated syscall_enter trace event
> * @exit_event: associated syscall_exit trace event
> */
>
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2013/06/21 3:31), Tom Zanussi wrote:
> The comment on the soft disable 'disable' case of
> __ftrace_event_enable_disable() states that the soft disable bit
> should be cleared in that case, but currently only the soft mode bit
> is actually cleared.
>
> This essentially leaves the standard non-soft-enable enable/disable
> paths as the only way to clear the soft disable flag, but the soft
> disable bit should also be cleared when removing a trigger with '!'.
Indeed, the soft-disabled flag may remain set after the event itself is
disabled. However, that soft-disabled flag will be cleared when
the event is re-enabled, so there seems to be no bad side effect.
Thus I doubt this patch is needed on its own. I guess it is
required for adding the new trigger flag, isn't it? :)
Thank you,
>
> Also, the SOFT_DISABLED bit should never be set if SOFT_MODE is
> cleared.
>
> This fixes the above discrepancies.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index ecb2609..f9738dc 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -282,6 +282,8 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT */
> if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
> set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
> + else
> + clear_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
> break;
> case 1:
> /*
>
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
(2013/06/21 3:31), Tom Zanussi wrote:
> The event trigger functionality is built on top of SOFT_DISABLE
> functionality. It adds a TRIGGER_MODE bit to the ftrace_event_file
> flags which is checked when any trace event fires. Triggers set for a
> particular event need to be checked regardless of whether that event
> is actually enabled or not - getting an event to fire even if it's not
> enabled is essentially what's already implemented by SOFT_DISABLE
> mode, so trigger mode directly reuses that. It essentially inherits
> the soft disable logic in __ftrace_event_enable_disable() while adding
> a bit of logic and trigger reference counting via tm_ref on top of
> that. Because the enable_disable code needs to now be invoked from
> outside trace_events.c, a wrapper is also added for those usages.
Agreed, but I think the implementation looks insufficient.
You implemented it directly in __ftrace_event_enable_disable(),
but I think it should be wrapped in another function, something
like ftrace_event_trigger_enable_disable(), without touching
__ftrace_event_enable_disable().
[..]
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index f9738dc..1fc1602 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -242,7 +242,8 @@ void trace_event_enable_cmd_record(bool enable)
> }
>
> static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> - int enable, int soft_disable)
> + int enable, int soft_disable,
> + int trigger_enable)
Here, you added trigger_enable, but the code implies that this flag and
soft_disable are mutually exclusive.
> {
> struct ftrace_event_call *call = file->event_call;
> int ret = 0;
> @@ -263,7 +264,13 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> * we do nothing. Do not disable the tracepoint, otherwise
> * "soft enable"s (clearing the SOFT_DISABLED bit) wont work.
> */
> - if (soft_disable) {
> + if (trigger_enable) {
> + if (atomic_dec_return(&file->tm_ref) > 0)
> + break;
> + clear_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> + ret = __ftrace_event_enable_disable(file, enable, 1, 0);
> + break;
Why break here? If all triggers are gone, and there is no other user of this event,
the event should really be disabled (not just soft-disabled).
> + } else if (soft_disable) {
> if (atomic_dec_return(&file->sm_ref) > 0)
> break;
> disable = file->flags & FTRACE_EVENT_FL_SOFT_DISABLED;
> @@ -279,7 +286,7 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> }
> call->class->reg(call, TRACE_REG_UNREGISTER, file);
> }
> - /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT */
> + /* If in SOFT_MODE, just set the SOFT_DISABLE_BIT, else clear it */
This part should be merged with the previous patch (the one I had doubts about...).
> if (file->flags & FTRACE_EVENT_FL_SOFT_MODE)
> set_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, &file->flags);
> else
> @@ -294,7 +301,13 @@ static int __ftrace_event_enable_disable(struct ftrace_event_file *file,
> * set SOFT_DISABLED before enabling the event tracepoint, so
> * it still seems to be disabled.
> */
> - if (!soft_disable)
> + if (trigger_enable) {
> + if (atomic_inc_return(&file->tm_ref) > 1)
> + break;
> + set_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, &file->flags);
> + ret = __ftrace_event_enable_disable(file, enable, 1, 0);
Hmm, if you do this, you can simply do as below :)
int ftrace_event_trigger_enable_disable(file, enable)
{
if (enable) {
atomic-count-up-and-return-if-its-not-first-one
set_bit(TRIGGER_MODE)
__ftrace_event_enable_disable(file, 1, 1);
} else {
atomic-count-down-and-return-if-its-not-last-one
clear_bit(TRIGGER_MODE)
__ftrace_event_enable_disable(file, 0, 1);
}
}
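Spelled out a little further, the wrapper could look roughly like the userspace sketch below. This only models the reference counting with C11 atomics: struct event_file_sketch, trigger_enable_disable() and the soft_mode_enable_disable() stand-in (playing the role of __ftrace_event_enable_disable(file, enable, 1)) are hypothetical names, and the real code would use atomic_inc_return()/atomic_dec_return() and set_bit()/clear_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TRIGGER_MODE (1UL << 0)	/* stand-in for the TRIGGER_MODE flag bit */

struct event_file_sketch {
	atomic_int tm_ref;	/* number of triggers attached */
	unsigned long flags;
	int soft_mode_users;	/* what the soft-mode stand-in has been told */
};

static void event_file_sketch_init(struct event_file_sketch *file)
{
	atomic_init(&file->tm_ref, 0);
	file->flags = 0;
	file->soft_mode_users = 0;
}

/* Stand-in for __ftrace_event_enable_disable(file, enable, 1). */
static int soft_mode_enable_disable(struct event_file_sketch *file, int enable)
{
	file->soft_mode_users += enable ? 1 : -1;
	return 0;
}

/* Only the first trigger added and the last one removed touch soft mode. */
static int trigger_enable_disable(struct event_file_sketch *file, int enable)
{
	assert(file != NULL);

	if (enable) {
		/* fetch_add returns the old value; old + 1 is the new count */
		if (atomic_fetch_add(&file->tm_ref, 1) + 1 > 1)
			return 0;
		file->flags |= TRIGGER_MODE;
		return soft_mode_enable_disable(file, 1);
	}

	/* fetch_sub returns the old value; old - 1 is the new count */
	if (atomic_fetch_sub(&file->tm_ref, 1) - 1 > 0)
		return 0;
	file->flags &= ~TRIGGER_MODE;
	return soft_mode_enable_disable(file, 0);
}
```

The point of keeping this outside __ftrace_event_enable_disable() is that the trigger count only ever gates a single soft-mode enable and a single soft-mode disable, so the existing soft-mode reference counting does not need to know about triggers at all.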
[OffTopic] BTW, I think the current name of the 3rd argument, "soft_disable", is
a bit confusing; it actually means "control the event in soft-mode". Even if the
event is enabled, __ftrace_event_enable_disable(file, 0, 1) doesn't disable
it, but just removes the soft-mode bit.
Thank you,
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
Hi Masami,
On Fri, 2013-06-21 at 13:18 +0900, Masami Hiramatsu wrote:
> (2013/06/21 3:31), Tom Zanussi wrote:
> > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > index 88ac7da..7c5627f 100644
> > --- a/include/trace/ftrace.h
> > +++ b/include/trace/ftrace.h
> > @@ -522,14 +522,6 @@ ftrace_raw_event_##call(void *__data, proto) \
> > int __data_size; \
> > int pc; \
> > \
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > - &ftrace_file->flags)) \
> > - event_triggers_call(ftrace_file); \
> > - \
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > - &ftrace_file->flags)) \
> > - return; \
> > - \
> > local_save_flags(irq_flags); \
> > pc = preempt_count(); \
> > \
> > @@ -547,8 +539,22 @@ ftrace_raw_event_##call(void *__data, proto) \
> > \
> > { assign; } \
> > \
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > + &ftrace_file->flags)) { \
> > + ring_buffer_discard_commit(buffer, event); \
> > + \
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > + &ftrace_file->flags)) \
> > + event_triggers_call(ftrace_file, entry); \
> > + return; \
> > + } \
> > + \
> > if (!filter_current_check_discard(buffer, event_call, entry, event)) \
> > trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> > + \
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > + &ftrace_file->flags)) \
> > + event_triggers_call(ftrace_file, entry); \
>
> Actually, since "entry" is part of "event", which may already have been discarded,
> I think we should not access it here. It may not cause a real problem, because
> even if it is discarded, that does NOT mean it is freed. However, that depends
> on the ring-buffer implementation.
>
> I recommend calling the event triggers before committing the event. That will also
> make the code simpler :)
>
That's what I originally did, and is what I was referring to when I
mentioned the bit of 'trickiness' here. ;-)
The problem is that the trace_recursive_lock() check in
ring_buffer_lock_reserve() prevents a trigger that itself logs to the
ring buffer from reserving a slot in the buffer, since it's being done
from the same context as the triggering event. For example, the
stacktrace trigger, which calls trace_dump_stack().
So the code is a bit more convoluted than I'd like because of that -
since we can't really invoke the triggers before the current event is
committed, it waits until the current event is either discarded (because
we're soft disabled) or committed (logging a normally-enabled event).
But you do point out a real problem with this - it means having to look
at a discarded event in the soft-disabled case, and if for example some
interrupt actually reserves and logs an event between the time we do the
discard and invoke the filter, we could end up with a false filtering
outcome. I'm not sure how to prevent that though at this point - will
have to think about it some more...
Tom
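To make the ordering constraint concrete, here is one possible direction, sketched in userspace C and purely hypothetical (it is not what this patch does): evaluate the trigger filters while the reserved entry is still valid, remember which actions fired, and only run the actions that log to the buffer after the event has been committed or discarded. All names below (entry_sketch, triggers_check(), triggers_run(), and so on) are made up for illustration, and the slot_reserved flag is only a crude model of the recursion check in ring_buffer_lock_reserve().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the event record that lives inside the reserved slot. */
struct entry_sketch {
	unsigned long bytes_req;
};

/* Actions that write to the buffer themselves, so they must be deferred. */
enum {
	ACT_STACKTRACE = 1 << 0,
	ACT_SNAPSHOT   = 1 << 1,
};

struct trigger_sketch {
	bool (*filter)(const struct entry_sketch *entry); /* NULL: always passes */
	unsigned int action;
};

static bool slot_reserved;	/* crude model of the recursion guard */
static int stacktraces_logged;
static int snapshots_taken;

/* Example filter, equivalent to 'if bytes_req >= 65536'. */
static bool big_request(const struct entry_sketch *entry)
{
	return entry->bytes_req >= 65536;
}

/* Phase 1: entry is still valid; evaluate filters, collect actions. */
static unsigned int triggers_check(const struct trigger_sketch *trigs,
				   size_t n, const struct entry_sketch *entry)
{
	unsigned int pending = 0;
	size_t i;

	assert(slot_reserved);
	for (i = 0; i < n; i++)
		if (trigs[i].filter == NULL || trigs[i].filter(entry))
			pending |= trigs[i].action;

	return pending;
}

/* Phase 2: after commit/discard; never looks at the entry again. */
static void triggers_run(unsigned int pending)
{
	assert(!slot_reserved);	/* safe to log: nothing is reserved */
	if (pending & ACT_STACKTRACE)
		stacktraces_logged++;
	if (pending & ACT_SNAPSHOT)
		snapshots_taken++;
}

static void handle_event(const struct trigger_sketch *trigs, size_t n,
			 unsigned long bytes_req)
{
	struct entry_sketch entry = { .bytes_req = bytes_req };
	unsigned int pending;

	slot_reserved = true;		/* reserve + { assign; } */
	pending = triggers_check(trigs, n, &entry);
	slot_reserved = false;		/* commit or discard */
	triggers_run(pending);
}
```

In this arrangement the filter never sees a discarded entry, and a trigger such as stacktrace never tries to reserve a slot while the triggering event still holds one. Whether that split is practical in the real tracepoint handlers is an open question.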
> > diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
> > index e287011..47fa712 100644
> > --- a/kernel/trace/trace_syscalls.c
> > +++ b/kernel/trace/trace_syscalls.c
> > @@ -319,14 +319,6 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > if (!sys_data)
> > return;
> >
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > - &sys_data->enter_file->flags))
> > - event_triggers_call(sys_data->enter_file);
> > -
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > - &sys_data->enter_file->flags))
> > - return;
> > -
> > size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
> >
> > buffer = tr->trace_buffer.buffer;
> > @@ -339,9 +331,23 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
> > entry->nr = syscall_nr;
> > syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
> >
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > + &sys_data->enter_file->flags)) {
> > + ring_buffer_discard_commit(buffer, event);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > + &sys_data->enter_file->flags))
> > + event_triggers_call(sys_data->enter_file, entry);
> > + return;
> > + }
> > +
> > if (!filter_current_check_discard(buffer, sys_data->enter_event,
> > entry, event))
> > trace_current_buffer_unlock_commit(buffer, event, 0, 0);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > + &sys_data->enter_file->flags))
> > + event_triggers_call(sys_data->enter_file, entry);
> > }
> >
> > static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > @@ -363,14 +369,6 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > if (!sys_data)
> > return;
> >
> > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > - &sys_data->exit_file->flags))
> > - event_triggers_call(sys_data->exit_file);
> > -
> > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > - &sys_data->exit_file->flags))
> > - return;
> > -
> > buffer = tr->trace_buffer.buffer;
> > event = trace_buffer_lock_reserve(buffer,
> > sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
> > @@ -381,9 +379,23 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
> > entry->nr = syscall_nr;
> > entry->ret = syscall_get_return_value(current, regs);
> >
> > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
> > + &sys_data->exit_file->flags)) {
> > + ring_buffer_discard_commit(buffer, event);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > + &sys_data->exit_file->flags))
> > + event_triggers_call(sys_data->exit_file, entry);
> > + return;
> > + }
> > +
> > if (!filter_current_check_discard(buffer, sys_data->exit_event,
> > entry, event))
> > trace_current_buffer_unlock_commit(buffer, event, 0, 0);
> > +
> > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT,
> > + &sys_data->exit_file->flags))
> > + event_triggers_call(sys_data->exit_file, entry);
> > }
> >
> > static int reg_event_syscall_enter(struct ftrace_event_file *file,
> >
>
> Same changes are needed here.
>
> Thank you,
>
On Thu, 2013-06-20 at 13:31 -0500, Tom Zanussi wrote:
> Hi,
>
> This patchset implements 'trace event triggers', which are similar to
> the function triggers implemented for 'ftrace filter commands' (see
> 'Filter commands' in Documentation/trace/ftrace.txt), but instead of
> being invoked from function calls are invoked by trace events.
> Basically the patchset allows 'commands' to be triggered whenever a
> given trace event is hit. The set of commands implemented by this
> patchset are:
>
Hi Tom!
Thanks for doing this. I'll try to get some time to review the patch set
either today or next week.
-- Steve
On Fri, 2013-06-21 at 16:06 +0900, Masami Hiramatsu wrote:
> (2013/06/21 3:31), Tom Zanussi wrote:
> > From: Tom Zanussi <[email protected]>
> >
> > Add the missing syscall_metadata description for the enter_fields
> > struct member.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
>
> Reviewed-by: Masami Hiramatsu <[email protected]>
>
> Steven, I think the first two patches would be better merged separately
> as cleanups.
>
Yeah, I'll probably pull these in today, as part of my 3.11 queue.
-- Steve
On Fri, 2013-06-21 at 14:53 +0800, zhangwei(Jovi) wrote:
> On 2013/6/21 2:31, Tom Zanussi wrote:
> > Add support for SOFT_DISABLE to syscall events.
> >
> > The original SOFT_DISABLE patches didn't add support for soft disable
> > of syscall events; this adds it and paves the way for future patches
> > allowing triggers to be added to syscall events, since triggers are
> > built on top of SOFT_DISABLE.
> >
> > Because the trigger and SOFT_DISABLE bits are attached to the
> > ftrace_event_file associated with the event, pointers to the
> > ftrace_event_files associated with the event are added to the syscall
> > metadata entry for the event.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > include/linux/syscalls.h | 2 ++
> > include/trace/syscall.h | 5 +++++
> > kernel/trace/trace_syscalls.c | 28 ++++++++++++++++++++++++----
> > 3 files changed, 31 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 4147d70..b4c2afa 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -158,6 +158,8 @@ extern struct trace_event_functions exit_syscall_print_funcs;
> > .args = nb ? args_##sname : NULL, \
> > .enter_event = &event_enter_##sname, \
> > .exit_event = &event_exit_##sname, \
> > + .enter_file = NULL, /* Filled in at boot */ \
> > + .exit_file = NULL, /* Filled in at boot */ \
> > .enter_fields = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
> > }; \
> > static struct syscall_metadata __used \
> > diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> > index fed853f..ba24d3a 100644
> > --- a/include/trace/syscall.h
> > +++ b/include/trace/syscall.h
> > @@ -19,6 +19,8 @@
> > * @enter_fields: list of fields for syscall_enter trace event
> > * @enter_event: associated syscall_enter trace event
> > * @exit_event: associated syscall_exit trace event
> > + * @enter_file: associated syscall_enter ftrace event file
> > + * @exit_file: associated syscall_exit ftrace event file
> > */
> > struct syscall_metadata {
> > const char *name;
> > @@ -30,6 +32,9 @@ struct syscall_metadata {
> >
> > struct ftrace_event_call *enter_event;
> > struct ftrace_event_call *exit_event;
> > +
> > + struct ftrace_event_file *enter_file;
> > + struct ftrace_event_file *exit_file;
> > };
>
> I doubt this will work correctly.
> struct ftrace_event_file is allocated dynamically, and there can be
> many ftrace_event_files, each associated with a different trace_array,
> so many ftrace_event_files may be linked to the same syscall_metadata.
>
> The enter_file/exit_file pointers in syscall_metadata will be overwritten if
> another ftrace_event_file is registered in some other trace_array.
>
> Perhaps we could use a simple list_head here that links all registered
> ftrace_event_files.
>
> Oleg uses "struct event_file_link" for trace_kprobe; that structure could be
> reused here.
>
>
> struct event_file_link {
> struct ftrace_event_file *file;
> struct list_head list;
> };
>
> struct syscall_metadata {
> const char *name;
> int syscall_nr;
> int nb_args;
> const char **types;
> const char **args;
> struct list_head enter_fields;
>
> struct ftrace_event_call *enter_event;
> struct ftrace_event_call *exit_event;
>
> struct list_head enter_files;
> struct list_head exit_files;
> };
>
I would really like to make these smaller. They are made for every
system call, and I consider tracing to always be added overhead. I hate
it when we waste more memory for tracing as very few people use it
(compared to those that use Linux).
Is it possible to allocate this only when it's first used?
-- Steve
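To make the concern quoted above concrete, here is a minimal user-space sketch (not taken from the patchset). All names in it (`event_file_sketch`, `file_link_sketch`, `meta_sketch`, `reg_single`, `reg_list`, `count_files`) are made up for illustration, and a plain singly linked list stands in for the kernel's `list_head`. It only shows that a single pointer keeps the last registered file, while a list keeps one entry per trace instance.

```c
#include <stdlib.h>

/* Stand-in for struct ftrace_event_file; one exists per trace instance. */
struct event_file_sketch {
	int instance_id;
};

/* Modeled on the quoted struct event_file_link, with a plain next pointer. */
struct file_link_sketch {
	struct event_file_sketch *file;
	struct file_link_sketch *next;
};

/* Hypothetical metadata holding both variants side by side. */
struct meta_sketch {
	struct event_file_sketch *enter_file;	/* single-pointer variant */
	struct file_link_sketch *enter_files;	/* list variant */
};

/* Single pointer: a second registration silently replaces the first. */
static void reg_single(struct meta_sketch *m, struct event_file_sketch *f)
{
	m->enter_file = f;
}

/* List: every registration adds a link, so no file is lost. */
static int reg_list(struct meta_sketch *m, struct event_file_sketch *f)
{
	struct file_link_sketch *link = malloc(sizeof(*link));

	if (!link)
		return -1;
	link->file = f;
	link->next = m->enter_files;
	m->enter_files = link;
	return 0;
}

/* Count the files currently linked to this metadata entry. */
static int count_files(const struct meta_sketch *m)
{
	const struct file_link_sketch *link;
	int n = 0;

	for (link = m->enter_files; link; link = link->next)
		n++;
	return n;
}
```

The cost of the list variant is the per-syscall list head plus one allocation per registered file, which is the memory overhead being questioned in the reply above.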
On Fri, 2013-06-21 at 20:12 +0900, Masami Hiramatsu wrote:
> (2013/06/21 3:31), Tom Zanussi wrote:
> > The comment on the soft disable 'disable' case of
> > __ftrace_event_enable_disable() states that the soft disable bit
> > should be cleared in that case, but currently only the soft mode bit
> > is actually cleared.
> >
> > This essentially leaves the standard non-soft-enable enable/disable
> > paths as the only way to clear the soft disable flag, but the soft
> > disable bit should also be cleared when removing a trigger with '!'.
>
> Indeed, the soft-disabled flag may remain set after the event itself
> is disabled. However, that soft-disabled flag will be cleared when
> the event is re-enabled, so there seems to be no bad side effect.
>
> So I doubt this patch is required on its own. I guess it is
> needed for adding the new trigger flag, isn't it? :)
Tom, I'm guessing Masami is correct here. It's needed for the trigger
work to work, correct?
Either way, I probably could add it as a clean up patch regardless. I'll
just have to test the hell out of it some more, as the accounting for
soft-disable vs real disable was a PITA.
-- Steve
On Fri, 2013-06-21 at 16:39 -0400, Steven Rostedt wrote:
> On Fri, 2013-06-21 at 20:12 +0900, Masami Hiramatsu wrote:
> > (2013/06/21 3:31), Tom Zanussi wrote:
> > > The comment on the soft disable 'disable' case of
> > > __ftrace_event_enable_disable() states that the soft disable bit
> > > should be cleared in that case, but currently only the soft mode bit
> > > is actually cleared.
> > >
> > > This essentially leaves the standard non-soft-enable enable/disable
> > > paths as the only way to clear the soft disable flag, but the soft
> > > disable bit should also be cleared when removing a trigger with '!'.
> >
> > Indeed, the soft-disabled flag may remain set after the event itself
> > is disabled. However, that soft-disabled flag will be cleared when
> > the event is re-enabled, so there seems to be no bad side effect.
> >
> > So I doubt this patch is required on its own. I guess it is
> > needed for adding the new trigger flag, isn't it? :)
>
> Tom, I'm guessing Masami is correct here. It's needed for the trigger
> work to work, correct?
>
Well, the trigger should really work without this - this is basically
just a cleanup I added because it bothered me that I couldn't completely
revert the enable state back to the original state that existed before I
added the trigger (by reverting the trigger using '!'). It also just
seemed obviously correct from looking at the code as well (though I
agree, it's hard to keep the state machine of that function in your head
in order to prove it correct, and the straggling soft-disable state
hasn't bothered anyone until now, so maybe it's not worth it..)
In any case, if the SOFT_DISABLED bit is erroneously set but there are
no triggers, it shouldn't be a problem, since the trigger calls would
just return immediately, so not having this patch wouldn't break
anything...
Tom
> Either way, I probably could add it as a clean up patch regardless. I'll
> just have to test the hell out of it some more, as the accounting for
> soft-disable vs real disable was a PITA.
>
> -- Steve
>
On Sat, Jun 22, 2013 at 4:22 AM, Steven Rostedt <[email protected]> wrote:
> On Fri, 2013-06-21 at 14:53 +0800, zhangwei(Jovi) wrote:
>> On 2013/6/21 2:31, Tom Zanussi wrote:
>> > Add support for SOFT_DISABLE to syscall events.
>> >
>> > The original SOFT_DISABLE patches didn't add support for soft disable
>> > of syscall events; this adds it and paves the way for future patches
>> > allowing triggers to be added to syscall events, since triggers are
>> > built on top of SOFT_DISABLE.
>> >
>> > Because the trigger and SOFT_DISABLE bits are attached to the
>> > ftrace_event_file associated with the event, pointers to the
>> > ftrace_event_files associated with the event are added to the syscall
>> > metadata entry for the event.
>> >
>> > Signed-off-by: Tom Zanussi <[email protected]>
>> > ---
>> > include/linux/syscalls.h | 2 ++
>> > include/trace/syscall.h | 5 +++++
>> > kernel/trace/trace_syscalls.c | 28 ++++++++++++++++++++++++----
>> > 3 files changed, 31 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> > index 4147d70..b4c2afa 100644
>> > --- a/include/linux/syscalls.h
>> > +++ b/include/linux/syscalls.h
>> > @@ -158,6 +158,8 @@ extern struct trace_event_functions exit_syscall_print_funcs;
>> > .args = nb ? args_##sname : NULL, \
>> > .enter_event = &event_enter_##sname, \
>> > .exit_event = &event_exit_##sname, \
>> > + .enter_file = NULL, /* Filled in at boot */ \
>> > + .exit_file = NULL, /* Filled in at boot */ \
>> > .enter_fields = LIST_HEAD_INIT(__syscall_meta_##sname.enter_fields), \
>> > }; \
>> > static struct syscall_metadata __used \
>> > diff --git a/include/trace/syscall.h b/include/trace/syscall.h
>> > index fed853f..ba24d3a 100644
>> > --- a/include/trace/syscall.h
>> > +++ b/include/trace/syscall.h
>> > @@ -19,6 +19,8 @@
>> > * @enter_fields: list of fields for syscall_enter trace event
>> > * @enter_event: associated syscall_enter trace event
>> > * @exit_event: associated syscall_exit trace event
>> > + * @enter_file: associated syscall_enter ftrace event file
>> > + * @exit_file: associated syscall_exit ftrace event file
>> > */
>> > struct syscall_metadata {
>> > const char *name;
>> > @@ -30,6 +32,9 @@ struct syscall_metadata {
>> >
>> > struct ftrace_event_call *enter_event;
>> > struct ftrace_event_call *exit_event;
>> > +
>> > + struct ftrace_event_file *enter_file;
>> > + struct ftrace_event_file *exit_file;
>> > };
>>
>> I doubt this will work correctly.
>> struct ftrace_event_file is allocated dynamically, and there can be
>> many ftrace_event_files, each associated with a different trace_array,
>> so many ftrace_event_files may be linked to the same syscall_metadata.
>>
>> The enter_file/exit_file pointers in syscall_metadata will be overwritten if
>> another ftrace_event_file is registered in some other trace_array.
>>
>> Perhaps we could use a simple list_head here that links all registered
>> ftrace_event_files.
>>
>> Oleg uses "struct event_file_link" for trace_kprobe; that structure could be
>> reused here.
>>
>>
>> struct event_file_link {
>> struct ftrace_event_file *file;
>> struct list_head list;
>> };
>>
>> struct syscall_metadata {
>> const char *name;
>> int syscall_nr;
>> int nb_args;
>> const char **types;
>> const char **args;
>> struct list_head enter_fields;
>>
>> struct ftrace_event_call *enter_event;
>> struct ftrace_event_call *exit_event;
>>
>> struct list_head enter_files;
>> struct list_head exit_files;
>> };
>>
>
>
> I would really like to make these smaller. They are made for every
> system call, and I consider tracing to always be added overhead. I hate
> it when we waste more memory for tracing as very few people use it
> (compared to those that use Linux).
>
> Is it possible to allocate this only when it's first used?
>
> -- Steve
I think the answer is yes.
Instead of linking ftrace_event_files to the static syscall_metadata, another
option is to put them into struct trace_array, just like the
enabled_enter_syscalls and enabled_exit_syscalls bitmaps already there
(this would need to be a dynamically allocated array of NR_syscalls entries,
with only a pointer kept in trace_array).
That way there is no extra size overhead for the static syscall_metadata,
but an array with NR_syscalls elements must be allocated when syscall
tracing is first used.
Which option do you prefer, Steven?
.jovi
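A rough user-space sketch of the second option described above, for illustration only. The names `trace_array_sketch`, `event_file_sketch`, `reg_enter_file`, `lookup_enter_file` and the constant `NR_SYSCALLS_SKETCH` are hypothetical stand-ins, not kernel identifiers, and `calloc` stands in for whatever kernel allocator would be used. The point is only that the per-instance structure carries a single pointer until syscall tracing is first used.

```c
#include <stdlib.h>

#define NR_SYSCALLS_SKETCH 8	/* stand-in for the kernel's NR_syscalls */

/* Stand-in for struct ftrace_event_file. */
struct event_file_sketch {
	unsigned long flags;
};

/* Stand-in for struct trace_array: only a pointer until first use. */
struct trace_array_sketch {
	struct event_file_sketch **enter_files;	/* NULL until first registration */
};

/* Register a file for syscall 'nr', allocating the table on first use. */
static int reg_enter_file(struct trace_array_sketch *tr, int nr,
			  struct event_file_sketch *file)
{
	if (nr < 0 || nr >= NR_SYSCALLS_SKETCH)
		return -1;

	if (!tr->enter_files) {
		tr->enter_files = calloc(NR_SYSCALLS_SKETCH,
					 sizeof(*tr->enter_files));
		if (!tr->enter_files)
			return -1;
	}

	tr->enter_files[nr] = file;
	return 0;
}

/* Look up the file for syscall 'nr'; NULL if tracing was never used. */
static struct event_file_sketch *
lookup_enter_file(const struct trace_array_sketch *tr, int nr)
{
	if (!tr->enter_files || nr < 0 || nr >= NR_SYSCALLS_SKETCH)
		return NULL;
	return tr->enter_files[nr];
}
```

Because each trace instance owns its own table, two instances can no longer overwrite each other's file pointer, and the static syscall_metadata does not grow at all.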
On Fri, 2013-06-21 at 16:14 -0500, Tom Zanussi wrote:
> On Fri, 2013-06-21 at 16:39 -0400, Steven Rostedt wrote:
> > On Fri, 2013-06-21 at 20:12 +0900, Masami Hiramatsu wrote:
> > > (2013/06/21 3:31), Tom Zanussi wrote:
> > > > The comment on the soft disable 'disable' case of
> > > > __ftrace_event_enable_disable() states that the soft disable bit
> > > > should be cleared in that case, but currently only the soft mode bit
> > > > is actually cleared.
> > > >
> > > > This essentially leaves the standard non-soft-enable enable/disable
> > > > paths as the only way to clear the soft disable flag, but the soft
> > > > disable bit should also be cleared when removing a trigger with '!'.
> > >
> > > Indeed, the soft-disabled flag may remain set after the event itself
> > > is disabled. However, that soft-disabled flag will be cleared when
> > > the event is re-enabled, so there seems to be no bad side effect.
> > >
> > > So I doubt this patch is required on its own. I guess it is
> > > needed for adding the new trigger flag, isn't it? :)
> >
> > Tom, I'm guessing Masami is correct here. It's needed for the trigger
> > work to work, correct?
> >
>
> Well, the trigger should really work without this - this is basically
> just a cleanup I added because it bothered me that I couldn't completely
> revert the enable state back to the original state that existed before I
> added the trigger (by reverting the trigger using '!'). It also just
> seemed obviously correct from looking at the code as well (though I
> agree, it's hard to keep the state machine of that function in your head
> in order to prove it correct, and the straggling soft-disable state
> hasn't bothered anyone until now, so maybe it's not worth it..)
>
Looking into this a bit more, I think the reason it hasn't bothered
anyone until now is that it's been hidden by the existing
event_enable_read() implementation, which doesn't show any soft disable
state when the event is actually disabled, only when it's enabled. So
the case where SOFT_DISABLED is still set but the event is actually
disabled gets hidden by the catch-all "0" case.
My new version of event_enable_read() does show the soft disabled state
when the event is actually disabled, which is why I noticed it wasn't
getting turned off, and led to the current patch.
Ironically, the reason I refactored the function in the first place was
to add the '+' flag for triggers - redundant, yes, but useful for
debugging, not quite in the way I planned though. ;-) (It might be
that leaving the current function in place and remaining oblivious would
be ok, too, since it doesn't seem to really cause much of a problem in
any case...)
Tom
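As an illustration of the hiding effect described above, here is a small user-space sketch. It is not the kernel's event_enable_read(); the flag values, the output strings ("0", "1", "0*", "1*") and both function names are assumptions chosen only to show the shape of the logic: one reporter consults the soft state only when the event is enabled, the other reports it unconditionally.

```c
/* Hypothetical flag masks; the real values live in the kernel headers. */
enum {
	FL_ENABLED       = 1 << 0,
	FL_SOFT_MODE     = 1 << 1,
	FL_SOFT_DISABLED = 1 << 2,
};

/* Looks at the soft state only when the event is really enabled. */
static const char *enable_str_hiding(unsigned long flags)
{
	if (flags & FL_ENABLED) {
		if (flags & FL_SOFT_DISABLED)
			return "0*";
		if (flags & FL_SOFT_MODE)
			return "1*";
		return "1";
	}
	/* Catch-all: a leftover SOFT_DISABLED bit is invisible here. */
	return "0";
}

/* Assumed variant that reports the soft state regardless of ENABLED. */
static const char *enable_str_showing(unsigned long flags)
{
	if (flags & FL_SOFT_DISABLED)
		return "0*";
	if (flags & FL_ENABLED)
		return (flags & FL_SOFT_MODE) ? "1*" : "1";
	return "0";
}
```

With only the SOFT_DISABLED bit set, the first function returns "0" and the second returns "0*", which is the kind of difference that would expose a straggling soft-disable bit.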
> In any case, if the SOFT_DISABLED bit is erroneously set but there are
> no triggers, it shouldn't be a problem, since the trigger calls would
> just return immediately, so not having this patch wouldn't break
> anything...
>
> Tom
>
>
> > Either way, I probably could add it as a clean up patch regardless. I'll
> > just have to test the hell out of it some more, as the accounting for
> > soft-disable vs real disable was a PITA.
> >
>
>
> > -- Steve
> >
>
On Sat, 2013-06-22 at 13:08 +0800, Jovi Zhang wrote:
> > I would really like to make these smaller. They are made for every
> > system call, and I consider tracing to always be added overhead. I hate
> > it when we waste more memory for tracing as very few people use it
> > (compared to those that use Linux).
> >
> > Is it possible to allocate this only when it's first used?
> >
> > -- Steve
>
> I think the answer is yes.
>
> Instead of linking ftrace_event_files to the static syscall_metadata, another
> option is to put them into struct trace_array, just like the
> enabled_enter_syscalls and enabled_exit_syscalls bitmaps already there
> (this would need to be a dynamically allocated array of NR_syscalls entries,
> with only a pointer kept in trace_array).
>
> That way there is no extra size overhead for the static syscall_metadata,
> but an array with NR_syscalls elements must be allocated when syscall
> tracing is first used.
>
> Which option do you prefer, Steven?
Hey, if it only allocates when tracing is used, that's a big plus.
Tracing should be treated as a drug. Just like not allowing smoking in
public places, only punish the users, not those that want to breathe
fresh air.
-- Steve
On Fri, 2013-06-21 at 12:59 -0500, Tom Zanussi wrote:
> Hi Masami,
>
> On Fri, 2013-06-21 at 13:18 +0900, Masami Hiramatsu wrote:
> > (2013/06/21 3:31), Tom Zanussi wrote:
> > > diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> > > index 88ac7da..7c5627f 100644
> > > --- a/include/trace/ftrace.h
> > > +++ b/include/trace/ftrace.h
> > > @@ -522,14 +522,6 @@ ftrace_raw_event_##call(void *__data, proto) \
> > > int __data_size; \
> > > int pc; \
> > > \
> > > - if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > > - &ftrace_file->flags)) \
> > > - event_triggers_call(ftrace_file); \
> > > - \
> > > - if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > > - &ftrace_file->flags)) \
> > > - return; \
I'd like to still exit early on soft disable if needed. Perhaps have a:
if ((ftrace_file->flags & (FTRACE_EVENT_FL_TRIGGER_MODE | \
FTRACE_EVENT_FL_SOFT_DISABLED)) == \
FTRACE_EVENT_FL_SOFT_DISABLED) \
return; \
> > > - \
> > > local_save_flags(irq_flags); \
> > > pc = preempt_count(); \
> > > \
> > > @@ -547,8 +539,22 @@ ftrace_raw_event_##call(void *__data, proto) \
> > > \
> > > { assign; } \
> > > \
> > > + if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT, \
> > > + &ftrace_file->flags)) { \
> > > + ring_buffer_discard_commit(buffer, event); \
> > > + \
> > > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > > + &ftrace_file->flags)) \
> > > + event_triggers_call(ftrace_file, entry); \
> > > + return; \
> > > + } \
> > > + \
> > > if (!filter_current_check_discard(buffer, event_call, entry, event)) \
> > > trace_buffer_unlock_commit(buffer, event, irq_flags, pc); \
> > > + \
> > > + if (test_bit(FTRACE_EVENT_FL_TRIGGER_MODE_BIT, \
> > > + &ftrace_file->flags)) \
> > > + event_triggers_call(ftrace_file, entry); \
> >
> > Actually, since "entry" is part of "event", which may already be discarded,
> > I think we should not access it here. It may not cause a real problem, because
> > being discarded does NOT mean it is freed. However, that depends
> > on the ring-buffer implementation.
> >
> > I recommend calling the event triggers before committing the event. It will
> > also make the code simpler :)
> >
>
> That's what I originally did, and is what I was referring to when I
> mentioned the bit of 'trickiness' here. ;-)
>
> The problem is that the trace_recursive_lock() check in
> ring_buffer_lock_reserve() prevents a trigger that itself logs to the
> ring buffer from reserving a slot in the buffer, since it's being done
> from the same context as the triggering event. For example, the
> stacktrace trigger, which calls trace_dump_stack().
Ah, yeah that would happen.
>
> So the code is a bit more convoluted than I'd like because of that -
> since we can't really invoke the triggers before the current event is
> committed, it waits until the current event is either discarded (because
> we're soft disabled) or committed (logging a normally-enabled event).
>
> But you do point out a real problem with this - it means having to look
> at a discarded event in the soft-disabled case, and if for example some
> interrupt actually reserves and logs an event between the time we do the
> discard and invoke the filter, we could end up with a false filtering
> outcome. I'm not sure how to prevent that though at this point - will
> have to think about it some more...
>
Perhaps add a trigger mode here:
enum trigger_mode {
TM_NONE,
TM_STACK,
};
Then do:
trigger_mode = event_triggers_call(ftrace_file, entry);
(finish event)
if (unlikely(trigger_mode))
event_triggers_post_call(ftrace_file, trigger_mode);
Something like that.
-- Steve
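The two suggestions in this mail (the early exit and the split trigger call) can be sketched together in user space as follows. This is only a model under stated assumptions: `buf_sketch` with its `reserved` flag stands in for the ring buffer and its recursion check, the `FL_*` masks are made-up values, and `triggers_call`, `triggers_post_call` and `handle_event` are hypothetical names rather than the functions from the patchset. Phase 1 runs while the entry is still reserved and only reports what must be deferred; phase 2 runs after the event is committed or discarded, so a trigger that itself logs can reserve a slot.

```c
#include <stdbool.h>

enum trigger_mode {
	TM_NONE,
	TM_STACK,
};

/* Hypothetical flag masks for the sketch. */
enum {
	FL_TRIGGER_MODE  = 1 << 0,
	FL_SOFT_DISABLED = 1 << 1,
};

/* Toy buffer: 'reserved' models the recursion check on reserve. */
struct buf_sketch {
	bool reserved;
	int committed;
};

static bool buf_reserve(struct buf_sketch *b)
{
	if (b->reserved)
		return false;	/* nested reserve from the same context fails */
	b->reserved = true;
	return true;
}

static void buf_commit(struct buf_sketch *b)
{
	b->reserved = false;
	b->committed++;
}

static void buf_discard(struct buf_sketch *b)
{
	b->reserved = false;
}

/* Phase 1: inspect the still-valid entry; defer anything that logs. */
static enum trigger_mode triggers_call(unsigned long flags)
{
	return (flags & FL_TRIGGER_MODE) ? TM_STACK : TM_NONE;
}

/* Phase 2: the deferred stacktrace trigger writes its own event. */
static void triggers_post_call(struct buf_sketch *b, enum trigger_mode tm)
{
	if (tm == TM_STACK && buf_reserve(b))
		buf_commit(b);
}

static void handle_event(struct buf_sketch *b, unsigned long flags)
{
	enum trigger_mode tm;

	/* Soft-disabled and no triggers attached: nothing to do. */
	if ((flags & (FL_TRIGGER_MODE | FL_SOFT_DISABLED)) == FL_SOFT_DISABLED)
		return;

	if (!buf_reserve(b))
		return;

	tm = triggers_call(flags);	/* entry is still reserved here */

	if (flags & FL_SOFT_DISABLED)
		buf_discard(b);
	else
		buf_commit(b);

	if (tm != TM_NONE)
		triggers_post_call(b, tm);	/* safe: slot is released */
}
```

In this model the filter or trigger decision never reads a discarded entry, and the stacktrace-style trigger is not blocked by the reservation held for the triggering event.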
(2013/06/22 14:25), Tom Zanussi wrote:
>
> Looking into this a bit more, I think the reason it hasn't bothered
> anyone until now is that it's been hidden by the existing
> event_enable_read() implementation, which doesn't show any soft disable
> state when the event is actually disabled, only when it's enabled. So
> the case where SOFT_DISABLED is still set but the event is actually
> disabled gets hidden by the catch-all "0" case.
>
> My new version of event_enable_read() does show the soft disabled state
> when the event is actually disabled, which is why I noticed it wasn't
> getting turned off, and led to the current patch.
>
> Ironically, the reason I refactored the function in the first place was
> to add the '+' flag for triggers - redundant, yes, but useful for
> debugging, not quite in the way I planned though. ;-) (It might be
> that leaving the current function in place and remaining oblivious would
> be ok, too, since it doesn't seem to really cause much of a problem in
> any case...)
>
Ah, I had missed this. And indeed, if your event_enable_read() cleanup
patch changes the output, it is not really a cleanup patch.
I think you should merge this into patch [1/11] to avoid a behavior change.
Thank you!
--
Masami HIRAMATSU
IT Management Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]