Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755429AbbETVVg (ORCPT ); Wed, 20 May 2015 17:21:36 -0400 Received: from mga02.intel.com ([134.134.136.20]:45354 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754819AbbETVUi (ORCPT ); Wed, 20 May 2015 17:20:38 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.13,466,1427785200"; d="scan'208";a="732690035" From: Tom Zanussi To: rostedt@goodmis.org Cc: daniel.wagner@bmw-carit.de, masami.hiramatsu.pt@hitachi.com, namhyung@kernel.org, josh@joshtriplett.org, andi@firstfloor.org, linux-kernel@vger.kernel.org, Tom Zanussi Subject: [PATCH v5 09/10] tracing: Add 'hist' trigger Documentation Date: Wed, 20 May 2015 16:19:56 -0500 Message-Id: <9c197e83411126520e1232d4fe1d4cd9a9a3212f.1432151408.git.tom.zanussi@linux.intel.com> X-Mailer: git-send-email 1.9.3 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 47563 Lines: 895 Add documentation and usage examples for 'hist' triggers. Signed-off-by: Tom Zanussi --- Documentation/trace/events.txt | 872 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 872 insertions(+) diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index 75d25a1..29dc9bf 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt @@ -494,3 +494,875 @@ The following commands are supported: Note that there can be only one traceon or traceoff trigger per triggering event. + +- hist + + This command aggregates event hits into a hash table keyed on a + trace event format fields (or stacktrace) and a set of running + totals derived from one or more trace event format fields and/or + event counts (hitcount). + + The format of a hist trigger is as follows: + + hist:keys=[:values=] + + When a matching event is hit, an entry is added to a hash table + using the key(s) and value(s) named. Keys and values correspond to + fields in the event's format description. Values must correspond to + numeric fields - on an event hit, the value(s) will be added to a + sum kept for that field. The special string 'hitcount' can be used + in place of an explicit value field - this is simply a count of + event hits. If 'values' isn't specified, an implicit 'hitcount' + value will be automatically created and used as the only value. + Keys can be any field, or the special string 'stacktrace', which + will use the event's kernel stacktrace as the key. The keywords + 'keys' or 'key' can be used to specify keys, and the keyworks + 'values', 'vals', or 'val' can be used to specify values. For the + time being, only a single key can be used - compound keys aren't yet + supported. + + 'hist' triggers add a 'hist' file to each event's subdirectory. + Reading the 'hist' file for the event will dump the hash table in + its entirety to stdout. By default, numeric fields are displayed as + base-10 integers. This can be modified by appending any of the + following modifiers to the field name: + + .hex display a number as a hex value + .sym display an address as a symbol + .syscall display a syscall id as a system call name + .execname display a common_pid as a program name + + A typical usage scenario would be the following to enable a hist + trigger, read its current contents, and then turn it off: + + # echo 'hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist + + # echo '!hist:keys=skbaddr.hex:vals=len' > \ + /sys/kernel/debug/tracing/events/net/netif_rx/trigger + + The trigger file itself can be read to show the details of the + currently attached hist trigger. This information is also displayed + at the top of the 'hist' file when read. + + By default, the size of the hash table is 2048 entries. The 'size' + param can be used to specify more or fewer than that. The units are + in terms of hashtable entries - if a run uses more entries than + specified, the results will show the number of 'drops', the number + of hits that were ignored. The size should be a power of 2 between + 128 and 131072 (any non- power-of-2 number specified will be rounded + up). + + The 'sort' param can be used to specify a value field to sort on. + The default if unspecified is 'hitcount' and the default sort order + is 'ascending'. To sort in the opposite direction, append + .descending' to the sort key. + + The 'pause' param can be used to pause an existing hist trigger or + to start a hist trigger but not log any events until told to do so. + 'continue' or 'cont' can be used to start or restart a paused hist + trigger. + + The 'clear' param will clear the contents of a running hist trigger + and leave its current paused/active state. + +- enable_hist/disable_hist + + The enable_hist and disable_hist triggers can be used to have one + event conditionally start and stop another event's already-attached + hist trigger. Any number of enable_hist and disable_hist triggers + can be attached to a given event, allowing that event to kick off + and stop aggregations on a host of other events. + + The format is very similar to the enable/disable_event triggers: + + enable_hist::[:count] + disable_hist::[:count] + + Instead of enabling or disabling the tracing of the target event + into the trace buffer as the enable/disable_event triggers do, the + enable/disable_hist triggers enable or disable the aggregation of + the target event into a hash table. + + A typical usage scenario for the enable_hist/disable_hist triggers + would be to first set up a paused hist trigger on some event, + followed by an enable_hist/disable_hist pair that turns the hist + aggregation on and off when conditions of interest are hit: + + # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + The above sets up an initially paused hist trigger which is unpaused + and starts aggregating events when a given program is executed, and + which stops aggregating when the process exits and the hist trigger + is paused again. + + The examples below provide a more concrete illustration of the + concepts and typical usage patterns discussed above. + + +6.2 'hist' trigger examples +--------------------------- + + The first set of examples creates aggregations using the kmalloc + event. The fields that can be used for the hist trigger are listed + in the kmalloc event's format file: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format + name: kmalloc + ID: 374 + format: + field:unsigned short common_type; offset:0; size:2; signed:0; + field:unsigned char common_flags; offset:2; size:1; signed:0; + field:unsigned char common_preempt_count; offset:3; size:1; signed:0; + field:int common_pid; offset:4; size:4; signed:1; + + field:unsigned long call_site; offset:8; size:8; signed:0; + field:const void * ptr; offset:16; size:8; signed:0; + field:size_t bytes_req; offset:24; size:8; signed:0; + field:size_t bytes_alloc; offset:32; size:8; signed:0; + field:gfp_t gfp_flags; offset:40; size:4; signed:0; + + We'll start by creating a hist trigger that generates a simple table + that lists the total number of bytes requested for each function in + the kernel that made one or more calls to kmalloc: + + # echo 'hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + This tells the tracing system to create a 'hist' trigger using the + call_site field of the kmalloc event as the key for the table, which + just means that each unique call_site address will have an entry + created for it in the table. The 'val=bytes_req' parameter tells + the hist trigger that for each unique entry (call_site) in the + table, it should keep a running total of the number of bytes + requested by that call_site. + + We'll let it run for awhile and then dump the contents of the 'hist' + file in the kmalloc event's subdirectory (for readability, a number + of entries have been omitted): + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + call_site: 18446744071581750326 hitcount: 1 bytes_req: 24 + call_site: 18446744071583151255 hitcount: 1 bytes_req: 32 + call_site: 18446744071582443167 hitcount: 1 bytes_req: 264 + call_site: 18446744072104099935 hitcount: 2 bytes_req: 464 + call_site: 18446744071579323550 hitcount: 3 bytes_req: 168 + call_site: 18446744071580558850 hitcount: 4 bytes_req: 65536 + . + . + . + call_site: 18446744072101362738 hitcount: 4115 bytes_req: 362120 + call_site: 18446744072103235682 hitcount: 4115 bytes_req: 427960 + call_site: 18446744072102962280 hitcount: 4342 bytes_req: 1637584 + call_site: 18446744072102962328 hitcount: 4973 bytes_req: 360624 + call_site: 18446744072101400062 hitcount: 6465 bytes_req: 258600 + call_site: 18446744071582063959 hitcount: 7818 bytes_req: 15636 + call_site: 18446744072102968908 hitcount: 9315 bytes_req: 8912400 + call_site: 18446744072103122419 hitcount: 9315 bytes_req: 819720 + call_site: 18446744072101402850 hitcount: 10240 bytes_req: 573440 + call_site: 18446744072099471334 hitcount: 23820 bytes_req: 476512 + + Totals: + Hits: 109976 + Entries: 76 + Dropped: 0 + + The output displays a line for each entry, beginning with the key + specified in the trigger, followed by the value(s) also specified in + the trigger. At the beginning of the output is a line that displays + the trigger info, which can also be displayed by reading the + 'trigger' file: + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active] + + At the end of the output are a few lines that display the overall + totals for the run. The 'Hits' field shows the total number of + times the event trigger was hit, the 'Entries' fields shows the + total number of used entries in the hash table, and the 'Dropped' + field shows the number of hits that were dropped because the number + of used entries for the run exceeded the maximum number of entries + allowed for the table (normally 0, but if not a hint that you may + want to increase the size of the table using the 'size' param). + + Notice in the above output that there's an extra field, 'hitcount', + that wasn't specified in the trigger. Also notice that in the + trigger info a param,'sort=hitcount', which wasn't specified in the + trigger either. The reason is that every trigger implicitly keeps a + count of the total number of hits attributed to a given entry, + called the 'hitcount', and that in the absence of a user-specified + sort param, the hitcount is used as the default sort field. + + The value 'hitcount' can be used in place of an explicit value in + the 'values' param if you don't really need to have any particular + field summed and are mainly interested in hit frequencies. + + To turn the hist trigger off, simply call up the trigger in command + history and re-execute it with a '!' prepended: + + # echo '!hist:key=call_site:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + Finally, notice that the call_site as displayed in the output above + isn't really very useful. It's an address, but normally addresses + are displayed in hex. To have a numeric field displayed as hex + values, simply append '.hex' to the field name in the trigger: + + # echo 'hist:key=call_site.hex:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active] + + call_site: ffffffffa04e85cb hitcount: 1 bytes_req: 511 + call_site: ffffffff810a66a9 hitcount: 1 bytes_req: 1024 + call_site: ffffffff8152db82 hitcount: 1 bytes_req: 8 + call_site: ffffffffa008829e hitcount: 1 bytes_req: 7 + call_site: ffffffffa0087d6a hitcount: 1 bytes_req: 7 + call_site: ffffffffa02eb7e1 hitcount: 1 bytes_req: 433 + call_site: ffffffff8152cbde hitcount: 1 bytes_req: 192 + call_site: ffffffff811a2602 hitcount: 2 bytes_req: 32768 + . + . + . + call_site: ffffffffa0419062 hitcount: 724 bytes_req: 75296 + call_site: ffffffffa024fc32 hitcount: 724 bytes_req: 63712 + call_site: ffffffffa03d6468 hitcount: 852 bytes_req: 416736 + call_site: ffffffffa03d6498 hitcount: 944 bytes_req: 70984 + call_site: ffffffff81311d57 hitcount: 961 bytes_req: 1922 + call_site: ffffffffa0258dfe hitcount: 1429 bytes_req: 57160 + call_site: ffffffffa03fd5f3 hitcount: 1796 bytes_req: 158048 + call_site: ffffffffa03d7e4c hitcount: 1796 bytes_req: 2206848 + call_site: ffffffffa02598e2 hitcount: 3196 bytes_req: 178976 + call_site: ffffffffa0081fe6 hitcount: 9529 bytes_req: 190584 + + Totals: + Hits: 24542 + Entries: 60 + Dropped: 0 + + Even that's only marginally more useful - while hex values do look + more like addresses, what users are typically more interested in + when looking at text addresses are the corresponding symbols + instead. To have an address displayed as symbolic value instead, + simply append '.sym' to the field name in the trigger: + + # echo 'hist:key=call_site.sym:val=bytes_req' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active] + + call_site: [ffffffff8152db82] usb_control_msg hitcount: 1 bytes_req: 8 + call_site: [ffffffffa04e85cb] __ieee80211_start_scan hitcount: 1 bytes_req: 511 + call_site: [ffffffff815d7f6b] sk_prot_alloc hitcount: 1 bytes_req: 976 + call_site: [ffffffff8152cbde] usb_alloc_urb hitcount: 1 bytes_req: 192 + call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 1 bytes_req: 64 + call_site: [ffffffffa02eb7e1] nl80211_trigger_scan hitcount: 1 bytes_req: 433 + call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 1 bytes_req: 7 + call_site: [ffffffff810a66a9] syslog_print_all hitcount: 1 bytes_req: 1024 + call_site: [ffffffffa008829e] hidraw_send_report hitcount: 1 bytes_req: 7 + call_site: [ffffffff811f1316] mounts_open_common hitcount: 2 bytes_req: 368 + call_site: [ffffffff811d0825] alloc_fdtable hitcount: 2 bytes_req: 96 + call_site: [ffffffffa042106f] __intel_framebuffer_create hitcount: 3 bytes_req: 432 + . + . + . + call_site: [ffffffff8136b079] sg_kmalloc hitcount: 461 bytes_req: 153696 + call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 462 bytes_req: 92400 + call_site: [ffffffffa03e213a] i915_gem_object_get_pages_gtt hitcount: 462 bytes_req: 7392 + call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 697 bytes_req: 52064 + call_site: [ffffffff81311d57] apparmor_file_alloc_security hitcount: 926 bytes_req: 1852 + call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 1050 bytes_req: 42000 + call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 1116 bytes_req: 98208 + call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 1116 bytes_req: 956984 + call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 2055 bytes_req: 115080 + call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 6433 bytes_req: 128664 + + Totals: + Hits: 17384 + Entries: 62 + Dropped: 0 + + Because the default sort key above is 'hitcount', the above shows a + the list of call_sites by increasing hitcount, so that at the bottom + we see the functions that made the most kmalloc calls during the + run. If instead we we wanted to see the top kmalloc callers in + terms of the number of bytes requested rather than the number of + calls, and we wanted the top caller to appear at the top, we can use + the 'sort' param, along with the 'descending' modifier: + + # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req:size=2048 [active] + + call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 682 bytes_req: 766360 + call_site: [ffffffff811d6c3b] seq_buf_alloc hitcount: 46 bytes_req: 186176 + call_site: [ffffffffa03d6468] i915_gem_do_execbuffer.isra.22 hitcount: 377 bytes_req: 142808 + call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 1107 bytes_req: 61992 + call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 682 bytes_req: 60016 + call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 2993 bytes_req: 59864 + call_site: [ffffffff8136b079] sg_kmalloc hitcount: 206 bytes_req: 48096 + call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 206 bytes_req: 41200 + call_site: [ffffffff811a2602] __kmalloc hitcount: 2 bytes_req: 32768 + call_site: [ffffffffa0419062] intel_crtc_page_flip hitcount: 305 bytes_req: 31720 + call_site: [ffffffffa024fc32] drm_mode_page_flip_ioctl hitcount: 305 bytes_req: 26840 + call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 603 bytes_req: 24120 + call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 305 bytes_req: 23072 + call_site: [ffffffff812775da] ext4_find_extent hitcount: 79 bytes_req: 7584 + . + . + . + call_site: [ffffffff811f3b07] inotify_handle_event hitcount: 2 bytes_req: 96 + call_site: [ffffffff811bfb6e] vfs_rename hitcount: 6 bytes_req: 66 + call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 1 bytes_req: 64 + call_site: [ffffffff811d0825] alloc_fdtable hitcount: 1 bytes_req: 48 + call_site: [ffffffff81220376] proc_self_follow_link hitcount: 2 bytes_req: 24 + call_site: [ffffffff8112e1ec] hist_show hitcount: 2 bytes_req: 16 + call_site: [ffffffff8112e1d0] hist_show hitcount: 2 bytes_req: 16 + call_site: [ffffffff8152db82] usb_control_msg hitcount: 1 bytes_req: 8 + call_site: [ffffffffa008829e] hidraw_send_report hitcount: 1 bytes_req: 7 + call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 1 bytes_req: 7 + + Totals: + Hits: 9331 + Entries: 57 + Dropped: 0 + + We can also add multiple fields to the 'values' param. For example, + we might want to see the total number of bytes allocated alongside + bytes requested, and display the result sorted by bytes allocated in + a descending order: + + # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] + + call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 4427 bytes_req: 4190032 bytes_alloc: 5967744 + call_site: [ffffffff811d6c3b] seq_buf_alloc hitcount: 427 bytes_req: 1706768 bytes_alloc: 1765376 + call_site: [ffffffffa03d6468] i915_gem_do_execbuffer.isra.22 hitcount: 1978 bytes_req: 765936 bytes_alloc: 1434304 + call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 4427 bytes_req: 389576 bytes_alloc: 424992 + call_site: [ffffffff8136b079] sg_kmalloc hitcount: 993 bytes_req: 305824 bytes_alloc: 376576 + call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 4319 bytes_req: 241864 bytes_alloc: 276416 + call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 993 bytes_req: 198600 bytes_alloc: 254208 + call_site: [ffffffffa0419062] intel_crtc_page_flip hitcount: 1806 bytes_req: 187824 bytes_alloc: 231168 + call_site: [ffffffff81424d78] __tty_buffer_request_room hitcount: 61 bytes_req: 128928 bytes_alloc: 229376 + call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 9550 bytes_req: 191024 bytes_alloc: 210160 + call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 2449 bytes_req: 184720 bytes_alloc: 209216 + call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 3256 bytes_req: 130240 bytes_alloc: 208384 + . + . + . + call_site: [ffffffffa04e85cb] __ieee80211_start_scan hitcount: 3 bytes_req: 1533 bytes_alloc: 1536 + call_site: [ffffffffa02eb7e1] nl80211_trigger_scan hitcount: 3 bytes_req: 1299 bytes_alloc: 1536 + call_site: [ffffffff8152cbde] usb_alloc_urb hitcount: 6 bytes_req: 1152 bytes_alloc: 1152 + call_site: [ffffffff811f3b07] inotify_handle_event hitcount: 14 bytes_req: 744 bytes_alloc: 896 + call_site: [ffffffff81206b81] load_elf_binary hitcount: 24 bytes_req: 672 bytes_alloc: 768 + call_site: [ffffffff811bfb6e] vfs_rename hitcount: 38 bytes_req: 436 bytes_alloc: 640 + call_site: [ffffffff81220376] proc_self_follow_link hitcount: 24 bytes_req: 288 bytes_alloc: 384 + call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 5 bytes_req: 320 bytes_alloc: 320 + call_site: [ffffffff811d0825] alloc_fdtable hitcount: 4 bytes_req: 192 bytes_alloc: 256 + call_site: [ffffffffa04ec05f] ieee80211_start_tx_ba_session hitcount: 1 bytes_req: 232 bytes_alloc: 256 + call_site: [ffffffffa0244701] drm_vm_open_locked hitcount: 5 bytes_req: 160 bytes_alloc: 160 + call_site: [ffffffff8112e1d0] hist_show hitcount: 10 bytes_req: 80 bytes_alloc: 80 + call_site: [ffffffff8112e1ec] hist_show hitcount: 10 bytes_req: 80 bytes_alloc: 80 + call_site: [ffffffff81074c9e] kthread_create_on_node hitcount: 1 bytes_req: 56 bytes_alloc: 64 + call_site: [ffffffff8152db82] usb_control_msg hitcount: 6 bytes_req: 48 bytes_alloc: 48 + call_site: [ffffffffa008829e] hidraw_send_report hitcount: 6 bytes_req: 42 bytes_alloc: 48 + call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 6 bytes_req: 42 bytes_alloc: 48 + + Totals: + Hits: 45576 + Entries: 66 + Dropped: 0 + + Finally, to finish off our kmalloc example, instead of simply having + the hist trigger display symbolic call_sites, we can have the hist + trigger additionally display the complete set of kernel stack traces + that led to each call_sites. To do that, we simply use the special + value 'stacktrace' for the key param: + + # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \ + /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger + + The above trigger will use the kernel stack trace in effect when an + event is triggered as the key for the hash table. This allows the + enumeration of every kernel callpath that led up to a particular + event, along with a running total of any of the event fields for + that event. Here we tally bytes requested and bytes allocated for + every callpath in the system that led up to a kmalloc (in this case + every callpath to a kmalloc for a kernel compile): + + # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist + trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active] + + stacktrace: + __kmalloc_track_caller+0x10b/0x190 + kstrdup+0x35/0x70 + __kernfs_new_node+0x34/0xf0 + kernfs_new_node+0x26/0x50 + kernfs_create_dir_ns+0x29/0x80 + sysfs_create_dir_ns+0x40/0xa0 + kobject_add_internal+0xbd/0x3a0 + kobject_add+0x60/0xb0 + device_add+0xff/0x5a0 + device_register+0x1e/0x30 + usb_create_ep_devs+0x81/0xd0 + usb_new_device+0x357/0x410 + hub_event+0xd68/0x11a0 + process_one_work+0x149/0x3d0 + worker_thread+0x121/0x4a0 + kthread+0xd2/0xf0 + hitcount: 1 bytes_req: 6 bytes_alloc: 8 + stacktrace: + kmem_cache_alloc_trace+0xfb/0x160 + usb_control_msg+0x42/0x110 + hub_port_status+0x84/0x120 + hub_port_reset+0x263/0x430 + hub_port_init+0x77/0xbb0 + hub_event+0x944/0x11a0 + process_one_work+0x149/0x3d0 + worker_thread+0x121/0x4a0 + kthread+0xd2/0xf0 + ret_from_fork+0x7c/0xb0 + hitcount: 1 bytes_req: 8 bytes_alloc: 8 + . + . + . + stacktrace: + __kmalloc+0x11b/0x1a0 + load_elf_phdrs+0x76/0xa0 + load_elf_binary+0x102/0x1780 + search_binary_handler+0x97/0x1d0 + do_execveat_common.isra.32+0x551/0x6e0 + SyS_execve+0x3a/0x50 + stub_execve+0x69/0xa0 + hitcount: 131791 bytes_req: 67822944 bytes_alloc: 80279552 + stacktrace: + __kmalloc+0x11b/0x1a0 + i915_gem_execbuffer2+0x6c/0x2c0 [i915] + drm_ioctl+0x1a4/0x630 [drm] + do_vfs_ioctl+0x2f0/0x4f0 + SyS_ioctl+0x81/0xa0 + system_call_fastpath+0x12/0x17 + hitcount: 164540 bytes_req: 134464456 bytes_alloc: 167693376 + stacktrace: + kmem_cache_alloc_trace+0xfb/0x160 + apparmor_file_alloc_security+0x27/0x40 + security_file_alloc+0x16/0x20 + get_empty_filp+0x93/0x1c0 + path_openat+0x31/0x620 + do_filp_open+0x3a/0x90 + do_sys_open+0x128/0x220 + SyS_open+0x1e/0x20 + system_call_fastpath+0x12/0x17 + hitcount: 22118413 bytes_req: 44236888 bytes_alloc: 176947328 + stacktrace: + __kmalloc+0x11b/0x1a0 + seq_buf_alloc+0x1b/0x50 + seq_read+0x2cc/0x370 + proc_reg_read+0x3d/0x80 + __vfs_read+0x18/0x50 + vfs_read+0x86/0x140 + SyS_read+0x46/0xb0 + system_call_fastpath+0x12/0x17 + hitcount: 66168 bytes_req: 271015944 bytes_alloc: 271015952 + + Totals: + Hits: 26780801 + Entries: 633 + Dropped: 0 + + If you key a hist trigger on pid, for example, to gather and display + sorted totals for each process, you can use the special .execname + modifier to display the executable names for the processes in the + table rather than raw pids. The example below keeps a per-process + sum of total bytes read: + + # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \ + /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger + + # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist + trigger info: hist:keys=common_pid.execname:vals=count:sort=count:size=2048 [active] + + common_pid: Xorg [18446612132314219843] hitcount: 166 count: 159744 + common_pid: gnome-terminal [18446612132314221635] hitcount: 44 count: 113321 + common_pid: compiz [18446612132314221491] hitcount: 13 count: 81968 + common_pid: bash [18446612132314231931] hitcount: 3 count: 66369 + common_pid: gdbus [18446612132314221135] hitcount: 91 count: 1456 + common_pid: acpid [18446612132314219701] hitcount: 48 count: 1152 + common_pid: ibus-daemon [18446612132314221129] hitcount: 71 count: 1136 + common_pid: gdbus [18446612132314221638] hitcount: 42 count: 672 + common_pid: gdbus [18446612132314221345] hitcount: 42 count: 672 + common_pid: ibus-engine-sim [18446612132314221344] hitcount: 14 count: 224 + common_pid: gdbus [18446612132314221262] hitcount: 12 count: 192 + common_pid: gdbus [18446612132314221249] hitcount: 6 count: 96 + common_pid: ibus-ui-gtk3 [18446612132314221245] hitcount: 6 count: 96 + common_pid: gdbus [18446612132314221504] hitcount: 5 count: 80 + common_pid: unity-panel-ser [18446612132314221243] hitcount: 3 count: 48 + common_pid: postgres [18446612132314220339] hitcount: 1 count: 16 + common_pid: gdbus [18446612132314221395] hitcount: 1 count: 16 + common_pid: rtkit-daemon [18446612132314220509] hitcount: 1 count: 8 + common_pid: upstart-dbus-br [18446612132314221100] hitcount: 4 count: 4 + common_pid: bash [18446612132314221924] hitcount: 4 count: 4 + common_pid: firefox [18446612132314222258] hitcount: 1 count: 1 + + Totals: + Hits: 578 + Entries: 21 + Dropped: 0 + + + Similarly, if you key a hist trigger on syscall id, for example to + gather and display a list of systemwide syscall hits, you can use + the special .syscall modifier to display the syscall names rather + than raw ids. The example below keeps a running total of syscall + counts for the system during the run: + + # echo 'hist:key=id.syscall:val=hitcount' > \ + /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger + + # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist + trigger info: hist:keys=id.syscall:vals=:sort=hitcount:size=2048 [active] + + id: sys_fadvise64 hitcount: 1 + id: sys_bind hitcount: 1 + id: sys_fsync hitcount: 1 + id: sys_getsockname hitcount: 1 + id: sys_readlink hitcount: 3 + id: sys_set_tid_address hitcount: 4 + . + . + . + id: sys_mprotect hitcount: 57 + id: sys_waitid hitcount: 64 + id: sys_rt_sigaction hitcount: 121 + id: sys_inotify_add_watch hitcount: 234 + id: sys_mmap hitcount: 258 + id: sys_nanosleep hitcount: 310 + id: sys_close hitcount: 323 + id: sys_rt_sigprocmask hitcount: 3520 + id: sys_futex hitcount: 4819 + id: sys_write hitcount: 8907 + id: sys_setitimer hitcount: 39039 + id: sys_writev hitcount: 64528 + id: sys_poll hitcount: 77493 + id: sys_recvmsg hitcount: 168240 + id: sys_ioctl hitcount: 448946 + + Totals: + Hits: 869971 + Entries: 75 + Dropped: 0 + + The next example uses a string field as the hash key and + demonstrates how you can manually pause and continue a hist trigger. + In this example, we'll aggregate fork counts and don't expect a + large number of entries in the hash table, so we'll drop it to a + much smaller number, say 256: + + # echo 'hist:key=child_comm:val=hitcount:size=256' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [active] + + child_comm: gdbus hitcount: 1 + child_comm: smbd hitcount: 1 + child_comm: ibus-daemon hitcount: 1 + child_comm: Cache2 I/O hitcount: 1 + child_comm: dconf worker hitcount: 1 + child_comm: postgres hitcount: 1 + child_comm: unity-panel-ser hitcount: 2 + child_comm: compiz hitcount: 3 + child_comm: bash hitcount: 3 + child_comm: firefox hitcount: 60 + + Totals: + Hits: 74 + Entries: 10 + Dropped: 0 + + # echo 'hist:key=child_comm:val=hitcount:size=256:pause' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + If we want to pause the hist trigger, we can simply append :pause to + the command that started the trigger. Notice that the trigger info + displays as [paused]: + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [paused] + + child_comm: dconf worker hitcount: 1 + child_comm: gdbus hitcount: 1 + child_comm: Cache2 I/O hitcount: 1 + child_comm: ibus-daemon hitcount: 1 + child_comm: unity-panel-ser hitcount: 2 + child_comm: smbd hitcount: 2 + child_comm: postgres hitcount: 3 + child_comm: compiz hitcount: 3 + child_comm: bash hitcount: 3 + child_comm: emacs hitcount: 8 + child_comm: firefox hitcount: 60 + + Totals: + Hits: 85 + Entries: 11 + Dropped: 0 + + To manually continue having the trigger aggregate events, append + :cont instead. Notice that the trigger info displays as [active] + again, and the data has changed: + + # echo 'hist:key=child_comm:val=hitcount:size=256:cont' > + /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist + trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [active] + + child_comm: Cache2 I/O hitcount: 1 + child_comm: ibus-daemon hitcount: 1 + child_comm: kthreadd hitcount: 1 + child_comm: gdbus hitcount: 1 + child_comm: dconf worker hitcount: 1 + child_comm: unity-panel-ser hitcount: 2 + child_comm: smbd hitcount: 2 + child_comm: compiz hitcount: 3 + child_comm: postgres hitcount: 4 + child_comm: bash hitcount: 4 + child_comm: emacs hitcount: 8 + child_comm: firefox hitcount: 60 + + Totals: + Hits: 88 + Entries: 12 + Dropped: 0 + + The previous example showed how to start and stop a hist trigger by + appending 'pause' and 'continue' to the hist trigger command. A + hist trigger can also be started in a paused state by initially + starting the trigger with ':pause' appended. This allows you to + start the trigger only when you're ready to start collecting data + and not before. For example, start the trigger in a paused state, + then unpause it and do something you want to measure, then pause the + trigger when done. + + Of course, doing this manually can be difficult and error-prone, but + it is possible to automatically start and stop a hist trigger based + on some condition, via the enable_hist and disable_hist triggers. + + For example, suppose we wanted to take a look at the relative + weights in terms of skb length for each callpath that leads to a + netif_receieve_skb event when downloading a decent-sized file using + wget. + + First we set up an initially paused stacktrace trigger on the + netif_receive_skb event: + + # echo 'hist:key=stacktrace:vals=len:pause' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + Next, we set up an 'enable_hist' trigger on the sched_process_exec + event, with an 'if filename==/usr/bin/wget' filter. The effect of + this new trigger is that it will 'unpause' the hist trigger we just + set up on netif_receive_skb if and only if it sees a + sched_process_exec event with a filename of '/usr/bin/wget'. When + that happens, all netif_receive_skb events are aggregated into a + hash table keyed on stacktrace: + + # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + The aggregation continues until the netif_receive_skb is paused + again, which is what the following disable_hist event does by + creating a similar setup on the sched_process_exit event, using the + filter 'comm==wget': + + # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + Whenever a process exits and the comm field of the disable_hist + trigger filter matches 'comm==wget', the netif_receive_skb hist + trigger is disabled. + + The overall effect is that netif_received_skb events are aggregated + into the hash table for only the duration of the wget. Executing a + wget command and then listing the 'hist' file will display the + output generated by the wget command: + + $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + + stacktrace: + __netif_receive_skb_core+0x4ad/0x780 + __netif_receive_skb+0x18/0x60 + process_backlog+0xa8/0x150 + net_rx_action+0x15d/0x340 + __do_softirq+0x114/0x2c0 + do_softirq_own_stack+0x1c/0x30 + do_softirq+0x65/0x70 + __local_bh_enable_ip+0xb5/0xc0 + ip_finish_output+0x1f4/0x810 + ip_output+0x68/0xa0 + ip_local_out_sk+0x30/0x40 + ip_send_skb+0x1a/0x50 + udp_send_skb+0x173/0x2a0 + udp_sendmsg+0x2bf/0x9f0 + inet_sendmsg+0x63/0xb0 + do_sock_sendmsg+0x8c/0x100 + hitcount: 4 len: 388 + stacktrace: + __netif_receive_skb_core+0x4ad/0x780 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_receive+0xc8/0x100 + ieee80211_deliver_skb+0xba/0x250 [mac80211] + ieee80211_rx_handlers+0xcc5/0x2240 [mac80211] + ieee80211_prepare_and_rx_handle+0x4e7/0xc30 [mac80211] + ieee80211_rx+0x31d/0x920 [mac80211] + iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm] + iwl_rx_dispatch+0x8e/0xf0 [iwldvm] + iwl_pcie_irq_handler+0xe1c/0x12c0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + ret_from_fork+0x7c/0xb0 + hitcount: 35 len: 17415 + stacktrace: + __netif_receive_skb_core+0x4ad/0x780 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_complete+0xa4/0xe0 + dev_gro_receive+0x233/0x360 + napi_gro_receive+0x30/0x100 + ieee80211_deliver_skb+0xba/0x250 [mac80211] + ieee80211_rx_handlers+0xcc5/0x2240 [mac80211] + ieee80211_prepare_and_rx_handle+0x4e7/0xc30 [mac80211] + ieee80211_rx+0x31d/0x920 [mac80211] + iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm] + iwl_rx_dispatch+0x8e/0xf0 [iwldvm] + iwl_pcie_irq_handler+0xe1c/0x12c0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + hitcount: 155 len: 636342 + stacktrace: + __netif_receive_skb_core+0x4ad/0x780 + __netif_receive_skb+0x18/0x60 + netif_receive_skb_internal+0x23/0x90 + napi_gro_complete+0xa4/0xe0 + napi_gro_flush+0x6d/0x90 + iwl_pcie_irq_handler+0x90f/0x12c0 [iwlwifi] + irq_thread_fn+0x20/0x50 + irq_thread+0x11f/0x150 + kthread+0xd2/0xf0 + ret_from_fork+0x7c/0xb0 + hitcount: 1013 len: 5531908 + + Totals: + Hits: 1207 + Entries: 4 + Dropped: 0 + + The above shows all the netif_receive_skb callpaths and their total + lengths for the duration of the wget command. + + The 'clear' hist trigger param can be used to clear the hash table. + Suppose we wanted to try another run of the previous example but + this time also wanted to see the complete list of events that went + into the histogram. In order to avoid having to set everything up + again, we can just clear the histogram first: + + # echo 'hist:key=stacktrace:vals=len:clear' > \ + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger + + Just to verify that it is in fact cleared, here's what we now see in + the hist file: + + # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist + trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused] + + Totals: + Hits: 0 + Entries: 0 + Dropped: 0 + + Since we want to see the detailed list of every netif_receive_skb + event occurring during the new run, which are in fact same events + being aggregated into the hash table, we add some additional + 'enable_event' events the triggering sched_process_exec and + sched_process_exit events as such: + + # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + + # echo 'disable_event:net:netif_receive_skb if comm==wget' > \ + /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + + If you read the trigger files for the sched_process_exec and + sched_process_exit triggers, you should see two triggers for each: + one enabling/disabling the hist aggregation and the other + enabling/disabling the logging of events: + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger + enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget + enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget + + # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger + enable_event:net:netif_receive_skb:unlimited if comm==wget + disable_hist:net:netif_receive_skb:unlimited if comm==wget + + In other words, whever either of the sched_process_exec or + sched_process_exit events is hit and matches 'wget', it enables or + disables both the histogram and the event log, and what you end up + with is a hash table and set of events just covering the specified + duration. + + Displaying the 'hist' file should show something similar to what you + saw in the last run, but this time you should also see the + individual events in the trace file: + + # cat /sys/kernel/debug/tracing/trace + # tracer: nop + # + # entries-in-buffer/entries-written: 183/1426 #P:4 + # + # _-----=> irqs-off + # / _----=> need-resched + # | / _---=> hardirq/softirq + # || / _--=> preempt-depth + # ||| / delay + # TASK-PID CPU# |||| TIMESTAMP FUNCTION + # | | | |||| | | + wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60 + wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60 + dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130 + dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138 + ##### CPU 2 buffer started #### + irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500 + irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948 + irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500 + . + . + . -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/