Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751841AbdITOo4 (ORCPT ); Wed, 20 Sep 2017 10:44:56 -0400 Received: from mx0a-00010702.pphosted.com ([148.163.156.75]:38204 "EHLO mx0b-00010702.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751732AbdITOox (ORCPT ); Wed, 20 Sep 2017 10:44:53 -0400 Date: Wed, 20 Sep 2017 09:44:33 -0500 From: Julia Cartwright To: Tom Zanussi CC: , , , , , , , , , , , Subject: Re: [PATCH v2 37/40] tracing: Add inter-event hist trigger Documentation Message-ID: <20170920144433.GE29872@jcartwri.amer.corp.natinst.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.0 (2017-09-02) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-09-20_04:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_policy_notspam policy=outbound_policy score=30 priorityscore=1501 malwarescore=0 suspectscore=2 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=30 reason=mlx scancount=1 engine=8.0.1-1707230000 definitions=main-1709200200 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 13547 Lines: 297 On Tue, Sep 05, 2017 at 04:57:49PM -0500, Tom Zanussi wrote: > Add background and details on inter-event hist triggers, including > hist variables, synthetic events, and actions. > > Signed-off-by: Tom Zanussi > Signed-off-by: Baohong Liu > --- > Documentation/trace/events.txt | 385 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 385 insertions(+) > > diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt > index f271d87..7ee720b 100644 > --- a/Documentation/trace/events.txt > +++ b/Documentation/trace/events.txt > @@ -571,6 +571,7 @@ The following commands are supported: > .sym-offset display an address as a symbol and offset > .syscall display a syscall id as a system call name > .execname display a common_pid as a program name > + .usecs display a $common_timestamp in microseconds > > Note that in general the semantics of a given field aren't > interpreted when applying a modifier to it, but there are some > @@ -2101,3 +2102,387 @@ The following commands are supported: > Hits: 489 > Entries: 7 > Dropped: 0 > + > +6.3 Inter-event hist triggers > +----------------------------- > + > +Inter-event hist triggers are hist triggers that combine values from > +one or more other events and create a histogram using that data. Data > +from an inter-event histogram can in turn become the source for > +further combined histograms, thus providing a chain of related > +histograms, which is important for some applications. > + > +The most important example of an inter-event quantity that can be used > +in this manner is latency, which is simply a difference in timestamps > +between two events (although trace events don't have an externally > +visible timestamp field, the inter-event hist trigger support adds a > +pseudo-field to all events named '$common_timestamp' which can be used > +as if it were an actual event field) What's between the parentheses is covered below, is it worth mentioning in both places? [..] > + - Trace events don't have a 'timestamp' associated with them, but > + there is an implicit timestamp saved along with an event in the > + underlying ftrace ring buffer. This timestamp is now exposed as a > + a synthetic field named '$common_timestamp' which can be used in > + histograms as if it were any other event field. Note that it has > + a '$' prefixed to it - this is meant to indicate that it isn't an > + actual field in the trace format but rather is a synthesized value > + that nonetheless can be used as if it were an actual field. By > + default it is in units of nanoseconds; appending '.usecs' to a > + common_timestamp field changes the units to microseconds. > + > +These features are decribed in more detail in the following sections. ^ described > + > +6.3.1 Histogram Variables > +------------------------- > + > +Variables are simply named locations used for saving and retrieving > +values between matching events. A 'matching' event is defined as an > +event that has a matching key - if a variable is saved for a histogram > +entry corresponding to that key, any subsequent event with a matching > +key can access that variable. > + > +A variable's value is normally available to any subsequent event until > +it is set to something else by a subsequent event. The one exception > +to that rule is that any variable used in an expression is essentially > +'read-once' - once it's used by an expression in a subsequent event, > +it's reset to its 'unset' state, which means it can't be used again > +unless it's set again. This ensures not only that an event doesn't > +use an uninitialized variable in a calculation, but that that variable > +is used only once and not for any unrelated subsequent match. > + > +The basic syntax for saving a variable is to simply prefix a unique > +variable name not corresponding to any keyword along with an '=' sign > +to any event field. This would seem to make it much more difficult in the future to add new keywords for hist triggers without breaking users existing triggers. Maybe that's not a problem given the tracing ABI wild west. [..] > +6.3.2 Synthetic Events > +---------------------- > + > +Synthetic events are user-defined events generated from hist trigger > +variables or fields associated with one or more other events. Their > +purpose is to provide a mechanism for displaying data spanning > +multiple events consistent with the existing and already familiar > +usage for normal events. > + > +To define a synthetic event, the user writes a simple specification > +consisting of the name of the new event along with one or more > +variables and their types, which can be any valid field type, > +separated by semicolons, to the tracing/synthetic_events file. > + > +For instance, the following creates a new event named 'wakeup_latency' > +with 3 fields: lat, pid, and prio. Each of those fields is simply a > +variable reference to a variable on another event: > + > + # echo 'wakeup_latency \ > + u64 lat; \ > + pid_t pid; \ > + int prio' >> \ > + /sys/kernel/debug/tracing/synthetic_events > + > +Reading the tracing/synthetic_events file lists all the currently > +defined synthetic events, in this case the event defined above: > + > + # cat /sys/kernel/debug/tracing/synthetic_events > + wakeup_latency u64 lat; pid_t pid; int prio > + > +An existing synthetic event definition can be removed by prepending > +the command that defined it with a '!': > + > + # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \ > + /sys/kernel/debug/tracing/synthetic_events > + > +At this point, there isn't yet an actual 'wakeup_latency' event > +instantiated in the event subsytem - for this to happen, a 'hist > +trigger action' needs to be instantiated and bound to actual fields > +and variables defined on other events (see Section 6.3.3 below). > + > +Once that is done, an event instance is created, and a histogram can > +be defined using it: > + > + # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \ > + /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger > + > +The new event is created under the tracing/events/synthetic/ directory > +and looks and behaves just like any other event: > + > + # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency > + enable filter format hist id trigger > + > +Like any other event, once a histogram is enabled for the event, the > +output can be displayed by reading the event's 'hist' file. > + > +6.3.3 Hist trigger 'actions' > +---------------------------- > + > +A hist trigger 'action' is a function that's executed whenever a > +histogram entry is added or updated. > + > +The default 'action' if no special function is explicity specified is > +as it always has been, to simply update the set of values associated > +with an entry. Some applications, however, may want to perform > +additional actions at that point, such as generate another event, or > +compare and save a maximum. > + > +The following additional actions are available. To specify an action > +for a given event, simply specify the action between colons in the > +hist trigger specification. > + > + - onmatch(matching.event).(param list) > + > + The 'onmatch(matching.event).(params)' hist > + trigger action is invoked whenever an event matches and the > + histogram entry would be added or updated. I understand from the documentation that 'onmatch' establishes a relation between events, but it isn't clear the nature of this relation. In other words: what criteria are used to determine whether two events match? Is it the trace buffer ordering? Time ordering? Something else? > It causes the named > + synthetic event to be generated with the values given in the > + 'param list'. The result is the generation of a synthetic event > + that consists of the values contained in those variables at the > + time the invoking event was hit. > + > + The 'param list' consists of one or more parameters which may be > + either variables or fields defined on either the 'matching.event' > + or the target event. The variables or fields specified in the > + param list may be either fully-qualified or unqualified. If a > + variable is specified as unqualified, it must be unique between > + the two events. A field name used as a param can be unqualified > + if it refers to the target event, but must be fully qualified if > + it refers to the matching event. A fully-qualified name is of the > + form 'system.event_name.$var_name' or 'system.event_name.field'. > + > + The 'matching.event' specification is simply the fully qualified > + event name of the event that matches the target event for the > + onmatch() functionality, in the form 'system.event_name'. > + > + Finally, the number and type of variables/fields in the 'param > + list' must match the number and types of the fields in the > + synthetic event being generated. > + > + As an example the below defines a simple synthetic event and uses > + a variable defined on the sched_wakeup_new event as a parameter > + when invoking the synthetic event. Here we define the synthetic > + event: > + > + # echo 'wakeup_new_test pid_t pid' >> \ > + /sys/kernel/debug/tracing/synthetic_events > + > + # cat /sys/kernel/debug/tracing/synthetic_events > + wakeup_new_test pid_t pid > + > + The following hist trigger both defines the missing testpid > + variable and specifies an onmatch() action that generates a > + wakeup_new_test synthetic event whenever a sched_wakeup_new event > + occurs, which because of the 'if comm == "cyclictest"' filter only > + happens when the executable is cyclictest: > + > + # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\ > + wakeup_new_test($testpid) if comm=="cyclictest"' >> \ > + /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger > + > + Creating and displaying a histogram based on those events is now > + just a matter of using the fields and new synthetic event in the > + tracing/events/synthetic directory, as usual: > + > + # echo 'hist:keys=pid:sort=pid' >> \ > + /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger > + > + Running 'cyclictest' should cause wakeup_new events to generate > + wakeup_new_test synthetic events which should result in histogram > + output in the wakeup_new_test event's hist file: > + > + # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist > + > + A more typical usage would be to use two events to calculate a > + latency. The following example uses a set of hist triggers to > + produce a 'wakeup_latency' histogram: > + > + First, we define a 'wakeup_latency' synthetic event: > + > + # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \ > + /sys/kernel/debug/tracing/synthetic_events > + > + Next, we specify that whenever we see a sched_waking event for a > + cyclictest thread, save the timestamp in a 'ts0' variable: > + > + # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=$common_timestamp.usecs \ > + if comm=="cyclictest"' >> \ > + /sys/kernel/debug/tracing/events/sched/sched_waking/trigger > + > + Then, when the corresponding thread is actually scheduled onto the > + CPU by a sched_switch event, calculate the latency and use that > + along with another variable and an event field to generate a > + wakeup_latency synthetic event: > + > + # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\ > + onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\ > + $saved_pid,next_prio) if next_comm=="cyclictest"' >> \ > + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger Just a check for understanding: The onmatch() relation between events is driven by the use of the histogram variables 'ts0' and 'saved_pid', here. If that's the case, this then precludes the matching only on fields statically defined in matching event's tracepoint? Continuing on your latency examples, if I wanted to look at hrtimer latencies: # echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events # echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-expires)' > \ /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger (where 'now' and 'expires' are defined fields of the hrtimer_start and hrtimer_expire_entry tracepoints, respectively) This would _not_ work, because without the usage of a 'dummy' variable, there is no "capturing of context" in the hrtimer_start event. I would have to, instead, do: # echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events # # echo 'hist:keys=hrtimer.hex:ts0=expires' > \ /sys/kernel/debug/tracing/events/timer/hrtimer_start/trigger # echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-$ts0)' > \ /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger That is, create a dummy $ts0 to establish the relation. Is my understanding correct? If so, that's not very intuitive (to me). Thanks, Julia