2017-09-05 21:58:06

by Tom Zanussi

Subject: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

Hi,

This is V2 of the inter-event tracing patchset.

There are too many changes to list in detail, most of them directly
addressing input from V1, but here are the major changes from V1
(thanks to everyone who reviewed V1 and thanks to both Vedang Patel
and Baohong Liu for their contributions and included patches):

- cleaned up the event format
- added types to synthetic event fields
- changed mutex ordering to avoid a splat
- suppressed output of the contributing trace events into the trace
buffer, unless explicitly enabled
- changed from sched_wakeup to sched_waking in the examples
- extended timestamp updates
- used a flag to signify dynamic tracepoints, rather than api changes
- recursion level fixes
- removed the possibility of tracing_map duplicates (Vedang)
- removed duplicate-merging code (Vedang)
- fixed max buffer absolute timestamps (Baohong)
- separated variable definition and assignment (Baohong)
- made variables instance-safe
- split a couple of larger patches into smaller ones, various refactoring
- string handling fixes
- used a function pointer for the synthetic tracepoint func
- used a union for actions
- various clean-ups as mentioned in review
- Documentation updates
- special-cased recursion allowance for synthetic event generation

NOTE: The first patch in the series, 'tracing: Exclude 'generic
fields' from histograms' should go in regardless of the rest, since it
fixes a bug in existing code.

Thanks,

Tom


This patchset adds support for 'inter-event' quantities to the trace
event subsystem. The most important examples of inter-event
quantities are latencies, or the time differences between two events.

One of the main motivations for adding this capability is to provide a
general-purpose base that existing tools such as the -RT
latency_hist patchset can be built upon, while at the same time
providing a simple way for users to track latencies (or any
inter-event quantity) generically between any two events.

Previous -RT latency_hist patchsets that take advantage of the trace
event subsystem have been submitted, but they essentially hard-code
special-case tracepoints and application logic in ways that can't be
reused. It seemed to me that rather than providing a one-off patchset
devoted specifically to generating the specific histograms in the
latency_hist patchset, it should be possible to build the same
functionality on top of a generic layer allowing users to do similar
things for other non-latency_hist applications.

In addition to preliminary patches that add some basic missing
functionality such as a common ringbuffer-derived timestamp and
dynamically-creatable tracepoints, the overall patchset is divided up
into a few different areas that combine to produce the overall goal
(The Documentation patch explains all the details):

- variables and simple expressions required to calculate a latency

In order to calculate a latency or any inter-event value,
something from one event needs to be saved and later retrieved,
and some operation such as subtraction or addition is performed on
it. This means some minimal form of variables and expressions,
which the first set of patches implements. Saving and retrieving
events to use in a latency calculation is normally done using a
hash table, and that's exactly what we have with trace event hist
triggers, so that's where variables are instantiated, set, and
retrieved. Basically, variables are set on one entry and
retrieved and used by a 'matching' event.
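The save/retrieve pattern above can be sketched as a pair of hist
trigger strings. This is purely illustrative: the variable and event
key names here are made up, and the strings are only printed rather
than written to tracefs, so the sketch runs anywhere:

```shell
# Illustrative sketch of the variable save/retrieve pattern.  Names
# ('ts0', 'delta') are placeholders, not part of any real setup.

# On the first event: save $common_timestamp into variable ts0,
# keyed by pid (one slot per pid in the hist trigger's hash table).
save_trigger='hist:keys=pid:ts0=$common_timestamp.usecs'

# On the 'matching' event: retrieve $ts0 from the hash entry and
# subtract it from the current timestamp to get the inter-event delta.
calc_trigger='hist:keys=next_pid:delta=$common_timestamp.usecs-$ts0'

printf '%s\n' "$save_trigger" "$calc_trigger"
```

On a real system each string would be echoed into the corresponding
event's 'trigger' file, as the worked examples below show.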

- 'synthetic' events, combining variables from other events

The trace event interface is based on pseudo-files associated with
individual events, so it wouldn't really make sense to have
quantities derived from multiple events attached to any one of
those events. For that reason, the patchset implements a means of
combining variables from other events into a separate 'synthetic'
event, which can be treated as if it were just like any other
trace event in the system.

- 'actions' generating synthetic events, among other things

Variables and synthetic events provide the data and data structure
for new events, but something still needs to actually generate an
event using that data. 'Actions' are expanded to provide that
capability. Though it hasn't been explicitly called that
before, the current default 'action' for a hist trigger is to
update the matching histogram entry's sum values. This patchset
essentially expands that to provide a new 'onmatch.trace(event)'
action that can be used to have one event generate another. The
mechanism is extensible to other actions, and in fact the patchset
also includes another, 'onmax(var).save(field,...)' that can be
used to save context whenever a value exceeds the previous maximum
(something also needed by latency_hist).
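The shape of the two action forms just described can be sketched as
trigger strings. Again this is illustrative only: the subsystem,
event, variable, and field names are placeholders, and the strings
are printed rather than written to tracefs:

```shell
# Sketch of the two action forms; all names are placeholders.

# onmatch: when this event matches a prior event's hash entry,
# generate the named synthetic event, 'invoking' it as a function
# with variables and fields as parameters.
onmatch_trigger='hist:keys=next_pid:lat=$common_timestamp.usecs-$ts0:onmatch(subsys.event_a).my_synth_event($lat,next_pid)'

# onmax: whenever the given variable exceeds its previous maximum,
# save the listed fields alongside the new maximum.
onmax_trigger='hist:keys=next_pid:lat=$common_timestamp.usecs-$ts0:onmax($lat).save(prev_pid,prev_comm)'

printf '%s\n' "$onmatch_trigger" "$onmax_trigger"
```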

I'm submitting the patchset (based on tracing/for-next) as an RFC not
only to get comments, but because there are still some problems I
haven't fixed yet...

Here are some examples that should make things less abstract.

====
Example - wakeup latency
====

This basically implements the -RT latency_hist 'wakeup_latency'
histogram using the synthetic events, variables, and actions
described. The output below is from a run of cyclictest using the
following command:

# rt-tests/cyclictest -p 80 -n -s -t 2

What we're measuring is the latency between the time a cyclictest
thread is awakened and the time it's scheduled in. To
do that we add triggers to sched_waking and sched_switch with the
appropriate variables, and on a matching sched_switch event,
generate a synthetic 'wakeup_latency' event. Since it's just
another trace event like any other, we can also define a histogram
on that event, the output of which is what we see displayed when
reading the wakeup_latency 'hist' file.

First, we create a synthetic event called wakeup_latency, with 3
fields that will reference variables from other events:

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

Next we add a trigger to sched_waking, which saves the value of the
'$common_timestamp' when that event is hit in a variable, ts0. Note
that this happens only when 'comm==cyclictest'.

Also, '$common_timestamp' is a new field defined on every event (if
needed - if there are no users of timestamps in a trace, timestamps
won't be saved and there's no additional overhead from that).

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

Next, we add a trigger to sched_switch. When the pid being switched
to matches the pid woken up by a previous sched_waking event, this
event grabs the ts0 variable saved on that event, takes the
difference between it and the current sched_switch's
$common_timestamp, and assigns it to a new 'wakeup_lat' variable.
It then generates the wakeup_latency synthetic event defined earlier
by 'invoking' it as a function using as parameters the wakeup_lat
variable and two sched_switch event fields directly:

# echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0: \
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_prio) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Finally, all we have left to do is create a standard histogram
simply naming the fields of the wakeup_latency synthetic event:

# echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

At any time, we can see the histogram output by simply reading the
synthetic/wakeup_latency/hist file:

# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

# event histogram
#
# trigger info: hist:keys=pid,prio,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
#

{ pid: 2418, prio: 120, lat: 6 } hitcount: 21
{ pid: 2418, prio: 120, lat: 7 } hitcount: 62
{ pid: 2418, prio: 120, lat: 8 } hitcount: 44
{ pid: 2418, prio: 120, lat: 9 } hitcount: 34
{ pid: 2418, prio: 120, lat: 10 } hitcount: 16
{ pid: 2418, prio: 120, lat: 11 } hitcount: 5
{ pid: 2418, prio: 120, lat: 12 } hitcount: 2
{ pid: 2418, prio: 120, lat: 13 } hitcount: 2
{ pid: 2418, prio: 120, lat: 14 } hitcount: 1
{ pid: 2418, prio: 120, lat: 15 } hitcount: 1
{ pid: 2418, prio: 120, lat: 16 } hitcount: 1
{ pid: 2418, prio: 120, lat: 18 } hitcount: 2
{ pid: 2418, prio: 120, lat: 19 } hitcount: 1
{ pid: 2418, prio: 120, lat: 20 } hitcount: 1
{ pid: 2418, prio: 120, lat: 21 } hitcount: 1
{ pid: 2418, prio: 120, lat: 22 } hitcount: 1
{ pid: 2418, prio: 120, lat: 23 } hitcount: 1
{ pid: 2418, prio: 120, lat: 56 } hitcount: 1
{ pid: 2418, prio: 120, lat: 60 } hitcount: 1
{ pid: 2418, prio: 120, lat: 123 } hitcount: 1
{ pid: 2419, prio: 19, lat: 4 } hitcount: 12
{ pid: 2419, prio: 19, lat: 5 } hitcount: 230
{ pid: 2419, prio: 19, lat: 6 } hitcount: 343
{ pid: 2419, prio: 19, lat: 7 } hitcount: 485
{ pid: 2419, prio: 19, lat: 8 } hitcount: 574
{ pid: 2419, prio: 19, lat: 9 } hitcount: 164
{ pid: 2419, prio: 19, lat: 10 } hitcount: 51
{ pid: 2419, prio: 19, lat: 11 } hitcount: 36
{ pid: 2419, prio: 19, lat: 12 } hitcount: 23
{ pid: 2419, prio: 19, lat: 13 } hitcount: 16
{ pid: 2419, prio: 19, lat: 14 } hitcount: 13
{ pid: 2419, prio: 19, lat: 15 } hitcount: 5
{ pid: 2419, prio: 19, lat: 16 } hitcount: 6
{ pid: 2419, prio: 19, lat: 17 } hitcount: 5
{ pid: 2419, prio: 19, lat: 18 } hitcount: 1
{ pid: 2419, prio: 19, lat: 19 } hitcount: 2
{ pid: 2419, prio: 19, lat: 26 } hitcount: 1
{ pid: 2419, prio: 19, lat: 29 } hitcount: 1
{ pid: 2419, prio: 19, lat: 37 } hitcount: 1
{ pid: 2419, prio: 19, lat: 38 } hitcount: 1
{ pid: 2420, prio: 19, lat: 4 } hitcount: 1
{ pid: 2420, prio: 19, lat: 5 } hitcount: 45
{ pid: 2420, prio: 19, lat: 6 } hitcount: 96
{ pid: 2420, prio: 19, lat: 7 } hitcount: 227
{ pid: 2420, prio: 19, lat: 8 } hitcount: 558
{ pid: 2420, prio: 19, lat: 9 } hitcount: 236
{ pid: 2420, prio: 19, lat: 10 } hitcount: 67
{ pid: 2420, prio: 19, lat: 11 } hitcount: 27
{ pid: 2420, prio: 19, lat: 12 } hitcount: 17
{ pid: 2420, prio: 19, lat: 13 } hitcount: 12
{ pid: 2420, prio: 19, lat: 14 } hitcount: 11
{ pid: 2420, prio: 19, lat: 15 } hitcount: 8
{ pid: 2420, prio: 19, lat: 16 } hitcount: 6
{ pid: 2420, prio: 19, lat: 17 } hitcount: 3
{ pid: 2420, prio: 19, lat: 18 } hitcount: 1
{ pid: 2420, prio: 19, lat: 23 } hitcount: 1
{ pid: 2420, prio: 19, lat: 25 } hitcount: 1
{ pid: 2420, prio: 19, lat: 26 } hitcount: 1
{ pid: 2420, prio: 19, lat: 434 } hitcount: 1

Totals:
Hits: 3488
Entries: 59
Dropped: 0

The above output uses the .usecs modifier to common_timestamp, so
the latencies are reported in microseconds. The default, without
the modifier, is nanoseconds, but that's too fine-grained to put
directly into a histogram; in that case we can instead use the
.log2 modifier on the 'lat' key. Otherwise the rest is the same:

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

# echo 'hist:keys=pid:ts0=$common_timestamp if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

# echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp-$ts0: \
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_prio) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

# event histogram
#
# trigger info: hist:keys=pid,prio,lat.log2:vals=hitcount:sort=pid,lat.log2:size=2048 [active]
#

{ pid: 2457, prio: 120, lat: ~ 2^13 } hitcount: 99
{ pid: 2457, prio: 120, lat: ~ 2^14 } hitcount: 91
{ pid: 2457, prio: 120, lat: ~ 2^15 } hitcount: 8
{ pid: 2458, prio: 19, lat: ~ 2^13 } hitcount: 1437
{ pid: 2458, prio: 19, lat: ~ 2^14 } hitcount: 519
{ pid: 2458, prio: 19, lat: ~ 2^15 } hitcount: 11
{ pid: 2458, prio: 19, lat: ~ 2^16 } hitcount: 2
{ pid: 2458, prio: 19, lat: ~ 2^18 } hitcount: 1
{ pid: 2459, prio: 19, lat: ~ 2^13 } hitcount: 874
{ pid: 2459, prio: 19, lat: ~ 2^14 } hitcount: 442
{ pid: 2459, prio: 19, lat: ~ 2^15 } hitcount: 4

Totals:
Hits: 3488
Entries: 11
Dropped: 0
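To make the 'lat: ~ 2^13' buckets above concrete, here is a small
sketch of the bucketing, assuming .log2 places a value v into the
bucket 2^floor(log2(v)) as the output suggests (the helper function
is illustrative, not part of the tracing code):

```shell
# Compute floor(log2(v)) by counting right shifts; illustrative only.
log2_bucket() {
    v=$1 p=0
    while [ "$((v >> 1))" -gt 0 ]; do
        v=$((v >> 1))
        p=$((p + 1))
    done
    echo "$p"
}

# A ~12us wakeup latency is 12000ns, which lands in the 2^13
# (8192..16383 ns) bucket:
log2_bucket 12000
```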

====
Example - wakeup latency with onmax()
====

This example is the same as the previous ones, but here we're
additionally using the onmax() action to save some context (several
fields of the sched_switch event) whenever the latency (wakeup_lat)
exceeds the previous maximum.

As with the similar functionality of the -RT latency_hist
histograms, it's useful to be able to capture information about the
previous process, which potentially could have contributed to the
maximum latency that was saved.

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

Here we add an onmax() action that saves some important fields of
the sched_switch event along with the maximum, in addition to
sending some of the same data to the synthetic event:

# echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0: \
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm): \
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_prio) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

To see the maximums and associated data for each pid, cat the
sched_switch event, as that's the event the onmax() action is
associated with:


# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

# event histogram
#
# trigger info: hist:keys=next_pid:vals=hitcount:wakeup_lat=$common_timestamp.usecs-$ts0:sort=hitcount:size=2048:clock=global:onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm):onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_prio) if next_comm=="cyclictest" [active]
#

{ next_pid: 2803 } hitcount: 198
max: 55 next_comm: cyclictest prev_pid: 0
prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2805 } hitcount: 1319
max: 53 next_comm: cyclictest prev_pid: 0
prev_prio: 120 prev_comm: swapper/1

{ next_pid: 2804 } hitcount: 1970
max: 79 next_comm: cyclictest prev_pid: 0
prev_prio: 120 prev_comm: swapper/0

Totals:
Hits: 3487
Entries: 3
Dropped: 0

And, verifying, we can see that the max latencies captured above
match the highest latencies for each thread in the wakeup_latency
histogram:

# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist
# event histogram
#
# trigger info: hist:keys=pid,prio,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
#

{ pid: 2803, prio: 120, lat: 6 } hitcount: 12
{ pid: 2803, prio: 120, lat: 7 } hitcount: 42
...
{ pid: 2803, prio: 120, lat: 22 } hitcount: 2
{ pid: 2803, prio: 120, lat: 55 } hitcount: 1
{ pid: 2804, prio: 19, lat: 4 } hitcount: 5
{ pid: 2804, prio: 19, lat: 5 } hitcount: 188
...
{ pid: 2804, prio: 19, lat: 30 } hitcount: 1
{ pid: 2804, prio: 19, lat: 79 } hitcount: 1
{ pid: 2805, prio: 19, lat: 5 } hitcount: 19
{ pid: 2805, prio: 19, lat: 6 } hitcount: 73
...
{ pid: 2805, prio: 19, lat: 44 } hitcount: 1
{ pid: 2805, prio: 19, lat: 53 } hitcount: 1

Totals:
Hits: 3487
Entries: 57
Dropped: 0

====
Example - combined wakeup and switchtime (wakeupswitch) latency
====

Finally, this example is quite a bit more involved, but that's
because it implements 3 latencies, one of which is a combination of
the other two. This, too, is something the -RT latency_hist
patchset does and which this patchset adds generic support for.

The latency_hist patchset creates a few individual latency
histograms but also combines them into larger overall combined
histograms. For example, the time between when a thread is awakened
and when it actually continues executing in userspace is covered by
one histogram, but it's also broken down into two sub-histograms:
one covering the time between sched_waking and the time the thread
is scheduled in (wakeup_latency as above), and another covering the
time between when the thread is scheduled in and the time it
actually begins executing again (return from sys_nanosleep), the
separate switchtime_latency histogram.

The below combines the wakeup_latency histogram from before, adds a
new switchtime_latency histogram, and another, wakeupswitch_latency,
that's a combination of the other two.

There isn't anything really new here, other than the use of the
addition operator to add two latencies to produce the
wakeupswitch_latency.

First, we create the familiar wakeup_latency histogram:

# echo 'wakeup_latency u64 lat; pid_t pid' >> \
/sys/kernel/debug/tracing/synthetic_events

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

# echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Here we save the wakeup_latency lat value as wakeup_lat for use
later in the combined latency:

# echo 'hist:keys=pid,lat:wakeup_lat=lat:sort=pid,lat' \
>> /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

Next, we create the switchtime_latency histogram:

# echo 'switchtime_latency u64 lat; pid_t pid' >> \
/sys/kernel/debug/tracing/synthetic_events

Here we save the sched_switch next_pid field as 'pid'. This is so
we can access the next_pid in the matching sys_exit_nanosleep event.

# echo 'hist:keys=next_pid:pid=next_pid:ts0=$common_timestamp.usecs \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# echo 'hist:keys=common_pid:switchtime_lat=$common_timestamp.usecs-$ts0: \
onmatch(sched.sched_switch).switchtime_latency($switchtime_lat,$pid)' \
>> /sys/kernel/debug/tracing/events/syscalls/sys_exit_nanosleep/trigger

# echo 'hist:keys=pid,lat:sort=pid,lat' \
>> /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/trigger

Finally, we create the combined wakeupswitch_latency:

# echo 'wakeupswitch_latency u64 lat; pid_t pid' >> \
/sys/kernel/debug/tracing/synthetic_events

Here we calculate the combined latency using the saved $wakeup_lat
variable from the wakeup_latency histogram and the lat value of the
switchtime_latency, save it as ws_lat and then use it to generate
the combined wakeupswitch latency:

# echo 'hist:keys=pid,lat:sort=pid,lat:ws_lat=$wakeup_lat+lat: \
onmatch(synthetic.wakeup_latency).wakeupswitch_latency($ws_lat,pid)' \
>> /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/trigger

# echo 'hist:keys=pid,lat:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeupswitch_latency/trigger


After running our cyclictest workload, we can now look at each
histogram, starting with wakeup_latency:

# cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist

# event histogram
#
# trigger info: hist:keys=pid,lat:vals=hitcount:wakeup_lat=lat:sort=pid,lat:size=2048 [active]
#

{ pid: 4146, lat: 6 } hitcount: 8
{ pid: 4146, lat: 7 } hitcount: 50
{ pid: 4146, lat: 8 } hitcount: 53
{ pid: 4146, lat: 9 } hitcount: 34
{ pid: 4146, lat: 10 } hitcount: 22
{ pid: 4146, lat: 11 } hitcount: 6
{ pid: 4146, lat: 12 } hitcount: 4
{ pid: 4146, lat: 13 } hitcount: 1
{ pid: 4146, lat: 14 } hitcount: 4
{ pid: 4146, lat: 15 } hitcount: 2
{ pid: 4146, lat: 16 } hitcount: 2
{ pid: 4146, lat: 17 } hitcount: 3
{ pid: 4146, lat: 19 } hitcount: 1
{ pid: 4146, lat: 20 } hitcount: 2
{ pid: 4146, lat: 21 } hitcount: 4
{ pid: 4146, lat: 24 } hitcount: 1
{ pid: 4146, lat: 53 } hitcount: 1
{ pid: 4147, lat: 4 } hitcount: 1
{ pid: 4147, lat: 5 } hitcount: 156
{ pid: 4147, lat: 6 } hitcount: 344
{ pid: 4147, lat: 7 } hitcount: 560
{ pid: 4147, lat: 8 } hitcount: 540
{ pid: 4147, lat: 9 } hitcount: 195
{ pid: 4147, lat: 10 } hitcount: 50
{ pid: 4147, lat: 11 } hitcount: 38
{ pid: 4147, lat: 12 } hitcount: 26
{ pid: 4147, lat: 13 } hitcount: 13
{ pid: 4147, lat: 14 } hitcount: 12
{ pid: 4147, lat: 15 } hitcount: 10
{ pid: 4147, lat: 16 } hitcount: 3
{ pid: 4147, lat: 17 } hitcount: 2
{ pid: 4147, lat: 18 } hitcount: 4
{ pid: 4147, lat: 19 } hitcount: 2
{ pid: 4147, lat: 20 } hitcount: 1
{ pid: 4147, lat: 21 } hitcount: 1
{ pid: 4147, lat: 26 } hitcount: 1
{ pid: 4147, lat: 35 } hitcount: 1
{ pid: 4147, lat: 59 } hitcount: 1
{ pid: 4148, lat: 5 } hitcount: 38
{ pid: 4148, lat: 6 } hitcount: 229
{ pid: 4148, lat: 7 } hitcount: 219
{ pid: 4148, lat: 8 } hitcount: 486
{ pid: 4148, lat: 9 } hitcount: 181
{ pid: 4148, lat: 10 } hitcount: 59
{ pid: 4148, lat: 11 } hitcount: 27
{ pid: 4148, lat: 12 } hitcount: 23
{ pid: 4148, lat: 13 } hitcount: 16
{ pid: 4148, lat: 14 } hitcount: 7
{ pid: 4148, lat: 15 } hitcount: 7
{ pid: 4148, lat: 16 } hitcount: 6
{ pid: 4148, lat: 17 } hitcount: 2
{ pid: 4148, lat: 18 } hitcount: 2
{ pid: 4148, lat: 19 } hitcount: 4
{ pid: 4148, lat: 20 } hitcount: 3
{ pid: 4148, lat: 25 } hitcount: 1
{ pid: 4148, lat: 26 } hitcount: 1
{ pid: 4148, lat: 27 } hitcount: 1
{ pid: 4148, lat: 29 } hitcount: 2
{ pid: 4148, lat: 47 } hitcount: 1

Totals:
Hits: 3474
Entries: 59
Dropped: 0

Here's the switchtime histogram:

# cat /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/hist

# event histogram
#
# trigger info: hist:keys=pid,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
#

{ pid: 4146, lat: 3 } hitcount: 11
{ pid: 4146, lat: 4 } hitcount: 32
{ pid: 4146, lat: 5 } hitcount: 65
{ pid: 4146, lat: 6 } hitcount: 37
{ pid: 4146, lat: 7 } hitcount: 35
{ pid: 4146, lat: 8 } hitcount: 5
{ pid: 4146, lat: 10 } hitcount: 1
{ pid: 4146, lat: 11 } hitcount: 1
{ pid: 4146, lat: 12 } hitcount: 1
{ pid: 4146, lat: 13 } hitcount: 1
{ pid: 4146, lat: 14 } hitcount: 1
{ pid: 4146, lat: 15 } hitcount: 1
{ pid: 4146, lat: 16 } hitcount: 2
{ pid: 4146, lat: 17 } hitcount: 1
{ pid: 4146, lat: 18 } hitcount: 1
{ pid: 4146, lat: 20 } hitcount: 1
{ pid: 4146, lat: 22 } hitcount: 1
{ pid: 4146, lat: 55 } hitcount: 1
{ pid: 4147, lat: 3 } hitcount: 554
{ pid: 4147, lat: 4 } hitcount: 999
{ pid: 4147, lat: 5 } hitcount: 193
{ pid: 4147, lat: 6 } hitcount: 102
{ pid: 4147, lat: 7 } hitcount: 38
{ pid: 4147, lat: 8 } hitcount: 21
{ pid: 4147, lat: 9 } hitcount: 8
{ pid: 4147, lat: 10 } hitcount: 10
{ pid: 4147, lat: 11 } hitcount: 11
{ pid: 4147, lat: 12 } hitcount: 7
{ pid: 4147, lat: 13 } hitcount: 2
{ pid: 4147, lat: 14 } hitcount: 2
{ pid: 4147, lat: 15 } hitcount: 5
{ pid: 4147, lat: 16 } hitcount: 2
{ pid: 4147, lat: 17 } hitcount: 3
{ pid: 4147, lat: 18 } hitcount: 2
{ pid: 4147, lat: 23 } hitcount: 2
{ pid: 4148, lat: 3 } hitcount: 245
{ pid: 4148, lat: 4 } hitcount: 761
{ pid: 4148, lat: 5 } hitcount: 152
{ pid: 4148, lat: 6 } hitcount: 64
{ pid: 4148, lat: 7 } hitcount: 25
{ pid: 4148, lat: 8 } hitcount: 7
{ pid: 4148, lat: 9 } hitcount: 14
{ pid: 4148, lat: 10 } hitcount: 11
{ pid: 4148, lat: 11 } hitcount: 12
{ pid: 4148, lat: 12 } hitcount: 6
{ pid: 4148, lat: 13 } hitcount: 2
{ pid: 4148, lat: 14 } hitcount: 7
{ pid: 4148, lat: 15 } hitcount: 2
{ pid: 4148, lat: 17 } hitcount: 2
{ pid: 4148, lat: 18 } hitcount: 1
{ pid: 4148, lat: 19 } hitcount: 1
{ pid: 4148, lat: 24 } hitcount: 1
{ pid: 4148, lat: 25 } hitcount: 1
{ pid: 4148, lat: 42 } hitcount: 1

Totals:
Hits: 3474
Entries: 54
Dropped: 0

And here's the combined wakeupswitch latency histogram:

# cat /sys/kernel/debug/tracing/events/synthetic/wakeupswitch_latency/hist

# event histogram
#
# trigger info: hist:keys=pid,lat:vals=hitcount:sort=pid,lat:size=2048 [active]
#

{ pid: 4146, lat: 10 } hitcount: 16
{ pid: 4146, lat: 11 } hitcount: 18
{ pid: 4146, lat: 12 } hitcount: 25
{ pid: 4146, lat: 13 } hitcount: 38
{ pid: 4146, lat: 14 } hitcount: 25
{ pid: 4146, lat: 15 } hitcount: 14
{ pid: 4146, lat: 16 } hitcount: 17
{ pid: 4146, lat: 17 } hitcount: 14
{ pid: 4146, lat: 18 } hitcount: 7
{ pid: 4146, lat: 19 } hitcount: 1
{ pid: 4146, lat: 20 } hitcount: 3
{ pid: 4146, lat: 21 } hitcount: 2
{ pid: 4146, lat: 22 } hitcount: 1
{ pid: 4146, lat: 23 } hitcount: 1
{ pid: 4146, lat: 24 } hitcount: 1
{ pid: 4146, lat: 25 } hitcount: 2
{ pid: 4146, lat: 26 } hitcount: 2
{ pid: 4146, lat: 29 } hitcount: 1
{ pid: 4146, lat: 30 } hitcount: 1
{ pid: 4146, lat: 32 } hitcount: 1
{ pid: 4146, lat: 33 } hitcount: 2
{ pid: 4146, lat: 36 } hitcount: 1
{ pid: 4146, lat: 37 } hitcount: 1
{ pid: 4146, lat: 38 } hitcount: 1
{ pid: 4146, lat: 39 } hitcount: 1
{ pid: 4146, lat: 73 } hitcount: 1
{ pid: 4146, lat: 76 } hitcount: 1
{ pid: 4147, lat: 8 } hitcount: 54
{ pid: 4147, lat: 9 } hitcount: 205
{ pid: 4147, lat: 10 } hitcount: 391
{ pid: 4147, lat: 11 } hitcount: 544
{ pid: 4147, lat: 12 } hitcount: 342
{ pid: 4147, lat: 13 } hitcount: 141
{ pid: 4147, lat: 14 } hitcount: 68
{ pid: 4147, lat: 15 } hitcount: 46
{ pid: 4147, lat: 16 } hitcount: 42
{ pid: 4147, lat: 17 } hitcount: 23
{ pid: 4147, lat: 18 } hitcount: 23
{ pid: 4147, lat: 19 } hitcount: 17
{ pid: 4147, lat: 20 } hitcount: 8
{ pid: 4147, lat: 21 } hitcount: 7
{ pid: 4147, lat: 22 } hitcount: 7
{ pid: 4147, lat: 23 } hitcount: 8
{ pid: 4147, lat: 24 } hitcount: 4
{ pid: 4147, lat: 25 } hitcount: 6
{ pid: 4147, lat: 26 } hitcount: 4
{ pid: 4147, lat: 27 } hitcount: 1
{ pid: 4147, lat: 28 } hitcount: 4
{ pid: 4147, lat: 29 } hitcount: 2
{ pid: 4147, lat: 30 } hitcount: 3
{ pid: 4147, lat: 31 } hitcount: 3
{ pid: 4147, lat: 32 } hitcount: 1
{ pid: 4147, lat: 34 } hitcount: 1
{ pid: 4147, lat: 35 } hitcount: 3
{ pid: 4147, lat: 36 } hitcount: 1
{ pid: 4147, lat: 50 } hitcount: 1
{ pid: 4147, lat: 71 } hitcount: 1
{ pid: 4148, lat: 8 } hitcount: 2
{ pid: 4148, lat: 9 } hitcount: 100
{ pid: 4148, lat: 10 } hitcount: 156
{ pid: 4148, lat: 11 } hitcount: 340
{ pid: 4148, lat: 12 } hitcount: 341
{ pid: 4148, lat: 13 } hitcount: 139
{ pid: 4148, lat: 14 } hitcount: 55
{ pid: 4148, lat: 15 } hitcount: 42
{ pid: 4148, lat: 16 } hitcount: 24
{ pid: 4148, lat: 17 } hitcount: 17
{ pid: 4148, lat: 18 } hitcount: 18
{ pid: 4148, lat: 19 } hitcount: 17
{ pid: 4148, lat: 20 } hitcount: 7
{ pid: 4148, lat: 21 } hitcount: 4
{ pid: 4148, lat: 22 } hitcount: 7
{ pid: 4148, lat: 23 } hitcount: 6
{ pid: 4148, lat: 24 } hitcount: 3
{ pid: 4148, lat: 25 } hitcount: 7
{ pid: 4148, lat: 26 } hitcount: 5
{ pid: 4148, lat: 27 } hitcount: 3
{ pid: 4148, lat: 28 } hitcount: 3
{ pid: 4148, lat: 29 } hitcount: 1
{ pid: 4148, lat: 30 } hitcount: 4
{ pid: 4148, lat: 31 } hitcount: 1
{ pid: 4148, lat: 32 } hitcount: 2
{ pid: 4148, lat: 33 } hitcount: 3
{ pid: 4148, lat: 34 } hitcount: 1
{ pid: 4148, lat: 36 } hitcount: 1
{ pid: 4148, lat: 37 } hitcount: 2
{ pid: 4148, lat: 38 } hitcount: 1
{ pid: 4148, lat: 41 } hitcount: 1
{ pid: 4148, lat: 53 } hitcount: 1
{ pid: 4148, lat: 71 } hitcount: 1

Totals:
Hits: 3474
Entries: 90
Dropped: 0

Finally, just to show that synthetic events are indeed just like any
other event as far as the event subsystem is concerned, we can
enable the synthetic events and see the events appear in the trace
buffer:

# echo 1 > /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/enable
# echo 1 > /sys/kernel/debug/tracing/events/synthetic/switchtime_latency/enable
# echo 1 > /sys/kernel/debug/tracing/events/synthetic/wakeupswitch_latency/enable

Below is a snippet of the contents of the trace file produced when
the above histograms were generated:

# cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
# entries-in-buffer/entries-written: 10503/10503 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [001] d..2 23532.240146: wakeup_latency: lat=4, pid=6853
cyclictest-6853 [001] .... 23532.240153: switchtime_latency: lat=7, pid=6853
cyclictest-6853 [001] .... 23532.240157: wakeupswitch_latency: lat=11, pid=6853
gnome-terminal--2500 [001] d..2 23532.240672: wakeup_latency: lat=5, pid=6854
cyclictest-6854 [001] .... 23532.240676: switchtime_latency: lat=4, pid=6854
cyclictest-6854 [001] .... 23532.240677: wakeupswitch_latency: lat=9, pid=6854
gnome-terminal--2500 [001] d..2 23532.241169: wakeup_latency: lat=4, pid=6853
cyclictest-6853 [001] .... 23532.241172: switchtime_latency: lat=3, pid=6853
cyclictest-6853 [001] .... 23532.241174: wakeupswitch_latency: lat=7, pid=6853
<idle>-0 [001] d..2 23532.242189: wakeup_latency: lat=6, pid=6853
cyclictest-6853 [001] .... 23532.242195: switchtime_latency: lat=8, pid=6853
<idle>-0 [000] d..2 23532.242196: wakeup_latency: lat=12, pid=6854
cyclictest-6853 [001] .... 23532.240146: wakeupswitch_latency: lat=14, pid=6853
cyclictest-6854 [000] .... 23532.242196: switchtime_latency: lat=4, pid=6854
<idle>-0 [001] d..2 23532.240146: wakeup_latency: lat=2, pid=6853
cyclictest-6854 [000] .... 23532.242196: wakeupswitch_latency: lat=16, pid=6854
cyclictest-6853 [001] .... 23532.240146: switchtime_latency: lat=3, pid=6853
...

One quick note about usage - the introduction of variables and
actions obviously makes it harder to determine the cause of a hist
trigger command failure - 'Invalid argument' is no longer sufficient
in many cases.

For that reason, a new 'extended error' mechanism has been added to
hist triggers, initially focused on variable and action-related
errors, but something that could possibly be expanded to other error
conditions later.

To make use of it, simply read the 'hist' file of the event that was
the target of the command.

In this example, we've entered the same command twice, resulting in
an attempt to define the same variable (ts0) twice. After seeing
the 'Invalid argument' error for the command, we read the same
event's hist file and see a message to that effect at the bottom of
the file:

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_waking/trigger

-su: echo: write error: Invalid argument

# cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist
# event histogram
#
#

Totals:
Hits: 0
Entries: 0
Dropped: 0
Duplicates: 0

ERROR: Variable already defined: ts0
Last command: keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"


The following changes since commit edb096e00724f02db5f6ec7900f3bbd465c6c76f:

ftrace: Fix memleak when unregistering dynamic ops when tracing disabled (2017-09-01 13:55:49 -0400)

are available in the git repository at:

https://github.com/tzanussi/linux-trace-inter-event.git tzanussi/inter-event-v2
https://github.com/tzanussi/linux-trace-inter-event/tree/tzanussi/inter-event-v2

Baohong Liu (1):
tracing: Apply absolute timestamps to instance max buffer

Tom Zanussi (37):
tracing: Exclude 'generic fields' from histograms
tracing: Add hist_field_name() accessor
tracing: Reimplement log2
ring-buffer: Add interface for setting absolute time stamps
ring-buffer: Redefine the unimplemented RINGBUF_TIME_TIME_STAMP
tracing: Give event triggers access to ring_buffer_event
tracing: Add ring buffer event param to hist field functions
tracing: Increase tracing map KEYS_MAX size
tracing: Break out hist trigger assignment parsing
tracing: Make traceprobe parsing code reusable
tracing: Add hist trigger timestamp support
tracing: Add per-element variable support to tracing_map
tracing: Add hist_data member to hist_field
tracing: Add usecs modifier for hist trigger timestamps
tracing: Add variable support to hist triggers
tracing: Account for variables in named trigger compatibility
tracing: Add simple expression support to hist triggers
tracing: Generalize per-element hist trigger data
tracing: Pass tracing_map_elt to hist_field accessor functions
tracing: Add hist_field 'type' field
tracing: Add variable reference handling to hist triggers
tracing: Add support for dynamic tracepoints
tracing: Add hist trigger action hook
tracing: Add support for 'synthetic' events
tracing: Add support for 'field variables'
tracing: Add 'onmatch' hist trigger action support
tracing: Add 'onmax' hist trigger action support
tracing: Allow whitespace to surround hist trigger filter
tracing: Add cpu field for hist triggers
tracing: Add hist trigger support for variable reference aliases
tracing: Add 'last error' error facility for hist triggers
tracing: Reverse the order event_mutex/trace_types_lock are taken
tracing: Remove lookups from tracing_map hitcount
tracing: Add inter-event hist trigger Documentation
tracing: Make tracing_set_clock() non-static
tracing: Add a clock attribute for hist triggers
tracing: Add trace_event_buffer_reserve() variant that allows recursion

Vedang Patel (2):
tracing: Add support to detect and avoid duplicates
tracing: Remove code which merges duplicates

Documentation/trace/events.txt | 431 ++++
include/linux/ring_buffer.h | 17 +-
include/linux/trace_events.h | 24 +-
include/linux/tracepoint-defs.h | 1 +
kernel/trace/ring_buffer.c | 126 +-
kernel/trace/trace.c | 205 +-
kernel/trace/trace.h | 25 +-
kernel/trace/trace_events.c | 51 +-
kernel/trace/trace_events_hist.c | 4472 +++++++++++++++++++++++++++++++----
kernel/trace/trace_events_trigger.c | 53 +-
kernel/trace/trace_kprobe.c | 18 +-
kernel/trace/trace_probe.c | 86 -
kernel/trace/trace_probe.h | 7 -
kernel/trace/trace_uprobe.c | 2 +-
kernel/trace/tracing_map.c | 229 +-
kernel/trace/tracing_map.h | 20 +-
kernel/tracepoint.c | 18 +-
17 files changed, 5073 insertions(+), 712 deletions(-)

--
1.9.3


2017-09-05 21:58:12

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 02/40] tracing: Add support to detect and avoid duplicates

From: Vedang Patel <[email protected]>

A duplicate in the tracing_map hash table occurs when 2 different entries
have the same key and, as a result, the same key_hash. This is possible
due to a race condition inherent in the insertion algorithm, not a bug.
Until now this was fine, because we were only interested in the sum of
all the values related to a particular key (the duplicates are dealt
with in tracing_map_sort_entries()). But with the inclusion of
variables[1], we are interested in individual values, and it is not
clear which value to choose when there are duplicates. So the
duplicates need to be removed.

The duplicates can occur in the code in the following scenarios:

- A thread is in the process of adding a new element. It has
successfully executed cmpxchg() and inserted the key, but it has not
yet finished acquiring the tracing_map_elt struct, populating it and
storing the pointer to the struct in the value field of the tracing_map
hash table. If another thread comes in at this time and wants to add an
element with the same key, it will not see the element in progress and
will add a new one.

- Multiple threads try to execute cmpxchg() at the same time; one of
them succeeds and the others fail. Without this patch, the ones which
fail would go ahead and increment 'idx' and add a new element there,
creating a duplicate.

This patch detects and avoids the first condition by having the thread
which detects the duplicate loop and retry. To guard against an
infinite loop (possible if the thread which is mid-insert goes to sleep
indefinitely while another thread keeps detecting the duplicate), the
retrying thread gives up and returns NULL after map_size iterations.

The second scenario is avoided by preventing the threads which failed
cmpxchg() from incrementing idx. Instead, they loop around and check
whether the thread which succeeded in executing cmpxchg() inserted the
same key.

[1] - https://lkml.org/lkml/2017/6/26/751

Signed-off-by: Vedang Patel <[email protected]>
---
kernel/trace/tracing_map.c | 37 +++++++++++++++++++++++++++++++++----
1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 305039b..437b490 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -414,6 +414,7 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
__tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
{
u32 idx, key_hash, test_key;
+ int dup_try = 0;
struct tracing_map_entry *entry;

key_hash = jhash(key, map->key_size, 0);
@@ -426,10 +427,31 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
entry = TRACING_MAP_ENTRY(map->map, idx);
test_key = entry->key;

- if (test_key && test_key == key_hash && entry->val &&
- keys_match(key, entry->val->key, map->key_size)) {
- atomic64_inc(&map->hits);
- return entry->val;
+ if (test_key && test_key == key_hash) {
+ if (entry->val &&
+ keys_match(key, entry->val->key, map->key_size)) {
+ atomic64_inc(&map->hits);
+ return entry->val;
+ } else if (unlikely(!entry->val)) {
+ /*
+ * The key is present, but val (the pointer to the
+ * elt struct) is still NULL, which means some other
+ * thread is in the process of inserting an
+ * element.
+ *
+ * On top of that, its key_hash is the same as the
+ * one being inserted right now, so it's possible
+ * that the element has the same key as well.
+ */
+
+ dup_try++;
+ if (dup_try > map->map_size) {
+ atomic64_inc(&map->drops);
+ break;
+ }
+ continue;
+ }
}

if (!test_key) {
@@ -451,6 +473,13 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
atomic64_inc(&map->hits);

return entry->val;
+ } else {
+ /*
+ * cmpxchg() failed. Loop around once
+ * more to check what key was inserted.
+ */
+ dup_try++;
+ continue;
}
}

--
1.9.3

2017-09-05 21:58:22

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 07/40] tracing: Apply absolute timestamps to instance max buffer

From: Baohong Liu <[email protected]>

Currently, absolute timestamps are applied to both the regular and max
buffers only for the global trace. For an instance trace, absolute
timestamps are applied only to the regular buffer. But the regular and
max buffers can be swapped, for example following a snapshot, so an
instance trace can show bad timestamps after a snapshot. Apply
absolute timestamps to the instance max buffer as well.

Similarly, buffer clock change is applied to instance max buffer
as well.

Signed-off-by: Baohong Liu <[email protected]>
---
kernel/trace/trace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 66d465e..719e4c1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6223,7 +6223,7 @@ static int tracing_set_clock(struct trace_array *tr, const char *clockstr)
tracing_reset_online_cpus(&tr->trace_buffer);

#ifdef CONFIG_TRACER_MAX_TRACE
- if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
+ if (tr->max_buffer.buffer)
ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func);
tracing_reset_online_cpus(&tr->max_buffer);
#endif
@@ -6307,7 +6307,7 @@ int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs)
tracing_reset_online_cpus(&tr->trace_buffer);

#ifdef CONFIG_TRACER_MAX_TRACE
- if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
+ if (tr->max_buffer.buffer)
ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
tracing_reset_online_cpus(&tr->max_buffer);
#endif
--
1.9.3

2017-09-05 21:58:18

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 04/40] tracing: Add hist_field_name() accessor

In preparation for hist_fields that won't be strictly based on
trace_event_fields, add a new hist_field_name() accessor to allow that
flexibility and update associated users.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 67 +++++++++++++++++++++++++++-------------
1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 773a66e..0184acd 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -146,6 +146,23 @@ struct hist_trigger_data {
struct tracing_map *map;
};

+static const char *hist_field_name(struct hist_field *field,
+ unsigned int level)
+{
+ const char *field_name = "";
+
+ if (level > 1)
+ return field_name;
+
+ if (field->field)
+ field_name = field->field->name;
+
+ if (field_name == NULL)
+ field_name = "";
+
+ return field_name;
+}
+
static hist_field_fn_t select_value_fn(int field_size, int field_is_signed)
{
hist_field_fn_t fn = NULL;
@@ -642,7 +659,6 @@ static int is_descending(const char *str)
static int create_sort_keys(struct hist_trigger_data *hist_data)
{
char *fields_str = hist_data->attrs->sort_key_str;
- struct ftrace_event_field *field = NULL;
struct tracing_map_sort_key *sort_key;
int descending, ret = 0;
unsigned int i, j;
@@ -659,7 +675,9 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
}

for (i = 0; i < TRACING_MAP_SORT_KEYS_MAX; i++) {
+ struct hist_field *hist_field;
char *field_str, *field_name;
+ const char *test_name;

sort_key = &hist_data->sort_keys[i];

@@ -692,8 +710,10 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
}

for (j = 1; j < hist_data->n_fields; j++) {
- field = hist_data->fields[j]->field;
- if (field && (strcmp(field_name, field->name) == 0)) {
+ hist_field = hist_data->fields[j];
+ test_name = hist_field_name(hist_field, 0);
+
+ if (strcmp(field_name, test_name) == 0) {
sort_key->field_idx = j;
descending = is_descending(field_str);
if (descending < 0) {
@@ -941,6 +961,7 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
struct hist_field *key_field;
char str[KSYM_SYMBOL_LEN];
bool multiline = false;
+ const char *field_name;
unsigned int i;
u64 uval;

@@ -952,26 +973,27 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
if (i > hist_data->n_vals)
seq_puts(m, ", ");

+ field_name = hist_field_name(key_field, 0);
+
if (key_field->flags & HIST_FIELD_FL_HEX) {
uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %llx",
- key_field->field->name, uval);
+ seq_printf(m, "%s: %llx", field_name, uval);
} else if (key_field->flags & HIST_FIELD_FL_SYM) {
uval = *(u64 *)(key + key_field->offset);
sprint_symbol_no_offset(str, uval);
- seq_printf(m, "%s: [%llx] %-45s",
- key_field->field->name, uval, str);
+ seq_printf(m, "%s: [%llx] %-45s", field_name,
+ uval, str);
} else if (key_field->flags & HIST_FIELD_FL_SYM_OFFSET) {
uval = *(u64 *)(key + key_field->offset);
sprint_symbol(str, uval);
- seq_printf(m, "%s: [%llx] %-55s",
- key_field->field->name, uval, str);
+ seq_printf(m, "%s: [%llx] %-55s", field_name,
+ uval, str);
} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
char *comm = elt->private_data;

uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %-16s[%10llu]",
- key_field->field->name, comm, uval);
+ seq_printf(m, "%s: %-16s[%10llu]", field_name,
+ comm, uval);
} else if (key_field->flags & HIST_FIELD_FL_SYSCALL) {
const char *syscall_name;

@@ -980,8 +1002,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
if (!syscall_name)
syscall_name = "unknown_syscall";

- seq_printf(m, "%s: %-30s[%3llu]",
- key_field->field->name, syscall_name, uval);
+ seq_printf(m, "%s: %-30s[%3llu]", field_name,
+ syscall_name, uval);
} else if (key_field->flags & HIST_FIELD_FL_STACKTRACE) {
seq_puts(m, "stacktrace:\n");
hist_trigger_stacktrace_print(m,
@@ -989,15 +1011,14 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
HIST_STACKTRACE_DEPTH);
multiline = true;
} else if (key_field->flags & HIST_FIELD_FL_LOG2) {
- seq_printf(m, "%s: ~ 2^%-2llu", key_field->field->name,
+ seq_printf(m, "%s: ~ 2^%-2llu", field_name,
*(u64 *)(key + key_field->offset));
} else if (key_field->flags & HIST_FIELD_FL_STRING) {
- seq_printf(m, "%s: %-50s", key_field->field->name,
+ seq_printf(m, "%s: %-50s", field_name,
(char *)(key + key_field->offset));
} else {
uval = *(u64 *)(key + key_field->offset);
- seq_printf(m, "%s: %10llu", key_field->field->name,
- uval);
+ seq_printf(m, "%s: %10llu", field_name, uval);
}
}

@@ -1010,13 +1031,13 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
tracing_map_read_sum(elt, HITCOUNT_IDX));

for (i = 1; i < hist_data->n_vals; i++) {
+ field_name = hist_field_name(hist_data->fields[i], 0);
+
if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
- seq_printf(m, " %s: %10llx",
- hist_data->fields[i]->field->name,
+ seq_printf(m, " %s: %10llx", field_name,
tracing_map_read_sum(elt, i));
} else {
- seq_printf(m, " %s: %10llu",
- hist_data->fields[i]->field->name,
+ seq_printf(m, " %s: %10llu", field_name,
tracing_map_read_sum(elt, i));
}
}
@@ -1131,7 +1152,9 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)

static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
- seq_printf(m, "%s", hist_field->field->name);
+ const char *field_name = hist_field_name(hist_field, 0);
+
+ seq_printf(m, "%s", field_name);
if (hist_field->flags) {
const char *flags_str = get_hist_field_flags(hist_field);

--
1.9.3

2017-09-05 21:58:29

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 08/40] ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP

RINGBUF_TYPE_TIME_STAMP is defined but not used, and from what I can
gather was reserved for something like an absolute timestamp feature
for the ring buffer, if not a complete replacement of the current
time_delta scheme.

This code redefines RINGBUF_TYPE_TIME_STAMP to implement absolute time
stamps. Another way to look at it is that it essentially forces
extended time_deltas for all events.

The motivation for doing this is to enable time_deltas that aren't
dependent on previous events in the ring buffer, making it feasible to
use the ring_buffer_event timestamps in a more random-access way, for
purposes other than serial event printing.

To set/reset this mode, use tracing_set_timestamp_abs() from the
previous interface patch.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ring_buffer.h | 12 ++---
kernel/trace/ring_buffer.c | 105 ++++++++++++++++++++++++++++++++------------
2 files changed, 83 insertions(+), 34 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 28e3472..74bc276 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -36,10 +36,12 @@ struct ring_buffer_event {
* array[0] = time delta (28 .. 59)
* size = 8 bytes
*
- * @RINGBUF_TYPE_TIME_STAMP: Sync time stamp with external clock
- * array[0] = tv_nsec
- * array[1..2] = tv_sec
- * size = 16 bytes
+ * @RINGBUF_TYPE_TIME_STAMP: Absolute timestamp
+ * Same format as TIME_EXTEND except that the
+ * value is an absolute timestamp, not a delta
+ * event.time_delta contains bottom 27 bits
+ * array[0] = top (28 .. 59) bits
+ * size = 8 bytes
*
* <= @RINGBUF_TYPE_DATA_TYPE_LEN_MAX:
* Data record
@@ -56,12 +58,12 @@ enum ring_buffer_type {
RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
RINGBUF_TYPE_PADDING,
RINGBUF_TYPE_TIME_EXTEND,
- /* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
RINGBUF_TYPE_TIME_STAMP,
};

unsigned ring_buffer_event_length(struct ring_buffer_event *event);
void *ring_buffer_event_data(struct ring_buffer_event *event);
+u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);

/*
* ring_buffer_discard_commit will remove an event that has not
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index cff15a2..0bcc53e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -42,6 +42,8 @@ int ring_buffer_print_entry_header(struct trace_seq *s)
RINGBUF_TYPE_PADDING);
trace_seq_printf(s, "\ttime_extend : type == %d\n",
RINGBUF_TYPE_TIME_EXTEND);
+ trace_seq_printf(s, "\ttime_stamp : type == %d\n",
+ RINGBUF_TYPE_TIME_STAMP);
trace_seq_printf(s, "\tdata max type_len == %d\n",
RINGBUF_TYPE_DATA_TYPE_LEN_MAX);

@@ -141,12 +143,15 @@ int ring_buffer_print_entry_header(struct trace_seq *s)

enum {
RB_LEN_TIME_EXTEND = 8,
- RB_LEN_TIME_STAMP = 16,
+ RB_LEN_TIME_STAMP = 8,
};

#define skip_time_extend(event) \
((struct ring_buffer_event *)((char *)event + RB_LEN_TIME_EXTEND))

+#define extended_time(event) \
+ (event->type_len >= RINGBUF_TYPE_TIME_EXTEND)
+
static inline int rb_null_event(struct ring_buffer_event *event)
{
return event->type_len == RINGBUF_TYPE_PADDING && !event->time_delta;
@@ -210,7 +215,7 @@ static void rb_event_set_padding(struct ring_buffer_event *event)
{
unsigned len = 0;

- if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
+ if (extended_time(event)) {
/* time extends include the data event after it */
len = RB_LEN_TIME_EXTEND;
event = skip_time_extend(event);
@@ -232,7 +237,7 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
{
unsigned length;

- if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+ if (extended_time(event))
event = skip_time_extend(event);

length = rb_event_length(event);
@@ -249,7 +254,7 @@ unsigned ring_buffer_event_length(struct ring_buffer_event *event)
static __always_inline void *
rb_event_data(struct ring_buffer_event *event)
{
- if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+ if (extended_time(event))
event = skip_time_extend(event);
BUG_ON(event->type_len > RINGBUF_TYPE_DATA_TYPE_LEN_MAX);
/* If length is in len field, then array[0] has the data */
@@ -276,6 +281,27 @@ void *ring_buffer_event_data(struct ring_buffer_event *event)
#define TS_MASK ((1ULL << TS_SHIFT) - 1)
#define TS_DELTA_TEST (~TS_MASK)

+/**
+ * ring_buffer_event_time_stamp - return the event's extended timestamp
+ * @event: the event to get the timestamp of
+ *
+ * Returns the extended timestamp associated with a data event.
+ * An extended time_stamp is a 64-bit timestamp represented
+ * internally in a special way that makes the best use of space
+ * contained within a ring buffer event. This function decodes
+ * it and maps it to a straight u64 value.
+ */
+u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event)
+{
+ u64 ts;
+
+ ts = event->array[0];
+ ts <<= TS_SHIFT;
+ ts += event->time_delta;
+
+ return ts;
+}
+
/* Flag when events were overwritten */
#define RB_MISSED_EVENTS (1 << 31)
/* Missed count stored at end */
@@ -2220,13 +2246,16 @@ static void rb_inc_iter(struct ring_buffer_iter *iter)
}

/* Slow path, do not inline */
-static noinline struct ring_buffer_event *
-rb_add_time_stamp(struct ring_buffer_event *event, u64 delta)
+static noinline struct ring_buffer_event *
+rb_add_time_stamp(struct ring_buffer_event *event, u64 delta, bool abs)
{
- event->type_len = RINGBUF_TYPE_TIME_EXTEND;
+ if (abs)
+ event->type_len = RINGBUF_TYPE_TIME_STAMP;
+ else
+ event->type_len = RINGBUF_TYPE_TIME_EXTEND;

- /* Not the first event on the page? */
- if (rb_event_index(event)) {
+ /* Not the first event on the page, or not delta? */
+ if (abs || rb_event_index(event)) {
event->time_delta = delta & TS_MASK;
event->array[0] = delta >> TS_SHIFT;
} else {
@@ -2269,7 +2298,9 @@ static inline bool rb_event_is_commit(struct ring_buffer_per_cpu *cpu_buffer,
* add it to the start of the resevered space.
*/
if (unlikely(info->add_timestamp)) {
- event = rb_add_time_stamp(event, delta);
+ bool abs = ring_buffer_time_stamp_abs(cpu_buffer->buffer);
+
+ event = rb_add_time_stamp(event, info->delta, abs);
length -= RB_LEN_TIME_EXTEND;
delta = 0;
}
@@ -2457,7 +2488,7 @@ static __always_inline void rb_end_commit(struct ring_buffer_per_cpu *cpu_buffer

static inline void rb_event_discard(struct ring_buffer_event *event)
{
- if (event->type_len == RINGBUF_TYPE_TIME_EXTEND)
+ if (extended_time(event))
event = skip_time_extend(event);

/* array[0] holds the actual length for the discarded event */
@@ -2488,6 +2519,10 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
{
u64 delta;

+ /* In TIME_STAMP mode, write_stamp is unused, nothing to do */
+ if (event->type_len == RINGBUF_TYPE_TIME_STAMP)
+ return;
+
/*
* The event first in the commit queue updates the
* time stamp.
@@ -2501,9 +2536,7 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
cpu_buffer->write_stamp =
cpu_buffer->commit_page->page->time_stamp;
else if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
- delta = event->array[0];
- delta <<= TS_SHIFT;
- delta += event->time_delta;
+ delta = ring_buffer_event_time_stamp(event);
cpu_buffer->write_stamp += delta;
} else
cpu_buffer->write_stamp += event->time_delta;
@@ -2687,7 +2720,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
* If this is the first commit on the page, then it has the same
* timestamp as the page itself.
*/
- if (!tail)
+ if (!tail && !ring_buffer_time_stamp_abs(cpu_buffer->buffer))
info->delta = 0;

/* See if we shot pass the end of this buffer page */
@@ -2765,8 +2798,11 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
/* make sure this diff is calculated here */
barrier();

- /* Did the write stamp get updated already? */
- if (likely(info.ts >= cpu_buffer->write_stamp)) {
+ if (ring_buffer_time_stamp_abs(buffer)) {
+ info.delta = info.ts;
+ rb_handle_timestamp(cpu_buffer, &info);
+ } else /* Did the write stamp get updated already? */
+ if (likely(info.ts >= cpu_buffer->write_stamp)) {
info.delta = diff;
if (unlikely(test_time_stamp(info.delta)))
rb_handle_timestamp(cpu_buffer, &info);
@@ -3448,14 +3484,12 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
return;

case RINGBUF_TYPE_TIME_EXTEND:
- delta = event->array[0];
- delta <<= TS_SHIFT;
- delta += event->time_delta;
+ delta = ring_buffer_event_time_stamp(event);
cpu_buffer->read_stamp += delta;
return;

case RINGBUF_TYPE_TIME_STAMP:
- /* FIXME: not implemented */
+ /* In TIME_STAMP mode, write_stamp is unused, nothing to do */
return;

case RINGBUF_TYPE_DATA:
@@ -3479,14 +3513,12 @@ int ring_buffer_iter_empty(struct ring_buffer_iter *iter)
return;

case RINGBUF_TYPE_TIME_EXTEND:
- delta = event->array[0];
- delta <<= TS_SHIFT;
- delta += event->time_delta;
+ delta = ring_buffer_event_time_stamp(event);
iter->read_stamp += delta;
return;

case RINGBUF_TYPE_TIME_STAMP:
- /* FIXME: not implemented */
+ /* In TIME_STAMP mode, write_stamp is unused, nothing to do */
return;

case RINGBUF_TYPE_DATA:
@@ -3710,6 +3742,8 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
struct buffer_page *reader;
int nr_loops = 0;

+ if (ts)
+ *ts = 0;
again:
/*
* We repeat when a time extend is encountered.
@@ -3746,12 +3780,17 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
goto again;

case RINGBUF_TYPE_TIME_STAMP:
- /* FIXME: not implemented */
+ if (ts) {
+ *ts = ring_buffer_event_time_stamp(event);
+ ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
+ cpu_buffer->cpu, ts);
+ }
+ /* Internal data, OK to advance */
rb_advance_reader(cpu_buffer);
goto again;

case RINGBUF_TYPE_DATA:
- if (ts) {
+ if (ts && !(*ts)) {
*ts = cpu_buffer->read_stamp + event->time_delta;
ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
cpu_buffer->cpu, ts);
@@ -3776,6 +3815,9 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
struct ring_buffer_event *event;
int nr_loops = 0;

+ if (ts)
+ *ts = 0;
+
cpu_buffer = iter->cpu_buffer;
buffer = cpu_buffer->buffer;

@@ -3828,12 +3870,17 @@ static int rb_lost_events(struct ring_buffer_per_cpu *cpu_buffer)
goto again;

case RINGBUF_TYPE_TIME_STAMP:
- /* FIXME: not implemented */
+ if (ts) {
+ *ts = ring_buffer_event_time_stamp(event);
+ ring_buffer_normalize_time_stamp(cpu_buffer->buffer,
+ cpu_buffer->cpu, ts);
+ }
+ /* Internal data, OK to advance */
rb_advance_iter(iter);
goto again;

case RINGBUF_TYPE_DATA:
- if (ts) {
+ if (ts && !(*ts)) {
*ts = iter->read_stamp + event->time_delta;
ring_buffer_normalize_time_stamp(buffer,
cpu_buffer->cpu, ts);
--
1.9.3

2017-09-05 21:58:33

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 09/40] tracing: Give event triggers access to ring_buffer_event

The ring_buffer event can provide a timestamp that may be useful to
various triggers - pass it into the handlers for that purpose.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/trace_events.h | 14 ++++++-----
kernel/trace/trace.h | 9 +++----
kernel/trace/trace_events_hist.c | 11 +++++----
kernel/trace/trace_events_trigger.c | 47 +++++++++++++++++++++++--------------
4 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 3702b9c..bfd2a53 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -400,11 +400,13 @@ enum event_trigger_type {

extern int filter_match_preds(struct event_filter *filter, void *rec);

-extern enum event_trigger_type event_triggers_call(struct trace_event_file *file,
- void *rec);
-extern void event_triggers_post_call(struct trace_event_file *file,
- enum event_trigger_type tt,
- void *rec);
+extern enum event_trigger_type
+event_triggers_call(struct trace_event_file *file, void *rec,
+ struct ring_buffer_event *event);
+extern void
+event_triggers_post_call(struct trace_event_file *file,
+ enum event_trigger_type tt,
+ void *rec, struct ring_buffer_event *event);

bool trace_event_ignore_this_pid(struct trace_event_file *trace_file);

@@ -424,7 +426,7 @@ extern void event_triggers_post_call(struct trace_event_file *file,

if (!(eflags & EVENT_FILE_FL_TRIGGER_COND)) {
if (eflags & EVENT_FILE_FL_TRIGGER_MODE)
- event_triggers_call(file, NULL);
+ event_triggers_call(file, NULL, NULL);
if (eflags & EVENT_FILE_FL_SOFT_DISABLED)
return true;
if (eflags & EVENT_FILE_FL_PID_FILTER)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index dcfc67d..2ce3bde 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1293,7 +1293,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
unsigned long eflags = file->flags;

if (eflags & EVENT_FILE_FL_TRIGGER_COND)
- *tt = event_triggers_call(file, entry);
+ *tt = event_triggers_call(file, entry, event);

if (test_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags) ||
(unlikely(file->flags & EVENT_FILE_FL_FILTERED) &&
@@ -1330,7 +1330,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
trace_buffer_unlock_commit(file->tr, buffer, event, irq_flags, pc);

if (tt)
- event_triggers_post_call(file, tt, entry);
+ event_triggers_post_call(file, tt, entry, event);
}

/**
@@ -1363,7 +1363,7 @@ static inline void trace_buffer_unlock_commit(struct trace_array *tr,
irq_flags, pc, regs);

if (tt)
- event_triggers_post_call(file, tt, entry);
+ event_triggers_post_call(file, tt, entry, event);
}

#define FILTER_PRED_INVALID ((unsigned short)-1)
@@ -1588,7 +1588,8 @@ extern void set_named_trigger_data(struct event_trigger_data *data,
*/
struct event_trigger_ops {
void (*func)(struct event_trigger_data *data,
- void *rec);
+ void *rec,
+ struct ring_buffer_event *rbe);
int (*init)(struct event_trigger_ops *ops,
struct event_trigger_data *data);
void (*free)(struct event_trigger_ops *ops,
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index a16904c..9809243 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -909,7 +909,8 @@ static inline void add_to_key(char *compound_key, void *key,
memcpy(compound_key + key_field->offset, key, size);
}

-static void event_hist_trigger(struct event_trigger_data *data, void *rec)
+static void event_hist_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
struct hist_trigger_data *hist_data = data->private_data;
bool use_compound_key = (hist_data->n_keys > 1);
@@ -1660,7 +1661,8 @@ __init int register_trigger_hist_cmd(void)
}

static void
-hist_enable_trigger(struct event_trigger_data *data, void *rec)
+hist_enable_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
struct enable_trigger_data *enable_data = data->private_data;
struct event_trigger_data *test;
@@ -1676,7 +1678,8 @@ __init int register_trigger_hist_cmd(void)
}

static void
-hist_enable_count_trigger(struct event_trigger_data *data, void *rec)
+hist_enable_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (!data->count)
return;
@@ -1684,7 +1687,7 @@ __init int register_trigger_hist_cmd(void)
if (data->count != -1)
(data->count)--;

- hist_enable_trigger(data, rec);
+ hist_enable_trigger(data, rec, event);
}

static struct event_trigger_ops hist_enable_trigger_ops = {
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index f2ac9d4..9b0fe31 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -63,7 +63,8 @@ void trigger_data_free(struct event_trigger_data *data)
* any trigger that should be deferred, ETT_NONE if nothing to defer.
*/
enum event_trigger_type
-event_triggers_call(struct trace_event_file *file, void *rec)
+event_triggers_call(struct trace_event_file *file, void *rec,
+ struct ring_buffer_event *event)
{
struct event_trigger_data *data;
enum event_trigger_type tt = ETT_NONE;
@@ -76,7 +77,7 @@ enum event_trigger_type
if (data->paused)
continue;
if (!rec) {
- data->ops->func(data, rec);
+ data->ops->func(data, rec, event);
continue;
}
filter = rcu_dereference_sched(data->filter);
@@ -86,7 +87,7 @@ enum event_trigger_type
tt |= data->cmd_ops->trigger_type;
continue;
}
- data->ops->func(data, rec);
+ data->ops->func(data, rec, event);
}
return tt;
}
@@ -108,7 +109,7 @@ enum event_trigger_type
void
event_triggers_post_call(struct trace_event_file *file,
enum event_trigger_type tt,
- void *rec)
+ void *rec, struct ring_buffer_event *event)
{
struct event_trigger_data *data;

@@ -116,7 +117,7 @@ enum event_trigger_type
if (data->paused)
continue;
if (data->cmd_ops->trigger_type & tt)
- data->ops->func(data, rec);
+ data->ops->func(data, rec, event);
}
}
EXPORT_SYMBOL_GPL(event_triggers_post_call);
@@ -909,7 +910,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
}

static void
-traceon_trigger(struct event_trigger_data *data, void *rec)
+traceon_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (tracing_is_on())
return;
@@ -918,7 +920,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
}

static void
-traceon_count_trigger(struct event_trigger_data *data, void *rec)
+traceon_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (tracing_is_on())
return;
@@ -933,7 +936,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
}

static void
-traceoff_trigger(struct event_trigger_data *data, void *rec)
+traceoff_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (!tracing_is_on())
return;
@@ -942,7 +946,8 @@ void set_named_trigger_data(struct event_trigger_data *data,
}

static void
-traceoff_count_trigger(struct event_trigger_data *data, void *rec)
+traceoff_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (!tracing_is_on())
return;
@@ -1039,13 +1044,15 @@ void set_named_trigger_data(struct event_trigger_data *data,

#ifdef CONFIG_TRACER_SNAPSHOT
static void
-snapshot_trigger(struct event_trigger_data *data, void *rec)
+snapshot_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
tracing_snapshot();
}

static void
-snapshot_count_trigger(struct event_trigger_data *data, void *rec)
+snapshot_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (!data->count)
return;
@@ -1053,7 +1060,7 @@ void set_named_trigger_data(struct event_trigger_data *data,
if (data->count != -1)
(data->count)--;

- snapshot_trigger(data, rec);
+ snapshot_trigger(data, rec, event);
}

static int
@@ -1132,13 +1139,15 @@ static __init int register_trigger_snapshot_cmd(void)
#define STACK_SKIP 3

static void
-stacktrace_trigger(struct event_trigger_data *data, void *rec)
+stacktrace_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
trace_dump_stack(STACK_SKIP);
}

static void
-stacktrace_count_trigger(struct event_trigger_data *data, void *rec)
+stacktrace_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
if (!data->count)
return;
@@ -1146,7 +1155,7 @@ static __init int register_trigger_snapshot_cmd(void)
if (data->count != -1)
(data->count)--;

- stacktrace_trigger(data, rec);
+ stacktrace_trigger(data, rec, event);
}

static int
@@ -1208,7 +1217,8 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
}

static void
-event_enable_trigger(struct event_trigger_data *data, void *rec)
+event_enable_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
struct enable_trigger_data *enable_data = data->private_data;

@@ -1219,7 +1229,8 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
}

static void
-event_enable_count_trigger(struct event_trigger_data *data, void *rec)
+event_enable_count_trigger(struct event_trigger_data *data, void *rec,
+ struct ring_buffer_event *event)
{
struct enable_trigger_data *enable_data = data->private_data;

@@ -1233,7 +1244,7 @@ static __init void unregister_trigger_traceon_traceoff_cmds(void)
if (data->count != -1)
(data->count)--;

- event_enable_trigger(data, rec);
+ event_enable_trigger(data, rec, event);
}

int event_enable_trigger_print(struct seq_file *m,
--
1.9.3

2017-09-05 21:58:44

by Tom Zanussi

Subject: [PATCH v2 13/40] tracing: Make traceprobe parsing code reusable

traceprobe_probes_write() and traceprobe_command() actually contain
nothing that ties them to kprobes - the code is generically useful for
similar types of parsing elsewhere, so separate it out and move it to
trace.c/trace.h.

Other than moving it, the only change is in naming:
traceprobe_probes_write() becomes trace_parse_run_command() and
traceprobe_command() becomes trace_run_command().

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 86 +++++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 7 ++++
kernel/trace/trace_kprobe.c | 18 +++++-----
kernel/trace/trace_probe.c | 86 ---------------------------------------------
kernel/trace/trace_probe.h | 7 ----
kernel/trace/trace_uprobe.c | 2 +-
6 files changed, 103 insertions(+), 103 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 719e4c1..b03813b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8295,6 +8295,92 @@ void ftrace_dump(enum ftrace_dump_mode oops_dump_mode)
}
EXPORT_SYMBOL_GPL(ftrace_dump);

+int trace_run_command(const char *buf, int (*createfn)(int, char **))
+{
+ char **argv;
+ int argc, ret;
+
+ argc = 0;
+ ret = 0;
+ argv = argv_split(GFP_KERNEL, buf, &argc);
+ if (!argv)
+ return -ENOMEM;
+
+ if (argc)
+ ret = createfn(argc, argv);
+
+ argv_free(argv);
+
+ return ret;
+}
+
+#define WRITE_BUFSIZE 4096
+
+ssize_t trace_parse_run_command(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos,
+ int (*createfn)(int, char **))
+{
+ char *kbuf, *buf, *tmp;
+ int ret = 0;
+ size_t done = 0;
+ size_t size;
+
+ kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ while (done < count) {
+ size = count - done;
+
+ if (size >= WRITE_BUFSIZE)
+ size = WRITE_BUFSIZE - 1;
+
+ if (copy_from_user(kbuf, buffer + done, size)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ kbuf[size] = '\0';
+ buf = kbuf;
+ do {
+ tmp = strchr(buf, '\n');
+ if (tmp) {
+ *tmp = '\0';
+ size = tmp - buf + 1;
+ } else {
+ size = strlen(buf);
+ if (done + size < count) {
+ if (buf != kbuf)
+ break;
+ /* This can accept WRITE_BUFSIZE - 2 ('\n' + '\0') */
+ pr_warn("Line length is too long: Should be less than %d\n",
+ WRITE_BUFSIZE - 2);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+ done += size;
+
+ /* Remove comments */
+ tmp = strchr(buf, '#');
+
+ if (tmp)
+ *tmp = '\0';
+
+ ret = trace_run_command(buf, createfn);
+ if (ret)
+ goto out;
+ buf += size;
+
+ } while (done < count);
+ }
+ ret = done;
+
+out:
+ kfree(kbuf);
+
+ return ret;
+}
+
__init static int tracer_alloc_buffers(void)
{
int ring_buf_size;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2ce3bde..2fed4ed 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1756,6 +1756,13 @@ extern int trace_event_enable_disable(struct trace_event_file *file,
int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);

+#define MAX_EVENT_NAME_LEN 64
+
+extern int trace_run_command(const char *buf, int (*createfn)(int, char**));
+extern ssize_t trace_parse_run_command(struct file *file,
+ const char __user *buffer, size_t count, loff_t *ppos,
+ int (*createfn)(int, char**));
+
/*
* Normal trace_printk() and friends allocates special buffers
* to do the manipulation, as well as saves the print formats
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index c9b5aa1..996902a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -907,8 +907,8 @@ static int probes_open(struct inode *inode, struct file *file)
static ssize_t probes_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
- return traceprobe_probes_write(file, buffer, count, ppos,
- create_trace_kprobe);
+ return trace_parse_run_command(file, buffer, count, ppos,
+ create_trace_kprobe);
}

static const struct file_operations kprobe_events_ops = {
@@ -1433,9 +1433,9 @@ static __init int kprobe_trace_self_tests_init(void)

pr_info("Testing kprobe tracing: ");

- ret = traceprobe_command("p:testprobe kprobe_trace_selftest_target "
- "$stack $stack0 +0($stack)",
- create_trace_kprobe);
+ ret = trace_run_command("p:testprobe kprobe_trace_selftest_target "
+ "$stack $stack0 +0($stack)",
+ create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on probing function entry.\n");
warn++;
@@ -1455,8 +1455,8 @@ static __init int kprobe_trace_self_tests_init(void)
}
}

- ret = traceprobe_command("r:testprobe2 kprobe_trace_selftest_target "
- "$retval", create_trace_kprobe);
+ ret = trace_run_command("r:testprobe2 kprobe_trace_selftest_target "
+ "$retval", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on probing function return.\n");
warn++;
@@ -1526,13 +1526,13 @@ static __init int kprobe_trace_self_tests_init(void)
disable_trace_kprobe(tk, file);
}

- ret = traceprobe_command("-:testprobe", create_trace_kprobe);
+ ret = trace_run_command("-:testprobe", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on deleting a probe.\n");
warn++;
}

- ret = traceprobe_command("-:testprobe2", create_trace_kprobe);
+ ret = trace_run_command("-:testprobe2", create_trace_kprobe);
if (WARN_ON_ONCE(ret)) {
pr_warn("error on deleting a probe.\n");
warn++;
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 52478f0..d593573 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -623,92 +623,6 @@ void traceprobe_free_probe_arg(struct probe_arg *arg)
kfree(arg->comm);
}

-int traceprobe_command(const char *buf, int (*createfn)(int, char **))
-{
- char **argv;
- int argc, ret;
-
- argc = 0;
- ret = 0;
- argv = argv_split(GFP_KERNEL, buf, &argc);
- if (!argv)
- return -ENOMEM;
-
- if (argc)
- ret = createfn(argc, argv);
-
- argv_free(argv);
-
- return ret;
-}
-
-#define WRITE_BUFSIZE 4096
-
-ssize_t traceprobe_probes_write(struct file *file, const char __user *buffer,
- size_t count, loff_t *ppos,
- int (*createfn)(int, char **))
-{
- char *kbuf, *buf, *tmp;
- int ret = 0;
- size_t done = 0;
- size_t size;
-
- kbuf = kmalloc(WRITE_BUFSIZE, GFP_KERNEL);
- if (!kbuf)
- return -ENOMEM;
-
- while (done < count) {
- size = count - done;
-
- if (size >= WRITE_BUFSIZE)
- size = WRITE_BUFSIZE - 1;
-
- if (copy_from_user(kbuf, buffer + done, size)) {
- ret = -EFAULT;
- goto out;
- }
- kbuf[size] = '\0';
- buf = kbuf;
- do {
- tmp = strchr(buf, '\n');
- if (tmp) {
- *tmp = '\0';
- size = tmp - buf + 1;
- } else {
- size = strlen(buf);
- if (done + size < count) {
- if (buf != kbuf)
- break;
- /* This can accept WRITE_BUFSIZE - 2 ('\n' + '\0') */
- pr_warn("Line length is too long: Should be less than %d\n",
- WRITE_BUFSIZE - 2);
- ret = -EINVAL;
- goto out;
- }
- }
- done += size;
-
- /* Remove comments */
- tmp = strchr(buf, '#');
-
- if (tmp)
- *tmp = '\0';
-
- ret = traceprobe_command(buf, createfn);
- if (ret)
- goto out;
- buf += size;
-
- } while (done < count);
- }
- ret = done;
-
-out:
- kfree(kbuf);
-
- return ret;
-}
-
static int __set_print_fmt(struct trace_probe *tp, char *buf, int len,
bool is_return)
{
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 903273c..fb66e3e 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -42,7 +42,6 @@

#define MAX_TRACE_ARGS 128
#define MAX_ARGSTR_LEN 63
-#define MAX_EVENT_NAME_LEN 64
#define MAX_STRING_SIZE PATH_MAX

/* Reserved field names */
@@ -356,12 +355,6 @@ extern int traceprobe_conflict_field_name(const char *name,

extern int traceprobe_split_symbol_offset(char *symbol, unsigned long *offset);

-extern ssize_t traceprobe_probes_write(struct file *file,
- const char __user *buffer, size_t count, loff_t *ppos,
- int (*createfn)(int, char**));
-
-extern int traceprobe_command(const char *buf, int (*createfn)(int, char**));
-
/* Sum up total data length for dynamic arraies (strings) */
static nokprobe_inline int
__get_data_size(struct trace_probe *tp, struct pt_regs *regs)
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a7581fe..402120b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -651,7 +651,7 @@ static int probes_open(struct inode *inode, struct file *file)
static ssize_t probes_write(struct file *file, const char __user *buffer,
size_t count, loff_t *ppos)
{
- return traceprobe_probes_write(file, buffer, count, ppos, create_trace_uprobe);
+ return trace_parse_run_command(file, buffer, count, ppos, create_trace_uprobe);
}

static const struct file_operations uprobe_events_ops = {
--
1.9.3

2017-09-05 21:58:50

by Tom Zanussi

Subject: [PATCH v2 17/40] tracing: Add usecs modifier for hist trigger timestamps

Appending .usecs onto a common_timestamp field will cause the
timestamp value to be in microseconds instead of the default
nanoseconds. A typical latency histogram using usecs would look like
this:

# echo 'hist:keys=pid,prio:ts0=$common_timestamp.usecs ...
# echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0 ...

This also adds an external trace_clock_in_ns() to trace.c for the
timestamp conversion.
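The conversion path can be sketched in a few lines of userspace C. Note the gating: the raw ring-buffer timestamp is only in nanoseconds when the current trace clock is ns-based, so the divide is applied only when both conditions hold, mirroring the new hist_field_timestamp(). The function name below is hypothetical; in the kernel the divide is ns2usecs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the .usecs conversion: convert a raw ring-buffer
 * timestamp to microseconds only when the user asked for .usecs
 * AND the trace clock actually counts nanoseconds.
 */
static uint64_t timestamp_value(uint64_t raw_ts, bool ts_in_usecs,
				bool clock_in_ns)
{
	if (ts_in_usecs && clock_in_ns)
		return raw_ts / 1000;	/* ns2usecs() in the kernel */

	return raw_ts;		/* non-ns clocks pass through unchanged */
}
```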

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 8 ++++++++
kernel/trace/trace.h | 2 ++
kernel/trace/trace_events_hist.c | 28 ++++++++++++++++++++++------
3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b03813b..6ee3a88 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1170,6 +1170,14 @@ unsigned long nsecs_to_usecs(unsigned long nsecs)
ARCH_TRACE_CLOCKS
};

+bool trace_clock_in_ns(struct trace_array *tr)
+{
+ if (trace_clocks[tr->clock_id].in_ns)
+ return true;
+
+ return false;
+}
+
/*
* trace_parser_get_init - gets the buffer for trace parser
*/
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 2fed4ed..02bfd5c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,8 @@ enum {

extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);

+extern bool trace_clock_in_ns(struct trace_array *tr);
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 65aad07..45a942e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -90,12 +90,6 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event,
return (u64) ilog2(roundup_pow_of_two(val));
}

-static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
-{
- return ring_buffer_event_time_stamp(rbe);
-}
-
#define DEFINE_HIST_FIELD_FN(type) \
static u64 hist_field_##type(struct hist_field *hist_field, \
void *event, \
@@ -143,6 +137,7 @@ enum hist_field_flags {
HIST_FIELD_FL_STACKTRACE = 256,
HIST_FIELD_FL_LOG2 = 512,
HIST_FIELD_FL_TIMESTAMP = 1024,
+ HIST_FIELD_FL_TIMESTAMP_USECS = 2048,
};

struct hist_trigger_attrs {
@@ -153,6 +148,7 @@ struct hist_trigger_attrs {
bool pause;
bool cont;
bool clear;
+ bool ts_in_usecs;
unsigned int map_bits;
};

@@ -170,6 +166,20 @@ struct hist_trigger_data {
bool enable_timestamps;
};

+static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
+{
+ struct hist_trigger_data *hist_data = hist_field->hist_data;
+ struct trace_array *tr = hist_data->event_file->tr;
+
+ u64 ts = ring_buffer_event_time_stamp(rbe);
+
+ if (hist_data->attrs->ts_in_usecs && trace_clock_in_ns(tr))
+ ts = ns2usecs(ts);
+
+ return ts;
+}
+
static const char *hist_field_name(struct hist_field *field,
unsigned int level)
{
@@ -618,6 +628,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
flags |= HIST_FIELD_FL_SYSCALL;
else if (strcmp(field_str, "log2") == 0)
flags |= HIST_FIELD_FL_LOG2;
+ else if (strcmp(field_str, "usecs") == 0)
+ flags |= HIST_FIELD_FL_TIMESTAMP_USECS;
else {
ret = -EINVAL;
goto out;
@@ -627,6 +639,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
if (strcmp(field_name, "$common_timestamp") == 0) {
flags |= HIST_FIELD_FL_TIMESTAMP;
hist_data->enable_timestamps = true;
+ if (flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+ hist_data->attrs->ts_in_usecs = true;
key_size = sizeof(u64);
} else {
field = trace_find_event_field(file->event_call, field_name);
@@ -1227,6 +1241,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
flags_str = "syscall";
else if (hist_field->flags & HIST_FIELD_FL_LOG2)
flags_str = "log2";
+ else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+ flags_str = "usecs";

return flags_str;
}
--
1.9.3

2017-09-05 21:59:04

by Tom Zanussi

Subject: [PATCH v2 20/40] tracing: Add simple expression support to hist triggers

Add support for simple addition, subtraction, and unary-minus
expressions (-(expr) and expr, where expr can be b-a, a+b, or a+b+c)
to hist triggers, in order to support a minimal set of useful
inter-event calculations.

These operations are needed for calculating latencies between events
(timestamp1-timestamp0) and for combined latencies (latencies over 3
or more events).

In the process, factor out some common code from key and value
parsing.
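The first parsing step — deciding which operator, if any, an expression string contains — can be sketched in userspace. This mirrors the contains_operator() helper added below: the first '+' or '-' found determines the operator, and a '-' in the leading position is treated as unary minus. The function name here is hypothetical.

```c
#include <assert.h>
#include <string.h>

enum field_op { OP_NONE, OP_PLUS, OP_MINUS, OP_UNARY_MINUS };

/*
 * Sketch of the operator-detection step for hist trigger
 * expressions: find the first '+' or '-' and distinguish a
 * leading '-' (unary minus) from binary subtraction.
 */
static enum field_op classify(const char *str)
{
	const char *op = strpbrk(str, "+-");

	if (!op)
		return OP_NONE;
	if (*op == '+')
		return OP_PLUS;
	/* first operator is '-': unary only if it leads the string */
	return (op == str) ? OP_UNARY_MINUS : OP_MINUS;
}
```

With this classification in hand, parse_expr() splits on the operator and recurses on the right-hand side, which is how a+b+c is handled as a plus b+c.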

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 556 ++++++++++++++++++++++++++++++++-------
1 file changed, 460 insertions(+), 96 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 3c4eae9..bbca6d3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -32,6 +32,13 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
#define HIST_FIELD_OPERANDS_MAX 2
#define HIST_FIELDS_MAX (TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)

+enum field_op_id {
+ FIELD_OP_NONE,
+ FIELD_OP_PLUS,
+ FIELD_OP_MINUS,
+ FIELD_OP_UNARY_MINUS,
+};
+
struct hist_var {
char *name;
struct hist_trigger_data *hist_data;
@@ -48,6 +55,8 @@ struct hist_field {
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
struct hist_trigger_data *hist_data;
struct hist_var var;
+ enum field_op_id operator;
+ char *name;
};

static u64 hist_field_none(struct hist_field *field, void *event,
@@ -98,6 +107,41 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event,
return (u64) ilog2(roundup_pow_of_two(val));
}

+static u64 hist_field_plus(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
+{
+ struct hist_field *operand1 = hist_field->operands[0];
+ struct hist_field *operand2 = hist_field->operands[1];
+
+ u64 val1 = operand1->fn(operand1, event, rbe);
+ u64 val2 = operand2->fn(operand2, event, rbe);
+
+ return val1 + val2;
+}
+
+static u64 hist_field_minus(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
+{
+ struct hist_field *operand1 = hist_field->operands[0];
+ struct hist_field *operand2 = hist_field->operands[1];
+
+ u64 val1 = operand1->fn(operand1, event, rbe);
+ u64 val2 = operand2->fn(operand2, event, rbe);
+
+ return val1 - val2;
+}
+
+static u64 hist_field_unary_minus(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
+{
+ struct hist_field *operand = hist_field->operands[0];
+
+ s64 sval = (s64)operand->fn(operand, event, rbe);
+ u64 val = (u64)-sval;
+
+ return val;
+}
+
#define DEFINE_HIST_FIELD_FN(type) \
static u64 hist_field_##type(struct hist_field *hist_field, \
void *event, \
@@ -148,6 +192,7 @@ enum hist_field_flags {
HIST_FIELD_FL_TIMESTAMP_USECS = 2048,
HIST_FIELD_FL_VAR = 4096,
HIST_FIELD_FL_VAR_ONLY = 8192,
+ HIST_FIELD_FL_EXPR = 16384,
};

struct var_defs {
@@ -218,6 +263,8 @@ static const char *hist_field_name(struct hist_field *field,
field_name = hist_field_name(field->operands[0], ++level);
else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
field_name = "$common_timestamp";
+ else if (field->flags & HIST_FIELD_FL_EXPR)
+ field_name = field->name;

if (field_name == NULL)
field_name = "";
@@ -441,6 +488,115 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
.elt_init = hist_trigger_elt_comm_init,
};

+static const char *get_hist_field_flags(struct hist_field *hist_field)
+{
+ const char *flags_str = NULL;
+
+ if (hist_field->flags & HIST_FIELD_FL_HEX)
+ flags_str = "hex";
+ else if (hist_field->flags & HIST_FIELD_FL_SYM)
+ flags_str = "sym";
+ else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
+ flags_str = "sym-offset";
+ else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
+ flags_str = "execname";
+ else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
+ flags_str = "syscall";
+ else if (hist_field->flags & HIST_FIELD_FL_LOG2)
+ flags_str = "log2";
+ else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+ flags_str = "usecs";
+
+ return flags_str;
+}
+
+static char *expr_str(struct hist_field *field, unsigned int level)
+{
+ char *expr;
+
+ if (level > 1)
+ return NULL;
+
+ expr = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
+ if (!expr)
+ return NULL;
+
+ if (field->operator == FIELD_OP_UNARY_MINUS) {
+ char *subexpr;
+
+ strcat(expr, "-(");
+ subexpr = expr_str(field->operands[0], ++level);
+ if (!subexpr) {
+ kfree(expr);
+ return NULL;
+ }
+ strcat(expr, subexpr);
+ strcat(expr, ")");
+
+ return expr;
+ }
+
+ strcat(expr, hist_field_name(field->operands[0], 0));
+ if (field->operands[0]->flags) {
+ const char *flags_str = get_hist_field_flags(field->operands[0]);
+
+ if (flags_str) {
+ strcat(expr, ".");
+ strcat(expr, flags_str);
+ }
+ }
+
+ switch (field->operator) {
+ case FIELD_OP_MINUS:
+ strcat(expr, "-");
+ break;
+ case FIELD_OP_PLUS:
+ strcat(expr, "+");
+ break;
+ default:
+ kfree(expr);
+ return NULL;
+ }
+
+ strcat(expr, hist_field_name(field->operands[1], 0));
+ if (field->operands[1]->flags) {
+ const char *flags_str = get_hist_field_flags(field->operands[1]);
+
+ if (flags_str) {
+ strcat(expr, ".");
+ strcat(expr, flags_str);
+ }
+ }
+
+ return expr;
+}
+
+static int contains_operator(char *str)
+{
+ enum field_op_id field_op = FIELD_OP_NONE;
+ char *op;
+
+ op = strpbrk(str, "+-");
+ if (!op)
+ return FIELD_OP_NONE;
+
+ switch (*op) {
+ case '-':
+ if (*str == '-')
+ field_op = FIELD_OP_UNARY_MINUS;
+ else
+ field_op = FIELD_OP_MINUS;
+ break;
+ case '+':
+ field_op = FIELD_OP_PLUS;
+ break;
+ default:
+ break;
+ }
+
+ return field_op;
+}
+
static void destroy_hist_field(struct hist_field *hist_field,
unsigned int level)
{
@@ -456,6 +612,7 @@ static void destroy_hist_field(struct hist_field *hist_field,
destroy_hist_field(hist_field->operands[i], level + 1);

kfree(hist_field->var.name);
+ kfree(hist_field->name);

kfree(hist_field);
}
@@ -476,6 +633,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,

hist_field->hist_data = hist_data;

+ if (flags & HIST_FIELD_FL_EXPR)
+ goto out; /* caller will populate */
+
if (flags & HIST_FIELD_FL_HITCOUNT) {
hist_field->fn = hist_field_counter;
goto out;
@@ -548,6 +708,287 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
}
}

+static char *field_name_from_var(struct hist_trigger_data *hist_data,
+ char *var_name)
+{
+ char *name, *field;
+ unsigned int i;
+
+ for (i = 0; i < hist_data->attrs->var_defs.n_vars; i++) {
+ name = hist_data->attrs->var_defs.name[i];
+
+ if (strcmp(var_name, name) == 0) {
+ field = hist_data->attrs->var_defs.expr[i];
+ if (contains_operator(field))
+ continue;
+ return field;
+ }
+ }
+
+ return NULL;
+}
+
+static char *local_field_var_ref(struct hist_trigger_data *hist_data,
+ char *var_name)
+{
+ var_name++;
+
+ return field_name_from_var(hist_data, var_name);
+}
+
+static struct ftrace_event_field *
+parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
+ char *field_str, unsigned long *flags)
+{
+ struct ftrace_event_field *field = NULL;
+ char *field_name, *modifier, *str;
+
+ modifier = str = kstrdup(field_str, GFP_KERNEL);
+ if (!modifier)
+ return ERR_PTR(-ENOMEM);
+
+ field_name = strsep(&modifier, ".");
+ if (modifier) {
+ if (strcmp(modifier, "hex") == 0)
+ *flags |= HIST_FIELD_FL_HEX;
+ else if (strcmp(modifier, "sym") == 0)
+ *flags |= HIST_FIELD_FL_SYM;
+ else if (strcmp(modifier, "sym-offset") == 0)
+ *flags |= HIST_FIELD_FL_SYM_OFFSET;
+ else if ((strcmp(modifier, "execname") == 0) &&
+ (strcmp(field_name, "common_pid") == 0))
+ *flags |= HIST_FIELD_FL_EXECNAME;
+ else if (strcmp(modifier, "syscall") == 0)
+ *flags |= HIST_FIELD_FL_SYSCALL;
+ else if (strcmp(modifier, "log2") == 0)
+ *flags |= HIST_FIELD_FL_LOG2;
+ else if (strcmp(modifier, "usecs") == 0)
+ *flags |= HIST_FIELD_FL_TIMESTAMP_USECS;
+ else {
+ field = ERR_PTR(-EINVAL);
+ goto out;
+ }
+ }
+
+ if (strcmp(field_name, "$common_timestamp") == 0) {
+ *flags |= HIST_FIELD_FL_TIMESTAMP;
+ hist_data->enable_timestamps = true;
+ if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
+ hist_data->attrs->ts_in_usecs = true;
+ } else {
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field || !field->size) {
+ field = ERR_PTR(-EINVAL);
+ goto out;
+ }
+ }
+ out:
+ kfree(str);
+
+ return field;
+}
+
+struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file, char *str,
+ unsigned long *flags, char *var_name)
+{
+ char *s;
+ struct ftrace_event_field *field = NULL;
+ struct hist_field *hist_field = NULL;
+ int ret = 0;
+
+ s = local_field_var_ref(hist_data, str);
+ if (s)
+ str = s;
+
+ field = parse_field(hist_data, file, str, flags);
+ if (IS_ERR(field)) {
+ ret = PTR_ERR(field);
+ goto out;
+ }
+
+ hist_field = create_hist_field(hist_data, field, *flags, var_name);
+ if (!hist_field) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ return hist_field;
+ out:
+ return ERR_PTR(ret);
+}
+
+static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ char *str, unsigned long flags,
+ char *var_name, unsigned int level);
+
+static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ char *str, unsigned long flags,
+ char *var_name, unsigned int level)
+{
+ struct hist_field *operand1, *expr = NULL;
+ unsigned long operand_flags;
+ int ret = 0;
+ char *s;
+
+ // we support only -(xxx) i.e. explicit parens required
+
+ if (level > 2) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ str++; // skip leading '-'
+
+ s = strchr(str, '(');
+ if (s)
+ str++;
+ else {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ s = strchr(str, ')');
+ if (s)
+ *s = '\0';
+ else {
+ ret = -EINVAL; // no closing ')'
+ goto free;
+ }
+
+ strsep(&str, "(");
+ if (!str)
+ goto free;
+
+ flags |= HIST_FIELD_FL_EXPR;
+ expr = create_hist_field(hist_data, NULL, flags, var_name);
+ if (!expr) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ operand_flags = 0;
+ operand1 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level);
+ if (IS_ERR(operand1)) {
+ ret = PTR_ERR(operand1);
+ goto free;
+ }
+
+ expr->fn = hist_field_unary_minus;
+ expr->operands[0] = operand1;
+ expr->operator = FIELD_OP_UNARY_MINUS;
+ expr->name = expr_str(expr, 0);
+
+ return expr;
+ free:
+ return ERR_PTR(ret);
+}
+
+static int check_expr_operands(struct hist_field *operand1,
+ struct hist_field *operand2)
+{
+ unsigned long operand1_flags = operand1->flags;
+ unsigned long operand2_flags = operand2->flags;
+
+ if ((operand1_flags & HIST_FIELD_FL_TIMESTAMP_USECS) !=
+ (operand2_flags & HIST_FIELD_FL_TIMESTAMP_USECS))
+ return -EINVAL;
+
+ return 0;
+}
+
+static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ char *str, unsigned long flags,
+ char *var_name, unsigned int level)
+{
+ struct hist_field *operand1 = NULL, *operand2 = NULL, *expr = NULL;
+ unsigned long operand_flags;
+ int field_op, ret = -EINVAL;
+ char *sep, *operand1_str;
+
+ if (level > 2)
+ return ERR_PTR(-EINVAL);
+
+ field_op = contains_operator(str);
+
+ if (field_op == FIELD_OP_NONE)
+ return parse_atom(hist_data, file, str, &flags, var_name);
+
+ if (field_op == FIELD_OP_UNARY_MINUS)
+ return parse_unary(hist_data, file, str, flags, var_name, ++level);
+
+ switch (field_op) {
+ case FIELD_OP_MINUS:
+ sep = "-";
+ break;
+ case FIELD_OP_PLUS:
+ sep = "+";
+ break;
+ default:
+ goto free;
+ }
+
+ operand1_str = strsep(&str, sep);
+ if (!operand1_str || !str)
+ goto free;
+
+ operand_flags = 0;
+ operand1 = parse_atom(hist_data, file, operand1_str,
+ &operand_flags, NULL);
+ if (IS_ERR(operand1)) {
+ ret = PTR_ERR(operand1);
+ operand1 = NULL;
+ goto free;
+ }
+
+ // rest of string could be another expression e.g. b+c in a+b+c
+ operand_flags = 0;
+ operand2 = parse_expr(hist_data, file, str, operand_flags, NULL, ++level);
+ if (IS_ERR(operand2)) {
+ ret = PTR_ERR(operand2);
+ operand2 = NULL;
+ goto free;
+ }
+
+ ret = check_expr_operands(operand1, operand2);
+ if (ret)
+ goto free;
+
+ flags |= HIST_FIELD_FL_EXPR;
+ expr = create_hist_field(hist_data, NULL, flags, var_name);
+ if (!expr) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ expr->operands[0] = operand1;
+ expr->operands[1] = operand2;
+ expr->operator = field_op;
+ expr->name = expr_str(expr, 0);
+
+ switch (field_op) {
+ case FIELD_OP_MINUS:
+ expr->fn = hist_field_minus;
+ break;
+ case FIELD_OP_PLUS:
+ expr->fn = hist_field_plus;
+ break;
+ default:
+ goto free;
+ }
+
+ return expr;
+ free:
+ destroy_hist_field(operand1, 0);
+ destroy_hist_field(operand2, 0);
+ destroy_hist_field(expr, 0);
+
+ return ERR_PTR(ret);
+}
+
static int create_hitcount_val(struct hist_trigger_data *hist_data)
{
hist_data->fields[HITCOUNT_IDX] =
@@ -607,41 +1048,21 @@ static int __create_val_field(struct hist_trigger_data *hist_data,
char *var_name, char *field_str,
unsigned long flags)
{
- struct ftrace_event_field *field = NULL;
- char *field_name;
+ struct hist_field *hist_field;
int ret = 0;

- field_name = strsep(&field_str, ".");
- if (field_str) {
- if (strcmp(field_str, "hex") == 0)
- flags |= HIST_FIELD_FL_HEX;
- else {
- ret = -EINVAL;
- goto out;
- }
- }
-
- if (strcmp(field_name, "$common_timestamp") == 0) {
- flags |= HIST_FIELD_FL_TIMESTAMP;
- hist_data->enable_timestamps = true;
- } else {
- field = trace_find_event_field(file->event_call, field_name);
- if (!field || !field->size) {
- ret = -EINVAL;
- goto out;
- }
- }
-
- hist_data->fields[val_idx] = create_hist_field(hist_data, field, flags, var_name);
- if (!hist_data->fields[val_idx]) {
- ret = -ENOMEM;
+ hist_field = parse_expr(hist_data, file, field_str, flags, var_name, 0);
+ if (IS_ERR(hist_field)) {
+ ret = PTR_ERR(hist_field);
goto out;
}

+ hist_data->fields[val_idx] = hist_field;
+
++hist_data->n_vals;
++hist_data->n_fields;

- if (hist_data->fields[val_idx]->flags & HIST_FIELD_FL_VAR_ONLY)
+ if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
hist_data->n_var_only++;

if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX + TRACING_MAP_VARS_MAX))
@@ -731,8 +1152,8 @@ static int create_key_field(struct hist_trigger_data *hist_data,
struct trace_event_file *file,
char *field_str)
{
- struct ftrace_event_field *field = NULL;
struct hist_field *hist_field = NULL;
+
unsigned long flags = 0;
unsigned int key_size;
int ret = 0;
@@ -747,60 +1168,24 @@ static int create_key_field(struct hist_trigger_data *hist_data,
key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
hist_field = create_hist_field(hist_data, NULL, flags, NULL);
} else {
- char *field_name = strsep(&field_str, ".");
-
- if (field_str) {
- if (strcmp(field_str, "hex") == 0)
- flags |= HIST_FIELD_FL_HEX;
- else if (strcmp(field_str, "sym") == 0)
- flags |= HIST_FIELD_FL_SYM;
- else if (strcmp(field_str, "sym-offset") == 0)
- flags |= HIST_FIELD_FL_SYM_OFFSET;
- else if ((strcmp(field_str, "execname") == 0) &&
- (strcmp(field_name, "common_pid") == 0))
- flags |= HIST_FIELD_FL_EXECNAME;
- else if (strcmp(field_str, "syscall") == 0)
- flags |= HIST_FIELD_FL_SYSCALL;
- else if (strcmp(field_str, "log2") == 0)
- flags |= HIST_FIELD_FL_LOG2;
- else if (strcmp(field_str, "usecs") == 0)
- flags |= HIST_FIELD_FL_TIMESTAMP_USECS;
- else {
- ret = -EINVAL;
- goto out;
- }
+ hist_field = parse_expr(hist_data, file, field_str, flags,
+ NULL, 0);
+ if (IS_ERR(hist_field)) {
+ ret = PTR_ERR(hist_field);
+ goto out;
}

- if (strcmp(field_name, "$common_timestamp") == 0) {
- flags |= HIST_FIELD_FL_TIMESTAMP;
- hist_data->enable_timestamps = true;
- if (flags & HIST_FIELD_FL_TIMESTAMP_USECS)
- hist_data->attrs->ts_in_usecs = true;
- key_size = sizeof(u64);
- } else {
- field = trace_find_event_field(file->event_call, field_name);
- if (!field || !field->size) {
- ret = -EINVAL;
- goto out;
- }
-
- if (is_string_field(field))
- key_size = MAX_FILTER_STR_VAL;
- else
- key_size = field->size;
- }
+ key_size = hist_field->size;
}

- hist_data->fields[key_idx] = create_hist_field(hist_data, field, flags, NULL);
- if (!hist_data->fields[key_idx]) {
- ret = -ENOMEM;
- goto out;
- }
+ hist_data->fields[key_idx] = hist_field;

key_size = ALIGN(key_size, sizeof(u64));
hist_data->fields[key_idx]->size = key_size;
hist_data->fields[key_idx]->offset = key_offset;
+
hist_data->key_size += key_size;
+
if (hist_data->key_size > HIST_KEY_SIZE_MAX) {
ret = -EINVAL;
goto out;
@@ -1383,7 +1768,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
for (i = 1; i < hist_data->n_vals; i++) {
field_name = hist_field_name(hist_data->fields[i], 0);

- if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR)
+ if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR ||
+ hist_data->fields[i]->flags & HIST_FIELD_FL_EXPR)
continue;

if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
@@ -1483,28 +1869,6 @@ static int event_hist_open(struct inode *inode, struct file *file)
.release = single_release,
};

-static const char *get_hist_field_flags(struct hist_field *hist_field)
-{
- const char *flags_str = NULL;
-
- if (hist_field->flags & HIST_FIELD_FL_HEX)
- flags_str = "hex";
- else if (hist_field->flags & HIST_FIELD_FL_SYM)
- flags_str = "sym";
- else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
- flags_str = "sym-offset";
- else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
- flags_str = "execname";
- else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
- flags_str = "syscall";
- else if (hist_field->flags & HIST_FIELD_FL_LOG2)
- flags_str = "log2";
- else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
- flags_str = "usecs";
-
- return flags_str;
-}
-
static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
const char *field_name = hist_field_name(hist_field, 0);
--
1.9.3

2017-09-05 21:59:17

by Tom Zanussi

Subject: [PATCH v2 25/40] tracing: Add support for dynamic tracepoints

The tracepoint infrastructure assumes statically-defined tracepoints
and uses static_keys for tracepoint enablement. In order to define
tracepoints on the fly, we need a dynamic counterpart.

Add a 'dynamic' flag to struct tracepoint along with accompanying
logic for this purpose.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/tracepoint-defs.h | 1 +
kernel/tracepoint.c | 18 +++++++++++++-----
2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index a031920..bc22d54 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -32,6 +32,7 @@ struct tracepoint {
int (*regfunc)(void);
void (*unregfunc)(void);
struct tracepoint_func __rcu *funcs;
+ bool dynamic;
};

#endif
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 685c50a..1c5957f 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -197,7 +197,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
struct tracepoint_func *old, *tp_funcs;
int ret;

- if (tp->regfunc && !static_key_enabled(&tp->key)) {
+ if (tp->regfunc &&
+ ((tp->dynamic && !(atomic_read(&tp->key.enabled) > 0)) ||
+ !static_key_enabled(&tp->key))) {
ret = tp->regfunc();
if (ret < 0)
return ret;
@@ -219,7 +221,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
* is used.
*/
rcu_assign_pointer(tp->funcs, tp_funcs);
- if (!static_key_enabled(&tp->key))
+ if (tp->dynamic && !(atomic_read(&tp->key.enabled) > 0))
+ atomic_inc(&tp->key.enabled);
+ else if (!tp->dynamic && !static_key_enabled(&tp->key))
static_key_slow_inc(&tp->key);
release_probes(old);
return 0;
@@ -246,10 +250,14 @@ static int tracepoint_remove_func(struct tracepoint *tp,

if (!tp_funcs) {
/* Removed last function */
- if (tp->unregfunc && static_key_enabled(&tp->key))
+ if (tp->unregfunc &&
+ ((tp->dynamic && (atomic_read(&tp->key.enabled) > 0)) ||
+ static_key_enabled(&tp->key)))
tp->unregfunc();

- if (static_key_enabled(&tp->key))
+ if (tp->dynamic && (atomic_read(&tp->key.enabled) > 0))
+ atomic_dec(&tp->key.enabled);
+ else if (!tp->dynamic && static_key_enabled(&tp->key))
static_key_slow_dec(&tp->key);
}
rcu_assign_pointer(tp->funcs, tp_funcs);
@@ -258,7 +266,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
}

/**
- * tracepoint_probe_register - Connect a probe to a tracepoint
+ * tracepoint_probe_register_prio - Connect a probe to a tracepoint
* @tp: tracepoint
* @probe: probe handler
* @data: tracepoint data
--
1.9.3

2017-09-05 21:59:31

by Tom Zanussi

Subject: [PATCH v2 29/40] tracing: Add 'onmatch' hist trigger action support

Add an 'onmatch(matching.event).<synthetic_event_name>(param list)'
hist trigger action which is invoked with the set of variables or
event fields named in the 'param list'. The result is the generation
of a synthetic event that consists of the values contained in those
variables and/or fields at the time the invoking event was hit.

As an example, the following defines a simple synthetic event using a
variable defined on the sched_wakeup_new event, and shows the event
definition with unresolved fields, since the sched_wakeup_new event
with the testpid variable hasn't been defined yet:

# echo 'wakeup_new_test pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_new_test pid_t pid; int prio

The following hist trigger both defines a testpid variable and
specifies an onmatch() trace action that uses that variable along with
a non-variable field to generate a wakeup_new_test synthetic event
whenever a sched_wakeup_new event occurs, which, because of the 'if
comm=="cyclictest"' filter, only happens when the executable is
cyclictest:

# echo 'hist:testpid=pid:keys=$testpid:\
onmatch(sched.sched_wakeup_new).wakeup_new_test($testpid, prio) \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger

Creating and displaying a histogram based on those events is now just
a matter of using the fields and new synthetic event in the
tracing/events/synthetic directory, as usual:

# echo 'hist:keys=pid,prio:sort=pid,prio' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 394 +++++++++++++++++++++++++++++++++++++--
1 file changed, 382 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index ea006a0..73d2bd4 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -323,7 +323,18 @@ typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,

struct action_data {
action_fn_t fn;
- unsigned int var_ref_idx;
+ unsigned int n_params;
+ char *params[SYNTH_FIELDS_MAX];
+
+ union {
+ struct {
+ unsigned int var_ref_idx;
+ char *match_event;
+ char *match_event_system;
+ char *synth_event_name;
+ struct synth_event *synth_event;
+ } onmatch;
+ };
};

static LIST_HEAD(synth_event_list);
@@ -927,6 +938,21 @@ static struct synth_event *alloc_synth_event(char *event_name, int n_fields,
return event;
}

+static void action_trace(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt, void *rec,
+ struct ring_buffer_event *rbe,
+ struct action_data *data, u64 *var_ref_vals)
+{
+ struct synth_event *event = data->onmatch.synth_event;
+
+ trace_synth(event, var_ref_vals, data->onmatch.var_ref_idx);
+}
+
+struct hist_var_data {
+ struct list_head list;
+ struct hist_trigger_data *hist_data;
+};
+
static int create_synth_event(int argc, char **argv)
{
struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
@@ -967,10 +993,8 @@ static int create_synth_event(int argc, char **argv)
}
ret = -EEXIST;
goto out;
- } else if (delete_event) {
- ret = -EINVAL;
+ } else if (delete_event)
goto out;
- }

if (argc < 2) {
ret = -EINVAL;
@@ -1134,11 +1158,6 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
return ts;
}

-struct hist_var_data {
- struct list_head list;
- struct hist_trigger_data *hist_data;
-};
-
static struct hist_field *check_var_ref(struct hist_field *hist_field,
struct hist_trigger_data *var_data,
unsigned int var_idx)
@@ -1578,11 +1597,21 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)

static int parse_action(char *str, struct hist_trigger_attrs *attrs)
{
- int ret = 0;
+ int ret = -EINVAL;

if (attrs->n_actions >= HIST_ACTIONS_MAX)
return ret;

+ if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0)) {
+ attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL);
+ if (!attrs->action_str[attrs->n_actions]) {
+ ret = -ENOMEM;
+ return ret;
+ }
+ attrs->n_actions++;
+ ret = 0;
+ }
+
return ret;
}

@@ -2420,7 +2449,7 @@ static bool compatible_keys(struct hist_trigger_data *target_hist_data,

for (n = 0; n < n_keys; n++) {
hist_field = hist_data->fields[i + n];
- target_hist_field = hist_data->fields[j + n];
+ target_hist_field = target_hist_data->fields[j + n];

if (strcmp(hist_field->type, target_hist_field->type) != 0)
return false;
@@ -2738,6 +2767,27 @@ static struct field_var *create_field_var(struct hist_trigger_data *hist_data,
return create_field_var(hist_data, file, var_name);
}

+static void onmatch_destroy(struct action_data *data)
+{
+ unsigned int i;
+
+ mutex_lock(&synth_event_mutex);
+
+ kfree(data->onmatch.match_event);
+ kfree(data->onmatch.match_event_system);
+ kfree(data->onmatch.synth_event_name);
+
+ for (i = 0; i < data->n_params; i++)
+ kfree(data->params[i]);
+
+ if (data->onmatch.synth_event)
+ data->onmatch.synth_event->ref--;
+
+ kfree(data);
+
+ mutex_unlock(&synth_event_mutex);
+}
+
static void destroy_field_var(struct field_var *field_var)
{
if (!field_var)
@@ -2766,6 +2816,281 @@ static void save_field_var(struct hist_trigger_data *hist_data,
hist_data->n_field_var_str++;
}

+
+static void destroy_synth_var_refs(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_synth_var_refs; i++)
+ destroy_hist_field(hist_data->synth_var_refs[i], 0);
+}
+
+static void save_synth_var_ref(struct hist_trigger_data *hist_data,
+ struct hist_field *var_ref)
+{
+ hist_data->synth_var_refs[hist_data->n_synth_var_refs++] = var_ref;
+
+ hist_data->var_refs[hist_data->n_var_refs] = var_ref;
+ var_ref->var_ref_idx = hist_data->n_var_refs++;
+}
+
+static int check_synth_field(struct synth_event *event,
+ struct hist_field *hist_field,
+ unsigned int field_pos)
+{
+ struct synth_field *field;
+
+ if (field_pos >= event->n_fields)
+ return -EINVAL;
+
+ field = event->fields[field_pos];
+
+ if (strcmp(field->type, hist_field->type) != 0)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int parse_action_params(char *params, struct action_data *data)
+{
+ char *param, *saved_param;
+ int ret = 0;
+
+ while (params) {
+ if (data->n_params >= SYNTH_FIELDS_MAX)
+ goto out;
+
+ param = strsep(&params, ",");
+ if (!param)
+ goto out;
+
+ param = strstrip(param);
+ if (strlen(param) < 2) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ saved_param = kstrdup(param, GFP_KERNEL);
+ if (!saved_param) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ data->params[data->n_params++] = saved_param;
+ }
+ out:
+ return ret;
+}
+
+static struct hist_field *
+onmatch_find_var(struct hist_trigger_data *hist_data, struct action_data *data,
+ char *system, char *event, char *var)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_field *hist_field;
+
+ var++; /* skip '$' */
+
+ hist_field = find_target_event_var(hist_data, system, event, var);
+ if (!hist_field) {
+ if (!system) {
+ system = data->onmatch.match_event_system;
+ event = data->onmatch.match_event;
+ }
+
+ hist_field = find_event_var(tr, system, event, var);
+ }
+
+ return hist_field;
+}
+
+static struct hist_field *
+onmatch_create_field_var(struct hist_trigger_data *hist_data,
+ struct action_data *data, char *system,
+ char *event, char *var)
+{
+ struct hist_field *hist_field = NULL;
+ struct field_var *field_var;
+
+ field_var = create_target_field_var(hist_data, system, event, var);
+ if (IS_ERR(field_var))
+ goto out;
+
+ if (field_var) {
+ save_field_var(hist_data, field_var);
+ hist_field = field_var->var;
+ } else {
+ if (!system) {
+ system = data->onmatch.match_event_system;
+ event = data->onmatch.match_event;
+ }
+
+ hist_field = create_field_var_hist(hist_data, system, event, var);
+ if (IS_ERR(hist_field))
+ goto free;
+ }
+ out:
+ return hist_field;
+ free:
+ destroy_field_var(field_var);
+ hist_field = NULL;
+ goto out;
+}
+
+static int onmatch_create(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ struct action_data *data)
+{
+ char *event_name, *param, *system = NULL;
+ struct hist_field *hist_field, *var_ref;
+ unsigned int i, var_ref_idx;
+ unsigned int field_pos = 0;
+ struct synth_event *event;
+ int ret = 0;
+
+ mutex_lock(&synth_event_mutex);
+
+ event = find_synth_event(data->onmatch.synth_event_name);
+ if (!event) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ var_ref_idx = hist_data->n_var_refs;
+
+ for (i = 0; i < data->n_params; i++) {
+ char *p;
+
+ p = param = kstrdup(data->params[i], GFP_KERNEL);
+ if (!param) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ system = strsep(&param, ".");
+ if (!param) {
+ param = (char *)system;
+ system = event_name = NULL;
+ } else {
+ event_name = strsep(&param, ".");
+ if (!param) {
+ kfree(p);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ if (param[0] == '$')
+ hist_field = onmatch_find_var(hist_data, data, system,
+ event_name, param);
+ else
+ hist_field = onmatch_create_field_var(hist_data, data,
+ system,
+ event_name,
+ param);
+
+ if (!hist_field) {
+ kfree(p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (check_synth_field(event, hist_field, field_pos) == 0) {
+ var_ref = create_var_ref(hist_field);
+ if (!var_ref) {
+ kfree(p);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ save_synth_var_ref(hist_data, var_ref);
+ field_pos++;
+ kfree(p);
+ continue;
+ }
+
+ kfree(p);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (field_pos != event->n_fields) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ data->fn = action_trace;
+ data->onmatch.synth_event = event;
+ data->onmatch.var_ref_idx = var_ref_idx;
+ hist_data->actions[hist_data->n_actions++] = data;
+ event->ref++;
+ out:
+ mutex_unlock(&synth_event_mutex);
+
+ return ret;
+}
+
+static struct action_data *onmatch_parse(struct trace_array *tr, char *str)
+{
+ char *match_event, *match_event_system;
+ char *synth_event_name, *params;
+ struct action_data *data;
+ int ret = -EINVAL;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return ERR_PTR(-ENOMEM);
+
+ match_event = strsep(&str, ")");
+ if (!match_event || !str)
+ goto free;
+
+ match_event_system = strsep(&match_event, ".");
+ if (!match_event)
+ goto free;
+
+ if (IS_ERR(event_file(tr, match_event_system, match_event)))
+ goto free;
+
+ data->onmatch.match_event = kstrdup(match_event, GFP_KERNEL);
+ if (!data->onmatch.match_event) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ data->onmatch.match_event_system = kstrdup(match_event_system, GFP_KERNEL);
+ if (!data->onmatch.match_event_system) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ strsep(&str, ".");
+ if (!str)
+ goto free;
+
+ synth_event_name = strsep(&str, "(");
+ if (!synth_event_name || !str)
+ goto free;
+
+ data->onmatch.synth_event_name = kstrdup(synth_event_name, GFP_KERNEL);
+ if (!data->onmatch.synth_event_name) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ params = strsep(&str, ")");
+ if (!params || !str || (str && strlen(str)))
+ goto free;
+
+ ret = parse_action_params(params, data);
+ if (ret)
+ goto free;
+ out:
+ return data;
+ free:
+ onmatch_destroy(data);
+ data = ERR_PTR(ret);
+ goto out;
+}
+
static int create_hitcount_val(struct hist_trigger_data *hist_data)
{
hist_data->fields[HITCOUNT_IDX] =
@@ -3194,19 +3519,38 @@ static void destroy_actions(struct hist_trigger_data *hist_data)
for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *data = hist_data->actions[i];

- kfree(data);
+ if (data->fn == action_trace)
+ onmatch_destroy(data);
+ else
+ kfree(data);
}
}

static int create_actions(struct hist_trigger_data *hist_data,
struct trace_event_file *file)
{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct action_data *data;
unsigned int i;
int ret = 0;
char *str;

for (i = 0; i < hist_data->attrs->n_actions; i++) {
str = hist_data->attrs->action_str[i];
+
+ if (strncmp(str, "onmatch(", strlen("onmatch(")) == 0) {
+ char *action_str = str + strlen("onmatch(");
+
+ data = onmatch_parse(tr, action_str);
+ if (IS_ERR(data))
+ return PTR_ERR(data);
+
+ ret = onmatch_create(hist_data, file, data);
+ if (ret) {
+ onmatch_destroy(data);
+ return ret;
+ }
+ }
}

return ret;
@@ -3223,6 +3567,26 @@ static void print_actions(struct seq_file *m,
}
}

+static void print_onmatch_spec(struct seq_file *m,
+ struct hist_trigger_data *hist_data,
+ struct action_data *data)
+{
+ unsigned int i;
+
+ seq_printf(m, ":onmatch(%s.%s).", data->onmatch.match_event_system,
+ data->onmatch.match_event);
+
+ seq_printf(m, "%s(", data->onmatch.synth_event->name);
+
+ for (i = 0; i < data->n_params; i++) {
+ if (i)
+ seq_puts(m, ",");
+ seq_printf(m, "%s", data->params[i]);
+ }
+
+ seq_puts(m, ")");
+}
+
static void print_actions_spec(struct seq_file *m,
struct hist_trigger_data *hist_data)
{
@@ -3230,6 +3594,9 @@ static void print_actions_spec(struct seq_file *m,

for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *data = hist_data->actions[i];
+
+ if (data->fn == action_trace)
+ print_onmatch_spec(m, hist_data, data);
}
}

@@ -3255,6 +3622,7 @@ static void destroy_hist_data(struct hist_trigger_data *hist_data)
destroy_actions(hist_data);
destroy_field_vars(hist_data);
destroy_field_var_hists(hist_data);
+ destroy_synth_var_refs(hist_data);

kfree(hist_data);
}
@@ -3603,6 +3971,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
}
}

+ print_actions(m, hist_data, elt);
+
seq_puts(m, "\n");
}

--
1.9.3

2017-09-05 21:59:38

by Tom Zanussi

Subject: [PATCH v2 30/40] tracing: Add 'onmax' hist trigger action support

Add an 'onmax(var).save(field,...)' hist trigger action, invoked
whenever the value of 'var' exceeds the current maximum for a given
hist trigger entry.

The end result is that the trace event fields or variables specified
as the onmax.save() params will be saved if 'var' exceeds the current
maximum for that hist trigger entry. This allows context from the
event that exhibited the new maximum to be saved for later reference.
When the histogram is displayed, additional fields displaying the
saved values will be printed.

As an example, the following defines a couple of hist triggers, one for
sched_wakeup and another for sched_switch, keyed on pid. Whenever a
sched_wakeup occurs, the timestamp is saved in the entry corresponding
to the current pid, and when the scheduler switches back to that pid,
the timestamp difference is calculated. If the resulting latency
exceeds the current maximum latency, the specified save() values are
saved:

# echo 'hist:keys=pid:ts0=$common_timestamp.usecs \
if comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger

# echo 'hist:keys=next_pid:\
wakeup_lat=$common_timestamp.usecs-$ts0:\
onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
if next_comm=="cyclictest"' >> \
/sys/kernel/debug/tracing/events/sched/sched_switch/trigger

When the histogram is displayed, the max value and the saved values
corresponding to the max are displayed following the rest of the
fields:

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 3728 } hitcount: 199 \
max: 123 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/3
{ next_pid: 3730 } hitcount: 1321 \
max: 15 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/1
{ next_pid: 3729 } hitcount: 1973 \
max: 25 next_comm: cyclictest prev_pid: 0 \
prev_prio: 120 prev_comm: swapper/0

Totals:
Hits: 3493
Entries: 3
Dropped: 0

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 313 ++++++++++++++++++++++++++++++++++-----
1 file changed, 279 insertions(+), 34 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 73d2bd4..0a398f3 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -292,6 +292,10 @@ struct hist_trigger_data {
unsigned int n_field_var_str;
struct field_var_hist *field_var_hists[SYNTH_FIELDS_MAX];
unsigned int n_field_var_hists;
+
+ struct field_var *max_vars[SYNTH_FIELDS_MAX];
+ unsigned int n_max_vars;
+ unsigned int n_max_var_str;
};

struct synth_field {
@@ -334,6 +338,14 @@ struct action_data {
char *synth_event_name;
struct synth_event *synth_event;
} onmatch;
+
+ struct {
+ char *var_str;
+ char *fn_name;
+ unsigned int max_var_ref_idx;
+ struct hist_field *max_var;
+ struct hist_field *var;
+ } onmax;
};
};

@@ -1602,7 +1614,8 @@ static int parse_action(char *str, struct hist_trigger_attrs *attrs)
if (attrs->n_actions >= HIST_ACTIONS_MAX)
return ret;

- if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0)) {
+ if ((strncmp(str, "onmatch(", strlen("onmatch(")) == 0) ||
+ (strncmp(str, "onmax(", strlen("onmax(")) == 0)) {
attrs->action_str[attrs->n_actions] = kstrdup(str, GFP_KERNEL);
if (!attrs->action_str[attrs->n_actions]) {
ret = -ENOMEM;
@@ -1721,7 +1734,7 @@ static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
struct hist_elt_data *private_data = elt->private_data;
unsigned int i, n_str;

- n_str = hist_data->n_field_var_str;
+ n_str = hist_data->n_field_var_str + hist_data->n_max_var_str;

for (i = 0; i < n_str; i++)
kfree(private_data->field_var_str[i]);
@@ -1756,7 +1769,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
}
}

- n_str = hist_data->n_field_var_str;
+ n_str = hist_data->n_field_var_str + hist_data->n_max_var_str;

size = STR_VAR_LEN_MAX;

@@ -2658,6 +2671,15 @@ static void update_field_vars(struct hist_trigger_data *hist_data,
hist_data->n_field_vars, 0);
}

+static void update_max_vars(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *rec)
+{
+ __update_field_vars(elt, rbe, rec, hist_data->max_vars,
+ hist_data->n_max_vars, hist_data->n_field_var_str);
+}
+
static struct hist_field *create_var(struct hist_trigger_data *hist_data,
struct trace_event_file *file,
char *name, int size, const char *type)
@@ -2767,6 +2789,223 @@ static struct field_var *create_field_var(struct hist_trigger_data *hist_data,
return create_field_var(hist_data, file, var_name);
}

+static void onmax_print(struct seq_file *m,
+ struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt,
+ struct action_data *data)
+{
+ unsigned int i, save_var_idx, max_idx = data->onmax.max_var->var.idx;
+
+ seq_printf(m, "\n\tmax: %10llu", tracing_map_read_var(elt, max_idx));
+
+ for (i = 0; i < hist_data->n_max_vars; i++) {
+ struct hist_field *save_val = hist_data->max_vars[i]->val;
+ struct hist_field *save_var = hist_data->max_vars[i]->var;
+ u64 val;
+
+ save_var_idx = save_var->var.idx;
+
+ val = tracing_map_read_var(elt, save_var_idx);
+
+ if (save_val->flags & HIST_FIELD_FL_STRING) {
+ seq_printf(m, " %s: %-32s", save_var->var.name,
+ (char *)(uintptr_t)(val));
+ } else
+ seq_printf(m, " %s: %10llu", save_var->var.name, val);
+ }
+}
+
+static void onmax_save(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt, void *rec,
+ struct ring_buffer_event *rbe,
+ struct action_data *data, u64 *var_ref_vals)
+{
+ unsigned int max_idx = data->onmax.max_var->var.idx;
+ unsigned int max_var_ref_idx = data->onmax.max_var_ref_idx;
+
+ u64 var_val, max_val;
+
+ var_val = var_ref_vals[max_var_ref_idx];
+ max_val = tracing_map_read_var(elt, max_idx);
+
+ if (var_val <= max_val)
+ return;
+
+ tracing_map_set_var(elt, max_idx, var_val);
+
+ update_max_vars(hist_data, elt, rbe, rec);
+}
+
+static void onmax_destroy(struct action_data *data)
+{
+ unsigned int i;
+
+ destroy_hist_field(data->onmax.max_var, 0);
+ destroy_hist_field(data->onmax.var, 0);
+
+ kfree(data->onmax.var_str);
+ kfree(data->onmax.fn_name);
+
+ for (i = 0; i < data->n_params; i++)
+ kfree(data->params[i]);
+
+ kfree(data);
+}
+
+static int onmax_create(struct hist_trigger_data *hist_data,
+ struct action_data *data)
+{
+ struct trace_event_call *call = hist_data->event_file->event_call;
+ struct trace_event_file *file = hist_data->event_file;
+ struct hist_field *var_field, *ref_field, *max_var;
+ unsigned int var_ref_idx = hist_data->n_var_refs;
+ struct field_var *field_var;
+ char *onmax_var_str, *param;
+ const char *event_name;
+ unsigned long flags;
+ unsigned int i;
+ int ret = 0;
+
+ onmax_var_str = data->onmax.var_str;
+ if (onmax_var_str[0] != '$')
+ return -EINVAL;
+ onmax_var_str++;
+
+ event_name = trace_event_name(call);
+ var_field = find_target_event_var(hist_data, NULL, NULL, onmax_var_str);
+ if (!var_field)
+ return -EINVAL;
+
+ flags = HIST_FIELD_FL_VAR_REF;
+ ref_field = create_hist_field(hist_data, NULL, flags, NULL);
+ if (!ref_field)
+ return -ENOMEM;
+
+ if (init_var_ref(ref_field, var_field)) {
+ destroy_hist_field(ref_field, 0);
+ ret = -ENOMEM;
+ goto out;
+ }
+ hist_data->var_refs[hist_data->n_var_refs] = ref_field;
+ ref_field->var_ref_idx = hist_data->n_var_refs++;
+ data->onmax.var = ref_field;
+
+ data->fn = onmax_save;
+ data->onmax.max_var_ref_idx = var_ref_idx;
+ max_var = create_var(hist_data, file, "max", sizeof(u64), "u64");
+ if (IS_ERR(max_var)) {
+ ret = PTR_ERR(max_var);
+ goto out;
+ }
+ data->onmax.max_var = max_var;
+
+ for (i = 0; i < data->n_params; i++) {
+ param = kstrdup(data->params[i], GFP_KERNEL);
+ if (!param) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ field_var = create_target_field_var(hist_data, NULL, NULL, param);
+ if (IS_ERR(field_var)) {
+ ret = PTR_ERR(field_var);
+ kfree(param);
+ goto out;
+ }
+
+ hist_data->max_vars[hist_data->n_max_vars++] = field_var;
+ if (field_var->val->flags & HIST_FIELD_FL_STRING)
+ hist_data->n_max_var_str++;
+
+ kfree(param);
+ }
+
+ hist_data->actions[hist_data->n_actions++] = data;
+ out:
+ return ret;
+}
+
+static int parse_action_params(char *params, struct action_data *data)
+{
+ char *param, *saved_param;
+ int ret = 0;
+
+ while (params) {
+ if (data->n_params >= SYNTH_FIELDS_MAX)
+ goto out;
+
+ param = strsep(&params, ",");
+ if (!param)
+ goto out;
+
+ param = strstrip(param);
+ if (strlen(param) < 2) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ saved_param = kstrdup(param, GFP_KERNEL);
+ if (!saved_param) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ data->params[data->n_params++] = saved_param;
+ }
+ out:
+ return ret;
+}
+
+static struct action_data *onmax_parse(char *str)
+{
+ char *onmax_fn_name, *onmax_var_str;
+ struct action_data *data;
+ int ret = -EINVAL;
+
+ data = kzalloc(sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return ERR_PTR(-ENOMEM);
+
+ onmax_var_str = strsep(&str, ")");
+ if (!onmax_var_str || !str)
+ goto free;
+ data->onmax.var_str = kstrdup(onmax_var_str, GFP_KERNEL);
+ if (!data->onmax.var_str) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ strsep(&str, ".");
+ if (!str)
+ goto free;
+
+ onmax_fn_name = strsep(&str, "(");
+ if (!onmax_fn_name || !str)
+ goto free;
+
+ if (strncmp(onmax_fn_name, "save", strlen("save")) == 0) {
+ char *params = strsep(&str, ")");
+
+ if (!params)
+ goto free;
+
+ ret = parse_action_params(params, data);
+ if (ret)
+ goto free;
+ } else
+ goto free;
+
+ data->onmax.fn_name = kstrdup(onmax_fn_name, GFP_KERNEL);
+ if (!data->onmax.fn_name) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ out:
+ return data;
+ free:
+ onmax_destroy(data);
+ data = ERR_PTR(ret);
+ goto out;
+}
+
static void onmatch_destroy(struct action_data *data)
{
unsigned int i;
@@ -2851,37 +3090,6 @@ static int check_synth_field(struct synth_event *event,
return 0;
}

-static int parse_action_params(char *params, struct action_data *data)
-{
- char *param, *saved_param;
- int ret = 0;
-
- while (params) {
- if (data->n_params >= SYNTH_FIELDS_MAX)
- goto out;
-
- param = strsep(&params, ",");
- if (!param)
- goto out;
-
- param = strstrip(param);
- if (strlen(param) < 2) {
- ret = -EINVAL;
- goto out;
- }
-
- saved_param = kstrdup(param, GFP_KERNEL);
- if (!saved_param) {
- ret = -ENOMEM;
- goto out;
- }
-
- data->params[data->n_params++] = saved_param;
- }
- out:
- return ret;
-}
-
static struct hist_field *
onmatch_find_var(struct hist_trigger_data *hist_data, struct action_data *data,
char *system, char *event, char *var)
@@ -3521,6 +3729,8 @@ static void destroy_actions(struct hist_trigger_data *hist_data)

if (data->fn == action_trace)
onmatch_destroy(data);
+ else if (data->fn == onmax_save)
+ onmax_destroy(data);
else
kfree(data);
}
@@ -3550,6 +3760,18 @@ static int create_actions(struct hist_trigger_data *hist_data,
onmatch_destroy(data);
return ret;
}
+ } else if (strncmp(str, "onmax(", strlen("onmax(")) == 0) {
+ char *action_str = str + strlen("onmax(");
+
+ data = onmax_parse(action_str);
+ if (IS_ERR(data))
+ return PTR_ERR(data);
+
+ ret = onmax_create(hist_data, data);
+ if (ret) {
+ onmax_destroy(data);
+ return ret;
+ }
}
}

@@ -3564,9 +3786,30 @@ static void print_actions(struct seq_file *m,

for (i = 0; i < hist_data->n_actions; i++) {
struct action_data *data = hist_data->actions[i];
+
+ if (data->fn == onmax_save)
+ onmax_print(m, hist_data, elt, data);
}
}

+static void print_onmax_spec(struct seq_file *m,
+ struct hist_trigger_data *hist_data,
+ struct action_data *data)
+{
+ unsigned int i;
+
+ seq_puts(m, ":onmax(");
+ seq_printf(m, "%s", data->onmax.var_str);
+ seq_printf(m, ").%s(", data->onmax.fn_name);
+
+ for (i = 0; i < hist_data->n_max_vars; i++) {
+ seq_printf(m, "%s", hist_data->max_vars[i]->var->var.name);
+ if (i < hist_data->n_max_vars - 1)
+ seq_puts(m, ",");
+ }
+ seq_puts(m, ")");
+}
+
static void print_onmatch_spec(struct seq_file *m,
struct hist_trigger_data *hist_data,
struct action_data *data)
@@ -3597,6 +3840,8 @@ static void print_actions_spec(struct seq_file *m,

if (data->fn == action_trace)
print_onmatch_spec(m, hist_data, data);
+ else if (data->fn == onmax_save)
+ print_onmax_spec(m, hist_data, data);
}
}

--
1.9.3

2017-09-05 21:59:51

by Tom Zanussi

Subject: [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

Change the order event_mutex and trace_types_lock are taken, to avoid
circular dependencies and lockdep spew.

Changing the order shouldn't matter to any current code, but it does
matter to anything that takes the event_mutex first and then the
trace_types_lock. This is the case when calling tracing_set_clock()
from inside an event command, which already holds the event_mutex.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c93540c..889802c 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1406,8 +1406,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
return -ENODEV;

/* Make sure the system still exists */
- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
list_for_each_entry(tr, &ftrace_trace_arrays, list) {
list_for_each_entry(dir, &tr->systems, list) {
if (dir == inode->i_private) {
@@ -1421,8 +1421,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
}
}
exit_loop:
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

if (!system)
return -ENODEV;
@@ -2294,15 +2294,15 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
int trace_add_event_call(struct trace_event_call *call)
{
int ret;
- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);

ret = __register_event(call, NULL);
if (ret >= 0)
__add_event_to_tracers(call);

- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);
return ret;
}

@@ -2356,13 +2356,13 @@ int trace_remove_event_call(struct trace_event_call *call)
{
int ret;

- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
down_write(&trace_event_sem);
ret = probe_remove_event_call(call);
up_write(&trace_event_sem);
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return ret;
}
@@ -2424,8 +2424,8 @@ static int trace_module_notify(struct notifier_block *self,
{
struct module *mod = data;

- mutex_lock(&trace_types_lock);
mutex_lock(&event_mutex);
+ mutex_lock(&trace_types_lock);
switch (val) {
case MODULE_STATE_COMING:
trace_module_add_events(mod);
@@ -2434,8 +2434,8 @@ static int trace_module_notify(struct notifier_block *self,
trace_module_remove_events(mod);
break;
}
- mutex_unlock(&event_mutex);
mutex_unlock(&trace_types_lock);
+ mutex_unlock(&event_mutex);

return 0;
}
--
1.9.3

2017-09-05 21:59:57

by Tom Zanussi

Subject: [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

Synthetic event generation requires the reservation of a second event
while the reservation of a previous event is still in progress. The
trace_recursive_lock() check in ring_buffer_lock_reserve() prevents
this however.

This sets up a special reserve pathway for this particular case,
leaving the existing pathways untouched other than an additional check
in ring_buffer_lock_reserve() and trace_event_buffer_reserve(). Those
checks could be removed as well by duplicating the functions, but for
now we avoid that unless it proves necessary.

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ring_buffer.h | 3 +-
include/linux/trace_events.h | 10 +++++++
kernel/trace/ring_buffer.c | 10 +++++--
kernel/trace/trace.c | 65 +++++++++++++++++++++++++++-------------
kernel/trace/trace_events.c | 35 +++++++++++++++++-----
kernel/trace/trace_events_hist.c | 4 +--
6 files changed, 93 insertions(+), 34 deletions(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 74bc276..5459516 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -113,7 +113,8 @@ int ring_buffer_poll_wait(struct ring_buffer *buffer, int cpu,
void ring_buffer_change_overwrite(struct ring_buffer *buffer, int val);

struct ring_buffer_event *ring_buffer_lock_reserve(struct ring_buffer *buffer,
- unsigned long length);
+ unsigned long length,
+ bool allow_recursion);
int ring_buffer_unlock_commit(struct ring_buffer *buffer,
struct ring_buffer_event *event);
int ring_buffer_write(struct ring_buffer *buffer,
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index bfd2a53..eb03abb 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -151,6 +151,12 @@ struct ring_buffer_event *
int type, unsigned long len,
unsigned long flags, int pc);

+struct ring_buffer_event *
+trace_event_buffer_lock_reserve_recursive(struct ring_buffer **current_buffer,
+ struct trace_event_file *trace_file,
+ int type, unsigned long len,
+ unsigned long flags, int pc);
+
#define TRACE_RECORD_CMDLINE BIT(0)
#define TRACE_RECORD_TGID BIT(1)

@@ -210,6 +216,10 @@ void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
struct trace_event_file *trace_file,
unsigned long len);

+void *trace_event_buffer_reserve_recursive(struct trace_event_buffer *fbuffer,
+ struct trace_event_file *trace_file,
+ unsigned long len);
+
void trace_event_buffer_commit(struct trace_event_buffer *fbuffer);

enum {
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0bcc53e..8e5bcfa 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2830,6 +2830,7 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
* ring_buffer_lock_reserve - reserve a part of the buffer
* @buffer: the ring buffer to reserve from
* @length: the length of the data to reserve (excluding event header)
+ * @allow_recursion: flag allowing recursion check to be overridden
*
* Returns a reserved event on the ring buffer to copy directly to.
* The user of this interface will need to get the body to write into
@@ -2842,7 +2843,8 @@ int ring_buffer_unlock_commit(struct ring_buffer *buffer,
* If NULL is returned, then nothing has been allocated or locked.
*/
struct ring_buffer_event *
-ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
+ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length,
+ bool allow_recursion)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
@@ -2867,8 +2869,10 @@ struct ring_buffer_event *
if (unlikely(length > BUF_MAX_DATA_SIZE))
goto out;

- if (unlikely(trace_recursive_lock(cpu_buffer)))
- goto out;
+ if (unlikely(trace_recursive_lock(cpu_buffer))) {
+ if (!allow_recursion)
+ goto out;
+ }

event = rb_reserve_next_event(buffer, cpu_buffer, length);
if (!event)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ecdf456..1d009e4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -744,13 +744,14 @@ static inline void ftrace_trace_stack(struct trace_array *tr,

static __always_inline struct ring_buffer_event *
__trace_buffer_lock_reserve(struct ring_buffer *buffer,
- int type,
- unsigned long len,
- unsigned long flags, int pc)
+ int type,
+ unsigned long len,
+ unsigned long flags, int pc,
+ bool allow_recursion)
{
struct ring_buffer_event *event;

- event = ring_buffer_lock_reserve(buffer, len);
+ event = ring_buffer_lock_reserve(buffer, len, allow_recursion);
if (event != NULL)
trace_event_setup(event, type, flags, pc);

@@ -829,8 +830,8 @@ int __trace_puts(unsigned long ip, const char *str, int size)

local_save_flags(irq_flags);
buffer = global_trace.trace_buffer.buffer;
- event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc,
- irq_flags, pc);
+ event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, alloc,
+ irq_flags, pc, false);
if (!event)
return 0;

@@ -878,7 +879,7 @@ int __trace_bputs(unsigned long ip, const char *str)
local_save_flags(irq_flags);
buffer = global_trace.trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_BPUTS, size,
- irq_flags, pc);
+ irq_flags, pc, false);
if (!event)
return 0;

@@ -2150,7 +2151,7 @@ struct ring_buffer_event *
unsigned long len,
unsigned long flags, int pc)
{
- return __trace_buffer_lock_reserve(buffer, type, len, flags, pc);
+ return __trace_buffer_lock_reserve(buffer, type, len, flags, pc, false);
}

DEFINE_PER_CPU(struct ring_buffer_event *, trace_buffered_event);
@@ -2267,10 +2268,11 @@ void trace_buffered_event_disable(void)
static struct ring_buffer *temp_buffer;

struct ring_buffer_event *
-trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
- struct trace_event_file *trace_file,
- int type, unsigned long len,
- unsigned long flags, int pc)
+__trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
+ struct trace_event_file *trace_file,
+ int type, unsigned long len,
+ unsigned long flags, int pc,
+ bool allow_recursion)
{
struct ring_buffer_event *entry;
int val;
@@ -2291,7 +2293,7 @@ struct ring_buffer_event *
}

entry = __trace_buffer_lock_reserve(*current_rb,
- type, len, flags, pc);
+ type, len, flags, pc, allow_recursion);
/*
* If tracing is off, but we have triggers enabled
* we still need to look at the event data. Use the temp_buffer
@@ -2301,12 +2303,33 @@ struct ring_buffer_event *
if (!entry && trace_file->flags & EVENT_FILE_FL_TRIGGER_COND) {
*current_rb = temp_buffer;
entry = __trace_buffer_lock_reserve(*current_rb,
- type, len, flags, pc);
+ type, len, flags, pc, allow_recursion);
}
return entry;
}
+
+struct ring_buffer_event *
+trace_event_buffer_lock_reserve(struct ring_buffer **current_rb,
+ struct trace_event_file *trace_file,
+ int type, unsigned long len,
+ unsigned long flags, int pc)
+{
+ return __trace_event_buffer_lock_reserve(current_rb, trace_file, type,
+ len, flags, pc, false);
+}
EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve);

+struct ring_buffer_event *
+trace_event_buffer_lock_reserve_recursive(struct ring_buffer **current_rb,
+ struct trace_event_file *trace_file,
+ int type, unsigned long len,
+ unsigned long flags, int pc)
+{
+ return __trace_event_buffer_lock_reserve(current_rb, trace_file, type,
+ len, flags, pc, true);
+}
+EXPORT_SYMBOL_GPL(trace_event_buffer_lock_reserve_recursive);
+
static DEFINE_SPINLOCK(tracepoint_iter_lock);
static DEFINE_MUTEX(tracepoint_printk_mutex);

@@ -2548,7 +2571,7 @@ int unregister_ftrace_export(struct trace_export *export)
struct ftrace_entry *entry;

event = __trace_buffer_lock_reserve(buffer, TRACE_FN, sizeof(*entry),
- flags, pc);
+ flags, pc, false);
if (!event)
return;
entry = ring_buffer_event_data(event);
@@ -2628,7 +2651,7 @@ static void __ftrace_trace_stack(struct ring_buffer *buffer,
size *= sizeof(unsigned long);

event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
- sizeof(*entry) + size, flags, pc);
+ sizeof(*entry) + size, flags, pc, false);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
@@ -2759,7 +2782,7 @@ void trace_dump_stack(int skip)
__this_cpu_inc(user_stack_count);

event = __trace_buffer_lock_reserve(buffer, TRACE_USER_STACK,
- sizeof(*entry), flags, pc);
+ sizeof(*entry), flags, pc, false);
if (!event)
goto out_drop_count;
entry = ring_buffer_event_data(event);
@@ -2930,7 +2953,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
size = sizeof(*entry) + sizeof(u32) * len;
buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_BPRINT, size,
- flags, pc);
+ flags, pc, false);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
@@ -2986,7 +3009,7 @@ int trace_vbprintk(unsigned long ip, const char *fmt, va_list args)
local_save_flags(flags);
size = sizeof(*entry) + len + 1;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
- flags, pc);
+ flags, pc, false);
if (!event)
goto out;
entry = ring_buffer_event_data(event);
@@ -6097,7 +6120,7 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,

buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
- irq_flags, preempt_count());
+ irq_flags, preempt_count(), false);
if (unlikely(!event))
/* Ring buffer disabled, return as if not open for write */
return -EBADF;
@@ -6169,7 +6192,7 @@ static ssize_t tracing_splice_read_pipe(struct file *filp,

buffer = tr->trace_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_RAW_DATA, size,
- irq_flags, preempt_count());
+ irq_flags, preempt_count(), false);
if (!event)
/* Ring buffer disabled, return as if not open for write */
return -EBADF;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 889802c..7b90462 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -249,9 +249,9 @@ bool trace_event_ignore_this_pid(struct trace_event_file *trace_file)
}
EXPORT_SYMBOL_GPL(trace_event_ignore_this_pid);

-void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
- struct trace_event_file *trace_file,
- unsigned long len)
+void *__trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
+ struct trace_event_file *trace_file,
+ unsigned long len, bool allow_recursion)
{
struct trace_event_call *event_call = trace_file->event_call;

@@ -271,18 +271,39 @@ void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
fbuffer->pc--;
fbuffer->trace_file = trace_file;

- fbuffer->event =
- trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
- event_call->event.type, len,
- fbuffer->flags, fbuffer->pc);
+ if (!allow_recursion)
+ fbuffer->event =
+ trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
+ event_call->event.type, len,
+ fbuffer->flags, fbuffer->pc);
+ else
+ fbuffer->event =
+ trace_event_buffer_lock_reserve_recursive(&fbuffer->buffer, trace_file,
+ event_call->event.type, len,
+ fbuffer->flags, fbuffer->pc);
if (!fbuffer->event)
return NULL;

fbuffer->entry = ring_buffer_event_data(fbuffer->event);
return fbuffer->entry;
}
+
+void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
+ struct trace_event_file *trace_file,
+ unsigned long len)
+{
+ return __trace_event_buffer_reserve(fbuffer, trace_file, len, false);
+}
EXPORT_SYMBOL_GPL(trace_event_buffer_reserve);

+void *trace_event_buffer_reserve_recursive(struct trace_event_buffer *fbuffer,
+ struct trace_event_file *trace_file,
+ unsigned long len)
+{
+ return __trace_event_buffer_reserve(fbuffer, trace_file, len, true);
+}
+EXPORT_SYMBOL_GPL(trace_event_buffer_reserve_recursive);
+
int trace_event_reg(struct trace_event_call *call,
enum trace_reg type, void *data)
{
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb77eee..67a25bf 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -674,8 +674,8 @@ static notrace void trace_event_raw_event_synth(void *__data,

fields_size = event->n_u64 * sizeof(u64);

- entry = trace_event_buffer_reserve(&fbuffer, trace_file,
- sizeof(*entry) + fields_size);
+ entry = trace_event_buffer_reserve_recursive(&fbuffer, trace_file,
+ sizeof(*entry) + fields_size);
if (!entry)
return;

--
1.9.3

2017-09-05 22:00:19

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 39/40] tracing: Add a clock attribute for hist triggers

The default clock if timestamps are used in a histogram is "global".
If timestamps aren't used, the clock is irrelevant.

Use the "clock=" param only if you want to override the default
"global" clock for a histogram with timestamps.

Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.txt | 9 +++++++++
kernel/trace/trace_events_hist.c | 34 +++++++++++++++++++++++++++++++---
2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 7ee720b..9bd4b7f 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -2173,6 +2173,15 @@ features have been added to the hist trigger support:
default it is in units of nanoseconds; appending '.usecs' to a
common_timestamp field changes the units to microseconds.

+A note on inter-event timestamps: If $common_timestamp is used in a
+histogram, the trace buffer is automatically switched over to using
+absolute timestamps and the "global" trace clock, in order to avoid
+bogus timestamp differences with other clocks that aren't coherent
+across CPUs. This can be overridden by specifying one of the other
+trace clocks instead, using the "clock=XXX" hist trigger attribute,
+where XXX is any of the clocks listed in the tracing/trace_clock
+pseudo-file.
+
These features are described in more detail in the following sections.

6.3.1 Histogram Variables
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 655b731..eb77eee 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -241,6 +241,7 @@ struct hist_trigger_attrs {
char *vals_str;
char *sort_key_str;
char *name;
+ char *clock;
bool pause;
bool cont;
bool clear;
@@ -1701,6 +1702,7 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
kfree(attrs->sort_key_str);
kfree(attrs->keys_str);
kfree(attrs->vals_str);
+ kfree(attrs->clock);
kfree(attrs);
}

@@ -1740,7 +1742,16 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
else if (strncmp(str, "name=", strlen("name=")) == 0)
attrs->name = kstrdup(str, GFP_KERNEL);
- else if (strncmp(str, "size=", strlen("size=")) == 0) {
+ else if (strncmp(str, "clock=", strlen("clock=")) == 0) {
+ strsep(&str, "=");
+ if (!str) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ str = strstrip(str);
+ attrs->clock = kstrdup(str, GFP_KERNEL);
+ } else if (strncmp(str, "size=", strlen("size=")) == 0) {
int map_bits = parse_map_size(str);

if (map_bits < 0) {
@@ -1803,6 +1814,12 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
goto free;
}

+ if (!attrs->clock) {
+ attrs->clock = kstrdup("global", GFP_KERNEL);
+ if (!attrs->clock)
+ goto free;
+ }
+
return attrs;
free:
destroy_hist_trigger_attrs(attrs);
@@ -4644,6 +4661,8 @@ static int event_hist_trigger_print(struct seq_file *m,
seq_puts(m, ".descending");
}
seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
+ if (hist_data->enable_timestamps)
+ seq_printf(m, ":clock=%s", hist_data->attrs->clock);

print_actions_spec(m, hist_data);

@@ -4907,10 +4926,19 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
goto out;
}

- ret++;
+ if (hist_data->enable_timestamps) {
+ char *clock = hist_data->attrs->clock;
+
+ ret = tracing_set_clock(file->tr, hist_data->attrs->clock);
+ if (ret) {
+ hist_err("Couldn't set trace_clock: ", clock);
+ goto out;
+ }

- if (hist_data->enable_timestamps)
tracing_set_time_stamp_abs(file->tr, true);
+ }
+
+ ret++;
out:
return ret;
}
--
1.9.3

2017-09-05 22:00:51

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 38/40] tracing: Make tracing_set_clock() non-static

Allow tracing code outside of trace.c to access tracing_set_clock().

Some uses, such as latency calculations, may require a particular
clock in order to function properly.

Also, add an accessor returning the current clock string.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 2 +-
kernel/trace/trace.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d40839d..ecdf456 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6207,7 +6207,7 @@ static int tracing_clock_show(struct seq_file *m, void *v)
return 0;
}

-static int tracing_set_clock(struct trace_array *tr, const char *clockstr)
+int tracing_set_clock(struct trace_array *tr, const char *clockstr)
{
int i;

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 7b78762..1f3443a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -288,6 +288,7 @@ enum {
extern void trace_array_put(struct trace_array *tr);

extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);
+extern int tracing_set_clock(struct trace_array *tr, const char *clockstr);

extern bool trace_clock_in_ns(struct trace_array *tr);

--
1.9.3

2017-09-05 22:01:13

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 36/40] tracing: Remove lookups from tracing_map hitcount

Lookups inflate the hitcount, making it essentially useless. Only
inserts and updates should really affect the hitcount anyway, so
explicitly filter lookups out.
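The accounting change can be illustrated with a toy userspace map;
this is a hypothetical sketch of the idea, not the tracing_map code
itself:

```c
#include <stdbool.h>
#include <string.h>

/* Toy model of tracing_map accounting: the hitcount should reflect
 * inserts and updates only, so pure lookups are filtered out. */
#define NENTRIES 8

struct toy_map {
	const char *keys[NENTRIES];
	long vals[NENTRIES];
	long hits;	/* inflated by lookups before the fix */
};

static long *map_find(struct toy_map *map, const char *key, bool lookup_only)
{
	for (int i = 0; i < NENTRIES; i++) {
		if (map->keys[i] && strcmp(map->keys[i], key) == 0) {
			if (!lookup_only)	/* the fix: don't count reads */
				map->hits++;
			return &map->vals[i];
		}
		if (!map->keys[i]) {
			if (lookup_only)
				return NULL;	/* lookups never insert */
			map->keys[i] = key;
			map->hits++;
			return &map->vals[i];
		}
	}
	return NULL;
}
```

With this, repeated lookup_only calls leave the hitcount untouched
while inserts and updates still bump it.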

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/tracing_map.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index a4e5a56..f8e2338 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -538,7 +538,8 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
if (test_key && test_key == key_hash) {
if (entry->val &&
keys_match(key, entry->val->key, map->key_size)) {
- atomic64_inc(&map->hits);
+ if (!lookup_only)
+ atomic64_inc(&map->hits);
return entry->val;
} else if (unlikely(!entry->val)) {
/*
--
1.9.3

2017-09-05 22:01:10

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 37/40] tracing: Add inter-event hist trigger Documentation

Add background and details on inter-event hist triggers, including
hist variables, synthetic events, and actions.

Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Baohong Liu <[email protected]>
---
Documentation/trace/events.txt | 385 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 385 insertions(+)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index f271d87..7ee720b 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -571,6 +571,7 @@ The following commands are supported:
.sym-offset display an address as a symbol and offset
.syscall display a syscall id as a system call name
.execname display a common_pid as a program name
+ .usecs display a $common_timestamp in microseconds

Note that in general the semantics of a given field aren't
interpreted when applying a modifier to it, but there are some
@@ -2101,3 +2102,387 @@ The following commands are supported:
Hits: 489
Entries: 7
Dropped: 0
+
+6.3 Inter-event hist triggers
+-----------------------------
+
+Inter-event hist triggers are hist triggers that combine values from
+one or more other events and create a histogram using that data. Data
+from an inter-event histogram can in turn become the source for
+further combined histograms, thus providing a chain of related
+histograms, which is important for some applications.
+
+The most important example of an inter-event quantity that can be used
+in this manner is latency, which is simply a difference in timestamps
+between two events (although trace events don't have an externally
+visible timestamp field, the inter-event hist trigger support adds a
+pseudo-field to all events named '$common_timestamp' which can be used
+as if it were an actual event field). Although latency is the most
+important inter-event quantity, note that because the support is
+completely general across the trace event subsystem, any event field
+can be used in an inter-event quantity.
+
+An example of a histogram that combines data from other histograms
+into a useful chain would be a 'wakeupswitch latency' histogram that
+combines a 'wakeup latency' histogram and a 'switch latency'
+histogram.
+
+Normally, a hist trigger specification consists of a (possibly
+compound) key along with one or more numeric values, which are
+continually updated sums associated with that key. A histogram
+specification in this case consists of individual key and value
+specifications that refer to trace event fields associated with a
+single event type.
+
+The inter-event hist trigger extension allows fields from multiple
+events to be referenced and combined into a multi-event histogram
+specification. In support of this overall goal, a few enabling
+features have been added to the hist trigger support:
+
+ - In order to compute an inter-event quantity, a value from one
+ event needs to be saved and then referenced from another event. This
+ requires the introduction of support for histogram 'variables'.
+
+ - The computation of inter-event quantities and their combination
+ require some minimal amount of support for applying simple
+ expressions to variables (+ and -).
+
+ - A histogram consisting of inter-event quantities isn't logically a
+ histogram on either event (so having the 'hist' file for either
+ event host the histogram output doesn't really make sense). To
+ address the idea that the histogram is associated with a
+ combination of events, support is added allowing the creation of
+ 'synthetic' events that are events derived from other events.
+ These synthetic events are full-fledged events just like any other
+ and can be used as such, as for instance to create the
+ 'combination' histograms mentioned previously.
+
+ - A set of 'actions' can be associated with histogram entries -
+ these can be used to generate the previously mentioned synthetic
+ events, but can also be used for other purposes, such as for
+ example saving context when a 'max' latency has been hit.
+
+ - Trace events don't have a 'timestamp' associated with them, but
+ there is an implicit timestamp saved along with an event in the
+ underlying ftrace ring buffer. This timestamp is now exposed as a
+ synthetic field named '$common_timestamp' which can be used in
+ histograms as if it were any other event field. Note that it has
+ a '$' prefixed to it - this is meant to indicate that it isn't an
+ actual field in the trace format but rather is a synthesized value
+ that nonetheless can be used as if it were an actual field. By
+ default it is in units of nanoseconds; appending '.usecs' to a
+ common_timestamp field changes the units to microseconds.
+
+These features are described in more detail in the following sections.
+
+6.3.1 Histogram Variables
+-------------------------
+
+Variables are simply named locations used for saving and retrieving
+values between matching events. A 'matching' event is defined as an
+event that has a matching key - if a variable is saved for a histogram
+entry corresponding to that key, any subsequent event with a matching
+key can access that variable.
+
+A variable's value is normally available to any subsequent event until
+it is set to something else by a subsequent event. The one exception
+to that rule is that any variable used in an expression is essentially
+'read-once' - once it's used by an expression in a subsequent event,
+it's reset to its 'unset' state, which means it can't be used again
+unless it's set again. This ensures not only that an event doesn't
+use an uninitialized variable in a calculation, but that that variable
+is used only once and not for any unrelated subsequent match.
+
+The basic syntax for saving a variable is to prefix an event field
+with a unique variable name (one not corresponding to any keyword)
+followed by an '=' sign.
+
+Either keys or values can be saved and retrieved in this way. This
+creates a variable named 'ts0' for a histogram entry with the key
+'next_pid':
+
+ # echo 'hist:keys=next_pid:vals=$ts0:ts0=$common_timestamp ...' >> \
+ event/trigger
+
+The ts0 variable can be accessed by any subsequent event having the
+same pid as 'next_pid'.
+
+Variable references are formed by prepending the variable name with
+the '$' sign. Thus for example, the ts0 variable above would be
+referenced as '$ts0' in expressions.
+
+Because 'vals=' is used, the $common_timestamp variable value above
+will also be summed as a normal histogram value would (though for a
+timestamp it makes little sense).
+
+The below shows that a key value can also be saved in the same way:
+
+ # echo 'hist:timer_pid=common_pid:key=timer_pid ...' >> event/trigger
+
+If a variable isn't a key variable or prefixed with 'vals=', the
+associated event field will be saved in a variable but won't be summed
+as a value:
+
+ # echo 'hist:keys=next_pid:ts1=$common_timestamp ...' >> event/trigger
+
+Multiple variables can be assigned at the same time. The below would
+result in both ts0 and b being created as variables, with both
+common_timestamp and field1 additionally being summed as values:
+
+ # echo 'hist:keys=pid:vals=$ts0,$b:ts0=$common_timestamp,b=field1 ...' >> \
+ event/trigger
+
+Note that variable assignments can appear either preceding or
+following their use. The command below behaves identically to the
+command above:
+
+ # echo 'hist:keys=pid:ts0=$common_timestamp,b=field1:vals=$ts0,$b ...' >> \
+ event/trigger
+
+Any number of variables not bound to a 'vals=' prefix can also be
+assigned by simply separating them with colons. Below is the same
+thing but without the values being summed in the histogram:
+
+ # echo 'hist:keys=pid:ts0=$common_timestamp:b=field1 ...' >> event/trigger
+
+Variables set as above can be referenced and used in expressions on
+another event.
+
+For example, here's how a latency can be calculated:
+
+ # echo 'hist:keys=pid,prio:ts0=$common_timestamp ...' >> event1/trigger
+ # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp-$ts0 ...' >> event2/trigger
+
+In the first line above, the event's timestamp is saved into the
+variable ts0. In the next line, ts0 is subtracted from the second
+event's timestamp to produce the latency, which is then assigned into
+yet another variable, 'wakeup_lat'. The hist trigger below in turn
+makes use of the wakeup_lat variable to compute a combined latency
+using the same key and variable from yet another event:
+
+ # echo 'hist:key=pid:wakeupswitch_lat=$wakeup_lat+$switchtime_lat ...' >> event3/trigger
+
+6.3.2 Synthetic Events
+----------------------
+
+Synthetic events are user-defined events generated from hist trigger
+variables or fields associated with one or more other events. Their
+purpose is to provide a mechanism for displaying data spanning
+multiple events consistent with the existing and already familiar
+usage for normal events.
+
+To define a synthetic event, the user writes a simple specification
+consisting of the name of the new event along with one or more
+variables and their types, which can be any valid field type,
+separated by semicolons, to the tracing/synthetic_events file.
+
+For instance, the following creates a new event named 'wakeup_latency'
+with 3 fields: lat, pid, and prio. Each of those fields is simply a
+variable reference to a variable on another event:
+
+ # echo 'wakeup_latency \
+ u64 lat; \
+ pid_t pid; \
+ int prio' >> \
+ /sys/kernel/debug/tracing/synthetic_events
+
+Reading the tracing/synthetic_events file lists all the currently
+defined synthetic events, in this case the event defined above:
+
+ # cat /sys/kernel/debug/tracing/synthetic_events
+ wakeup_latency u64 lat; pid_t pid; int prio
+
+An existing synthetic event definition can be removed by prepending
+the command that defined it with a '!':
+
+ # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
+ /sys/kernel/debug/tracing/synthetic_events
+
+At this point, there isn't yet an actual 'wakeup_latency' event
+instantiated in the event subsystem - for this to happen, a 'hist
+trigger action' needs to be instantiated and bound to actual fields
+and variables defined on other events (see Section 6.3.3 below).
+
+Once that is done, an event instance is created, and a histogram can
+be defined using it:
+
+ # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
+ /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
+
+The new event is created under the tracing/events/synthetic/ directory
+and looks and behaves just like any other event:
+
+ # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
+ enable filter format hist id trigger
+
+Like any other event, once a histogram is enabled for the event, the
+output can be displayed by reading the event's 'hist' file.
+
+6.3.3 Hist trigger 'actions'
+----------------------------
+
+A hist trigger 'action' is a function that's executed whenever a
+histogram entry is added or updated.
+
+The default 'action' if no special function is explicitly specified is
+as it always has been, to simply update the set of values associated
+with an entry. Some applications, however, may want to perform
+additional actions at that point, such as generate another event, or
+compare and save a maximum.
+
+The following additional actions are available. To specify an action
+for a given event, simply specify the action between colons in the
+hist trigger specification.
+
+ - onmatch(matching.event).<synthetic_event_name>(param list)
+
+ The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
+ trigger action is invoked whenever an event matches and the
+ histogram entry would be added or updated. It causes the named
+ synthetic event to be generated with the values given in the
+ 'param list'. The result is the generation of a synthetic event
+ that consists of the values contained in those variables at the
+ time the invoking event was hit.
+
+ The 'param list' consists of one or more parameters which may be
+ either variables or fields defined on either the 'matching.event'
+ or the target event. The variables or fields specified in the
+ param list may be either fully-qualified or unqualified. If a
+ variable is specified as unqualified, it must be unique between
+ the two events. A field name used as a param can be unqualified
+ if it refers to the target event, but must be fully qualified if
+ it refers to the matching event. A fully-qualified name is of the
+ form 'system.event_name.$var_name' or 'system.event_name.field'.
+
+ The 'matching.event' specification is simply the fully qualified
+ event name of the event that matches the target event for the
+ onmatch() functionality, in the form 'system.event_name'.
+
+ Finally, the number and type of variables/fields in the 'param
+ list' must match the number and types of the fields in the
+ synthetic event being generated.
+
+ As an example, the below defines a simple synthetic event and uses
+ a variable defined on the sched_wakeup_new event as a parameter
+ when invoking the synthetic event. Here we define the synthetic
+ event:
+
+ # echo 'wakeup_new_test pid_t pid' >> \
+ /sys/kernel/debug/tracing/synthetic_events
+
+ # cat /sys/kernel/debug/tracing/synthetic_events
+ wakeup_new_test pid_t pid
+
+ The following hist trigger both defines the missing testpid
+ variable and specifies an onmatch() action that generates a
+ wakeup_new_test synthetic event whenever a sched_wakeup_new event
+ occurs, which because of the 'if comm == "cyclictest"' filter only
+ happens when the executable is cyclictest:
+
+ # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
+ wakeup_new_test($testpid) if comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
+
+ Creating and displaying a histogram based on those events is now
+ just a matter of using the fields and new synthetic event in the
+ tracing/events/synthetic directory, as usual:
+
+ # echo 'hist:keys=pid:sort=pid' >> \
+ /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
+
+ Running 'cyclictest' should cause wakeup_new events to generate
+ wakeup_new_test synthetic events which should result in histogram
+ output in the wakeup_new_test event's hist file:
+
+ # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
+
+ A more typical usage would be to use two events to calculate a
+ latency. The following example uses a set of hist triggers to
+ produce a 'wakeup_latency' histogram:
+
+ First, we define a 'wakeup_latency' synthetic event:
+
+ # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
+ /sys/kernel/debug/tracing/synthetic_events
+
+ Next, we specify that whenever we see a sched_waking event for a
+ cyclictest thread, the timestamp is saved in a 'ts0' variable:
+
+ # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=$common_timestamp.usecs \
+ if comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+ Then, when the corresponding thread is actually scheduled onto the
+ CPU by a sched_switch event, calculate the latency and use that
+ along with another variable and an event field to generate a
+ wakeup_latency synthetic event:
+
+ # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\
+ onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
+ $saved_pid,next_prio) if next_comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+ We also need to create a histogram on the wakeup_latency synthetic
+ event in order to aggregate the generated synthetic event data:
+
+ # echo 'hist:keys=pid,prio,lat:sort=pid,lat' >> \
+ /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
+
+ Finally, once we've run cyclictest to actually generate some
+ events, we can see the output by looking at the wakeup_latency
+ synthetic event's hist file:
+
+ # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/hist
+
+ - onmax(var).save(field,...)
+
+ The 'onmax(var).save(field,...)' hist trigger action is invoked
+ whenever the value of 'var' associated with a histogram entry
+ exceeds the current maximum contained in that variable.
+
+ The end result is that the trace event fields specified as the
+ onmax.save() params will be saved if 'var' exceeds the current
+ maximum for that hist trigger entry. This allows context from the
+ event that exhibited the new maximum to be saved for later
+ reference. When the histogram is displayed, additional fields
+ displaying the saved values will be printed.
+
+ As an example, the below defines a couple of hist triggers, one for
+ sched_waking and another for sched_switch, keyed on pid. Whenever
+ a sched_waking occurs, the timestamp is saved in the entry
+ corresponding to the current pid, and when the scheduler switches
+ back to that pid, the timestamp difference is calculated. If the
+ resulting latency, stored in wakeup_lat, exceeds the current
+ maximum latency, the values specified in the save() fields are
+ recorded:
+
+ # echo 'hist:keys=pid:ts0=$common_timestamp.usecs \
+ if comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
+
+ # echo 'hist:keys=next_pid:\
+ wakeup_lat=$common_timestamp.usecs-$ts0:\
+ onmax($wakeup_lat).save(next_comm,prev_pid,prev_prio,prev_comm) \
+ if next_comm=="cyclictest"' >> \
+ /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
+
+ When the histogram is displayed, the max value and the saved
+ values corresponding to the max are displayed following the rest
+ of the fields:
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
+ { next_pid: 2255 } hitcount: 239
+ common_timestamp-ts0: 0
+ max: 27
+ next_comm: cyclictest
+ prev_pid: 0 prev_prio: 120 prev_comm: swapper/1
+
+ { next_pid: 2256 } hitcount: 2355
+ common_timestamp-ts0: 0
+ max: 49
+ next_comm: cyclictest
+ prev_pid: 0 prev_prio: 120 prev_comm: swapper/0
+
+ Totals:
+ Hits: 12970
+ Entries: 2
+ Dropped: 0
--
1.9.3

2017-09-05 21:59:36

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 27/40] tracing: Add support for 'synthetic' events

Synthetic events are user-defined events generated from hist trigger
variables saved from one or more other events.

To define a synthetic event, the user writes a simple specification
consisting of the name of the new event along with one or more
variables and their type(s), to the tracing/synthetic_events file.

For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio:

# echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
/sys/kernel/debug/tracing/synthetic_events

Reading the tracing/synthetic_events file lists all the
currently-defined synthetic events, in this case the event we defined
above:

# cat /sys/kernel/debug/tracing/synthetic_events
wakeup_latency u64 lat; pid_t pid; int prio
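The spec format — an event name followed by ';'-separated "type field" pairs — can be modeled with a short userspace sketch. This is a hypothetical helper (`count_synth_fields` is not part of the patch; the real parsing is done by create_synth_event()/parse_synth_field() in the kernel):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model of the synthetic_events spec format:
 * "<event_name> type field[; type field] ..." */
static int count_synth_fields(const char *spec, char *name, size_t name_len)
{
	char buf[256], *tok;
	int n = 0;

	strncpy(buf, spec, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';

	tok = strtok(buf, " ");		/* first token: the event name */
	if (!tok)
		return -1;
	strncpy(name, tok, name_len - 1);
	name[name_len - 1] = '\0';

	/* each remaining ';'-separated chunk is one "type field" pair */
	for (tok = strtok(NULL, ";"); tok; tok = strtok(NULL, ";"))
		n++;

	return n;
}
```

For the wakeup_latency spec above, this yields the event name and a field count of 3.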

At this point, the synthetic event is ready to use, and a histogram
can be defined using it:

# echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
/sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger

The new event is created under the tracing/events/synthetic/ directory
and looks and behaves just like any other event:

# ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
enable filter format hist id trigger

Although a histogram can be defined for it, nothing will happen until
an action traces that event via the trace_synth() function. The
trace_synth() function is very similar to all the other trace_*
invocations spread throughout the kernel, except that in this case the
trace_ function and its corresponding tracepoint aren't statically
generated but are defined by the user at run time.

How this can be automatically hooked up via a hist trigger 'action' is
discussed in a subsequent patch.
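The run-time dispatch that trace_synth() performs — walking a NULL-terminated array of (func, data) probe pairs attached to the dynamic tracepoint — can be sketched in plain C. The types and names below are hypothetical stand-ins for the kernel's struct tracepoint_func machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of trace_synth()'s probe loop: iterate a
 * NULL-terminated list of (func, data) pairs, calling each probe. */
typedef void (*synth_probe_t)(void *data, uint64_t *vals, unsigned int idx);

struct probe_entry {
	synth_probe_t func;
	void *data;
};

static void dispatch_synth(struct probe_entry *funcs,
			   uint64_t *vals, unsigned int idx)
{
	if (!funcs)
		return;
	do {
		funcs->func(funcs->data, vals, idx);
	} while ((++funcs)->func);	/* stop at the NULL sentinel */
}

/* demo probe used below: accumulates vals[idx] into probe_total */
static uint64_t probe_total;
static void demo_probe(void *data, uint64_t *vals, unsigned int idx)
{
	(void)data;
	probe_total += vals[idx];
}
```

With two demo probes registered, one dispatch visits both, just as every attached probe of a tracepoint fires on each event.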

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 863 +++++++++++++++++++++++++++++++++++++++
1 file changed, 863 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c3edf7a..2906b92 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -20,10 +20,16 @@
#include <linux/slab.h>
#include <linux/stacktrace.h>
#include <linux/rculist.h>
+#include <linux/tracefs.h>

#include "tracing_map.h"
#include "trace.h"

+#define SYNTH_SYSTEM "synthetic"
+#define SYNTH_FIELDS_MAX 16
+
+#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
+
struct hist_field;

typedef u64 (*hist_field_fn_t) (struct hist_field *field,
@@ -270,6 +276,26 @@ struct hist_trigger_data {
unsigned int n_actions;
};

+struct synth_field {
+ char *type;
+ char *name;
+ size_t size;
+ bool is_signed;
+ bool is_string;
+};
+
+struct synth_event {
+ struct list_head list;
+ int ref;
+ char *name;
+ struct synth_field **fields;
+ unsigned int n_fields;
+ unsigned int n_u64;
+ struct trace_event_class class;
+ struct trace_event_call call;
+ struct tracepoint *tp;
+};
+
struct action_data;

typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
@@ -282,6 +308,798 @@ struct action_data {
unsigned int var_ref_idx;
};

+static LIST_HEAD(synth_event_list);
+static DEFINE_MUTEX(synth_event_mutex);
+
+struct synth_trace_event {
+ struct trace_entry ent;
+ u64 fields[];
+};
+
+static int synth_event_define_fields(struct trace_event_call *call)
+{
+ struct synth_trace_event trace;
+ int offset = offsetof(typeof(trace), fields);
+ struct synth_event *event = call->data;
+ unsigned int i, size, n_u64;
+ char *name, *type;
+ bool is_signed;
+ int ret = 0;
+
+ for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
+ size = event->fields[i]->size;
+ is_signed = event->fields[i]->is_signed;
+ type = event->fields[i]->type;
+ name = event->fields[i]->name;
+ ret = trace_define_field(call, type, name, offset, size,
+ is_signed, FILTER_OTHER);
+ if (ret)
+ break;
+
+ if (event->fields[i]->is_string) {
+ offset += STR_VAR_LEN_MAX;
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ offset += sizeof(u64);
+ n_u64++;
+ }
+ }
+
+ event->n_u64 = n_u64;
+
+ return ret;
+}
+
+static bool synth_field_signed(char *type)
+{
+ if (strncmp(type, "u", 1) == 0)
+ return false;
+
+ return true;
+}
+
+static bool synth_field_is_string(char *type)
+{
+ if (strstr(type, "char[") != NULL)
+ return true;
+
+ return false;
+}
+
+static int synth_field_string_size(char *type)
+{
+ char buf[4], *end, *start;
+ unsigned int len, size;
+ int err;
+
+ start = strstr(type, "char[");
+ if (start == NULL)
+ return -EINVAL;
+ start += strlen("char[");
+
+ end = strchr(type, ']');
+ if (!end || end < start)
+ return -EINVAL;
+
+ len = end - start;
+ if (len > 2)
+ return -EINVAL;
+
+ strncpy(buf, start, len);
+ buf[len] = '\0';
+
+ err = kstrtouint(buf, 0, &size);
+ if (err)
+ return err;
+
+ if (size > STR_VAR_LEN_MAX)
+ return -EINVAL;
+
+ return size;
+}
+
+static int synth_field_size(char *type)
+{
+ int size = 0;
+
+ if (strcmp(type, "s64") == 0)
+ size = sizeof(s64);
+ else if (strcmp(type, "u64") == 0)
+ size = sizeof(u64);
+ else if (strcmp(type, "s32") == 0)
+ size = sizeof(s32);
+ else if (strcmp(type, "u32") == 0)
+ size = sizeof(u32);
+ else if (strcmp(type, "s16") == 0)
+ size = sizeof(s16);
+ else if (strcmp(type, "u16") == 0)
+ size = sizeof(u16);
+ else if (strcmp(type, "s8") == 0)
+ size = sizeof(s8);
+ else if (strcmp(type, "u8") == 0)
+ size = sizeof(u8);
+ else if (strcmp(type, "char") == 0)
+ size = sizeof(char);
+ else if (strcmp(type, "unsigned char") == 0)
+ size = sizeof(unsigned char);
+ else if (strcmp(type, "int") == 0)
+ size = sizeof(int);
+ else if (strcmp(type, "unsigned int") == 0)
+ size = sizeof(unsigned int);
+ else if (strcmp(type, "long") == 0)
+ size = sizeof(long);
+ else if (strcmp(type, "unsigned long") == 0)
+ size = sizeof(unsigned long);
+ else if (strcmp(type, "pid_t") == 0)
+ size = sizeof(pid_t);
+ else if (synth_field_is_string(type))
+ size = synth_field_string_size(type);
+
+ return size;
+}
+
+static const char *synth_field_fmt(char *type)
+{
+ const char *fmt = "%llu";
+
+ if (strcmp(type, "s64") == 0)
+ fmt = "%lld";
+ else if (strcmp(type, "u64") == 0)
+ fmt = "%llu";
+ else if (strcmp(type, "s32") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u32") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "s16") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u16") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "s8") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "u8") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "char") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "unsigned char") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "int") == 0)
+ fmt = "%d";
+ else if (strcmp(type, "unsigned int") == 0)
+ fmt = "%u";
+ else if (strcmp(type, "long") == 0)
+ fmt = "%ld";
+ else if (strcmp(type, "unsigned long") == 0)
+ fmt = "%lu";
+ else if (strcmp(type, "pid_t") == 0)
+ fmt = "%d";
+ else if (strstr(type, "[") != NULL)
+ fmt = "%s";
+
+ return fmt;
+}
+
+static enum print_line_t print_synth_event(struct trace_iterator *iter,
+ int flags,
+ struct trace_event *event)
+{
+ struct trace_array *tr = iter->tr;
+ struct trace_seq *s = &iter->seq;
+ struct synth_trace_event *entry;
+ struct synth_event *se;
+ unsigned int i, n_u64;
+ char print_fmt[32];
+ const char *fmt;
+
+ entry = (struct synth_trace_event *)iter->ent;
+ se = container_of(event, struct synth_event, call.event);
+
+ trace_seq_printf(s, "%s: ", se->name);
+
+ for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
+ if (trace_seq_has_overflowed(s))
+ goto end;
+
+ fmt = synth_field_fmt(se->fields[i]->type);
+
+ /* parameter types */
+ if (tr->trace_flags & TRACE_ITER_VERBOSE)
+ trace_seq_printf(s, "%s ", fmt);
+
+ sprintf(print_fmt, "%%s=%s%%s", fmt);
+
+ /* parameter values */
+ if (se->fields[i]->is_string) {
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ (char *)entry->fields[n_u64],
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ entry->fields[n_u64],
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64++;
+ }
+ }
+end:
+ trace_seq_putc(s, '\n');
+
+ return trace_handle_return(s);
+}
+
+static struct trace_event_functions synth_event_funcs = {
+ .trace = print_synth_event
+};
+
+static notrace void trace_event_raw_event_synth(void *__data,
+ u64 *var_ref_vals,
+ unsigned int var_ref_idx)
+{
+ struct trace_event_file *trace_file = __data;
+ struct synth_trace_event *entry;
+ struct trace_event_buffer fbuffer;
+ struct synth_event *event;
+ unsigned int i, n_u64;
+ int fields_size = 0;
+
+ event = trace_file->event_call->data;
+
+ if (trace_trigger_soft_disabled(trace_file))
+ return;
+
+ fields_size = event->n_u64 * sizeof(u64);
+
+ entry = trace_event_buffer_reserve(&fbuffer, trace_file,
+ sizeof(*entry) + fields_size);
+ if (!entry)
+ return;
+
+ for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
+ if (event->fields[i]->is_string) {
+ char *str_val = (char *)var_ref_vals[var_ref_idx + i];
+ char *str_field = (char *)&entry->fields[n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ } else {
+ entry->fields[n_u64] = var_ref_vals[var_ref_idx + i];
+ n_u64++;
+ }
+ }
+
+ trace_event_buffer_commit(&fbuffer);
+}
+
+static void free_synth_event_print_fmt(struct trace_event_call *call)
+{
+ if (call)
+ kfree(call->print_fmt);
+}
+
+static int __set_synth_event_print_fmt(struct synth_event *event,
+ char *buf, int len)
+{
+ const char *fmt;
+ int pos = 0;
+ int i;
+
+ /* When len=0, we just calculate the needed length */
+#define LEN_OR_ZERO (len ? len - pos : 0)
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+ for (i = 0; i < event->n_fields; i++) {
+ fmt = synth_field_fmt(event->fields[i]->type);
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
+ event->fields[i]->name, fmt,
+ i == event->n_fields - 1 ? "" : ", ");
+ }
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
+
+ for (i = 0; i < event->n_fields; i++) {
+ pos += snprintf(buf + pos, LEN_OR_ZERO,
+ ", REC->%s", event->fields[i]->name);
+ }
+
+#undef LEN_OR_ZERO
+
+ /* return the length of print_fmt */
+ return pos;
+}
+
+static int set_synth_event_print_fmt(struct trace_event_call *call)
+{
+ struct synth_event *event = call->data;
+ char *print_fmt;
+ int len;
+
+ /* First: called with 0 length to calculate the needed length */
+ len = __set_synth_event_print_fmt(event, NULL, 0);
+
+ print_fmt = kmalloc(len + 1, GFP_KERNEL);
+ if (!print_fmt)
+ return -ENOMEM;
+
+ /* Second: actually write the @print_fmt */
+ __set_synth_event_print_fmt(event, print_fmt, len + 1);
+ call->print_fmt = print_fmt;
+
+ return 0;
+}
+
+static void free_synth_field(struct synth_field *field)
+{
+ kfree(field->type);
+ kfree(field->name);
+ kfree(field);
+}
+
+static struct synth_field *parse_synth_field(char *field_type,
+ char *field_name)
+{
+ struct synth_field *field;
+ int len, ret = 0;
+ char *array;
+
+ if (field_type[0] == ';')
+ field_type++;
+
+ len = strlen(field_name);
+ if (field_name[len - 1] == ';')
+ field_name[len - 1] = '\0';
+
+ field = kzalloc(sizeof(*field), GFP_KERNEL);
+ if (!field)
+ return ERR_PTR(-ENOMEM);
+
+ len = strlen(field_type) + 1;
+ array = strchr(field_name, '[');
+ if (array)
+ len += strlen(array);
+ field->type = kzalloc(len, GFP_KERNEL);
+ if (!field->type) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ strcat(field->type, field_type);
+ if (array) {
+ strcat(field->type, array);
+ *array = '\0';
+ }
+
+ field->size = synth_field_size(field->type);
+ if (!field->size) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ if (synth_field_is_string(field->type))
+ field->is_string = true;
+
+ field->is_signed = synth_field_signed(field->type);
+
+ field->name = kstrdup(field_name, GFP_KERNEL);
+ if (!field->name) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ out:
+ return field;
+ free:
+ free_synth_field(field);
+ field = ERR_PTR(ret);
+ goto out;
+}
+
+static void free_synth_tracepoint(struct tracepoint *tp)
+{
+ if (!tp)
+ return;
+
+ kfree(tp->name);
+ kfree(tp);
+}
+
+static struct tracepoint *alloc_synth_tracepoint(char *name)
+{
+ struct tracepoint *tp;
+ int ret = 0;
+
+ tp = kzalloc(sizeof(*tp), GFP_KERNEL);
+ if (!tp) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ tp->name = kstrdup(name, GFP_KERNEL);
+ if (!tp->name) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ tp->dynamic = true;
+
+ return tp;
+ free:
+ free_synth_tracepoint(tp);
+
+ return ERR_PTR(ret);
+}
+
+typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
+ unsigned int var_ref_idx);
+
+static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
+ unsigned int var_ref_idx)
+{
+ struct tracepoint *tp = event->tp;
+
+ if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
+ struct tracepoint_func *probe_func_ptr;
+ synth_probe_func_t probe_func;
+ void *__data;
+
+ if (!(cpu_online(raw_smp_processor_id())))
+ return;
+
+ probe_func_ptr = rcu_dereference_sched((tp)->funcs);
+ if (probe_func_ptr) {
+ do {
+ probe_func = (probe_func_ptr)->func;
+ __data = (probe_func_ptr)->data;
+ probe_func(__data, var_ref_vals, var_ref_idx);
+ } while ((++probe_func_ptr)->func);
+ }
+ }
+}
+
+static struct synth_event *find_synth_event(const char *name)
+{
+ struct synth_event *event;
+
+ list_for_each_entry(event, &synth_event_list, list) {
+ if (strcmp(event->name, name) == 0)
+ return event;
+ }
+
+ return NULL;
+}
+
+static int register_synth_event(struct synth_event *event)
+{
+ struct trace_event_call *call = &event->call;
+ int ret = 0;
+
+ event->call.class = &event->class;
+ event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
+ if (!event->class.system) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ event->tp = alloc_synth_tracepoint(event->name);
+ if (IS_ERR(event->tp)) {
+ ret = PTR_ERR(event->tp);
+ event->tp = NULL;
+ goto out;
+ }
+
+ INIT_LIST_HEAD(&call->class->fields);
+ call->event.funcs = &synth_event_funcs;
+ call->class->define_fields = synth_event_define_fields;
+
+ ret = register_trace_event(&call->event);
+ if (!ret) {
+ ret = -ENODEV;
+ goto out;
+ }
+ call->flags = TRACE_EVENT_FL_TRACEPOINT;
+ call->class->reg = trace_event_reg;
+ call->class->probe = trace_event_raw_event_synth;
+ call->data = event;
+ call->tp = event->tp;
+
+ mutex_unlock(&synth_event_mutex);
+ ret = trace_add_event_call(call);
+ mutex_lock(&synth_event_mutex);
+ if (ret) {
+ pr_warn("Failed to register synthetic event: %s\n",
+ trace_event_name(call));
+ goto err;
+ }
+
+ ret = set_synth_event_print_fmt(call);
+ if (ret < 0) {
+ mutex_unlock(&synth_event_mutex);
+ trace_remove_event_call(call);
+ mutex_lock(&synth_event_mutex);
+ goto err;
+ }
+ out:
+ return ret;
+ err:
+ unregister_trace_event(&call->event);
+ goto out;
+}
+
+static int unregister_synth_event(struct synth_event *event)
+{
+ struct trace_event_call *call = &event->call;
+ int ret;
+
+ mutex_unlock(&synth_event_mutex);
+ ret = trace_remove_event_call(call);
+ mutex_lock(&synth_event_mutex);
+ if (ret) {
+ pr_warn("Failed to remove synthetic event: %s\n",
+ trace_event_name(call));
+ free_synth_event_print_fmt(call);
+ unregister_trace_event(&call->event);
+ }
+
+ return ret;
+}
+
+static void remove_synth_event(struct synth_event *event)
+{
+ unregister_synth_event(event);
+ list_del(&event->list);
+}
+
+static int add_synth_event(struct synth_event *event)
+{
+ int ret;
+
+ ret = register_synth_event(event);
+ if (ret)
+ return ret;
+
+ list_add(&event->list, &synth_event_list);
+
+ return 0;
+}
+
+static void free_synth_event(struct synth_event *event)
+{
+ unsigned int i;
+
+ if (!event)
+ return;
+
+ for (i = 0; i < event->n_fields; i++)
+ free_synth_field(event->fields[i]);
+
+ kfree(event->fields);
+ kfree(event->name);
+ kfree(event->class.system);
+ free_synth_tracepoint(event->tp);
+ free_synth_event_print_fmt(&event->call);
+ kfree(event);
+}
+
+static struct synth_event *alloc_synth_event(char *event_name, int n_fields,
+ struct synth_field **fields)
+{
+ struct synth_event *event;
+ unsigned int i;
+
+ event = kzalloc(sizeof(*event), GFP_KERNEL);
+ if (!event) {
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ event->name = kstrdup(event_name, GFP_KERNEL);
+ if (!event->name) {
+ kfree(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ event->fields = kcalloc(n_fields, sizeof(*event->fields), GFP_KERNEL);
+ if (!event->fields) {
+ free_synth_event(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ for (i = 0; i < n_fields; i++)
+ event->fields[i] = fields[i];
+
+ event->n_fields = n_fields;
+ out:
+ return event;
+}
+
+static int create_synth_event(int argc, char **argv)
+{
+ struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
+ struct synth_event *event = NULL;
+ bool delete_event = false;
+ int i, n_fields = 0, ret = 0;
+ char *name;
+
+ mutex_lock(&synth_event_mutex);
+
+ /*
+ * Argument syntax:
+ * - Add synthetic event: <event_name> field[;field] ...
+ * - Remove synthetic event: !<event_name> field[;field] ...
+ * where 'field' = type field_name
+ */
+ if (argc < 1) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ name = argv[0];
+ if (name[0] == '!') {
+ delete_event = true;
+ name++;
+ }
+
+ event = find_synth_event(name);
+ if (event) {
+ if (delete_event) {
+ if (event->ref) {
+ ret = -EBUSY;
+ goto out;
+ }
+ remove_synth_event(event);
+ free_synth_event(event);
+ goto out;
+ }
+ ret = -EEXIST;
+ goto out;
+ } else if (delete_event) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (argc < 2) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ for (i = 1; i < argc - 1; i++) {
+ if (strcmp(argv[i], ";") == 0)
+ continue;
+ if (n_fields == SYNTH_FIELDS_MAX) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ field = parse_synth_field(argv[i], argv[i + 1]);
+ if (IS_ERR(field)) {
+ ret = PTR_ERR(field);
+ goto err;
+ }
+ fields[n_fields] = field;
+ i++; n_fields++;
+ }
+
+ if (i < argc) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ event = alloc_synth_event(name, n_fields, fields);
+ if (IS_ERR(event)) {
+ ret = PTR_ERR(event);
+ event = NULL;
+ goto err;
+ }
+
+ add_synth_event(event);
+ out:
+ mutex_unlock(&synth_event_mutex);
+
+ return ret;
+ err:
+ for (i = 0; i < n_fields; i++)
+ free_synth_field(fields[i]);
+ free_synth_event(event);
+
+ goto out;
+}
+
+static int release_all_synth_events(void)
+{
+ struct synth_event *event, *e;
+ int ret = 0;
+
+ mutex_lock(&synth_event_mutex);
+
+ list_for_each_entry(event, &synth_event_list, list) {
+ if (event->ref) {
+ ret = -EBUSY;
+ goto out;
+ }
+ }
+
+ list_for_each_entry_safe(event, e, &synth_event_list, list) {
+ remove_synth_event(event);
+ free_synth_event(event);
+ }
+ out:
+ mutex_unlock(&synth_event_mutex);
+
+ return ret;
+}
+
+static void *synth_events_seq_start(struct seq_file *m, loff_t *pos)
+{
+ mutex_lock(&synth_event_mutex);
+
+ return seq_list_start(&synth_event_list, *pos);
+}
+
+static void *synth_events_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ return seq_list_next(v, &synth_event_list, pos);
+}
+
+static void synth_events_seq_stop(struct seq_file *m, void *v)
+{
+ mutex_unlock(&synth_event_mutex);
+}
+
+static int synth_events_seq_show(struct seq_file *m, void *v)
+{
+ struct synth_field *field;
+ struct synth_event *event = v;
+ unsigned int i;
+
+ seq_printf(m, "%s\t", event->name);
+
+ for (i = 0; i < event->n_fields; i++) {
+ field = event->fields[i];
+
+ /* parameter values */
+ seq_printf(m, "%s %s%s", field->type, field->name,
+ i == event->n_fields - 1 ? "" : "; ");
+ }
+
+ seq_putc(m, '\n');
+
+ return 0;
+}
+
+static const struct seq_operations synth_events_seq_op = {
+ .start = synth_events_seq_start,
+ .next = synth_events_seq_next,
+ .stop = synth_events_seq_stop,
+ .show = synth_events_seq_show
+};
+
+static int synth_events_open(struct inode *inode, struct file *file)
+{
+ int ret;
+
+ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
+ ret = release_all_synth_events();
+ if (ret < 0)
+ return ret;
+ }
+
+ return seq_open(file, &synth_events_seq_op);
+}
+
+static ssize_t synth_events_write(struct file *file,
+ const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ return trace_parse_run_command(file, buffer, count, ppos,
+ create_synth_event);
+}
+
+static const struct file_operations synth_events_fops = {
+ .open = synth_events_open,
+ .write = synth_events_write,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
static u64 hist_field_timestamp(struct hist_field *hist_field,
struct tracing_map_elt *elt,
struct ring_buffer_event *rbe,
@@ -2933,6 +3751,8 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
struct hist_trigger_attrs *attrs;
struct event_trigger_ops *trigger_ops;
struct hist_trigger_data *hist_data;
+ struct synth_event *se;
+ const char *se_name;
bool remove = false;
char *trigger;
int ret = 0;
@@ -2991,6 +3811,14 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
}

cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
+
+ mutex_lock(&synth_event_mutex);
+ se_name = trace_event_name(file->event_call);
+ se = find_synth_event(se_name);
+ if (se)
+ se->ref--;
+ mutex_unlock(&synth_event_mutex);
+
ret = 0;
goto out_free;
}
@@ -3008,6 +3836,13 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
} else if (ret < 0)
goto out_free;

+ mutex_lock(&synth_event_mutex);
+ se_name = trace_event_name(file->event_call);
+ se = find_synth_event(se_name);
+ if (se)
+ se->ref++;
+ mutex_unlock(&synth_event_mutex);
+
if (get_named_trigger_data(trigger_data))
goto enable;

@@ -3198,3 +4033,31 @@ __init int register_trigger_hist_enable_disable_cmds(void)

return ret;
}
+
+static __init int trace_events_hist_init(void)
+{
+ struct dentry *entry = NULL;
+ struct dentry *d_tracer;
+ int err = 0;
+
+ d_tracer = tracing_init_dentry();
+ if (IS_ERR(d_tracer)) {
+ err = PTR_ERR(d_tracer);
+ goto err;
+ }
+
+ entry = tracefs_create_file("synthetic_events", 0644, d_tracer,
+ NULL, &synth_events_fops);
+ if (!entry) {
+ err = -ENODEV;
+ goto err;
+ }
+
+ return err;
+ err:
+ pr_warn("Could not create tracefs 'synthetic_events' entry\n");
+
+ return err;
+}
+
+fs_initcall(trace_events_hist_init);
--
1.9.3

2017-09-05 22:01:56

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 34/40] tracing: Add 'last error' error facility for hist triggers

With the addition of variables and actions, it's become necessary to
provide more detailed error information to users about syntax errors.

Add a 'last error' facility accessible via the erroring event's 'hist'
file. Reading the hist file after an error will display more detailed
information about what went wrong, if information is available. This
extended error information will be available until the next hist
trigger command for that event.

# echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
echo: write error: Invalid argument

# cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist

ERROR: Couldn't yyy: zzz
Last command: xxx

Also add specific error messages for variable and action errors.
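The buffer semantics are first-error-wins: once an error is recorded it is kept until the next hist trigger command clears it. A minimal userspace model (the `model_*` names are hypothetical; the real code is hist_err()/hist_err_clear() in the diff below):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the 'last error' buffer: the first error
 * recorded after a clear is kept; later errors are dropped until the
 * next hist trigger command clears the buffer. */
#define ERR_LEN 256
static char hist_err_buf[ERR_LEN];

static void model_hist_err(const char *str, const char *var)
{
	if (hist_err_buf[0] != '\0')
		return;			/* keep only the first error */
	if (!var)
		var = "";
	if (strlen(str) + strlen(var) >= ERR_LEN)
		return;			/* silently drop oversized messages */
	strcpy(hist_err_buf, str);
	strcat(hist_err_buf, var);
}

static void model_hist_err_clear(void)
{
	hist_err_buf[0] = '\0';
}
```

Recording a second error before a clear leaves the first message intact, which is why reading the hist file shows the error that originally caused the failed command.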

Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.txt | 19 ++++
kernel/trace/trace_events_hist.c | 188 ++++++++++++++++++++++++++++++++++++---
2 files changed, 194 insertions(+), 13 deletions(-)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 9717688..f271d87 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -686,6 +686,25 @@ The following commands are supported:
interpreted as microseconds.
cpu int - the cpu on which the event occurred.

+ Extended error information
+ --------------------------
+
+ For some error conditions encountered when invoking a hist trigger
+ command, extended error information is available via the
+ corresponding event's 'hist' file. Reading the hist file after an
+ error will display more detailed information about what went wrong,
+ if information is available. This extended error information will
+ be available until the next hist trigger command for that event.
+
+ If available for a given error condition, the extended error
+ information takes the following form:
+
+ # echo xxx > /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
+ echo: write error: Invalid argument
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
+ ERROR: Couldn't yyy: zzz
+ Last command: xxx

6.2 'hist' trigger examples
---------------------------
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 764c5e5..655b731 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -351,6 +351,88 @@ struct action_data {
};
};

+
+static char *hist_err_str;
+static char *last_hist_cmd;
+
+static int hist_err_alloc(void)
+{
+ int ret = 0;
+
+ last_hist_cmd = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
+ if (!last_hist_cmd)
+ return -ENOMEM;
+
+ hist_err_str = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
+ if (!hist_err_str) {
+ kfree(last_hist_cmd);
+ ret = -ENOMEM;
+ }
+
+ return ret;
+}
+
+static void last_cmd_set(char *str)
+{
+ if (!last_hist_cmd || !str)
+ return;
+
+ if (strlen(str) > MAX_FILTER_STR_VAL - 1)
+ return;
+
+ strcpy(last_hist_cmd, str);
+}
+
+static void hist_err(char *str, char *var)
+{
+ int maxlen = MAX_FILTER_STR_VAL - 1;
+
+ if (!hist_err_str || !str)
+ return;
+
+ if (strlen(hist_err_str))
+ return;
+
+ if (!var)
+ var = "";
+
+ if (strlen(hist_err_str) + strlen(str) + strlen(var) > maxlen)
+ return;
+
+ strcat(hist_err_str, str);
+ strcat(hist_err_str, var);
+}
+
+static void hist_err_event(char *str, char *system, char *event, char *var)
+{
+ char err[MAX_FILTER_STR_VAL];
+
+ if (system && var)
+ sprintf(err, "%s.%s.%s", system, event, var);
+ else if (system)
+ sprintf(err, "%s.%s", system, event);
+ else if (var)
+ strcpy(err, var);
+ else
+ err[0] = '\0';
+
+ hist_err(str, err);
+}
+
+static void hist_err_clear(void)
+{
+ if (!hist_err_str)
+ return;
+
+ hist_err_str[0] = '\0';
+}
+
+static bool have_hist_err(void)
+{
+ if (hist_err_str && strlen(hist_err_str))
+ return true;
+
+ return false;
+}
+
static LIST_HEAD(synth_event_list);
static DEFINE_MUTEX(synth_event_mutex);

@@ -2110,9 +2192,18 @@ static struct hist_field *create_var_ref(struct hist_field *var_field)
return ref_field;
}

+static bool is_common_field(char *var_name)
+{
+ if (strncmp(var_name, "$common_timestamp", strlen("$common_timestamp")) == 0)
+ return true;
+
+ return false;
+}
+
static bool is_var_ref(char *var_name)
{
- if (!var_name || strlen(var_name) < 2 || var_name[0] != '$')
+ if (!var_name || strlen(var_name) < 2 || var_name[0] != '$' ||
+ is_common_field(var_name))
return false;

return true;
@@ -2164,6 +2255,10 @@ static struct hist_field *parse_var_ref(struct trace_array *tr,
if (var_field)
ref_field = create_var_ref(var_field);

+ if (!ref_field)
+ hist_err_event("Couldn't find variable: $",
+ system, event_name, var_name);
+
return ref_field;
}

@@ -2399,8 +2494,10 @@ static int check_expr_operands(struct hist_field *operand1,
}

if ((operand1_flags & HIST_FIELD_FL_TIMESTAMP_USECS) !=
- (operand2_flags & HIST_FIELD_FL_TIMESTAMP_USECS))
+ (operand2_flags & HIST_FIELD_FL_TIMESTAMP_USECS)) {
+ hist_err("Timestamp units in expression don't match", NULL);
return -EINVAL;
+ }

return 0;
}
@@ -2600,19 +2697,27 @@ static struct trace_event_file *event_file(struct trace_array *tr,
char *cmd;
int ret;

- if (target_hist_data->n_field_var_hists >= SYNTH_FIELDS_MAX)
+ if (target_hist_data->n_field_var_hists >= SYNTH_FIELDS_MAX) {
+ hist_err_event("onmatch: Too many field variables defined: ",
+ system, event_name, field_name);
return ERR_PTR(-EINVAL);
+ }

file = event_file(tr, system, event_name);

if (IS_ERR(file)) {
+ hist_err_event("onmatch: Event file not found: ",
+ system, event_name, field_name);
ret = PTR_ERR(file);
return ERR_PTR(ret);
}

hist_data = find_compatible_hist(target_hist_data, file);
- if (!hist_data)
+ if (!hist_data) {
+ hist_err_event("onmatch: Matching event histogram not found: ",
+ system, event_name, field_name);
return ERR_PTR(-EINVAL);
+ }

var_hist = kzalloc(sizeof(*var_hist), GFP_KERNEL);
if (!var_hist)
@@ -2660,6 +2765,8 @@ static struct trace_event_file *event_file(struct trace_array *tr,
kfree(cmd);
kfree(var_hist->cmd);
kfree(var_hist);
+ hist_err_event("onmatch: Couldn't create histogram for field: ",
+ system, event_name, field_name);
return ERR_PTR(ret);
}

@@ -2671,6 +2778,8 @@ static struct trace_event_file *event_file(struct trace_array *tr,
kfree(cmd);
kfree(var_hist->cmd);
kfree(var_hist);
+ hist_err_event("onmatch: Couldn't find synthetic variable: ",
+ system, event_name, field_name);
return ERR_PTR(-EINVAL);
}

@@ -2807,18 +2916,21 @@ static struct field_var *create_field_var(struct hist_trigger_data *hist_data,
int ret = 0;

if (hist_data->n_field_vars >= SYNTH_FIELDS_MAX) {
+ hist_err("Too many field variables defined: ", field_name);
ret = -EINVAL;
goto err;
}

val = parse_atom(hist_data, file, field_name, &flags, NULL);
if (IS_ERR(val)) {
+ hist_err("Couldn't parse field variable: ", field_name);
ret = PTR_ERR(val);
goto err;
}

var = create_var(hist_data, file, field_name, val->size, val->type);
if (IS_ERR(var)) {
+ hist_err("Couldn't create or find variable: ", field_name);
kfree(val);
ret = PTR_ERR(var);
goto err;
@@ -2943,14 +3055,18 @@ static int onmax_create(struct hist_trigger_data *hist_data,
int ret = 0;

onmax_var_str = data->onmax.var_str;
- if (onmax_var_str[0] != '$')
+ if (onmax_var_str[0] != '$') {
+ hist_err("onmax: For onmax(x), x must be a variable: ", onmax_var_str);
return -EINVAL;
+ }
onmax_var_str++;

event_name = trace_event_name(call);
var_field = find_target_event_var(hist_data, NULL, NULL, onmax_var_str);
- if (!var_field)
+ if (!var_field) {
+ hist_err("onmax: Couldn't find onmax variable: ", onmax_var_str);
return -EINVAL;
+ }

flags = HIST_FIELD_FL_VAR_REF;
ref_field = create_hist_field(hist_data, NULL, flags, NULL);
@@ -2970,6 +3086,7 @@ static int onmax_create(struct hist_trigger_data *hist_data,
data->onmax.max_var_ref_idx = var_ref_idx;
max_var = create_var(hist_data, file, "max", sizeof(u64), "u64");
if (IS_ERR(max_var)) {
+ hist_err("onmax: Couldn't create onmax variable: ", "max");
ret = PTR_ERR(max_var);
goto out;
}
@@ -2982,6 +3099,7 @@ static int onmax_create(struct hist_trigger_data *hist_data,

field_var = create_target_field_var(hist_data, NULL, NULL, param);
if (IS_ERR(field_var)) {
+ hist_err("onmax: Couldn't create field variable: ", param);
ret = PTR_ERR(field_var);
kfree(param);
goto out;
@@ -3014,6 +3132,7 @@ static int parse_action_params(char *params, struct action_data *data)

param = strstrip(param);
if (strlen(param) < 2) {
+ hist_err("Invalid action param: ", param);
ret = -EINVAL;
goto out;
}
@@ -3185,6 +3304,9 @@ static int check_synth_field(struct synth_event *event,
hist_field = find_event_var(tr, system, event, var);
}

+ if (!hist_field)
+ hist_err_event("onmatch: Couldn't find onmatch param: $", system, event, var);
+
return hist_field;
}

@@ -3236,6 +3358,7 @@ static int onmatch_create(struct hist_trigger_data *hist_data,

event = find_synth_event(data->onmatch.synth_event_name);
if (!event) {
+ hist_err("onmatch: Couldn't find synthetic event: ", data->onmatch.synth_event_name);
ret = -EINVAL;
goto out;
}
@@ -3275,6 +3398,7 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
ret = -EINVAL;
goto out;
}
+
if (check_synth_field(event, hist_field, field_pos) == 0) {
var_ref = create_var_ref(hist_field);
if (!var_ref) {
@@ -3289,12 +3413,15 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
continue;
}

+ hist_err_event("onmatch: Param type doesn't match synthetic event field type: ",
+ system, event_name, param);
kfree(p);
ret = -EINVAL;
goto out;
}

if (field_pos != event->n_fields) {
+ hist_err("onmatch: Param count doesn't match synthetic event field count: ", event->name);
ret = -EINVAL;
goto out;
}
@@ -3322,15 +3449,22 @@ static struct action_data *onmatch_parse(struct trace_array *tr, char *str)
return ERR_PTR(-ENOMEM);

match_event = strsep(&str, ")");
- if (!match_event || !str)
+ if (!match_event || !str) {
+ hist_err("onmatch: Missing closing paren: ", match_event);
goto free;
+ }

match_event_system = strsep(&match_event, ".");
- if (!match_event)
+ if (!match_event) {
+ hist_err("onmatch: Missing subsystem for match event: ", match_event_system);
goto free;
+ }

- if (IS_ERR(event_file(tr, match_event_system, match_event)))
+ if (IS_ERR(event_file(tr, match_event_system, match_event))) {
+ hist_err_event("onmatch: Invalid subsystem or event name: ",
+ match_event_system, match_event, NULL);
goto free;
+ }

data->onmatch.match_event = kstrdup(match_event, GFP_KERNEL);
if (!data->onmatch.match_event) {
@@ -3345,12 +3479,16 @@ static struct action_data *onmatch_parse(struct trace_array *tr, char *str)
}

strsep(&str, ".");
- if (!str)
+ if (!str) {
+ hist_err("onmatch: Missing . after onmatch(): ", str);
goto free;
+ }

synth_event_name = strsep(&str, "(");
- if (!synth_event_name || !str)
+ if (!synth_event_name || !str) {
+ hist_err("onmatch: Missing opening paramlist paren: ", synth_event_name);
goto free;
+ }

data->onmatch.synth_event_name = kstrdup(synth_event_name, GFP_KERNEL);
if (!data->onmatch.synth_event_name) {
@@ -3359,8 +3497,10 @@ static struct action_data *onmatch_parse(struct trace_array *tr, char *str)
}

params = strsep(&str, ")");
- if (!params || !str || (str && strlen(str)))
+ if (!params || !str || (str && strlen(str))) {
+ hist_err("onmatch: Missing closing paramlist paren: ", params);
goto free;
+ }

ret = parse_action_params(params, data);
if (ret)
@@ -3440,12 +3580,14 @@ static int create_var_field(struct hist_trigger_data *hist_data,
return -EINVAL;

if (find_var(file, var_name) && !hist_data->remove) {
+ hist_err("Variable already defined: ", var_name);
return -EINVAL;
}

flags |= HIST_FIELD_FL_VAR;
hist_data->n_vars++;
if (hist_data->n_vars > TRACING_MAP_VARS_MAX) {
+ hist_err("Too many variables defined: ", var_name);
return -EINVAL;
}

@@ -3636,6 +3778,7 @@ static int parse_var_defs(struct hist_trigger_data *hist_data)

var_name = strsep(&field_str, "=");
if (!var_name || !field_str) {
+ hist_err("Malformed assignment: ", var_name);
ret = -EINVAL;
goto free;
}
@@ -4362,6 +4505,11 @@ static int hist_show(struct seq_file *m, void *v)
hist_trigger_show(m, data, n++);
}

+ if (have_hist_err()) {
+ seq_printf(m, "\nERROR: %s\n", hist_err_str);
+ seq_printf(m, " Last command: %s\n", last_hist_cmd);
+ }
+
out_unlock:
mutex_unlock(&event_mutex);

@@ -4709,6 +4857,7 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
if (named_data) {
if (!hist_trigger_match(data, named_data, named_data,
true)) {
+ hist_err("Named hist trigger doesn't match existing named trigger (includes variables): ", hist_data->attrs->name);
ret = -EINVAL;
goto out;
}
@@ -4728,13 +4877,16 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
test->paused = false;
else if (hist_data->attrs->clear)
hist_clear(test);
- else
+ else {
+ hist_err("Hist trigger already exists", NULL);
ret = -EEXIST;
+ }
goto out;
}
}
new:
if (hist_data->attrs->cont || hist_data->attrs->clear) {
+ hist_err("Can't clear or continue a nonexistent hist trigger", NULL);
ret = -ENOENT;
goto out;
}
@@ -4901,6 +5053,11 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
char *trigger, *p;
int ret = 0;

+ if (glob && strlen(glob)) {
+ last_cmd_set(param);
+ hist_err_clear();
+ }
+
if (!param)
return -EINVAL;

@@ -5019,6 +5176,9 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
/* Just return zero, not the number of registered triggers */
ret = 0;
out:
+ if (ret == 0)
+ hist_err_clear();
+
return ret;
out_unreg:
cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
@@ -5208,6 +5368,8 @@ static __init int trace_events_hist_init(void)
goto err;
}

+ hist_err_alloc();
+
return err;
err:
pr_warn("Could not create tracefs 'synthetic_events' entry\n");
--
1.9.3
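With the hist_err plumbing added above, the message from a rejected
command is saved and appended to the hist file output. A sketch of the
expected interaction (paths assume a mounted tracefs; the exact
message text comes from the hist_err() calls in the patch, and the
variable name here is deliberately bogus):

```shell
# Attempt a hist trigger that references an undefined variable.
# The write fails, and the diagnostic can be read back afterwards.
cd /sys/kernel/debug/tracing

echo 'hist:keys=pid:onmax($nosuchvar)' >> \
    events/sched/sched_switch/trigger

# Reading the hist file now shows the saved error and command, e.g.:
#   ERROR: onmax: Couldn't find onmax variable: nosuchvar
#    Last command: keys=pid:onmax($nosuchvar)
cat events/sched/sched_switch/hist
```

Note that per the event_hist_trigger_func() hunk, the saved error is
cleared again the next time a command succeeds.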

2017-09-05 22:02:16

by Tom Zanussi

Subject: [PATCH v2 33/40] tracing: Add hist trigger support for variable reference aliases

Add support for alias=$somevar where alias can be used as
onmatch($alias).
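A rough usage sketch (event and variable names are illustrative, not
taken from the patch): once a variable is defined on one event, a
trigger on another event can bind a local alias to it and use the
alias anywhere a variable reference is accepted:

```shell
cd /sys/kernel/debug/tracing

# Define ts0 on the waking event.
echo 'hist:keys=pid:ts0=$common_timestamp.usecs' >> \
    events/sched/sched_waking/trigger

# 'ts1' is an alias for $ts0; $ts1 can then appear in expressions
# (and onmatch() params) just like $ts0 itself.
echo 'hist:keys=next_pid:ts1=$ts0:lat=$common_timestamp.usecs-$ts1' >> \
    events/sched/sched_switch/trigger
```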

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 61 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0782766..764c5e5 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -227,6 +227,7 @@ enum hist_field_flags {
HIST_FIELD_FL_EXPR = 16384,
HIST_FIELD_FL_VAR_REF = 32768,
HIST_FIELD_FL_CPU = 65536,
+ HIST_FIELD_FL_ALIAS = 131072,
};

struct var_defs {
@@ -1525,7 +1526,8 @@ static const char *hist_field_name(struct hist_field *field,

if (field->field)
field_name = field->field->name;
- else if (field->flags & HIST_FIELD_FL_LOG2)
+ else if (field->flags & HIST_FIELD_FL_LOG2 ||
+ field->flags & HIST_FIELD_FL_ALIAS)
field_name = hist_field_name(field->operands[0], ++level);
else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
field_name = "$common_timestamp";
@@ -1961,7 +1963,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,

hist_field->hist_data = hist_data;

- if (flags & HIST_FIELD_FL_EXPR)
+ if (flags & HIST_FIELD_FL_EXPR || flags & HIST_FIELD_FL_ALIAS)
goto out; /* caller will populate */

if (flags & HIST_FIELD_FL_VAR_REF) {
@@ -2219,6 +2221,29 @@ static struct hist_field *parse_var_ref(struct trace_array *tr,
return field;
}

+static struct hist_field *create_alias(struct hist_trigger_data *hist_data,
+ struct hist_field *var_ref,
+ char *var_name)
+{
+ struct hist_field *alias = NULL;
+ unsigned long flags = HIST_FIELD_FL_ALIAS | HIST_FIELD_FL_VAR |
+ HIST_FIELD_FL_VAR_ONLY;
+
+ alias = create_hist_field(hist_data, NULL, flags, var_name);
+ if (!alias)
+ return NULL;
+
+ alias->fn = var_ref->fn;
+ alias->operands[0] = var_ref;
+
+ if (init_var_ref(alias, var_ref)) {
+ destroy_hist_field(alias, 0);
+ return NULL;
+ }
+
+ return alias;
+}
+
struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
struct trace_event_file *file, char *str,
unsigned long *flags, char *var_name)
@@ -2245,6 +2270,13 @@ struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
if (hist_field) {
hist_data->var_refs[hist_data->n_var_refs] = hist_field;
hist_field->var_ref_idx = hist_data->n_var_refs++;
+ if (var_name) {
+ hist_field = create_alias(hist_data, hist_field, var_name);
+ if (!hist_field) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
return hist_field;
}
} else
@@ -2346,6 +2378,26 @@ static int check_expr_operands(struct hist_field *operand1,
unsigned long operand1_flags = operand1->flags;
unsigned long operand2_flags = operand2->flags;

+ if ((operand1_flags & HIST_FIELD_FL_VAR_REF) ||
+ (operand1_flags & HIST_FIELD_FL_ALIAS)) {
+ struct hist_field *var;
+
+ var = find_var_field(operand1->var.hist_data, operand1->name);
+ if (!var)
+ return -EINVAL;
+ operand1_flags = var->flags;
+ }
+
+ if ((operand2_flags & HIST_FIELD_FL_VAR_REF) ||
+ (operand2_flags & HIST_FIELD_FL_ALIAS)) {
+ struct hist_field *var;
+
+ var = find_var_field(operand2->var.hist_data, operand2->name);
+ if (!var)
+ return -EINVAL;
+ operand2_flags = var->flags;
+ }
+
if ((operand1_flags & HIST_FIELD_FL_TIMESTAMP_USECS) !=
(operand2_flags & HIST_FIELD_FL_TIMESTAMP_USECS))
return -EINVAL;
@@ -4339,8 +4391,11 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
seq_puts(m, "$common_timestamp");
else if (hist_field->flags & HIST_FIELD_FL_CPU)
seq_puts(m, "cpu");
- else if (field_name)
+ else if (field_name) {
+ if (hist_field->flags & HIST_FIELD_FL_ALIAS)
+ seq_putc(m, '$');
seq_printf(m, "%s", field_name);
+ }

if (hist_field->flags) {
const char *flags_str = get_hist_field_flags(hist_field);
--
1.9.3

2017-09-05 22:02:37

by Tom Zanussi

Subject: [PATCH v2 32/40] tracing: Add cpu field for hist triggers

A common key to use in a histogram is the cpuid - add a new cpu
'synthetic' field for that purpose. This field is named cpu rather
than $cpu or $common_cpu because 'cpu' already exists as a special
filter field and it makes more sense to match that rather than add
another name for the same thing.
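For example, a sketch of using the new field as a key (event name
illustrative) - note the bare 'cpu' with no '$' prefix, matching the
existing filter syntax as described above:

```shell
cd /sys/kernel/debug/tracing

# Bucket wakeups by the cpu they occurred on.
echo 'hist:keys=cpu:vals=hitcount' >> \
    events/sched/sched_waking/trigger
cat events/sched/sched_waking/hist
```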

Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.txt | 18 ++++++++++++++++++
kernel/trace/trace_events_hist.c | 30 +++++++++++++++++++++++++++---
2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 2cc08d4..9717688 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -668,6 +668,24 @@ The following commands are supported:
The examples below provide a more concrete illustration of the
concepts and typical usage patterns discussed above.

+ 'synthetic' event fields
+ ------------------------
+
+ There are a number of 'synthetic fields' available for use as keys
+ or values in a hist trigger. These look like and behave as if they
+ were event fields, but aren't actually part of the event's field
+ definition or format file. They are however available for any
+ event, and can be used anywhere an actual event field could be.
+ 'Synthetic' field names are always prefixed with a '$' character to
+ indicate that they're not normal fields (with the exception of
+ 'cpu', for compatibility with existing filter usage):
+
+ $common_timestamp u64 - timestamp (from ring buffer) associated
+ with the event, in nanoseconds. May be
+ modified by .usecs to have timestamps
+ interpreted as microseconds.
+ cpu int - the cpu on which the event occurred.
+

6.2 'hist' trigger examples
---------------------------
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 4f66f2e..0782766 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -226,6 +226,7 @@ enum hist_field_flags {
HIST_FIELD_FL_VAR_ONLY = 8192,
HIST_FIELD_FL_EXPR = 16384,
HIST_FIELD_FL_VAR_REF = 32768,
+ HIST_FIELD_FL_CPU = 65536,
};

struct var_defs {
@@ -1170,6 +1171,16 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
return ts;
}

+static u64 hist_field_cpu(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ int cpu = raw_smp_processor_id();
+
+ return cpu;
+}
+
static struct hist_field *check_var_ref(struct hist_field *hist_field,
struct hist_trigger_data *var_data,
unsigned int var_idx)
@@ -1518,6 +1529,8 @@ static const char *hist_field_name(struct hist_field *field,
field_name = hist_field_name(field->operands[0], ++level);
else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
field_name = "$common_timestamp";
+ else if (field->flags & HIST_FIELD_FL_CPU)
+ field_name = "cpu";
else if (field->flags & HIST_FIELD_FL_EXPR ||
field->flags & HIST_FIELD_FL_VAR_REF)
field_name = field->name;
@@ -1990,6 +2003,15 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
goto out;
}

+ if (flags & HIST_FIELD_FL_CPU) {
+ hist_field->fn = hist_field_cpu;
+ hist_field->size = sizeof(int);
+ hist_field->type = kstrdup("int", GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
+ goto out;
+ }
+
if (WARN_ON_ONCE(!field))
goto out;

@@ -2182,7 +2204,9 @@ static struct hist_field *parse_var_ref(struct trace_array *tr,
hist_data->enable_timestamps = true;
if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
hist_data->attrs->ts_in_usecs = true;
- } else {
+ } else if (strcmp(field_name, "cpu") == 0)
+ *flags |= HIST_FIELD_FL_CPU;
+ else {
field = trace_find_event_field(file->event_call, field_name);
if (!field || !field->size) {
field = ERR_PTR(-EINVAL);
@@ -3185,7 +3209,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
goto out;
}
}
-
if (param[0] == '$')
hist_field = onmatch_find_var(hist_data, data, system,
event_name, param);
@@ -3200,7 +3223,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
ret = -EINVAL;
goto out;
}
-
if (check_synth_field(event, hist_field, field_pos) == 0) {
var_ref = create_var_ref(hist_field);
if (!var_ref) {
@@ -4315,6 +4337,8 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)

if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP)
seq_puts(m, "$common_timestamp");
+ else if (hist_field->flags & HIST_FIELD_FL_CPU)
+ seq_puts(m, "cpu");
else if (field_name)
seq_printf(m, "%s", field_name);

--
1.9.3

2017-09-05 21:59:28

by Tom Zanussi

Subject: [PATCH v2 28/40] tracing: Add support for 'field variables'

Users should be able to directly specify event fields in hist trigger
'actions' rather than being forced to explicitly create a variable for
that purpose.

Add support for using fields directly in actions - under the covers
this simply creates an 'invisible' variable for each bare field
specified in an action. If a bare field refers to a field on another
(matching) event, a special-purpose histogram is even created for it,
since variables can't be defined on an existing histogram after it's
been created.
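A hedged sketch of what this enables, assuming a 'wakeup_latency'
synthetic event and a '$wakeup_lat' variable defined elsewhere (names
are illustrative):

```shell
cd /sys/kernel/debug/tracing

# 'prio' below is a bare field of the matched sched_waking event,
# not a variable the user defined; this patch synthesizes an
# invisible variable for it so it can be passed along to the
# synthetic event.
echo 'hist:keys=next_pid:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)' >> \
    events/sched/sched_switch/trigger
```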

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 452 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 451 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 2906b92..ea006a0 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -254,6 +254,16 @@ struct hist_trigger_attrs {
struct var_defs var_defs;
};

+struct field_var {
+ struct hist_field *var;
+ struct hist_field *val;
+};
+
+struct field_var_hist {
+ struct hist_trigger_data *hist_data;
+ char *cmd;
+};
+
struct hist_trigger_data {
struct hist_field *fields[HIST_FIELDS_MAX];
unsigned int n_vals;
@@ -274,6 +284,14 @@ struct hist_trigger_data {

struct action_data *actions[HIST_ACTIONS_MAX];
unsigned int n_actions;
+
+ struct hist_field *synth_var_refs[SYNTH_FIELDS_MAX];
+ unsigned int n_synth_var_refs;
+ struct field_var *field_vars[SYNTH_FIELDS_MAX];
+ unsigned int n_field_vars;
+ unsigned int n_field_var_str;
+ struct field_var_hist *field_var_hists[SYNTH_FIELDS_MAX];
+ unsigned int n_field_var_hists;
};

struct synth_field {
@@ -1392,6 +1410,7 @@ static struct hist_field *find_event_var(struct trace_array *tr,
struct hist_elt_data {
char *comm;
u64 *var_ref_vals;
+ char *field_var_str[SYNTH_FIELDS_MAX];
};

static u64 hist_field_var_ref(struct hist_field *hist_field,
@@ -1669,7 +1688,14 @@ static inline void save_comm(char *comm, struct task_struct *task)

static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
{
+ struct hist_trigger_data *hist_data = elt->map->private_data;
struct hist_elt_data *private_data = elt->private_data;
+ unsigned int i, n_str;
+
+ n_str = hist_data->n_field_var_str;
+
+ for (i = 0; i < n_str; i++)
+ kfree(private_data->field_var_str[i]);

kfree(private_data->comm);
kfree(private_data);
@@ -1681,7 +1707,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
unsigned int size = TASK_COMM_LEN + 1;
struct hist_elt_data *elt_data;
struct hist_field *key_field;
- unsigned int i;
+ unsigned int i, n_str;

elt->private_data = elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL);
if (!elt_data)
@@ -1701,6 +1727,18 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
}
}

+ n_str = hist_data->n_field_var_str;
+
+ size = STR_VAR_LEN_MAX;
+
+ for (i = 0; i < n_str; i++) {
+ elt_data->field_var_str[i] = kzalloc(size, GFP_KERNEL);
+ if (!elt_data->field_var_str[i]) {
+ hist_trigger_elt_data_free(elt);
+ return -ENOMEM;
+ }
+ }
+
return 0;
}

@@ -2347,6 +2385,387 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
return ERR_PTR(ret);
}

+static char *find_trigger_filter(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ struct event_trigger_data *test;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ if (test->private_data == hist_data)
+ return test->filter_str;
+ }
+ }
+
+ return NULL;
+}
+
+static struct event_command trigger_hist_cmd;
+static int event_hist_trigger_func(struct event_command *cmd_ops,
+ struct trace_event_file *file,
+ char *glob, char *cmd, char *param);
+
+static bool compatible_keys(struct hist_trigger_data *target_hist_data,
+ struct hist_trigger_data *hist_data,
+ unsigned int n_keys)
+{
+ struct hist_field *target_hist_field, *hist_field;
+ unsigned int n, i, j;
+
+ if (hist_data->n_fields - hist_data->n_vals != n_keys)
+ return false;
+
+ i = hist_data->n_vals;
+ j = target_hist_data->n_vals;
+
+ for (n = 0; n < n_keys; n++) {
+ hist_field = hist_data->fields[i + n];
+ target_hist_field = hist_data->fields[j + n];
+
+ if (strcmp(hist_field->type, target_hist_field->type) != 0)
+ return false;
+ if (hist_field->size != target_hist_field->size)
+ return false;
+ if (hist_field->is_signed != target_hist_field->is_signed)
+ return false;
+ }
+
+ return true;
+}
+
+static struct hist_trigger_data *
+find_compatible_hist(struct hist_trigger_data *target_hist_data,
+ struct trace_event_file *file)
+{
+ struct hist_trigger_data *hist_data;
+ struct event_trigger_data *test;
+ unsigned int n_keys;
+
+ n_keys = target_hist_data->n_fields - target_hist_data->n_vals;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ hist_data = test->private_data;
+
+ if (compatible_keys(target_hist_data, hist_data, n_keys))
+ return hist_data;
+ }
+ }
+
+ return NULL;
+}
+
+static struct trace_event_file *event_file(struct trace_array *tr,
+ char *system, char *event_name)
+{
+ struct trace_event_file *file;
+
+ file = find_event_file(tr, system, event_name);
+ if (!file)
+ return ERR_PTR(-EINVAL);
+
+ return file;
+}
+
+static struct hist_field *
+create_field_var_hist(struct hist_trigger_data *target_hist_data,
+ char *system, char *event_name, char *field_name)
+{
+ struct trace_array *tr = target_hist_data->event_file->tr;
+ struct hist_field *event_var = ERR_PTR(-EINVAL);
+ struct hist_trigger_data *hist_data;
+ unsigned int i, n, first = true;
+ struct field_var_hist *var_hist;
+ struct trace_event_file *file;
+ struct hist_field *key_field;
+ char *saved_filter;
+ char *cmd;
+ int ret;
+
+ if (target_hist_data->n_field_var_hists >= SYNTH_FIELDS_MAX)
+ return ERR_PTR(-EINVAL);
+
+ file = event_file(tr, system, event_name);
+
+ if (IS_ERR(file)) {
+ ret = PTR_ERR(file);
+ return ERR_PTR(ret);
+ }
+
+ hist_data = find_compatible_hist(target_hist_data, file);
+ if (!hist_data)
+ return ERR_PTR(-EINVAL);
+
+ var_hist = kzalloc(sizeof(*var_hist), GFP_KERNEL);
+ if (!var_hist)
+ return ERR_PTR(-ENOMEM);
+
+ cmd = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
+ if (!cmd) {
+ kfree(var_hist);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ strcat(cmd, "keys=");
+
+ for_each_hist_key_field(i, hist_data) {
+ key_field = hist_data->fields[i];
+ if (!first)
+ strcat(cmd, ",");
+ strcat(cmd, key_field->field->name);
+ first = false;
+ }
+
+ strcat(cmd, ":synthetic_");
+ strcat(cmd, field_name);
+ strcat(cmd, "=");
+ strcat(cmd, field_name);
+
+ saved_filter = find_trigger_filter(hist_data, file);
+ if (saved_filter) {
+ strcat(cmd, " if ");
+ strcat(cmd, saved_filter);
+ }
+
+ var_hist->cmd = kstrdup(cmd, GFP_KERNEL);
+ if (!var_hist->cmd) {
+ kfree(cmd);
+ kfree(var_hist);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ var_hist->hist_data = hist_data;
+
+ ret = event_hist_trigger_func(&trigger_hist_cmd, file,
+ "", "hist", cmd);
+ if (ret) {
+ kfree(cmd);
+ kfree(var_hist->cmd);
+ kfree(var_hist);
+ return ERR_PTR(ret);
+ }
+
+ strcpy(cmd, "synthetic_");
+ strcat(cmd, field_name);
+
+ event_var = find_event_var(tr, system, event_name, cmd);
+ if (!event_var) {
+ kfree(cmd);
+ kfree(var_hist->cmd);
+ kfree(var_hist);
+ return ERR_PTR(-EINVAL);
+ }
+
+ n = target_hist_data->n_field_var_hists;
+ target_hist_data->field_var_hists[n] = var_hist;
+ target_hist_data->n_field_var_hists++;
+
+ return event_var;
+}
+
+static struct hist_field *
+find_target_event_var(struct hist_trigger_data *hist_data,
+ char *system, char *event_name, char *var_name)
+{
+ struct trace_event_file *file = hist_data->event_file;
+ struct hist_field *hist_field = NULL;
+
+ if (system) {
+ struct trace_event_call *call;
+
+ if (!event_name)
+ return NULL;
+
+ call = file->event_call;
+
+ if (strcmp(system, call->class->system) != 0)
+ return NULL;
+
+ if (strcmp(event_name, trace_event_name(call)) != 0)
+ return NULL;
+ }
+
+ hist_field = find_var_field(hist_data, var_name);
+
+ return hist_field;
+}
+
+static inline void __update_field_vars(struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *rec,
+ struct field_var **field_vars,
+ unsigned int n_field_vars,
+ unsigned int field_var_str_start)
+{
+ struct hist_elt_data *elt_data = elt->private_data;
+ unsigned int i, j, var_idx;
+ u64 var_val;
+
+ for (i = 0, j = field_var_str_start; i < n_field_vars; i++) {
+ struct field_var *field_var = field_vars[i];
+ struct hist_field *var = field_var->var;
+ struct hist_field *val = field_var->val;
+
+ var_val = val->fn(val, elt, rbe, rec);
+ var_idx = var->var.idx;
+
+ if (val->flags & HIST_FIELD_FL_STRING) {
+ char *str = elt_data->field_var_str[j++];
+ char *val_str = (char *)(uintptr_t)var_val;
+
+ strncpy(str, val_str, STR_VAR_LEN_MAX);
+ var_val = (u64)(uintptr_t)str;
+ }
+ tracing_map_set_var(elt, var_idx, var_val);
+ }
+}
+
+static void update_field_vars(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *rec)
+{
+ __update_field_vars(elt, rbe, rec, hist_data->field_vars,
+ hist_data->n_field_vars, 0);
+}
+
+static struct hist_field *create_var(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ char *name, int size, const char *type)
+{
+ struct hist_field *var;
+ int idx;
+
+ if (find_var(file, name) && !hist_data->remove) {
+ var = ERR_PTR(-EINVAL);
+ goto out;
+ }
+
+ var = kzalloc(sizeof(struct hist_field), GFP_KERNEL);
+ if (!var) {
+ var = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+
+ idx = tracing_map_add_var(hist_data->map);
+ if (idx < 0) {
+ kfree(var);
+ var = ERR_PTR(-EINVAL);
+ goto out;
+ }
+
+ var->flags = HIST_FIELD_FL_VAR;
+ var->var.idx = idx;
+ var->var.hist_data = var->hist_data = hist_data;
+ var->size = size;
+ var->var.name = kstrdup(name, GFP_KERNEL);
+ var->type = kstrdup(type, GFP_KERNEL);
+ if (!var->var.name || !var->type) {
+ kfree(var->var.name);
+ kfree(var->type);
+ kfree(var);
+ var = ERR_PTR(-ENOMEM);
+ }
+ out:
+ return var;
+}
+
+static struct field_var *create_field_var(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file,
+ char *field_name)
+{
+ struct hist_field *val = NULL, *var = NULL;
+ unsigned long flags = HIST_FIELD_FL_VAR;
+ struct field_var *field_var;
+ int ret = 0;
+
+ if (hist_data->n_field_vars >= SYNTH_FIELDS_MAX) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ val = parse_atom(hist_data, file, field_name, &flags, NULL);
+ if (IS_ERR(val)) {
+ ret = PTR_ERR(val);
+ goto err;
+ }
+
+ var = create_var(hist_data, file, field_name, val->size, val->type);
+ if (IS_ERR(var)) {
+ kfree(val);
+ ret = PTR_ERR(var);
+ goto err;
+ }
+
+ field_var = kzalloc(sizeof(struct field_var), GFP_KERNEL);
+ if (!field_var) {
+ kfree(val);
+ kfree(var);
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ field_var->var = var;
+ field_var->val = val;
+ out:
+ return field_var;
+ err:
+ field_var = ERR_PTR(ret);
+ goto out;
+}
+
+static struct field_var *
+create_target_field_var(struct hist_trigger_data *hist_data,
+ char *system, char *event_name, char *var_name)
+{
+ struct trace_event_file *file = hist_data->event_file;
+
+ if (system) {
+ struct trace_event_call *call;
+
+ if (!event_name)
+ return NULL;
+
+ call = file->event_call;
+
+ if (strcmp(system, call->class->system) != 0)
+ return NULL;
+
+ if (strcmp(event_name, trace_event_name(call)) != 0)
+ return NULL;
+ }
+
+ return create_field_var(hist_data, file, var_name);
+}
+
+static void destroy_field_var(struct field_var *field_var)
+{
+ if (!field_var)
+ return;
+
+ destroy_hist_field(field_var->var, 0);
+ destroy_hist_field(field_var->val, 0);
+
+ kfree(field_var);
+}
+
+static void destroy_field_vars(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_field_vars; i++)
+ destroy_field_var(hist_data->field_vars[i]);
+}
+
+static void save_field_var(struct hist_trigger_data *hist_data,
+ struct field_var *field_var)
+{
+ hist_data->field_vars[hist_data->n_field_vars++] = field_var;
+
+ if (field_var->val->flags & HIST_FIELD_FL_STRING)
+ hist_data->n_field_var_str++;
+}
+
static int create_hitcount_val(struct hist_trigger_data *hist_data)
{
hist_data->fields[HITCOUNT_IDX] =
@@ -2814,6 +3233,16 @@ static void print_actions_spec(struct seq_file *m,
}
}

+static void destroy_field_var_hists(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_field_var_hists; i++) {
+ kfree(hist_data->field_var_hists[i]->cmd);
+ kfree(hist_data->field_var_hists[i]);
+ }
+}
+
static void destroy_hist_data(struct hist_trigger_data *hist_data)
{
if (!hist_data)
@@ -2824,6 +3253,8 @@ static void destroy_hist_data(struct hist_trigger_data *hist_data)
tracing_map_destroy(hist_data->map);

destroy_actions(hist_data);
+ destroy_field_vars(hist_data);
+ destroy_field_var_hists(hist_data);

kfree(hist_data);
}
@@ -2957,6 +3388,8 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
tracing_map_set_var(elt, var_idx, hist_val);
}
}
+
+ update_field_vars(hist_data, elt, rbe, rec);
}

static inline void add_to_key(char *compound_key, void *key,
@@ -3677,6 +4110,21 @@ static bool hist_trigger_check_refs(struct event_trigger_data *data,
return false;
}

+static void unregister_field_var_hists(struct hist_trigger_data *hist_data)
+{
+ struct trace_event_file *file;
+ unsigned int i;
+ char *cmd;
+ int ret;
+
+ for (i = 0; i < hist_data->n_field_var_hists; i++) {
+ file = hist_data->field_var_hists[i]->hist_data->event_file;
+ cmd = hist_data->field_var_hists[i]->cmd;
+ ret = event_hist_trigger_func(&trigger_hist_cmd, file,
+ "!hist", "hist", cmd);
+ }
+}
+
static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
struct event_trigger_data *data,
struct trace_event_file *file)
@@ -3692,6 +4140,7 @@ static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
if (!hist_trigger_match(data, test, named_data, false))
continue;
+ unregister_field_var_hists(test->private_data);
unregistered = true;
list_del_rcu(&test->list);
trace_event_trigger_enable_disable(file, 0);
@@ -3733,6 +4182,7 @@ static void hist_unreg_all(struct trace_event_file *file)

list_for_each_entry_safe(test, n, &file->triggers, list) {
if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ unregister_field_var_hists(test->private_data);
list_del_rcu(&test->list);
trace_event_trigger_enable_disable(file, 0);
update_cond_flag(file);
--
1.9.3

2017-09-05 22:03:18

by Tom Zanussi

Subject: [PATCH v2 31/40] tracing: Allow whitespace to surround hist trigger filter

The existing code accepts only a single space before and after the
'if' that introduces a hist trigger's filter. Loosen the parsing so
the 'if' clause can be surrounded by arbitrary whitespace.
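For instance, with this change both of the following forms parse the
same way (previously only the first, with exactly one space on each
side of 'if', was accepted; shown side by side purely to illustrate
the parsing - registering the same trigger twice would of course fail
with EEXIST):

```shell
cd /sys/kernel/debug/tracing

echo 'hist:keys=pid if comm=="bash"' >> \
    events/sched/sched_waking/trigger
echo 'hist:keys=pid    if      comm=="bash"' >> \
    events/sched/sched_waking/trigger
```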

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0a398f3..4f66f2e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -4819,7 +4819,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
struct synth_event *se;
const char *se_name;
bool remove = false;
- char *trigger;
+ char *trigger, *p;
int ret = 0;

if (!param)
@@ -4829,9 +4829,19 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
remove = true;

/* separate the trigger from the filter (k:v [if filter]) */
- trigger = strsep(&param, " \t");
- if (!trigger)
- return -EINVAL;
+ trigger = param;
+ p = strstr(param, " if");
+ if (!p)
+ p = strstr(param, "\tif");
+ if (p) {
+ if (p == trigger)
+ return -EINVAL;
+ param = p + 1;
+ param = strstrip(param);
+ *p = '\0';
+ trigger = strstrip(trigger);
+ } else
+ param = NULL;

attrs = parse_hist_trigger_attrs(trigger);
if (IS_ERR(attrs))
@@ -4889,6 +4899,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
}

ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
+
/*
* The above returns on success the # of triggers registered,
* but if it didn't register any it returns zero. Consider no
--
1.9.3

2017-09-05 21:59:12

by Tom Zanussi

Subject: [PATCH v2 23/40] tracing: Add hist_field 'type' field

Future support for synthetic events requires hist_field 'type'
information, so add a field for that.

Also, make the usage of the other hist_field attributes (size,
is_signed, etc.) consistent.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c7dbe37..4650c22 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -54,6 +54,7 @@ struct hist_field {
unsigned int size;
unsigned int offset;
unsigned int is_signed;
+ const char *type;
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
struct hist_trigger_data *hist_data;
struct hist_var var;
@@ -650,6 +651,7 @@ static void destroy_hist_field(struct hist_field *hist_field,

kfree(hist_field->var.name);
kfree(hist_field->name);
+ kfree(hist_field->type);

kfree(hist_field);
}
@@ -675,6 +677,10 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,

if (flags & HIST_FIELD_FL_HITCOUNT) {
hist_field->fn = hist_field_counter;
+ hist_field->size = sizeof(u64);
+ hist_field->type = kstrdup("u64", GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
goto out;
}

@@ -688,12 +694,18 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
hist_field->fn = hist_field_log2;
hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL);
hist_field->size = hist_field->operands[0]->size;
+ hist_field->type = kstrdup(hist_field->operands[0]->type, GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
goto out;
}

if (flags & HIST_FIELD_FL_TIMESTAMP) {
hist_field->fn = hist_field_timestamp;
hist_field->size = sizeof(u64);
+ hist_field->type = kstrdup("u64", GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
goto out;
}

@@ -703,6 +715,11 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (is_string_field(field)) {
flags |= HIST_FIELD_FL_STRING;

+ hist_field->size = MAX_FILTER_STR_VAL;
+ hist_field->type = kstrdup(field->type, GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
+
if (field->filter_type == FILTER_STATIC_STRING)
hist_field->fn = hist_field_string;
else if (field->filter_type == FILTER_DYN_STRING)
@@ -710,6 +727,12 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
else
hist_field->fn = hist_field_pstring;
} else {
+ hist_field->size = field->size;
+ hist_field->is_signed = field->is_signed;
+ hist_field->type = kstrdup(field->type, GFP_KERNEL);
+ if (!hist_field->type)
+ goto free;
+
hist_field->fn = select_value_fn(field->size,
field->is_signed);
if (!hist_field->fn) {
@@ -917,6 +940,11 @@ static struct hist_field *parse_unary(struct hist_trigger_data *hist_data,
expr->operands[0] = operand1;
expr->operator = FIELD_OP_UNARY_MINUS;
expr->name = expr_str(expr, 0);
+ expr->type = kstrdup(operand1->type, GFP_KERNEL);
+ if (!expr->type) {
+ ret = -ENOMEM;
+ goto free;
+ }

return expr;
free:
@@ -1005,6 +1033,11 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
expr->operands[1] = operand2;
expr->operator = field_op;
expr->name = expr_str(expr, 0);
+ expr->type = kstrdup(operand1->type, GFP_KERNEL);
+ if (!expr->type) {
+ ret = -ENOMEM;
+ goto free;
+ }

switch (field_op) {
case FIELD_OP_MINUS:
--
1.9.3

2017-09-05 22:03:45

by Tom Zanussi

Subject: [PATCH v2 26/40] tracing: Add hist trigger action hook

Add a hook for executing extra actions whenever a histogram entry is
added or updated.

The default 'action' when a hist entry is added to a histogram is to
update the set of values associated with it. Some applications may
want to perform additional actions at that point, such as generate
another event, or compare and save a maximum.

Add a simple framework for doing that; specific actions will be
implemented on top of it in later patches.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 114 +++++++++++++++++++++++++++++++++++++--
1 file changed, 111 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 397dca1..c3edf7a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -33,6 +33,7 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field,

#define HIST_FIELD_OPERANDS_MAX 2
#define HIST_FIELDS_MAX (TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)
+#define HIST_ACTIONS_MAX 8

enum field_op_id {
FIELD_OP_NONE,
@@ -241,6 +242,9 @@ struct hist_trigger_attrs {
char *assignment_str[TRACING_MAP_VARS_MAX];
unsigned int n_assignments;

+ char *action_str[HIST_ACTIONS_MAX];
+ unsigned int n_actions;
+
struct var_defs var_defs;
};

@@ -261,6 +265,21 @@ struct hist_trigger_data {
bool remove;
struct hist_field *var_refs[TRACING_MAP_VARS_MAX];
unsigned int n_var_refs;
+
+ struct action_data *actions[HIST_ACTIONS_MAX];
+ unsigned int n_actions;
+};
+
+struct action_data;
+
+typedef void (*action_fn_t) (struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt, void *rec,
+ struct ring_buffer_event *rbe,
+ struct action_data *data, u64 *var_ref_vals);
+
+struct action_data {
+ action_fn_t fn;
+ unsigned int var_ref_idx;
};

static u64 hist_field_timestamp(struct hist_field *hist_field,
@@ -710,6 +729,9 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
for (i = 0; i < attrs->n_assignments; i++)
kfree(attrs->assignment_str[i]);

+ for (i = 0; i < attrs->n_actions; i++)
+ kfree(attrs->action_str[i]);
+
kfree(attrs->name);
kfree(attrs->sort_key_str);
kfree(attrs->keys_str);
@@ -717,6 +739,16 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
kfree(attrs);
}

+static int parse_action(char *str, struct hist_trigger_attrs *attrs)
+{
+ int ret = 0;
+
+ if (attrs->n_actions >= HIST_ACTIONS_MAX)
+ return ret;
+
+ return ret;
+}
+
static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
{
int ret = 0;
@@ -784,8 +816,9 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
else if (strcmp(str, "clear") == 0)
attrs->clear = true;
else {
- ret = -EINVAL;
- goto free;
+ ret = parse_action(str, attrs);
+ if (ret)
+ goto free;
}
}

@@ -1917,11 +1950,63 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
return ret;
}

+static void destroy_actions(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_actions; i++) {
+ struct action_data *data = hist_data->actions[i];
+
+ kfree(data);
+ }
+}
+
+static int create_actions(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ unsigned int i;
+ int ret = 0;
+ char *str;
+
+ for (i = 0; i < hist_data->attrs->n_actions; i++) {
+ str = hist_data->attrs->action_str[i];
+ }
+
+ return ret;
+}
+
+static void print_actions(struct seq_file *m,
+ struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_actions; i++) {
+ struct action_data *data = hist_data->actions[i];
+ }
+}
+
+static void print_actions_spec(struct seq_file *m,
+ struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_actions; i++) {
+ struct action_data *data = hist_data->actions[i];
+ }
+}
+
static void destroy_hist_data(struct hist_trigger_data *hist_data)
{
+ if (!hist_data)
+ return;
+
destroy_hist_trigger_attrs(hist_data->attrs);
destroy_hist_fields(hist_data);
tracing_map_destroy(hist_data->map);
+
+ destroy_actions(hist_data);
+
kfree(hist_data);
}

@@ -2080,6 +2165,20 @@ static inline void add_to_key(char *compound_key, void *key,
memcpy(compound_key + key_field->offset, key, size);
}

+static void
+hist_trigger_actions(struct hist_trigger_data *hist_data,
+ struct tracing_map_elt *elt, void *rec,
+ struct ring_buffer_event *rbe, u64 *var_ref_vals)
+{
+ struct action_data *data;
+ unsigned int i;
+
+ for (i = 0; i < hist_data->n_actions; i++) {
+ data = hist_data->actions[i];
+ data->fn(hist_data, elt, rec, rbe, data, var_ref_vals);
+ }
+}
+
static void event_hist_trigger(struct event_trigger_data *data, void *rec,
struct ring_buffer_event *rbe)
{
@@ -2135,6 +2234,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
return;

hist_trigger_elt_update(hist_data, elt, rec, rbe, var_ref_vals);
+
+ if (resolve_var_refs(hist_data, key, var_ref_vals, true))
+ hist_trigger_actions(hist_data, elt, rec, rbe, var_ref_vals);
}

static void hist_trigger_stacktrace_print(struct seq_file *m,
@@ -2450,6 +2552,8 @@ static int event_hist_trigger_print(struct seq_file *m,
}
seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));

+ print_actions_spec(m, hist_data);
+
if (data->filter_str)
seq_printf(m, " if %s", data->filter_str);

@@ -2910,6 +3014,10 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
if (has_hist_vars(hist_data))
save_hist_vars(hist_data);

+ ret = create_actions(hist_data, file);
+ if (ret)
+ goto out_unreg;
+
ret = tracing_map_init(hist_data->map);
if (ret)
goto out_unreg;
@@ -2931,8 +3039,8 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
remove_hist_vars(hist_data);

kfree(trigger_data);
-
destroy_hist_data(hist_data);
+
goto out;
}

--
1.9.3

2017-09-05 22:04:15

by Tom Zanussi

Subject: [PATCH v2 24/40] tracing: Add variable reference handling to hist triggers

Add the necessary infrastructure to allow the variables defined on one
event to be referenced in another. This allows a variable saved by a
previous event to be used in expressions that combine its value with
the event fields of the current event. For example, here's how a
latency can be calculated and saved into yet another variable named
'wakeup_lat':

# echo 'hist:keys=pid,prio:ts0=common_timestamp ...
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp-$ts0 ...

In the first event, the event's timestamp is saved into the variable
ts0. In the next line, ts0 is subtracted from the second event's
timestamp to produce the latency.

Further users of variable references will be described in subsequent
patches, such as for instance how the 'wakeup_lat' variable above can
be displayed in a latency histogram.
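
The effect of the two triggers above can be sketched in a few lines of
Python; the handler names are purely illustrative, with the dict playing
the role of the per-key variable storage (ts0) and the second handler
computing wakeup_lat:

```python
ts0 = {}  # per-key variable storage, one slot per tracing_map entry

def on_waking(pid, timestamp):
    # First trigger: ts0=common_timestamp, keyed on pid
    ts0[pid] = timestamp

def on_switch(next_pid, timestamp):
    # Second trigger: wakeup_lat=common_timestamp-$ts0, keyed on next_pid.
    # If the variable was never set for this key, no latency is produced.
    if next_pid not in ts0:
        return None
    return timestamp - ts0.pop(next_pid)
```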

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace.c | 2 +
kernel/trace/trace.h | 3 +
kernel/trace/trace_events_hist.c | 606 ++++++++++++++++++++++++++++++++----
kernel/trace/trace_events_trigger.c | 6 +
4 files changed, 561 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 6ee3a88..d40839d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7746,6 +7746,7 @@ static int instance_mkdir(const char *name)

INIT_LIST_HEAD(&tr->systems);
INIT_LIST_HEAD(&tr->events);
+ INIT_LIST_HEAD(&tr->hist_vars);

if (allocate_trace_buffers(tr, trace_buf_size) < 0)
goto out_free_tr;
@@ -8489,6 +8490,7 @@ __init static int tracer_alloc_buffers(void)

INIT_LIST_HEAD(&global_trace.systems);
INIT_LIST_HEAD(&global_trace.events);
+ INIT_LIST_HEAD(&global_trace.hist_vars);
list_add(&global_trace.list, &ftrace_trace_arrays);

apply_trace_boot_options();
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 02bfd5c..7b78762 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -273,6 +273,7 @@ struct trace_array {
int function_enabled;
#endif
int time_stamp_abs_ref;
+ struct list_head hist_vars;
};

enum {
@@ -1547,6 +1548,8 @@ extern int save_named_trigger(const char *name,
extern void unpause_named_trigger(struct event_trigger_data *data);
extern void set_named_trigger_data(struct event_trigger_data *data,
struct event_trigger_data *named_data);
+extern struct event_trigger_data *
+get_named_trigger_data(struct event_trigger_data *data);
extern int register_event_command(struct event_command *cmd);
extern int unregister_event_command(struct event_command *cmd);
extern int register_trigger_hist_enable_disable_cmds(void);
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 4650c22..397dca1 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -60,6 +60,9 @@ struct hist_field {
struct hist_var var;
enum field_op_id operator;
char *name;
+ unsigned int var_idx;
+ unsigned int var_ref_idx;
+ bool read_once;
};

static u64 hist_field_none(struct hist_field *field,
@@ -215,6 +218,7 @@ enum hist_field_flags {
HIST_FIELD_FL_VAR = 4096,
HIST_FIELD_FL_VAR_ONLY = 8192,
HIST_FIELD_FL_EXPR = 16384,
+ HIST_FIELD_FL_VAR_REF = 32768,
};

struct var_defs {
@@ -255,6 +259,8 @@ struct hist_trigger_data {
struct tracing_map *map;
bool enable_timestamps;
bool remove;
+ struct hist_field *var_refs[TRACING_MAP_VARS_MAX];
+ unsigned int n_var_refs;
};

static u64 hist_field_timestamp(struct hist_field *hist_field,
@@ -273,10 +279,344 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
return ts;
}

+struct hist_var_data {
+ struct list_head list;
+ struct hist_trigger_data *hist_data;
+};
+
+static struct hist_field *check_var_ref(struct hist_field *hist_field,
+ struct hist_trigger_data *var_data,
+ unsigned int var_idx)
+{
+ struct hist_field *found = NULL;
+
+ if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR_REF) {
+ if (hist_field->var.idx == var_idx &&
+ hist_field->var.hist_data == var_data) {
+ found = hist_field;
+ }
+ }
+
+ return found;
+}
+
+static struct hist_field *find_var_ref(struct hist_trigger_data *hist_data,
+ struct hist_trigger_data *var_data,
+ unsigned int var_idx)
+{
+ struct hist_field *hist_field, *found = NULL;
+ unsigned int i, j;
+
+ for_each_hist_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ found = check_var_ref(hist_field, var_data, var_idx);
+ if (found)
+ return found;
+
+ for (j = 0; j < HIST_FIELD_OPERANDS_MAX; j++) {
+ struct hist_field *operand;
+
+ operand = hist_field->operands[j];
+ found = check_var_ref(operand, var_data, var_idx);
+ if (found)
+ return found;
+ }
+ }
+
+ return found;
+}
+
+static struct hist_field *find_any_var_ref(struct hist_trigger_data *hist_data,
+ unsigned int var_idx)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_field *found = NULL;
+ struct hist_var_data *var_data;
+
+ list_for_each_entry(var_data, &tr->hist_vars, list) {
+ found = find_var_ref(var_data->hist_data, hist_data, var_idx);
+ if (found)
+ break;
+ }
+
+ return found;
+}
+
+static bool check_var_refs(struct hist_trigger_data *hist_data)
+{
+ struct hist_field *field;
+ bool found = false;
+ int i;
+
+ for_each_hist_field(i, hist_data) {
+ field = hist_data->fields[i];
+ if (field && field->flags & HIST_FIELD_FL_VAR) {
+ if (find_any_var_ref(hist_data, field->var.idx)) {
+ found = true;
+ break;
+ }
+ }
+ }
+
+ return found;
+}
+
+static struct hist_var_data *find_hist_vars(struct hist_trigger_data *hist_data)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_var_data *var_data, *found = NULL;
+
+ list_for_each_entry(var_data, &tr->hist_vars, list) {
+ if (var_data->hist_data == hist_data) {
+ found = var_data;
+ break;
+ }
+ }
+
+ return found;
+}
+
+static bool has_hist_vars(struct hist_trigger_data *hist_data)
+{
+ struct hist_field *hist_field;
+ int i, j;
+
+ for_each_hist_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ if (hist_field &&
+ (hist_field->flags & HIST_FIELD_FL_VAR ||
+ hist_field->flags & HIST_FIELD_FL_VAR_REF))
+ return true;
+
+ for (j = 0; j < HIST_FIELD_OPERANDS_MAX; j++) {
+ struct hist_field *operand;
+
+ operand = hist_field->operands[j];
+ if (operand &&
+ (operand->flags & HIST_FIELD_FL_VAR ||
+ operand->flags & HIST_FIELD_FL_VAR_REF))
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static int save_hist_vars(struct hist_trigger_data *hist_data)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_var_data *var_data;
+
+ var_data = find_hist_vars(hist_data);
+ if (var_data)
+ return 0;
+
+ if (trace_array_get(tr) < 0)
+ return -ENODEV;
+
+ var_data = kzalloc(sizeof(*var_data), GFP_KERNEL);
+ if (!var_data) {
+ trace_array_put(tr);
+ return -ENOMEM;
+ }
+
+ var_data->hist_data = hist_data;
+ list_add(&var_data->list, &tr->hist_vars);
+
+ return 0;
+}
+
+static void remove_hist_vars(struct hist_trigger_data *hist_data)
+{
+ struct trace_array *tr = hist_data->event_file->tr;
+ struct hist_var_data *var_data;
+
+ var_data = find_hist_vars(hist_data);
+ if (!var_data)
+ return;
+
+ if (WARN_ON(check_var_refs(hist_data)))
+ return;
+
+ list_del(&var_data->list);
+
+ kfree(var_data);
+
+ trace_array_put(tr);
+}
+
+static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
+ const char *var_name)
+{
+ struct hist_field *hist_field, *found = NULL;
+ int i;
+
+ for_each_hist_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR &&
+ strcmp(hist_field->var.name, var_name) == 0) {
+ found = hist_field;
+ break;
+ }
+ }
+
+ return found;
+}
+
+static struct hist_field *find_var(struct trace_event_file *file,
+ const char *var_name)
+{
+ struct hist_trigger_data *hist_data;
+ struct event_trigger_data *test;
+ struct hist_field *hist_field;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ hist_data = test->private_data;
+ hist_field = find_var_field(hist_data, var_name);
+ if (hist_field)
+ return hist_field;
+ }
+ }
+
+ return NULL;
+}
+
+static struct trace_event_file *find_var_file(struct trace_array *tr,
+ const char *system,
+ const char *event_name,
+ const char *var_name)
+{
+ struct hist_trigger_data *var_hist_data;
+ struct hist_var_data *var_data;
+ struct trace_event_call *call;
+ struct trace_event_file *file;
+ const char *name;
+
+ list_for_each_entry(var_data, &tr->hist_vars, list) {
+ var_hist_data = var_data->hist_data;
+ file = var_hist_data->event_file;
+ call = file->event_call;
+ name = trace_event_name(call);
+
+ if (!system || !event_name) {
+ if (find_var(file, var_name))
+ return file;
+ continue;
+ }
+
+ if (strcmp(event_name, name) != 0)
+ continue;
+ if (strcmp(system, call->class->system) != 0)
+ continue;
+
+ return file;
+ }
+
+ return NULL;
+}
+
+static struct hist_field *find_file_var(struct trace_event_file *file,
+ const char *var_name)
+{
+ struct hist_trigger_data *test_data;
+ struct event_trigger_data *test;
+ struct hist_field *hist_field;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ test_data = test->private_data;
+ hist_field = find_var_field(test_data, var_name);
+ if (hist_field)
+ return hist_field;
+ }
+ }
+
+ return NULL;
+}
+
+static struct hist_field *find_event_var(struct trace_array *tr,
+ const char *system,
+ const char *event_name,
+ const char *var_name)
+{
+ struct hist_field *hist_field = NULL;
+ struct trace_event_file *file;
+
+ file = find_var_file(tr, system, event_name, var_name);
+ if (!file)
+ return NULL;
+
+ hist_field = find_file_var(file, var_name);
+
+ return hist_field;
+}
+
struct hist_elt_data {
char *comm;
+ u64 *var_ref_vals;
};

+static u64 hist_field_var_ref(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
+{
+ struct hist_elt_data *elt_data;
+ u64 var_val = 0;
+
+ elt_data = elt->private_data;
+ var_val = elt_data->var_ref_vals[hist_field->var_ref_idx];
+
+ return var_val;
+}
+
+static bool resolve_var_refs(struct hist_trigger_data *hist_data, void *key,
+ u64 *var_ref_vals, bool self)
+{
+ struct hist_trigger_data *var_data;
+ struct tracing_map_elt *var_elt;
+ struct hist_field *hist_field;
+ unsigned int i, var_idx;
+ bool resolved = true;
+ u64 var_val = 0;
+
+ for (i = 0; i < hist_data->n_var_refs; i++) {
+ hist_field = hist_data->var_refs[i];
+ var_idx = hist_field->var.idx;
+ var_data = hist_field->var.hist_data;
+
+ if (var_data == NULL) {
+ resolved = false;
+ break;
+ }
+
+ if ((self && var_data != hist_data) ||
+ (!self && var_data == hist_data))
+ continue;
+
+ var_elt = tracing_map_lookup(var_data->map, key);
+ if (!var_elt) {
+ resolved = false;
+ break;
+ }
+
+ if (!tracing_map_var_set(var_elt, var_idx)) {
+ resolved = false;
+ break;
+ }
+
+ if (self || !hist_field->read_once)
+ var_val = tracing_map_read_var(var_elt, var_idx);
+ else
+ var_val = tracing_map_read_var_once(var_elt, var_idx);
+
+ var_ref_vals[i] = var_val;
+ }
+
+ return resolved;
+}
+
static const char *hist_field_name(struct hist_field *field,
unsigned int level)
{
@@ -291,7 +631,8 @@ static const char *hist_field_name(struct hist_field *field,
field_name = hist_field_name(field->operands[0], ++level);
else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
field_name = "$common_timestamp";
- else if (field->flags & HIST_FIELD_FL_EXPR)
+ else if (field->flags & HIST_FIELD_FL_EXPR ||
+ field->flags & HIST_FIELD_FL_VAR_REF)
field_name = field->name;

if (field_name == NULL)
@@ -574,6 +915,8 @@ static char *expr_str(struct hist_field *field, unsigned int level)
return expr;
}

+ if (field->operands[0]->flags & HIST_FIELD_FL_VAR_REF)
+ strcat(expr, "$");
strcat(expr, hist_field_name(field->operands[0], 0));
if (field->operands[0]->flags) {
const char *flags_str = get_hist_field_flags(field->operands[0]);
@@ -596,6 +939,8 @@ static char *expr_str(struct hist_field *field, unsigned int level)
return NULL;
}

+ if (field->operands[1]->flags & HIST_FIELD_FL_VAR_REF)
+ strcat(expr, "$");
strcat(expr, hist_field_name(field->operands[1], 0));
if (field->operands[1]->flags) {
const char *flags_str = get_hist_field_flags(field->operands[1]);
@@ -675,6 +1020,11 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (flags & HIST_FIELD_FL_EXPR)
goto out; /* caller will populate */

+ if (flags & HIST_FIELD_FL_VAR_REF) {
+ hist_field->fn = hist_field_var_ref;
+ goto out;
+ }
+
if (flags & HIST_FIELD_FL_HITCOUNT) {
hist_field->fn = hist_field_counter;
hist_field->size = sizeof(u64);
@@ -768,6 +1118,51 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
}
}

+static int init_var_ref(struct hist_field *ref_field,
+ struct hist_field *var_field)
+{
+ ref_field->var.idx = var_field->var.idx;
+ ref_field->var.hist_data = var_field->hist_data;
+ ref_field->size = var_field->size;
+ ref_field->is_signed = var_field->is_signed;
+
+ ref_field->name = kstrdup(var_field->var.name, GFP_KERNEL);
+ if (!ref_field->name)
+ return -ENOMEM;
+
+ ref_field->type = kstrdup(var_field->type, GFP_KERNEL);
+ if (!ref_field->type) {
+ kfree(ref_field->name);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static struct hist_field *create_var_ref(struct hist_field *var_field)
+{
+ unsigned long flags = HIST_FIELD_FL_VAR_REF;
+ struct hist_field *ref_field;
+
+ ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL);
+ if (ref_field) {
+ if (init_var_ref(ref_field, var_field)) {
+ destroy_hist_field(ref_field, 0);
+ return NULL;
+ }
+ }
+
+ return ref_field;
+}
+
+static bool is_var_ref(char *var_name)
+{
+ if (!var_name || strlen(var_name) < 2 || var_name[0] != '$')
+ return false;
+
+ return true;
+}
+
static char *field_name_from_var(struct hist_trigger_data *hist_data,
char *var_name)
{
@@ -779,7 +1174,7 @@ static char *field_name_from_var(struct hist_trigger_data *hist_data,

if (strcmp(var_name, name) == 0) {
field = hist_data->attrs->var_defs.expr[i];
- if (contains_operator(field))
+ if (contains_operator(field) || is_var_ref(field))
continue;
return field;
}
@@ -791,11 +1186,32 @@ static char *field_name_from_var(struct hist_trigger_data *hist_data,
static char *local_field_var_ref(struct hist_trigger_data *hist_data,
char *var_name)
{
+ if (!is_var_ref(var_name))
+ return NULL;
+
var_name++;

return field_name_from_var(hist_data, var_name);
}

+static struct hist_field *parse_var_ref(struct trace_array *tr,
+ char *system, char *event_name,
+ char *var_name)
+{
+ struct hist_field *var_field = NULL, *ref_field = NULL;
+
+ if (!is_var_ref(var_name))
+ return NULL;
+
+ var_name++;
+
+ var_field = find_event_var(tr, system, event_name, var_name);
+ if (var_field)
+ ref_field = create_var_ref(var_field);
+
+ return ref_field;
+}
+
static struct ftrace_event_field *
parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
char *field_str, unsigned long *flags)
@@ -852,13 +1268,31 @@ struct hist_field *parse_atom(struct hist_trigger_data *hist_data,
struct trace_event_file *file, char *str,
unsigned long *flags, char *var_name)
{
- char *s;
+ char *s, *ref_system = NULL, *ref_event = NULL, *ref_var = str;
+ struct trace_array *tr = hist_data->event_file->tr;
struct ftrace_event_field *field = NULL;
struct hist_field *hist_field = NULL;
int ret = 0;

- s = local_field_var_ref(hist_data, str);
- if (s)
+ s = strchr(str, '.');
+ if (s) {
+ s = strchr(++s, '.');
+ if (s) {
+ ref_system = strsep(&str, ".");
+ ref_event = strsep(&str, ".");
+ ref_var = str;
+ }
+ }
+
+ s = local_field_var_ref(hist_data, ref_var);
+ if (!s) {
+ hist_field = parse_var_ref(tr, ref_system, ref_event, ref_var);
+ if (hist_field) {
+ hist_data->var_refs[hist_data->n_var_refs] = hist_field;
+ hist_field->var_ref_idx = hist_data->n_var_refs++;
+ return hist_field;
+ }
+ } else
str = s;

field = parse_field(hist_data, file, str, flags);
@@ -1029,6 +1463,9 @@ static struct hist_field *parse_expr(struct hist_trigger_data *hist_data,
goto free;
}

+ operand1->read_once = true;
+ operand2->read_once = true;
+
expr->operands[0] = operand1;
expr->operands[1] = operand2;
expr->operator = field_op;
@@ -1075,43 +1512,6 @@ static int create_hitcount_val(struct hist_trigger_data *hist_data)
return 0;
}

-static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
- const char *var_name)
-{
- struct hist_field *hist_field, *found = NULL;
- int i;
-
- for_each_hist_field(i, hist_data) {
- hist_field = hist_data->fields[i];
- if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR &&
- strcmp(hist_field->var.name, var_name) == 0) {
- found = hist_field;
- break;
- }
- }
-
- return found;
-}
-
-static struct hist_field *find_var(struct trace_event_file *file,
- const char *var_name)
-{
- struct hist_trigger_data *hist_data;
- struct event_trigger_data *test;
- struct hist_field *hist_field;
-
- list_for_each_entry_rcu(test, &file->triggers, list) {
- if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
- hist_data = test->private_data;
- hist_field = find_var_field(hist_data, var_name);
- if (hist_field)
- return hist_field;
- }
- }
-
- return NULL;
-}
-
static int __create_val_field(struct hist_trigger_data *hist_data,
unsigned int val_idx,
struct trace_event_file *file,
@@ -1245,6 +1645,12 @@ static int create_key_field(struct hist_trigger_data *hist_data,
goto out;
}

+ if (hist_field->flags & HIST_FIELD_FL_VAR_REF) {
+ destroy_hist_field(hist_field, 0);
+ ret = -EINVAL;
+ goto out;
+ }
+
key_size = hist_field->size;
}

@@ -1580,6 +1986,7 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)

hist_data->attrs = attrs;
hist_data->remove = remove;
+ hist_data->event_file = file;

ret = create_hist_fields(hist_data, file);
if (ret)
@@ -1602,12 +2009,6 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
ret = create_tracing_map_fields(hist_data);
if (ret)
goto free;
-
- ret = tracing_map_init(hist_data->map);
- if (ret)
- goto free;
-
- hist_data->event_file = file;
out:
return hist_data;
free:
@@ -1622,12 +2023,17 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)

static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
struct tracing_map_elt *elt, void *rec,
- struct ring_buffer_event *rbe)
+ struct ring_buffer_event *rbe,
+ u64 *var_ref_vals)
{
+ struct hist_elt_data *elt_data;
struct hist_field *hist_field;
unsigned int i, var_idx;
u64 hist_val;

+ elt_data = elt->private_data;
+ elt_data->var_ref_vals = var_ref_vals;
+
for_each_hist_val_field(i, hist_data) {
hist_field = hist_data->fields[i];
hist_val = hist_field->fn(hist_field, elt, rbe, rec);
@@ -1680,6 +2086,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
struct hist_trigger_data *hist_data = data->private_data;
bool use_compound_key = (hist_data->n_keys > 1);
unsigned long entries[HIST_STACKTRACE_DEPTH];
+ u64 var_ref_vals[TRACING_MAP_VARS_MAX];
char compound_key[HIST_KEY_SIZE_MAX];
struct tracing_map_elt *elt = NULL;
struct stack_trace stacktrace;
@@ -1719,9 +2126,15 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
if (use_compound_key)
key = compound_key;

+ if (hist_data->n_var_refs &&
+ !resolve_var_refs(hist_data, key, var_ref_vals, false))
+ return;
+
elt = tracing_map_insert(hist_data->map, key);
- if (elt)
- hist_trigger_elt_update(hist_data, elt, rec, rbe);
+ if (!elt)
+ return;
+
+ hist_trigger_elt_update(hist_data, elt, rec, rbe, var_ref_vals);
}

static void hist_trigger_stacktrace_print(struct seq_file *m,
@@ -1824,7 +2237,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
field_name = hist_field_name(hist_data->fields[i], 0);

if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR ||
- hist_data->fields[i]->flags & HIST_FIELD_FL_EXPR)
+ hist_data->fields[i]->flags & HIST_FIELD_FL_EXPR ||
+ hist_data->fields[i]->flags & HIST_FIELD_FL_VAR_REF)
continue;

if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
@@ -2074,7 +2488,11 @@ static void event_hist_trigger_free(struct event_trigger_ops *ops,
if (!data->ref) {
if (data->name)
del_named_trigger(data);
+
trigger_data_free(data);
+
+ remove_hist_vars(hist_data);
+
destroy_hist_data(hist_data);
}
}
@@ -2288,23 +2706,55 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,
goto out;
}

- list_add_rcu(&data->list, &file->triggers);
ret++;

- update_cond_flag(file);
-
if (hist_data->enable_timestamps)
tracing_set_time_stamp_abs(file->tr, true);
+ out:
+ return ret;
+}
+
+static int hist_trigger_enable(struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ int ret = 0;
+
+ list_add_rcu(&data->list, &file->triggers);
+
+ update_cond_flag(file);

if (trace_event_trigger_enable_disable(file, 1) < 0) {
list_del_rcu(&data->list);
update_cond_flag(file);
ret--;
}
- out:
+
return ret;
}

+static bool hist_trigger_check_refs(struct event_trigger_data *data,
+ struct trace_event_file *file)
+{
+ struct hist_trigger_data *hist_data = data->private_data;
+ struct event_trigger_data *test, *named_data = NULL;
+
+ if (hist_data->attrs->name)
+ named_data = find_named_trigger(hist_data->attrs->name);
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ if (!hist_trigger_match(data, test, named_data, false))
+ continue;
+ hist_data = test->private_data;
+ if (check_var_refs(hist_data))
+ return true;
+ break;
+ }
+ }
+
+ return false;
+}
+
static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
struct event_trigger_data *data,
struct trace_event_file *file)
@@ -2335,10 +2785,30 @@ static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,
tracing_set_time_stamp_abs(file->tr, false);
}

+static bool hist_file_check_refs(struct trace_event_file *file)
+{
+ struct hist_trigger_data *hist_data;
+ struct event_trigger_data *test;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ hist_data = test->private_data;
+ if (check_var_refs(hist_data))
+ return true;
+ break;
+ }
+ }
+
+ return false;
+}
+
static void hist_unreg_all(struct trace_event_file *file)
{
struct event_trigger_data *test, *n;

+ if (hist_file_check_refs(file))
+ return;
+
list_for_each_entry_safe(test, n, &file->triggers, list) {
if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
list_del_rcu(&test->list);
@@ -2411,6 +2881,11 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
}

if (remove) {
+ if (hist_trigger_check_refs(trigger_data, file)) {
+ ret = -EBUSY;
+ goto out_free;
+ }
+
cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
ret = 0;
goto out_free;
@@ -2428,14 +2903,33 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
goto out_free;
} else if (ret < 0)
goto out_free;
+
+ if (get_named_trigger_data(trigger_data))
+ goto enable;
+
+ if (has_hist_vars(hist_data))
+ save_hist_vars(hist_data);
+
+ ret = tracing_map_init(hist_data->map);
+ if (ret)
+ goto out_unreg;
+enable:
+ ret = hist_trigger_enable(trigger_data, file);
+ if (ret)
+ goto out_unreg;
+
/* Just return zero, not the number of registered triggers */
ret = 0;
out:
return ret;
+ out_unreg:
+ cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
out_free:
if (cmd_ops->set_filter)
cmd_ops->set_filter(NULL, trigger_data, NULL);

+ remove_hist_vars(hist_data);
+
kfree(trigger_data);

destroy_hist_data(hist_data);
diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index 9b0fe31..a7a5bed 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -909,6 +909,12 @@ void set_named_trigger_data(struct event_trigger_data *data,
data->named_data = named_data;
}

+struct event_trigger_data *
+get_named_trigger_data(struct event_trigger_data *data)
+{
+ return data->named_data;
+}
+
static void
traceon_trigger(struct event_trigger_data *data, void *rec,
struct ring_buffer_event *event)
--
1.9.3

2017-09-05 22:04:39

by Tom Zanussi

Subject: [PATCH v2 22/40] tracing: Pass tracing_map_elt to hist_field accessor functions

Some accessor functions, such as those for variable references, require
access to a corresponding tracing_map_elt.

Add a tracing_map_elt param to the function signature and update the
accessor functions accordingly.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 91 +++++++++++++++++++++++++---------------
1 file changed, 57 insertions(+), 34 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index f17d394..c7dbe37 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -26,8 +26,10 @@

struct hist_field;

-typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
- struct ring_buffer_event *rbe);
+typedef u64 (*hist_field_fn_t) (struct hist_field *field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event);

#define HIST_FIELD_OPERANDS_MAX 2
#define HIST_FIELDS_MAX (TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)
@@ -59,28 +61,36 @@ struct hist_field {
char *name;
};

-static u64 hist_field_none(struct hist_field *field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_none(struct hist_field *field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
return 0;
}

-static u64 hist_field_counter(struct hist_field *field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_counter(struct hist_field *field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
return 1;
}

-static u64 hist_field_string(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_string(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
char *addr = (char *)(event + hist_field->field->offset);

return (u64)(unsigned long)addr;
}

-static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_dynstring(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
u32 str_item = *(u32 *)(event + hist_field->field->offset);
int str_loc = str_item & 0xffff;
@@ -89,54 +99,64 @@ static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
return (u64)(unsigned long)addr;
}

-static u64 hist_field_pstring(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_pstring(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
char **addr = (char **)(event + hist_field->field->offset);

return (u64)(unsigned long)*addr;
}

-static u64 hist_field_log2(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_log2(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
struct hist_field *operand = hist_field->operands[0];

- u64 val = operand->fn(operand, event, rbe);
+ u64 val = operand->fn(operand, elt, rbe, event);

return (u64) ilog2(roundup_pow_of_two(val));
}

-static u64 hist_field_plus(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_plus(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];

- u64 val1 = operand1->fn(operand1, event, rbe);
- u64 val2 = operand2->fn(operand2, event, rbe);
+ u64 val1 = operand1->fn(operand1, elt, rbe, event);
+ u64 val2 = operand2->fn(operand2, elt, rbe, event);

return val1 + val2;
}

-static u64 hist_field_minus(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_minus(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
struct hist_field *operand1 = hist_field->operands[0];
struct hist_field *operand2 = hist_field->operands[1];

- u64 val1 = operand1->fn(operand1, event, rbe);
- u64 val2 = operand2->fn(operand2, event, rbe);
+ u64 val1 = operand1->fn(operand1, elt, rbe, event);
+ u64 val2 = operand2->fn(operand2, elt, rbe, event);

return val1 - val2;
}

-static u64 hist_field_unary_minus(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_unary_minus(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
struct hist_field *operand = hist_field->operands[0];

- s64 sval = (s64)operand->fn(operand, event, rbe);
+ s64 sval = (s64)operand->fn(operand, elt, rbe, event);
u64 val = (u64)-sval;

return val;
@@ -144,8 +164,9 @@ static u64 hist_field_unary_minus(struct hist_field *hist_field, void *event,

#define DEFINE_HIST_FIELD_FN(type) \
static u64 hist_field_##type(struct hist_field *hist_field, \
- void *event, \
- struct ring_buffer_event *rbe) \
+ struct tracing_map_elt *elt, \
+ struct ring_buffer_event *rbe, \
+ void *event) \
{ \
type *addr = (type *)(event + hist_field->field->offset); \
\
@@ -235,8 +256,10 @@ struct hist_trigger_data {
bool remove;
};

-static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
- struct ring_buffer_event *rbe)
+static u64 hist_field_timestamp(struct hist_field *hist_field,
+ struct tracing_map_elt *elt,
+ struct ring_buffer_event *rbe,
+ void *event)
{
struct hist_trigger_data *hist_data = hist_field->hist_data;
struct trace_array *tr = hist_data->event_file->tr;
@@ -1574,7 +1597,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,

for_each_hist_val_field(i, hist_data) {
hist_field = hist_data->fields[i];
- hist_val = hist_field->fn(hist_field, rbe, rec);
+ hist_val = hist_field->fn(hist_field, elt, rbe, rec);
if (hist_field->flags & HIST_FIELD_FL_VAR) {
var_idx = hist_field->var.idx;
tracing_map_set_var(elt, var_idx, hist_val);
@@ -1587,7 +1610,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
for_each_hist_key_field(i, hist_data) {
hist_field = hist_data->fields[i];
if (hist_field->flags & HIST_FIELD_FL_VAR) {
- hist_val = hist_field->fn(hist_field, rbe, rec);
+ hist_val = hist_field->fn(hist_field, elt, rbe, rec);
var_idx = hist_field->var.idx;
tracing_map_set_var(elt, var_idx, hist_val);
}
@@ -1625,9 +1648,9 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,
bool use_compound_key = (hist_data->n_keys > 1);
unsigned long entries[HIST_STACKTRACE_DEPTH];
char compound_key[HIST_KEY_SIZE_MAX];
+ struct tracing_map_elt *elt = NULL;
struct stack_trace stacktrace;
struct hist_field *key_field;
- struct tracing_map_elt *elt;
u64 field_contents;
void *key = NULL;
unsigned int i;
@@ -1648,7 +1671,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,

key = entries;
} else {
- field_contents = key_field->fn(key_field, rec, rbe);
+ field_contents = key_field->fn(key_field, elt, rbe, rec);
if (key_field->flags & HIST_FIELD_FL_STRING) {
key = (void *)(unsigned long)field_contents;
use_compound_key = true;
--
1.9.3

2017-09-05 21:58:59

by Tom Zanussi

Subject: [PATCH v2 21/40] tracing: Generalize per-element hist trigger data

Up until now, hist triggers only needed per-element support for saving
'comm' data, which was saved directly as a private data pointer.

In anticipation of the need to save other data besides 'comm', add a
new hist_elt_data struct for the purpose, and switch the current
'comm'-related code over to that.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 65 ++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index bbca6d3..f17d394 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -249,6 +249,10 @@ static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
return ts;
}

+struct hist_elt_data {
+ char *comm;
+};
+
static const char *hist_field_name(struct hist_field *field,
unsigned int level)
{
@@ -447,26 +451,36 @@ static inline void save_comm(char *comm, struct task_struct *task)
memcpy(comm, task->comm, TASK_COMM_LEN);
}

-static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt)
+static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
{
- kfree((char *)elt->private_data);
+ struct hist_elt_data *private_data = elt->private_data;
+
+ kfree(private_data->comm);
+ kfree(private_data);
}

-static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
+static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
{
struct hist_trigger_data *hist_data = elt->map->private_data;
+ unsigned int size = TASK_COMM_LEN + 1;
+ struct hist_elt_data *elt_data;
struct hist_field *key_field;
unsigned int i;

+ elt->private_data = elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL);
+ if (!elt_data)
+ return -ENOMEM;
+
for_each_hist_key_field(i, hist_data) {
key_field = hist_data->fields[i];

if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
- unsigned int size = TASK_COMM_LEN + 1;
-
- elt->private_data = kzalloc(size, GFP_KERNEL);
- if (!elt->private_data)
+ elt_data->comm = kzalloc(size, GFP_KERNEL);
+ if (!elt_data->comm) {
+ kfree(elt_data);
+ elt->private_data = NULL;
return -ENOMEM;
+ }
break;
}
}
@@ -474,18 +488,18 @@ static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
return 0;
}

-static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
+static void hist_trigger_elt_data_init(struct tracing_map_elt *elt)
{
- char *comm = elt->private_data;
+ struct hist_elt_data *private_data = elt->private_data;

- if (comm)
- save_comm(comm, current);
+ if (private_data->comm)
+ save_comm(private_data->comm, current);
}

-static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
- .elt_alloc = hist_trigger_elt_comm_alloc,
- .elt_free = hist_trigger_elt_comm_free,
- .elt_init = hist_trigger_elt_comm_init,
+static const struct tracing_map_ops hist_trigger_elt_data_ops = {
+ .elt_alloc = hist_trigger_elt_data_alloc,
+ .elt_free = hist_trigger_elt_data_free,
+ .elt_init = hist_trigger_elt_data_init,
};

static const char *get_hist_field_flags(struct hist_field *hist_field)
@@ -1494,21 +1508,6 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
return 0;
}

-static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
-{
- struct hist_field *key_field;
- unsigned int i;
-
- for_each_hist_key_field(i, hist_data) {
- key_field = hist_data->fields[i];
-
- if (key_field->flags & HIST_FIELD_FL_EXECNAME)
- return true;
- }
-
- return false;
-}
-
static struct hist_trigger_data *
create_hist_data(unsigned int map_bits,
struct hist_trigger_attrs *attrs,
@@ -1534,8 +1533,7 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
if (ret)
goto free;

- if (need_tracing_map_ops(hist_data))
- map_ops = &hist_trigger_elt_comm_ops;
+ map_ops = &hist_trigger_elt_data_ops;

hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
map_ops, hist_data);
@@ -1724,7 +1722,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
seq_printf(m, "%s: [%llx] %-55s", field_name,
uval, str);
} else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
- char *comm = elt->private_data;
+ struct hist_elt_data *elt_data = elt->private_data;
+ char *comm = elt_data->comm;

uval = *(u64 *)(key + key_field->offset);
seq_printf(m, "%s: %-16s[%10llu]", field_name,
--
1.9.3

2017-09-05 22:05:13

by Tom Zanussi

Subject: [PATCH v2 19/40] tracing: Account for variables in named trigger compatibility

Named triggers must also have the same set of variables in order to be
considered compatible - update the trigger match test to account for
that.

The reason for this requirement is that named triggers with variables
are meant to allow one or more events to set the same variable.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index f60e8ee..3c4eae9 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1598,7 +1598,7 @@ static int event_hist_trigger_print(struct seq_file *m,
sort_key = &hist_data->sort_keys[i];
idx = sort_key->field_idx;

- if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
+ if (WARN_ON(idx >= HIST_FIELDS_MAX))
return -EINVAL;

if (i > 0)
@@ -1786,6 +1786,12 @@ static bool hist_trigger_match(struct event_trigger_data *data,
return false;
if (key_field->is_signed != key_field_test->is_signed)
return false;
+ if ((key_field->var.name && !key_field_test->var.name) ||
+ (!key_field->var.name && key_field_test->var.name))
+ return false;
+ if ((key_field->var.name && key_field_test->var.name) &&
+ strcmp(key_field->var.name, key_field_test->var.name) != 0)
+ return false;
}

for (i = 0; i < hist_data->n_sort_keys; i++) {
--
1.9.3

2017-09-05 22:05:36

by Tom Zanussi

Subject: [PATCH v2 18/40] tracing: Add variable support to hist triggers

Add support for saving the value of a current event's event field by
assigning it to a variable that can be read by a subsequent event.

The basic syntax for saving a variable is to prefix any event field
with a unique variable name (one not corresponding to any existing
keyword) followed by an '=' sign.

Both keys and values can be saved and retrieved in this way:

# echo 'hist:keys=next_pid:vals=$ts0:ts0=common_timestamp ...
# echo 'hist:timer_pid=common_pid:key=$timer_pid ...'

If a variable isn't a key variable or prefixed with 'vals=', the
associated event field will be saved in a variable but won't be summed
as a value:

# echo 'hist:keys=next_pid:ts1=common_timestamp:...

Multiple variables can be assigned at the same time:

# echo 'hist:keys=pid:vals=$ts0,$b,field2:ts0=common_timestamp,b=field1 ...

Multiple (or single) variables can also be assigned at the same time
using separate assignments:

# echo 'hist:keys=pid:vals=$ts0:ts0=common_timestamp:b=field1:c=field2 ...

Variables set as above can be used by being referenced from another
event, as described in a subsequent patch.
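
As a hypothetical end-to-end sketch (the tracefs mount point and the
sched_waking event/field names are assumptions; this requires a kernel with
this patchset applied), saving a per-pid wakeup timestamp might look like:

```shell
# Assumed tracefs mount point; adjust for your system.
TRACEFS=/sys/kernel/debug/tracing

# Save the waking timestamp into variable ts0, keyed on pid.
# With no 'vals=' prefix, ts0 is stored but not summed as a value.
echo 'hist:keys=pid:ts0=common_timestamp.usecs' \
    > "$TRACEFS/events/sched/sched_waking/trigger"

# Inspect the resulting trigger definition.
cat "$TRACEFS/events/sched/sched_waking/trigger"
```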

Signed-off-by: Tom Zanussi <[email protected]>
Signed-off-by: Baohong Liu <[email protected]>
---
kernel/trace/trace_events_hist.c | 372 +++++++++++++++++++++++++++++++++++----
1 file changed, 333 insertions(+), 39 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 45a942e..f60e8ee 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -30,6 +30,13 @@ typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
struct ring_buffer_event *rbe);

#define HIST_FIELD_OPERANDS_MAX 2
+#define HIST_FIELDS_MAX (TRACING_MAP_FIELDS_MAX + TRACING_MAP_VARS_MAX)
+
+struct hist_var {
+ char *name;
+ struct hist_trigger_data *hist_data;
+ unsigned int idx;
+};

struct hist_field {
struct ftrace_event_field *field;
@@ -40,6 +47,7 @@ struct hist_field {
unsigned int is_signed;
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
struct hist_trigger_data *hist_data;
+ struct hist_var var;
};

static u64 hist_field_none(struct hist_field *field, void *event,
@@ -138,6 +146,14 @@ enum hist_field_flags {
HIST_FIELD_FL_LOG2 = 512,
HIST_FIELD_FL_TIMESTAMP = 1024,
HIST_FIELD_FL_TIMESTAMP_USECS = 2048,
+ HIST_FIELD_FL_VAR = 4096,
+ HIST_FIELD_FL_VAR_ONLY = 8192,
+};
+
+struct var_defs {
+ unsigned int n_vars;
+ char *name[TRACING_MAP_VARS_MAX];
+ char *expr[TRACING_MAP_VARS_MAX];
};

struct hist_trigger_attrs {
@@ -150,13 +166,20 @@ struct hist_trigger_attrs {
bool clear;
bool ts_in_usecs;
unsigned int map_bits;
+
+ char *assignment_str[TRACING_MAP_VARS_MAX];
+ unsigned int n_assignments;
+
+ struct var_defs var_defs;
};

struct hist_trigger_data {
- struct hist_field *fields[TRACING_MAP_FIELDS_MAX];
+ struct hist_field *fields[HIST_FIELDS_MAX];
unsigned int n_vals;
unsigned int n_keys;
unsigned int n_fields;
+ unsigned int n_vars;
+ unsigned int n_var_only;
unsigned int key_size;
struct tracing_map_sort_key sort_keys[TRACING_MAP_SORT_KEYS_MAX];
unsigned int n_sort_keys;
@@ -164,6 +187,7 @@ struct hist_trigger_data {
struct hist_trigger_attrs *attrs;
struct tracing_map *map;
bool enable_timestamps;
+ bool remove;
};

static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
@@ -262,9 +286,14 @@ static int parse_map_size(char *str)

static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
{
+ unsigned int i;
+
if (!attrs)
return;

+ for (i = 0; i < attrs->n_assignments; i++)
+ kfree(attrs->assignment_str[i]);
+
kfree(attrs->name);
kfree(attrs->sort_key_str);
kfree(attrs->keys_str);
@@ -295,8 +324,22 @@ static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
goto out;
}
attrs->map_bits = map_bits;
- } else
- ret = -EINVAL;
+ } else {
+ char *assignment;
+
+ if (attrs->n_assignments == TRACING_MAP_VARS_MAX) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ assignment = kstrdup(str, GFP_KERNEL);
+ if (!assignment) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ attrs->assignment_str[attrs->n_assignments++] = assignment;
+ }
out:
return ret;
}
@@ -412,12 +455,15 @@ static void destroy_hist_field(struct hist_field *hist_field,
for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
destroy_hist_field(hist_field->operands[i], level + 1);

+ kfree(hist_field->var.name);
+
kfree(hist_field);
}

static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
struct ftrace_event_field *field,
- unsigned long flags)
+ unsigned long flags,
+ char *var_name)
{
struct hist_field *hist_field;

@@ -443,7 +489,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (flags & HIST_FIELD_FL_LOG2) {
unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
hist_field->fn = hist_field_log2;
- hist_field->operands[0] = create_hist_field(hist_data, field, fl);
+ hist_field->operands[0] = create_hist_field(hist_data, field, fl, NULL);
hist_field->size = hist_field->operands[0]->size;
goto out;
}
@@ -478,14 +524,23 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
hist_field->field = field;
hist_field->flags = flags;

+ if (var_name) {
+ hist_field->var.name = kstrdup(var_name, GFP_KERNEL);
+ if (!hist_field->var.name)
+ goto free;
+ }
+
return hist_field;
+ free:
+ destroy_hist_field(hist_field, 0);
+ return NULL;
}

static void destroy_hist_fields(struct hist_trigger_data *hist_data)
{
unsigned int i;

- for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) {
+ for (i = 0; i < HIST_FIELDS_MAX; i++) {
if (hist_data->fields[i]) {
destroy_hist_field(hist_data->fields[i], 0);
hist_data->fields[i] = NULL;
@@ -496,11 +551,12 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
static int create_hitcount_val(struct hist_trigger_data *hist_data)
{
hist_data->fields[HITCOUNT_IDX] =
- create_hist_field(hist_data, NULL, HIST_FIELD_FL_HITCOUNT);
+ create_hist_field(hist_data, NULL, HIST_FIELD_FL_HITCOUNT, NULL);
if (!hist_data->fields[HITCOUNT_IDX])
return -ENOMEM;

hist_data->n_vals++;
+ hist_data->n_fields++;

if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
return -EINVAL;
@@ -508,19 +564,53 @@ static int create_hitcount_val(struct hist_trigger_data *hist_data)
return 0;
}

-static int create_val_field(struct hist_trigger_data *hist_data,
- unsigned int val_idx,
- struct trace_event_file *file,
- char *field_str)
+static struct hist_field *find_var_field(struct hist_trigger_data *hist_data,
+ const char *var_name)
+{
+ struct hist_field *hist_field, *found = NULL;
+ int i;
+
+ for_each_hist_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ if (hist_field && hist_field->flags & HIST_FIELD_FL_VAR &&
+ strcmp(hist_field->var.name, var_name) == 0) {
+ found = hist_field;
+ break;
+ }
+ }
+
+ return found;
+}
+
+static struct hist_field *find_var(struct trace_event_file *file,
+ const char *var_name)
+{
+ struct hist_trigger_data *hist_data;
+ struct event_trigger_data *test;
+ struct hist_field *hist_field;
+
+ list_for_each_entry_rcu(test, &file->triggers, list) {
+ if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
+ hist_data = test->private_data;
+ hist_field = find_var_field(hist_data, var_name);
+ if (hist_field)
+ return hist_field;
+ }
+ }
+
+ return NULL;
+}
+
+static int __create_val_field(struct hist_trigger_data *hist_data,
+ unsigned int val_idx,
+ struct trace_event_file *file,
+ char *var_name, char *field_str,
+ unsigned long flags)
{
struct ftrace_event_field *field = NULL;
- unsigned long flags = 0;
char *field_name;
int ret = 0;

- if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
- return -EINVAL;
-
field_name = strsep(&field_str, ".");
if (field_str) {
if (strcmp(field_str, "hex") == 0)
@@ -542,25 +632,65 @@ static int create_val_field(struct hist_trigger_data *hist_data,
}
}

- hist_data->fields[val_idx] = create_hist_field(hist_data, field, flags);
+ hist_data->fields[val_idx] = create_hist_field(hist_data, field, flags, var_name);
if (!hist_data->fields[val_idx]) {
ret = -ENOMEM;
goto out;
}

++hist_data->n_vals;
+ ++hist_data->n_fields;

- if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX))
+ if (hist_data->fields[val_idx]->flags & HIST_FIELD_FL_VAR_ONLY)
+ hist_data->n_var_only++;
+
+ if (WARN_ON(hist_data->n_vals > TRACING_MAP_VALS_MAX + TRACING_MAP_VARS_MAX))
ret = -EINVAL;
out:
return ret;
}

+static int create_val_field(struct hist_trigger_data *hist_data,
+ unsigned int val_idx,
+ struct trace_event_file *file,
+ char *field_str)
+{
+ if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX))
+ return -EINVAL;
+
+ return __create_val_field(hist_data, val_idx, file, NULL, field_str, 0);
+}
+
+static int create_var_field(struct hist_trigger_data *hist_data,
+ unsigned int val_idx,
+ struct trace_event_file *file,
+ char *var_name, char *expr_str)
+{
+ unsigned long flags = 0;
+
+ if (WARN_ON(val_idx >= TRACING_MAP_VALS_MAX + TRACING_MAP_VARS_MAX))
+ return -EINVAL;
+
+ if (find_var(file, var_name) && !hist_data->remove) {
+ return -EINVAL;
+ }
+
+ flags |= HIST_FIELD_FL_VAR;
+ hist_data->n_vars++;
+ if (hist_data->n_vars > TRACING_MAP_VARS_MAX) {
+ return -EINVAL;
+ }
+
+ flags |= HIST_FIELD_FL_VAR_ONLY;
+
+ return __create_val_field(hist_data, val_idx, file, var_name, expr_str, flags);
+}
+
static int create_val_fields(struct hist_trigger_data *hist_data,
struct trace_event_file *file)
{
char *fields_str, *field_str;
- unsigned int i, j;
+ unsigned int i, j = 1;
int ret;

ret = create_hitcount_val(hist_data);
@@ -580,12 +710,15 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
field_str = strsep(&fields_str, ",");
if (!field_str)
break;
+
if (strcmp(field_str, "hitcount") == 0)
continue;
+
ret = create_val_field(hist_data, j++, file, field_str);
if (ret)
goto out;
}
+
if (fields_str && (strcmp(fields_str, "hitcount") != 0))
ret = -EINVAL;
out:
@@ -599,11 +732,12 @@ static int create_key_field(struct hist_trigger_data *hist_data,
char *field_str)
{
struct ftrace_event_field *field = NULL;
+ struct hist_field *hist_field = NULL;
unsigned long flags = 0;
unsigned int key_size;
int ret = 0;

- if (WARN_ON(key_idx >= TRACING_MAP_FIELDS_MAX))
+ if (WARN_ON(key_idx >= HIST_FIELDS_MAX))
return -EINVAL;

flags |= HIST_FIELD_FL_KEY;
@@ -611,6 +745,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
if (strcmp(field_str, "stacktrace") == 0) {
flags |= HIST_FIELD_FL_STACKTRACE;
key_size = sizeof(unsigned long) * HIST_STACKTRACE_DEPTH;
+ hist_field = create_hist_field(hist_data, NULL, flags, NULL);
} else {
char *field_name = strsep(&field_str, ".");

@@ -656,7 +791,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}
}

- hist_data->fields[key_idx] = create_hist_field(hist_data, field, flags);
+ hist_data->fields[key_idx] = create_hist_field(hist_data, field, flags, NULL);
if (!hist_data->fields[key_idx]) {
ret = -ENOMEM;
goto out;
@@ -672,6 +807,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}

hist_data->n_keys++;
+ hist_data->n_fields++;

if (WARN_ON(hist_data->n_keys > TRACING_MAP_KEYS_MAX))
return -EINVAL;
@@ -715,21 +851,108 @@ static int create_key_fields(struct hist_trigger_data *hist_data,
return ret;
}

+static int create_var_fields(struct hist_trigger_data *hist_data,
+ struct trace_event_file *file)
+{
+ unsigned int i, j = hist_data->n_vals;
+ int ret = 0;
+
+ unsigned int n_vars = hist_data->attrs->var_defs.n_vars;
+
+ for (i = 0; i < n_vars; i++) {
+ char *var_name = hist_data->attrs->var_defs.name[i];
+ char *expr = hist_data->attrs->var_defs.expr[i];
+
+ ret = create_var_field(hist_data, j++, file, var_name, expr);
+ if (ret)
+ goto out;
+ }
+ out:
+ return ret;
+}
+
+static void free_var_defs(struct hist_trigger_data *hist_data)
+{
+ unsigned int i;
+
+ for (i = 0; i < hist_data->attrs->var_defs.n_vars; i++) {
+ kfree(hist_data->attrs->var_defs.name[i]);
+ kfree(hist_data->attrs->var_defs.expr[i]);
+ }
+
+ hist_data->attrs->var_defs.n_vars = 0;
+}
+
+static int parse_var_defs(struct hist_trigger_data *hist_data)
+{
+ char *s, *str, *var_name, *field_str;
+ unsigned int i, j, n_vars = 0;
+ int ret = 0;
+
+ for (i = 0; i < hist_data->attrs->n_assignments; i++) {
+ str = hist_data->attrs->assignment_str[i];
+ for (j = 0; j < TRACING_MAP_VARS_MAX; j++) {
+ field_str = strsep(&str, ",");
+ if (!field_str)
+ break;
+
+ var_name = strsep(&field_str, "=");
+ if (!var_name || !field_str) {
+ ret = -EINVAL;
+ goto free;
+ }
+
+ s = kstrdup(var_name, GFP_KERNEL);
+ if (!s) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ hist_data->attrs->var_defs.name[n_vars] = s;
+
+ s = kstrdup(field_str, GFP_KERNEL);
+ if (!s) {
+ ret = -ENOMEM;
+ goto free;
+ }
+ hist_data->attrs->var_defs.expr[n_vars++] = s;
+
+ hist_data->attrs->var_defs.n_vars = n_vars;
+
+ if (n_vars == TRACING_MAP_VARS_MAX)
+ goto free;
+ }
+ }
+
+ return ret;
+ free:
+ free_var_defs(hist_data);
+
+ return ret;
+}
+
static int create_hist_fields(struct hist_trigger_data *hist_data,
struct trace_event_file *file)
{
int ret;

+ ret = parse_var_defs(hist_data);
+ if (ret)
+ goto out;
+
ret = create_val_fields(hist_data, file);
if (ret)
goto out;

- ret = create_key_fields(hist_data, file);
+ ret = create_var_fields(hist_data, file);
if (ret)
goto out;

- hist_data->n_fields = hist_data->n_vals + hist_data->n_keys;
+ ret = create_key_fields(hist_data, file);
+ if (ret)
+ goto out;
out:
+ free_var_defs(hist_data);
+
return ret;
}

@@ -752,7 +975,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
char *fields_str = hist_data->attrs->sort_key_str;
struct tracing_map_sort_key *sort_key;
int descending, ret = 0;
- unsigned int i, j;
+ unsigned int i, j, k;

hist_data->n_sort_keys = 1; /* we always have at least one, hitcount */

@@ -800,12 +1023,19 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
continue;
}

- for (j = 1; j < hist_data->n_fields; j++) {
+ for (j = 1, k = 1; j < hist_data->n_fields; j++) {
+ unsigned int idx;
+
hist_field = hist_data->fields[j];
+ if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
+ continue;
+
+ idx = k++;
+
test_name = hist_field_name(hist_field, 0);

if (strcmp(field_name, test_name) == 0) {
- sort_key->field_idx = j;
+ sort_key->field_idx = idx;
descending = is_descending(field_str);
if (descending < 0) {
ret = descending;
@@ -820,6 +1050,7 @@ static int create_sort_keys(struct hist_trigger_data *hist_data)
break;
}
}
+
hist_data->n_sort_keys = i;
out:
return ret;
@@ -860,12 +1091,19 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
idx = tracing_map_add_key_field(map,
hist_field->offset,
cmp_fn);
-
- } else
+ } else if (!(hist_field->flags & HIST_FIELD_FL_VAR))
idx = tracing_map_add_sum_field(map);

if (idx < 0)
return idx;
+
+ if (hist_field->flags & HIST_FIELD_FL_VAR) {
+ idx = tracing_map_add_var(map);
+ if (idx < 0)
+ return idx;
+ hist_field->var.idx = idx;
+ hist_field->var.hist_data = hist_data;
+ }
}

return 0;
@@ -889,7 +1127,8 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
static struct hist_trigger_data *
create_hist_data(unsigned int map_bits,
struct hist_trigger_attrs *attrs,
- struct trace_event_file *file)
+ struct trace_event_file *file,
+ bool remove)
{
const struct tracing_map_ops *map_ops = NULL;
struct hist_trigger_data *hist_data;
@@ -900,6 +1139,7 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
return ERR_PTR(-ENOMEM);

hist_data->attrs = attrs;
+ hist_data->remove = remove;

ret = create_hist_fields(hist_data, file);
if (ret)
@@ -946,14 +1186,29 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
struct ring_buffer_event *rbe)
{
struct hist_field *hist_field;
- unsigned int i;
+ unsigned int i, var_idx;
u64 hist_val;

for_each_hist_val_field(i, hist_data) {
hist_field = hist_data->fields[i];
- hist_val = hist_field->fn(hist_field, rec, rbe);
+ hist_val = hist_field->fn(hist_field, rbe, rec);
+ if (hist_field->flags & HIST_FIELD_FL_VAR) {
+ var_idx = hist_field->var.idx;
+ tracing_map_set_var(elt, var_idx, hist_val);
+ if (hist_field->flags & HIST_FIELD_FL_VAR_ONLY)
+ continue;
+ }
tracing_map_update_sum(elt, i, hist_val);
}
+
+ for_each_hist_key_field(i, hist_data) {
+ hist_field = hist_data->fields[i];
+ if (hist_field->flags & HIST_FIELD_FL_VAR) {
+ hist_val = hist_field->fn(hist_field, rbe, rec);
+ var_idx = hist_field->var.idx;
+ tracing_map_set_var(elt, var_idx, hist_val);
+ }
+ }
}

static inline void add_to_key(char *compound_key, void *key,
@@ -1128,6 +1383,9 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
for (i = 1; i < hist_data->n_vals; i++) {
field_name = hist_field_name(hist_data->fields[i], 0);

+ if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR)
+ continue;
+
if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
seq_printf(m, " %s: %10llx", field_name,
tracing_map_read_sum(elt, i));
@@ -1251,6 +1509,9 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
const char *field_name = hist_field_name(hist_field, 0);

+ if (hist_field->var.name)
+ seq_printf(m, "%s=", hist_field->var.name);
+
if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP)
seq_puts(m, "$common_timestamp");
else if (field_name)
@@ -1269,7 +1530,8 @@ static int event_hist_trigger_print(struct seq_file *m,
struct event_trigger_data *data)
{
struct hist_trigger_data *hist_data = data->private_data;
- struct hist_field *key_field;
+ bool have_var_only = false;
+ struct hist_field *field;
unsigned int i;

seq_puts(m, "hist:");
@@ -1280,25 +1542,47 @@ static int event_hist_trigger_print(struct seq_file *m,
seq_puts(m, "keys=");

for_each_hist_key_field(i, hist_data) {
- key_field = hist_data->fields[i];
+ field = hist_data->fields[i];

if (i > hist_data->n_vals)
seq_puts(m, ",");

- if (key_field->flags & HIST_FIELD_FL_STACKTRACE)
+ if (field->flags & HIST_FIELD_FL_STACKTRACE)
seq_puts(m, "stacktrace");
else
- hist_field_print(m, key_field);
+ hist_field_print(m, field);
}

seq_puts(m, ":vals=");

for_each_hist_val_field(i, hist_data) {
+ field = hist_data->fields[i];
+ if (field->flags & HIST_FIELD_FL_VAR_ONLY) {
+ have_var_only = true;
+ continue;
+ }
+
if (i == HITCOUNT_IDX)
seq_puts(m, "hitcount");
else {
seq_puts(m, ",");
- hist_field_print(m, hist_data->fields[i]);
+ hist_field_print(m, field);
+ }
+ }
+
+ if (have_var_only) {
+ unsigned int n = 0;
+
+ seq_puts(m, ":");
+
+ for_each_hist_val_field(i, hist_data) {
+ field = hist_data->fields[i];
+
+ if (field->flags & HIST_FIELD_FL_VAR_ONLY) {
+ if (n++)
+ seq_puts(m, ",");
+ hist_field_print(m, field);
+ }
}
}

@@ -1306,7 +1590,10 @@ static int event_hist_trigger_print(struct seq_file *m,

for (i = 0; i < hist_data->n_sort_keys; i++) {
struct tracing_map_sort_key *sort_key;
- unsigned int idx;
+ unsigned int idx, first_key_idx;
+
+ /* skip VAR_ONLY vals */
+ first_key_idx = hist_data->n_vals - hist_data->n_var_only;

sort_key = &hist_data->sort_keys[i];
idx = sort_key->field_idx;
@@ -1319,8 +1606,11 @@ static int event_hist_trigger_print(struct seq_file *m,

if (idx == HITCOUNT_IDX)
seq_puts(m, "hitcount");
- else
+ else {
+ if (idx >= first_key_idx)
+ idx += hist_data->n_var_only;
hist_field_print(m, hist_data->fields[idx]);
+ }

if (sort_key->descending)
seq_puts(m, ".descending");
@@ -1644,12 +1934,16 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
struct hist_trigger_attrs *attrs;
struct event_trigger_ops *trigger_ops;
struct hist_trigger_data *hist_data;
+ bool remove = false;
char *trigger;
int ret = 0;

if (!param)
return -EINVAL;

+ if (glob[0] == '!')
+ remove = true;
+
/* separate the trigger from the filter (k:v [if filter]) */
trigger = strsep(&param, " \t");
if (!trigger)
@@ -1662,7 +1956,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
if (attrs->map_bits)
hist_trigger_bits = attrs->map_bits;

- hist_data = create_hist_data(hist_trigger_bits, attrs, file);
+ hist_data = create_hist_data(hist_trigger_bits, attrs, file, remove);
if (IS_ERR(hist_data)) {
destroy_hist_trigger_attrs(attrs);
return PTR_ERR(hist_data);
@@ -1691,7 +1985,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
goto out_free;
}

- if (glob[0] == '!') {
+ if (remove) {
cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
ret = 0;
goto out_free;
--
1.9.3

2017-09-05 21:58:41

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 12/40] tracing: Break out hist trigger assignment parsing

This will make it easier to add variables, and makes the parsing code
cleaner regardless.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 56 +++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index bec2574..3d92faa 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -251,6 +251,35 @@ static void destroy_hist_trigger_attrs(struct hist_trigger_attrs *attrs)
kfree(attrs);
}

+static int parse_assignment(char *str, struct hist_trigger_attrs *attrs)
+{
+ int ret = 0;
+
+ if ((strncmp(str, "key=", strlen("key=")) == 0) ||
+ (strncmp(str, "keys=", strlen("keys=")) == 0))
+ attrs->keys_str = kstrdup(str, GFP_KERNEL);
+ else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
+ (strncmp(str, "vals=", strlen("vals=")) == 0) ||
+ (strncmp(str, "values=", strlen("values=")) == 0))
+ attrs->vals_str = kstrdup(str, GFP_KERNEL);
+ else if (strncmp(str, "sort=", strlen("sort=")) == 0)
+ attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
+ else if (strncmp(str, "name=", strlen("name=")) == 0)
+ attrs->name = kstrdup(str, GFP_KERNEL);
+ else if (strncmp(str, "size=", strlen("size=")) == 0) {
+ int map_bits = parse_map_size(str);
+
+ if (map_bits < 0) {
+ ret = map_bits;
+ goto out;
+ }
+ attrs->map_bits = map_bits;
+ } else
+ ret = -EINVAL;
+ out:
+ return ret;
+}
+
static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
{
struct hist_trigger_attrs *attrs;
@@ -263,33 +292,18 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
while (trigger_str) {
char *str = strsep(&trigger_str, ":");

- if ((strncmp(str, "key=", strlen("key=")) == 0) ||
- (strncmp(str, "keys=", strlen("keys=")) == 0))
- attrs->keys_str = kstrdup(str, GFP_KERNEL);
- else if ((strncmp(str, "val=", strlen("val=")) == 0) ||
- (strncmp(str, "vals=", strlen("vals=")) == 0) ||
- (strncmp(str, "values=", strlen("values=")) == 0))
- attrs->vals_str = kstrdup(str, GFP_KERNEL);
- else if (strncmp(str, "sort=", strlen("sort=")) == 0)
- attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
- else if (strncmp(str, "name=", strlen("name=")) == 0)
- attrs->name = kstrdup(str, GFP_KERNEL);
- else if (strcmp(str, "pause") == 0)
+ if (strchr(str, '=')) {
+ ret = parse_assignment(str, attrs);
+ if (ret)
+ goto free;
+ } else if (strcmp(str, "pause") == 0)
attrs->pause = true;
else if ((strcmp(str, "cont") == 0) ||
(strcmp(str, "continue") == 0))
attrs->cont = true;
else if (strcmp(str, "clear") == 0)
attrs->clear = true;
- else if (strncmp(str, "size=", strlen("size=")) == 0) {
- int map_bits = parse_map_size(str);
-
- if (map_bits < 0) {
- ret = map_bits;
- goto free;
- }
- attrs->map_bits = map_bits;
- } else {
+ else {
ret = -EINVAL;
goto free;
}
--
1.9.3

2017-09-05 22:06:05

by Tom Zanussi

Subject: [PATCH v2 16/40] tracing: Add hist_data member to hist_field

Allow hist_data access via hist_field. Some current and future users
of hist_fields require access to the associated hist_data.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index dfaf07b..65aad07 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -39,6 +39,7 @@ struct hist_field {
unsigned int offset;
unsigned int is_signed;
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
+ struct hist_trigger_data *hist_data;
};

static u64 hist_field_none(struct hist_field *field, void *event,
@@ -404,7 +405,8 @@ static void destroy_hist_field(struct hist_field *hist_field,
kfree(hist_field);
}

-static struct hist_field *create_hist_field(struct ftrace_event_field *field,
+static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
+ struct ftrace_event_field *field,
unsigned long flags)
{
struct hist_field *hist_field;
@@ -416,6 +418,8 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
if (!hist_field)
return NULL;

+ hist_field->hist_data = hist_data;
+
if (flags & HIST_FIELD_FL_HITCOUNT) {
hist_field->fn = hist_field_counter;
goto out;
@@ -429,7 +433,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
if (flags & HIST_FIELD_FL_LOG2) {
unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
hist_field->fn = hist_field_log2;
- hist_field->operands[0] = create_hist_field(field, fl);
+ hist_field->operands[0] = create_hist_field(hist_data, field, fl);
hist_field->size = hist_field->operands[0]->size;
goto out;
}
@@ -482,7 +486,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)
static int create_hitcount_val(struct hist_trigger_data *hist_data)
{
hist_data->fields[HITCOUNT_IDX] =
- create_hist_field(NULL, HIST_FIELD_FL_HITCOUNT);
+ create_hist_field(hist_data, NULL, HIST_FIELD_FL_HITCOUNT);
if (!hist_data->fields[HITCOUNT_IDX])
return -ENOMEM;

@@ -528,7 +532,7 @@ static int create_val_field(struct hist_trigger_data *hist_data,
}
}

- hist_data->fields[val_idx] = create_hist_field(field, flags);
+ hist_data->fields[val_idx] = create_hist_field(hist_data, field, flags);
if (!hist_data->fields[val_idx]) {
ret = -ENOMEM;
goto out;
@@ -638,7 +642,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}
}

- hist_data->fields[key_idx] = create_hist_field(field, flags);
+ hist_data->fields[key_idx] = create_hist_field(hist_data, field, flags);
if (!hist_data->fields[key_idx]) {
ret = -ENOMEM;
goto out;
--
1.9.3

2017-09-05 22:06:25

by Tom Zanussi

Subject: [PATCH v2 15/40] tracing: Add per-element variable support to tracing_map

In order to allow information to be passed between trace events, add
support for per-element variables to tracing_map. This provides a
means for histograms to associate a value or values with an entry when
it's saved or updated, and to retrieve them when a subsequent event
occurs.

Variables can be set using tracing_map_set_var() and read using
tracing_map_read_var(). tracing_map_var_set() returns true or false
depending on whether the variable has been set, which is important
for event-matching applications.

tracing_map_read_var_once() reads the variable and resets it to the
'unset' state, implementing read-once variables, which are also
important for event-matching uses.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/tracing_map.c | 108 +++++++++++++++++++++++++++++++++++++++++++++
kernel/trace/tracing_map.h | 11 +++++
2 files changed, 119 insertions(+)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 421a0fa..a4e5a56 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -66,6 +66,73 @@ u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i)
return (u64)atomic64_read(&elt->fields[i].sum);
}

+/**
+ * tracing_map_set_var - Assign a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ * @n: The value to assign
+ *
+ * Assign n to variable i associated with the specified tracing_map_elt
+ * instance. The index i is the index returned by the call to
+ * tracing_map_add_var() when the tracing map was set up.
+ */
+void tracing_map_set_var(struct tracing_map_elt *elt, unsigned int i, u64 n)
+{
+ atomic64_set(&elt->vars[i], n);
+ elt->var_set[i] = true;
+}
+
+/**
+ * tracing_map_var_set - Return whether or not a variable has been set
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Return true if the variable has been set, false otherwise. The
+ * index i is the index returned by the call to tracing_map_add_var()
+ * when the tracing map was set up.
+ */
+bool tracing_map_var_set(struct tracing_map_elt *elt, unsigned int i)
+{
+ return elt->var_set[i];
+}
+
+/**
+ * tracing_map_read_var - Return the value of a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Retrieve the value of the variable i associated with the specified
+ * tracing_map_elt instance. The index i is the index returned by the
+ * call to tracing_map_add_var() when the tracing map was set
+ * up.
+ *
+ * Return: The variable value associated with field i for elt.
+ */
+u64 tracing_map_read_var(struct tracing_map_elt *elt, unsigned int i)
+{
+ return (u64)atomic64_read(&elt->vars[i]);
+}
+
+/**
+ * tracing_map_read_var_once - Return and reset a tracing_map_elt's variable field
+ * @elt: The tracing_map_elt
+ * @i: The index of the given variable associated with the tracing_map_elt
+ *
+ * Retrieve the value of the variable i associated with the specified
+ * tracing_map_elt instance, and reset the variable to the 'not set'
+ * state. The index i is the index returned by the call to
+ * tracing_map_add_var() when the tracing map was set up. The reset
+ * essentially makes the variable a read-once variable if it's only
+ * accessed using this function.
+ *
+ * Return: The variable value associated with field i for elt.
+ */
+u64 tracing_map_read_var_once(struct tracing_map_elt *elt, unsigned int i)
+{
+ elt->var_set[i] = false;
+ return (u64)atomic64_read(&elt->vars[i]);
+}
+
int tracing_map_cmp_string(void *val_a, void *val_b)
{
char *a = val_a;
@@ -171,6 +238,28 @@ int tracing_map_add_sum_field(struct tracing_map *map)
}

/**
+ * tracing_map_add_var - Add a field describing a tracing_map var
+ * @map: The tracing_map
+ *
+ * Add a var to the map and return the index identifying it in the map
+ * and associated tracing_map_elts. This is the index used, for
+ * instance, to update a var for a particular tracing_map_elt using
+ * tracing_map_set_var() or to read it via tracing_map_read_var().
+ *
+ * Return: The index identifying the var in the map and associated
+ * tracing_map_elts, or -EINVAL on error.
+ */
+int tracing_map_add_var(struct tracing_map *map)
+{
+ int ret = -EINVAL;
+
+ if (map->n_vars < TRACING_MAP_VARS_MAX)
+ ret = map->n_vars++;
+
+ return ret;
+}
+
+/**
* tracing_map_add_key_field - Add a field describing a tracing_map key
* @map: The tracing_map
* @offset: The offset within the key
@@ -280,6 +369,11 @@ static void tracing_map_elt_clear(struct tracing_map_elt *elt)
if (elt->fields[i].cmp_fn == tracing_map_cmp_atomic64)
atomic64_set(&elt->fields[i].sum, 0);

+ for (i = 0; i < elt->map->n_vars; i++) {
+ atomic64_set(&elt->vars[i], 0);
+ elt->var_set[i] = false;
+ }
+
if (elt->map->ops && elt->map->ops->elt_clear)
elt->map->ops->elt_clear(elt);
}
@@ -306,6 +400,8 @@ static void tracing_map_elt_free(struct tracing_map_elt *elt)
if (elt->map->ops && elt->map->ops->elt_free)
elt->map->ops->elt_free(elt);
kfree(elt->fields);
+ kfree(elt->vars);
+ kfree(elt->var_set);
kfree(elt->key);
kfree(elt);
}
@@ -333,6 +429,18 @@ static struct tracing_map_elt *tracing_map_elt_alloc(struct tracing_map *map)
goto free;
}

+ elt->vars = kcalloc(map->n_vars, sizeof(*elt->vars), GFP_KERNEL);
+ if (!elt->vars) {
+ err = -ENOMEM;
+ goto free;
+ }
+
+ elt->var_set = kcalloc(map->n_vars, sizeof(*elt->var_set), GFP_KERNEL);
+ if (!elt->var_set) {
+ err = -ENOMEM;
+ goto free;
+ }
+
tracing_map_elt_init_fields(elt);

if (map->ops && map->ops->elt_alloc) {
diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index 98ef6d6..2800a6b 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -9,6 +9,7 @@
#define TRACING_MAP_VALS_MAX 3
#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \
TRACING_MAP_VALS_MAX)
+#define TRACING_MAP_VARS_MAX 16
#define TRACING_MAP_SORT_KEYS_MAX 2

typedef int (*tracing_map_cmp_fn_t) (void *val_a, void *val_b);
@@ -136,6 +137,8 @@ struct tracing_map_field {
struct tracing_map_elt {
struct tracing_map *map;
struct tracing_map_field *fields;
+ atomic64_t *vars;
+ bool *var_set;
void *key;
void *private_data;
};
@@ -191,6 +194,7 @@ struct tracing_map {
int key_idx[TRACING_MAP_KEYS_MAX];
unsigned int n_keys;
struct tracing_map_sort_key sort_key;
+ unsigned int n_vars;
atomic64_t hits;
atomic64_t drops;
};
@@ -240,6 +244,7 @@ struct tracing_map_ops {
extern int tracing_map_init(struct tracing_map *map);

extern int tracing_map_add_sum_field(struct tracing_map *map);
+extern int tracing_map_add_var(struct tracing_map *map);
extern int tracing_map_add_key_field(struct tracing_map *map,
unsigned int offset,
tracing_map_cmp_fn_t cmp_fn);
@@ -259,7 +264,13 @@ extern tracing_map_cmp_fn_t tracing_map_cmp_num(int field_size,

extern void tracing_map_update_sum(struct tracing_map_elt *elt,
unsigned int i, u64 n);
+extern void tracing_map_set_var(struct tracing_map_elt *elt,
+ unsigned int i, u64 n);
+extern bool tracing_map_var_set(struct tracing_map_elt *elt, unsigned int i);
extern u64 tracing_map_read_sum(struct tracing_map_elt *elt, unsigned int i);
+extern u64 tracing_map_read_var(struct tracing_map_elt *elt, unsigned int i);
+extern u64 tracing_map_read_var_once(struct tracing_map_elt *elt, unsigned int i);
+
extern void tracing_map_set_field_descr(struct tracing_map *map,
unsigned int i,
unsigned int key_offset,
--
1.9.3

2017-09-05 22:07:06

by Tom Zanussi

Subject: [PATCH v2 14/40] tracing: Add hist trigger timestamp support

Add support for a timestamp event field. This is actually a 'pseudo-'
event field in that it behaves like it's part of the event record, but
is really part of the corresponding ring buffer event.

To make use of the timestamp field, users can specify
"$common_timestamp" as a field name for any histogram. Note that this
doesn't make much sense on its own as either a key or a value, but it
needs to be supported even so, since follow-on patches will add
support for making use of this field in time deltas. The '$' is used
as a prefix on the variable name to indicate that it's not a bona fide
event field - so you won't find it in the event description - but
rather a synthetic field that can be used like a real field.

Note that the use of this field requires the ring buffer be put into
TIME_EXTEND_ABS mode, which saves the complete timestamp for each
event rather than an offset. This mode will be enabled if and only if
a histogram makes use of the "$common_timestamp" field.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 88 +++++++++++++++++++++++++++++-----------
1 file changed, 65 insertions(+), 23 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 3d92faa..dfaf07b 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -89,6 +89,12 @@ static u64 hist_field_log2(struct hist_field *hist_field, void *event,
return (u64) ilog2(roundup_pow_of_two(val));
}

+static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
+{
+ return ring_buffer_event_time_stamp(rbe);
+}
+
#define DEFINE_HIST_FIELD_FN(type) \
static u64 hist_field_##type(struct hist_field *hist_field, \
void *event, \
@@ -135,6 +141,7 @@ enum hist_field_flags {
HIST_FIELD_FL_SYSCALL = 128,
HIST_FIELD_FL_STACKTRACE = 256,
HIST_FIELD_FL_LOG2 = 512,
+ HIST_FIELD_FL_TIMESTAMP = 1024,
};

struct hist_trigger_attrs {
@@ -159,6 +166,7 @@ struct hist_trigger_data {
struct trace_event_file *event_file;
struct hist_trigger_attrs *attrs;
struct tracing_map *map;
+ bool enable_timestamps;
};

static const char *hist_field_name(struct hist_field *field,
@@ -173,6 +181,8 @@ static const char *hist_field_name(struct hist_field *field,
field_name = field->field->name;
else if (field->flags & HIST_FIELD_FL_LOG2)
field_name = hist_field_name(field->operands[0], ++level);
+ else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
+ field_name = "$common_timestamp";

if (field_name == NULL)
field_name = "";
@@ -424,6 +434,12 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
goto out;
}

+ if (flags & HIST_FIELD_FL_TIMESTAMP) {
+ hist_field->fn = hist_field_timestamp;
+ hist_field->size = sizeof(u64);
+ goto out;
+ }
+
if (WARN_ON_ONCE(!field))
goto out;

@@ -501,10 +517,15 @@ static int create_val_field(struct hist_trigger_data *hist_data,
}
}

- field = trace_find_event_field(file->event_call, field_name);
- if (!field || !field->size) {
- ret = -EINVAL;
- goto out;
+ if (strcmp(field_name, "$common_timestamp") == 0) {
+ flags |= HIST_FIELD_FL_TIMESTAMP;
+ hist_data->enable_timestamps = true;
+ } else {
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field || !field->size) {
+ ret = -EINVAL;
+ goto out;
+ }
}

hist_data->fields[val_idx] = create_hist_field(field, flags);
@@ -599,16 +620,22 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}
}

- field = trace_find_event_field(file->event_call, field_name);
- if (!field || !field->size) {
- ret = -EINVAL;
- goto out;
- }
+ if (strcmp(field_name, "$common_timestamp") == 0) {
+ flags |= HIST_FIELD_FL_TIMESTAMP;
+ hist_data->enable_timestamps = true;
+ key_size = sizeof(u64);
+ } else {
+ field = trace_find_event_field(file->event_call, field_name);
+ if (!field || !field->size) {
+ ret = -EINVAL;
+ goto out;
+ }

- if (is_string_field(field))
- key_size = MAX_FILTER_STR_VAL;
- else
- key_size = field->size;
+ if (is_string_field(field))
+ key_size = MAX_FILTER_STR_VAL;
+ else
+ key_size = field->size;
+ }
}

hist_data->fields[key_idx] = create_hist_field(field, flags);
@@ -804,6 +831,9 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)

if (hist_field->flags & HIST_FIELD_FL_STACKTRACE)
cmp_fn = tracing_map_cmp_none;
+ else if (!field)
+ cmp_fn = tracing_map_cmp_num(hist_field->size,
+ hist_field->is_signed);
else if (is_string_field(field))
cmp_fn = tracing_map_cmp_string;
else
@@ -1201,7 +1231,11 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
{
const char *field_name = hist_field_name(hist_field, 0);

- seq_printf(m, "%s", field_name);
+ if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP)
+ seq_puts(m, "$common_timestamp");
+ else if (field_name)
+ seq_printf(m, "%s", field_name);
+
if (hist_field->flags) {
const char *flags_str = get_hist_field_flags(hist_field);

@@ -1252,27 +1286,25 @@ static int event_hist_trigger_print(struct seq_file *m,

for (i = 0; i < hist_data->n_sort_keys; i++) {
struct tracing_map_sort_key *sort_key;
+ unsigned int idx;

sort_key = &hist_data->sort_keys[i];
+ idx = sort_key->field_idx;
+
+ if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
+ return -EINVAL;

if (i > 0)
seq_puts(m, ",");

- if (sort_key->field_idx == HITCOUNT_IDX)
+ if (idx == HITCOUNT_IDX)
seq_puts(m, "hitcount");
- else {
- unsigned int idx = sort_key->field_idx;
-
- if (WARN_ON(idx >= TRACING_MAP_FIELDS_MAX))
- return -EINVAL;
-
+ else
hist_field_print(m, hist_data->fields[idx]);
- }

if (sort_key->descending)
seq_puts(m, ".descending");
}
-
seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));

if (data->filter_str)
@@ -1440,6 +1472,10 @@ static bool hist_trigger_match(struct event_trigger_data *data,
return false;
if (key_field->offset != key_field_test->offset)
return false;
+ if (key_field->size != key_field_test->size)
+ return false;
+ if (key_field->is_signed != key_field_test->is_signed)
+ return false;
}

for (i = 0; i < hist_data->n_sort_keys; i++) {
@@ -1522,6 +1558,9 @@ static int hist_register_trigger(char *glob, struct event_trigger_ops *ops,

update_cond_flag(file);

+ if (hist_data->enable_timestamps)
+ tracing_set_time_stamp_abs(file->tr, true);
+
if (trace_event_trigger_enable_disable(file, 1) < 0) {
list_del_rcu(&data->list);
update_cond_flag(file);
@@ -1556,6 +1595,9 @@ static void hist_unregister_trigger(char *glob, struct event_trigger_ops *ops,

if (unregistered && test->ops->free)
test->ops->free(test->ops, test);
+
+ if (hist_data->enable_timestamps)
+ tracing_set_time_stamp_abs(file->tr, false);
}

static void hist_unreg_all(struct trace_event_file *file)
--
1.9.3

2017-09-05 22:07:26

by Tom Zanussi

Subject: [PATCH v2 11/40] tracing: Increase tracing map KEYS_MAX size

The current default for the number of subkeys in a compound key is 2,
which is too restrictive. Increase it to a more realistic value of 3.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/tracing_map.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index 4eabbcb..98ef6d6 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -5,7 +5,7 @@
#define TRACING_MAP_BITS_MAX 17
#define TRACING_MAP_BITS_MIN 7

-#define TRACING_MAP_KEYS_MAX 2
+#define TRACING_MAP_KEYS_MAX 3
#define TRACING_MAP_VALS_MAX 3
#define TRACING_MAP_FIELDS_MAX (TRACING_MAP_KEYS_MAX + \
TRACING_MAP_VALS_MAX)
--
1.9.3

2017-09-05 22:07:55

by Tom Zanussi

Subject: [PATCH v2 10/40] tracing: Add ring buffer event param to hist field functions

Some hist fields, such as the upcoming timestamp field, require access
to the ring_buffer_event struct; add a param so that hist field
functions can access it.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 9809243..bec2574 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -26,7 +26,8 @@

struct hist_field;

-typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);
+typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event,
+ struct ring_buffer_event *rbe);

#define HIST_FIELD_OPERANDS_MAX 2

@@ -40,24 +41,28 @@ struct hist_field {
struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
};

-static u64 hist_field_none(struct hist_field *field, void *event)
+static u64 hist_field_none(struct hist_field *field, void *event,
+ struct ring_buffer_event *rbe)
{
return 0;
}

-static u64 hist_field_counter(struct hist_field *field, void *event)
+static u64 hist_field_counter(struct hist_field *field, void *event,
+ struct ring_buffer_event *rbe)
{
return 1;
}

-static u64 hist_field_string(struct hist_field *hist_field, void *event)
+static u64 hist_field_string(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
{
char *addr = (char *)(event + hist_field->field->offset);

return (u64)(unsigned long)addr;
}

-static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
+static u64 hist_field_dynstring(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
{
u32 str_item = *(u32 *)(event + hist_field->field->offset);
int str_loc = str_item & 0xffff;
@@ -66,24 +71,28 @@ static u64 hist_field_dynstring(struct hist_field *hist_field, void *event)
return (u64)(unsigned long)addr;
}

-static u64 hist_field_pstring(struct hist_field *hist_field, void *event)
+static u64 hist_field_pstring(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
{
char **addr = (char **)(event + hist_field->field->offset);

return (u64)(unsigned long)*addr;
}

-static u64 hist_field_log2(struct hist_field *hist_field, void *event)
+static u64 hist_field_log2(struct hist_field *hist_field, void *event,
+ struct ring_buffer_event *rbe)
{
struct hist_field *operand = hist_field->operands[0];

- u64 val = operand->fn(operand, event);
+ u64 val = operand->fn(operand, event, rbe);

return (u64) ilog2(roundup_pow_of_two(val));
}

#define DEFINE_HIST_FIELD_FN(type) \
-static u64 hist_field_##type(struct hist_field *hist_field, void *event)\
+ static u64 hist_field_##type(struct hist_field *hist_field, \
+ void *event, \
+ struct ring_buffer_event *rbe) \
{ \
type *addr = (type *)(event + hist_field->field->offset); \
\
@@ -871,8 +880,8 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
}

static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,
- struct tracing_map_elt *elt,
- void *rec)
+ struct tracing_map_elt *elt, void *rec,
+ struct ring_buffer_event *rbe)
{
struct hist_field *hist_field;
unsigned int i;
@@ -880,7 +889,7 @@ static void hist_trigger_elt_update(struct hist_trigger_data *hist_data,

for_each_hist_val_field(i, hist_data) {
hist_field = hist_data->fields[i];
- hist_val = hist_field->fn(hist_field, rec);
+ hist_val = hist_field->fn(hist_field, rec, rbe);
tracing_map_update_sum(elt, i, hist_val);
}
}
@@ -910,7 +919,7 @@ static inline void add_to_key(char *compound_key, void *key,
}

static void event_hist_trigger(struct event_trigger_data *data, void *rec,
- struct ring_buffer_event *event)
+ struct ring_buffer_event *rbe)
{
struct hist_trigger_data *hist_data = data->private_data;
bool use_compound_key = (hist_data->n_keys > 1);
@@ -939,7 +948,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,

key = entries;
} else {
- field_contents = key_field->fn(key_field, rec);
+ field_contents = key_field->fn(key_field, rec, rbe);
if (key_field->flags & HIST_FIELD_FL_STRING) {
key = (void *)(unsigned long)field_contents;
use_compound_key = true;
@@ -956,7 +965,7 @@ static void event_hist_trigger(struct event_trigger_data *data, void *rec,

elt = tracing_map_insert(hist_data->map, key);
if (elt)
- hist_trigger_elt_update(hist_data, elt, rec);
+ hist_trigger_elt_update(hist_data, elt, rec, rbe);
}

static void hist_trigger_stacktrace_print(struct seq_file *m,
--
1.9.3

2017-09-05 22:08:36

by Tom Zanussi

Subject: [PATCH v2 05/40] tracing: Reimplement log2

As currently implemented, log2 applies only to u64
trace_event_field-derived fields, and assumes that anything it's
applied to is a u64 field.

To prepare for synthetic fields like latencies, log2 should be
applicable to those as well, so take the opportunity now to fix the
current problems as well as expand to more general uses.

log2 should be thought of as a chaining function rather than a field
type. To enable this as well as possible future function
implementations, add a hist_field operand array into the hist_field
definition for this purpose, and make use of it to implement the log2
'function'.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0184acd..a16904c 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -28,12 +28,16 @@

typedef u64 (*hist_field_fn_t) (struct hist_field *field, void *event);

+#define HIST_FIELD_OPERANDS_MAX 2
+
struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
hist_field_fn_t fn;
unsigned int size;
unsigned int offset;
+ unsigned int is_signed;
+ struct hist_field *operands[HIST_FIELD_OPERANDS_MAX];
};

static u64 hist_field_none(struct hist_field *field, void *event)
@@ -71,7 +75,9 @@ static u64 hist_field_pstring(struct hist_field *hist_field, void *event)

static u64 hist_field_log2(struct hist_field *hist_field, void *event)
{
- u64 val = *(u64 *)(event + hist_field->field->offset);
+ struct hist_field *operand = hist_field->operands[0];
+
+ u64 val = operand->fn(operand, event);

return (u64) ilog2(roundup_pow_of_two(val));
}
@@ -156,6 +162,8 @@ static const char *hist_field_name(struct hist_field *field,

if (field->field)
field_name = field->field->name;
+ else if (field->flags & HIST_FIELD_FL_LOG2)
+ field_name = hist_field_name(field->operands[0], ++level);

if (field_name == NULL)
field_name = "";
@@ -346,8 +354,20 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
.elt_init = hist_trigger_elt_comm_init,
};

-static void destroy_hist_field(struct hist_field *hist_field)
+static void destroy_hist_field(struct hist_field *hist_field,
+ unsigned int level)
{
+ unsigned int i;
+
+ if (level > 2)
+ return;
+
+ if (!hist_field)
+ return;
+
+ for (i = 0; i < HIST_FIELD_OPERANDS_MAX; i++)
+ destroy_hist_field(hist_field->operands[i], level + 1);
+
kfree(hist_field);
}

@@ -374,7 +394,10 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
}

if (flags & HIST_FIELD_FL_LOG2) {
+ unsigned long fl = flags & ~HIST_FIELD_FL_LOG2;
hist_field->fn = hist_field_log2;
+ hist_field->operands[0] = create_hist_field(field, fl);
+ hist_field->size = hist_field->operands[0]->size;
goto out;
}

@@ -394,7 +417,7 @@ static struct hist_field *create_hist_field(struct ftrace_event_field *field,
hist_field->fn = select_value_fn(field->size,
field->is_signed);
if (!hist_field->fn) {
- destroy_hist_field(hist_field);
+ destroy_hist_field(hist_field, 0);
return NULL;
}
}
@@ -411,7 +434,7 @@ static void destroy_hist_fields(struct hist_trigger_data *hist_data)

for (i = 0; i < TRACING_MAP_FIELDS_MAX; i++) {
if (hist_data->fields[i]) {
- destroy_hist_field(hist_data->fields[i]);
+ destroy_hist_field(hist_data->fields[i], 0);
hist_data->fields[i] = NULL;
}
}
--
1.9.3

2017-09-05 22:08:32

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 06/40] ring-buffer: Add interface for setting absolute time stamps

Define a new function, tracing_set_time_stamp_abs(), which can be used
to enable or disable the use of absolute timestamps rather than time
deltas for a trace array.

This resets the buffer to prevent a mix of time deltas and absolute
timestamps.

Only the interface is added here; a subsequent patch will add the
underlying implementation.
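The refcounting the new function uses (visible in the trace.c hunk below) can be sketched in userspace. This is an illustrative sketch only: `struct fake_tr` and `set_abs()` are stand-in names, not the kernel API.

```c
#include <stdbool.h>

/* Userspace sketch of tracing_set_time_stamp_abs()'s refcounting:
 * the underlying setting only flips on the first enable and the
 * last disable; intermediate calls just adjust the count. */
struct fake_tr {
	int  abs_ref;
	bool abs_active;
};

static int set_abs(struct fake_tr *tr, bool abs)
{
	if (abs && tr->abs_ref++)
		return 0;		/* already enabled by another user */

	if (!abs) {
		if (!tr->abs_ref)
			return -1;	/* unbalanced disable (-EINVAL) */
		if (--tr->abs_ref)
			return 0;	/* other users remain */
	}

	tr->abs_active = abs;		/* first enable or last disable:
					 * here the real code would also
					 * reset the ring buffer */
	return 0;
}
```

In the real code the flip is where the buffer reset happens, so that deltas and absolute timestamps never mix.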

Signed-off-by: Tom Zanussi <[email protected]>
---
include/linux/ring_buffer.h | 2 ++
kernel/trace/ring_buffer.c | 11 +++++++++++
kernel/trace/trace.c | 40 +++++++++++++++++++++++++++++++++++++++-
kernel/trace/trace.h | 3 +++
4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index ee9b461..28e3472 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -180,6 +180,8 @@ void ring_buffer_normalize_time_stamp(struct ring_buffer *buffer,
int cpu, u64 *ts);
void ring_buffer_set_clock(struct ring_buffer *buffer,
u64 (*clock)(void));
+void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs);
+bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer);

size_t ring_buffer_page_len(void *page);

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 81279c6..cff15a2 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -485,6 +485,7 @@ struct ring_buffer {
u64 (*clock)(void);

struct rb_irq_work irq_work;
+ bool time_stamp_abs;
};

struct ring_buffer_iter {
@@ -1379,6 +1380,16 @@ void ring_buffer_set_clock(struct ring_buffer *buffer,
buffer->clock = clock;
}

+void ring_buffer_set_time_stamp_abs(struct ring_buffer *buffer, bool abs)
+{
+ buffer->time_stamp_abs = abs;
+}
+
+bool ring_buffer_time_stamp_abs(struct ring_buffer *buffer)
+{
+ return buffer->time_stamp_abs;
+}
+
static void rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer);

static inline unsigned long rb_page_entries(struct buffer_page *bpage)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 30338a8..66d465e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2269,7 +2269,7 @@ struct ring_buffer_event *

*current_rb = trace_file->tr->trace_buffer.buffer;

- if ((trace_file->flags &
+ if (!ring_buffer_time_stamp_abs(*current_rb) && (trace_file->flags &
(EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
(entry = this_cpu_read(trace_buffered_event))) {
/* Try to use the per cpu buffer first */
@@ -6279,6 +6279,44 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
return ret;
}

+int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs)
+{
+ int ret = 0;
+
+ mutex_lock(&trace_types_lock);
+
+ if (abs && tr->time_stamp_abs_ref++)
+ goto out;
+
+ if (!abs) {
+ if (WARN_ON_ONCE(!tr->time_stamp_abs_ref)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (--tr->time_stamp_abs_ref)
+ goto out;
+ }
+
+ ring_buffer_set_time_stamp_abs(tr->trace_buffer.buffer, abs);
+
+ /*
+ * New timestamps may not be consistent with the previous setting.
+ * Reset the buffer so that it doesn't have incomparable timestamps.
+ */
+ tracing_reset_online_cpus(&tr->trace_buffer);
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+ if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
+ ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
+ tracing_reset_online_cpus(&tr->max_buffer);
+#endif
+ out:
+ mutex_unlock(&trace_types_lock);
+
+ return ret;
+}
+
struct ftrace_buffer_info {
struct trace_iterator iter;
void *spare;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index fb5d54d..dcfc67d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -272,6 +272,7 @@ struct trace_array {
/* function tracing enabled */
int function_enabled;
#endif
+ int time_stamp_abs_ref;
};

enum {
@@ -285,6 +286,8 @@ enum {
extern int trace_array_get(struct trace_array *tr);
extern void trace_array_put(struct trace_array *tr);

+extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);
+
/*
* The global tracer (top) should be the first trace array added,
* but we check the flag anyway.
--
1.9.3

2017-09-05 22:09:15

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 03/40] tracing: Remove code which merges duplicates

From: Vedang Patel <[email protected]>

We now have the logic to detect and remove duplicates in the
tracing_map hash table, so the code which merges duplicates in the
histogram is redundant. Modify this code to only detect duplicates;
the detection code is kept so that any rare race condition which
might still produce duplicates does not go unnoticed.
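The detect-only pass can be sketched in plain C (a userspace sketch; `count_dups()` and its argument types are illustrative names, not the kernel's): once the entries are sorted by key, duplicates sit next to each other, so a single linear scan counts them.

```c
#include <string.h>

/* Userspace sketch of the detect-only pass: after sorting, entries
 * with equal keys are adjacent, so one linear scan finds them all. */
static int count_dups(const char **keys, int n, size_t key_size)
{
	int i, total_dups = 0;

	if (n < 2)
		return 0;

	for (i = 1; i < n; i++)
		if (!memcmp(keys[i], keys[i - 1], key_size))
			total_dups++;	/* the real code WARNs if > 0 */

	return total_dups;
}
```

In the patch itself the scan runs over tracing_map_sort_entry pointers and ends in a WARN_ONCE() rather than returning the count.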

Signed-off-by: Vedang Patel <[email protected]>
---
kernel/trace/trace_events_hist.c | 11 ------
kernel/trace/tracing_map.c | 83 +++-------------------------------------
kernel/trace/tracing_map.h | 7 ----
3 files changed, 6 insertions(+), 95 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 7eb975a..773a66e 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -315,16 +315,6 @@ static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
return 0;
}

-static void hist_trigger_elt_comm_copy(struct tracing_map_elt *to,
- struct tracing_map_elt *from)
-{
- char *comm_from = from->private_data;
- char *comm_to = to->private_data;
-
- if (comm_from)
- memcpy(comm_to, comm_from, TASK_COMM_LEN + 1);
-}
-
static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
{
char *comm = elt->private_data;
@@ -335,7 +325,6 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)

static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
.elt_alloc = hist_trigger_elt_comm_alloc,
- .elt_copy = hist_trigger_elt_comm_copy,
.elt_free = hist_trigger_elt_comm_free,
.elt_init = hist_trigger_elt_comm_init,
};
diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index 437b490..421a0fa 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -844,67 +844,15 @@ void tracing_map_destroy_sort_entries(struct tracing_map_sort_entry **entries,
return sort_entry;
}

-static struct tracing_map_elt *copy_elt(struct tracing_map_elt *elt)
-{
- struct tracing_map_elt *dup_elt;
- unsigned int i;
-
- dup_elt = tracing_map_elt_alloc(elt->map);
- if (IS_ERR(dup_elt))
- return NULL;
-
- if (elt->map->ops && elt->map->ops->elt_copy)
- elt->map->ops->elt_copy(dup_elt, elt);
-
- dup_elt->private_data = elt->private_data;
- memcpy(dup_elt->key, elt->key, elt->map->key_size);
-
- for (i = 0; i < elt->map->n_fields; i++) {
- atomic64_set(&dup_elt->fields[i].sum,
- atomic64_read(&elt->fields[i].sum));
- dup_elt->fields[i].cmp_fn = elt->fields[i].cmp_fn;
- }
-
- return dup_elt;
-}
-
-static int merge_dup(struct tracing_map_sort_entry **sort_entries,
- unsigned int target, unsigned int dup)
-{
- struct tracing_map_elt *target_elt, *elt;
- bool first_dup = (target - dup) == 1;
- int i;
-
- if (first_dup) {
- elt = sort_entries[target]->elt;
- target_elt = copy_elt(elt);
- if (!target_elt)
- return -ENOMEM;
- sort_entries[target]->elt = target_elt;
- sort_entries[target]->elt_copied = true;
- } else
- target_elt = sort_entries[target]->elt;
-
- elt = sort_entries[dup]->elt;
-
- for (i = 0; i < elt->map->n_fields; i++)
- atomic64_add(atomic64_read(&elt->fields[i].sum),
- &target_elt->fields[i].sum);
-
- sort_entries[dup]->dup = true;
-
- return 0;
-}
-
-static int merge_dups(struct tracing_map_sort_entry **sort_entries,
+static void detect_dups(struct tracing_map_sort_entry **sort_entries,
int n_entries, unsigned int key_size)
{
unsigned int dups = 0, total_dups = 0;
- int err, i, j;
+ int i;
void *key;

if (n_entries < 2)
- return total_dups;
+ return;

sort(sort_entries, n_entries, sizeof(struct tracing_map_sort_entry *),
(int (*)(const void *, const void *))cmp_entries_dup, NULL);
@@ -913,30 +861,14 @@ static int merge_dups(struct tracing_map_sort_entry **sort_entries,
for (i = 1; i < n_entries; i++) {
if (!memcmp(sort_entries[i]->key, key, key_size)) {
dups++; total_dups++;
- err = merge_dup(sort_entries, i - dups, i);
- if (err)
- return err;
continue;
}
key = sort_entries[i]->key;
dups = 0;
}

- if (!total_dups)
- return total_dups;
-
- for (i = 0, j = 0; i < n_entries; i++) {
- if (!sort_entries[i]->dup) {
- sort_entries[j] = sort_entries[i];
- if (j++ != i)
- sort_entries[i] = NULL;
- } else {
- destroy_sort_entry(sort_entries[i]);
- sort_entries[i] = NULL;
- }
- }
-
- return total_dups;
+ WARN_ONCE(total_dups > 0,
+ "Duplicates detected: %d\n", total_dups);
}

static bool is_key(struct tracing_map *map, unsigned int field_idx)
@@ -1062,10 +994,7 @@ int tracing_map_sort_entries(struct tracing_map *map,
return 1;
}

- ret = merge_dups(entries, n_entries, map->key_size);
- if (ret < 0)
- goto free;
- n_entries -= ret;
+ detect_dups(entries, n_entries, map->key_size);

if (is_key(map, sort_keys[0].field_idx))
cmp_entries_fn = cmp_entries_key;
diff --git a/kernel/trace/tracing_map.h b/kernel/trace/tracing_map.h
index 618838f..4eabbcb 100644
--- a/kernel/trace/tracing_map.h
+++ b/kernel/trace/tracing_map.h
@@ -214,11 +214,6 @@ struct tracing_map {
* Element allocation occurs before tracing begins, when the
* tracing_map_init() call is made by client code.
*
- * @elt_copy: At certain points in the lifetime of an element, it may
- * need to be copied. The copy should include a copy of the
- * client-allocated data, which can be copied into the 'to'
- * element from the 'from' element.
- *
* @elt_free: When a tracing_map_elt is freed, this function is called
* and allows client-allocated per-element data to be freed.
*
@@ -232,8 +227,6 @@ struct tracing_map {
*/
struct tracing_map_ops {
int (*elt_alloc)(struct tracing_map_elt *elt);
- void (*elt_copy)(struct tracing_map_elt *to,
- struct tracing_map_elt *from);
void (*elt_free)(struct tracing_map_elt *elt);
void (*elt_clear)(struct tracing_map_elt *elt);
void (*elt_init)(struct tracing_map_elt *elt);
--
1.9.3

2017-09-05 22:09:37

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH v2 01/40] tracing: Exclude 'generic fields' from histograms

There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that
are found by trace_find_event_field() but are only meant for
filtering. Specifically, unlike normal fields, they have a size of 0
and thus wreak havoc when used as a histogram key.

Exclude these (return -EINVAL) when used as histogram keys.
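The fix boils down to treating a zero-size field as unusable. A minimal sketch (types and names here are illustrative, not the actual kernel structures):

```c
#include <stddef.h>

/* Sketch of the added check: generic fields such as 'comm' or 'cpu'
 * report size 0, so rejecting zero-size fields excludes them from
 * histogram keys and values. */
struct fake_field {
	const char   *name;
	unsigned int  size;
};

static int field_usable_for_hist(const struct fake_field *field)
{
	if (!field || !field->size)
		return 0;	/* the real code returns -EINVAL */
	return 1;
}
```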

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_hist.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1c21d0e..7eb975a 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -450,7 +450,7 @@ static int create_val_field(struct hist_trigger_data *hist_data,
}

field = trace_find_event_field(file->event_call, field_name);
- if (!field) {
+ if (!field || !field->size) {
ret = -EINVAL;
goto out;
}
@@ -548,7 +548,7 @@ static int create_key_field(struct hist_trigger_data *hist_data,
}

field = trace_find_event_field(file->event_call, field_name);
- if (!field) {
+ if (!field || !field->size) {
ret = -EINVAL;
goto out;
}
--
1.9.3

2017-09-05 23:29:20

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH v2 25/40] tracing: Add support for dynamic tracepoints

----- On Sep 5, 2017, at 5:57 PM, Tom Zanussi [email protected] wrote:

> The tracepoint infrastructure assumes statically-defined tracepoints
> and uses static_keys for tracepoint enablement. In order to define
> tracepoints on the fly, we need to have a dynamic counterpart.
>
> Add a 'dynamic' flag to struct tracepoint along with accompanying
> logic for this purpose.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> include/linux/tracepoint-defs.h | 1 +
> kernel/tracepoint.c | 18 +++++++++++++-----
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> index a031920..bc22d54 100644
> --- a/include/linux/tracepoint-defs.h
> +++ b/include/linux/tracepoint-defs.h
> @@ -32,6 +32,7 @@ struct tracepoint {
> int (*regfunc)(void);
> void (*unregfunc)(void);
> struct tracepoint_func __rcu *funcs;
> + bool dynamic;
> };
>
> #endif
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 685c50a..1c5957f 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -197,7 +197,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> struct tracepoint_func *old, *tp_funcs;
> int ret;
>
> - if (tp->regfunc && !static_key_enabled(&tp->key)) {
> + if (tp->regfunc &&
> + ((tp->dynamic && !(atomic_read(&tp->key.enabled) > 0)) ||
> + !static_key_enabled(&tp->key))) {
> ret = tp->regfunc();
> if (ret < 0)
> return ret;
> @@ -219,7 +221,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> * is used.
> */
> rcu_assign_pointer(tp->funcs, tp_funcs);
> - if (!static_key_enabled(&tp->key))
> + if (tp->dynamic && !(atomic_read(&tp->key.enabled) > 0))
> + atomic_inc(&tp->key.enabled);
> + else if (!tp->dynamic && !static_key_enabled(&tp->key))
> static_key_slow_inc(&tp->key);
> release_probes(old);
> return 0;
> @@ -246,10 +250,14 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>
> if (!tp_funcs) {
> /* Removed last function */
> - if (tp->unregfunc && static_key_enabled(&tp->key))
> + if (tp->unregfunc &&
> + ((tp->dynamic && (atomic_read(&tp->key.enabled) > 0)) ||
> + static_key_enabled(&tp->key)))
> tp->unregfunc();
>
> - if (static_key_enabled(&tp->key))
> + if (tp->dynamic && (atomic_read(&tp->key.enabled) > 0))
> + atomic_dec(&tp->key.enabled);
> + else if (!tp->dynamic && static_key_enabled(&tp->key))
> static_key_slow_dec(&tp->key);
> }
> rcu_assign_pointer(tp->funcs, tp_funcs);
> @@ -258,7 +266,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> }
>
> /**
> - * tracepoint_probe_register - Connect a probe to a tracepoint
> + * tracepoint_probe_register_prio - Connect a probe to a tracepoint
> * @tp: tracepoint
> * @probe: probe handler
> * @data: tracepoint data

Hi Tom,

Thanks for updating your approach to dynamic tracepoints.

Since you're fixing up this comment above tracepoint_probe_register_prio,
can you also remove the following line above tracepoint_probe_register
while you are at it ?

* @prio: priority of this function over other registered functions

Only the tracepoint_probe_register_prio() function has the "prio" parameter.

Thanks!

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2017-09-06 02:35:17

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 25/40] tracing: Add support for dynamic tracepoints

Hi Mathieu,

On Tue, 2017-09-05 at 23:29 +0000, Mathieu Desnoyers wrote:
> ----- On Sep 5, 2017, at 5:57 PM, Tom Zanussi [email protected] wrote:
>
> > The tracepoint infrastructure assumes statically-defined tracepoints
> > and uses static_keys for tracepoint enablement. In order to define
> > tracepoints on the fly, we need to have a dynamic counterpart.
> >
> > Add a 'dynamic' flag to struct tracepoint along with accompanying
> > logic for this purpose.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > include/linux/tracepoint-defs.h | 1 +
> > kernel/tracepoint.c | 18 +++++++++++++-----
> > 2 files changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> > index a031920..bc22d54 100644
> > --- a/include/linux/tracepoint-defs.h
> > +++ b/include/linux/tracepoint-defs.h
> > @@ -32,6 +32,7 @@ struct tracepoint {
> > int (*regfunc)(void);
> > void (*unregfunc)(void);
> > struct tracepoint_func __rcu *funcs;
> > + bool dynamic;
> > };
> >
> > #endif
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 685c50a..1c5957f 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -197,7 +197,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> > struct tracepoint_func *old, *tp_funcs;
> > int ret;
> >
> > - if (tp->regfunc && !static_key_enabled(&tp->key)) {
> > + if (tp->regfunc &&
> > + ((tp->dynamic && !(atomic_read(&tp->key.enabled) > 0)) ||
> > + !static_key_enabled(&tp->key))) {
> > ret = tp->regfunc();
> > if (ret < 0)
> > return ret;
> > @@ -219,7 +221,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> > * is used.
> > */
> > rcu_assign_pointer(tp->funcs, tp_funcs);
> > - if (!static_key_enabled(&tp->key))
> > + if (tp->dynamic && !(atomic_read(&tp->key.enabled) > 0))
> > + atomic_inc(&tp->key.enabled);
> > + else if (!tp->dynamic && !static_key_enabled(&tp->key))
> > static_key_slow_inc(&tp->key);
> > release_probes(old);
> > return 0;
> > @@ -246,10 +250,14 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> >
> > if (!tp_funcs) {
> > /* Removed last function */
> > - if (tp->unregfunc && static_key_enabled(&tp->key))
> > + if (tp->unregfunc &&
> > + ((tp->dynamic && (atomic_read(&tp->key.enabled) > 0)) ||
> > + static_key_enabled(&tp->key)))
> > tp->unregfunc();
> >
> > - if (static_key_enabled(&tp->key))
> > + if (tp->dynamic && (atomic_read(&tp->key.enabled) > 0))
> > + atomic_dec(&tp->key.enabled);
> > + else if (!tp->dynamic && static_key_enabled(&tp->key))
> > static_key_slow_dec(&tp->key);
> > }
> > rcu_assign_pointer(tp->funcs, tp_funcs);
> > @@ -258,7 +266,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> > }
> >
> > /**
> > - * tracepoint_probe_register - Connect a probe to a tracepoint
> > + * tracepoint_probe_register_prio - Connect a probe to a tracepoint
> > * @tp: tracepoint
> > * @probe: probe handler
> > * @data: tracepoint data
>
> Hi Tom,
>
> Thanks for updating your approach to dynamic tracepoints.
>
> Since you're fixing up this comment above tracepoint_probe_register_prio,
> can you also remove the following line above tracepoint_probe_register
> while you are at it ?
>

Sure, will do.

Thanks,

Tom


2017-09-06 18:32:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 02/40] tracing: Add support to detect and avoid duplicates

On Tue, 5 Sep 2017 16:57:14 -0500
Tom Zanussi <[email protected]> wrote:

> From: Vedang Patel <[email protected]>
>
> A duplicate in the tracing_map hash table is when 2 different entries
> have the same key and, as a result, the key_hash. This is possible due
> to a race condition in the algorithm. This race condition is inherent to
> the algorithm and not a bug. This was fine because, until now, we were
> only interested in the sum of all the values related to a particular
> key (the duplicates are dealt with in tracing_map_sort_entries()). But,
> with the inclusion of variables[1], we are interested in individual
> values. So, it will not be clear what value to choose when
> there are duplicates. So, the duplicates need to be removed.
>


> [1] - https://lkml.org/lkml/2017/6/26/751

FYI, something like this should have:

Link: http://lkml.kernel.org/r/[email protected]

And avoid any non kernel.org archiving system.

-- Steve

>
> Signed-off-by: Vedang Patel <[email protected]>
> ---
>

2017-09-06 18:47:27

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 02/40] tracing: Add support to detect and avoid duplicates

On Tue, 5 Sep 2017 16:57:14 -0500
Tom Zanussi <[email protected]> wrote:


> diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> index 305039b..437b490 100644
> --- a/kernel/trace/tracing_map.c
> +++ b/kernel/trace/tracing_map.c
> @@ -414,6 +414,7 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> __tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
> {
> u32 idx, key_hash, test_key;
> + int dup_try = 0;
> struct tracing_map_entry *entry;
>
> key_hash = jhash(key, map->key_size, 0);
> @@ -426,10 +427,31 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> entry = TRACING_MAP_ENTRY(map->map, idx);
> test_key = entry->key;
>
> - if (test_key && test_key == key_hash && entry->val &&
> - keys_match(key, entry->val->key, map->key_size)) {
> - atomic64_inc(&map->hits);
> - return entry->val;
> + if (test_key && test_key == key_hash) {
> + if (entry->val &&
> + keys_match(key, entry->val->key, map->key_size)) {
> + atomic64_inc(&map->hits);
> + return entry->val;
> + } else if (unlikely(!entry->val)) {

I'm thinking we need a READ_ONCE() here.

val = READ_ONCE(entry->val);

then use "val" instead of entry->val. Otherwise, won't it be possible
if two tasks are inserting at the same time, to have this:

(Using reg as when the value is read into a register from memory)

CPU0 CPU1
---- ----
reg = entry->val
(reg == zero)

entry->val = elt;

keys_match(key, reg)
(false)

reg = entry->val
(reg = elt)

if (unlikely(!reg))

Causes the if to fail.

A READ_ONCE() would make sure the entry->val used to test against the
key is also the same value used to test whether it is zero.

-- Steve
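The suggested fix can be sketched in userspace. The READ_ONCE() stand-in below and the helper names are illustrative, not the actual patch:

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(): a single volatile
 * load, so the compiler cannot re-read the location between checks. */
#define READ_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct map_entry {
	void *val;
};

/* Snapshot entry->val once and use that one snapshot for both the
 * keys_match() path and the "slot still being filled" NULL check,
 * so a concurrent writer cannot make the two checks disagree. */
static int slot_still_filling(struct map_entry *entry)
{
	void *val = READ_ONCE(entry->val);

	if (val)
		return 0;	/* would go on to keys_match(key, val->key, ...) */
	return 1;		/* val was NULL for *both* checks: retry */
}
```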



> + /*
> + * The key is present. But, val (pointer to elt
> + * struct) is still NULL. which means some other
> + * thread is in the process of inserting an
> + * element.
> + *
> + * On top of that, it's key_hash is same as the
> + * one being inserted right now. So, it's
> + * possible that the element has the same
> + * key as well.
> + */
> +
> + dup_try++;
> + if (dup_try > map->map_size) {
> + atomic64_inc(&map->drops);
> + break;
> + }
> + continue;
> + }
> }
>
> if (!test_key) {
> @@ -451,6 +473,13 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> atomic64_inc(&map->hits);
>
> return entry->val;
> + } else {
> + /*
> + * cmpxchg() failed. Loop around once
> + * more to check what key was inserted.
> + */
> + dup_try++;
> + continue;
> }
> }
>

2017-09-06 19:58:03

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 07/40] tracing: Apply absolute timestamps to instance max buffer

On Tue, 5 Sep 2017 16:57:19 -0500
Tom Zanussi <[email protected]> wrote:

> From: Baohong Liu <[email protected]>
>
> Currently absolute timestamps are applied to both regular and max
> buffers only for global trace. For instance trace, absolute
> timestamps are applied only to regular buffer. But, regular and max
> buffers can be swapped, for example, following a snapshot. So, for
> instance trace, bad timestamps can be seen following a snapshot.
> Let's apply absolute timestamps to instance max buffer as well.
>
> Similarly, buffer clock change is applied to instance max buffer
> as well.

Hmm, this is a bug fix in its own right. I'll pull this in for the
current merge window and slap a stable tag on it too.

-- Steve

>
> Signed-off-by: Baohong Liu <[email protected]>
> ---
> kernel/trace/trace.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 66d465e..719e4c1 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6223,7 +6223,7 @@ static int tracing_set_clock(struct trace_array *tr, const char *clockstr)
> tracing_reset_online_cpus(&tr->trace_buffer);
>
> #ifdef CONFIG_TRACER_MAX_TRACE
> - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> + if (tr->max_buffer.buffer)
> ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func);
> tracing_reset_online_cpus(&tr->max_buffer);
> #endif
> @@ -6307,7 +6307,7 @@ int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs)
> tracing_reset_online_cpus(&tr->trace_buffer);
>
> #ifdef CONFIG_TRACER_MAX_TRACE
> - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> + if (tr->max_buffer.buffer)
> ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
> tracing_reset_online_cpus(&tr->max_buffer);
> #endif

2017-09-06 20:58:07

by Patel, Vedang

[permalink] [raw]
Subject: Re: [PATCH v2 02/40] tracing: Add support to detect and avoid duplicates

On Wed, 2017-09-06 at 14:47 -0400, Steven Rostedt wrote:
> On Tue,  5 Sep 2017 16:57:14 -0500
> Tom Zanussi <[email protected]> wrote:
>
>
> >
> > diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> > index 305039b..437b490 100644
> > --- a/kernel/trace/tracing_map.c
> > +++ b/kernel/trace/tracing_map.c
> > @@ -414,6 +414,7 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> >  __tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
> >  {
> >   u32 idx, key_hash, test_key;
> > + int dup_try = 0;
> >   struct tracing_map_entry *entry;
> >  
> >   key_hash = jhash(key, map->key_size, 0);
> > @@ -426,10 +427,31 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> >   entry = TRACING_MAP_ENTRY(map->map, idx);
> >   test_key = entry->key;
> >  
> > - if (test_key && test_key == key_hash && entry->val &&
> > -     keys_match(key, entry->val->key, map->key_size)) {
> > - atomic64_inc(&map->hits);
> > - return entry->val;
> > + if (test_key && test_key == key_hash) {
> > + if (entry->val &&
> > +     keys_match(key, entry->val->key, map->key_size)) {
> > + atomic64_inc(&map->hits);
> > + return entry->val;
> > + } else if (unlikely(!entry->val)) {
> I'm thinking we need a READ_ONCE() here.
>
> val = READ_ONCE(entry->val);
>
> then use "val" instead of entry->val. Otherwise, wont it be possible
> if two tasks are inserting at the same time, to have this:
>
> (Using reg as when the value is read into a register from memory)
>
> CPU0 CPU1
> ---- ----
>  reg = entry->val
>  (reg == zero)
>
>    entry->val = elt;
>
>  keys_match(key, reg)
>  (false)
>
>  reg = entry->val
>  (reg = elt)
>
>  if (unlikely(!reg))
>
> Causes the if to fail.
>
> A READ_ONCE(), would make sure the entry->val used to test against key
> would also be the same value used to test if it is zero.
>
Hi Steve, 

Thanks for the input. 

I agree with your change. Adding READ_ONCE will avoid a race condition
which might result in adding duplicates. Will add it in the next
version.

-Vedang
> -- Steve
>
>
>
> >
> > + /*
> > +  * The key is present. But, val (pointer to elt
> > +  * struct) is still NULL. which means some other
> > +  * thread is in the process of inserting an
> > +  * element.
> > +  *
> > +  * On top of that, it's key_hash is same as the
> > +  * one being inserted right now. So, it's
> > +  * possible that the element has the same
> > +  * key as well.
> > +  */
> > +
> > + dup_try++;
> > + if (dup_try > map->map_size) {
> > + atomic64_inc(&map->drops);
> > + break;
> > + }
> > + continue;
> > + }
> >   }
> >  
> >   if (!test_key) {
> > @@ -451,6 +473,13 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> >   atomic64_inc(&map->hits);
> >  
> >   return entry->val;
> > + } else {
> > + /*
> > +  * cmpxchg() failed. Loop around once
> > +  * more to check what key was inserted.
> > +  */
> > + dup_try++;
> > + continue;
> >   }
> >   }
> >  

2017-09-07 00:49:51

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 07/40] tracing: Apply absolute timestamps to instance max buffer

On Tue, 5 Sep 2017 16:57:19 -0500
Tom Zanussi <[email protected]> wrote:

> From: Baohong Liu <[email protected]>
>
> Currently absolute timestamps are applied to both regular and max
> buffers only for global trace. For instance trace, absolute
> timestamps are applied only to regular buffer. But, regular and max
> buffers can be swapped, for example, following a snapshot. So, for
> instance trace, bad timestamps can be seen following a snapshot.
> Let's apply absolute timestamps to instance max buffer as well.
>
> Similarly, buffer clock change is applied to instance max buffer
> as well.
>
> Signed-off-by: Baohong Liu <[email protected]>
> ---
> kernel/trace/trace.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 66d465e..719e4c1 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6223,7 +6223,7 @@ static int tracing_set_clock(struct trace_array *tr, const char *clockstr)
> tracing_reset_online_cpus(&tr->trace_buffer);
>
> #ifdef CONFIG_TRACER_MAX_TRACE
> - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> + if (tr->max_buffer.buffer)
> ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func);
> tracing_reset_online_cpus(&tr->max_buffer);
> #endif




> @@ -6307,7 +6307,7 @@ int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs)
> tracing_reset_online_cpus(&tr->trace_buffer);
>
> #ifdef CONFIG_TRACER_MAX_TRACE
> - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> + if (tr->max_buffer.buffer)
> ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
> tracing_reset_online_cpus(&tr->max_buffer);
> #endif

Please fold this part into the previous patch. I'm adding the first
part to my tree now and will start testing it tonight, and push it to
Linus by the weekend.

-- Steve

2017-09-07 01:16:01

by Liu, Baohong

[permalink] [raw]
Subject: RE: [PATCH v2 07/40] tracing: Apply absolute timestamps to instance max buffer

On Wed, 6 Sep 2017 20:49:46 -0400 Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:19 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > From: Baohong Liu <[email protected]>
> >
> > Currently absolute timestamps are applied to both regular and max
> > buffers only for global trace. For instance trace, absolute timestamps
> > are applied only to regular buffer. But, regular and max buffers can
> > be swapped, for example, following a snapshot. So, for instance trace,
> > bad timestamps can be seen following a snapshot.
> > Let's apply absolute timestamps to instance max buffer as well.
> >
> > Similarly, buffer clock change is applied to instance max buffer as
> > well.
> >
> > Signed-off-by: Baohong Liu <[email protected]>
> > ---
> > kernel/trace/trace.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index
> > 66d465e..719e4c1 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -6223,7 +6223,7 @@ static int tracing_set_clock(struct trace_array *tr,
> const char *clockstr)
> > tracing_reset_online_cpus(&tr->trace_buffer);
> >
> > #ifdef CONFIG_TRACER_MAX_TRACE
> > - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> > + if (tr->max_buffer.buffer)
> > ring_buffer_set_clock(tr->max_buffer.buffer,
> trace_clocks[i].func);
> > tracing_reset_online_cpus(&tr->max_buffer);
> > #endif
>
>
>
>
> > @@ -6307,7 +6307,7 @@ int tracing_set_time_stamp_abs(struct trace_array
> *tr, bool abs)
> > tracing_reset_online_cpus(&tr->trace_buffer);
> >
> > #ifdef CONFIG_TRACER_MAX_TRACE
> > - if (tr->flags & TRACE_ARRAY_FL_GLOBAL && tr->max_buffer.buffer)
> > + if (tr->max_buffer.buffer)
> > ring_buffer_set_time_stamp_abs(tr->max_buffer.buffer, abs);
> > tracing_reset_online_cpus(&tr->max_buffer);
> > #endif
>
> Please fold this part into the previous patch. I'm adding the first part to my tree
> now and will start testing it tonight, and push it to Linus by the weekend.

Will do.

Thanks,

Baohong

>
> -- Steve
>

2017-09-07 14:35:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 08/40] ring-buffer: Redefine the unimplemented RINGBUF_TIME_TIME_STAMP

On Tue, 5 Sep 2017 16:57:20 -0500
Tom Zanussi <[email protected]> wrote:
> diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> index 28e3472..74bc276 100644
> --- a/include/linux/ring_buffer.h
> +++ b/include/linux/ring_buffer.h
> @@ -36,10 +36,12 @@ struct ring_buffer_event {
> * array[0] = time delta (28 .. 59)
> * size = 8 bytes
> *
> - * @RINGBUF_TYPE_TIME_STAMP: Sync time stamp with external clock
> - * array[0] = tv_nsec
> - * array[1..2] = tv_sec
> - * size = 16 bytes
> + * @RINGBUF_TYPE_TIME_STAMP: Absolute timestamp
> + * Same format as TIME_EXTEND except that the
> + * value is an absolute timestamp, not a delta
> + * event.time_delta contains bottom 27 bits
> + * array[0] = top (28 .. 59) bits
> + * size = 8 bytes

Is it going to be an issue that our time stamp is only 59 bits?

2^59 = 576,460,752,303,423,488

Thus, 2^59 nanoseconds (I doubt we will need to have precision better
than nanoseconds) = 576,460,752 seconds = 9,607,679 minutes = 160,127
hours = 6,671 days = 18 years.

We would be screwed if we trace for more than 18 years. ;-)

That's why I had it as 16 bytes, to be able to hold a full 64 bit
timestamp (and still be 8 byte aligned). But since we've gone this long
without needing this, I'm sure a 59 bit absolute timestamp should be
good enough.

> *
> * <= @RINGBUF_TYPE_DATA_TYPE_LEN_MAX:
> * Data record
> @@ -56,12 +58,12 @@ enum ring_buffer_type {
> RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
> RINGBUF_TYPE_PADDING,
> RINGBUF_TYPE_TIME_EXTEND,
> - /* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
> RINGBUF_TYPE_TIME_STAMP,
> };
>
> unsigned ring_buffer_event_length(struct ring_buffer_event *event);
> void *ring_buffer_event_data(struct ring_buffer_event *event);
> +u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
>
> /*
> * ring_buffer_discard_commit will remove an event that has not




> @@ -2488,6 +2519,10 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
> {
> u64 delta;
>
> + /* In TIME_STAMP mode, write_stamp is unused, nothing to do */

No, we still need to keep the write_stamp updated. Sure, it doesn't use
it, but I do want to have absolute and delta timestamps working
together in a single buffer. It shouldn't be one or the other. In fact,
I plan on using it that way for nested events.

Maybe for this feature we can let it slide. But I will be working on
fixing that.

-- Steve

> + if (event->type_len == RINGBUF_TYPE_TIME_STAMP)
> + return;
> +
> /*
> * The event first in the commit queue updates the
> * time stamp.
> @@ -2501,9 +2536,7 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
> cpu_buffer->write_stamp =
> cpu_buffer->commit_page->page->time_stamp;
> else if (event->type_len == RINGBUF_TYPE_TIME_EXTEND) {
> - delta = event->array[0];
> - delta <<= TS_SHIFT;
> - delta += event->time_delta;
> + delta = ring_buffer_event_time_stamp(event);
> cpu_buffer->write_stamp += delta;
> } else
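
The 27-bit split discussed above (bottom bits in event.time_delta, the rest in array[0]) can be sketched in userspace C. This is a minimal sketch: rb_event_sketch is a simplified stand-in for struct ring_buffer_event, not the real bitfield layout, and only the encode/decode arithmetic mirrors ring_buffer_event_time_stamp() from the patch.

```c
#include <stdint.h>

#define TS_SHIFT 27
#define TS_MASK  ((1ULL << TS_SHIFT) - 1)

struct rb_event_sketch {
	uint32_t time_delta;	/* bottom 27 bits of the value */
	uint32_t array0;	/* bits 28..59 */
};

/* split a timestamp across time_delta and array[0], as the writer does */
static void encode_ts(struct rb_event_sketch *e, uint64_t val)
{
	e->time_delta = (uint32_t)(val & TS_MASK);
	e->array0 = (uint32_t)(val >> TS_SHIFT);
}

/* reassemble it, matching ring_buffer_event_time_stamp() in the patch */
static uint64_t decode_ts(const struct rb_event_sketch *e)
{
	return ((uint64_t)e->array0 << TS_SHIFT) + e->time_delta;
}
```

Any value below 2^59 round-trips, which is where the "18 years of nanoseconds" limit in the discussion comes from.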

2017-09-07 15:05:52

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 08/40] ring-buffer: Redefine the unimplemented RINGBUF_TYPE_TIME_STAMP

Hi Steve,

On Thu, 2017-09-07 at 10:35 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:20 -0500
> Tom Zanussi <[email protected]> wrote:
> > diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
> > index 28e3472..74bc276 100644
> > --- a/include/linux/ring_buffer.h
> > +++ b/include/linux/ring_buffer.h
> > @@ -36,10 +36,12 @@ struct ring_buffer_event {
> > * array[0] = time delta (28 .. 59)
> > * size = 8 bytes
> > *
> > - * @RINGBUF_TYPE_TIME_STAMP: Sync time stamp with external clock
> > - * array[0] = tv_nsec
> > - * array[1..2] = tv_sec
> > - * size = 16 bytes
> > + * @RINGBUF_TYPE_TIME_STAMP: Absolute timestamp
> > + * Same format as TIME_EXTEND except that the
> > + * value is an absolute timestamp, not a delta
> > + * event.time_delta contains bottom 27 bits
> > + * array[0] = top (28 .. 59) bits
> > + * size = 8 bytes
>
> Is it going to be an issue that our time stamp is only 59 bits?
>
> 2^59 = 576,460,752,303,423,488
>
> Thus, 2^59 nanoseconds (I doubt we will need to have precision better
> than nanoseconds) = 576,460,752 seconds = 9,607,679 minutes = 160,127
> hours = 6,671 days = 18 years.
>
> We would be screwed if we trace for more than 18 years. ;-)
>
> That's why I had it as 16 bytes, to be able to hold a full 64 bit
> timestamp (and still be 8 byte aligned). But since we've gone this long
> without needing this, I'm sure a 59 bit absolute timestamp should be
> good enough.
>

Yeah, I would think it should be good enough, but then I don't
realistically envision a machine with an 18 year uptime with tracing
enabled, maybe someone else does though. ;-)

> > *
> > * <= @RINGBUF_TYPE_DATA_TYPE_LEN_MAX:
> > * Data record
> > @@ -56,12 +58,12 @@ enum ring_buffer_type {
> > RINGBUF_TYPE_DATA_TYPE_LEN_MAX = 28,
> > RINGBUF_TYPE_PADDING,
> > RINGBUF_TYPE_TIME_EXTEND,
> > - /* FIXME: RINGBUF_TYPE_TIME_STAMP not implemented */
> > RINGBUF_TYPE_TIME_STAMP,
> > };
> >
> > unsigned ring_buffer_event_length(struct ring_buffer_event *event);
> > void *ring_buffer_event_data(struct ring_buffer_event *event);
> > +u64 ring_buffer_event_time_stamp(struct ring_buffer_event *event);
> >
> > /*
> > * ring_buffer_discard_commit will remove an event that has not
>
>
>
>
> > @@ -2488,6 +2519,10 @@ static inline void rb_event_discard(struct ring_buffer_event *event)
> > {
> > u64 delta;
> >
> > + /* In TIME_STAMP mode, write_stamp is unused, nothing to do */
>
> No, we still need to keep the write_stamp updated. Sure, it doesn't use
> it, but I do want to have absolute and delta timestamps working
> together in a single buffer. It shouldn't be one or the other. In fact,
> I plan on using it that way for nested events.
>
> Maybe for this feature we can let it slide. But I will be working on
> fixing that.
>

OK, great, thanks.

Tom


2017-09-07 16:40:16

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 19/40] tracing: Account for variables in named trigger compatibility

On Tue, 5 Sep 2017 16:57:31 -0500
Tom Zanussi <[email protected]> wrote:

> @@ -1786,6 +1786,12 @@ static bool hist_trigger_match(struct event_trigger_data *data,
> return false;
> if (key_field->is_signed != key_field_test->is_signed)
> return false;
> + if ((key_field->var.name && !key_field_test->var.name) ||
> + (!key_field->var.name && key_field_test->var.name))
> + return false;

Short cut:

if (!!key_field->var.name != !!key_field_test->var.name)
return false;

> + if ((key_field->var.name && key_field_test->var.name) &&

Only need to test if key_field->var.name, as the previous if statement
would exit out if key_field_test->var.name is false.

-- Steve

> + strcmp(key_field->var.name, key_field_test->var.name) != 0)
> + return false;
> }
>
> for (i = 0; i < hist_data->n_sort_keys; i++) {
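
Steve's shortcut relies on `!!p` normalizing any pointer to 0 or 1, so a single inequality covers both "one name set, the other not" cases and only one strcmp() is needed afterward. A sketch, where names_match() is a hypothetical stand-in for the var.name checks in hist_trigger_match():

```c
#include <string.h>

/* Return 1 if the two (possibly NULL) names are compatible. */
static int names_match(const char *name, const char *name_test)
{
	if (!!name != !!name_test)
		return 0;	/* exactly one is NULL: not compatible */
	if (name && strcmp(name, name_test) != 0)
		return 0;	/* both set but different */
	return 1;		/* both NULL, or both set and equal */
}
```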

2017-09-07 16:46:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 20/40] tracing: Add simple expression support to hist triggers

On Tue, 5 Sep 2017 16:57:32 -0500
Tom Zanussi <[email protected]> wrote:

> #define DEFINE_HIST_FIELD_FN(type) \
> static u64 hist_field_##type(struct hist_field *hist_field, \
> void *event, \
> @@ -148,6 +192,7 @@ enum hist_field_flags {
> HIST_FIELD_FL_TIMESTAMP_USECS = 2048,
> HIST_FIELD_FL_VAR = 4096,
> HIST_FIELD_FL_VAR_ONLY = 8192,
> + HIST_FIELD_FL_EXPR = 16384,

When numbers get this big, I'm a fan of:

HIST_FIELD_FL_TIMESTAMP_USECS = (1 << 11),
HIST_FIELD_FL_VAR = (1 << 12),
HIST_FIELD_FL_VAR_ONLY = (1 << 13),
HIST_FIELD_FL_EXPR = (1 << 14),

It makes it much easier to debug with printk()s and also know how many
bits you are actually using.

> };
>
> struct var_defs {
> @@ -218,6 +263,8 @@ static const char *hist_field_name(struct hist_field *field,
> field_name = hist_field_name(field->operands[0], ++level);
> else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
> field_name = "$common_timestamp";
> + else if (field->flags & HIST_FIELD_FL_EXPR)
> + field_name = field->name;
>
> if (field_name == NULL)
> field_name = "";
> @@ -441,6 +488,115 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
> .elt_init = hist_trigger_elt_comm_init,
> };
>
> +static const char *get_hist_field_flags(struct hist_field *hist_field)
> +{
> + const char *flags_str = NULL;
> +
> + if (hist_field->flags & HIST_FIELD_FL_HEX)
> + flags_str = "hex";
> + else if (hist_field->flags & HIST_FIELD_FL_SYM)
> + flags_str = "sym";
> + else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
> + flags_str = "sym-offset";
> + else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
> + flags_str = "execname";
> + else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
> + flags_str = "syscall";
> + else if (hist_field->flags & HIST_FIELD_FL_LOG2)
> + flags_str = "log2";
> + else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
> + flags_str = "usecs";
> +
> + return flags_str;
> +}

Also, can you make the moving of the code a separate patch before this
patch. It makes git blame and such nicer.

-- Steve
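
The shift notation Steve asks for produces the same constants as the decimal values in the patch; the enum below is just that rewrite, shown standalone:

```c
/* Shifted-flag style: the bit position is visible at a glance,
 * which makes printk() debugging and counting used bits easier. */
enum hist_field_flags_sketch {
	HIST_FIELD_FL_TIMESTAMP_USECS	= (1 << 11),
	HIST_FIELD_FL_VAR		= (1 << 12),
	HIST_FIELD_FL_VAR_ONLY		= (1 << 13),
	HIST_FIELD_FL_EXPR		= (1 << 14),
};
```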

2017-09-07 17:01:37

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 20/40] tracing: Add simple expression support to hist triggers

On Thu, 2017-09-07 at 12:46 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:32 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > #define DEFINE_HIST_FIELD_FN(type) \
> > static u64 hist_field_##type(struct hist_field *hist_field, \
> > void *event, \
> > @@ -148,6 +192,7 @@ enum hist_field_flags {
> > HIST_FIELD_FL_TIMESTAMP_USECS = 2048,
> > HIST_FIELD_FL_VAR = 4096,
> > HIST_FIELD_FL_VAR_ONLY = 8192,
> > + HIST_FIELD_FL_EXPR = 16384,
>
> When numbers get this big, I'm a fan of:
>
> HIST_FIELD_FL_TIMESTAMP_USECS = (1 << 11),
> HIST_FIELD_FL_VAR = (1 << 12),
> HIST_FIELD_FL_VAR_ONLY = (1 << 13),
> HIST_FIELD_FL_EXPR = (1 << 14),
>
> It makes it much easier to debug with printk()s and also know how many
> bits you are actually using.
>
> > };
> >
> > struct var_defs {
> > @@ -218,6 +263,8 @@ static const char *hist_field_name(struct hist_field *field,
> > field_name = hist_field_name(field->operands[0], ++level);
> > else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
> > field_name = "$common_timestamp";
> > + else if (field->flags & HIST_FIELD_FL_EXPR)
> > + field_name = field->name;
> >
> > if (field_name == NULL)
> > field_name = "";
> > @@ -441,6 +488,115 @@ static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
> > .elt_init = hist_trigger_elt_comm_init,
> > };
> >
> > +static const char *get_hist_field_flags(struct hist_field *hist_field)
> > +{
> > + const char *flags_str = NULL;
> > +
> > + if (hist_field->flags & HIST_FIELD_FL_HEX)
> > + flags_str = "hex";
> > + else if (hist_field->flags & HIST_FIELD_FL_SYM)
> > + flags_str = "sym";
> > + else if (hist_field->flags & HIST_FIELD_FL_SYM_OFFSET)
> > + flags_str = "sym-offset";
> > + else if (hist_field->flags & HIST_FIELD_FL_EXECNAME)
> > + flags_str = "execname";
> > + else if (hist_field->flags & HIST_FIELD_FL_SYSCALL)
> > + flags_str = "syscall";
> > + else if (hist_field->flags & HIST_FIELD_FL_LOG2)
> > + flags_str = "log2";
> > + else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
> > + flags_str = "usecs";
> > +
> > + return flags_str;
> > +}
>
> Also, can you make the moving of the code a separate patch before this
> patch. It makes git blame and such nicer.
>

Sure, will do, as well as the above.

Tom


2017-09-07 17:03:29

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 19/40] tracing: Account for variables in named trigger compatibility

On Thu, 2017-09-07 at 12:40 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:31 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > @@ -1786,6 +1786,12 @@ static bool hist_trigger_match(struct event_trigger_data *data,
> > return false;
> > if (key_field->is_signed != key_field_test->is_signed)
> > return false;
> > + if ((key_field->var.name && !key_field_test->var.name) ||
> > + (!key_field->var.name && key_field_test->var.name))
> > + return false;
>
> Short cut:
>
> if (!!key_field->var.name != !!key_field_test->var.name)
> return false;
>

Nice!

> > + if ((key_field->var.name && key_field_test->var.name) &&
>
> Only need to test if key_field->var.name, as the previous if statement
> would exit out if key_field_test->var.name is false.
>

OK, will change, thanks.

Tom


2017-09-07 17:56:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 21/40] tracing: Generalize per-element hist trigger data

On Tue, 5 Sep 2017 16:57:33 -0500
Tom Zanussi <[email protected]> wrote:

> Up until now, hist triggers only needed per-element support for saving
> 'comm' data, which was saved directly as a private data pointer.
>
> In anticipation of the need to save other data besides 'comm', add a
> new hist_elt_data struct for the purpose, and switch the current
> 'comm'-related code over to that.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events_hist.c | 65 ++++++++++++++++++++--------------------
> 1 file changed, 32 insertions(+), 33 deletions(-)
>
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index bbca6d3..f17d394 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -249,6 +249,10 @@ static u64 hist_field_timestamp(struct hist_field *hist_field, void *event,
> return ts;
> }
>
> +struct hist_elt_data {
> + char *comm;
> +};
> +
> static const char *hist_field_name(struct hist_field *field,
> unsigned int level)
> {
> @@ -447,26 +451,36 @@ static inline void save_comm(char *comm, struct task_struct *task)
> memcpy(comm, task->comm, TASK_COMM_LEN);
> }
>
> -static void hist_trigger_elt_comm_free(struct tracing_map_elt *elt)
> +static void hist_trigger_elt_data_free(struct tracing_map_elt *elt)
> {
> - kfree((char *)elt->private_data);
> + struct hist_elt_data *private_data = elt->private_data;

Small nit, please don't call this variable "private_data". Call it
"elt_data" like you do below. Try to keep variable names consistent.

> +
> + kfree(private_data->comm);
> + kfree(private_data);
> }
>
> -static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
> +static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
> {
> struct hist_trigger_data *hist_data = elt->map->private_data;
> + unsigned int size = TASK_COMM_LEN + 1;
> + struct hist_elt_data *elt_data;
> struct hist_field *key_field;
> unsigned int i;
>
> + elt->private_data = elt_data = kzalloc(sizeof(*elt_data), GFP_KERNEL);

What about just allocating elt_data here, but not assigning private_data?

> + if (!elt_data)
> + return -ENOMEM;
> +
> for_each_hist_key_field(i, hist_data) {
> key_field = hist_data->fields[i];
>
> if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
> - unsigned int size = TASK_COMM_LEN + 1;
> -
> - elt->private_data = kzalloc(size, GFP_KERNEL);
> - if (!elt->private_data)
> + elt_data->comm = kzalloc(size, GFP_KERNEL);
> + if (!elt_data->comm) {
> + kfree(elt_data);
> + elt->private_data = NULL;

Then here, we don't need to remember to NULL it out.

> return -ENOMEM;
> + }
> break;
> }
> }

Then we can have after the loop.

elt->private_data = elt_data;
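
The allocate-then-publish pattern suggested here can be sketched in userspace C (calloc standing in for kzalloc, and simplified structs standing in for the tracing_map types): on any failure the local data is freed before it was ever visible, so there is no half-initialized pointer to NULL out.

```c
#include <stdlib.h>

#define TASK_COMM_LEN 16	/* kernel value, reused for the sketch */

struct elt_sketch { void *private_data; };
struct hist_elt_data_sketch { char *comm; };

static int elt_data_alloc(struct elt_sketch *elt, int need_comm)
{
	struct hist_elt_data_sketch *elt_data;

	elt_data = calloc(1, sizeof(*elt_data));
	if (!elt_data)
		return -1;

	if (need_comm) {
		elt_data->comm = calloc(1, TASK_COMM_LEN + 1);
		if (!elt_data->comm) {
			free(elt_data);	/* elt->private_data never touched */
			return -1;
		}
	}

	elt->private_data = elt_data;	/* publish only after full success */
	return 0;
}
```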


> @@ -474,18 +488,18 @@ static int hist_trigger_elt_comm_alloc(struct tracing_map_elt *elt)
> return 0;
> }
>
> -static void hist_trigger_elt_comm_init(struct tracing_map_elt *elt)
> +static void hist_trigger_elt_data_init(struct tracing_map_elt *elt)
> {
> - char *comm = elt->private_data;
> + struct hist_elt_data *private_data = elt->private_data;

Again, please call this elt_data.

>
> - if (comm)
> - save_comm(comm, current);
> + if (private_data->comm)
> + save_comm(private_data->comm, current);
> }
>
> -static const struct tracing_map_ops hist_trigger_elt_comm_ops = {
> - .elt_alloc = hist_trigger_elt_comm_alloc,
> - .elt_free = hist_trigger_elt_comm_free,
> - .elt_init = hist_trigger_elt_comm_init,
> +static const struct tracing_map_ops hist_trigger_elt_data_ops = {
> + .elt_alloc = hist_trigger_elt_data_alloc,
> + .elt_free = hist_trigger_elt_data_free,
> + .elt_init = hist_trigger_elt_data_init,
> };
>
> static const char *get_hist_field_flags(struct hist_field *hist_field)
> @@ -1494,21 +1508,6 @@ static int create_tracing_map_fields(struct hist_trigger_data *hist_data)
> return 0;
> }
>
> -static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
> -{
> - struct hist_field *key_field;
> - unsigned int i;
> -
> - for_each_hist_key_field(i, hist_data) {
> - key_field = hist_data->fields[i];
> -
> - if (key_field->flags & HIST_FIELD_FL_EXECNAME)
> - return true;
> - }
> -
> - return false;
> -}
> -
> static struct hist_trigger_data *
> create_hist_data(unsigned int map_bits,
> struct hist_trigger_attrs *attrs,
> @@ -1534,8 +1533,7 @@ static bool need_tracing_map_ops(struct hist_trigger_data *hist_data)
> if (ret)
> goto free;
>
> - if (need_tracing_map_ops(hist_data))
> - map_ops = &hist_trigger_elt_comm_ops;
> + map_ops = &hist_trigger_elt_data_ops;
>
> hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
> map_ops, hist_data);
> @@ -1724,7 +1722,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
> seq_printf(m, "%s: [%llx] %-55s", field_name,
> uval, str);
> } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
> - char *comm = elt->private_data;
> + struct hist_elt_data *elt_data = elt->private_data;

I wonder if we should have a return WARN_ON_ONCE(!elt_data); here just
in case.

-- Steve

> + char *comm = elt_data->comm;
>
> uval = *(u64 *)(key + key_field->offset);
> seq_printf(m, "%s: %-16s[%10llu]",
> field_name,

2017-09-07 18:14:59

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 21/40] tracing: Generalize per-element hist trigger data

On Thu, 2017-09-07 at 13:56 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:33 -0500
> Tom Zanussi <[email protected]> wrote:
>
[...]
> > hist_data->map = tracing_map_create(map_bits, hist_data->key_size,
> > map_ops, hist_data);
> > @@ -1724,7 +1722,8 @@ static void hist_trigger_stacktrace_print(struct seq_file *m,
> > seq_printf(m, "%s: [%llx] %-55s", field_name,
> > uval, str);
> > } else if (key_field->flags & HIST_FIELD_FL_EXECNAME) {
> > - char *comm = elt->private_data;
> > + struct hist_elt_data *elt_data = elt->private_data;
>
> I wonder if we should have a return WARN_ON_ONCE(!elt_data); here just
> in case.
>

Yeah, that makes sense, as do the other suggestions above, will update.

Thanks,

Tom



2017-09-07 22:02:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 25/40] tracing: Add support for dynamic tracepoints

On Tue, 5 Sep 2017 16:57:37 -0500
Tom Zanussi <[email protected]> wrote:

> The tracepoint infrastructure assumes statically-defined tracepoints
> and uses static_keys for tracepoint enablement. In order to define
> tracepoints on the fly, we need to have a dynamic counterpart.

Do we?

I believe the static keys should work just fine if you don't have any
static key jumps defined; they should then work as a simple counter.

Could we do that instead of adding another variable to a structure that
is defined over a thousand times? Save us 4k of memory.

-- Steve

>
> Add a 'dynamic' flag to struct tracepoint along with accompanying
> logic for this purpose.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> include/linux/tracepoint-defs.h | 1 +
> kernel/tracepoint.c | 18 +++++++++++++-----
> 2 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
> index a031920..bc22d54 100644
> --- a/include/linux/tracepoint-defs.h
> +++ b/include/linux/tracepoint-defs.h
> @@ -32,6 +32,7 @@ struct tracepoint {
> int (*regfunc)(void);
> void (*unregfunc)(void);
> struct tracepoint_func __rcu *funcs;
> + bool dynamic;
> };
>
> #endif
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 685c50a..1c5957f 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -197,7 +197,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> struct tracepoint_func *old, *tp_funcs;
> int ret;
>
> - if (tp->regfunc && !static_key_enabled(&tp->key)) {
> + if (tp->regfunc &&
> + ((tp->dynamic && !(atomic_read(&tp->key.enabled) > 0)) ||
> + !static_key_enabled(&tp->key))) {
> ret = tp->regfunc();
> if (ret < 0)
> return ret;
> @@ -219,7 +221,9 @@ static int tracepoint_add_func(struct tracepoint *tp,
> * is used.
> */
> rcu_assign_pointer(tp->funcs, tp_funcs);
> - if (!static_key_enabled(&tp->key))
> + if (tp->dynamic && !(atomic_read(&tp->key.enabled) > 0))
> + atomic_inc(&tp->key.enabled);
> + else if (!tp->dynamic && !static_key_enabled(&tp->key))
> static_key_slow_inc(&tp->key);
> release_probes(old);
> return 0;
> @@ -246,10 +250,14 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>
> if (!tp_funcs) {
> /* Removed last function */
> - if (tp->unregfunc && static_key_enabled(&tp->key))
> + if (tp->unregfunc &&
> + ((tp->dynamic && (atomic_read(&tp->key.enabled) > 0)) ||
> + static_key_enabled(&tp->key)))
> tp->unregfunc();
>
> - if (static_key_enabled(&tp->key))
> + if (tp->dynamic && (atomic_read(&tp->key.enabled) > 0))
> + atomic_dec(&tp->key.enabled);
> + else if (!tp->dynamic && static_key_enabled(&tp->key))
> static_key_slow_dec(&tp->key);
> }
> rcu_assign_pointer(tp->funcs, tp_funcs);
> @@ -258,7 +266,7 @@ static int tracepoint_remove_func(struct tracepoint *tp,
> }
>
> /**
> - * tracepoint_probe_register - Connect a probe to a tracepoint
> + * tracepoint_probe_register_prio - Connect a probe to a tracepoint
> * @tp: tracepoint
> * @probe: probe handler
> * @data: tracepoint data

2017-09-07 22:18:34

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

Hi Tom,

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.13 next-20170907]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Tom-Zanussi/tracing-Inter-event-e-g-latency-support/20170908-054142
config: x86_64-randconfig-x018-201736 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

kernel//trace/ring_buffer_benchmark.c: In function 'ring_buffer_producer':
>> kernel//trace/ring_buffer_benchmark.c:253:12: error: too few arguments to function 'ring_buffer_lock_reserve'
event = ring_buffer_lock_reserve(buffer, 10);
^~~~~~~~~~~~~~~~~~~~~~~~
In file included from kernel//trace/ring_buffer_benchmark.c:6:0:
include/linux/ring_buffer.h:115:27: note: declared here
struct ring_buffer_event *ring_buffer_lock_reserve(struct ring_buffer *buffer,
^~~~~~~~~~~~~~~~~~~~~~~~

vim +/ring_buffer_lock_reserve +253 kernel//trace/ring_buffer_benchmark.c

5092dbc96 Steven Rostedt 2009-05-05 228
5092dbc96 Steven Rostedt 2009-05-05 229 static void ring_buffer_producer(void)
5092dbc96 Steven Rostedt 2009-05-05 230 {
da194930e Tina Ruchandani 2015-01-28 231 ktime_t start_time, end_time, timeout;
5092dbc96 Steven Rostedt 2009-05-05 232 unsigned long long time;
5092dbc96 Steven Rostedt 2009-05-05 233 unsigned long long entries;
5092dbc96 Steven Rostedt 2009-05-05 234 unsigned long long overruns;
5092dbc96 Steven Rostedt 2009-05-05 235 unsigned long missed = 0;
5092dbc96 Steven Rostedt 2009-05-05 236 unsigned long hit = 0;
5092dbc96 Steven Rostedt 2009-05-05 237 unsigned long avg;
5092dbc96 Steven Rostedt 2009-05-05 238 int cnt = 0;
5092dbc96 Steven Rostedt 2009-05-05 239
5092dbc96 Steven Rostedt 2009-05-05 240 /*
5092dbc96 Steven Rostedt 2009-05-05 241 * Hammer the buffer for 10 secs (this may
5092dbc96 Steven Rostedt 2009-05-05 242 * make the system stall)
5092dbc96 Steven Rostedt 2009-05-05 243 */
4b221f031 Steven Rostedt 2009-06-17 244 trace_printk("Starting ring buffer hammer\n");
da194930e Tina Ruchandani 2015-01-28 245 start_time = ktime_get();
da194930e Tina Ruchandani 2015-01-28 246 timeout = ktime_add_ns(start_time, RUN_TIME * NSEC_PER_SEC);
5092dbc96 Steven Rostedt 2009-05-05 247 do {
5092dbc96 Steven Rostedt 2009-05-05 248 struct ring_buffer_event *event;
5092dbc96 Steven Rostedt 2009-05-05 249 int *entry;
a6f0eb6ad Steven Rostedt 2009-11-11 250 int i;
5092dbc96 Steven Rostedt 2009-05-05 251
a6f0eb6ad Steven Rostedt 2009-11-11 252 for (i = 0; i < write_iteration; i++) {
5092dbc96 Steven Rostedt 2009-05-05 @253 event = ring_buffer_lock_reserve(buffer, 10);
5092dbc96 Steven Rostedt 2009-05-05 254 if (!event) {
5092dbc96 Steven Rostedt 2009-05-05 255 missed++;
5092dbc96 Steven Rostedt 2009-05-05 256 } else {
5092dbc96 Steven Rostedt 2009-05-05 257 hit++;
5092dbc96 Steven Rostedt 2009-05-05 258 entry = ring_buffer_event_data(event);
5092dbc96 Steven Rostedt 2009-05-05 259 *entry = smp_processor_id();
5092dbc96 Steven Rostedt 2009-05-05 260 ring_buffer_unlock_commit(buffer, event);
5092dbc96 Steven Rostedt 2009-05-05 261 }
a6f0eb6ad Steven Rostedt 2009-11-11 262 }
da194930e Tina Ruchandani 2015-01-28 263 end_time = ktime_get();
5092dbc96 Steven Rostedt 2009-05-05 264
0574ea421 Steven Rostedt 2009-05-07 265 cnt++;
0574ea421 Steven Rostedt 2009-05-07 266 if (consumer && !(cnt % wakeup_interval))
5092dbc96 Steven Rostedt 2009-05-05 267 wake_up_process(consumer);
5092dbc96 Steven Rostedt 2009-05-05 268

:::::: The code at line 253 was first introduced by commit
:::::: 5092dbc96f3acdac5433b27c06860352dc6d23b9 ring-buffer: add benchmark and tester

:::::: TO: Steven Rostedt <[email protected]>
:::::: CC: Steven Rostedt <[email protected]>




2017-09-07 22:35:37

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

Hi Tom,

[auto build test ERROR on tip/perf/core]
[also build test ERROR on v4.13 next-20170907]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Tom-Zanussi/tracing-Inter-event-e-g-latency-support/20170908-054142
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa

All errors (new ones prefixed by >>):

kernel/trace/ring_buffer.c: In function 'rb_write_something':
>> kernel/trace/ring_buffer.c:4836:10: error: too few arguments to function 'ring_buffer_lock_reserve'
event = ring_buffer_lock_reserve(data->buffer, len);
^
kernel/trace/ring_buffer.c:2846:1: note: declared here
ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length,
^

vim +/ring_buffer_lock_reserve +4836 kernel/trace/ring_buffer.c

6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4813)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4814) static __init int rb_write_something(struct rb_test_data *data, bool nested)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4815) {
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4816) struct ring_buffer_event *event;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4817) struct rb_item *item;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4818) bool started;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4819) int event_len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4820) int size;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4821) int len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4822) int cnt;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4823)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4824) /* Have nested writes different that what is written */
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4825) cnt = data->cnt + (nested ? 27 : 0);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4826)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4827) /* Multiply cnt by ~e, to make some unique increment */
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4828) size = (data->cnt * 68 / 25) % (sizeof(rb_string) - 1);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4829)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4830) len = size + sizeof(struct rb_item);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4831)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4832) started = rb_test_started;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4833) /* read rb_test_started before checking buffer enabled */
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4834) smp_rmb();
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4835)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 @4836) event = ring_buffer_lock_reserve(data->buffer, len);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4837) if (!event) {
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4838) /* Ignore dropped events before test starts. */
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4839) if (started) {
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4840) if (nested)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4841) data->bytes_dropped += len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4842) else
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4843) data->bytes_dropped_nested += len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4844) }
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4845) return len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4846) }
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4847)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4848) event_len = ring_buffer_event_length(event);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4849)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4850) if (RB_WARN_ON(data->buffer, event_len < len))
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4851) goto out;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4852)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4853) item = ring_buffer_event_data(event);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4854) item->size = size;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4855) memcpy(item->str, rb_string, size);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4856)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4857) if (nested) {
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4858) data->bytes_alloc_nested += event_len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4859) data->bytes_written_nested += len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4860) data->events_nested++;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4861) if (!data->min_size_nested || len < data->min_size_nested)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4862) data->min_size_nested = len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4863) if (len > data->max_size_nested)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4864) data->max_size_nested = len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4865) } else {
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4866) data->bytes_alloc += event_len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4867) data->bytes_written += len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4868) data->events++;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4869) if (!data->min_size || len < data->min_size)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4870) data->min_size = len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4871) if (len > data->max_size)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4872) data->max_size = len;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4873) }
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4874)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4875) out:
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4876) ring_buffer_unlock_commit(data->buffer, event);
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4877)
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4878) return 0;
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4879) }
6c43e554a2 Steven Rostedt (Red Hat 2013-03-15 4880)

:::::: The code at line 4836 was first introduced by commit
:::::: 6c43e554a2a5c1f2caf1733d46719bc58de3e37b ring-buffer: Add ring buffer startup selftest

:::::: TO: Steven Rostedt (Red Hat) <[email protected]>
:::::: CC: Steven Rostedt <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-09-07 23:40:54

by Steven Rostedt

Subject: Re: [PATCH v2 27/40] tracing: Add support for 'synthetic' events

On Tue, 5 Sep 2017 16:57:39 -0500
Tom Zanussi <[email protected]> wrote:


> +static int synth_field_string_size(char *type)
> +{
> + char buf[4], *end, *start;
> + unsigned int len;
> + int size, err;
> +
> + start = strstr(type, "char[");
> + if (start == NULL)
> + return -EINVAL;
> + start += strlen("char[");
> +
> + end = strchr(type, ']');
> + if (!end || end < start)
> + return -EINVAL;
> +
> + len = end - start;
> + if (len > 2)

Is there a reason for max of 2? Could it be 3?

> + return -EINVAL;
> +
> + strncpy(buf, start, len);
> + buf[len] = '\0';

With len=3, buf[len] would be the 4th byte, which is exactly the size buf is defined with.

> +
> + err = kstrtouint(buf, 0, &size);
> + if (err)
> + return err;
> +
> + if (size > STR_VAR_LEN_MAX)
> + return -EINVAL;
> +
> + return size;
> +}
> +
> +static int synth_field_size(char *type)
> +{
> + int size = 0;
> +
> + if (strcmp(type, "s64") == 0)
> + size = sizeof(s64);
> + else if (strcmp(type, "u64") == 0)
> + size = sizeof(u64);
> + else if (strcmp(type, "s32") == 0)
> + size = sizeof(s32);
> + else if (strcmp(type, "u32") == 0)
> + size = sizeof(u32);
> + else if (strcmp(type, "s16") == 0)
> + size = sizeof(s16);
> + else if (strcmp(type, "u16") == 0)
> + size = sizeof(u16);
> + else if (strcmp(type, "s8") == 0)
> + size = sizeof(s8);
> + else if (strcmp(type, "u8") == 0)
> + size = sizeof(u8);
> + else if (strcmp(type, "char") == 0)
> + size = sizeof(char);
> + else if (strcmp(type, "unsigned char") == 0)
> + size = sizeof(unsigned char);
> + else if (strcmp(type, "int") == 0)
> + size = sizeof(int);
> + else if (strcmp(type, "unsigned int") == 0)
> + size = sizeof(unsigned int);
> + else if (strcmp(type, "long") == 0)
> + size = sizeof(long);
> + else if (strcmp(type, "unsigned long") == 0)
> + size = sizeof(unsigned long);
> + else if (strcmp(type, "pid_t") == 0)
> + size = sizeof(pid_t);
> + else if (synth_field_is_string(type))
> + size = synth_field_string_size(type);
> +
> + return size;
> +}
> +
> +static const char *synth_field_fmt(char *type)
> +{
> + const char *fmt = "%llu";
> +
> + if (strcmp(type, "s64") == 0)
> + fmt = "%lld";
> + else if (strcmp(type, "u64") == 0)
> + fmt = "%llu";
> + else if (strcmp(type, "s32") == 0)
> + fmt = "%d";
> + else if (strcmp(type, "u32") == 0)
> + fmt = "%u";
> + else if (strcmp(type, "s16") == 0)
> + fmt = "%d";
> + else if (strcmp(type, "u16") == 0)
> + fmt = "%u";
> + else if (strcmp(type, "s8") == 0)
> + fmt = "%d";
> + else if (strcmp(type, "u8") == 0)
> + fmt = "%u";
> + else if (strcmp(type, "char") == 0)
> + fmt = "%d";
> + else if (strcmp(type, "unsigned char") == 0)
> + fmt = "%u";
> + else if (strcmp(type, "int") == 0)
> + fmt = "%d";
> + else if (strcmp(type, "unsigned int") == 0)
> + fmt = "%u";
> + else if (strcmp(type, "long") == 0)
> + fmt = "%ld";
> + else if (strcmp(type, "unsigned long") == 0)
> + fmt = "%lu";
> + else if (strcmp(type, "pid_t") == 0)
> + fmt = "%d";
> + else if (strstr(type, "[") == 0)
> + fmt = "%s";
> +
> + return fmt;
> +}
> +
> +static enum print_line_t print_synth_event(struct trace_iterator *iter,
> + int flags,
> + struct trace_event *event)
> +{
> + struct trace_array *tr = iter->tr;
> + struct trace_seq *s = &iter->seq;
> + struct synth_trace_event *entry;
> + struct synth_event *se;
> + unsigned int i, n_u64;
> + char print_fmt[32];
> + const char *fmt;
> +
> + entry = (struct synth_trace_event *)iter->ent;
> + se = container_of(event, struct synth_event, call.event);
> +
> + trace_seq_printf(s, "%s: ", se->name);
> +
> + for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
> + if (trace_seq_has_overflowed(s))
> + goto end;
> +
> + fmt = synth_field_fmt(se->fields[i]->type);
> +
> + /* parameter types */
> + if (tr->trace_flags & TRACE_ITER_VERBOSE)
> + trace_seq_printf(s, "%s ", fmt);
> +
> + sprintf(print_fmt, "%%s=%s%%s", fmt);

Please use snprintf().

> +
> + /* parameter values */
> + if (se->fields[i]->is_string) {
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + (char *)entry->fields[n_u64],
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + } else {
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + entry->fields[n_u64],
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64++;
> + }
> + }
> +end:
> + trace_seq_putc(s, '\n');
> +
> + return trace_handle_return(s);
> +}
> +
> +static struct trace_event_functions synth_event_funcs = {
> + .trace = print_synth_event
> +};
> +
> +static notrace void trace_event_raw_event_synth(void *__data,
> + u64 *var_ref_vals,
> + unsigned int var_ref_idx)
> +{
> + struct trace_event_file *trace_file = __data;
> + struct synth_trace_event *entry;
> + struct trace_event_buffer fbuffer;
> + struct synth_event *event;
> + unsigned int i, n_u64;
> + int fields_size = 0;
> +
> + event = trace_file->event_call->data;
> +
> + if (trace_trigger_soft_disabled(trace_file))
> + return;
> +
> + fields_size = event->n_u64 * sizeof(u64);
> +
> + entry = trace_event_buffer_reserve(&fbuffer, trace_file,
> + sizeof(*entry) + fields_size);
> + if (!entry)
> + return;
> +
> + for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
> + if (event->fields[i]->is_string) {
> + char *str_val = (char *)var_ref_vals[var_ref_idx + i];
> + char *str_field = (char *)&entry->fields[n_u64];
> +
> + strncpy(str_field, str_val, STR_VAR_LEN_MAX);
> + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + } else {
> + entry->fields[i] = var_ref_vals[var_ref_idx + i];
> + n_u64++;
> + }
> + }
> +
> + trace_event_buffer_commit(&fbuffer);
> +}
> +
> +static void free_synth_event_print_fmt(struct trace_event_call *call)
> +{
> + if (call)
> + kfree(call->print_fmt);

For safety reasons should this be:

if (call) {
kfree(call->print_fmt);
call->print_fmt = NULL;
}
?

> +}
> +
> +static int __set_synth_event_print_fmt(struct synth_event *event,
> + char *buf, int len)
> +{
> + const char *fmt;
> + int pos = 0;
> + int i;
> +
> + /* When len=0, we just calculate the needed length */
> +#define LEN_OR_ZERO (len ? len - pos : 0)
> +
> + pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
> + for (i = 0; i < event->n_fields; i++) {
> + fmt = synth_field_fmt(event->fields[i]->type);
> + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
> + event->fields[i]->name, fmt,
> + i == event->n_fields - 1 ? "" : ", ");
> + }
> + pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
> +
> + for (i = 0; i < event->n_fields; i++) {
> + pos += snprintf(buf + pos, LEN_OR_ZERO,
> + ", REC->%s", event->fields[i]->name);
> + }
> +
> +#undef LEN_OR_ZERO
> +
> + /* return the length of print_fmt */
> + return pos;
> +}
> +
> +static int set_synth_event_print_fmt(struct trace_event_call *call)
> +{
> + struct synth_event *event = call->data;
> + char *print_fmt;
> + int len;
> +
> + /* First: called with 0 length to calculate the needed length */
> + len = __set_synth_event_print_fmt(event, NULL, 0);
> +
> + print_fmt = kmalloc(len + 1, GFP_KERNEL);
> + if (!print_fmt)
> + return -ENOMEM;
> +
> + /* Second: actually write the @print_fmt */
> + __set_synth_event_print_fmt(event, print_fmt, len + 1);
> + call->print_fmt = print_fmt;
> +
> + return 0;
> +}
> +
> +static void free_synth_field(struct synth_field *field)
> +{
> + kfree(field->type);
> + kfree(field->name);
> + kfree(field);
> +}
> +
> +static struct synth_field *parse_synth_field(char *field_type,
> + char *field_name)
> +{
> + struct synth_field *field;
> + int len, ret = 0;
> + char *array;
> +
> + if (field_type[0] == ';')
> + field_type++;
> +
> + len = strlen(field_name);
> + if (field_name[len - 1] == ';')
> + field_name[len - 1] = '\0';
> +
> + field = kzalloc(sizeof(*field), GFP_KERNEL);
> + if (!field)
> + return ERR_PTR(-ENOMEM);
> +
> + len = strlen(field_type) + 1;
> + array = strchr(field_name, '[');
> + if (array)
> + len += strlen(array);
> + field->type = kzalloc(len, GFP_KERNEL);
> + if (!field->type) {
> + ret = -ENOMEM;
> + goto free;
> + }
> + strcat(field->type, field_type);
> + if (array) {
> + strcat(field->type, array);
> + *array = '\0';
> + }
> +
> + field->size = synth_field_size(field->type);
> + if (!field->size) {
> + ret = -EINVAL;
> + goto free;
> + }
> +
> + if (synth_field_is_string(field->type))
> + field->is_string = true;
> +
> + field->is_signed = synth_field_signed(field->type);
> +
> + field->name = kstrdup(field_name, GFP_KERNEL);
> + if (!field->name) {
> + ret = -ENOMEM;
> + goto free;
> + }
> + out:
> + return field;
> + free:
> + free_synth_field(field);
> + field = ERR_PTR(ret);
> + goto out;
> +}
> +
> +static void free_synth_tracepoint(struct tracepoint *tp)
> +{
> + if (!tp)
> + return;
> +
> + kfree(tp->name);
> + kfree(tp);
> +}
> +
> +static struct tracepoint *alloc_synth_tracepoint(char *name)
> +{
> + struct tracepoint *tp;
> + int ret = 0;
> +
> + tp = kzalloc(sizeof(*tp), GFP_KERNEL);
> + if (!tp) {
> + ret = -ENOMEM;
> + goto free;

Why the goto free here? It's the first allocation. Should just be able
to return ERR_PTR(-ENOMEM).

> + }
> +
> + tp->name = kstrdup(name, GFP_KERNEL);
> + if (!tp->name) {
> + ret = -ENOMEM;
> + goto free;

Then we don't even need the goto. Just free tp, and return with
error.

> + }
> +
> + tp->dynamic = true;
> +
> + return tp;
> + free:
> + free_synth_tracepoint(tp);
> +
> + return ERR_PTR(ret);
> +}
> +
> +typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
> + unsigned int var_ref_idx);
> +
> +static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
> + unsigned int var_ref_idx)
> +{
> + struct tracepoint *tp = event->tp;
> +
> + if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
> + struct tracepoint_func *probe_func_ptr;
> + synth_probe_func_t probe_func;
> + void *__data;
> +
> + if (!(cpu_online(raw_smp_processor_id())))
> + return;
> +
> + probe_func_ptr = rcu_dereference_sched((tp)->funcs);
> + if (probe_func_ptr) {
> + do {
> + probe_func = (probe_func_ptr)->func;
> + __data = (probe_func_ptr)->data;

Are the parenthesis around probe_func_ptr required?

> + probe_func(__data, var_ref_vals, var_ref_idx);
> + } while ((++probe_func_ptr)->func);
> + }
> + }
> +}
> +
> +static struct synth_event *find_synth_event(const char *name)
> +{
> + struct synth_event *event;
> +
> + list_for_each_entry(event, &synth_event_list, list) {
> + if (strcmp(event->name, name) == 0)
> + return event;
> + }
> +
> + return NULL;
> +}
> +
> +static int register_synth_event(struct synth_event *event)
> +{
> + struct trace_event_call *call = &event->call;
> + int ret = 0;
> +
> + event->call.class = &event->class;
> + event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
> + if (!event->class.system) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + event->tp = alloc_synth_tracepoint(event->name);
> + if (IS_ERR(event->tp)) {
> + ret = PTR_ERR(event->tp);
> + event->tp = NULL;
> + goto out;
> + }
> +
> + INIT_LIST_HEAD(&call->class->fields);
> + call->event.funcs = &synth_event_funcs;
> + call->class->define_fields = synth_event_define_fields;
> +
> + ret = register_trace_event(&call->event);
> + if (!ret) {
> + ret = -ENODEV;
> + goto out;
> + }
> + call->flags = TRACE_EVENT_FL_TRACEPOINT;
> + call->class->reg = trace_event_reg;
> + call->class->probe = trace_event_raw_event_synth;
> + call->data = event;
> + call->tp = event->tp;
> +

Could you comment what lock inversion is being avoided by the releasing
of this mutex.

> + mutex_unlock(&synth_event_mutex);

Please add a comment before this function that states that this
function releases synth_event_mutex.

> + ret = trace_add_event_call(call);
> + mutex_lock(&synth_event_mutex);
> + if (ret) {
> + pr_warn("Failed to register synthetic event: %s\n",
> + trace_event_name(call));
> + goto err;
> + }
> +
> + ret = set_synth_event_print_fmt(call);
> + if (ret < 0) {
> + mutex_unlock(&synth_event_mutex);
> + trace_remove_event_call(call);
> + mutex_lock(&synth_event_mutex);
> + goto err;
> + }
> + out:
> + return ret;
> + err:
> + unregister_trace_event(&call->event);
> + goto out;
> +}
> +
> +static int unregister_synth_event(struct synth_event *event)
> +{
> + struct trace_event_call *call = &event->call;
> + int ret;
> +
> + mutex_unlock(&synth_event_mutex);

Same here.

> + ret = trace_remove_event_call(call);
> + mutex_lock(&synth_event_mutex);
> + if (ret) {
> + pr_warn("Failed to remove synthetic event: %s\n",
> + trace_event_name(call));
> + free_synth_event_print_fmt(call);

Is it safe to call unregister_trace_event() with the synth_event_mutex
held?

> + unregister_trace_event(&call->event);
> + }
> +
> + return ret;
> +}
> +
> +static void remove_synth_event(struct synth_event *event)
> +{
> + unregister_synth_event(event);
> + list_del(&event->list);
> +}
> +
> +static int add_synth_event(struct synth_event *event)
> +{
> + int ret;
> +
> + ret = register_synth_event(event);
> + if (ret)
> + return ret;
> +
> + list_add(&event->list, &synth_event_list);
> +
> + return 0;
> +}
> +
> +static void free_synth_event(struct synth_event *event)
> +{
> + unsigned int i;
> +
> + if (!event)
> + return;
> +
> + for (i = 0; i < event->n_fields; i++)
> + free_synth_field(event->fields[i]);
> +
> + kfree(event->fields);
> + kfree(event->name);
> + kfree(event->class.system);
> + free_synth_tracepoint(event->tp);
> + free_synth_event_print_fmt(&event->call);
> + kfree(event);
> +}
> +
> +static struct synth_event *alloc_synth_event(char *event_name, int n_fields,
> + struct synth_field **fields)
> +{
> + struct synth_event *event;
> + unsigned int i;
> +
> + event = kzalloc(sizeof(*event), GFP_KERNEL);
> + if (!event) {
> + event = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> +
> + event->name = kstrdup(event_name, GFP_KERNEL);
> + if (!event->name) {
> + kfree(event);
> + event = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> +
> + event->fields = kcalloc(n_fields, sizeof(event->fields), GFP_KERNEL);
> + if (!event->fields) {
> + free_synth_event(event);
> + event = ERR_PTR(-ENOMEM);
> + goto out;

Interesting error handling process. I'm fine with keeping this.

> + }
> +
> + for (i = 0; i < n_fields; i++)
> + event->fields[i] = fields[i];
> +
> + event->n_fields = n_fields;
> + out:
> + return event;
> +}
> +
> +static int create_synth_event(int argc, char **argv)
> +{
> + struct synth_field *field, *fields[SYNTH_FIELDS_MAX];
> + struct synth_event *event = NULL;
> + bool delete_event = false;
> + int i, n_fields = 0, ret = 0;
> + char *name;
> +
> + mutex_lock(&synth_event_mutex);
> +
> + /*
> + * Argument syntax:
> + * - Add synthetic event: <event_name> field[;field] ...
> + * - Remove synthetic event: !<event_name> field[;field] ...
> + * where 'field' = type field_name
> + */
> + if (argc < 1) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + name = argv[0];
> + if (name[0] == '!') {
> + delete_event = true;
> + name++;
> + }
> +
> + event = find_synth_event(name);
> + if (event) {
> + if (delete_event) {
> + if (event->ref) {
> + ret = -EBUSY;
> + goto out;
> + }
> + remove_synth_event(event);
> + free_synth_event(event);
> + goto out;
> + }
> + ret = -EEXIST;
> + goto out;
> + } else if (delete_event) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (argc < 2) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + for (i = 1; i < argc - 1; i++) {
> + if (strcmp(argv[i], ";") == 0)
> + continue;
> + if (n_fields == SYNTH_FIELDS_MAX) {
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + field = parse_synth_field(argv[i], argv[i + 1]);
> + if (IS_ERR(field)) {
> + ret = PTR_ERR(field);
> + goto err;
> + }
> + fields[n_fields] = field;
> + i++; n_fields++;
> + }
> +
> + if (i < argc) {
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + event = alloc_synth_event(name, n_fields, fields);
> + if (IS_ERR(event)) {
> + ret = PTR_ERR(event);
> + event = NULL;
> + goto err;
> + }
> +

Add comment that the synth_event_mutex is released by this function.

> + add_synth_event(event);
> + out:
> + mutex_unlock(&synth_event_mutex);
> +
> + return ret;
> + err:
> + for (i = 0; i < n_fields; i++)
> + free_synth_field(fields[i]);
> + free_synth_event(event);
> +
> + goto out;
> +}
> +
> +static int release_all_synth_events(void)
> +{
> + struct synth_event *event, *e;
> + int ret = 0;
> +
> + mutex_lock(&synth_event_mutex);
> +
> + list_for_each_entry(event, &synth_event_list, list) {
> + if (event->ref) {
> + ret = -EBUSY;
> + goto out;
> + }
> + }
> +
> + list_for_each_entry_safe(event, e, &synth_event_list, list) {

Same here.

-- Steve

> + remove_synth_event(event);
> + free_synth_event(event);
> + }
> + out:
> + mutex_unlock(&synth_event_mutex);
> +
> + return ret;
> +}

2017-09-07 23:43:57

by Steven Rostedt

Subject: Re: [PATCH v2 28/40] tracing: Add support for 'field variables'

On Tue, 5 Sep 2017 16:57:40 -0500
Tom Zanussi <[email protected]> wrote:

> Users should be able to directly specify event fields in hist trigger
> 'actions' rather than being forced to explicitly create a variable for
> that purpose.
>
> Add support allowing fields to be used directly in actions, which
> essentially does just that - creates 'invisible' variables for each
> bare field specified in an action. If a bare field refers to a field
> on another (matching) event, it even creates a special histogram for
> the purpose (since variables can't be defined on an existing histogram
> after histogram creation).

Can you show some examples here to describe what you mean better?

-- Steve

>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events_hist.c | 452 ++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 451 insertions(+), 1 deletion(-)
>
>

2017-09-08 14:18:32

by Tom Zanussi

Subject: Re: [PATCH v2 25/40] tracing: Add support for dynamic tracepoints

Hi Steve,

On Thu, 2017-09-07 at 18:02 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:37 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > The tracepoint infrastructure assumes statically-defined tracepoints
> > and uses static_keys for tracepoint enablement. In order to define
> > tracepoints on the fly, we need to have a dynamic counterpart.
>
> Do we?
>
> I believe the static keys should work just fine if you don't have any
> defined static keys jumps, and they should work as a simple counter.
>
> Could we do that instead of adding another variable to a structure that
> is defined over a thousand times? Save us 4k of memory.
>

OK, yeah, makes sense if possible - let me look into that...

Tom



2017-09-08 14:30:46

by Tom Zanussi

Subject: Re: [PATCH v2 27/40] tracing: Add support for 'synthetic' events

On Thu, 2017-09-07 at 19:40 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:39 -0500
> Tom Zanussi <[email protected]> wrote:
>
>
> > +static int synth_field_string_size(char *type)
> > +{
> > + char buf[4], *end, *start;
> > + unsigned int len;
> > + int size, err;
> > +
> > + start = strstr(type, "char[");
> > + if (start == NULL)
> > + return -EINVAL;
> > + start += strlen("char[");
> > +
> > + end = strchr(type, ']');
> > + if (!end || end < start)
> > + return -EINVAL;
> > +
> > + len = end - start;
> > + if (len > 2)
>
> Is there a reason for max of 2? Could it be 3?
>

You're right, it should be 3.

> > + return -EINVAL;
> > +
> > + strncpy(buf, start, len);
> > + buf[len] = '\0';
>
> With len=3, buf[len] would be the 4th byte, which is exactly the size buf is defined with.
>

Yep.

> > +
> > + err = kstrtouint(buf, 0, &size);
> > + if (err)
> > + return err;
> > +
> > + if (size > STR_VAR_LEN_MAX)
> > + return -EINVAL;
> > +
> > + return size;
> > +}
> > +
> > +static int synth_field_size(char *type)
> > +{
> > + int size = 0;
> > +
> > + if (strcmp(type, "s64") == 0)
> > + size = sizeof(s64);
> > + else if (strcmp(type, "u64") == 0)
> > + size = sizeof(u64);
> > + else if (strcmp(type, "s32") == 0)
> > + size = sizeof(s32);
> > + else if (strcmp(type, "u32") == 0)
> > + size = sizeof(u32);
> > + else if (strcmp(type, "s16") == 0)
> > + size = sizeof(s16);
> > + else if (strcmp(type, "u16") == 0)
> > + size = sizeof(u16);
> > + else if (strcmp(type, "s8") == 0)
> > + size = sizeof(s8);
> > + else if (strcmp(type, "u8") == 0)
> > + size = sizeof(u8);
> > + else if (strcmp(type, "char") == 0)
> > + size = sizeof(char);
> > + else if (strcmp(type, "unsigned char") == 0)
> > + size = sizeof(unsigned char);
> > + else if (strcmp(type, "int") == 0)
> > + size = sizeof(int);
> > + else if (strcmp(type, "unsigned int") == 0)
> > + size = sizeof(unsigned int);
> > + else if (strcmp(type, "long") == 0)
> > + size = sizeof(long);
> > + else if (strcmp(type, "unsigned long") == 0)
> > + size = sizeof(unsigned long);
> > + else if (strcmp(type, "pid_t") == 0)
> > + size = sizeof(pid_t);
> > + else if (synth_field_is_string(type))
> > + size = synth_field_string_size(type);
> > +
> > + return size;
> > +}
> > +
> > +static const char *synth_field_fmt(char *type)
> > +{
> > + const char *fmt = "%llu";
> > +
> > + if (strcmp(type, "s64") == 0)
> > + fmt = "%lld";
> > + else if (strcmp(type, "u64") == 0)
> > + fmt = "%llu";
> > + else if (strcmp(type, "s32") == 0)
> > + fmt = "%d";
> > + else if (strcmp(type, "u32") == 0)
> > + fmt = "%u";
> > + else if (strcmp(type, "s16") == 0)
> > + fmt = "%d";
> > + else if (strcmp(type, "u16") == 0)
> > + fmt = "%u";
> > + else if (strcmp(type, "s8") == 0)
> > + fmt = "%d";
> > + else if (strcmp(type, "u8") == 0)
> > + fmt = "%u";
> > + else if (strcmp(type, "char") == 0)
> > + fmt = "%d";
> > + else if (strcmp(type, "unsigned char") == 0)
> > + fmt = "%u";
> > + else if (strcmp(type, "int") == 0)
> > + fmt = "%d";
> > + else if (strcmp(type, "unsigned int") == 0)
> > + fmt = "%u";
> > + else if (strcmp(type, "long") == 0)
> > + fmt = "%ld";
> > + else if (strcmp(type, "unsigned long") == 0)
> > + fmt = "%lu";
> > + else if (strcmp(type, "pid_t") == 0)
> > + fmt = "%d";
> > + else if (strstr(type, "[") == 0)
> > + fmt = "%s";
> > +
> > + return fmt;
> > +}
> > +
> > +static enum print_line_t print_synth_event(struct trace_iterator *iter,
> > + int flags,
> > + struct trace_event *event)
> > +{
> > + struct trace_array *tr = iter->tr;
> > + struct trace_seq *s = &iter->seq;
> > + struct synth_trace_event *entry;
> > + struct synth_event *se;
> > + unsigned int i, n_u64;
> > + char print_fmt[32];
> > + const char *fmt;
> > +
> > + entry = (struct synth_trace_event *)iter->ent;
> > + se = container_of(event, struct synth_event, call.event);
> > +
> > + trace_seq_printf(s, "%s: ", se->name);
> > +
> > + for (i = 0, n_u64 = 0; i < se->n_fields; i++) {
> > + if (trace_seq_has_overflowed(s))
> > + goto end;
> > +
> > + fmt = synth_field_fmt(se->fields[i]->type);
> > +
> > + /* parameter types */
> > + if (tr->trace_flags & TRACE_ITER_VERBOSE)
> > + trace_seq_printf(s, "%s ", fmt);
> > +
> > + sprintf(print_fmt, "%%s=%s%%s", fmt);
>
> Please use snprintf().
>
> > +
> > + /* parameter values */
> > + if (se->fields[i]->is_string) {
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + (char *)entry->fields[n_u64],
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + } else {
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + entry->fields[n_u64],
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64++;
> > + }
> > + }
> > +end:
> > + trace_seq_putc(s, '\n');
> > +
> > + return trace_handle_return(s);
> > +}
> > +
> > +static struct trace_event_functions synth_event_funcs = {
> > + .trace = print_synth_event
> > +};
> > +
> > +static notrace void trace_event_raw_event_synth(void *__data,
> > + u64 *var_ref_vals,
> > + unsigned int var_ref_idx)
> > +{
> > + struct trace_event_file *trace_file = __data;
> > + struct synth_trace_event *entry;
> > + struct trace_event_buffer fbuffer;
> > + struct synth_event *event;
> > + unsigned int i, n_u64;
> > + int fields_size = 0;
> > +
> > + event = trace_file->event_call->data;
> > +
> > + if (trace_trigger_soft_disabled(trace_file))
> > + return;
> > +
> > + fields_size = event->n_u64 * sizeof(u64);
> > +
> > + entry = trace_event_buffer_reserve(&fbuffer, trace_file,
> > + sizeof(*entry) + fields_size);
> > + if (!entry)
> > + return;
> > +
> > + for (i = 0, n_u64 = 0; i < event->n_fields; i++) {
> > + if (event->fields[i]->is_string) {
> > + char *str_val = (char *)var_ref_vals[var_ref_idx + i];
> > + char *str_field = (char *)&entry->fields[n_u64];
> > +
> > + strncpy(str_field, str_val, STR_VAR_LEN_MAX);
> > + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + } else {
> > + entry->fields[i] = var_ref_vals[var_ref_idx + i];
> > + n_u64++;
> > + }
> > + }
> > +
> > + trace_event_buffer_commit(&fbuffer);
> > +}
> > +
> > +static void free_synth_event_print_fmt(struct trace_event_call *call)
> > +{
> > + if (call)
> > + kfree(call->print_fmt);
>
> For safety reasons should this be:
>
> if (call) {
> kfree(call->print_fmt);
> call->print_fmt = NULL;
> }
> ?
>

Yeah, will change.

> > +}
> > +
> > +static int __set_synth_event_print_fmt(struct synth_event *event,
> > + char *buf, int len)
> > +{
> > + const char *fmt;
> > + int pos = 0;
> > + int i;
> > +
> > + /* When len=0, we just calculate the needed length */
> > +#define LEN_OR_ZERO (len ? len - pos : 0)
> > +
> > + pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
> > + for (i = 0; i < event->n_fields; i++) {
> > + fmt = synth_field_fmt(event->fields[i]->type);
> > + pos += snprintf(buf + pos, LEN_OR_ZERO, "%s=%s%s",
> > + event->fields[i]->name, fmt,
> > + i == event->n_fields - 1 ? "" : ", ");
> > + }
> > + pos += snprintf(buf + pos, LEN_OR_ZERO, "\"");
> > +
> > + for (i = 0; i < event->n_fields; i++) {
> > + pos += snprintf(buf + pos, LEN_OR_ZERO,
> > + ", REC->%s", event->fields[i]->name);
> > + }
> > +
> > +#undef LEN_OR_ZERO
> > +
> > + /* return the length of print_fmt */
> > + return pos;
> > +}
> > +
> > +static int set_synth_event_print_fmt(struct trace_event_call *call)
> > +{
> > + struct synth_event *event = call->data;
> > + char *print_fmt;
> > + int len;
> > +
> > + /* First: called with 0 length to calculate the needed length */
> > + len = __set_synth_event_print_fmt(event, NULL, 0);
> > +
> > + print_fmt = kmalloc(len + 1, GFP_KERNEL);
> > + if (!print_fmt)
> > + return -ENOMEM;
> > +
> > + /* Second: actually write the @print_fmt */
> > + __set_synth_event_print_fmt(event, print_fmt, len + 1);
> > + call->print_fmt = print_fmt;
> > +
> > + return 0;
> > +}
> > +
> > +static void free_synth_field(struct synth_field *field)
> > +{
> > + kfree(field->type);
> > + kfree(field->name);
> > + kfree(field);
> > +}
> > +
> > +static struct synth_field *parse_synth_field(char *field_type,
> > + char *field_name)
> > +{
> > + struct synth_field *field;
> > + int len, ret = 0;
> > + char *array;
> > +
> > + if (field_type[0] == ';')
> > + field_type++;
> > +
> > + len = strlen(field_name);
> > + if (field_name[len - 1] == ';')
> > + field_name[len - 1] = '\0';
> > +
> > + field = kzalloc(sizeof(*field), GFP_KERNEL);
> > + if (!field)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + len = strlen(field_type) + 1;
> > + array = strchr(field_name, '[');
> > + if (array)
> > + len += strlen(array);
> > + field->type = kzalloc(len, GFP_KERNEL);
> > + if (!field->type) {
> > + ret = -ENOMEM;
> > + goto free;
> > + }
> > + strcat(field->type, field_type);
> > + if (array) {
> > + strcat(field->type, array);
> > + *array = '\0';
> > + }
> > +
> > + field->size = synth_field_size(field->type);
> > + if (!field->size) {
> > + ret = -EINVAL;
> > + goto free;
> > + }
> > +
> > + if (synth_field_is_string(field->type))
> > + field->is_string = true;
> > +
> > + field->is_signed = synth_field_signed(field->type);
> > +
> > + field->name = kstrdup(field_name, GFP_KERNEL);
> > + if (!field->name) {
> > + ret = -ENOMEM;
> > + goto free;
> > + }
> > + out:
> > + return field;
> > + free:
> > + free_synth_field(field);
> > + field = ERR_PTR(ret);
> > + goto out;
> > +}
> > +
> > +static void free_synth_tracepoint(struct tracepoint *tp)
> > +{
> > + if (!tp)
> > + return;
> > +
> > + kfree(tp->name);
> > + kfree(tp);
> > +}
> > +
> > +static struct tracepoint *alloc_synth_tracepoint(char *name)
> > +{
> > + struct tracepoint *tp;
> > + int ret = 0;
> > +
> > + tp = kzalloc(sizeof(*tp), GFP_KERNEL);
> > + if (!tp) {
> > + ret = -ENOMEM;
> > + goto free;
>
> Why the goto free here? It's the first allocation. Should just be able
> to return ERR_PTR(-ENOMEM).
>
> > + }
> > +
> > + tp->name = kstrdup(name, GFP_KERNEL);
> > + if (!tp->name) {
> > + ret = -ENOMEM;
> > + goto free;
>
> Then we don't even need the goto. Just free tp, and return with
> error.
>
> > + }
> > +
> > + tp->dynamic = true;
> > +
> > + return tp;
> > + free:
> > + free_synth_tracepoint(tp);
> > +
> > + return ERR_PTR(ret);
> > +}
> > +
> > +typedef void (*synth_probe_func_t) (void *__data, u64 *var_ref_vals,
> > + unsigned int var_ref_idx);
> > +
> > +static inline void trace_synth(struct synth_event *event, u64 *var_ref_vals,
> > + unsigned int var_ref_idx)
> > +{
> > + struct tracepoint *tp = event->tp;
> > +
> > + if (unlikely(atomic_read(&tp->key.enabled) > 0)) {
> > + struct tracepoint_func *probe_func_ptr;
> > + synth_probe_func_t probe_func;
> > + void *__data;
> > +
> > + if (!(cpu_online(raw_smp_processor_id())))
> > + return;
> > +
> > + probe_func_ptr = rcu_dereference_sched((tp)->funcs);
> > + if (probe_func_ptr) {
> > + do {
> > + probe_func = (probe_func_ptr)->func;
> > + __data = (probe_func_ptr)->data;
>
> Are the parenthesis around probe_func_ptr required?
>
> > + probe_func(__data, var_ref_vals, var_ref_idx);
> > + } while ((++probe_func_ptr)->func);
> > + }
> > + }
> > +}
> > +
> > +static struct synth_event *find_synth_event(const char *name)
> > +{
> > + struct synth_event *event;
> > +
> > + list_for_each_entry(event, &synth_event_list, list) {
> > + if (strcmp(event->name, name) == 0)
> > + return event;
> > + }
> > +
> > + return NULL;
> > +}
> > +
> > +static int register_synth_event(struct synth_event *event)
> > +{
> > + struct trace_event_call *call = &event->call;
> > + int ret = 0;
> > +
> > + event->call.class = &event->class;
> > + event->class.system = kstrdup(SYNTH_SYSTEM, GFP_KERNEL);
> > + if (!event->class.system) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + event->tp = alloc_synth_tracepoint(event->name);
> > + if (IS_ERR(event->tp)) {
> > + ret = PTR_ERR(event->tp);
> > + event->tp = NULL;
> > + goto out;
> > + }
> > +
> > + INIT_LIST_HEAD(&call->class->fields);
> > + call->event.funcs = &synth_event_funcs;
> > + call->class->define_fields = synth_event_define_fields;
> > +
> > + ret = register_trace_event(&call->event);
> > + if (!ret) {
> > + ret = -ENODEV;
> > + goto out;
> > + }
> > + call->flags = TRACE_EVENT_FL_TRACEPOINT;
> > + call->class->reg = trace_event_reg;
> > + call->class->probe = trace_event_raw_event_synth;
> > + call->data = event;
> > + call->tp = event->tp;
> > +
>
> Could you comment what lock inversion is being avoided by the releasing
> of this mutex.
>

Yeah, this is because trace_add/remove_event_call() would otherwise grab
event_mutex with synth_event_mutex held, while a hist trigger cmd, which
already holds event_mutex when called, can grab synth_event_mutex - the
opposite order.

> > + mutex_unlock(&synth_event_mutex);
>
> Please add a comment before this function that states that this
> function releases synth_event_mutex.
>

OK, will do.

> > + ret = trace_add_event_call(call);
> > + mutex_lock(&synth_event_mutex);
> > + if (ret) {
> > + pr_warn("Failed to register synthetic event: %s\n",
> > + trace_event_name(call));
> > + goto err;
> > + }
> > +
> > + ret = set_synth_event_print_fmt(call);
> > + if (ret < 0) {
> > + mutex_unlock(&synth_event_mutex);
> > + trace_remove_event_call(call);
> > + mutex_lock(&synth_event_mutex);
> > + goto err;
> > + }
> > + out:
> > + return ret;
> > + err:
> > + unregister_trace_event(&call->event);
> > + goto out;
> > +}
> > +
> > +static int unregister_synth_event(struct synth_event *event)
> > +{
> > + struct trace_event_call *call = &event->call;
> > + int ret;
> > +
> > + mutex_unlock(&synth_event_mutex);
>
> Same here.
>
> > + ret = trace_remove_event_call(call);
> > + mutex_lock(&synth_event_mutex);
> > + if (ret) {
> > + pr_warn("Failed to remove synthetic event: %s\n",
> > + trace_event_name(call));
> > + free_synth_event_print_fmt(call);
>
> Is it safe to call unregister_trace_event() with the synth_event_mutex
> held?
>

Yeah, I don't see a problem here.

For the rest of the comments, will update as suggested...

Thanks,

Tom


2017-09-08 15:37:35

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 28/40] tracing: Add support for 'field variables'

On Thu, 2017-09-07 at 19:43 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:40 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Users should be able to directly specify event fields in hist trigger
> > 'actions' rather than being forced to explicitly create a variable for
> > that purpose.
> >
> > Add support allowing fields to be used directly in actions, which
> > essentially does just that - creates 'invisible' variables for each
> > bare field specified in an action. If a bare field refers to a field
> > on another (matching) event, it even creates a special histogram for
> > the purpose (since variables can't be defined on an existing histogram
> > after histogram creation).
>
> Can you show some examples here to describe what you mean better?
>

Yeah, I guess that description is pretty opaque without an example.

Here's a simple example that demonstrates both. Basically the onmatch()
action creates a list of variables corresponding to the parameters of
the synthetic event to be generated, and then uses those values to
generate the event. So for the wakeup_latency synthetic event 'call'
below the first param, $wakeup_lat, is a variable defined explicitly on
sched_switch, while 'next_pid' is just a normal field on sched_switch,
and prio is a normal field on sched_waking.

Since the mechanism works on variables, those two normal fields just
have 'invisible' variables created internally for them. In the case of
'prio', which is on another event, we actually need to create an
additional hist trigger and define the invisible variable on it, since
once a hist trigger is defined, variables can't be added to it later.

echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> /sys/kernel/debug/tracing/synthetic_events

echo 'hist:keys=pid:ts0=$common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger

echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:
onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,prio)'
>> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Anyway, I just tested it and a recent change broke the latter case, will
add the fix to v3.

Tom



2017-09-08 18:50:28

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 31/40] tracing: Allow whitespace to surround hist trigger filter

On Tue, 5 Sep 2017 16:57:43 -0500
Tom Zanussi <[email protected]> wrote:

> The existing code only allows for one space before and after the 'if'
> specifying the filter for a hist trigger. Add code to make that more
> permissive as far as whitespace goes.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events_hist.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 0a398f3..4f66f2e 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -4819,7 +4819,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
> struct synth_event *se;
> const char *se_name;
> bool remove = false;
> - char *trigger;
> + char *trigger, *p;
> int ret = 0;
>
> if (!param)
> @@ -4829,9 +4829,19 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
> remove = true;
>
> /* separate the trigger from the filter (k:v [if filter]) */
> - trigger = strsep(&param, " \t");
> - if (!trigger)
> - return -EINVAL;
> + trigger = param;
> + p = strstr(param, " if");
> + if (!p)
> + p = strstr(param, "\tif");
> + if (p) {
> + if (p == trigger)
> + return -EINVAL;
> + param = p + 1;
> + param = strstrip(param);
> + *p = '\0';
> + trigger = strstrip(trigger);
> + } else
> + param = NULL;

This seems rather complex. Wouldn't the following work?

param = skip_spaces(param);
trigger = strsep(&param, " \t");
if (param)
param = strstrip(param);

-- Steve

>
> attrs = parse_hist_trigger_attrs(trigger);
> if (IS_ERR(attrs))
> @@ -4889,6 +4899,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
> }
>
> ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> +
> /*
> * The above returns on success the # of triggers registered,
> * but if it didn't register any it returns zero. Consider no

2017-09-08 19:08:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 32/40] tracing: Add cpu field for hist triggers

On Tue, 5 Sep 2017 16:57:44 -0500
Tom Zanussi <[email protected]> wrote:

> A common key to use in a histogram is the cpuid - add a new cpu
> 'synthetic' field for that purpose. This field is named cpu rather
> than $cpu or $common_cpu because 'cpu' already exists as a special
> filter field and it makes more sense to match that rather than add
> another name for the same thing.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> Documentation/trace/events.txt | 18 ++++++++++++++++++
> kernel/trace/trace_events_hist.c | 30 +++++++++++++++++++++++++++---
> 2 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> index 2cc08d4..9717688 100644
> --- a/Documentation/trace/events.txt
> +++ b/Documentation/trace/events.txt
> @@ -668,6 +668,24 @@ The following commands are supported:
> The examples below provide a more concrete illustration of the
> concepts and typical usage patterns discussed above.
>
> + 'synthetic' event fields
> + ------------------------
> +
> + There are a number of 'synthetic fields' available for use as keys
> + or values in a hist trigger. These look like and behave as if they
> + were event fields, but aren't actually part of the event's field
> + definition or format file. They are however available for any
> + event, and can be used anywhere an actual event field could be.
> + 'Synthetic' field names are always prefixed with a '$' character to
> + indicate that they're not normal fields (with the exception of
> + 'cpu', for compatibility with existing filter usage):
> +
> + $common_timestamp u64 - timestamp (from ring buffer) associated
> + with the event, in nanoseconds. May be
> + modified by .usecs to have timestamps
> + interpreted as microseconds.

I guess the above should have been added with the synthetic field
addition.

> + cpu int - the cpu on which the event occurred.

Then this (and the explanation of '$' for cpu above) should be added
with this patch.


> +
>
> 6.2 'hist' trigger examples
> ---------------------------
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 4f66f2e..0782766 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -226,6 +226,7 @@ enum hist_field_flags {
> HIST_FIELD_FL_VAR_ONLY = 8192,
> HIST_FIELD_FL_EXPR = 16384,
> HIST_FIELD_FL_VAR_REF = 32768,
> + HIST_FIELD_FL_CPU = 65536,
> };
>
> struct var_defs {
> @@ -1170,6 +1171,16 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
> return ts;
> }
>
> +static u64 hist_field_cpu(struct hist_field *hist_field,
> + struct tracing_map_elt *elt,
> + struct ring_buffer_event *rbe,
> + void *event)
> +{
> + int cpu = raw_smp_processor_id();

Hmm, I wonder if this should be just smp_processor_id(). Why have the
raw_?

> +
> + return cpu;
> +}
> +
> static struct hist_field *check_var_ref(struct hist_field *hist_field,
> struct hist_trigger_data *var_data,
> unsigned int var_idx)
> @@ -1518,6 +1529,8 @@ static const char *hist_field_name(struct hist_field *field,
> field_name = hist_field_name(field->operands[0], ++level);
> else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
> field_name = "$common_timestamp";
> + else if (field->flags & HIST_FIELD_FL_CPU)
> + field_name = "cpu";
> else if (field->flags & HIST_FIELD_FL_EXPR ||
> field->flags & HIST_FIELD_FL_VAR_REF)
> field_name = field->name;
> @@ -1990,6 +2003,15 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
> goto out;
> }
>
> + if (flags & HIST_FIELD_FL_CPU) {
> + hist_field->fn = hist_field_cpu;
> + hist_field->size = sizeof(int);
> + hist_field->type = kstrdup("int", GFP_KERNEL);
> + if (!hist_field->type)
> + goto free;
> + goto out;
> + }
> +
> if (WARN_ON_ONCE(!field))
> goto out;
>
> @@ -2182,7 +2204,9 @@ static struct hist_field *parse_var_ref(struct trace_array *tr,
> hist_data->enable_timestamps = true;
> if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
> hist_data->attrs->ts_in_usecs = true;
> - } else {
> + } else if (strcmp(field_name, "cpu") == 0)
> + *flags |= HIST_FIELD_FL_CPU;
> + else {
> field = trace_find_event_field(file->event_call, field_name);
> if (!field || !field->size) {
> field = ERR_PTR(-EINVAL);


> @@ -3185,7 +3209,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
> goto out;
> }
> }
> -
> if (param[0] == '$')
> hist_field = onmatch_find_var(hist_data, data, system,
> event_name, param);
> @@ -3200,7 +3223,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
> ret = -EINVAL;
> goto out;
> }
> -

Why the modification of whitespace here?

-- Steve

> if (check_synth_field(event, hist_field, field_pos) == 0) {
> var_ref = create_var_ref(hist_field);
> if (!var_ref) {
> @@ -4315,6 +4337,8 @@ static void hist_field_print(struct seq_file *m, struct hist_field *hist_field)
>
> if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP)
> seq_puts(m, "$common_timestamp");
> + else if (hist_field->flags & HIST_FIELD_FL_CPU)
> + seq_puts(m, "cpu");
> else if (field_name)
> seq_printf(m, "%s", field_name);
>

2017-09-08 19:08:26

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 31/40] tracing: Allow whitespace to surround hist trigger filter

On Fri, 2017-09-08 at 14:50 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:43 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > The existing code only allows for one space before and after the 'if'
> > specifying the filter for a hist trigger. Add code to make that more
> > permissive as far as whitespace goes.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > kernel/trace/trace_events_hist.c | 19 +++++++++++++++----
> > 1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index 0a398f3..4f66f2e 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -4819,7 +4819,7 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
> > struct synth_event *se;
> > const char *se_name;
> > bool remove = false;
> > - char *trigger;
> > + char *trigger, *p;
> > int ret = 0;
> >
> > if (!param)
> > @@ -4829,9 +4829,19 @@ static int event_hist_trigger_func(struct event_command *cmd_ops,
> > remove = true;
> >
> > /* separate the trigger from the filter (k:v [if filter]) */
> > - trigger = strsep(&param, " \t");
> > - if (!trigger)
> > - return -EINVAL;
> > + trigger = param;
> > + p = strstr(param, " if");
> > + if (!p)
> > + p = strstr(param, "\tif");
> > + if (p) {
> > + if (p == trigger)
> > + return -EINVAL;
> > + param = p + 1;
> > + param = strstrip(param);
> > + *p = '\0';
> > + trigger = strstrip(trigger);
> > + } else
> > + param = NULL;
>
> This seems rather complex. Wouldn't the following work?
>
> param = skip_spaces(param);
> trigger = strsep(&param, " \t");
> if (param)
> param = strstrip(param);
>

Yes, much better ;-)

Tom


2017-09-08 19:10:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 33/40] tracing: Add hist trigger support for variable reference aliases

On Tue, 5 Sep 2017 16:57:45 -0500
Tom Zanussi <[email protected]> wrote:

> Add support for alias=$somevar where alias can be used as
> onmatch($alias).

This change log is very lacking. What exactly is the purpose of alias?

Sounds like the variable is going undercover.

-- Steve

>
> Signed-off-by: Tom Zanussi <[email protected]>

2017-09-08 19:25:20

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 34/40] tracing: Add 'last error' error facility for hist triggers

On Tue, 5 Sep 2017 16:57:46 -0500
Tom Zanussi <[email protected]> wrote:

> +static char *hist_err_str;
> +static char *last_hist_cmd;
> +
> +static int hist_err_alloc(void)
> +{
> + int ret = 0;
> +
> + last_hist_cmd = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
> + if (!last_hist_cmd)
> + return -ENOMEM;
> +
> + hist_err_str = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
> + if (!hist_err_str) {
> + kfree(last_hist_cmd);
> + ret = -ENOMEM;
> + }

This gets allocated during boot up. Why have it be allocated in the
first place? Just have it be strings:

static char hist_err_str[MAX_FILTER_STR_VAL];
static char last_hist_cmd[MAX_FILTER_STR_VAL];

You are not saving any space by doing it this way. In fact, you waste
it because now you need to add the pointers to the strings.

> +
> + return ret;
> +}
> +
> +static void last_cmd_set(char *str)
> +{
> + if (!last_hist_cmd || !str)
> + return;
> +
> + if (strlen(str) > MAX_FILTER_STR_VAL - 1)
> + return;

Instead of returning nothing, why not just truncate it?

> +
> + strcpy(last_hist_cmd, str);

strncpy(last_hist_cmd, str, MAX_FILTER_STR_VAL - 1);
last_hist_cmd[MAX_FILTER_STR_VAL - 1] = 0;

> +}
> +
> +static void hist_err(char *str, char *var)
> +{
> + int maxlen = MAX_FILTER_STR_VAL - 1;
> +
> + if (!hist_err_str || !str)
> + return;
> +
> + if (strlen(hist_err_str))
> + return;
> +
> + if (!var)
> + var = "";
> +
> + if (strlen(hist_err_str) + strlen(str) + strlen(var) > maxlen)
> + return;
> +
> + strcat(hist_err_str, str);
> + strcat(hist_err_str, var);
> +}
> +
> +static void hist_err_event(char *str, char *system, char *event, char *var)
> +{
> + char err[MAX_FILTER_STR_VAL];
> +
> + if (system && var)
> + sprintf(err, "%s.%s.%s", system, event, var);
> + else if (system)
> + sprintf(err, "%s.%s", system, event);
> + else
> + strcpy(err, var);

Use snprintf() and strncpy() for the above.

-- Steve

> +
> + hist_err(str, err);
> +}
> +
> +static void hist_err_clear(void)
> +{
> + if (!hist_err_str)
> + return;
> +
> + hist_err_str[0] = '\0';
> +}
> +
> +static bool have_hist_err(void)
> +{
> + if (hist_err_str && strlen(hist_err_str))
> + return true;
> +
> + return false;
> +}
> +

2017-09-08 19:31:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

On Tue, 5 Sep 2017 16:57:47 -0500
Tom Zanussi <[email protected]> wrote:

> Change the order event_mutex and trace_types_lock are taken, to avoid
> circular dependencies and lockdep spew.
>
> Changing the order shouldn't matter to any current code, but does to
> anything that takes the event_mutex first and then trace_types_lock.
> This is the case when calling tracing_set_clock from inside an event
> command, which already holds the event_mutex.

This is a very scary patch. I'll apply it and run a bunch of tests with
lockdep enabled. Let's see what blows up (or not).

-- Steve

>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events.c | 16 ++++++++--------
> 1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index c93540c..889802c 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -1406,8 +1406,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
> return -ENODEV;
>
> /* Make sure the system still exists */
> - mutex_lock(&trace_types_lock);
> mutex_lock(&event_mutex);
> + mutex_lock(&trace_types_lock);
> list_for_each_entry(tr, &ftrace_trace_arrays, list) {
> list_for_each_entry(dir, &tr->systems, list) {
> if (dir == inode->i_private) {
> @@ -1421,8 +1421,8 @@ static int subsystem_open(struct inode *inode, struct file *filp)
> }
> }
> exit_loop:
> - mutex_unlock(&event_mutex);
> mutex_unlock(&trace_types_lock);
> + mutex_unlock(&event_mutex);
>
> if (!system)
> return -ENODEV;
> @@ -2294,15 +2294,15 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
> int trace_add_event_call(struct trace_event_call *call)
> {
> int ret;
> - mutex_lock(&trace_types_lock);
> mutex_lock(&event_mutex);
> + mutex_lock(&trace_types_lock);
>
> ret = __register_event(call, NULL);
> if (ret >= 0)
> __add_event_to_tracers(call);
>
> - mutex_unlock(&event_mutex);
> mutex_unlock(&trace_types_lock);
> + mutex_unlock(&event_mutex);
> return ret;
> }
>
> @@ -2356,13 +2356,13 @@ int trace_remove_event_call(struct trace_event_call *call)
> {
> int ret;
>
> - mutex_lock(&trace_types_lock);
> mutex_lock(&event_mutex);
> + mutex_lock(&trace_types_lock);
> down_write(&trace_event_sem);
> ret = probe_remove_event_call(call);
> up_write(&trace_event_sem);
> - mutex_unlock(&event_mutex);
> mutex_unlock(&trace_types_lock);
> + mutex_unlock(&event_mutex);
>
> return ret;
> }
> @@ -2424,8 +2424,8 @@ static int trace_module_notify(struct notifier_block *self,
> {
> struct module *mod = data;
>
> - mutex_lock(&trace_types_lock);
> mutex_lock(&event_mutex);
> + mutex_lock(&trace_types_lock);
> switch (val) {
> case MODULE_STATE_COMING:
> trace_module_add_events(mod);
> @@ -2434,8 +2434,8 @@ static int trace_module_notify(struct notifier_block *self,
> trace_module_remove_events(mod);
> break;
> }
> - mutex_unlock(&event_mutex);
> mutex_unlock(&trace_types_lock);
> + mutex_unlock(&event_mutex);
>
> return 0;
> }

2017-09-08 19:35:46

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 32/40] tracing: Add cpu field for hist triggers

On Fri, 2017-09-08 at 15:08 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:44 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > A common key to use in a histogram is the cpuid - add a new cpu
> > 'synthetic' field for that purpose. This field is named cpu rather
> > than $cpu or $common_cpu because 'cpu' already exists as a special
> > filter field and it makes more sense to match that rather than add
> > another name for the same thing.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > Documentation/trace/events.txt | 18 ++++++++++++++++++
> > kernel/trace/trace_events_hist.c | 30 +++++++++++++++++++++++++++---
> > 2 files changed, 45 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> > index 2cc08d4..9717688 100644
> > --- a/Documentation/trace/events.txt
> > +++ b/Documentation/trace/events.txt
> > @@ -668,6 +668,24 @@ The following commands are supported:
> > The examples below provide a more concrete illustration of the
> > concepts and typical usage patterns discussed above.
> >
> > + 'synthetic' event fields
> > + ------------------------
> > +
> > + There are a number of 'synthetic fields' available for use as keys
> > + or values in a hist trigger. These look like and behave as if they
> > + were event fields, but aren't actually part of the event's field
> > + definition or format file. They are however available for any
> > + event, and can be used anywhere an actual event field could be.
> > + 'Synthetic' field names are always prefixed with a '$' character to
> > + indicate that they're not normal fields (with the exception of
> > + 'cpu', for compatibility with existing filter usage):
> > +
> > + $common_timestamp u64 - timestamp (from ring buffer) associated
> > + with the event, in nanoseconds. May be
> > + modified by .usecs to have timestamps
> > + interpreted as microseconds.
>
> I guess the above should have been added with the synthetic field
> addition.
>

Yeah, somehow this bit got shuffled to the wrong place refactoring the
patches.

> > + cpu int - the cpu on which the event occurred.
>
> Then this (and the explanation of '$' for cpu above) should be added
> with this patch.
>

Yep, will do.

>
> > +
> >
> > 6.2 'hist' trigger examples
> > ---------------------------
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index 4f66f2e..0782766 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -226,6 +226,7 @@ enum hist_field_flags {
> > HIST_FIELD_FL_VAR_ONLY = 8192,
> > HIST_FIELD_FL_EXPR = 16384,
> > HIST_FIELD_FL_VAR_REF = 32768,
> > + HIST_FIELD_FL_CPU = 65536,
> > };
> >
> > struct var_defs {
> > @@ -1170,6 +1171,16 @@ static u64 hist_field_timestamp(struct hist_field *hist_field,
> > return ts;
> > }
> >
> > +static u64 hist_field_cpu(struct hist_field *hist_field,
> > + struct tracing_map_elt *elt,
> > + struct ring_buffer_event *rbe,
> > + void *event)
> > +{
> > + int cpu = raw_smp_processor_id();
>
> Hmm, I wonder if this should be just smp_processor_id(). Why have the
> raw_?
>

You're right, smp_processor_id() should be fine.

> > +
> > + return cpu;
> > +}
> > +
> > static struct hist_field *check_var_ref(struct hist_field *hist_field,
> > struct hist_trigger_data *var_data,
> > unsigned int var_idx)
> > @@ -1518,6 +1529,8 @@ static const char *hist_field_name(struct hist_field *field,
> > field_name = hist_field_name(field->operands[0], ++level);
> > else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
> > field_name = "$common_timestamp";
> > + else if (field->flags & HIST_FIELD_FL_CPU)
> > + field_name = "cpu";
> > else if (field->flags & HIST_FIELD_FL_EXPR ||
> > field->flags & HIST_FIELD_FL_VAR_REF)
> > field_name = field->name;
> > @@ -1990,6 +2003,15 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
> > goto out;
> > }
> >
> > + if (flags & HIST_FIELD_FL_CPU) {
> > + hist_field->fn = hist_field_cpu;
> > + hist_field->size = sizeof(int);
> > + hist_field->type = kstrdup("int", GFP_KERNEL);
> > + if (!hist_field->type)
> > + goto free;
> > + goto out;
> > + }
> > +
> > if (WARN_ON_ONCE(!field))
> > goto out;
> >
> > @@ -2182,7 +2204,9 @@ static struct hist_field *parse_var_ref(struct trace_array *tr,
> > hist_data->enable_timestamps = true;
> > if (*flags & HIST_FIELD_FL_TIMESTAMP_USECS)
> > hist_data->attrs->ts_in_usecs = true;
> > - } else {
> > + } else if (strcmp(field_name, "cpu") == 0)
> > + *flags |= HIST_FIELD_FL_CPU;
> > + else {
> > field = trace_find_event_field(file->event_call, field_name);
> > if (!field || !field->size) {
> > field = ERR_PTR(-EINVAL);
>
>
> > @@ -3185,7 +3209,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
> > goto out;
> > }
> > }
> > -
> > if (param[0] == '$')
> > hist_field = onmatch_find_var(hist_data, data, system,
> > event_name, param);
> > @@ -3200,7 +3223,6 @@ static int onmatch_create(struct hist_trigger_data *hist_data,
> > ret = -EINVAL;
> > goto out;
> > }
> > -
>
> Why the modification of whitespace here?
>

Again no good reason, another casualty of refactoring lots of
patches ;-)

Tom


2017-09-08 19:42:07

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 33/40] tracing: Add hist trigger support for variable reference aliases

On Fri, 2017-09-08 at 15:09 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:45 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Add support for alias=$somevar where alias can be used as
> > onmatch($alias).
>
> This change log is very lacking. What exactly is the purpose of alias?
>
> Sounds like the variable is going undercover.
>

This was a feature added at the request of a user, who wanted more
flexibility to make the naming clearer in certain cases, e.g.:

# echo 'hist:keys=next_pid:new_lat=$common_timestamp.usecs' > /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# echo 'hist:keys=pid:latency=$new_lat:onmatch(sched.sched_switch).wake2($latency,pid)' > /sys/kernel/debug/tracing/events/synthetic/wake1/trigger

I'll add a better description and example to the change log for this.

Tom



2017-09-08 19:41:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

On Fri, 8 Sep 2017 15:31:35 -0400
Steven Rostedt <[email protected]> wrote:

> On Tue, 5 Sep 2017 16:57:47 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Change the order event_mutex and trace_types_lock are taken, to avoid
> > circular dependencies and lockdep spew.
> >
> > Changing the order shouldn't matter to any current code, but does to
> > anything that takes the event_mutex first and then trace_types_lock.
> > This is the case when calling tracing_set_clock from inside an event
> > command, which already holds the event_mutex.
>
> This is a very scary patch. I'll apply it and run a bunch of tests with
> lockdep enabled. Let's see what blows up (or not).

Boom!

======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc7-test+ #84 Not tainted
------------------------------------------------------
mkdir/1674 is trying to acquire lock:
(event_mutex){+.+.+.}, at: [<ffffffff811b18bd>] event_trace_add_tracer+0x1d/0xb0

but task is already holding lock:
(trace_types_lock){+.+.+.}, at: [<ffffffff811a121f>] instance_mkdir+0x2f/0x250

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (trace_types_lock){+.+.+.}:
lock_acquire+0xe3/0x1d0
__mutex_lock+0x81/0x950
mutex_lock_nested+0x1b/0x20
trace_module_notify+0x33/0x1b0
notifier_call_chain+0x4a/0x70
__blocking_notifier_call_chain+0x4d/0x70
blocking_notifier_call_chain+0x16/0x20
load_module+0x21df/0x2dd0
SYSC_finit_module+0xbc/0xf0
SyS_finit_module+0xe/0x10
do_syscall_64+0x62/0x140
return_from_SYSCALL_64+0x0/0x7a

-> #0 (event_mutex){+.+.+.}:
__lock_acquire+0x1026/0x11d0
lock_acquire+0xe3/0x1d0
__mutex_lock+0x81/0x950
mutex_lock_nested+0x1b/0x20
event_trace_add_tracer+0x1d/0xb0
instance_mkdir+0x173/0x250
tracefs_syscall_mkdir+0x40/0x70
vfs_mkdir+0xfb/0x190
SyS_mkdir+0x6b/0xd0
entry_SYSCALL_64_fastpath+0x1f/0xbe

other info that might help us debug this:

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(trace_types_lock);
lock(event_mutex);
lock(trace_types_lock);
lock(event_mutex);

*** DEADLOCK ***

2 locks held by mkdir/1674:
#0: (sb_writers#10){.+.+.+}, at: [<ffffffff812b4824>] mnt_want_write+0x24/0x50
#1: (trace_types_lock){+.+.+.}, at: [<ffffffff811a121f>] instance_mkdir+0x2f/0x250

stack backtrace:
CPU: 3 PID: 1674 Comm: mkdir Not tainted 4.13.0-rc7-test+ #84
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
dump_stack+0x86/0xcf
print_circular_bug+0x1be/0x210
__lock_acquire+0x1026/0x11d0
lock_acquire+0xe3/0x1d0
? lock_acquire+0xe3/0x1d0
? event_trace_add_tracer+0x1d/0xb0
? event_trace_add_tracer+0x1d/0xb0
__mutex_lock+0x81/0x950
? event_trace_add_tracer+0x1d/0xb0
? event_trace_add_tracer+0x1d/0xb0
? __create_dir+0xcb/0x130
mutex_lock_nested+0x1b/0x20
? mutex_lock_nested+0x1b/0x20
event_trace_add_tracer+0x1d/0xb0
instance_mkdir+0x173/0x250
tracefs_syscall_mkdir+0x40/0x70
? tracefs_syscall_mkdir+0x40/0x70
vfs_mkdir+0xfb/0x190
SyS_mkdir+0x6b/0xd0
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x7f4867afa947
RSP: 002b:00007ffd3dc35c08 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f4867afa947
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffd3dc3764f
RBP: ffffc90000f2ff98 R08: 00000000000001ff R09: 000055f96a551ac0
R10: 000055f96b700010 R11: 0000000000000202 R12: 0000000000000001
R13: 00007ffd3dc35f28 R14: 00000000000001ff R15: ffffffff8189316a
? entry_SYSCALL_64_after_swapgs+0x17/0x4f

It appears to be caused by instance creation. I'll look at that.


-- Steve

2017-09-08 19:44:30

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 34/40] tracing: Add 'last error' error facility for hist triggers

On Fri, 2017-09-08 at 15:25 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:46 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > +static char *hist_err_str;
> > +static char *last_hist_cmd;
> > +
> > +static int hist_err_alloc(void)
> > +{
> > + int ret = 0;
> > +
> > + last_hist_cmd = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
> > + if (!last_hist_cmd)
> > + return -ENOMEM;
> > +
> > + hist_err_str = kzalloc(MAX_FILTER_STR_VAL, GFP_KERNEL);
> > + if (!hist_err_str) {
> > + kfree(last_hist_cmd);
> > + ret = -ENOMEM;
> > + }
>
> This gets allocated during boot up. Why have it be allocated in the
> first place? Just have it be strings:
>
> static char hist_err_str[MAX_FILTER_STR_VAL];
> static char last_hist_cmd[MAX_FILTER_STR_VAL];
>
> You are not saving any space by doing it this way. In fact, you waste
> it because now you need to add the pointers to the strings.
>

Good point. I'll change that along with the other suggestions below.

Thanks,

Tom


2017-09-08 20:00:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

On Fri, 8 Sep 2017 15:41:36 -0400
Steven Rostedt <[email protected]> wrote:

> On Fri, 8 Sep 2017 15:31:35 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > On Tue, 5 Sep 2017 16:57:47 -0500
> > Tom Zanussi <[email protected]> wrote:
> >
> > > Change the order event_mutex and trace_types_lock are taken, to avoid
> > > circular dependencies and lockdep spew.
> > >
> > > Changing the order shouldn't matter to any current code, but does to
> > > anything that takes the event_mutex first and then trace_types_lock.
> > > This is the case when calling tracing_set_clock from inside an event
> > > command, which already holds the event_mutex.
> >
> > This is a very scary patch. I'll apply it and run a bunch of tests with
> > lockdep enabled. Let's see what blows up (or not).
>
> Boom!
>


> It appears to be caused by instance creation. I'll look at that.

OK, this may be a simple fix. I'll send you a patch to fold in after I
finish testing it.

-- Steve

2017-09-08 20:28:01

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

On Tue, 5 Sep 2017 16:57:52 -0500
Tom Zanussi <[email protected]> wrote:

> Synthetic event generation requires the reservation of a second event
> while the reservation of a previous event is still in progress. The
> trace_recursive_lock() check in ring_buffer_lock_reserve() prevents
> this however.
>
> This sets up a special reserve pathway for this particular case,
> leaving existing pathways untouched, other than an additional check in
> ring_buffer_lock_reserve() and trace_event_buffer_reserve(). These
> checks could be gotten rid of as well, with copies of those functions,
> but for now try to avoid that unless necessary.
>
> Signed-off-by: Tom Zanussi <[email protected]>

I've been planing on changing that lock, which may help you here
without having to mess around with parameters. That is to simply add a
counter. Would this patch help you? You can add a patch to increment
the count to 5 with an explanation of handling synthetic events, but
even getting to 4 is extremely unlikely.

I'll make this into an official patch if this works for you, and then
you can include it in your series.

-- Steve

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0bcc53e..9dbb459 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2582,61 +2582,29 @@ rb_wakeups(struct ring_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
* The lock and unlock are done within a preempt disable section.
* The current_context per_cpu variable can only be modified
* by the current task between lock and unlock. But it can
- * be modified more than once via an interrupt. To pass this
- * information from the lock to the unlock without having to
- * access the 'in_interrupt()' functions again (which do show
- * a bit of overhead in something as critical as function tracing,
- * we use a bitmask trick.
+ * be modified more than once via an interrupt. There are four
+ * different contexts that we need to consider.
*
- * bit 0 = NMI context
- * bit 1 = IRQ context
- * bit 2 = SoftIRQ context
- * bit 3 = normal context.
+ * Normal context.
+ * SoftIRQ context
+ * IRQ context
+ * NMI context
*
- * This works because this is the order of contexts that can
- * preempt other contexts. A SoftIRQ never preempts an IRQ
- * context.
- *
- * When the context is determined, the corresponding bit is
- * checked and set (if it was set, then a recursion of that context
- * happened).
- *
- * On unlock, we need to clear this bit. To do so, just subtract
- * 1 from the current_context and AND it to itself.
- *
- * (binary)
- * 101 - 1 = 100
- * 101 & 100 = 100 (clearing bit zero)
- *
- * 1010 - 1 = 1001
- * 1010 & 1001 = 1000 (clearing bit 1)
- *
- * The least significant bit can be cleared this way, and it
- * just so happens that it is the same bit corresponding to
- * the current context.
+ * If for some reason the ring buffer starts to recurse, we
+ * only allow that to happen at most 4 times (one for each
+ * context). If it happens 5 times, then we consider this a
+ * recursive loop and do not let it go further.
*/

static __always_inline int
trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
{
- unsigned int val = cpu_buffer->current_context;
- int bit;
-
- if (in_interrupt()) {
- if (in_nmi())
- bit = RB_CTX_NMI;
- else if (in_irq())
- bit = RB_CTX_IRQ;
- else
- bit = RB_CTX_SOFTIRQ;
- } else
- bit = RB_CTX_NORMAL;
-
- if (unlikely(val & (1 << bit)))
+ if (cpu_buffer->current_context >= 4)
return 1;

- val |= (1 << bit);
- cpu_buffer->current_context = val;
+ cpu_buffer->current_context++;
+ /* Interrupts must see this update */
+ barrier();

return 0;
}
@@ -2644,7 +2612,9 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
static __always_inline void
trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
{
- cpu_buffer->current_context &= cpu_buffer->current_context - 1;
+ /* Don't let the dec leak out */
+ barrier();
+ cpu_buffer->current_context--;
}

/**

2017-09-08 20:41:45

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

On Fri, 2017-09-08 at 16:27 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:52 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Synthetic event generation requires the reservation of a second event
> > while the reservation of a previous event is still in progress. The
> > trace_recursive_lock() check in ring_buffer_lock_reserve() prevents
> > this however.
> >
> > This sets up a special reserve pathway for this particular case,
> > leaving existing pathways untouched, other than an additional check in
> > ring_buffer_lock_reserve() and trace_event_buffer_reserve(). These
> > checks could be gotten rid of as well, with copies of those functions,
> > but for now try to avoid that unless necessary.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
>
> I've been planning on changing that lock, which may help you here
> without having to mess around with parameters. That is, to simply add a
> counter. Would this patch help you? You can add a patch to increment
> the count to 5 with an explanation of handling synthetic events, but
> even getting to 4 is extremely unlikely.
>
> I'll make this into an official patch if this works for you, and then
> you can include it in your series.
>

Yes, this should definitely work, and in fact I considered something
similar at one point. Thanks for doing this, it's much simpler than I
would have ended up with!

Tom




2017-09-12 01:50:08

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

On Tue, 5 Sep 2017 16:57:12 -0500
Tom Zanussi <[email protected]> wrote:

> Hi,
>
> This is V2 of the inter-event tracing patchset.
>
> There are too many changes to list in detail, most of them directly
> addressing input from V1, but here are the major changes from V1
> (thanks to everyone who reviewed V1 and thanks to both Vedang Patel
> and Baohong Liu for their contributions and included patches):
[..]

>
> Documentation/trace/events.txt | 431 ++++
> include/linux/ring_buffer.h | 17 +-
> include/linux/trace_events.h | 24 +-
> include/linux/tracepoint-defs.h | 1 +
> kernel/trace/ring_buffer.c | 126 +-
> kernel/trace/trace.c | 205 +-
> kernel/trace/trace.h | 25 +-
> kernel/trace/trace_events.c | 51 +-
> kernel/trace/trace_events_hist.c | 4472 +++++++++++++++++++++++++++++++----
> kernel/trace/trace_events_trigger.c | 53 +-
> kernel/trace/trace_kprobe.c | 18 +-
> kernel/trace/trace_probe.c | 86 -
> kernel/trace/trace_probe.h | 7 -
> kernel/trace/trace_uprobe.c | 2 +-
> kernel/trace/tracing_map.c | 229 +-
> kernel/trace/tracing_map.h | 20 +-
> kernel/tracepoint.c | 18 +-
> 17 files changed, 5073 insertions(+), 712 deletions(-)

There seem to be no Makefile or Kconfig changes. Does this mean
we just need CONFIG_HIST_TRIGGERS=y to enable it?

I think you should update the HIST_TRIGGERS description
so that users can see what needs to be enabled for inter-event
tracing.

Thank you,



--
Masami Hiramatsu <[email protected]>

2017-09-12 02:16:17

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v2 36/40] tracing: Remove lookups from tracing_map hitcount

Hi Tom,

On Tue, 5 Sep 2017 16:57:48 -0500
Tom Zanussi <[email protected]> wrote:

> Lookups inflate the hitcount, making it essentially useless. Only
> inserts and updates should really affect the hitcount anyway, so
> explicitly filter lookups out.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/tracing_map.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> index a4e5a56..f8e2338 100644
> --- a/kernel/trace/tracing_map.c
> +++ b/kernel/trace/tracing_map.c
> @@ -538,7 +538,8 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> if (test_key && test_key == key_hash) {
> if (entry->val &&
> keys_match(key, entry->val->key, map->key_size)) {
> - atomic64_inc(&map->hits);
> + if (!lookup_only)
> + atomic64_inc(&map->hits);

Is this a bugfix for the current code?
If so, it could be posted as a separate series.

Thank you,

> return entry->val;
> } else if (unlikely(!entry->val)) {
> /*
> --
> 1.9.3
>


--
Masami Hiramatsu <[email protected]>

2017-09-12 02:18:55

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v2 38/40] tracing: Make tracing_set_clock() non-static

On Tue, 5 Sep 2017 16:57:50 -0500
Tom Zanussi <[email protected]> wrote:

> Allow tracing code outside of trace.c to access tracing_set_clock().
>
> Some applications may require a particular clock in order to function
> properly, such as latency calculations.
>
> Also, add an accessor returning the current clock string.

It seems this patch should be merged into the next patch, which
is the actual consumer of this API.

Thank you,

>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace.c | 2 +-
> kernel/trace/trace.h | 1 +
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index d40839d..ecdf456 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -6207,7 +6207,7 @@ static int tracing_clock_show(struct seq_file *m, void *v)
> return 0;
> }
>
> -static int tracing_set_clock(struct trace_array *tr, const char *clockstr)
> +int tracing_set_clock(struct trace_array *tr, const char *clockstr)
> {
> int i;
>
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 7b78762..1f3443a 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -288,6 +288,7 @@ enum {
> extern void trace_array_put(struct trace_array *tr);
>
> extern int tracing_set_time_stamp_abs(struct trace_array *tr, bool abs);
> +extern int tracing_set_clock(struct trace_array *tr, const char *clockstr);
>
> extern bool trace_clock_in_ns(struct trace_array *tr);
>
> --
> 1.9.3
>


--
Masami Hiramatsu <[email protected]>

2017-09-12 14:14:44

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

Hi Masami,

On Tue, 2017-09-12 at 10:50 +0900, Masami Hiramatsu wrote:
> On Tue, 5 Sep 2017 16:57:12 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Hi,
> >
> > This is V2 of the inter-event tracing patchset.
> >
> > There are too many changes to list in detail, most of them directly
> > addressing input from V1, but here are the major changes from V1
> > (thanks to everyone who reviewed V1 and thanks to both Vedang Patel
> > and Baohong Liu for their contributions and included patches):
> [..]
>
> >
> > Documentation/trace/events.txt | 431 ++++
> > include/linux/ring_buffer.h | 17 +-
> > include/linux/trace_events.h | 24 +-
> > include/linux/tracepoint-defs.h | 1 +
> > kernel/trace/ring_buffer.c | 126 +-
> > kernel/trace/trace.c | 205 +-
> > kernel/trace/trace.h | 25 +-
> > kernel/trace/trace_events.c | 51 +-
> > kernel/trace/trace_events_hist.c | 4472 +++++++++++++++++++++++++++++++----
> > kernel/trace/trace_events_trigger.c | 53 +-
> > kernel/trace/trace_kprobe.c | 18 +-
> > kernel/trace/trace_probe.c | 86 -
> > kernel/trace/trace_probe.h | 7 -
> > kernel/trace/trace_uprobe.c | 2 +-
> > kernel/trace/tracing_map.c | 229 +-
> > kernel/trace/tracing_map.h | 20 +-
> > kernel/tracepoint.c | 18 +-
> > 17 files changed, 5073 insertions(+), 712 deletions(-)
>
> There seem to be no Makefile or Kconfig changes. Does this mean
> we just need CONFIG_HIST_TRIGGERS=y to enable it?
>

Yes - I didn't see a need for a separate config for this.

> I think you should update the HIST_TRIGGERS description
> so that users can see what needs to be enabled for inter-event
> tracing.
>

Good point, will do.

Thanks,

Tom


2017-09-12 14:16:23

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 36/40] tracing: Remove lookups from tracing_map hitcount

Hi Masami,

On Tue, 2017-09-12 at 11:16 +0900, Masami Hiramatsu wrote:
> Hi Tom,
>
> On Tue, 5 Sep 2017 16:57:48 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Lookups inflate the hitcount, making it essentially useless. Only
> > inserts and updates should really affect the hitcount anyway, so
> > explicitly filter lookups out.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > kernel/trace/tracing_map.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
> > index a4e5a56..f8e2338 100644
> > --- a/kernel/trace/tracing_map.c
> > +++ b/kernel/trace/tracing_map.c
> > @@ -538,7 +538,8 @@ static inline bool keys_match(void *key, void *test_key, unsigned key_size)
> > if (test_key && test_key == key_hash) {
> > if (entry->val &&
> > keys_match(key, entry->val->key, map->key_size)) {
> > - atomic64_inc(&map->hits);
> > + if (!lookup_only)
> > + atomic64_inc(&map->hits);
>
> Is this a bugfix for the current code?
> If so, it could be posted as a separate series.
>

Yeah, that sounds like a good idea - will separate this out along with
another one or two in the same boat.

Thanks,

Tom


2017-09-12 14:18:03

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 38/40] tracing: Make tracing_set_clock() non-static

On Tue, 2017-09-12 at 11:18 +0900, Masami Hiramatsu wrote:
> On Tue, 5 Sep 2017 16:57:50 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Allow tracing code outside of trace.c to access tracing_set_clock().
> >
> > Some applications may require a particular clock in order to function
> > properly, such as latency calculations.
> >
> > Also, add an accessor returning the current clock string.
>
> It seems this patch should be merged into the next patch, which
> is the actual consumer of this API.
>

Yeah, that might make more sense in this case.

Thanks,

Tom


2017-09-19 16:31:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

On Tue, 5 Sep 2017 16:57:12 -0500
Tom Zanussi <[email protected]> wrote:

> Hi,
>
> This is V2 of the inter-event tracing patchset.
>

Hi Tom,

I was wondering if you had a v3 ready? I would like to get it into the
next merge window, but I would also like it to be in linux-next early,
which means we need the next version rather soon (hopefully that will
be the last version).

-- Steve

2017-09-19 18:44:46

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

Hi Steve,

On Tue, 2017-09-19 at 12:31 -0400, Steven Rostedt wrote:
> On Tue, 5 Sep 2017 16:57:12 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Hi,
> >
> > This is V2 of the inter-event tracing patchset.
> >
>
> Hi Tom,
>
> I was wondering if you had a v3 ready? I would like to get it into the
> next merge window, but I would also like it to be in linux-next early,
> which means we need the next version rather soon (hopefully that will
> be the last version).
>

Yeah, it's almost ready. At this point, I've addressed all the comments
except for:

- PATCH v2 25/40] tracing: Add support for dynamic tracepoints

which I need to do a little bit of research on to figure out what
exactly I need to do there.

I was also kind of looking for a couple patches from you to fold in
which you had mentioned you were going to send for:

- [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken
- [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

I've been using the trace_recursive_lock() patch you posted in place of
the latter and it's working fine. :-)

Tom

> -- Steve


2017-09-20 14:44:56

by Julia Cartwright

[permalink] [raw]
Subject: Re: [PATCH v2 37/40] tracing: Add inter-event hist trigger Documentation

On Tue, Sep 05, 2017 at 04:57:49PM -0500, Tom Zanussi wrote:
> Add background and details on inter-event hist triggers, including
> hist variables, synthetic events, and actions.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> Signed-off-by: Baohong Liu <[email protected]>
> ---
> Documentation/trace/events.txt | 385 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 385 insertions(+)
>
> diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> index f271d87..7ee720b 100644
> --- a/Documentation/trace/events.txt
> +++ b/Documentation/trace/events.txt
> @@ -571,6 +571,7 @@ The following commands are supported:
> .sym-offset display an address as a symbol and offset
> .syscall display a syscall id as a system call name
> .execname display a common_pid as a program name
> + .usecs display a $common_timestamp in microseconds
>
> Note that in general the semantics of a given field aren't
> interpreted when applying a modifier to it, but there are some
> @@ -2101,3 +2102,387 @@ The following commands are supported:
> Hits: 489
> Entries: 7
> Dropped: 0
> +
> +6.3 Inter-event hist triggers
> +-----------------------------
> +
> +Inter-event hist triggers are hist triggers that combine values from
> +one or more other events and create a histogram using that data. Data
> +from an inter-event histogram can in turn become the source for
> +further combined histograms, thus providing a chain of related
> +histograms, which is important for some applications.
> +
> +The most important example of an inter-event quantity that can be used
> +in this manner is latency, which is simply a difference in timestamps
> +between two events (although trace events don't have an externally
> +visible timestamp field, the inter-event hist trigger support adds a
> +pseudo-field to all events named '$common_timestamp' which can be used
> +as if it were an actual event field)

What's between the parentheses is covered below; is it worth mentioning
it in both places?

[..]
> + - Trace events don't have a 'timestamp' associated with them, but
> + there is an implicit timestamp saved along with an event in the
> + underlying ftrace ring buffer. This timestamp is now exposed as a
> + a synthetic field named '$common_timestamp' which can be used in
> + histograms as if it were any other event field. Note that it has
> + a '$' prefixed to it - this is meant to indicate that it isn't an
> + actual field in the trace format but rather is a synthesized value
> + that nonetheless can be used as if it were an actual field. By
> + default it is in units of nanoseconds; appending '.usecs' to a
> + common_timestamp field changes the units to microseconds.
> +
> +These features are decribed in more detail in the following sections.

^ described

> +
> +6.3.1 Histogram Variables
> +-------------------------
> +
> +Variables are simply named locations used for saving and retrieving
> +values between matching events. A 'matching' event is defined as an
> +event that has a matching key - if a variable is saved for a histogram
> +entry corresponding to that key, any subsequent event with a matching
> +key can access that variable.
> +
> +A variable's value is normally available to any subsequent event until
> +it is set to something else by a subsequent event. The one exception
> +to that rule is that any variable used in an expression is essentially
> +'read-once' - once it's used by an expression in a subsequent event,
> +it's reset to its 'unset' state, which means it can't be used again
> +unless it's set again. This ensures not only that an event doesn't
> +use an uninitialized variable in a calculation, but that that variable
> +is used only once and not for any unrelated subsequent match.
> +
> +The basic syntax for saving a variable is to simply prefix a unique
> +variable name not corresponding to any keyword along with an '=' sign
> +to any event field.

This would seem to make it much more difficult in the future to add new
keywords for hist triggers without breaking users' existing triggers.
Maybe that's not a problem given the tracing ABI wild west.

[..]
> +6.3.2 Synthetic Events
> +----------------------
> +
> +Synthetic events are user-defined events generated from hist trigger
> +variables or fields associated with one or more other events. Their
> +purpose is to provide a mechanism for displaying data spanning
> +multiple events consistent with the existing and already familiar
> +usage for normal events.
> +
> +To define a synthetic event, the user writes a simple specification
> +consisting of the name of the new event along with one or more
> +variables and their types, which can be any valid field type,
> +separated by semicolons, to the tracing/synthetic_events file.
> +
> +For instance, the following creates a new event named 'wakeup_latency'
> +with 3 fields: lat, pid, and prio. Each of those fields is simply a
> +variable reference to a variable on another event:
> +
> + # echo 'wakeup_latency \
> + u64 lat; \
> + pid_t pid; \
> + int prio' >> \
> + /sys/kernel/debug/tracing/synthetic_events
> +
> +Reading the tracing/synthetic_events file lists all the currently
> +defined synthetic events, in this case the event defined above:
> +
> + # cat /sys/kernel/debug/tracing/synthetic_events
> + wakeup_latency u64 lat; pid_t pid; int prio
> +
> +An existing synthetic event definition can be removed by prepending
> +the command that defined it with a '!':
> +
> + # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
> + /sys/kernel/debug/tracing/synthetic_events
> +
> +At this point, there isn't yet an actual 'wakeup_latency' event
> +instantiated in the event subsytem - for this to happen, a 'hist
> +trigger action' needs to be instantiated and bound to actual fields
> +and variables defined on other events (see Section 6.3.3 below).
> +
> +Once that is done, an event instance is created, and a histogram can
> +be defined using it:
> +
> + # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
> + /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
> +
> +The new event is created under the tracing/events/synthetic/ directory
> +and looks and behaves just like any other event:
> +
> + # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
> + enable filter format hist id trigger
> +
> +Like any other event, once a histogram is enabled for the event, the
> +output can be displayed by reading the event's 'hist' file.
> +
> +6.3.3 Hist trigger 'actions'
> +----------------------------
> +
> +A hist trigger 'action' is a function that's executed whenever a
> +histogram entry is added or updated.
> +
> +The default 'action' if no special function is explicity specified is
> +as it always has been, to simply update the set of values associated
> +with an entry. Some applications, however, may want to perform
> +additional actions at that point, such as generate another event, or
> +compare and save a maximum.
> +
> +The following additional actions are available. To specify an action
> +for a given event, simply specify the action between colons in the
> +hist trigger specification.
> +
> + - onmatch(matching.event).<synthetic_event_name>(param list)
> +
> + The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
> + trigger action is invoked whenever an event matches and the
> + histogram entry would be added or updated.

I understand from the documentation that 'onmatch' establishes a
relation between events, but the nature of this relation isn't clear.

In other words: what criteria are used to determine whether two events
match? Is it the trace buffer ordering? Time ordering? Something
else?

> It causes the named
> + synthetic event to be generated with the values given in the
> + 'param list'. The result is the generation of a synthetic event
> + that consists of the values contained in those variables at the
> + time the invoking event was hit.
> +
> + The 'param list' consists of one or more parameters which may be
> + either variables or fields defined on either the 'matching.event'
> + or the target event. The variables or fields specified in the
> + param list may be either fully-qualified or unqualified. If a
> + variable is specified as unqualified, it must be unique between
> + the two events. A field name used as a param can be unqualified
> + if it refers to the target event, but must be fully qualified if
> + it refers to the matching event. A fully-qualified name is of the
> + form 'system.event_name.$var_name' or 'system.event_name.field'.
> +
> + The 'matching.event' specification is simply the fully qualified
> + event name of the event that matches the target event for the
> + onmatch() functionality, in the form 'system.event_name'.
> +
> + Finally, the number and type of variables/fields in the 'param
> + list' must match the number and types of the fields in the
> + synthetic event being generated.
> +
> + As an example the below defines a simple synthetic event and uses
> + a variable defined on the sched_wakeup_new event as a parameter
> + when invoking the synthetic event. Here we define the synthetic
> + event:
> +
> + # echo 'wakeup_new_test pid_t pid' >> \
> + /sys/kernel/debug/tracing/synthetic_events
> +
> + # cat /sys/kernel/debug/tracing/synthetic_events
> + wakeup_new_test pid_t pid
> +
> + The following hist trigger both defines the missing testpid
> + variable and specifies an onmatch() action that generates a
> + wakeup_new_test synthetic event whenever a sched_wakeup_new event
> + occurs, which because of the 'if comm == "cyclictest"' filter only
> + happens when the executable is cyclictest:
> +
> + # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
> + wakeup_new_test($testpid) if comm=="cyclictest"' >> \
> + /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
> +
> + Creating and displaying a histogram based on those events is now
> + just a matter of using the fields and new synthetic event in the
> + tracing/events/synthetic directory, as usual:
> +
> + # echo 'hist:keys=pid:sort=pid' >> \
> + /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
> +
> + Running 'cyclictest' should cause wakeup_new events to generate
> + wakeup_new_test synthetic events which should result in histogram
> + output in the wakeup_new_test event's hist file:
> +
> + # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
> +
> + A more typical usage would be to use two events to calculate a
> + latency. The following example uses a set of hist triggers to
> + produce a 'wakeup_latency' histogram:
> +
> + First, we define a 'wakeup_latency' synthetic event:
> +
> + # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
> + /sys/kernel/debug/tracing/synthetic_events
> +
> + Next, we specify that whenever we see a sched_waking event for a
> + cyclictest thread, save the timestamp in a 'ts0' variable:
> +
> + # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=$common_timestamp.usecs \
> + if comm=="cyclictest"' >> \
> + /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
> +
> + Then, when the corresponding thread is actually scheduled onto the
> + CPU by a sched_switch event, calculate the latency and use that
> + along with another variable and an event field to generate a
> + wakeup_latency synthetic event:
> +
> + # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\
> + onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
> + $saved_pid,next_prio) if next_comm=="cyclictest"' >> \
> + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

Just a check for understanding:

The onmatch() relation between events is driven by the use of the
histogram variables 'ts0' and 'saved_pid', here.

If that's the case, does this then preclude matching solely on fields
statically defined in the matching event's tracepoint?

Continuing on your latency examples, if I wanted to look at hrtimer
latencies:

# echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events

# echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-expires)' > \
/sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger

(where 'now' and 'expires' are defined fields of the hrtimer_start
and hrtimer_expire_entry tracepoints, respectively)

This would _not_ work, because without the usage of a 'dummy' variable,
there is no "capturing of context" in the hrtimer_start event.

I would have to, instead, do:

# echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events
#
# echo 'hist:keys=hrtimer.hex:ts0=expires' > \
/sys/kernel/debug/tracing/events/timer/hrtimer_start/trigger

# echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-$ts0)' > \
/sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger

That is, create a dummy $ts0 to establish the relation.

Is my understanding correct? If so, that's not very intuitive (to me).

Thanks,
Julia

2017-09-20 17:15:51

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v2 37/40] tracing: Add inter-event hist trigger Documentation

Hi Julia,

On Wed, 2017-09-20 at 09:44 -0500, Julia Cartwright wrote:
> On Tue, Sep 05, 2017 at 04:57:49PM -0500, Tom Zanussi wrote:
> > Add background and details on inter-event hist triggers, including
> > hist variables, synthetic events, and actions.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > Signed-off-by: Baohong Liu <[email protected]>
> > ---
> > Documentation/trace/events.txt | 385 +++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 385 insertions(+)
> >
> > diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> > index f271d87..7ee720b 100644
> > --- a/Documentation/trace/events.txt
> > +++ b/Documentation/trace/events.txt
> > @@ -571,6 +571,7 @@ The following commands are supported:
> > .sym-offset display an address as a symbol and offset
> > .syscall display a syscall id as a system call name
> > .execname display a common_pid as a program name
> > + .usecs display a $common_timestamp in microseconds
> >
> > Note that in general the semantics of a given field aren't
> > interpreted when applying a modifier to it, but there are some
> > @@ -2101,3 +2102,387 @@ The following commands are supported:
> > Hits: 489
> > Entries: 7
> > Dropped: 0
> > +
> > +6.3 Inter-event hist triggers
> > +-----------------------------
> > +
> > +Inter-event hist triggers are hist triggers that combine values from
> > +one or more other events and create a histogram using that data. Data
> > +from an inter-event histogram can in turn become the source for
> > +further combined histograms, thus providing a chain of related
> > +histograms, which is important for some applications.
> > +
> > +The most important example of an inter-event quantity that can be used
> > +in this manner is latency, which is simply a difference in timestamps
> > +between two events (although trace events don't have an externally
> > +visible timestamp field, the inter-event hist trigger support adds a
> > +pseudo-field to all events named '$common_timestamp' which can be used
> > +as if it were an actual event field)
>
> What's between the parentheses is covered below, is it worth mentioning
> in both places?
>

Probably not - maybe just a quick mention pointing to the details on
timestamps below. Will change regardless.

> [..]
> > + - Trace events don't have a 'timestamp' associated with them, but
> > + there is an implicit timestamp saved along with an event in the
> > + underlying ftrace ring buffer. This timestamp is now exposed as a
> > + a synthetic field named '$common_timestamp' which can be used in
> > + histograms as if it were any other event field. Note that it has
> > + a '$' prefixed to it - this is meant to indicate that it isn't an
> > + actual field in the trace format but rather is a synthesized value
> > + that nonetheless can be used as if it were an actual field. By
> > + default it is in units of nanoseconds; appending '.usecs' to a
> > + common_timestamp field changes the units to microseconds.
> > +
> > +These features are decribed in more detail in the following sections.
>
> ^ described
>

Right, thanks for pointing that out ;-)

> > +
> > +6.3.1 Histogram Variables
> > +-------------------------
> > +
> > +Variables are simply named locations used for saving and retrieving
> > +values between matching events. A 'matching' event is defined as an
> > +event that has a matching key - if a variable is saved for a histogram
> > +entry corresponding to that key, any subsequent event with a matching
> > +key can access that variable.
> > +
> > +A variable's value is normally available to any subsequent event until
> > +it is set to something else by a subsequent event. The one exception
> > +to that rule is that any variable used in an expression is essentially
> > +'read-once' - once it's used by an expression in a subsequent event,
> > +it's reset to its 'unset' state, which means it can't be used again
> > +unless it's set again. This ensures not only that an event doesn't
> > +use an uninitialized variable in a calculation, but that that variable
> > +is used only once and not for any unrelated subsequent match.
> > +
> > +The basic syntax for saving a variable is to simply prefix a unique
> > +variable name not corresponding to any keyword along with an '=' sign
> > +to any event field.
>
> This would seem to make it much more difficult in the future to add new
> keywords for hist triggers without breaking users existing triggers.
> Maybe that's not a problem given the tracing ABI wild west.
>

Hmm, yeah, I'm not sure how to implement general-purpose variable
assignment without knowing ahead of time what the reserved words would
be.
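To make that concern concrete, here's a sketch using a hypothetical
future keyword 'avg' (not a real hist trigger keyword today):

```shell
# Today this parses as defining a variable named 'avg' on the event:
echo 'hist:keys=pid:avg=prio' >> \
    /sys/kernel/debug/tracing/events/sched/sched_waking/trigger

# If a later kernel added 'avg' as a hist trigger keyword, the same
# string would be parsed as that keyword instead, silently changing or
# breaking the existing trigger.
```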

> [..]
> > +6.3.2 Synthetic Events
> > +----------------------
> > +
> > +Synthetic events are user-defined events generated from hist trigger
> > +variables or fields associated with one or more other events. Their
> > +purpose is to provide a mechanism for displaying data spanning
> > +multiple events consistent with the existing and already familiar
> > +usage for normal events.
> > +
> > +To define a synthetic event, the user writes a simple specification
> > +consisting of the name of the new event along with one or more
> > +variables and their types, which can be any valid field type,
> > +separated by semicolons, to the tracing/synthetic_events file.
> > +
> > +For instance, the following creates a new event named 'wakeup_latency'
> > +with 3 fields: lat, pid, and prio. Each of those fields is simply a
> > +variable reference to a variable on another event:
> > +
> > + # echo 'wakeup_latency \
> > + u64 lat; \
> > + pid_t pid; \
> > + int prio' >> \
> > + /sys/kernel/debug/tracing/synthetic_events
> > +
> > +Reading the tracing/synthetic_events file lists all the currently
> > +defined synthetic events, in this case the event defined above:
> > +
> > + # cat /sys/kernel/debug/tracing/synthetic_events
> > + wakeup_latency u64 lat; pid_t pid; int prio
> > +
> > +An existing synthetic event definition can be removed by prepending
> > +the command that defined it with a '!':
> > +
> > + # echo '!wakeup_latency u64 lat pid_t pid int prio' >> \
> > + /sys/kernel/debug/tracing/synthetic_events
> > +
> > +At this point, there isn't yet an actual 'wakeup_latency' event
> > +instantiated in the event subsytem - for this to happen, a 'hist
> > +trigger action' needs to be instantiated and bound to actual fields
> > +and variables defined on other events (see Section 6.3.3 below).
> > +
> > +Once that is done, an event instance is created, and a histogram can
> > +be defined using it:
> > +
> > + # echo 'hist:keys=pid,prio,lat.log2:sort=pid,lat' >> \
> > + /sys/kernel/debug/tracing/events/synthetic/wakeup_latency/trigger
> > +
> > +The new event is created under the tracing/events/synthetic/ directory
> > +and looks and behaves just like any other event:
> > +
> > + # ls /sys/kernel/debug/tracing/events/synthetic/wakeup_latency
> > + enable filter format hist id trigger
> > +
> > +Like any other event, once a histogram is enabled for the event, the
> > +output can be displayed by reading the event's 'hist' file.
> > +
> > +6.3.3 Hist trigger 'actions'
> > +----------------------------
> > +
> > +A hist trigger 'action' is a function that's executed whenever a
> > +histogram entry is added or updated.
> > +
> > +The default 'action' if no special function is explicity specified is
> > +as it always has been, to simply update the set of values associated
> > +with an entry. Some applications, however, may want to perform
> > +additional actions at that point, such as generate another event, or
> > +compare and save a maximum.
> > +
> > +The following additional actions are available. To specify an action
> > +for a given event, simply specify the action between colons in the
> > +hist trigger specification.
> > +
> > + - onmatch(matching.event).<synthetic_event_name>(param list)
> > +
> > + The 'onmatch(matching.event).<synthetic_event_name>(params)' hist
> > + trigger action is invoked whenever an event matches and the
> > + histogram entry would be added or updated.
>
> I understand from the documentation that 'onmatch' establishes a
> relation between events, but it isn't clear the nature of this relation.
>

Well, the syntax and idea behind 'onmatch' have evolved a bit following
the review comments of the past couple of iterations of the patchset.
The underlying mechanism hasn't changed much, though: a synthetic event
is generated using variables from one or more 'matching' events, where
'matching' means that an event with a matching key was found in a
histogram for that event, and the value of the variable associated with
it is used for the corresponding param when generating the synthetic
event. So when the event that defines the onmatch() action occurs, the
matching events having the same key as the onmatch() synthetic event
params are looked up, and if all are found, the onmatch() action
generates the synthetic event using those matching variables.

Though that's all still true, the most common use case consists of two
events: the event the onmatch() is defined on, and the matching.event
specified as onmatch(matching.event). For most users and use cases that
makes the relationship clear. That doesn't mean there can only be one
matching event participating - the main change the switch to the
matching.event syntax added was that if a variable isn't found on the
event onmatch() is defined on, it's automatically looked for on the
matching.event. Basically it's a syntax change making it more natural
for the user to specify triggers using the most common use case.
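To make that fallback concrete, here's a sketch based on the
sched_waking/sched_switch example from the Documentation patch (slightly
simplified; it's tracefs configuration only and requires a kernel with
this patchset applied):

```shell
# Define the synthetic event first, as in the Documentation patch:
echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
    /sys/kernel/debug/tracing/synthetic_events

# 'ts0' is defined on the matching event, sched_waking:
echo 'hist:keys=pid:ts0=$common_timestamp.usecs if comm=="cyclictest"' >> \
    /sys/kernel/debug/tracing/events/sched/sched_waking/trigger

# In the sched_switch trigger below, the unqualified '$ts0' isn't found
# on sched_switch itself, so it's automatically looked up on the
# onmatch(matching.event), i.e. sched.sched_waking:
echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,next_pid,next_prio) if next_comm=="cyclictest"' >> \
    /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
```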

> In other words: what criteria are used to determine whether two events
> match? Is it the trace buffer ordering? Time ordering? Something
> else?

A match for an event is simply the last matching.event instance found
in the matching.event histogram. Or, in the more general case, the last
event found in the histogram(s) for each variable used in the synthetic
event generation.

>
> > It causes the named
> > + synthetic event to be generated with the values given in the
> > + 'param list'. The result is the generation of a synthetic event
> > + that consists of the values contained in those variables at the
> > + time the invoking event was hit.
> > +
> > + The 'param list' consists of one or more parameters which may be
> > + either variables or fields defined on either the 'matching.event'
> > + or the target event. The variables or fields specified in the
> > + param list may be either fully-qualified or unqualified. If a
> > + variable is specified as unqualified, it must be unique between
> > + the two events. A field name used as a param can be unqualified
> > + if it refers to the target event, but must be fully qualified if
> > + it refers to the matching event. A fully-qualified name is of the
> > + form 'system.event_name.$var_name' or 'system.event_name.field'.
> > +
> > + The 'matching.event' specification is simply the fully qualified
> > + event name of the event that matches the target event for the
> > + onmatch() functionality, in the form 'system.event_name'.
> > +
> > + Finally, the number and type of variables/fields in the 'param
> > + list' must match the number and types of the fields in the
> > + synthetic event being generated.
> > +
> > + As an example the below defines a simple synthetic event and uses
> > + a variable defined on the sched_wakeup_new event as a parameter
> > + when invoking the synthetic event. Here we define the synthetic
> > + event:
> > +
> > + # echo 'wakeup_new_test pid_t pid' >> \
> > + /sys/kernel/debug/tracing/synthetic_events
> > +
> > + # cat /sys/kernel/debug/tracing/synthetic_events
> > + wakeup_new_test pid_t pid
> > +
> > + The following hist trigger both defines the missing testpid
> > + variable and specifies an onmatch() action that generates a
> > + wakeup_new_test synthetic event whenever a sched_wakeup_new event
> > + occurs, which because of the 'if comm == "cyclictest"' filter only
> > + happens when the executable is cyclictest:
> > +
> > + # echo 'hist:keys=$testpid:testpid=pid:onmatch(sched.sched_wakeup_new).\
> > + wakeup_new_test($testpid) if comm=="cyclictest"' >> \
> > + /sys/kernel/debug/tracing/events/sched/sched_wakeup_new/trigger
> > +
> > + Creating and displaying a histogram based on those events is now
> > + just a matter of using the fields and new synthetic event in the
> > + tracing/events/synthetic directory, as usual:
> > +
> > + # echo 'hist:keys=pid:sort=pid' >> \
> > + /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/trigger
> > +
> > + Running 'cyclictest' should cause wakeup_new events to generate
> > + wakeup_new_test synthetic events which should result in histogram
> > + output in the wakeup_new_test event's hist file:
> > +
> > + # cat /sys/kernel/debug/tracing/events/synthetic/wakeup_new_test/hist
> > +
> > + A more typical usage would be to use two events to calculate a
> > + latency. The following example uses a set of hist triggers to
> > + produce a 'wakeup_latency' histogram:
> > +
> > + First, we define a 'wakeup_latency' synthetic event:
> > +
> > + # echo 'wakeup_latency u64 lat; pid_t pid; int prio' >> \
> > + /sys/kernel/debug/tracing/synthetic_events
> > +
> > + Next, we specify that whenever we see a sched_waking event for a
> > + cyclictest thread, save the timestamp in a 'ts0' variable:
> > +
> > + # echo 'hist:keys=$saved_pid:saved_pid=pid:ts0=$common_timestamp.usecs \
> > + if comm=="cyclictest"' >> \
> > + /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
> > +
> > + Then, when the corresponding thread is actually scheduled onto the
> > + CPU by a sched_switch event, calculate the latency and use that
> > + along with another variable and an event field to generate a
> > + wakeup_latency synthetic event:
> > +
> > + # echo 'hist:keys=next_pid:wakeup_lat=$common_timestamp.usecs-$ts0:\
> > + onmatch(sched.sched_waking).wakeup_latency($wakeup_lat,\
> > + $saved_pid,next_prio) if next_comm=="cyclictest"' >> \
> > + /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
>
> Just a check for understanding:
>
> The onmatch() relation between events is driven by the use of the
> histogram variables 'ts0' and 'saved_pid', here.
>
> If that's the case, this then precludes the matching only on fields
> statically defined in matching event's tracepoint?
>
> Continuing on your latency examples, if I wanted to look at hrtimer
> latencies:
>
> # echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events
>
> # echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-expires)' > \
> /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger
>
> (where 'now' and 'expires' are defined fields of the hrtimer_start
> and hrtimer_expire_entry tracepoints, respectively)
>
> This would _not_ work, because without the usage of a 'dummy' variable,
> there is no "capturing of context" in the hrtimer_start event.
>
> I would have to, instead, do:
>
> # echo 'hrtimer_latency u64 lat' >> /sys/kernel/debug/tracing/synthetic_events
> #
> # echo 'hist:keys=hrtimer.hex:ts0=expires' > \
> /sys/kernel/debug/tracing/events/timer/hrtimer_start/trigger
>
> # echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-$ts0)' > \
> /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger
>
> That is, create a dummy $ts0 to establish the relation.
>
> Is my understanding correct? If so, that's not very intuitive (to me).
>

Yes, that's correct, that's how it works, but there's no reason we
couldn't add a higher-level capability that would automatically
translate hrtimer_latency(now-expires) into the low-level syntax.

What's implemented so far is necessarily pretty low-level, but I'm
pretty sure there will be some follow-ons providing higher-level
constructs like this once the basic mechanism settles down.
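As a purely hypothetical illustration of such a translation (neither
the front-end syntax nor the '__auto0' variable name exists in this
patchset), the higher-level form from your example could be expanded
into today's low-level triggers roughly like this:

```shell
# Hypothetical higher-level form (NOT implemented - illustration only):
#
#   echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-expires)' > \
#       /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger
#
# which a front end could expand into the low-level syntax:

# Auto-generated dummy variable capturing 'expires' in the
# hrtimer_start context ('__auto0' is a made-up name):
echo 'hist:keys=hrtimer.hex:__auto0=expires' > \
    /sys/kernel/debug/tracing/events/timer/hrtimer_start/trigger

# The bare field reference is rewritten to use that variable:
echo 'hist:keys=hrtimer.hex:onmatch(timer.hrtimer_start).hrtimer_latency(now-$__auto0)' > \
    /sys/kernel/debug/tracing/events/timer/hrtimer_expire_entry/trigger
```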

Thanks,

Tom

> Thanks,
> Julia


2017-09-21 20:20:32

by Steven Rostedt

Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

On Tue, 19 Sep 2017 13:44:28 -0500
Tom Zanussi <[email protected]> wrote:

> Yeah, it's almost ready. At this point, I've addressed all the comments
> except for:
>
> - PATCH v2 25/40] tracing: Add support for dynamic tracepoints
>
> which I need to do a little bit of research on to figure out what
> exactly I need to do there.

Let me know if you need any help.

>
> I was also kind of looking for a couple patches from you to fold in
> which you had mentioned you were going to send for:
>
> - [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken

OK, I have a standalone patch that you don't need to fold in that does
this properly. I tested the crap out of it (missed a few places), but it
should be good. I'll send that next.


> - [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion

I have this working too, but want to test it a little more before
sending. Once I do, you can add both patches ahead of your series. I
may just apply them to my tree now and start running them through my
formal tests.

>
> I've been using the trace_recursive_lock() patch you posted in place of
> the latter and it's working fine. :-)

Thanks!

-- Steve

2017-09-21 21:11:58

by Tom Zanussi

Subject: Re: [PATCH v2 00/40] tracing: Inter-event (e.g. latency) support

Hi Steve,

On Thu, 2017-09-21 at 16:20 -0400, Steven Rostedt wrote:
> On Tue, 19 Sep 2017 13:44:28 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Yeah, it's almost ready. At this point, I've addressed all the comments
> > except for:
> >
> > - PATCH v2 25/40] tracing: Add support for dynamic tracepoints
> >
> > which I need to do a little bit of research on to figure out what
> > exactly I need to do there.
>
> Let me know if you need any help.
>

OK, thanks.

> >
> > I was also kind of looking for a couple patches from you to fold in
> > which you had mentioned you were going to send for:
> >
> > - [PATCH v2 35/40] tracing: Reverse the order event_mutex/trace_types_lock are taken
>
> OK, I have a standalone patch that you don't need to fold in that does
> this properly. I tested the crap out of it (missed a few places), but it
> should be good. I'll send that next.
>
>
> > - [PATCH v2 40/40] tracing: Add trace_event_buffer_reserve() variant that allows recursion
>
> I have this working too, but want to test it a little more before
> sending. Once I do, you can add both patches ahead of your series. I
> may just apply them to my tree now and start running them through my
> formal tests.
>

Thanks for these patches - testing now with the mutex patch you just
sent and the old trace_recursive_lock() patch, hope to have a v3 out
soon.

Tom