2022-09-04 04:13:38

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 0/6] tracing/hist: Add percentage histogram suffixes

Hi,

Here is the 5th version of .percent and .graph suffixes for histogram
trigger to show the value in percentage and in bar-graph respectively.
The previous version is here;

https://lore.kernel.org/all/166157298537.348924.2537162090505397377.stgit@devnote2

This version updated the last patch to add "nohitcount" option for
supressing display of hitcount according to Tom's suggestion[5/6],
fix README file[4/6], and update the histogram document[6/6].

This will help us to check the trend of the histogram instantly
without any post processing tool.

Here shows an example of the percentage and the bar graph of
the runtime of the running tasks without raw hitcount field.

# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=hitcount.percent,hitcount.graph:sort=pid:NOHC > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount.percent,hitcount.graph:sort=pid:size=2048:nohitcount [active]
#

{ pid: 14 } hitcount (%): 6.91 hitcount: ##
{ pid: 16 } hitcount (%): 1.13 hitcount:
{ pid: 57 } hitcount (%): 6.99 hitcount: ##
{ pid: 61 } hitcount (%): 55.28 hitcount: ####################
{ pid: 67 } hitcount (%): 5.52 hitcount: ##
{ pid: 69 } hitcount (%): 20.08 hitcount: #######
{ pid: 71 } hitcount (%): 0.40 hitcount:
{ pid: 77 } hitcount (%): 0.97 hitcount:
{ pid: 78 } hitcount (%): 0.56 hitcount:
{ pid: 145 } hitcount (%): 1.13 hitcount:
{ pid: 153 } hitcount (%): 0.48 hitcount:
{ pid: 154 } hitcount (%): 0.48 hitcount:

Totals:
Hits: 1230
Entries: 12
Dropped: 0


With the NOHC (or nohitcount) option, the histogram will skip
display of raw hitcount, but it will allow to display percent
and graph of hitcount.

Trigger Result
val=runtime -> Show hitcount and runtime
val=runtime:NOHC -> Show runtime
val=hitcount:NOHC -> Error
(noval) -> show hitcount
(noval):NOHC -> Error
val=hitcount.parcent -> show hitcount and hitcount%
val=hitcount.parcent:NOHC -> show hitcount%

Thank you,

---

Masami Hiramatsu (Google) (5):
tracing: Fix to check event_mutex is held while accessing trigger list
tracing: Add .percent suffix option to histogram values
tracing: Add .graph suffix option to histogram value
tracing: Add nohitcount option for suppressing display of raw hitcount
tracing: docs: Update histogram doc for .percent/.graph and 'nohitcount'

Tom Zanussi (1):
tracing: Allow multiple hitcount values in histograms


Documentation/trace/histogram.rst | 10 ++
kernel/trace/trace.c | 7 +
kernel/trace/trace_events_hist.c | 177 ++++++++++++++++++++++++++++++-----
kernel/trace/trace_events_trigger.c | 3 -
4 files changed, 171 insertions(+), 26 deletions(-)

--
Masami Hiramatsu (Google) <[email protected]>


2022-09-04 04:14:40

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 3/6] tracing: Add .percent suffix option to histogram values

From: Masami Hiramatsu (Google) <[email protected]>

Add .percent suffix option to show the histogram values in percentage.
This feature is useful when we need yo undersntand the overall trend
for the histograms of large values.
E.g. this shows the runtime percentage for each tasks.

------
# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=hitcount,runtime.percent:sort=pid > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount,runtime.percent:sort=pid:size=2048 [active]
#

{ pid: 8 } hitcount: 7 runtime (%): 4.14
{ pid: 14 } hitcount: 5 runtime (%): 3.69
{ pid: 16 } hitcount: 11 runtime (%): 3.41
{ pid: 61 } hitcount: 41 runtime (%): 19.75
{ pid: 65 } hitcount: 4 runtime (%): 1.48
{ pid: 70 } hitcount: 6 runtime (%): 3.60
{ pid: 72 } hitcount: 2 runtime (%): 1.10
{ pid: 144 } hitcount: 10 runtime (%): 32.01
{ pid: 151 } hitcount: 8 runtime (%): 22.66
{ pid: 152 } hitcount: 2 runtime (%): 8.10

Totals:
Hits: 96
Entries: 10
Dropped: 0
-----

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v5:
- Show raw hitcount value always even if hitcount.percent
is specified.
Changes in v4:
- Port on the Tom's hitcount patch.
Changes in v3:
- Show "(%)" in the field name.
- Return div0 error only if the value is not zero.
- Simplify the print code with hist_trigger_print_val().
- Show hitcount percentage correctly.
Changes in v2:
- Use div64_*().
- Show an error when failing to calculate the percentage.
---
kernel/trace/trace.c | 3 +
kernel/trace/trace_events_hist.c | 90 ++++++++++++++++++++++++++++++++------
2 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d3005279165d..5e5efc550ac4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5700,7 +5700,8 @@ static const char readme_msg[] =
"\t .syscall display a syscall id as a syscall name\n"
"\t .log2 display log2 value rather than raw number\n"
"\t .buckets=size display values in groups of size rather than raw number\n"
- "\t .usecs display a common_timestamp in microseconds\n\n"
+ "\t .usecs display a common_timestamp in microseconds\n"
+ "\t .percent display a number of percentage value\n\n"
"\t The 'pause' parameter can be used to pause an existing hist\n"
"\t trigger or to start a hist trigger but not log any events\n"
"\t until told to do so. 'continue' can be used to start or\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index e80a41e380bb..053b23e10c48 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -506,6 +506,7 @@ enum hist_field_flags {
HIST_FIELD_FL_ALIAS = 1 << 16,
HIST_FIELD_FL_BUCKET = 1 << 17,
HIST_FIELD_FL_CONST = 1 << 18,
+ HIST_FIELD_FL_PERCENT = 1 << 19,
};

struct var_defs {
@@ -1707,6 +1708,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
flags_str = "buckets";
else if (hist_field->flags & HIST_FIELD_FL_TIMESTAMP_USECS)
flags_str = "usecs";
+ else if (hist_field->flags & HIST_FIELD_FL_PERCENT)
+ flags_str = "percent";

return flags_str;
}
@@ -2315,6 +2318,10 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
if (ret || !(*buckets))
goto error;
*flags |= HIST_FIELD_FL_BUCKET;
+ } else if (strncmp(modifier, "percent", 7) == 0) {
+ if (*flags & (HIST_FIELD_FL_VAR | HIST_FIELD_FL_KEY))
+ goto error;
+ *flags |= HIST_FIELD_FL_PERCENT;
} else {
error:
hist_err(tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(modifier));
@@ -5288,33 +5295,69 @@ static void hist_trigger_print_key(struct seq_file *m,
seq_puts(m, "}");
}

+/* Get the 100 times of the percentage of @val in @total */
+static inline unsigned int __get_percentage(u64 val, u64 total)
+{
+ if (!total)
+ goto div0;
+
+ if (val < (U64_MAX / 10000))
+ return (unsigned int)div64_ul(val * 10000, total);
+
+ total = div64_u64(total, 10000);
+ if (!total)
+ goto div0;
+
+ return (unsigned int)div64_ul(val, total);
+div0:
+ return val ? UINT_MAX : 0;
+}
+
+static void hist_trigger_print_val(struct seq_file *m, unsigned int idx,
+ const char *field_name, unsigned long flags,
+ u64 *totals, struct tracing_map_elt *elt)
+{
+ u64 val = tracing_map_read_sum(elt, idx);
+ unsigned int pc;
+
+ if (flags & HIST_FIELD_FL_PERCENT) {
+ pc = __get_percentage(val, totals[idx]);
+ if (pc == UINT_MAX)
+ seq_printf(m, " %s (%%):[ERROR]", field_name);
+ else
+ seq_printf(m, " %s (%%): %3u.%02u", field_name,
+ pc / 100, pc % 100);
+ } else if (flags & HIST_FIELD_FL_HEX) {
+ seq_printf(m, " %s: %10llx", field_name, val);
+ } else {
+ seq_printf(m, " %s: %10llu", field_name, val);
+ }
+}
+
static void hist_trigger_entry_print(struct seq_file *m,
struct hist_trigger_data *hist_data,
+ u64 *totals,
void *key,
struct tracing_map_elt *elt)
{
const char *field_name;
- unsigned int i;
+ unsigned int i = HITCOUNT_IDX;
+ unsigned long flags;

hist_trigger_print_key(m, hist_data, key, elt);

- seq_printf(m, " hitcount: %10llu",
- tracing_map_read_sum(elt, HITCOUNT_IDX));
+ /* At first, show the raw hitcount always */
+ hist_trigger_print_val(m, i, "hitcount", 0, totals, elt);

for (i = 1; i < hist_data->n_vals; i++) {
field_name = hist_field_name(hist_data->fields[i], 0);
+ flags = hist_data->fields[i]->flags;

- if (hist_data->fields[i]->flags & HIST_FIELD_FL_VAR ||
- hist_data->fields[i]->flags & HIST_FIELD_FL_EXPR)
+ if (flags & HIST_FIELD_FL_VAR || flags & HIST_FIELD_FL_EXPR)
continue;

- if (hist_data->fields[i]->flags & HIST_FIELD_FL_HEX) {
- seq_printf(m, " %s: %10llx", field_name,
- tracing_map_read_sum(elt, i));
- } else {
- seq_printf(m, " %s: %10llu", field_name,
- tracing_map_read_sum(elt, i));
- }
+ seq_puts(m, " ");
+ hist_trigger_print_val(m, i, field_name, flags, totals, elt);
}

print_actions(m, hist_data, elt);
@@ -5327,7 +5370,8 @@ static int print_entries(struct seq_file *m,
{
struct tracing_map_sort_entry **sort_entries = NULL;
struct tracing_map *map = hist_data->map;
- int i, n_entries;
+ int i, j, n_entries;
+ u64 *totals = NULL;

n_entries = tracing_map_sort_entries(map, hist_data->sort_keys,
hist_data->n_sort_keys,
@@ -5335,11 +5379,29 @@ static int print_entries(struct seq_file *m,
if (n_entries < 0)
return n_entries;

+ for (j = 0; j < hist_data->n_vals; j++) {
+ if (!(hist_data->fields[j]->flags & HIST_FIELD_FL_PERCENT))
+ continue;
+ if (!totals) {
+ totals = kcalloc(hist_data->n_vals, sizeof(u64),
+ GFP_KERNEL);
+ if (!totals) {
+ n_entries = -ENOMEM;
+ goto out;
+ }
+ }
+ for (i = 0; i < n_entries; i++)
+ totals[j] += tracing_map_read_sum(
+ sort_entries[i]->elt, j);
+ }
+
for (i = 0; i < n_entries; i++)
- hist_trigger_entry_print(m, hist_data,
+ hist_trigger_entry_print(m, hist_data, totals,
sort_entries[i]->key,
sort_entries[i]->elt);

+ kfree(totals);
+out:
tracing_map_destroy_sort_entries(sort_entries, n_entries);

return n_entries;

2022-09-04 04:21:50

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 1/6] tracing: Allow multiple hitcount values in histograms

From: Tom Zanussi <[email protected]>

The hitcount is treated specially in the histograms - since it's
always expected to be there regardless of whether the user specified
anything or not, it's always added as the first histogram value.

Currently the code doesn't allow it to be added more than once as a
value, which is inconsistent with all the other possible values. It
would seem to be a pointless thing to want to do, but other features
being added such as percent and graph modifiers don't work properly
with the current hitcount restrictions.

Fix this by allowing multiple hitcounts to be added.

Signed-off-by: Tom Zanussi <[email protected]>

---
Changes in v4:
- Initialize n_hitcount.
---
kernel/trace/trace_events_hist.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 92fbd72b6408..e80a41e380bb 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1356,6 +1356,8 @@ static const char *hist_field_name(struct hist_field *field,
field_name = field->name;
} else if (field->flags & HIST_FIELD_FL_TIMESTAMP)
field_name = "common_timestamp";
+ else if (field->flags & HIST_FIELD_FL_HITCOUNT)
+ field_name = "hitcount";

if (field_name == NULL)
field_name = "";
@@ -2328,6 +2330,8 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
hist_data->attrs->ts_in_usecs = true;
} else if (strcmp(field_name, "common_cpu") == 0)
*flags |= HIST_FIELD_FL_CPU;
+ else if (strcmp(field_name, "hitcount") == 0)
+ *flags |= HIST_FIELD_FL_HITCOUNT;
else {
field = trace_find_event_field(file->event_call, field_name);
if (!field || !field->size) {
@@ -4328,8 +4332,8 @@ static int create_var_field(struct hist_trigger_data *hist_data,
static int create_val_fields(struct hist_trigger_data *hist_data,
struct trace_event_file *file)
{
+ unsigned int i, j = 1, n_hitcount = 0;
char *fields_str, *field_str;
- unsigned int i, j = 1;
int ret;

ret = create_hitcount_val(hist_data);
@@ -4346,8 +4350,10 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
if (!field_str)
break;

- if (strcmp(field_str, "hitcount") == 0)
- continue;
+ if (strcmp(field_str, "hitcount") == 0) {
+ if (!n_hitcount++)
+ continue;
+ }

ret = create_val_field(hist_data, j++, file, field_str);
if (ret)

2022-09-04 04:22:29

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 5/6] tracing: Add nohitcount option for suppressing display of raw hitcount

From: Masami Hiramatsu (Google) <[email protected]>

Add 'nohitcount' ('NOHC' for short) option for suppressing display of
the raw hitcount column in the histogram.
Note that you must specify at least one value except raw 'hitcount'
when you specify this nohitcount option.

# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:NOHC > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=runtime.percent,runtime.graph:sort=pid:size=2048:nohitcount [active]
#

{ pid: 8 } runtime (%): 3.02 runtime: #
{ pid: 14 } runtime (%): 2.25 runtime:
{ pid: 16 } runtime (%): 2.25 runtime:
{ pid: 26 } runtime (%): 0.17 runtime:
{ pid: 61 } runtime (%): 11.52 runtime: ####
{ pid: 67 } runtime (%): 1.56 runtime:
{ pid: 68 } runtime (%): 0.84 runtime:
{ pid: 76 } runtime (%): 0.92 runtime:
{ pid: 117 } runtime (%): 2.50 runtime: #
{ pid: 146 } runtime (%): 49.88 runtime: ####################
{ pid: 157 } runtime (%): 16.63 runtime: ######
{ pid: 158 } runtime (%): 8.38 runtime: ###


Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v5:
- Add nohitcount and NOHC option to avoid showing hitcount.
---
kernel/trace/trace.c | 3 +++
kernel/trace/trace_events_hist.c | 34 ++++++++++++++++++++++++++--------
2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index aec092b4ed3e..e39d3d3b64ac 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5654,6 +5654,7 @@ static const char readme_msg[] =
"\t [:size=#entries]\n"
"\t [:pause][:continue][:clear]\n"
"\t [:name=histname1]\n"
+ "\t [:nohitcount]\n"
"\t [:<handler>.<action>]\n"
"\t [if <filter>]\n\n"
"\t Note, special fields can be used as well:\n"
@@ -5710,6 +5711,8 @@ static const char readme_msg[] =
"\t The 'clear' parameter will clear the contents of a running\n"
"\t hist trigger and leave its current paused/active state\n"
"\t unchanged.\n\n"
+ "\t The 'nohitcount' (or NOHC) parameter will suppress display of\n"
+ "\t raw hitcount in the histogram.\n\n"
"\t The enable_hist and disable_hist triggers can be used to\n"
"\t have one event conditionally start and stop another event's\n"
"\t already-attached hist trigger. The syntax is analogous to\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 753da682d404..8d1db056766f 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -69,7 +69,8 @@
C(INVALID_STR_OPERAND, "String type can not be an operand in expression"), \
C(EXPECT_NUMBER, "Expecting numeric literal"), \
C(UNARY_MINUS_SUBEXPR, "Unary minus not supported in sub-expressions"), \
- C(DIVISION_BY_ZERO, "Division by zero"),
+ C(DIVISION_BY_ZERO, "Division by zero"), \
+ C(NEED_NOHC_VAL, "Non-hitcount value is required for 'nohitcount'"),

#undef C
#define C(a, b) HIST_ERR_##a
@@ -526,6 +527,7 @@ struct hist_trigger_attrs {
bool cont;
bool clear;
bool ts_in_usecs;
+ bool no_hitcount;
unsigned int map_bits;

char *assignment_str[TRACING_MAP_VARS_MAX];
@@ -1550,7 +1552,10 @@ parse_hist_trigger_attrs(struct trace_array *tr, char *trigger_str)
ret = parse_assignment(tr, str, attrs);
if (ret)
goto free;
- } else if (strcmp(str, "pause") == 0)
+ } else if (strcmp(str, "nohitcount") == 0 ||
+ strcmp(str, "NOHC") == 0)
+ attrs->no_hitcount = true;
+ else if (strcmp(str, "pause") == 0)
attrs->pause = true;
else if ((strcmp(str, "cont") == 0) ||
(strcmp(str, "continue") == 0))
@@ -4377,6 +4382,12 @@ static int create_val_fields(struct hist_trigger_data *hist_data,
if (fields_str && (strcmp(fields_str, "hitcount") != 0))
ret = -EINVAL;
out:
+ /* There is only raw hitcount but nohitcount suppresses it. */
+ if (j == 1 && hist_data->attrs->no_hitcount) {
+ hist_err(hist_data->event_file->tr, HIST_ERR_NEED_NOHC_VAL, 0);
+ ret = -ENOENT;
+ }
+
return ret;
}

@@ -5385,13 +5396,13 @@ static void hist_trigger_entry_print(struct seq_file *m,

hist_trigger_print_key(m, hist_data, key, elt);

- /* At first, show the raw hitcount always */
- hist_trigger_print_val(m, i, "hitcount", 0, stats, elt);
+ /* At first, show the raw hitcount if !nohitcount */
+ if (!hist_data->attrs->no_hitcount)
+ hist_trigger_print_val(m, i, "hitcount", 0, stats, elt);

for (i = 1; i < hist_data->n_vals; i++) {
field_name = hist_field_name(hist_data->fields[i], 0);
flags = hist_data->fields[i]->flags;
-
if (flags & HIST_FIELD_FL_VAR || flags & HIST_FIELD_FL_EXPR)
continue;

@@ -5836,6 +5847,7 @@ static int event_hist_trigger_print(struct seq_file *m,
struct hist_trigger_data *hist_data = data->private_data;
struct hist_field *field;
bool have_var = false;
+ bool show_val = false;
unsigned int i;

seq_puts(m, HIST_PREFIX);
@@ -5866,12 +5878,16 @@ static int event_hist_trigger_print(struct seq_file *m,
continue;
}

- if (i == HITCOUNT_IDX)
+ if (i == HITCOUNT_IDX) {
+ if (hist_data->attrs->no_hitcount)
+ continue;
seq_puts(m, "hitcount");
- else {
- seq_puts(m, ",");
+ } else {
+ if (show_val)
+ seq_puts(m, ",");
hist_field_print(m, field);
}
+ show_val = true;
}

if (have_var) {
@@ -5922,6 +5938,8 @@ static int event_hist_trigger_print(struct seq_file *m,
seq_printf(m, ":size=%u", (1 << hist_data->map->map_bits));
if (hist_data->enable_timestamps)
seq_printf(m, ":clock=%s", hist_data->attrs->clock);
+ if (hist_data->attrs->no_hitcount)
+ seq_puts(m, ":nohitcount");

print_actions_spec(m, hist_data);


2022-09-04 04:22:39

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 2/6] tracing: Fix to check event_mutex is held while accessing trigger list

From: Masami Hiramatsu (Google) <[email protected]>

Since the check_user_trigger() is called outside of RCU
read lock, this list_for_each_entry_rcu() caused a suspicious
RCU usage warning.

# echo hist:keys=pid > events/sched/sched_stat_runtime/trigger
# cat events/sched/sched_stat_runtime/trigger
[ 43.167032]
[ 43.167418] =============================
[ 43.167992] WARNING: suspicious RCU usage
[ 43.168567] 5.19.0-rc5-00029-g19ebe4651abf #59 Not tainted
[ 43.169283] -----------------------------
[ 43.169863] kernel/trace/trace_events_trigger.c:145 RCU-list traversed in non-reader section!!
...

However, this file->triggers list is safe when it is accessed
under event_mutex is held.
To fix this warning, adds a lockdep_is_held check to the
list_for_each_entry_rcu().

Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
kernel/trace/trace_events_trigger.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c
index cb866c3141af..918730d74932 100644
--- a/kernel/trace/trace_events_trigger.c
+++ b/kernel/trace/trace_events_trigger.c
@@ -142,7 +142,8 @@ static bool check_user_trigger(struct trace_event_file *file)
{
struct event_trigger_data *data;

- list_for_each_entry_rcu(data, &file->triggers, list) {
+ list_for_each_entry_rcu(data, &file->triggers, list,
+ lockdep_is_held(&event_mutex)) {
if (data->flags & EVENT_TRIGGER_FL_PROBE)
continue;
return true;

2022-09-04 04:22:39

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 4/6] tracing: Add .graph suffix option to histogram value

From: Masami Hiramatsu (Google) <[email protected]>

Add the .graph suffix which shows the bar graph of the histogram value.

For example, the below example shows that the bar graph
of the histogram of the runtime for each tasks.

------
# cd /sys/kernel/debug/tracing/
# echo hist:keys=pid:vals=runtime.graph:sort=pid > \
events/sched/sched_stat_runtime/trigger
# sleep 10
# cat events/sched/sched_stat_runtime/hist
# event histogram
#
# trigger info: hist:keys=pid:vals=hitcount,runtime.graph:sort=pid:size=2048 [active]
#

{ pid: 14 } hitcount: 2 runtime:
{ pid: 16 } hitcount: 8 runtime:
{ pid: 26 } hitcount: 1 runtime:
{ pid: 57 } hitcount: 3 runtime:
{ pid: 61 } hitcount: 20 runtime: ###
{ pid: 66 } hitcount: 2 runtime:
{ pid: 70 } hitcount: 3 runtime:
{ pid: 72 } hitcount: 2 runtime:
{ pid: 145 } hitcount: 14 runtime: ####################
{ pid: 152 } hitcount: 5 runtime: #######
{ pid: 153 } hitcount: 2 runtime: ####

Totals:
Hits: 62
Entries: 11
Dropped: 0
-------

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v5:
- Show raw hitcount value always even if hitcount.graph
is specified.
- Update README file
Changes in v3:
- Show hitcount.graph in trigger info correctly.
- Show both hitcount.percent and hitcount.graph on the same histogram.
Changes in v2:
- Show an error when failing to calculate the percentage.
---
kernel/trace/trace.c | 3 +
kernel/trace/trace_events_hist.c | 77 ++++++++++++++++++++++++++++++--------
2 files changed, 63 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5e5efc550ac4..aec092b4ed3e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5701,7 +5701,8 @@ static const char readme_msg[] =
"\t .log2 display log2 value rather than raw number\n"
"\t .buckets=size display values in groups of size rather than raw number\n"
"\t .usecs display a common_timestamp in microseconds\n"
- "\t .percent display a number of percentage value\n\n"
+ "\t .percent display a number of percentage value\n"
+ "\t .graph display a bar-graph of a value\n\n"
"\t The 'pause' parameter can be used to pause an existing hist\n"
"\t trigger or to start a hist trigger but not log any events\n"
"\t until told to do so. 'continue' can be used to start or\n"
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 053b23e10c48..753da682d404 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -507,6 +507,7 @@ enum hist_field_flags {
HIST_FIELD_FL_BUCKET = 1 << 17,
HIST_FIELD_FL_CONST = 1 << 18,
HIST_FIELD_FL_PERCENT = 1 << 19,
+ HIST_FIELD_FL_GRAPH = 1 << 20,
};

struct var_defs {
@@ -1710,6 +1711,8 @@ static const char *get_hist_field_flags(struct hist_field *hist_field)
flags_str = "usecs";
else if (hist_field->flags & HIST_FIELD_FL_PERCENT)
flags_str = "percent";
+ else if (hist_field->flags & HIST_FIELD_FL_GRAPH)
+ flags_str = "graph";

return flags_str;
}
@@ -2322,6 +2325,10 @@ parse_field(struct hist_trigger_data *hist_data, struct trace_event_file *file,
if (*flags & (HIST_FIELD_FL_VAR | HIST_FIELD_FL_KEY))
goto error;
*flags |= HIST_FIELD_FL_PERCENT;
+ } else if (strncmp(modifier, "graph", 5) == 0) {
+ if (*flags & (HIST_FIELD_FL_VAR | HIST_FIELD_FL_KEY))
+ goto error;
+ *flags |= HIST_FIELD_FL_GRAPH;
} else {
error:
hist_err(tr, HIST_ERR_BAD_FIELD_MODIFIER, errpos(modifier));
@@ -5313,20 +5320,52 @@ static inline unsigned int __get_percentage(u64 val, u64 total)
return val ? UINT_MAX : 0;
}

+#define BAR_CHAR '#'
+
+static inline const char *__fill_bar_str(char *buf, int size, u64 val, u64 max)
+{
+ unsigned int len = __get_percentage(val, max);
+ int i;
+
+ if (len == UINT_MAX) {
+ snprintf(buf, size, "[ERROR]");
+ return buf;
+ }
+
+ len = len * size / 10000;
+ for (i = 0; i < len && i < size; i++)
+ buf[i] = BAR_CHAR;
+ while (i < size)
+ buf[i++] = ' ';
+ buf[size] = '\0';
+
+ return buf;
+}
+
+struct hist_val_stat {
+ u64 max;
+ u64 total;
+};
+
static void hist_trigger_print_val(struct seq_file *m, unsigned int idx,
const char *field_name, unsigned long flags,
- u64 *totals, struct tracing_map_elt *elt)
+ struct hist_val_stat *stats,
+ struct tracing_map_elt *elt)
{
u64 val = tracing_map_read_sum(elt, idx);
unsigned int pc;
+ char bar[21];

if (flags & HIST_FIELD_FL_PERCENT) {
- pc = __get_percentage(val, totals[idx]);
+ pc = __get_percentage(val, stats[idx].total);
if (pc == UINT_MAX)
seq_printf(m, " %s (%%):[ERROR]", field_name);
else
seq_printf(m, " %s (%%): %3u.%02u", field_name,
pc / 100, pc % 100);
+ } else if (flags & HIST_FIELD_FL_GRAPH) {
+ seq_printf(m, " %s: %20s", field_name,
+ __fill_bar_str(bar, 20, val, stats[idx].max));
} else if (flags & HIST_FIELD_FL_HEX) {
seq_printf(m, " %s: %10llx", field_name, val);
} else {
@@ -5336,7 +5375,7 @@ static void hist_trigger_print_val(struct seq_file *m, unsigned int idx,

static void hist_trigger_entry_print(struct seq_file *m,
struct hist_trigger_data *hist_data,
- u64 *totals,
+ struct hist_val_stat *stats,
void *key,
struct tracing_map_elt *elt)
{
@@ -5347,7 +5386,7 @@ static void hist_trigger_entry_print(struct seq_file *m,
hist_trigger_print_key(m, hist_data, key, elt);

/* At first, show the raw hitcount always */
- hist_trigger_print_val(m, i, "hitcount", 0, totals, elt);
+ hist_trigger_print_val(m, i, "hitcount", 0, stats, elt);

for (i = 1; i < hist_data->n_vals; i++) {
field_name = hist_field_name(hist_data->fields[i], 0);
@@ -5357,7 +5396,7 @@ static void hist_trigger_entry_print(struct seq_file *m,
continue;

seq_puts(m, " ");
- hist_trigger_print_val(m, i, field_name, flags, totals, elt);
+ hist_trigger_print_val(m, i, field_name, flags, stats, elt);
}

print_actions(m, hist_data, elt);
@@ -5371,7 +5410,8 @@ static int print_entries(struct seq_file *m,
struct tracing_map_sort_entry **sort_entries = NULL;
struct tracing_map *map = hist_data->map;
int i, j, n_entries;
- u64 *totals = NULL;
+ struct hist_val_stat *stats = NULL;
+ u64 val;

n_entries = tracing_map_sort_entries(map, hist_data->sort_keys,
hist_data->n_sort_keys,
@@ -5379,28 +5419,33 @@ static int print_entries(struct seq_file *m,
if (n_entries < 0)
return n_entries;

+ /* Calculate the max and the total for each field if needed. */
for (j = 0; j < hist_data->n_vals; j++) {
- if (!(hist_data->fields[j]->flags & HIST_FIELD_FL_PERCENT))
+ if (!(hist_data->fields[j]->flags &
+ (HIST_FIELD_FL_PERCENT | HIST_FIELD_FL_GRAPH)))
continue;
- if (!totals) {
- totals = kcalloc(hist_data->n_vals, sizeof(u64),
- GFP_KERNEL);
- if (!totals) {
+ if (!stats) {
+ stats = kcalloc(hist_data->n_vals, sizeof(*stats),
+ GFP_KERNEL);
+ if (!stats) {
n_entries = -ENOMEM;
goto out;
}
}
- for (i = 0; i < n_entries; i++)
- totals[j] += tracing_map_read_sum(
- sort_entries[i]->elt, j);
+ for (i = 0; i < n_entries; i++) {
+ val = tracing_map_read_sum(sort_entries[i]->elt, j);
+ stats[j].total += val;
+ if (stats[j].max < val)
+ stats[j].max = val;
+ }
}

for (i = 0; i < n_entries; i++)
- hist_trigger_entry_print(m, hist_data, totals,
+ hist_trigger_entry_print(m, hist_data, stats,
sort_entries[i]->key,
sort_entries[i]->elt);

- kfree(totals);
+ kfree(stats);
out:
tracing_map_destroy_sort_entries(sort_entries, n_entries);


2022-09-04 04:23:08

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v5 6/6] tracing: docs: Update histogram doc for .percent/.graph and 'nohitcount'

From: Masami Hiramatsu (Google) <[email protected]>

Update histogram document for .percent/.graph suffixes and 'nohitcount'
option.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Documentation/trace/histogram.rst | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 859fd1b76c63..b2da4e9c62e8 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -25,7 +25,7 @@ Documentation written by Tom Zanussi

hist:keys=<field1[,field2,...]>[:values=<field1[,field2,...]>]
[:sort=<field1[,field2,...]>][:size=#entries][:pause][:continue]
- [:clear][:name=histname1][:<handler>.<action>] [if <filter>]
+ [:clear][:name=histname1][:nohitcount][:<handler>.<action>] [if <filter>]

When a matching event is hit, an entry is added to a hash table
using the key(s) and value(s) named. Keys and values correspond to
@@ -79,6 +79,8 @@ Documentation written by Tom Zanussi
.log2 display log2 value rather than raw number
.buckets=size display grouping of values rather than raw number
.usecs display a common_timestamp in microseconds
+ .percent display a number of percentage value
+ .graph display a bar-graph of a value
============= =================================================

Note that in general the semantics of a given field aren't
@@ -137,6 +139,12 @@ Documentation written by Tom Zanussi
existing trigger, rather than via the '>' operator, which will cause
the trigger to be removed through truncation.

+ The 'nohitcount' (or NOHC) parameter will suppress display of
+ raw hitcount in the histogram. This option requires at least one
+ value field which is not a 'raw hitcount'. For example,
+ 'hist:...:vals=hitcount:nohitcount' is rejected, but
+ 'hist:...:vals=hitcount.percent:nohitcount' is OK.
+
- enable_hist/disable_hist

The enable_hist and disable_hist triggers can be used to have one

2022-09-07 02:29:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] tracing: Allow multiple hitcount values in histograms

On Sun, 4 Sep 2022 13:12:20 +0900
"Masami Hiramatsu (Google)" <[email protected]> wrote:

> From: Tom Zanussi <[email protected]>
>
> The hitcount is treated specially in the histograms - since it's
> always expected to be there regardless of whether the user specified
> anything or not, it's always added as the first histogram value.
>
> Currently the code doesn't allow it to be added more than once as a
> value, which is inconsistent with all the other possible values. It
> would seem to be a pointless thing to want to do, but other features
> being added such as percent and graph modifiers don't work properly
> with the current hitcount restrictions.
>
> Fix this by allowing multiple hitcounts to be added.
>
> Signed-off-by: Tom Zanussi <[email protected]>

Hi Masami,

When posting patches from other people, you still need to add your
Signed-off-by, as that denotes the people who processed the patch before it
made it into git.

-- Steve

2022-09-07 02:32:50

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 2/6] tracing: Fix to check event_mutex is held while accessing trigger list

On Sun, 4 Sep 2022 13:12:29 +0900
"Masami Hiramatsu (Google)" <[email protected]> wrote:

> From: Masami Hiramatsu (Google) <[email protected]>
>
> Since the check_user_trigger() is called outside of RCU
> read lock, this list_for_each_entry_rcu() caused a suspicious
> RCU usage warning.
>
> # echo hist:keys=pid > events/sched/sched_stat_runtime/trigger
> # cat events/sched/sched_stat_runtime/trigger
> [ 43.167032]
> [ 43.167418] =============================
> [ 43.167992] WARNING: suspicious RCU usage
> [ 43.168567] 5.19.0-rc5-00029-g19ebe4651abf #59 Not tainted
> [ 43.169283] -----------------------------
> [ 43.169863] kernel/trace/trace_events_trigger.c:145 RCU-list traversed in non-reader section!!
> ...
>
> However, this file->triggers list is safe when it is accessed
> under event_mutex is held.
> To fix this warning, adds a lockdep_is_held check to the
> list_for_each_entry_rcu().
>
> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> ---

I'm going to pull this patch out of these series, mark it for stable, and
add it to my queue that is going to Linus.

-- Steve

2022-09-07 07:08:25

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] tracing: Allow multiple hitcount values in histograms

On Tue, 6 Sep 2022 22:03:46 -0400
Steven Rostedt <[email protected]> wrote:

> On Sun, 4 Sep 2022 13:12:20 +0900
> "Masami Hiramatsu (Google)" <[email protected]> wrote:
>
> > From: Tom Zanussi <[email protected]>
> >
> > The hitcount is treated specially in the histograms - since it's
> > always expected to be there regardless of whether the user specified
> > anything or not, it's always added as the first histogram value.
> >
> > Currently the code doesn't allow it to be added more than once as a
> > value, which is inconsistent with all the other possible values. It
> > would seem to be a pointless thing to want to do, but other features
> > being added such as percent and graph modifiers don't work properly
> > with the current hitcount restrictions.
> >
> > Fix this by allowing multiple hitcounts to be added.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
>
> Hi Masami,
>
> When posting patches from other people, you still need to add your
> Signed-off-by, as that denotes the people who processed the patch before it
> made it into git.

Oops, I got it.

Please add my signed-off-by;

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>

Thank you,

>
> -- Steve


--
Masami Hiramatsu (Google) <[email protected]>

2022-09-08 22:24:56

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH v5 0/6] tracing/hist: Add percentage histogram suffixes

Hi Masami,

On Sun, 2022-09-04 at 13:12 +0900, Masami Hiramatsu (Google) wrote:
> Hi,
>
> Here is the 5th version of .percent and .graph suffixes for histogram
> trigger to show the value in percentage and in bar-graph
> respectively.
> The previous version is here;
>
> https://lore.kernel.org/all/166157298537.348924.2537162090505397377.stgit@devnote2
>
> This version updated the last patch to add "nohitcount" option for
> supressing display of hitcount according to Tom's suggestion[5/6],
> fix README file[4/6], and update the histogram document[6/6].
>
> This will help us to check the trend of the histogram instantly
> without any post processing tool.
>
> Here shows an example of the percentage and the bar graph of
> the runtime of the running tasks without raw hitcount field.
>
>   # cd /sys/kernel/debug/tracing/
>   # echo
> hist:keys=pid:vals=hitcount.percent,hitcount.graph:sort=pid:NOHC > \
>         events/sched/sched_stat_runtime/trigger
>   # sleep 10
>   # cat events/sched/sched_stat_runtime/hist
>  # event histogram
>  #
>  # trigger info:
> hist:keys=pid:vals=hitcount.percent,hitcount.graph:sort=pid:size=2048
> :nohitcount [active]
>  #
>  
>  { pid:         14 }  hitcount (%):   6.91  hitcount:
> ##                 
>  { pid:         16 }  hitcount (%):   1.13 
> hitcount:                    
>  { pid:         57 }  hitcount (%):   6.99  hitcount:
> ##                 
>  { pid:         61 }  hitcount (%):  55.28  hitcount:
> ####################
>  { pid:         67 }  hitcount (%):   5.52  hitcount:
> ##                 
>  { pid:         69 }  hitcount (%):  20.08  hitcount:
> #######            
>  { pid:         71 }  hitcount (%):   0.40 
> hitcount:                    
>  { pid:         77 }  hitcount (%):   0.97 
> hitcount:                    
>  { pid:         78 }  hitcount (%):   0.56 
> hitcount:                    
>  { pid:        145 }  hitcount (%):   1.13 
> hitcount:                    
>  { pid:        153 }  hitcount (%):   0.48 
> hitcount:                    
>  { pid:        154 }  hitcount (%):   0.48 
> hitcount:                    
>  
>  Totals:
>      Hits: 1230
>      Entries: 12
>      Dropped: 0
>
>
> With the NOHC (or nohitcount) option, the histogram will skip
> display of raw hitcount, but it will allow to display percent
> and graph of hitcount.
>
> Trigger                      Result
> val=runtime               -> Show hitcount and runtime
> val=runtime:NOHC          -> Show runtime
> val=hitcount:NOHC         -> Error
> (noval)                   -> show hitcount
> (noval):NOHC              -> Error
> val=hitcount.parcent      -> show hitcount and hitcount%
> val=hitcount.parcent:NOHC -> show hitcount%

It all looks good to me, thanks! For the whole set:

Reviewed-by: Tom Zanussi <[email protected]>
Tested-by: Tom Zanussi <[email protected]>

Tom



>
> Thank you,
>
> ---
>
> Masami Hiramatsu (Google) (5):
>       tracing: Fix to check event_mutex is held while accessing
> trigger list
>       tracing: Add .percent suffix option to histogram values
>       tracing: Add .graph suffix option to histogram value
>       tracing: Add nohitcount option for suppressing display of raw
> hitcount
>       tracing: docs: Update histogram doc for .percent/.graph and
> 'nohitcount'
>
> Tom Zanussi (1):
>       tracing: Allow multiple hitcount values in histograms
>
>
>  Documentation/trace/histogram.rst   |   10 ++
>  kernel/trace/trace.c                |    7 +
>  kernel/trace/trace_events_hist.c    |  177
> ++++++++++++++++++++++++++++++-----
>  kernel/trace/trace_events_trigger.c |    3 -
>  4 files changed, 171 insertions(+), 26 deletions(-)
>
> --
> Masami Hiramatsu (Google) <[email protected]>