2020-09-29 20:35:22

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH 0/3] tracing: Add dynamic strings for synthetic events

Hi,

This patchset adds support for dynamic strings for synthetic events,
as requested by Axel Rasmussen.

Actually, the first two patches should be applied in any case - the
first just changes the current max string length and the second fixes
a bug I found while testing.

It works fine for my simple test cases, but I still need to do a lot
more testing, especially of the in-kernel API parts, which don't
affect Axel's use case.

Anyway, Axel, please try it out and send me your Tested-by: if it
works for you, and your broken testcase(s) if it doesn't. ;-)

Thanks,

Tom

The following changes since commit 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:

mmap_lock: add tracepoints around lock acquisition (2020-09-23 08:48:08 -0500)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-trace.git ftrace/synth-dynstring-v0

Tom Zanussi (3):
tracing: Change STR_VAR_MAX_LEN
tracing: Fix parse_synth_field() error handling
tracing: Add support for dynamic strings to synthetic events

Documentation/trace/events.rst | 15 ++-
Documentation/trace/histogram.rst | 18 ++++
kernel/trace/trace_events_hist.c | 9 ++
kernel/trace/trace_events_synth.c | 163 ++++++++++++++++++++++++++----
kernel/trace/trace_synth.h | 6 +-
5 files changed, 186 insertions(+), 25 deletions(-)

--
2.17.1


2020-09-29 20:35:43

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH 2/3] tracing: Fix parse_synth_field() error handling

synth_field_size() returns either the size or an error. However, the
code assigns the return val to ssize_t which is unsigned, and then
tests whether it's less than 0, which it isn't so discards the error.

Do the test before assignment to field->size.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_synth.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index a9cd7793f7ea..6e7282c7b530 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -465,6 +465,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
struct synth_field *field;
const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
int len, ret = 0;
+ int size;

if (field_type[0] == ';')
field_type++;
@@ -520,11 +521,12 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
field->type[len - 1] = '\0';
}

- field->size = synth_field_size(field->type);
- if (!field->size) {
+ size = synth_field_size(field->type);
+ if (size < 0) {
ret = -EINVAL;
goto free;
}
+ field->size = size;

if (synth_field_is_string(field->type))
field->is_string = true;
--
2.17.1

2020-09-29 20:36:16

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH 1/3] tracing: Change STR_VAR_MAX_LEN

32 is too small for this value, and anyway it makes more sense to use
MAX_FILTER_STR_VAL, as this is also the value used for variable-length
__strings.

Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_synth.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
index ac35c45207c4..5166705d1556 100644
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -7,7 +7,7 @@
#define SYNTH_SYSTEM "synthetic"
#define SYNTH_FIELDS_MAX 32

-#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
+#define STR_VAR_LEN_MAX MAX_FILTER_STR_VAL /* must be multiple of sizeof(u64) */

struct synth_field {
char *type;
--
2.17.1

2020-09-29 20:37:03

by Tom Zanussi

[permalink] [raw]
Subject: [PATCH 3/3] tracing: Add support for dynamic strings to synthetic events

Currently, sythetic events only support static string fields such as:

# echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events

Which is fine, but wastes a lot of space in the event.

It also prevents the most commonly-defined strings in the existing
trace events e.g. those defined using __string(), from being passed to
synthetic events via the trace() action.

With this change, synthetic events with dynamic fields can be defined:

# echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events

And the trace() action can be used to generate events using either
dynamic or static strings:

# echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events

The synthetic event dynamic strings are implemented in the same way as
the existing __data_loc strings and appear as such in the format file.

Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.rst | 15 ++-
Documentation/trace/histogram.rst | 18 ++++
kernel/trace/trace_events_hist.c | 9 ++
kernel/trace/trace_events_synth.c | 157 ++++++++++++++++++++++++++----
kernel/trace/trace_synth.h | 4 +
5 files changed, 181 insertions(+), 22 deletions(-)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f792b1959a33..2a5aa48eff6c 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -589,8 +589,19 @@ name::
{ .type = "int", .name = "my_int_field" },
};

-See synth_field_size() for available types. If field_name contains [n]
-the field is considered to be an array.
+See synth_field_size() for available types.
+
+If field_name contains [n], the field is considered to be a static array.
+
+If field_names contains[] (no subscript), the field is considered to
+be a dynamic array, which will only take as much space in the event as
+is required to hold the array.
+
+Because space for an event is reserved before assigning field values
+to the event, using dynamic arrays implies that the piecewise
+in-kernel API described below can't be used with dynamic arrays. The
+other non-piecewise in-kernel APIs can, however, be used with dynamic
+arrays.

If the event is created from within a module, a pointer to the module
must be passed to synth_event_create(). This will ensure that the
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 8408670d0328..b573604deabd 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1776,6 +1776,24 @@ consisting of the name of the new event along with one or more
variables and their types, which can be any valid field type,
separated by semicolons, to the tracing/synthetic_events file.

+See synth_field_size() for available types.
+
+If field_name contains [n], the field is considered to be a static array.
+
+If field_names contains[] (no subscript), the field is considered to
+be a dynamic array, which will only take as much space in the event as
+is required to hold the array.
+
+A string field can be specified using either the static notation:
+
+ char name[32];
+
+Or the dynamic:
+
+ char name[];
+
+The size limit for either is 256.
+
For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio. Each of those fields is simply a
variable reference to a variable on another event::
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1b2ef6490229..ee6b27c82350 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -3279,6 +3279,15 @@ static int check_synth_field(struct synth_event *event,

field = event->fields[field_pos];

+ /*
+ * A dynamic string synth field can accept static or
+ * dynamic. A static string synth field can only accept a
+ * same-sized static string, which is checked for later.
+ */
+ if (strstr(hist_field->type, "char[") && field->is_string
+ && field->is_dynamic)
+ return 0;
+
if (strcmp(field->type, hist_field->type) != 0) {
if (field->size != hist_field->size ||
field->is_signed != hist_field->is_signed)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 6e7282c7b530..00c6c61038d1 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -88,7 +88,7 @@ static int synth_event_define_fields(struct trace_event_call *call)

event->fields[i]->offset = n_u64;

- if (event->fields[i]->is_string) {
+ if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
offset += STR_VAR_LEN_MAX;
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
@@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
if (len > 3)
return -EINVAL;

+ if (len == 0)
+ return 0; /* variable-length string */
+
strncpy(buf, start, len);
buf[len] = '\0';

@@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,

/* parameter values */
if (se->fields[i]->is_string) {
- trace_seq_printf(s, print_fmt, se->fields[i]->name,
- (char *)&entry->fields[n_u64],
- i == se->n_fields - 1 ? "" : " ");
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ if (se->fields[i]->is_dynamic) {
+ u32 offset, data_offset;
+ char *str_field;
+
+ offset = (u32)entry->fields[n_u64];
+ data_offset = offset & 0xffff;
+
+ str_field = (char *)entry + data_offset;
+
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ str_field,
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64++;
+ } else {
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ (char *)&entry->fields[n_u64],
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ }
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
.trace = print_synth_event
};

+static unsigned int trace_string(struct synth_trace_event *entry,
+ struct synth_event *event,
+ char *str_val,
+ bool is_dynamic,
+ unsigned int data_size,
+ unsigned int *n_u64)
+{
+ unsigned int len = 0;
+ char *str_field;
+
+ if (is_dynamic) {
+ u32 data_offset;
+
+ data_offset = offsetof(typeof(*entry), fields);
+ data_offset += event->n_u64 * sizeof(u64);
+ data_offset += data_size;
+
+ str_field = (char *)entry + data_offset;
+
+ len = strlen(str_val) + 1;
+ strscpy(str_field, str_val, len);
+
+ data_offset |= len << 16;
+ *(u32 *)&entry->fields[*n_u64] = data_offset;
+
+ (*n_u64)++;
+ } else {
+ str_field = (char *)&entry->fields[*n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
+ }
+
+ return len;
+}
+
static notrace void trace_event_raw_event_synth(void *__data,
u64 *var_ref_vals,
unsigned int *var_ref_idx)
{
+ unsigned int i, n_u64, val_idx, len, data_size = 0;
struct trace_event_file *trace_file = __data;
struct synth_trace_event *entry;
struct trace_event_buffer fbuffer;
struct trace_buffer *buffer;
struct synth_event *event;
- unsigned int i, n_u64, val_idx;
int fields_size = 0;

event = trace_file->event_call->data;
@@ -344,6 +398,18 @@ static notrace void trace_event_raw_event_synth(void *__data,

fields_size = event->n_u64 * sizeof(u64);

+ for (i = 0; i < event->n_dynamic_fields; i++) {
+ unsigned int field_pos = event->dynamic_fields[i]->field_pos;
+ char *str_val;
+
+ val_idx = var_ref_idx[field_pos];
+ str_val = (char *)(long)var_ref_vals[val_idx];
+
+ len = strlen(str_val) + 1;
+
+ fields_size += len;
+ }
+
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
@@ -360,10 +426,11 @@ static notrace void trace_event_raw_event_synth(void *__data,
val_idx = var_ref_idx[i];
if (event->fields[i]->is_string) {
char *str_val = (char *)(long)var_ref_vals[val_idx];
- char *str_field = (char *)&entry->fields[n_u64];

- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(entry, event, str_val,
+ event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = event->fields[i];
u64 val = var_ref_vals[val_idx];
@@ -525,6 +592,27 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
if (size < 0) {
ret = -EINVAL;
goto free;
+ } else if (size == 0) {
+ if (synth_field_is_string(field->type)) {
+ char *type;
+
+ type = kzalloc(sizeof("__data_loc ") + strlen(field->type) + 1, GFP_KERNEL);
+ if (!type) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ strcat(type, "__data_loc ");
+ strcat(type, field->type);
+ kfree(field->type);
+ field->type = type;
+
+ field->is_dynamic = true;
+ size = sizeof(u64);
+ } else {
+ ret = -EINVAL;
+ goto free;
+ }
}
field->size = size;

@@ -532,7 +620,6 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
field->is_string = true;

field->is_signed = synth_field_signed(field->type);
-
out:
return field;
free:
@@ -663,6 +750,7 @@ static void free_synth_event(struct synth_event *event)
free_synth_field(event->fields[i]);

kfree(event->fields);
+ kfree(event->dynamic_fields);
kfree(event->name);
kfree(event->class.system);
free_synth_tracepoint(event->tp);
@@ -673,8 +761,8 @@ static void free_synth_event(struct synth_event *event)
static struct synth_event *alloc_synth_event(const char *name, int n_fields,
struct synth_field **fields)
{
+ unsigned int i, j, n_dynamic_fields = 0;
struct synth_event *event;
- unsigned int i;

event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event) {
@@ -696,11 +784,33 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields,
goto out;
}

+ for (i = 0; i < n_fields; i++)
+ if (fields[i]->is_dynamic)
+ n_dynamic_fields++;
+
+ if (n_dynamic_fields) {
+ event->dynamic_fields = kcalloc(n_dynamic_fields,
+ sizeof(*event->dynamic_fields),
+ GFP_KERNEL);
+ if (!event->dynamic_fields) {
+ free_synth_event(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ }
+
dyn_event_init(&event->devent, &synth_event_ops);

- for (i = 0; i < n_fields; i++)
+ for (i = 0, j = 0; i < n_fields; i++) {
event->fields[i] = fields[i];

+ if (fields[i]->is_dynamic) {
+ event->dynamic_fields[j] = fields[i];
+ event->dynamic_fields[j]->field_pos = i;
+ event->dynamic_fields[j++] = fields[i];
+ event->n_dynamic_fields++;
+ }
+ }
event->n_fields = n_fields;
out:
return event;
@@ -1276,8 +1386,8 @@ __synth_event_trace_end(struct synth_event_trace_state *trace_state)
*/
int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
{
+ unsigned int i, n_u64, len, data_size = 0;
struct synth_event_trace_state state;
- unsigned int i, n_u64;
va_list args;
int ret;

@@ -1301,10 +1411,11 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)

if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)val;
- char *str_field = (char *)&state.entry->fields[n_u64];

- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(state.entry, state.event, str_val,
+ state.event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = state.event->fields[i];

@@ -1357,8 +1468,8 @@ EXPORT_SYMBOL_GPL(synth_event_trace);
int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
unsigned int n_vals)
{
+ unsigned int i, n_u64, len, data_size = 0;
struct synth_event_trace_state state;
- unsigned int i, n_u64;
int ret;

ret = __synth_event_trace_start(file, &state);
@@ -1376,10 +1487,11 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)vals[i];
- char *str_field = (char *)&state.entry->fields[n_u64];

- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(state.entry, state.event, str_val,
+ state.event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = state.event->fields[i];
u64 val = vals[i];
@@ -1510,6 +1622,11 @@ static int __synth_event_add_val(const char *field_name, u64 val,
char *str_val = (char *)(long)val;
char *str_field;

+ if (field->is_dynamic) { /* add_val can't do dynamic strings */
+ ret = -EINVAL;
+ goto out;
+ }
+
if (!str_val) {
ret = -EINVAL;
goto out;
diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
index 5166705d1556..6e146b959dcd 100644
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -16,6 +16,8 @@ struct synth_field {
unsigned int offset;
bool is_signed;
bool is_string;
+ bool is_dynamic;
+ bool field_pos;
};

struct synth_event {
@@ -24,6 +26,8 @@ struct synth_event {
char *name;
struct synth_field **fields;
unsigned int n_fields;
+ struct synth_field **dynamic_fields;
+ unsigned int n_dynamic_fields;
unsigned int n_u64;
struct trace_event_class class;
struct trace_event_call call;
--
2.17.1

2020-09-29 22:01:34

by Axel Rasmussen

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing: Fix parse_synth_field() error handling

On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> synth_field_size() returns either the size or an error. However, the
> code assigns the return val to ssize_t which is unsigned, and then
> tests whether it's less than 0, which it isn't so discards the error.

I think the patch is correct, but the commit message is not.
field->size is a size_t (unsigned), not an ssize_t (signed). I think
this should say instead something like:

synth_field_size() returns either a positive size or an error (zero or
a negative value). However, the existing code assumes the only error
value is 0. It doesn't handle negative error codes, as it assigns
directly to field->size (a size_t; unsigned), thereby interpreting the
error code as a valid size instead.

>
> Do the test before assignment to field->size.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events_synth.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
> index a9cd7793f7ea..6e7282c7b530 100644
> --- a/kernel/trace/trace_events_synth.c
> +++ b/kernel/trace/trace_events_synth.c
> @@ -465,6 +465,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> struct synth_field *field;
> const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
> int len, ret = 0;
> + int size;

Why not make this an ssize_t

>
> if (field_type[0] == ';')
> field_type++;
> @@ -520,11 +521,12 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> field->type[len - 1] = '\0';
> }
>
> - field->size = synth_field_size(field->type);
> - if (!field->size) {
> + size = synth_field_size(field->type);
> + if (size < 0) {
> ret = -EINVAL;
> goto free;
> }
> + field->size = size;
>
> if (synth_field_is_string(field->type))
> field->is_string = true;
> --
> 2.17.1
>

2020-09-29 22:06:21

by Axel Rasmussen

[permalink] [raw]
Subject: Re: [PATCH 3/3] tracing: Add support for dynamic strings to synthetic events

On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> Currently, sythetic events only support static string fields such as:
>
> # echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events
>
> Which is fine, but wastes a lot of space in the event.
>
> It also prevents the most commonly-defined strings in the existing
> trace events e.g. those defined using __string(), from being passed to
> synthetic events via the trace() action.
>
> With this change, synthetic events with dynamic fields can be defined:
>
> # echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events
>
> And the trace() action can be used to generate events using either
> dynamic or static strings:
>
> # echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events
>
> The synthetic event dynamic strings are implemented in the same way as
> the existing __data_loc strings and appear as such in the format file.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> Documentation/trace/events.rst | 15 ++-
> Documentation/trace/histogram.rst | 18 ++++
> kernel/trace/trace_events_hist.c | 9 ++
> kernel/trace/trace_events_synth.c | 157 ++++++++++++++++++++++++++----
> kernel/trace/trace_synth.h | 4 +
> 5 files changed, 181 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
> index f792b1959a33..2a5aa48eff6c 100644
> --- a/Documentation/trace/events.rst
> +++ b/Documentation/trace/events.rst
> @@ -589,8 +589,19 @@ name::
> { .type = "int", .name = "my_int_field" },
> };
>
> -See synth_field_size() for available types. If field_name contains [n]
> -the field is considered to be an array.
> +See synth_field_size() for available types.
> +
> +If field_name contains [n], the field is considered to be a static array.
> +
> +If field_names contains[] (no subscript), the field is considered to
> +be a dynamic array, which will only take as much space in the event as
> +is required to hold the array.
> +
> +Because space for an event is reserved before assigning field values
> +to the event, using dynamic arrays implies that the piecewise
> +in-kernel API described below can't be used with dynamic arrays. The
> +other non-piecewise in-kernel APIs can, however, be used with dynamic
> +arrays.
>
> If the event is created from within a module, a pointer to the module
> must be passed to synth_event_create(). This will ensure that the
> diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
> index 8408670d0328..b573604deabd 100644
> --- a/Documentation/trace/histogram.rst
> +++ b/Documentation/trace/histogram.rst
> @@ -1776,6 +1776,24 @@ consisting of the name of the new event along with one or more
> variables and their types, which can be any valid field type,
> separated by semicolons, to the tracing/synthetic_events file.
>
> +See synth_field_size() for available types.
> +
> +If field_name contains [n], the field is considered to be a static array.
> +
> +If field_names contains[] (no subscript), the field is considered to
> +be a dynamic array, which will only take as much space in the event as
> +is required to hold the array.
> +
> +A string field can be specified using either the static notation:
> +
> + char name[32];
> +
> +Or the dynamic:
> +
> + char name[];
> +
> +The size limit for either is 256.
> +
> For instance, the following creates a new event named 'wakeup_latency'
> with 3 fields: lat, pid, and prio. Each of those fields is simply a
> variable reference to a variable on another event::
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 1b2ef6490229..ee6b27c82350 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -3279,6 +3279,15 @@ static int check_synth_field(struct synth_event *event,
>
> field = event->fields[field_pos];
>
> + /*
> + * A dynamic string synth field can accept static or
> + * dynamic. A static string synth field can only accept a
> + * same-sized static string, which is checked for later.
> + */
> + if (strstr(hist_field->type, "char[") && field->is_string
> + && field->is_dynamic)
> + return 0;
> +
> if (strcmp(field->type, hist_field->type) != 0) {
> if (field->size != hist_field->size ||
> field->is_signed != hist_field->is_signed)
> diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
> index 6e7282c7b530..00c6c61038d1 100644
> --- a/kernel/trace/trace_events_synth.c
> +++ b/kernel/trace/trace_events_synth.c
> @@ -88,7 +88,7 @@ static int synth_event_define_fields(struct trace_event_call *call)
>
> event->fields[i]->offset = n_u64;
>
> - if (event->fields[i]->is_string) {
> + if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
> offset += STR_VAR_LEN_MAX;
> n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> } else {
> @@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
> if (len > 3)
> return -EINVAL;
>
> + if (len == 0)
> + return 0; /* variable-length string */
> +
> strncpy(buf, start, len);
> buf[len] = '\0';
>
> @@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
>
> /* parameter values */
> if (se->fields[i]->is_string) {
> - trace_seq_printf(s, print_fmt, se->fields[i]->name,
> - (char *)&entry->fields[n_u64],
> - i == se->n_fields - 1 ? "" : " ");
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + if (se->fields[i]->is_dynamic) {
> + u32 offset, data_offset;
> + char *str_field;
> +
> + offset = (u32)entry->fields[n_u64];
> + data_offset = offset & 0xffff;
> +
> + str_field = (char *)entry + data_offset;

Is it better to re-use __get_str from include/trace/trace_events.h
instead of writing this out directly?

> +
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + str_field,
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64++;
> + } else {
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + (char *)&entry->fields[n_u64],
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + }
> } else {
> struct trace_print_flags __flags[] = {
> __def_gfpflag_names, {-1, NULL} };
> @@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
> .trace = print_synth_event
> };
>
> +static unsigned int trace_string(struct synth_trace_event *entry,
> + struct synth_event *event,
> + char *str_val,
> + bool is_dynamic,
> + unsigned int data_size,
> + unsigned int *n_u64)
> +{
> + unsigned int len = 0;
> + char *str_field;
> +
> + if (is_dynamic) {
> + u32 data_offset;
> +
> + data_offset = offsetof(typeof(*entry), fields);
> + data_offset += event->n_u64 * sizeof(u64);
> + data_offset += data_size;
> +
> + str_field = (char *)entry + data_offset;
> +
> + len = strlen(str_val) + 1;
> + strscpy(str_field, str_val, len);
> +
> + data_offset |= len << 16;
> + *(u32 *)&entry->fields[*n_u64] = data_offset;

Similar thing here, is it possible to reuse __dynamic_array or __string?

> +
> + (*n_u64)++;
> + } else {
> + str_field = (char *)&entry->fields[*n_u64];
> +
> + strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> + (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
> + }
> +
> + return len;
> +}
> +
> static notrace void trace_event_raw_event_synth(void *__data,
> u64 *var_ref_vals,
> unsigned int *var_ref_idx)
> {
> + unsigned int i, n_u64, val_idx, len, data_size = 0;
> struct trace_event_file *trace_file = __data;
> struct synth_trace_event *entry;
> struct trace_event_buffer fbuffer;
> struct trace_buffer *buffer;
> struct synth_event *event;
> - unsigned int i, n_u64, val_idx;
> int fields_size = 0;
>
> event = trace_file->event_call->data;
> @@ -344,6 +398,18 @@ static notrace void trace_event_raw_event_synth(void *__data,
>
> fields_size = event->n_u64 * sizeof(u64);
>
> + for (i = 0; i < event->n_dynamic_fields; i++) {
> + unsigned int field_pos = event->dynamic_fields[i]->field_pos;
> + char *str_val;
> +
> + val_idx = var_ref_idx[field_pos];
> + str_val = (char *)(long)var_ref_vals[val_idx];
> +
> + len = strlen(str_val) + 1;
> +
> + fields_size += len;
> + }
> +
> /*
> * Avoid ring buffer recursion detection, as this event
> * is being performed within another event.
> @@ -360,10 +426,11 @@ static notrace void trace_event_raw_event_synth(void *__data,
> val_idx = var_ref_idx[i];
> if (event->fields[i]->is_string) {
> char *str_val = (char *)(long)var_ref_vals[val_idx];
> - char *str_field = (char *)&entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(entry, event, str_val,
> + event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = event->fields[i];
> u64 val = var_ref_vals[val_idx];
> @@ -525,6 +592,27 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> if (size < 0) {
> ret = -EINVAL;
> goto free;
> + } else if (size == 0) {
> + if (synth_field_is_string(field->type)) {
> + char *type;
> +
> + type = kzalloc(sizeof("__data_loc ") + strlen(field->type) + 1, GFP_KERNEL);
> + if (!type) {
> + ret = -ENOMEM;
> + goto free;
> + }
> +
> + strcat(type, "__data_loc ");
> + strcat(type, field->type);
> + kfree(field->type);
> + field->type = type;
> +
> + field->is_dynamic = true;
> + size = sizeof(u64);
> + } else {
> + ret = -EINVAL;
> + goto free;
> + }
> }
> field->size = size;
>
> @@ -532,7 +620,6 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> field->is_string = true;
>
> field->is_signed = synth_field_signed(field->type);
> -
> out:
> return field;
> free:
> @@ -663,6 +750,7 @@ static void free_synth_event(struct synth_event *event)
> free_synth_field(event->fields[i]);
>
> kfree(event->fields);
> + kfree(event->dynamic_fields);
> kfree(event->name);
> kfree(event->class.system);
> free_synth_tracepoint(event->tp);
> @@ -673,8 +761,8 @@ static void free_synth_event(struct synth_event *event)
> static struct synth_event *alloc_synth_event(const char *name, int n_fields,
> struct synth_field **fields)
> {
> + unsigned int i, j, n_dynamic_fields = 0;
> struct synth_event *event;
> - unsigned int i;
>
> event = kzalloc(sizeof(*event), GFP_KERNEL);
> if (!event) {
> @@ -696,11 +784,33 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields,
> goto out;
> }
>
> + for (i = 0; i < n_fields; i++)
> + if (fields[i]->is_dynamic)
> + n_dynamic_fields++;
> +
> + if (n_dynamic_fields) {
> + event->dynamic_fields = kcalloc(n_dynamic_fields,
> + sizeof(*event->dynamic_fields),
> + GFP_KERNEL);
> + if (!event->dynamic_fields) {
> + free_synth_event(event);
> + event = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> + }
> +
> dyn_event_init(&event->devent, &synth_event_ops);
>
> - for (i = 0; i < n_fields; i++)
> + for (i = 0, j = 0; i < n_fields; i++) {
> event->fields[i] = fields[i];
>
> + if (fields[i]->is_dynamic) {
> + event->dynamic_fields[j] = fields[i];
> + event->dynamic_fields[j]->field_pos = i;
> + event->dynamic_fields[j++] = fields[i];
> + event->n_dynamic_fields++;
> + }
> + }
> event->n_fields = n_fields;
> out:
> return event;
> @@ -1276,8 +1386,8 @@ __synth_event_trace_end(struct synth_event_trace_state *trace_state)
> */
> int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
> {
> + unsigned int i, n_u64, len, data_size = 0;
> struct synth_event_trace_state state;
> - unsigned int i, n_u64;
> va_list args;
> int ret;
>
> @@ -1301,10 +1411,11 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
>
> if (state.event->fields[i]->is_string) {
> char *str_val = (char *)(long)val;
> - char *str_field = (char *)&state.entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(state.entry, state.event, str_val,
> + state.event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = state.event->fields[i];
>
> @@ -1357,8 +1468,8 @@ EXPORT_SYMBOL_GPL(synth_event_trace);
> int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
> unsigned int n_vals)
> {
> + unsigned int i, n_u64, len, data_size = 0;
> struct synth_event_trace_state state;
> - unsigned int i, n_u64;
> int ret;
>
> ret = __synth_event_trace_start(file, &state);
> @@ -1376,10 +1487,11 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
> for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
> if (state.event->fields[i]->is_string) {
> char *str_val = (char *)(long)vals[i];
> - char *str_field = (char *)&state.entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(state.entry, state.event, str_val,
> + state.event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = state.event->fields[i];
> u64 val = vals[i];
> @@ -1510,6 +1622,11 @@ static int __synth_event_add_val(const char *field_name, u64 val,
> char *str_val = (char *)(long)val;
> char *str_field;
>
> + if (field->is_dynamic) { /* add_val can't do dynamic strings */
> + ret = -EINVAL;
> + goto out;
> + }
> +
> if (!str_val) {
> ret = -EINVAL;
> goto out;
> diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
> index 5166705d1556..6e146b959dcd 100644
> --- a/kernel/trace/trace_synth.h
> +++ b/kernel/trace/trace_synth.h
> @@ -16,6 +16,8 @@ struct synth_field {
> unsigned int offset;
> bool is_signed;
> bool is_string;
> + bool is_dynamic;
> + bool field_pos;
> };
>
> struct synth_event {
> @@ -24,6 +26,8 @@ struct synth_event {
> char *name;
> struct synth_field **fields;
> unsigned int n_fields;
> + struct synth_field **dynamic_fields;
> + unsigned int n_dynamic_fields;
> unsigned int n_u64;
> struct trace_event_class class;
> struct trace_event_call call;
> --
> 2.17.1
>

2020-09-29 22:11:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 3/3] tracing: Add support for dynamic strings to synthetic events

On Tue, 29 Sep 2020 15:01:59 -0700
Axel Rasmussen <[email protected]> wrote:

> > event->fields[i]->offset = n_u64;
> >
> > - if (event->fields[i]->is_string) {
> > + if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
> > offset += STR_VAR_LEN_MAX;
> > n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > } else {
> > @@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
> > if (len > 3)
> > return -EINVAL;
> >
> > + if (len == 0)
> > + return 0; /* variable-length string */
> > +
> > strncpy(buf, start, len);
> > buf[len] = '\0';
> >
> > @@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
> >
> > /* parameter values */
> > if (se->fields[i]->is_string) {
> > - trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > - (char *)&entry->fields[n_u64],
> > - i == se->n_fields - 1 ? "" : " ");
> > - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + if (se->fields[i]->is_dynamic) {
> > + u32 offset, data_offset;
> > + char *str_field;
> > +
> > + offset = (u32)entry->fields[n_u64];
> > + data_offset = offset & 0xffff;
> > +
> > + str_field = (char *)entry + data_offset;
>
> Is it better to re-use __get_str from include/trace/trace_events.h
> instead of writing this out directly?
>
> > +
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + str_field,
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64++;
> > + } else {
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + (char *)&entry->fields[n_u64],
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + }
> > } else {
> > struct trace_print_flags __flags[] = {
> > __def_gfpflag_names, {-1, NULL} };
> > @@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
> > .trace = print_synth_event
> > };
> >
> > +static unsigned int trace_string(struct synth_trace_event *entry,
> > + struct synth_event *event,
> > + char *str_val,
> > + bool is_dynamic,
> > + unsigned int data_size,
> > + unsigned int *n_u64)
> > +{
> > + unsigned int len = 0;
> > + char *str_field;
> > +
> > + if (is_dynamic) {
> > + u32 data_offset;
> > +
> > + data_offset = offsetof(typeof(*entry), fields);
> > + data_offset += event->n_u64 * sizeof(u64);
> > + data_offset += data_size;
> > +
> > + str_field = (char *)entry + data_offset;
> > +
> > + len = strlen(str_val) + 1;
> > + strscpy(str_field, str_val, len);
> > +
> > + data_offset |= len << 16;
> > + *(u32 *)&entry->fields[*n_u64] = data_offset;
>
> Similar thing here, is it possible to reuse __dynamic_array or __string?

Interesting idea.

I'd prefer to keep it broken out for this patch set, and then we could look
at incorporating this file to use the macros of <trace/trace_event.h>.

But I much rather have that be a separate patch that does that as an
enhancement than to include it in this file doing more work.

-- Steve


>
> > +
> > + (*n_u64)++;
> > + } else {
> > + str_field = (char *)&entry->fields[*n_u64];
> > +
> > + strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> > + (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
> > + }
> > +
> > + return len;
> > +}
> > +
> > static n

2020-09-29 22:21:27

by Axel Rasmussen

[permalink] [raw]
Subject: Re: [PATCH 0/3] tracing: Add dynamic strings for synthetic events

On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> Hi,
>
> This patchset adds support for dynamic strings for synthetic events,
> as requested by Axel Rasmussen.
>
> Actually, the first two patches should be applied in any case - the
> first just changes the current max string length and the second fixes
> a bug I found while testing.
>
> It works fine for my simple test cases, but I still need to do a lot
> more testing, especially of the in-kernel API parts, which don't
> affect Axel's use case.
>
> Anyway, Axel, please try it out and send me your Tested-by: if it
> works for you, and your broken testcase(s) if it doesn't. ;-)

Thanks for working on this!

Capturing a histogram with a histogram trigger on the synthetic
event[1] seems to work as expected for me.

However, I tried creating the synthetic event, and then capturing a
histogram from it using bpftrace [2], and this doesn't seem to work.
It's completely possible I'm just doing something wrong, or the
bpftrace I'm using is buggy (getting it to work with __data_loc fields
at all requires a super bleeding edge bpftrace [3]). bpftrace reports
that it fails to attach to the synthetic event, and I hit some
WARN_ONs [4].

Does it seem possible this issue is related? If not, feel free to
disregard. :) I'll spend some more time digging into this tomorrow.



[1]:
echo 'mmap_lock_latency u64 time; char memcg_path[], bool write' > \
/sys/kernel/tracing/synthetic_events

echo 'hist:keys=memcg_path,write:ts0=common_timestamp.usecs' > \
/sys/kernel/tracing/events/mmap_lock/mmap_lock_start_locking/trigger

echo 'hist:keys=memcg_path,write:latency=common_timestamp.usecs-$ts0:onmatch(mmap_lock.mmap_lock_start_locking).mmap_lock_latency($latency,memcg_path,write)'
> \
/sys/kernel/tracing/events/mmap_lock/mmap_lock_acquire_returned/trigger

echo 'hist:keys=memcg_path,write:values=time' > \
/sys/kernel/tracing/events/synthetic/mmap_lock_latency/trigger

[2]:
(Same setup of the synthetic event as [1], except adding the histogram
trigger...)

cat <<EOF > /root/bpftypes.h
#include <stdbool.h>
#include <stdint.h>
typedef uint64_t u64;
typedef int64_t s64;
EOF

cat <<EOF > /root/mmap-lock-per-memcg.bpf
tracepoint:synthetic:mmap_lock_latency {
@latency[args->memcg_path] = hist(args->time);
}
EOF

/home/axelrasmussen/bpftrace/build/src/bpftrace \
-v \
-I "/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include" \
--include "/root/bpftypes.h" \
/root/mmap-lock-per-memcg.bpf

[3]: https://github.com/iovisor/bpftrace/pull/1542

[4]:
[ 31.361814] kauditd_printk_skb: 6 callbacks suppressed
[ 31.361815] audit: type=1334 audit(1601414765.236:34): prog-id=11 op=LOAD

From `WARN_ON(!tp_func->func)`:

[ 31.365310] ------------[ cut here ]------------
[ 31.365316] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:136
tracepoint_add_func+0x2ad/0x300
[ 31.366594] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_crypto crypto_engine virtio_console
virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
[ 31.366603] CPU: 0 PID: 235 Comm: bpftrace Not tainted 5.9.0-smp-DEV #1
[ 31.366604] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1 04/01/2014
[ 31.366606] RIP: 0010:tracepoint_add_func+0x2ad/0x300
[ 31.366607] Code: 17 ff ff ff 4c 89 ff e8 e1 0c 0a 00 e9 22 ff ff
ff 41 ff 57 18 85 c0 4c 8b 04 24 8b 54 24 08 0f 89 9c fd ff ff e9 31
ff ff ff <0f> 0b b8 ea ff ff ff e9 f9 fd5
[ 31.366607] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
[ 31.366608] RAX: 0000000000000000 RBX: ffff9115583c1200 RCX: 0000000000000000
[ 31.366609] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff9115583c1200
[ 31.366609] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09: ffffcaabffdb5018
[ 31.366610] R10: 0000000000000080 R11: 0000000000000080 R12: ffff911558013c90
[ 31.366610] R13: 0000000000000000 R14: ffff911554f84d10 R15: ffff9115583c1200
[ 31.366611] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
knlGS:0000000000000000
[ 31.366612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.366614] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4: 0000000000370ef0
[ 31.366615] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 31.366615] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 31.366616] Call Trace:
[ 31.366620] tracepoint_probe_register+0x58/0x90
[ 31.366625] trace_event_reg+0x56/0xa0
[ 31.366626] perf_trace_event_reg.isra.2+0xba/0x130
[ 31.366627] perf_trace_event_init+0x32/0x70
[ 31.366628] perf_trace_init+0x6e/0xa0
[ 31.366631] perf_tp_event_init+0x28/0x40
[ 31.366632] perf_try_init_event+0xe7/0x130
[ 31.366633] perf_event_alloc+0x4ba/0xdc0
[ 31.366635] ? __alloc_fd+0x40/0x160
[ 31.366637] __do_sys_perf_event_open+0x2bb/0xe90
[ 31.366638] __x64_sys_perf_event_open+0x20/0x30
[ 31.366641] do_syscall_64+0x31/0x40
[ 31.366644] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 31.366645] RIP: 0033:0x7f9be4886d5d
[ 31.366646] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
[ 31.366646] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
000000000000012a
[ 31.366647] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX: 00007f9be4886d5d
[ 31.366647] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00007ffdcf6e4d80
[ 31.366648] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09: 000056095c148578
[ 31.366651] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000000b
[ 31.366652] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15: 00007ffdcf6e4e00
[ 31.366653] ---[ end trace 25e0e681c1442c93 ]---

From `WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);`:

[ 31.367345] ------------[ cut here ]------------
[ 31.367347] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:243
tracepoint_add_func+0xb2/0x300
[ 31.368612] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_crypto crypto_engine virtio_console
virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
[ 31.368615] CPU: 0 PID: 235 Comm: bpftrace Tainted: G W
5.9.0-smp-DEV #1
[ 31.368615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1 04/01/2014
[ 31.368617] RIP: 0010:tracepoint_add_func+0xb2/0x300
[ 31.368617] Code: c6 48 8b 01 48 85 c0 74 28 41 83 fd ff 75 df 3b
51 10 44 0f 4f eb 48 39 c6 75 d8 49 8b 40 08 48 39 41 08 75 ce b8 ef
ff ff ff <0f> 0b e9 25 01 00 00 8d 43 02d
[ 31.368618] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
[ 31.368618] RAX: 00000000ffffffea RBX: ffff9115583c1200 RCX: 0000000000000000
[ 31.368619] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff9115583c1200
[ 31.368619] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09: ffffcaabffdb5018
[ 31.368620] R10: 0000000000000080 R11: 0000000000000080 R12: ffff911558013c90
[ 31.368620] R13: 0000000000000000 R14: ffff911554f84d10 R15: ffff9115583c1200
[ 31.368621] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
knlGS:0000000000000000
[ 31.368621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.368623] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4: 0000000000370ef0
[ 31.368624] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 31.368624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 31.368625] Call Trace:
[ 31.368626] tracepoint_probe_register+0x58/0x90
[ 31.368627] trace_event_reg+0x56/0xa0
[ 31.368628] perf_trace_event_reg.isra.2+0xba/0x130
[ 31.368629] perf_trace_event_init+0x32/0x70
[ 31.368630] perf_trace_init+0x6e/0xa0
[ 31.368631] perf_tp_event_init+0x28/0x40
[ 31.368632] perf_try_init_event+0xe7/0x130
[ 31.368632] perf_event_alloc+0x4ba/0xdc0
[ 31.368634] ? __alloc_fd+0x40/0x160
[ 31.368635] __do_sys_perf_event_open+0x2bb/0xe90
[ 31.368636] __x64_sys_perf_event_open+0x20/0x30
[ 31.368637] do_syscall_64+0x31/0x40
[ 31.368638] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 31.368639] RIP: 0033:0x7f9be4886d5d
[ 31.368639] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
[ 31.368640] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
000000000000012a
[ 31.368640] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX: 00007f9be4886d5d
[ 31.368641] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00007ffdcf6e4d80
[ 31.368641] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09: 000056095c148578
[ 31.368642] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000000b
[ 31.368642] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15: 00007ffdcf6e4e00
[ 31.368643] ---[ end trace 25e0e681c1442c94 ]---
[ 31.377271] audit: type=1334 audit(1601414765.252:35): prog-id=11 op=UNLOAD

>
> Thanks,
>
> Tom
>
> The following changes since commit 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:
>
> mmap_lock: add tracepoints around lock acquisition (2020-09-23 08:48:08 -0500)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-trace.git ftrace/synth-dynstring-v0
>
> Tom Zanussi (3):
> tracing: Change STR_VAR_MAX_LEN
> tracing: Fix parse_synth_field() error handling
> tracing: Add support for dynamic strings to synthetic events
>
> Documentation/trace/events.rst | 15 ++-
> Documentation/trace/histogram.rst | 18 ++++
> kernel/trace/trace_events_hist.c | 9 ++
> kernel/trace/trace_events_synth.c | 163 ++++++++++++++++++++++++++----
> kernel/trace/trace_synth.h | 6 +-
> 5 files changed, 186 insertions(+), 25 deletions(-)
>
> --
> 2.17.1
>

2020-09-30 12:12:27

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH 0/3] tracing: Add dynamic strings for synthetic events

Hi Axel,

On Tue, 2020-09-29 at 15:19 -0700, Axel Rasmussen wrote:
> On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]>
> wrote:
> >
> > Hi,
> >
> > This patchset adds support for dynamic strings for synthetic
> > events,
> > as requested by Axel Rasmussen.
> >
> > Actually, the first two patches should be applied in any case - the
> > first just changes the current max string length and the second
> > fixes
> > a bug I found while testing.
> >
> > It works fine for my simple test cases, but I still need to do a
> > lot
> > more testing, especially of the in-kernel API parts, which don't
> > affect Axel's use case.
> >
> > Anyway, Axel, please try it out and send me your Tested-by: if it
> > works for you, and your broken testcase(s) if it doesn't. ;-)
>
> Thanks for working on this!
>
> Capturing a histogram with a histogram trigger on the synthetic
> event[1] seems to work as expected for me.
>
> However, I tried creating the synthetic event, and then capturing a
> histogram from it using bpftrace [2], and this doesn't seem to work.
> It's completely possible I'm just doing something wrong, or the
> bpftrace I'm using is buggy (getting it to work with __data_loc
> fields
> at all requires a super bleeding edge bpftrace [3]). bpftrace reports
> that it fails to attach to the synthetic event, and I hit some
> WARN_ONs [4].
>
> Does it seem possible this issue is related? If not, feel free to
> disregard. :) I'll spend some more time digging into this tomorrow.
>

It's possible, if the bpf stuff is using the in-kernel trace events
API, the part that I mentioned I hadn't tested yet :-/. There's one
important piece missing for that in this patch - I'll be working on
adding it and posting a v2 today.

Tom

>
>
> [1]:
> echo 'mmap_lock_latency u64 time; char memcg_path[], bool write' > \
> /sys/kernel/tracing/synthetic_events
>
> echo 'hist:keys=memcg_path,write:ts0=common_timestamp.usecs' > \
> /sys/kernel/tracing/events/mmap_lock/mmap_lock_start_locking/
> trigger
>
> echo 'hist:keys=memcg_path,write:latency=common_timestamp.usecs-
> $ts0:onmatch(mmap_lock.mmap_lock_start_locking).mmap_lock_latency($la
> tency,memcg_path,write)'
> > \
>
> /sys/kernel/tracing/events/mmap_lock/mmap_lock_acquire_return
> ed/trigger
>
> echo 'hist:keys=memcg_path,write:values=time' > \
> /sys/kernel/tracing/events/synthetic/mmap_lock_latency/trigge
> r
>
> [2]:
> (Same setup of the synthetic event as [1], except adding the
> histogram
> trigger...)
>
> cat <<EOF > /root/bpftypes.h
> #include <stdbool.h>
> #include <stdint.h>
> typedef uint64_t u64;
> typedef int64_t s64;
> EOF
>
> cat <<EOF > /root/mmap-lock-per-memcg.bpf
> tracepoint:synthetic:mmap_lock_latency {
> @latency[args->memcg_path] = hist(args->time);
> }
> EOF
>
> /home/axelrasmussen/bpftrace/build/src/bpftrace \
> -v \
> -I "/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include" \
> --include "/root/bpftypes.h" \
> /root/mmap-lock-per-memcg.bpf
>
> [3]: https://github.com/iovisor/bpftrace/pull/1542
>
> [4]:
> [ 31.361814] kauditd_printk_skb: 6 callbacks suppressed
> [ 31.361815] audit: type=1334 audit(1601414765.236:34): prog-id=11
> op=LOAD
>
> From `WARN_ON(!tp_func->func)`:
>
> [ 31.365310] ------------[ cut here ]------------
> [ 31.365316] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:136
> tracepoint_add_func+0x2ad/0x300
> [ 31.366594] Modules linked in: xhci_pci xhci_hcd virtio_net
> net_failover failover virtio_crypto crypto_engine virtio_console
> virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
> [ 31.366603] CPU: 0 PID: 235 Comm: bpftrace Not tainted 5.9.0-smp-
> DEV #1
> [ 31.366604] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1 04/01/2014
> [ 31.366606] RIP: 0010:tracepoint_add_func+0x2ad/0x300
> [ 31.366607] Code: 17 ff ff ff 4c 89 ff e8 e1 0c 0a 00 e9 22 ff ff
> ff 41 ff 57 18 85 c0 4c 8b 04 24 8b 54 24 08 0f 89 9c fd ff ff e9 31
> ff ff ff <0f> 0b b8 ea ff ff ff e9 f9 fd5
> [ 31.366607] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
> [ 31.366608] RAX: 0000000000000000 RBX: ffff9115583c1200 RCX:
> 0000000000000000
> [ 31.366609] RDX: 000000000000000a RSI: 0000000000000000 RDI:
> ffff9115583c1200
> [ 31.366609] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09:
> ffffcaabffdb5018
> [ 31.366610] R10: 0000000000000080 R11: 0000000000000080 R12:
> ffff911558013c90
> [ 31.366610] R13: 0000000000000000 R14: ffff911554f84d10 R15:
> ffff9115583c1200
> [ 31.366611] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
> knlGS:0000000000000000
> [ 31.366612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.366614] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4:
> 0000000000370ef0
> [ 31.366615] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 31.366615] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 31.366616] Call Trace:
> [ 31.366620] tracepoint_probe_register+0x58/0x90
> [ 31.366625] trace_event_reg+0x56/0xa0
> [ 31.366626] perf_trace_event_reg.isra.2+0xba/0x130
> [ 31.366627] perf_trace_event_init+0x32/0x70
> [ 31.366628] perf_trace_init+0x6e/0xa0
> [ 31.366631] perf_tp_event_init+0x28/0x40
> [ 31.366632] perf_try_init_event+0xe7/0x130
> [ 31.366633] perf_event_alloc+0x4ba/0xdc0
> [ 31.366635] ? __alloc_fd+0x40/0x160
> [ 31.366637] __do_sys_perf_event_open+0x2bb/0xe90
> [ 31.366638] __x64_sys_perf_event_open+0x20/0x30
> [ 31.366641] do_syscall_64+0x31/0x40
> [ 31.366644] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 31.366645] RIP: 0033:0x7f9be4886d5d
> [ 31.366646] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
> [ 31.366646] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
> 000000000000012a
> [ 31.366647] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX:
> 00007f9be4886d5d
> [ 31.366647] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> 00007ffdcf6e4d80
> [ 31.366648] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09:
> 000056095c148578
> [ 31.366651] R10: 00000000ffffffff R11: 0000000000000246 R12:
> 000000000000000b
> [ 31.366652] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15:
> 00007ffdcf6e4e00
> [ 31.366653] ---[ end trace 25e0e681c1442c93 ]---
>
> From `WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);`:
>
> [ 31.367345] ------------[ cut here ]------------
> [ 31.367347] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:243
> tracepoint_add_func+0xb2/0x300
> [ 31.368612] Modules linked in: xhci_pci xhci_hcd virtio_net
> net_failover failover virtio_crypto crypto_engine virtio_console
> virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
> [ 31.368615] CPU: 0 PID: 235 Comm: bpftrace Tainted: G W
> 5.9.0-smp-DEV #1
> [ 31.368615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1 04/01/2014
> [ 31.368617] RIP: 0010:tracepoint_add_func+0xb2/0x300
> [ 31.368617] Code: c6 48 8b 01 48 85 c0 74 28 41 83 fd ff 75 df 3b
> 51 10 44 0f 4f eb 48 39 c6 75 d8 49 8b 40 08 48 39 41 08 75 ce b8 ef
> ff ff ff <0f> 0b e9 25 01 00 00 8d 43 02d
> [ 31.368618] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
> [ 31.368618] RAX: 00000000ffffffea RBX: ffff9115583c1200 RCX:
> 0000000000000000
> [ 31.368619] RDX: 000000000000000a RSI: 0000000000000000 RDI:
> ffff9115583c1200
> [ 31.368619] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09:
> ffffcaabffdb5018
> [ 31.368620] R10: 0000000000000080 R11: 0000000000000080 R12:
> ffff911558013c90
> [ 31.368620] R13: 0000000000000000 R14: ffff911554f84d10 R15:
> ffff9115583c1200
> [ 31.368621] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
> knlGS:0000000000000000
> [ 31.368621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.368623] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4:
> 0000000000370ef0
> [ 31.368624] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 31.368624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 31.368625] Call Trace:
> [ 31.368626] tracepoint_probe_register+0x58/0x90
> [ 31.368627] trace_event_reg+0x56/0xa0
> [ 31.368628] perf_trace_event_reg.isra.2+0xba/0x130
> [ 31.368629] perf_trace_event_init+0x32/0x70
> [ 31.368630] perf_trace_init+0x6e/0xa0
> [ 31.368631] perf_tp_event_init+0x28/0x40
> [ 31.368632] perf_try_init_event+0xe7/0x130
> [ 31.368632] perf_event_alloc+0x4ba/0xdc0
> [ 31.368634] ? __alloc_fd+0x40/0x160
> [ 31.368635] __do_sys_perf_event_open+0x2bb/0xe90
> [ 31.368636] __x64_sys_perf_event_open+0x20/0x30
> [ 31.368637] do_syscall_64+0x31/0x40
> [ 31.368638] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 31.368639] RIP: 0033:0x7f9be4886d5d
> [ 31.368639] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
> [ 31.368640] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
> 000000000000012a
> [ 31.368640] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX:
> 00007f9be4886d5d
> [ 31.368641] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> 00007ffdcf6e4d80
> [ 31.368641] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09:
> 000056095c148578
> [ 31.368642] R10: 00000000ffffffff R11: 0000000000000246 R12:
> 000000000000000b
> [ 31.368642] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15:
> 00007ffdcf6e4e00
> [ 31.368643] ---[ end trace 25e0e681c1442c94 ]---
> [ 31.377271] audit: type=1334 audit(1601414765.252:35): prog-id=11
> op=UNLOAD
>
> >
> > Thanks,
> >
> > Tom
> >
> > The following changes since commit
> > 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:
> >
> > mmap_lock: add tracepoints around lock acquisition (2020-09-23
> > 08:48:08 -0500)
> >
> > are available in the Git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-
> > trace.git ftrace/synth-dynstring-v0
> >
> > Tom Zanussi (3):
> > tracing: Change STR_VAR_MAX_LEN
> > tracing: Fix parse_synth_field() error handling
> > tracing: Add support for dynamic strings to synthetic events
> >
> > Documentation/trace/events.rst | 15 ++-
> > Documentation/trace/histogram.rst | 18 ++++
> > kernel/trace/trace_events_hist.c | 9 ++
> > kernel/trace/trace_events_synth.c | 163
> > ++++++++++++++++++++++++++----
> > kernel/trace/trace_synth.h | 6 +-
> > 5 files changed, 186 insertions(+), 25 deletions(-)
> >
> > --
> > 2.17.1
> >

2020-09-30 18:42:26

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH 2/3] tracing: Fix parse_synth_field() error handling

Hi Axel,

On Tue, 2020-09-29 at 14:56 -0700, Axel Rasmussen wrote:
> On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]>
> wrote:
> >
> > synth_field_size() returns either the size or an error. However,
> > the
> > code assigns the return val to ssize_t which is unsigned, and then
> > tests whether it's less than 0, which it isn't so discards the
> > error.
>
> I think the patch is correct, but the commit message is not.
> field->size is a size_t (unsigned), not an ssize_t (signed). I think
> this should say instead something like:
>
> synth_field_size() returns either a positive size or an error (zero
> or
> a negative value). However, the existing code assumes the only error
> value is 0. It doesn't handle negative error codes, as it assigns
> directly to field->size (a size_t; unsigned), thereby interpreting
> the
> error code as a valid size instead.
>

Yes, that's better - I used the above text in v2.

> >
> > Do the test before assignment to field->size.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > kernel/trace/trace_events_synth.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/trace_events_synth.c
> > b/kernel/trace/trace_events_synth.c
> > index a9cd7793f7ea..6e7282c7b530 100644
> > --- a/kernel/trace/trace_events_synth.c
> > +++ b/kernel/trace/trace_events_synth.c
> > @@ -465,6 +465,7 @@ static struct synth_field
> > *parse_synth_field(int argc, const char **argv,
> > struct synth_field *field;
> > const char *prefix = NULL, *field_type = argv[0],
> > *field_name, *array;
> > int len, ret = 0;
> > + int size;
>
> Why not make this an ssize_t

Yep, makes sense. Changed in v2, thanks!

Tom

>
> >
> > if (field_type[0] == ';')
> > field_type++;
> > @@ -520,11 +521,12 @@ static struct synth_field
> > *parse_synth_field(int argc, const char **argv,
> > field->type[len - 1] = '\0';
> > }
> >
> > - field->size = synth_field_size(field->type);
> > - if (!field->size) {
> > + size = synth_field_size(field->type);
> > + if (size < 0) {
> > ret = -EINVAL;
> > goto free;
> > }
> > + field->size = size;
> >
> > if (synth_field_is_string(field->type))
> > field->is_string = true;
> > --
> > 2.17.1
> >