Hi,
This patchset adds support for dynamic strings for synthetic events,
as requested by Axel Rasmussen.
Actually, the first two patches should be applied in any case - the
first just changes the current max string length and the second fixes
a bug I found while testing.
It works fine for my simple test cases, but I still need to do a lot
more testing, especially of the in-kernel API parts, which don't
affect Axel's use case.
Anyway, Axel, please try it out and send me your Tested-by: if it
works for you, and your broken testcase(s) if it doesn't. ;-)
Thanks,
Tom
The following changes since commit 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:
mmap_lock: add tracepoints around lock acquisition (2020-09-23 08:48:08 -0500)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-trace.git ftrace/synth-dynstring-v0
Tom Zanussi (3):
tracing: Change STR_VAR_MAX_LEN
tracing: Fix parse_synth_field() error handling
tracing: Add support for dynamic strings to synthetic events
Documentation/trace/events.rst | 15 ++-
Documentation/trace/histogram.rst | 18 ++++
kernel/trace/trace_events_hist.c | 9 ++
kernel/trace/trace_events_synth.c | 163 ++++++++++++++++++++++++++----
kernel/trace/trace_synth.h | 6 +-
5 files changed, 186 insertions(+), 25 deletions(-)
--
2.17.1
synth_field_size() returns either the size or an error. However, the
code assigns the return val to ssize_t which is unsigned, and then
tests whether it's less than 0, which it isn't so discards the error.
Do the test before assignment to field->size.
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_events_synth.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index a9cd7793f7ea..6e7282c7b530 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -465,6 +465,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
struct synth_field *field;
const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
int len, ret = 0;
+ int size;
if (field_type[0] == ';')
field_type++;
@@ -520,11 +521,12 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
field->type[len - 1] = '\0';
}
- field->size = synth_field_size(field->type);
- if (!field->size) {
+ size = synth_field_size(field->type);
+ if (size < 0) {
ret = -EINVAL;
goto free;
}
+ field->size = size;
if (synth_field_is_string(field->type))
field->is_string = true;
--
2.17.1
32 is too small for this value, and anyway it makes more sense to use
MAX_FILTER_STR_VAL, as this is also the value used for variable-length
__strings.
Signed-off-by: Tom Zanussi <[email protected]>
---
kernel/trace/trace_synth.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
index ac35c45207c4..5166705d1556 100644
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -7,7 +7,7 @@
#define SYNTH_SYSTEM "synthetic"
#define SYNTH_FIELDS_MAX 32
-#define STR_VAR_LEN_MAX 32 /* must be multiple of sizeof(u64) */
+#define STR_VAR_LEN_MAX MAX_FILTER_STR_VAL /* must be multiple of sizeof(u64) */
struct synth_field {
char *type;
--
2.17.1
Currently, sythetic events only support static string fields such as:
# echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events
Which is fine, but wastes a lot of space in the event.
It also prevents the most commonly-defined strings in the existing
trace events e.g. those defined using __string(), from being passed to
synthetic events via the trace() action.
With this change, synthetic events with dynamic fields can be defined:
# echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events
And the trace() action can be used to generate events using either
dynamic or static strings:
# echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events
The synthetic event dynamic strings are implemented in the same way as
the existing __data_loc strings and appear as such in the format file.
Signed-off-by: Tom Zanussi <[email protected]>
---
Documentation/trace/events.rst | 15 ++-
Documentation/trace/histogram.rst | 18 ++++
kernel/trace/trace_events_hist.c | 9 ++
kernel/trace/trace_events_synth.c | 157 ++++++++++++++++++++++++++----
kernel/trace/trace_synth.h | 4 +
5 files changed, 181 insertions(+), 22 deletions(-)
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f792b1959a33..2a5aa48eff6c 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -589,8 +589,19 @@ name::
{ .type = "int", .name = "my_int_field" },
};
-See synth_field_size() for available types. If field_name contains [n]
-the field is considered to be an array.
+See synth_field_size() for available types.
+
+If field_name contains [n], the field is considered to be a static array.
+
+If field_names contains[] (no subscript), the field is considered to
+be a dynamic array, which will only take as much space in the event as
+is required to hold the array.
+
+Because space for an event is reserved before assigning field values
+to the event, using dynamic arrays implies that the piecewise
+in-kernel API described below can't be used with dynamic arrays. The
+other non-piecewise in-kernel APIs can, however, be used with dynamic
+arrays.
If the event is created from within a module, a pointer to the module
must be passed to synth_event_create(). This will ensure that the
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 8408670d0328..b573604deabd 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -1776,6 +1776,24 @@ consisting of the name of the new event along with one or more
variables and their types, which can be any valid field type,
separated by semicolons, to the tracing/synthetic_events file.
+See synth_field_size() for available types.
+
+If field_name contains [n], the field is considered to be a static array.
+
+If field_names contains[] (no subscript), the field is considered to
+be a dynamic array, which will only take as much space in the event as
+is required to hold the array.
+
+A string field can be specified using either the static notation:
+
+ char name[32];
+
+Or the dynamic:
+
+ char name[];
+
+The size limit for either is 256.
+
For instance, the following creates a new event named 'wakeup_latency'
with 3 fields: lat, pid, and prio. Each of those fields is simply a
variable reference to a variable on another event::
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 1b2ef6490229..ee6b27c82350 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -3279,6 +3279,15 @@ static int check_synth_field(struct synth_event *event,
field = event->fields[field_pos];
+ /*
+ * A dynamic string synth field can accept static or
+ * dynamic. A static string synth field can only accept a
+ * same-sized static string, which is checked for later.
+ */
+ if (strstr(hist_field->type, "char[") && field->is_string
+ && field->is_dynamic)
+ return 0;
+
if (strcmp(field->type, hist_field->type) != 0) {
if (field->size != hist_field->size ||
field->is_signed != hist_field->is_signed)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 6e7282c7b530..00c6c61038d1 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -88,7 +88,7 @@ static int synth_event_define_fields(struct trace_event_call *call)
event->fields[i]->offset = n_u64;
- if (event->fields[i]->is_string) {
+ if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
offset += STR_VAR_LEN_MAX;
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
} else {
@@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
if (len > 3)
return -EINVAL;
+ if (len == 0)
+ return 0; /* variable-length string */
+
strncpy(buf, start, len);
buf[len] = '\0';
@@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
/* parameter values */
if (se->fields[i]->is_string) {
- trace_seq_printf(s, print_fmt, se->fields[i]->name,
- (char *)&entry->fields[n_u64],
- i == se->n_fields - 1 ? "" : " ");
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ if (se->fields[i]->is_dynamic) {
+ u32 offset, data_offset;
+ char *str_field;
+
+ offset = (u32)entry->fields[n_u64];
+ data_offset = offset & 0xffff;
+
+ str_field = (char *)entry + data_offset;
+
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ str_field,
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64++;
+ } else {
+ trace_seq_printf(s, print_fmt, se->fields[i]->name,
+ (char *)&entry->fields[n_u64],
+ i == se->n_fields - 1 ? "" : " ");
+ n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ }
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
.trace = print_synth_event
};
+static unsigned int trace_string(struct synth_trace_event *entry,
+ struct synth_event *event,
+ char *str_val,
+ bool is_dynamic,
+ unsigned int data_size,
+ unsigned int *n_u64)
+{
+ unsigned int len = 0;
+ char *str_field;
+
+ if (is_dynamic) {
+ u32 data_offset;
+
+ data_offset = offsetof(typeof(*entry), fields);
+ data_offset += event->n_u64 * sizeof(u64);
+ data_offset += data_size;
+
+ str_field = (char *)entry + data_offset;
+
+ len = strlen(str_val) + 1;
+ strscpy(str_field, str_val, len);
+
+ data_offset |= len << 16;
+ *(u32 *)&entry->fields[*n_u64] = data_offset;
+
+ (*n_u64)++;
+ } else {
+ str_field = (char *)&entry->fields[*n_u64];
+
+ strscpy(str_field, str_val, STR_VAR_LEN_MAX);
+ (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
+ }
+
+ return len;
+}
+
static notrace void trace_event_raw_event_synth(void *__data,
u64 *var_ref_vals,
unsigned int *var_ref_idx)
{
+ unsigned int i, n_u64, val_idx, len, data_size = 0;
struct trace_event_file *trace_file = __data;
struct synth_trace_event *entry;
struct trace_event_buffer fbuffer;
struct trace_buffer *buffer;
struct synth_event *event;
- unsigned int i, n_u64, val_idx;
int fields_size = 0;
event = trace_file->event_call->data;
@@ -344,6 +398,18 @@ static notrace void trace_event_raw_event_synth(void *__data,
fields_size = event->n_u64 * sizeof(u64);
+ for (i = 0; i < event->n_dynamic_fields; i++) {
+ unsigned int field_pos = event->dynamic_fields[i]->field_pos;
+ char *str_val;
+
+ val_idx = var_ref_idx[field_pos];
+ str_val = (char *)(long)var_ref_vals[val_idx];
+
+ len = strlen(str_val) + 1;
+
+ fields_size += len;
+ }
+
/*
* Avoid ring buffer recursion detection, as this event
* is being performed within another event.
@@ -360,10 +426,11 @@ static notrace void trace_event_raw_event_synth(void *__data,
val_idx = var_ref_idx[i];
if (event->fields[i]->is_string) {
char *str_val = (char *)(long)var_ref_vals[val_idx];
- char *str_field = (char *)&entry->fields[n_u64];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(entry, event, str_val,
+ event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = event->fields[i];
u64 val = var_ref_vals[val_idx];
@@ -525,6 +592,27 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
if (size < 0) {
ret = -EINVAL;
goto free;
+ } else if (size == 0) {
+ if (synth_field_is_string(field->type)) {
+ char *type;
+
+ type = kzalloc(sizeof("__data_loc ") + strlen(field->type) + 1, GFP_KERNEL);
+ if (!type) {
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ strcat(type, "__data_loc ");
+ strcat(type, field->type);
+ kfree(field->type);
+ field->type = type;
+
+ field->is_dynamic = true;
+ size = sizeof(u64);
+ } else {
+ ret = -EINVAL;
+ goto free;
+ }
}
field->size = size;
@@ -532,7 +620,6 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
field->is_string = true;
field->is_signed = synth_field_signed(field->type);
-
out:
return field;
free:
@@ -663,6 +750,7 @@ static void free_synth_event(struct synth_event *event)
free_synth_field(event->fields[i]);
kfree(event->fields);
+ kfree(event->dynamic_fields);
kfree(event->name);
kfree(event->class.system);
free_synth_tracepoint(event->tp);
@@ -673,8 +761,8 @@ static void free_synth_event(struct synth_event *event)
static struct synth_event *alloc_synth_event(const char *name, int n_fields,
struct synth_field **fields)
{
+ unsigned int i, j, n_dynamic_fields = 0;
struct synth_event *event;
- unsigned int i;
event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event) {
@@ -696,11 +784,33 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields,
goto out;
}
+ for (i = 0; i < n_fields; i++)
+ if (fields[i]->is_dynamic)
+ n_dynamic_fields++;
+
+ if (n_dynamic_fields) {
+ event->dynamic_fields = kcalloc(n_dynamic_fields,
+ sizeof(*event->dynamic_fields),
+ GFP_KERNEL);
+ if (!event->dynamic_fields) {
+ free_synth_event(event);
+ event = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ }
+
dyn_event_init(&event->devent, &synth_event_ops);
- for (i = 0; i < n_fields; i++)
+ for (i = 0, j = 0; i < n_fields; i++) {
event->fields[i] = fields[i];
+ if (fields[i]->is_dynamic) {
+ event->dynamic_fields[j] = fields[i];
+ event->dynamic_fields[j]->field_pos = i;
+ event->dynamic_fields[j++] = fields[i];
+ event->n_dynamic_fields++;
+ }
+ }
event->n_fields = n_fields;
out:
return event;
@@ -1276,8 +1386,8 @@ __synth_event_trace_end(struct synth_event_trace_state *trace_state)
*/
int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
{
+ unsigned int i, n_u64, len, data_size = 0;
struct synth_event_trace_state state;
- unsigned int i, n_u64;
va_list args;
int ret;
@@ -1301,10 +1411,11 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)val;
- char *str_field = (char *)&state.entry->fields[n_u64];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(state.entry, state.event, str_val,
+ state.event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = state.event->fields[i];
@@ -1357,8 +1468,8 @@ EXPORT_SYMBOL_GPL(synth_event_trace);
int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
unsigned int n_vals)
{
+ unsigned int i, n_u64, len, data_size = 0;
struct synth_event_trace_state state;
- unsigned int i, n_u64;
int ret;
ret = __synth_event_trace_start(file, &state);
@@ -1376,10 +1487,11 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
if (state.event->fields[i]->is_string) {
char *str_val = (char *)(long)vals[i];
- char *str_field = (char *)&state.entry->fields[n_u64];
- strscpy(str_field, str_val, STR_VAR_LEN_MAX);
- n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
+ len = trace_string(state.entry, state.event, str_val,
+ state.event->fields[i]->is_dynamic,
+ data_size, &n_u64);
+ data_size += len; /* only dynamic string increments */
} else {
struct synth_field *field = state.event->fields[i];
u64 val = vals[i];
@@ -1510,6 +1622,11 @@ static int __synth_event_add_val(const char *field_name, u64 val,
char *str_val = (char *)(long)val;
char *str_field;
+ if (field->is_dynamic) { /* add_val can't do dynamic strings */
+ ret = -EINVAL;
+ goto out;
+ }
+
if (!str_val) {
ret = -EINVAL;
goto out;
diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
index 5166705d1556..6e146b959dcd 100644
--- a/kernel/trace/trace_synth.h
+++ b/kernel/trace/trace_synth.h
@@ -16,6 +16,8 @@ struct synth_field {
unsigned int offset;
bool is_signed;
bool is_string;
+ bool is_dynamic;
+ bool field_pos;
};
struct synth_event {
@@ -24,6 +26,8 @@ struct synth_event {
char *name;
struct synth_field **fields;
unsigned int n_fields;
+ struct synth_field **dynamic_fields;
+ unsigned int n_dynamic_fields;
unsigned int n_u64;
struct trace_event_class class;
struct trace_event_call call;
--
2.17.1
On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> synth_field_size() returns either the size or an error. However, the
> code assigns the return val to ssize_t which is unsigned, and then
> tests whether it's less than 0, which it isn't so discards the error.
I think the patch is correct, but the commit message is not.
field->size is a size_t (unsigned), not an ssize_t (signed). I think
this should say instead something like:
synth_field_size() returns either a positive size or an error (zero or
a negative value). However, the existing code assumes the only error
value is 0. It doesn't handle negative error codes, as it assigns
directly to field->size (a size_t; unsigned), thereby interpreting the
error code as a valid size instead.
>
> Do the test before assignment to field->size.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> kernel/trace/trace_events_synth.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
> index a9cd7793f7ea..6e7282c7b530 100644
> --- a/kernel/trace/trace_events_synth.c
> +++ b/kernel/trace/trace_events_synth.c
> @@ -465,6 +465,7 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> struct synth_field *field;
> const char *prefix = NULL, *field_type = argv[0], *field_name, *array;
> int len, ret = 0;
> + int size;
Why not make this an ssize_t
>
> if (field_type[0] == ';')
> field_type++;
> @@ -520,11 +521,12 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> field->type[len - 1] = '\0';
> }
>
> - field->size = synth_field_size(field->type);
> - if (!field->size) {
> + size = synth_field_size(field->type);
> + if (size < 0) {
> ret = -EINVAL;
> goto free;
> }
> + field->size = size;
>
> if (synth_field_is_string(field->type))
> field->is_string = true;
> --
> 2.17.1
>
On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> Currently, sythetic events only support static string fields such as:
>
> # echo 'test_latency u64 lat; char somename[32]' > /sys/kernel/debug/tracing/synthetic_events
>
> Which is fine, but wastes a lot of space in the event.
>
> It also prevents the most commonly-defined strings in the existing
> trace events e.g. those defined using __string(), from being passed to
> synthetic events via the trace() action.
>
> With this change, synthetic events with dynamic fields can be defined:
>
> # echo 'test_latency u64 lat; char somename[]' > /sys/kernel/debug/tracing/synthetic_events
>
> And the trace() action can be used to generate events using either
> dynamic or static strings:
>
> # echo 'hist:keys=name:lat=common_timestamp.usecs-$ts0:onmatch(sys.event).test_latency($lat,name)' > /sys/kernel/debug/tracing/events
>
> The synthetic event dynamic strings are implemented in the same way as
> the existing __data_loc strings and appear as such in the format file.
>
> Signed-off-by: Tom Zanussi <[email protected]>
> ---
> Documentation/trace/events.rst | 15 ++-
> Documentation/trace/histogram.rst | 18 ++++
> kernel/trace/trace_events_hist.c | 9 ++
> kernel/trace/trace_events_synth.c | 157 ++++++++++++++++++++++++++----
> kernel/trace/trace_synth.h | 4 +
> 5 files changed, 181 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
> index f792b1959a33..2a5aa48eff6c 100644
> --- a/Documentation/trace/events.rst
> +++ b/Documentation/trace/events.rst
> @@ -589,8 +589,19 @@ name::
> { .type = "int", .name = "my_int_field" },
> };
>
> -See synth_field_size() for available types. If field_name contains [n]
> -the field is considered to be an array.
> +See synth_field_size() for available types.
> +
> +If field_name contains [n], the field is considered to be a static array.
> +
> +If field_names contains[] (no subscript), the field is considered to
> +be a dynamic array, which will only take as much space in the event as
> +is required to hold the array.
> +
> +Because space for an event is reserved before assigning field values
> +to the event, using dynamic arrays implies that the piecewise
> +in-kernel API described below can't be used with dynamic arrays. The
> +other non-piecewise in-kernel APIs can, however, be used with dynamic
> +arrays.
>
> If the event is created from within a module, a pointer to the module
> must be passed to synth_event_create(). This will ensure that the
> diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
> index 8408670d0328..b573604deabd 100644
> --- a/Documentation/trace/histogram.rst
> +++ b/Documentation/trace/histogram.rst
> @@ -1776,6 +1776,24 @@ consisting of the name of the new event along with one or more
> variables and their types, which can be any valid field type,
> separated by semicolons, to the tracing/synthetic_events file.
>
> +See synth_field_size() for available types.
> +
> +If field_name contains [n], the field is considered to be a static array.
> +
> +If field_names contains[] (no subscript), the field is considered to
> +be a dynamic array, which will only take as much space in the event as
> +is required to hold the array.
> +
> +A string field can be specified using either the static notation:
> +
> + char name[32];
> +
> +Or the dynamic:
> +
> + char name[];
> +
> +The size limit for either is 256.
> +
> For instance, the following creates a new event named 'wakeup_latency'
> with 3 fields: lat, pid, and prio. Each of those fields is simply a
> variable reference to a variable on another event::
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 1b2ef6490229..ee6b27c82350 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -3279,6 +3279,15 @@ static int check_synth_field(struct synth_event *event,
>
> field = event->fields[field_pos];
>
> + /*
> + * A dynamic string synth field can accept static or
> + * dynamic. A static string synth field can only accept a
> + * same-sized static string, which is checked for later.
> + */
> + if (strstr(hist_field->type, "char[") && field->is_string
> + && field->is_dynamic)
> + return 0;
> +
> if (strcmp(field->type, hist_field->type) != 0) {
> if (field->size != hist_field->size ||
> field->is_signed != hist_field->is_signed)
> diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
> index 6e7282c7b530..00c6c61038d1 100644
> --- a/kernel/trace/trace_events_synth.c
> +++ b/kernel/trace/trace_events_synth.c
> @@ -88,7 +88,7 @@ static int synth_event_define_fields(struct trace_event_call *call)
>
> event->fields[i]->offset = n_u64;
>
> - if (event->fields[i]->is_string) {
> + if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
> offset += STR_VAR_LEN_MAX;
> n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> } else {
> @@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
> if (len > 3)
> return -EINVAL;
>
> + if (len == 0)
> + return 0; /* variable-length string */
> +
> strncpy(buf, start, len);
> buf[len] = '\0';
>
> @@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
>
> /* parameter values */
> if (se->fields[i]->is_string) {
> - trace_seq_printf(s, print_fmt, se->fields[i]->name,
> - (char *)&entry->fields[n_u64],
> - i == se->n_fields - 1 ? "" : " ");
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + if (se->fields[i]->is_dynamic) {
> + u32 offset, data_offset;
> + char *str_field;
> +
> + offset = (u32)entry->fields[n_u64];
> + data_offset = offset & 0xffff;
> +
> + str_field = (char *)entry + data_offset;
Is it better to re-use __get_str from include/trace/trace_events.h
instead of writing this out directly?
> +
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + str_field,
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64++;
> + } else {
> + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> + (char *)&entry->fields[n_u64],
> + i == se->n_fields - 1 ? "" : " ");
> + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + }
> } else {
> struct trace_print_flags __flags[] = {
> __def_gfpflag_names, {-1, NULL} };
> @@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
> .trace = print_synth_event
> };
>
> +static unsigned int trace_string(struct synth_trace_event *entry,
> + struct synth_event *event,
> + char *str_val,
> + bool is_dynamic,
> + unsigned int data_size,
> + unsigned int *n_u64)
> +{
> + unsigned int len = 0;
> + char *str_field;
> +
> + if (is_dynamic) {
> + u32 data_offset;
> +
> + data_offset = offsetof(typeof(*entry), fields);
> + data_offset += event->n_u64 * sizeof(u64);
> + data_offset += data_size;
> +
> + str_field = (char *)entry + data_offset;
> +
> + len = strlen(str_val) + 1;
> + strscpy(str_field, str_val, len);
> +
> + data_offset |= len << 16;
> + *(u32 *)&entry->fields[*n_u64] = data_offset;
Similar thing here, is it possible to reuse __dynamic_array or __string?
> +
> + (*n_u64)++;
> + } else {
> + str_field = (char *)&entry->fields[*n_u64];
> +
> + strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> + (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
> + }
> +
> + return len;
> +}
> +
> static notrace void trace_event_raw_event_synth(void *__data,
> u64 *var_ref_vals,
> unsigned int *var_ref_idx)
> {
> + unsigned int i, n_u64, val_idx, len, data_size = 0;
> struct trace_event_file *trace_file = __data;
> struct synth_trace_event *entry;
> struct trace_event_buffer fbuffer;
> struct trace_buffer *buffer;
> struct synth_event *event;
> - unsigned int i, n_u64, val_idx;
> int fields_size = 0;
>
> event = trace_file->event_call->data;
> @@ -344,6 +398,18 @@ static notrace void trace_event_raw_event_synth(void *__data,
>
> fields_size = event->n_u64 * sizeof(u64);
>
> + for (i = 0; i < event->n_dynamic_fields; i++) {
> + unsigned int field_pos = event->dynamic_fields[i]->field_pos;
> + char *str_val;
> +
> + val_idx = var_ref_idx[field_pos];
> + str_val = (char *)(long)var_ref_vals[val_idx];
> +
> + len = strlen(str_val) + 1;
> +
> + fields_size += len;
> + }
> +
> /*
> * Avoid ring buffer recursion detection, as this event
> * is being performed within another event.
> @@ -360,10 +426,11 @@ static notrace void trace_event_raw_event_synth(void *__data,
> val_idx = var_ref_idx[i];
> if (event->fields[i]->is_string) {
> char *str_val = (char *)(long)var_ref_vals[val_idx];
> - char *str_field = (char *)&entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(entry, event, str_val,
> + event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = event->fields[i];
> u64 val = var_ref_vals[val_idx];
> @@ -525,6 +592,27 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> if (size < 0) {
> ret = -EINVAL;
> goto free;
> + } else if (size == 0) {
> + if (synth_field_is_string(field->type)) {
> + char *type;
> +
> + type = kzalloc(sizeof("__data_loc ") + strlen(field->type) + 1, GFP_KERNEL);
> + if (!type) {
> + ret = -ENOMEM;
> + goto free;
> + }
> +
> + strcat(type, "__data_loc ");
> + strcat(type, field->type);
> + kfree(field->type);
> + field->type = type;
> +
> + field->is_dynamic = true;
> + size = sizeof(u64);
> + } else {
> + ret = -EINVAL;
> + goto free;
> + }
> }
> field->size = size;
>
> @@ -532,7 +620,6 @@ static struct synth_field *parse_synth_field(int argc, const char **argv,
> field->is_string = true;
>
> field->is_signed = synth_field_signed(field->type);
> -
> out:
> return field;
> free:
> @@ -663,6 +750,7 @@ static void free_synth_event(struct synth_event *event)
> free_synth_field(event->fields[i]);
>
> kfree(event->fields);
> + kfree(event->dynamic_fields);
> kfree(event->name);
> kfree(event->class.system);
> free_synth_tracepoint(event->tp);
> @@ -673,8 +761,8 @@ static void free_synth_event(struct synth_event *event)
> static struct synth_event *alloc_synth_event(const char *name, int n_fields,
> struct synth_field **fields)
> {
> + unsigned int i, j, n_dynamic_fields = 0;
> struct synth_event *event;
> - unsigned int i;
>
> event = kzalloc(sizeof(*event), GFP_KERNEL);
> if (!event) {
> @@ -696,11 +784,33 @@ static struct synth_event *alloc_synth_event(const char *name, int n_fields,
> goto out;
> }
>
> + for (i = 0; i < n_fields; i++)
> + if (fields[i]->is_dynamic)
> + n_dynamic_fields++;
> +
> + if (n_dynamic_fields) {
> + event->dynamic_fields = kcalloc(n_dynamic_fields,
> + sizeof(*event->dynamic_fields),
> + GFP_KERNEL);
> + if (!event->dynamic_fields) {
> + free_synth_event(event);
> + event = ERR_PTR(-ENOMEM);
> + goto out;
> + }
> + }
> +
> dyn_event_init(&event->devent, &synth_event_ops);
>
> - for (i = 0; i < n_fields; i++)
> + for (i = 0, j = 0; i < n_fields; i++) {
> event->fields[i] = fields[i];
>
> + if (fields[i]->is_dynamic) {
> + event->dynamic_fields[j] = fields[i];
> + event->dynamic_fields[j]->field_pos = i;
> + event->dynamic_fields[j++] = fields[i];
> + event->n_dynamic_fields++;
> + }
> + }
> event->n_fields = n_fields;
> out:
> return event;
> @@ -1276,8 +1386,8 @@ __synth_event_trace_end(struct synth_event_trace_state *trace_state)
> */
> int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
> {
> + unsigned int i, n_u64, len, data_size = 0;
> struct synth_event_trace_state state;
> - unsigned int i, n_u64;
> va_list args;
> int ret;
>
> @@ -1301,10 +1411,11 @@ int synth_event_trace(struct trace_event_file *file, unsigned int n_vals, ...)
>
> if (state.event->fields[i]->is_string) {
> char *str_val = (char *)(long)val;
> - char *str_field = (char *)&state.entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(state.entry, state.event, str_val,
> + state.event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = state.event->fields[i];
>
> @@ -1357,8 +1468,8 @@ EXPORT_SYMBOL_GPL(synth_event_trace);
> int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
> unsigned int n_vals)
> {
> + unsigned int i, n_u64, len, data_size = 0;
> struct synth_event_trace_state state;
> - unsigned int i, n_u64;
> int ret;
>
> ret = __synth_event_trace_start(file, &state);
> @@ -1376,10 +1487,11 @@ int synth_event_trace_array(struct trace_event_file *file, u64 *vals,
> for (i = 0, n_u64 = 0; i < state.event->n_fields; i++) {
> if (state.event->fields[i]->is_string) {
> char *str_val = (char *)(long)vals[i];
> - char *str_field = (char *)&state.entry->fields[n_u64];
>
> - strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> + len = trace_string(state.entry, state.event, str_val,
> + state.event->fields[i]->is_dynamic,
> + data_size, &n_u64);
> + data_size += len; /* only dynamic string increments */
> } else {
> struct synth_field *field = state.event->fields[i];
> u64 val = vals[i];
> @@ -1510,6 +1622,11 @@ static int __synth_event_add_val(const char *field_name, u64 val,
> char *str_val = (char *)(long)val;
> char *str_field;
>
> + if (field->is_dynamic) { /* add_val can't do dynamic strings */
> + ret = -EINVAL;
> + goto out;
> + }
> +
> if (!str_val) {
> ret = -EINVAL;
> goto out;
> diff --git a/kernel/trace/trace_synth.h b/kernel/trace/trace_synth.h
> index 5166705d1556..6e146b959dcd 100644
> --- a/kernel/trace/trace_synth.h
> +++ b/kernel/trace/trace_synth.h
> @@ -16,6 +16,8 @@ struct synth_field {
> unsigned int offset;
> bool is_signed;
> bool is_string;
> + bool is_dynamic;
> + bool field_pos;
> };
>
> struct synth_event {
> @@ -24,6 +26,8 @@ struct synth_event {
> char *name;
> struct synth_field **fields;
> unsigned int n_fields;
> + struct synth_field **dynamic_fields;
> + unsigned int n_dynamic_fields;
> unsigned int n_u64;
> struct trace_event_class class;
> struct trace_event_call call;
> --
> 2.17.1
>
On Tue, 29 Sep 2020 15:01:59 -0700
Axel Rasmussen <[email protected]> wrote:
> > event->fields[i]->offset = n_u64;
> >
> > - if (event->fields[i]->is_string) {
> > + if (event->fields[i]->is_string && !event->fields[i]->is_dynamic) {
> > offset += STR_VAR_LEN_MAX;
> > n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > } else {
> > @@ -139,6 +139,9 @@ static int synth_field_string_size(char *type)
> > if (len > 3)
> > return -EINVAL;
> >
> > + if (len == 0)
> > + return 0; /* variable-length string */
> > +
> > strncpy(buf, start, len);
> > buf[len] = '\0';
> >
> > @@ -290,10 +293,25 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
> >
> > /* parameter values */
> > if (se->fields[i]->is_string) {
> > - trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > - (char *)&entry->fields[n_u64],
> > - i == se->n_fields - 1 ? "" : " ");
> > - n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + if (se->fields[i]->is_dynamic) {
> > + u32 offset, data_offset;
> > + char *str_field;
> > +
> > + offset = (u32)entry->fields[n_u64];
> > + data_offset = offset & 0xffff;
> > +
> > + str_field = (char *)entry + data_offset;
>
> Is it better to re-use __get_str from include/trace/trace_events.h
> instead of writing this out directly?
>
> > +
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + str_field,
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64++;
> > + } else {
> > + trace_seq_printf(s, print_fmt, se->fields[i]->name,
> > + (char *)&entry->fields[n_u64],
> > + i == se->n_fields - 1 ? "" : " ");
> > + n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
> > + }
> > } else {
> > struct trace_print_flags __flags[] = {
> > __def_gfpflag_names, {-1, NULL} };
> > @@ -325,16 +343,52 @@ static struct trace_event_functions synth_event_funcs = {
> > .trace = print_synth_event
> > };
> >
> > +static unsigned int trace_string(struct synth_trace_event *entry,
> > + struct synth_event *event,
> > + char *str_val,
> > + bool is_dynamic,
> > + unsigned int data_size,
> > + unsigned int *n_u64)
> > +{
> > + unsigned int len = 0;
> > + char *str_field;
> > +
> > + if (is_dynamic) {
> > + u32 data_offset;
> > +
> > + data_offset = offsetof(typeof(*entry), fields);
> > + data_offset += event->n_u64 * sizeof(u64);
> > + data_offset += data_size;
> > +
> > + str_field = (char *)entry + data_offset;
> > +
> > + len = strlen(str_val) + 1;
> > + strscpy(str_field, str_val, len);
> > +
> > + data_offset |= len << 16;
> > + *(u32 *)&entry->fields[*n_u64] = data_offset;
>
> Similar thing here, is it possible to reuse __dynamic_array or __string?
Interesting idea.
I'd prefer to keep it broken out for this patch set, and then we could look
at incorporating this file to use the macros of <trace/trace_event.h>.
But I much rather have that be a separate patch that does that as an
enhancement than to include it in this file doing more work.
-- Steve
>
> > +
> > + (*n_u64)++;
> > + } else {
> > + str_field = (char *)&entry->fields[*n_u64];
> > +
> > + strscpy(str_field, str_val, STR_VAR_LEN_MAX);
> > + (*n_u64) += STR_VAR_LEN_MAX / sizeof(u64);
> > + }
> > +
> > + return len;
> > +}
> > +
> > static n
On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]> wrote:
>
> Hi,
>
> This patchset adds support for dynamic strings for synthetic events,
> as requested by Axel Rasmussen.
>
> Actually, the first two patches should be applied in any case - the
> first just changes the current max string length and the second fixes
> a bug I found while testing.
>
> It works fine for my simple test cases, but I still need to do a lot
> more testing, especially of the in-kernel API parts, which don't
> affect Axel's use case.
>
> Anyway, Axel, please try it out and send me your Tested-by: if it
> works for you, and your broken testcase(s) if it doesn't. ;-)
Thanks for working on this!
Capturing a histogram with a histogram trigger on the synthetic
event[1] seems to work as expected for me.
However, I tried creating the synthetic event, and then capturing a
histogram from it using bpftrace [2], and this doesn't seem to work.
It's completely possible I'm just doing something wrong, or the
bpftrace I'm using is buggy (getting it to work with __data_loc fields
at all requires a super bleeding edge bpftrace [3]). bpftrace reports
that it fails to attach to the synthetic event, and I hit some
WARN_ONs [4].
Does it seem possible this issue is related? If not, feel free to
disregard. :) I'll spend some more time digging into this tomorrow.
[1]:
echo 'mmap_lock_latency u64 time; char memcg_path[], bool write' > \
/sys/kernel/tracing/synthetic_events
echo 'hist:keys=memcg_path,write:ts0=common_timestamp.usecs' > \
/sys/kernel/tracing/events/mmap_lock/mmap_lock_start_locking/trigger
echo 'hist:keys=memcg_path,write:latency=common_timestamp.usecs-$ts0:onmatch(mmap_lock.mmap_lock_start_locking).mmap_lock_latency($latency,memcg_path,write)'
> \
/sys/kernel/tracing/events/mmap_lock/mmap_lock_acquire_returned/trigger
echo 'hist:keys=memcg_path,write:values=time' > \
/sys/kernel/tracing/events/synthetic/mmap_lock_latency/trigger
[2]:
(Same setup of the synthetic event as [1], except adding the histogram
trigger...)
cat <<EOF > /root/bpftypes.h
#include <stdbool.h>
#include <stdint.h>
typedef uint64_t u64;
typedef int64_t s64;
EOF
cat <<EOF > /root/mmap-lock-per-memcg.bpf
tracepoint:synthetic:mmap_lock_latency {
@latency[args->memcg_path] = hist(args->time);
}
EOF
/home/axelrasmussen/bpftrace/build/src/bpftrace \
-v \
-I "/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include" \
--include "/root/bpftypes.h" \
/root/mmap-lock-per-memcg.bpf
[3]: https://github.com/iovisor/bpftrace/pull/1542
[4]:
[ 31.361814] kauditd_printk_skb: 6 callbacks suppressed
[ 31.361815] audit: type=1334 audit(1601414765.236:34): prog-id=11 op=LOAD
From `WARN_ON(!tp_func->func)`:
[ 31.365310] ------------[ cut here ]------------
[ 31.365316] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:136
tracepoint_add_func+0x2ad/0x300
[ 31.366594] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_crypto crypto_engine virtio_console
virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
[ 31.366603] CPU: 0 PID: 235 Comm: bpftrace Not tainted 5.9.0-smp-DEV #1
[ 31.366604] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1 04/01/2014
[ 31.366606] RIP: 0010:tracepoint_add_func+0x2ad/0x300
[ 31.366607] Code: 17 ff ff ff 4c 89 ff e8 e1 0c 0a 00 e9 22 ff ff
ff 41 ff 57 18 85 c0 4c 8b 04 24 8b 54 24 08 0f 89 9c fd ff ff e9 31
ff ff ff <0f> 0b b8 ea ff ff ff e9 f9 fd5
[ 31.366607] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
[ 31.366608] RAX: 0000000000000000 RBX: ffff9115583c1200 RCX: 0000000000000000
[ 31.366609] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff9115583c1200
[ 31.366609] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09: ffffcaabffdb5018
[ 31.366610] R10: 0000000000000080 R11: 0000000000000080 R12: ffff911558013c90
[ 31.366610] R13: 0000000000000000 R14: ffff911554f84d10 R15: ffff9115583c1200
[ 31.366611] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
knlGS:0000000000000000
[ 31.366612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.366614] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4: 0000000000370ef0
[ 31.366615] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 31.366615] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 31.366616] Call Trace:
[ 31.366620] tracepoint_probe_register+0x58/0x90
[ 31.366625] trace_event_reg+0x56/0xa0
[ 31.366626] perf_trace_event_reg.isra.2+0xba/0x130
[ 31.366627] perf_trace_event_init+0x32/0x70
[ 31.366628] perf_trace_init+0x6e/0xa0
[ 31.366631] perf_tp_event_init+0x28/0x40
[ 31.366632] perf_try_init_event+0xe7/0x130
[ 31.366633] perf_event_alloc+0x4ba/0xdc0
[ 31.366635] ? __alloc_fd+0x40/0x160
[ 31.366637] __do_sys_perf_event_open+0x2bb/0xe90
[ 31.366638] __x64_sys_perf_event_open+0x20/0x30
[ 31.366641] do_syscall_64+0x31/0x40
[ 31.366644] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 31.366645] RIP: 0033:0x7f9be4886d5d
[ 31.366646] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
[ 31.366646] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
000000000000012a
[ 31.366647] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX: 00007f9be4886d5d
[ 31.366647] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00007ffdcf6e4d80
[ 31.366648] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09: 000056095c148578
[ 31.366651] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000000b
[ 31.366652] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15: 00007ffdcf6e4e00
[ 31.366653] ---[ end trace 25e0e681c1442c93 ]---
From `WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);`:
[ 31.367345] ------------[ cut here ]------------
[ 31.367347] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:243
tracepoint_add_func+0xb2/0x300
[ 31.368612] Modules linked in: xhci_pci xhci_hcd virtio_net
net_failover failover virtio_crypto crypto_engine virtio_console
virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
[ 31.368615] CPU: 0 PID: 235 Comm: bpftrace Tainted: G W
5.9.0-smp-DEV #1
[ 31.368615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1 04/01/2014
[ 31.368617] RIP: 0010:tracepoint_add_func+0xb2/0x300
[ 31.368617] Code: c6 48 8b 01 48 85 c0 74 28 41 83 fd ff 75 df 3b
51 10 44 0f 4f eb 48 39 c6 75 d8 49 8b 40 08 48 39 41 08 75 ce b8 ef
ff ff ff <0f> 0b e9 25 01 00 00 8d 43 02d
[ 31.368618] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
[ 31.368618] RAX: 00000000ffffffea RBX: ffff9115583c1200 RCX: 0000000000000000
[ 31.368619] RDX: 000000000000000a RSI: 0000000000000000 RDI: ffff9115583c1200
[ 31.368619] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09: ffffcaabffdb5018
[ 31.368620] R10: 0000000000000080 R11: 0000000000000080 R12: ffff911558013c90
[ 31.368620] R13: 0000000000000000 R14: ffff911554f84d10 R15: ffff9115583c1200
[ 31.368621] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
knlGS:0000000000000000
[ 31.368621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 31.368623] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4: 0000000000370ef0
[ 31.368624] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 31.368624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 31.368625] Call Trace:
[ 31.368626] tracepoint_probe_register+0x58/0x90
[ 31.368627] trace_event_reg+0x56/0xa0
[ 31.368628] perf_trace_event_reg.isra.2+0xba/0x130
[ 31.368629] perf_trace_event_init+0x32/0x70
[ 31.368630] perf_trace_init+0x6e/0xa0
[ 31.368631] perf_tp_event_init+0x28/0x40
[ 31.368632] perf_try_init_event+0xe7/0x130
[ 31.368632] perf_event_alloc+0x4ba/0xdc0
[ 31.368634] ? __alloc_fd+0x40/0x160
[ 31.368635] __do_sys_perf_event_open+0x2bb/0xe90
[ 31.368636] __x64_sys_perf_event_open+0x20/0x30
[ 31.368637] do_syscall_64+0x31/0x40
[ 31.368638] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 31.368639] RIP: 0033:0x7f9be4886d5d
[ 31.368639] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
[ 31.368640] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
000000000000012a
[ 31.368640] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX: 00007f9be4886d5d
[ 31.368641] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00007ffdcf6e4d80
[ 31.368641] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09: 000056095c148578
[ 31.368642] R10: 00000000ffffffff R11: 0000000000000246 R12: 000000000000000b
[ 31.368642] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15: 00007ffdcf6e4e00
[ 31.368643] ---[ end trace 25e0e681c1442c94 ]---
[ 31.377271] audit: type=1334 audit(1601414765.252:35): prog-id=11 op=UNLOAD
>
> Thanks,
>
> Tom
>
> The following changes since commit 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:
>
> mmap_lock: add tracepoints around lock acquisition (2020-09-23 08:48:08 -0500)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-trace.git ftrace/synth-dynstring-v0
>
> Tom Zanussi (3):
> tracing: Change STR_VAR_MAX_LEN
> tracing: Fix parse_synth_field() error handling
> tracing: Add support for dynamic strings to synthetic events
>
> Documentation/trace/events.rst | 15 ++-
> Documentation/trace/histogram.rst | 18 ++++
> kernel/trace/trace_events_hist.c | 9 ++
> kernel/trace/trace_events_synth.c | 163 ++++++++++++++++++++++++++----
> kernel/trace/trace_synth.h | 6 +-
> 5 files changed, 186 insertions(+), 25 deletions(-)
>
> --
> 2.17.1
>
Hi Axel,
On Tue, 2020-09-29 at 15:19 -0700, Axel Rasmussen wrote:
> On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]>
> wrote:
> >
> > Hi,
> >
> > This patchset adds support for dynamic strings for synthetic
> > events,
> > as requested by Axel Rasmussen.
> >
> > Actually, the first two patches should be applied in any case - the
> > first just changes the current max string length and the second
> > fixes
> > a bug I found while testing.
> >
> > It works fine for my simple test cases, but I still need to do a
> > lot
> > more testing, especially of the in-kernel API parts, which don't
> > affect Axel's use case.
> >
> > Anyway, Axel, please try it out and send me your Tested-by: if it
> > works for you, and your broken testcase(s) if it doesn't. ;-)
>
> Thanks for working on this!
>
> Capturing a histogram with a histogram trigger on the synthetic
> event[1] seems to work as expected for me.
>
> However, I tried creating the synthetic event, and then capturing a
> histogram from it using bpftrace [2], and this doesn't seem to work.
> It's completely possible I'm just doing something wrong, or the
> bpftrace I'm using is buggy (getting it to work with __data_loc
> fields
> at all requires a super bleeding edge bpftrace [3]). bpftrace reports
> that it fails to attach to the synthetic event, and I hit some
> WARN_ONs [4].
>
> Does it seem possible this issue is related? If not, feel free to
> disregard. :) I'll spend some more time digging into this tomorrow.
>
It's possible, if the bpf stuff is using the in-kernel trace events
API, the part that I mentioned I hadn't tested yet :-/. There's one
important piece missing for that in this patch - I'll be working on
adding it and posting a v2 today.
Tom
>
>
> [1]:
> echo 'mmap_lock_latency u64 time; char memcg_path[], bool write' > \
> /sys/kernel/tracing/synthetic_events
>
> echo 'hist:keys=memcg_path,write:ts0=common_timestamp.usecs' > \
> /sys/kernel/tracing/events/mmap_lock/mmap_lock_start_locking/
> trigger
>
> echo 'hist:keys=memcg_path,write:latency=common_timestamp.usecs-
> $ts0:onmatch(mmap_lock.mmap_lock_start_locking).mmap_lock_latency($la
> tency,memcg_path,write)'
> > \
>
> /sys/kernel/tracing/events/mmap_lock/mmap_lock_acquire_return
> ed/trigger
>
> echo 'hist:keys=memcg_path,write:values=time' > \
> /sys/kernel/tracing/events/synthetic/mmap_lock_latency/trigge
> r
>
> [2]:
> (Same setup of the synthetic event as [1], except adding the
> histogram
> trigger...)
>
> cat <<EOF > /root/bpftypes.h
> #include <stdbool.h>
> #include <stdint.h>
> typedef uint64_t u64;
> typedef int64_t s64;
> EOF
>
> cat <<EOF > /root/mmap-lock-per-memcg.bpf
> tracepoint:synthetic:mmap_lock_latency {
> @latency[args->memcg_path] = hist(args->time);
> }
> EOF
>
> /home/axelrasmussen/bpftrace/build/src/bpftrace \
> -v \
> -I "/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include" \
> --include "/root/bpftypes.h" \
> /root/mmap-lock-per-memcg.bpf
>
> [3]: https://github.com/iovisor/bpftrace/pull/1542
>
> [4]:
> [ 31.361814] kauditd_printk_skb: 6 callbacks suppressed
> [ 31.361815] audit: type=1334 audit(1601414765.236:34): prog-id=11
> op=LOAD
>
> From `WARN_ON(!tp_func->func)`:
>
> [ 31.365310] ------------[ cut here ]------------
> [ 31.365316] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:136
> tracepoint_add_func+0x2ad/0x300
> [ 31.366594] Modules linked in: xhci_pci xhci_hcd virtio_net
> net_failover failover virtio_crypto crypto_engine virtio_console
> virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
> [ 31.366603] CPU: 0 PID: 235 Comm: bpftrace Not tainted 5.9.0-smp-
> DEV #1
> [ 31.366604] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1 04/01/2014
> [ 31.366606] RIP: 0010:tracepoint_add_func+0x2ad/0x300
> [ 31.366607] Code: 17 ff ff ff 4c 89 ff e8 e1 0c 0a 00 e9 22 ff ff
> ff 41 ff 57 18 85 c0 4c 8b 04 24 8b 54 24 08 0f 89 9c fd ff ff e9 31
> ff ff ff <0f> 0b b8 ea ff ff ff e9 f9 fd5
> [ 31.366607] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
> [ 31.366608] RAX: 0000000000000000 RBX: ffff9115583c1200 RCX:
> 0000000000000000
> [ 31.366609] RDX: 000000000000000a RSI: 0000000000000000 RDI:
> ffff9115583c1200
> [ 31.366609] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09:
> ffffcaabffdb5018
> [ 31.366610] R10: 0000000000000080 R11: 0000000000000080 R12:
> ffff911558013c90
> [ 31.366610] R13: 0000000000000000 R14: ffff911554f84d10 R15:
> ffff9115583c1200
> [ 31.366611] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
> knlGS:0000000000000000
> [ 31.366612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.366614] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4:
> 0000000000370ef0
> [ 31.366615] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 31.366615] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 31.366616] Call Trace:
> [ 31.366620] tracepoint_probe_register+0x58/0x90
> [ 31.366625] trace_event_reg+0x56/0xa0
> [ 31.366626] perf_trace_event_reg.isra.2+0xba/0x130
> [ 31.366627] perf_trace_event_init+0x32/0x70
> [ 31.366628] perf_trace_init+0x6e/0xa0
> [ 31.366631] perf_tp_event_init+0x28/0x40
> [ 31.366632] perf_try_init_event+0xe7/0x130
> [ 31.366633] perf_event_alloc+0x4ba/0xdc0
> [ 31.366635] ? __alloc_fd+0x40/0x160
> [ 31.366637] __do_sys_perf_event_open+0x2bb/0xe90
> [ 31.366638] __x64_sys_perf_event_open+0x20/0x30
> [ 31.366641] do_syscall_64+0x31/0x40
> [ 31.366644] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 31.366645] RIP: 0033:0x7f9be4886d5d
> [ 31.366646] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
> [ 31.366646] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
> 000000000000012a
> [ 31.366647] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX:
> 00007f9be4886d5d
> [ 31.366647] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> 00007ffdcf6e4d80
> [ 31.366648] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09:
> 000056095c148578
> [ 31.366651] R10: 00000000ffffffff R11: 0000000000000246 R12:
> 000000000000000b
> [ 31.366652] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15:
> 00007ffdcf6e4e00
> [ 31.366653] ---[ end trace 25e0e681c1442c93 ]---
>
> From `WARN_ON_ONCE(PTR_ERR(old) != -ENOMEM);`:
>
> [ 31.367345] ------------[ cut here ]------------
> [ 31.367347] WARNING: CPU: 0 PID: 235 at kernel/tracepoint.c:243
> tracepoint_add_func+0xb2/0x300
> [ 31.368612] Modules linked in: xhci_pci xhci_hcd virtio_net
> net_failover failover virtio_crypto crypto_engine virtio_console
> virtio_blk virtio_balloon uhci_hcd ohci_pci ohci_ht
> [ 31.368615] CPU: 0 PID: 235 Comm: bpftrace Tainted: G W
> 5.9.0-smp-DEV #1
> [ 31.368615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1 04/01/2014
> [ 31.368617] RIP: 0010:tracepoint_add_func+0xb2/0x300
> [ 31.368617] Code: c6 48 8b 01 48 85 c0 74 28 41 83 fd ff 75 df 3b
> 51 10 44 0f 4f eb 48 39 c6 75 d8 49 8b 40 08 48 39 41 08 75 ce b8 ef
> ff ff ff <0f> 0b e9 25 01 00 00 8d 43 02d
> [ 31.368618] RSP: 0018:ffff911554fdbc50 EFLAGS: 00010246
> [ 31.368618] RAX: 00000000ffffffea RBX: ffff9115583c1200 RCX:
> 0000000000000000
> [ 31.368619] RDX: 000000000000000a RSI: 0000000000000000 RDI:
> ffff9115583c1200
> [ 31.368619] RBP: ffff911554fdbc90 R08: ffff911554fdbca0 R09:
> ffffcaabffdb5018
> [ 31.368620] R10: 0000000000000080 R11: 0000000000000080 R12:
> ffff911558013c90
> [ 31.368620] R13: 0000000000000000 R14: ffff911554f84d10 R15:
> ffff9115583c1200
> [ 31.368621] FS: 00007f9bdf8f0c00(0000) GS:ffff91155fc00000(0000)
> knlGS:0000000000000000
> [ 31.368621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 31.368623] CR2: 00007f9bdefbc000 CR3: 0000000814fe0006 CR4:
> 0000000000370ef0
> [ 31.368624] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 31.368624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 31.368625] Call Trace:
> [ 31.368626] tracepoint_probe_register+0x58/0x90
> [ 31.368627] trace_event_reg+0x56/0xa0
> [ 31.368628] perf_trace_event_reg.isra.2+0xba/0x130
> [ 31.368629] perf_trace_event_init+0x32/0x70
> [ 31.368630] perf_trace_init+0x6e/0xa0
> [ 31.368631] perf_tp_event_init+0x28/0x40
> [ 31.368632] perf_try_init_event+0xe7/0x130
> [ 31.368632] perf_event_alloc+0x4ba/0xdc0
> [ 31.368634] ? __alloc_fd+0x40/0x160
> [ 31.368635] __do_sys_perf_event_open+0x2bb/0xe90
> [ 31.368636] __x64_sys_perf_event_open+0x20/0x30
> [ 31.368637] do_syscall_64+0x31/0x40
> [ 31.368638] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 31.368639] RIP: 0033:0x7f9be4886d5d
> [ 31.368639] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 488
> [ 31.368640] RSP: 002b:00007ffdcf6e4d68 EFLAGS: 00000246 ORIG_RAX:
> 000000000000012a
> [ 31.368640] RAX: ffffffffffffffda RBX: 00007ffdcf6e5e6c RCX:
> 00007f9be4886d5d
> [ 31.368641] RDX: 0000000000000000 RSI: 00000000ffffffff RDI:
> 00007ffdcf6e4d80
> [ 31.368641] RBP: 00007ffdcf6e4d80 R08: 0000000000000008 R09:
> 000056095c148578
> [ 31.368642] R10: 00000000ffffffff R11: 0000000000000246 R12:
> 000000000000000b
> [ 31.368642] R13: 00000000ffffffff R14: 00007ffdcf6e5e70 R15:
> 00007ffdcf6e4e00
> [ 31.368643] ---[ end trace 25e0e681c1442c94 ]---
> [ 31.377271] audit: type=1334 audit(1601414765.252:35): prog-id=11
> op=UNLOAD
>
> >
> > Thanks,
> >
> > Tom
> >
> > The following changes since commit
> > 106c076d23cca67c959a6fd1ccadb5b3ef01ddc9:
> >
> > mmap_lock: add tracepoints around lock acquisition (2020-09-23
> > 08:48:08 -0500)
> >
> > are available in the Git repository at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/zanussi/linux-
> > trace.git ftrace/synth-dynstring-v0
> >
> > Tom Zanussi (3):
> > tracing: Change STR_VAR_MAX_LEN
> > tracing: Fix parse_synth_field() error handling
> > tracing: Add support for dynamic strings to synthetic events
> >
> > Documentation/trace/events.rst | 15 ++-
> > Documentation/trace/histogram.rst | 18 ++++
> > kernel/trace/trace_events_hist.c | 9 ++
> > kernel/trace/trace_events_synth.c | 163
> > ++++++++++++++++++++++++++----
> > kernel/trace/trace_synth.h | 6 +-
> > 5 files changed, 186 insertions(+), 25 deletions(-)
> >
> > --
> > 2.17.1
> >
Hi Axel,
On Tue, 2020-09-29 at 14:56 -0700, Axel Rasmussen wrote:
> On Tue, Sep 29, 2020 at 1:33 PM Tom Zanussi <[email protected]>
> wrote:
> >
> > synth_field_size() returns either the size or an error. However,
> > the
> > code assigns the return val to ssize_t which is unsigned, and then
> > tests whether it's less than 0, which it isn't so discards the
> > error.
>
> I think the patch is correct, but the commit message is not.
> field->size is a size_t (unsigned), not an ssize_t (signed). I think
> this should say instead something like:
>
> synth_field_size() returns either a positive size or an error (zero
> or
> a negative value). However, the existing code assumes the only error
> value is 0. It doesn't handle negative error codes, as it assigns
> directly to field->size (a size_t; unsigned), thereby interpreting
> the
> error code as a valid size instead.
>
Yes, that's better - I used the above text in v2.
> >
> > Do the test before assignment to field->size.
> >
> > Signed-off-by: Tom Zanussi <[email protected]>
> > ---
> > kernel/trace/trace_events_synth.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/trace/trace_events_synth.c
> > b/kernel/trace/trace_events_synth.c
> > index a9cd7793f7ea..6e7282c7b530 100644
> > --- a/kernel/trace/trace_events_synth.c
> > +++ b/kernel/trace/trace_events_synth.c
> > @@ -465,6 +465,7 @@ static struct synth_field
> > *parse_synth_field(int argc, const char **argv,
> > struct synth_field *field;
> > const char *prefix = NULL, *field_type = argv[0],
> > *field_name, *array;
> > int len, ret = 0;
> > + int size;
>
> Why not make this an ssize_t
Yep, makes sense. Changed in v2, thanks!
Tom
>
> >
> > if (field_type[0] == ';')
> > field_type++;
> > @@ -520,11 +521,12 @@ static struct synth_field
> > *parse_synth_field(int argc, const char **argv,
> > field->type[len - 1] = '\0';
> > }
> >
> > - field->size = synth_field_size(field->type);
> > - if (!field->size) {
> > + size = synth_field_size(field->type);
> > + if (size < 0) {
> > ret = -EINVAL;
> > goto free;
> > }
> > + field->size = size;
> >
> > if (synth_field_is_string(field->type))
> > field->is_string = true;
> > --
> > 2.17.1
> >