Currently user_events supports only one event per name, and that event
must have the exact same format when referenced by multiple programs.
This opens an opportunity for malicious or poorly thought-through
programs to create events that others use with different formats.
Another scenario is user programs wishing to use the same event name but
add more fields later when the software updates. Various versions of a
program may be running side-by-side, which the current single-format
requirement prevents.
Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
the user program wishes to use the same user_event name, but may have
several different formats of the event in the future. When this flag is
used, create the underlying tracepoint backing the user_event with a
unique name per-version of the format. It's important that existing ABI
users do not get this logic automatically, even if one of the
multi-format events matches the format. This ensures existing programs
that create events and assume the tracepoint name will match exactly
continue to work as expected. During find, only compare multi-format
events with other multi-format events, and single-format events with
other single-format events.
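The find rule above can be modeled in user space roughly as follows. This
is a minimal sketch, not the kernel code: sketch_event, sketch_find() and
SKETCH_REG_MULTI_FORMAT are hypothetical stand-ins, and fields are
compared as whole strings instead of being parsed from argv.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for USER_EVENT_REG_MULTI_FORMAT (1U << 1). */
#define SKETCH_REG_MULTI_FORMAT (1U << 1)

/* Hypothetical, simplified event record. */
struct sketch_event {
	const char *reg_name;   /* name used at registration */
	const char *fields;     /* whole field list, e.g. "u32 count" */
	unsigned int reg_flags;
};

static bool sketch_is_multi(unsigned int flags)
{
	return (flags & SKETCH_REG_MULTI_FORMAT) != 0;
}

/*
 * Returns the index of an event to reuse, -1 if a new event should be
 * created, or -EADDRINUSE if a single-format event with this name already
 * exists with different fields.
 */
static int sketch_find(const struct sketch_event *evs, size_t n,
		       const char *name, const char *fields,
		       unsigned int flags)
{
	for (size_t i = 0; i < n; i++) {
		/* Only compare like-formats: multi with multi, single with single. */
		if (sketch_is_multi(flags) != sketch_is_multi(evs[i].reg_flags))
			continue;

		if (strcmp(evs[i].reg_name, name))
			continue;

		if (!strcmp(evs[i].fields, fields))
			return (int)i;

		/* Multi-format callers keep scanning for a matching version. */
		if (sketch_is_multi(flags))
			continue;

		return -EADDRINUSE;
	}

	return -1;
}
```

Returning -EADDRINUSE only on the single-format path keeps the existing
ABI behavior, while the multi-format path keeps scanning and, failing a
match, falls through to creating a new versioned event.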
Add a register_name (reg_name) to the user_event struct which allows for
split naming of events. We now have the name that was used to register
within user_events as well as the unique name for the tracepoint. When
registering events, match first on the reg_name, then on the fields and
format of the event. This allows for multiple events
with the same registered name to have different formats. The underlying
tracepoint will have a unique name in the format of {reg_name}:[unique_id].
The unique_id is the time, in nanoseconds, of the event creation converted
to hex. Since this is done under the register mutex, it is extremely
unlikely for these IDs to ever match. It's also very unlikely a malicious
program could consistently guess what the name would be and attempt to
squat on it via the single format ABI.
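A sketch of how a name in the {reg_name}:[unique_id] form can be built,
sizing the buffer with a first snprintf(NULL, 0, ...) pass in the same
way the kernel patch does. The helper name is hypothetical, and the
caller supplies the 64-bit id, so the sketch makes no assumption about
where the id comes from.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: builds "{reg_name}:[unique_id]" with the id in
 * lowercase hex. Returns a malloc()ed string the caller must free, or NULL.
 */
static char *sketch_multi_tp_name(const char *reg_name, uint64_t unique_id)
{
	/* First pass only measures the length (excluding the NUL). */
	int len = snprintf(NULL, 0, "%s:[%" PRIx64 "]", reg_name, unique_id);
	char *name;

	if (len < 0)
		return NULL;

	name = malloc((size_t)len + 1);
	if (!name)
		return NULL;

	snprintf(name, (size_t)len + 1, "%s:[%" PRIx64 "]", reg_name, unique_id);
	return name;
}
```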
For example, if both "test u32 count" and "test u64 count" are registered
with the USER_EVENT_REG_MULTI_FORMAT flag, the system would have two
unique tracepoints. The dynamic_events file would then show the following:
  u:test u64 count
  u:test u32 count
The actual tracepoint names look like this:
  test:[d5874fdac44]
  test:[d5914662cd4]
Deleting events via "!u:test u64 count" would only delete the first
tracepoint that matched that format. When the delete ABI is used,
deletion is attempted for all events with the same name. If per-version
deletion is required, user programs should either not use persistent
events or delete them via dynamic_events.
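For per-version deletion through dynamic_events, a program could format
and write a command like the one above. This is a sketch under
assumptions: the helper names are hypothetical, tracefs is assumed to be
mounted at /sys/kernel/tracing, and the writer needs whatever privileges
tracefs requires.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats a per-version delete command such as
 * "!u:test u64 count". Returns 0 on success or -1 if it does not fit.
 */
static int sketch_dyn_delete_cmd(char *buf, size_t size, const char *name,
				 const char *fields)
{
	int len;

	if (fields && *fields)
		len = snprintf(buf, size, "!u:%s %s", name, fields);
	else
		len = snprintf(buf, size, "!u:%s", name);

	/* snprintf() returns the untruncated length; >= size means truncated. */
	if (len < 0 || (size_t)len >= size)
		return -1;

	return 0;
}

/*
 * Hypothetical helper: writes the command to dynamic_events. Append mode
 * avoids O_TRUNC, which on this file releases all dynamic events.
 */
int sketch_dyn_delete(const char *name, const char *fields)
{
	char cmd[512];
	FILE *fp;
	int ret;

	if (sketch_dyn_delete_cmd(cmd, sizeof(cmd), name, fields))
		return -1;

	fp = fopen("/sys/kernel/tracing/dynamic_events", "a");
	if (!fp)
		return -1;

	ret = fputs(cmd, fp) == EOF ? -1 : 0;

	/* The buffered write reaches the kernel on flush, so check fclose(). */
	if (fclose(fp) == EOF)
		ret = -1;

	return ret;
}
```

Only sketch_dyn_delete_cmd() is pure; sketch_dyn_delete() touches tracefs
and is meant for a running system.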
Beau Belgrave (4):
tracing/user_events: Prepare find/delete for same name events
tracing/user_events: Introduce multi-format events
selftests/user_events: Test multi-format events
tracing/user_events: Document multi-format flag
Documentation/trace/user_events.rst | 23 +-
include/uapi/linux/user_events.h | 6 +-
kernel/trace/trace_events_user.c | 224 +++++++++++++-----
.../testing/selftests/user_events/abi_test.c | 134 +++++++++++
4 files changed, 325 insertions(+), 62 deletions(-)
base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
--
2.34.1
user_events now has multi-format events, which allow the same register
name to be used with different formats. When this occurs, different
tracepoints are created with unique names.
Add a new test that ensures the same name can be used for two different
formats. Ensure they are isolated from each other and that name and arg
matching still works if yet another register comes in with the same
format as one of the two.
Signed-off-by: Beau Belgrave <[email protected]>
---
.../testing/selftests/user_events/abi_test.c | 134 ++++++++++++++++++
1 file changed, 134 insertions(+)
diff --git a/tools/testing/selftests/user_events/abi_test.c b/tools/testing/selftests/user_events/abi_test.c
index cef1ff1af223..b3480fad65a5 100644
--- a/tools/testing/selftests/user_events/abi_test.c
+++ b/tools/testing/selftests/user_events/abi_test.c
@@ -16,6 +16,8 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
+#include <glob.h>
+#include <string.h>
#include <asm/unistd.h>
#include "../kselftest_harness.h"
@@ -23,6 +25,62 @@
const char *data_file = "/sys/kernel/tracing/user_events_data";
const char *enable_file = "/sys/kernel/tracing/events/user_events/__abi_event/enable";
+const char *multi_dir_glob = "/sys/kernel/tracing/events/user_events_multi/__abi_event:*";
+
+static int wait_for_delete(char *dir)
+{
+ struct stat buf;
+ int i;
+
+ for (i = 0; i < 10000; ++i) {
+ if (stat(dir, &buf) == -1 && errno == ENOENT)
+ return 0;
+
+ usleep(1000);
+ }
+
+ return -1;
+}
+
+static int find_multi_event_dir(char *unique_field, char *out_dir, int dir_len)
+{
+ char path[256];
+ glob_t buf;
+ int i, ret;
+
+ ret = glob(multi_dir_glob, GLOB_ONLYDIR, NULL, &buf);
+
+ if (ret)
+ return -1;
+
+ ret = -1;
+
+ for (i = 0; i < buf.gl_pathc; ++i) {
+ FILE *fp;
+
+ snprintf(path, sizeof(path), "%s/format", buf.gl_pathv[i]);
+ fp = fopen(path, "r");
+
+ if (!fp)
+ continue;
+
+ while (fgets(path, sizeof(path), fp) != NULL) {
+ if (strstr(path, unique_field)) {
+ fclose(fp);
+ /* strscpy is not available, use snprintf */
+ snprintf(out_dir, dir_len, "%s", buf.gl_pathv[i]);
+ ret = 0;
+ goto out;
+ }
+ }
+
+ fclose(fp);
+ }
+out:
+ globfree(&buf);
+
+ return ret;
+}
static bool event_exists(void)
{
@@ -74,6 +132,39 @@ static int event_delete(void)
return ret;
}
+static int reg_enable_multi(void *enable, int size, int bit, int flags,
+ char *args)
+{
+ struct user_reg reg = {0};
+ char full_args[512] = {0};
+ int fd = open(data_file, O_RDWR);
+ int len;
+ int ret;
+
+ if (fd < 0)
+ return -1;
+
+ len = snprintf(full_args, sizeof(full_args), "__abi_event %s", args);
+
+ if (len >= sizeof(full_args)) {
+ ret = -E2BIG;
+ goto out;
+ }
+
+ reg.size = sizeof(reg);
+ reg.name_args = (__u64)full_args;
+ reg.flags = USER_EVENT_REG_MULTI_FORMAT | flags;
+ reg.enable_bit = bit;
+ reg.enable_addr = (__u64)enable;
+ reg.enable_size = size;
+
+ ret = ioctl(fd, DIAG_IOCSREG, &reg);
+out:
+ close(fd);
+
+ return ret;
+}
+
static int reg_enable_flags(void *enable, int size, int bit, int flags)
{
struct user_reg reg = {0};
@@ -207,6 +298,49 @@ TEST_F(user, bit_sizes) {
ASSERT_NE(0, reg_enable(&self->check, 128, 0));
}
+TEST_F(user, multi_format) {
+ char first_dir[256];
+ char second_dir[256];
+ struct stat buf;
+
+ /* Multiple formats for the same name should work */
+ ASSERT_EQ(0, reg_enable_multi(&self->check, sizeof(int), 0,
+ 0, "u32 multi_first"));
+
+ ASSERT_EQ(0, reg_enable_multi(&self->check, sizeof(int), 1,
+ 0, "u64 multi_second"));
+
+ /* Same name with same format should also work */
+ ASSERT_EQ(0, reg_enable_multi(&self->check, sizeof(int), 2,
+ 0, "u64 multi_second"));
+
+ ASSERT_EQ(0, find_multi_event_dir("multi_first",
+ first_dir, sizeof(first_dir)));
+
+ ASSERT_EQ(0, find_multi_event_dir("multi_second",
+ second_dir, sizeof(second_dir)));
+
+ /* Should not be found in the same dir */
+ ASSERT_NE(0, strcmp(first_dir, second_dir));
+
+ /* First dir should still exist */
+ ASSERT_EQ(0, stat(first_dir, &buf));
+
+ /* Disabling first register should remove first dir */
+ ASSERT_EQ(0, reg_disable(&self->check, 0));
+ ASSERT_EQ(0, wait_for_delete(first_dir));
+
+ /* Second dir should still exist */
+ ASSERT_EQ(0, stat(second_dir, &buf));
+
+ /* Disabling both second registers should remove second dir */
+ ASSERT_EQ(0, reg_disable(&self->check, 1));
+ /* Ensure bit 1 and 2 are tied together, should not delete yet */
+ ASSERT_EQ(0, stat(second_dir, &buf));
+ ASSERT_EQ(0, reg_disable(&self->check, 2));
+ ASSERT_EQ(0, wait_for_delete(second_dir));
+}
+
TEST_F(user, forks) {
int i;
--
2.34.1
User programs can now ask user_events to handle the synchronization of
multiple different formats for an event with the same name via the new
USER_EVENT_REG_MULTI_FORMAT flag.
Add a section for USER_EVENT_REG_MULTI_FORMAT that explains the intended
purpose and caveats of using it. Explain how deletion works in these
cases and how to use /sys/kernel/tracing/dynamic_events for per-version
deletion.
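Recording programs that scan tracefs need to recognize every version of
an event. A minimal sketch of a directory-name filter, assuming the
"name:[hex id]" form described in this series; sketch_is_multi_version()
is hypothetical, and listing the directories (for example with glob(3),
as the selftest does) is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a tracefs event directory name
 * such as "test:[d5874fdac44]" is a multi-format version of reg_name.
 */
static bool sketch_is_multi_version(const char *dir_name, const char *reg_name)
{
	size_t name_len = strlen(reg_name);
	const char *p;

	/* The register name must be an exact prefix... */
	if (strncmp(dir_name, reg_name, name_len) != 0)
		return false;

	/* ...followed directly by ":[" so "test2:[..]" does not match "test". */
	p = dir_name + name_len;
	if (strncmp(p, ":[", 2) != 0)
		return false;
	p += 2;

	/* Require at least one hex digit before the closing bracket. */
	if (!isxdigit((unsigned char)*p))
		return false;
	while (isxdigit((unsigned char)*p))
		p++;

	return p[0] == ']' && p[1] == '\0';
}
```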
Signed-off-by: Beau Belgrave <[email protected]>
---
Documentation/trace/user_events.rst | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index d8f12442aaa6..35a2524bc73c 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -92,6 +92,20 @@ The following flags are currently supported.
process closes or unregisters the event. Requires CAP_PERFMON otherwise
-EPERM is returned.
++ USER_EVENT_REG_MULTI_FORMAT: The event can contain multiple formats. This
+ allows programs to prevent themselves from being blocked when their event
+ format changes and they wish to use the same name. When this flag is used the
+ tracepoint name will be in the new format of "name:[unique_id]" vs the older
+ format of "name". A tracepoint will be created for each unique pair of name
+ and format. This means if several processes use the same name and format,
+ they will use the same tracepoint. If yet another process uses the same name,
+ but a different format than the other processes, it will use a different
+ tracepoint with a new unique id. Recording programs need to scan tracefs for
+ the different formats of the event name they are interested in recording.
+ The system name of the tracepoint will also be "user_events_multi" instead of
+ "user_events". This prevents single-format event names from conflicting with
+ any multi-format event names within tracefs.
+
Upon successful registration the following is set.
+ write_index: The index to use for this file descriptor that represents this
@@ -106,6 +120,9 @@ or perf record -e user_events:[name] when attaching/recording.
**NOTE:** The event subsystem name by default is "user_events". Callers should
not assume it will always be "user_events". Operators reserve the right in the
future to change the subsystem name per-process to accommodate event isolation.
+In addition, if the USER_EVENT_REG_MULTI_FORMAT flag is used, the tracepoint
+name will have a unique id appended to it and the system name will be
+"user_events_multi" as described above.
Command Format
^^^^^^^^^^^^^^
@@ -156,7 +173,11 @@ to request deletes than the one used for registration due to this.
to the event. If programs do not want auto-delete, they must use the
USER_EVENT_REG_PERSIST flag when registering the event. Once that flag is used
the event exists until DIAG_IOCSDEL is invoked. Both register and delete of an
-event that persists requires CAP_PERFMON, otherwise -EPERM is returned.
+event that persists requires CAP_PERFMON, otherwise -EPERM is returned. When
+there are multiple formats of the same event name, deletion is attempted for
+all events with that name. If only a specific version should be deleted, use
+the /sys/kernel/tracing/dynamic_events file with that specific format of the
+event.
Unregistering
-------------
--
2.34.1
The current code for finding and deleting events assumes that there will
never be cases when user_events are registered with the same name but
different formats. Future changes will allow this scenario so that user
programs can be updated or modify their events and run different
versions of their programs side-by-side without being blocked.
This change does not yet allow for multi-format events. If user_events
are registered with the same name but different arguments, the programs
see the same return values as before. This change simply makes it easy
to accommodate this in future changes.
Update find_user_event() to take in argument parameters and register
flags to accommodate future multi-format event scenarios. Have find
validate argument matching and return error pointers to cover
address-in-use cases or allocation errors. Update callers to handle
error pointer logic.
Move delete_user_event() to use hash walking directly now that find has
changed. Delete all events found that match the register name; stop if
an error occurs and report it back to the user.
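The delete walk frees entries while iterating, so the next pointer has to
be saved before each free; in the kernel that is what the
hash_for_each_possible_safe() variant provides. A user-space sketch on a
hypothetical singly linked bucket, with the same stop-on-first-error
behavior (struct sketch_node and both helpers are illustrative only):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event node in a singly linked bucket list. */
struct sketch_node {
	char name[32];
	int refs;                   /* > 0 means still in use */
	struct sketch_node *next;
};

/* Hypothetical helper: pushes a node; on allocation failure, list is unchanged. */
static struct sketch_node *sketch_push(struct sketch_node *head,
				       const char *name, int refs)
{
	struct sketch_node *n = calloc(1, sizeof(*n));

	if (!n)
		return head;
	strncpy(n->name, name, sizeof(n->name) - 1);
	n->refs = refs;
	n->next = head;
	return n;
}

/*
 * Deletes every node named @name. Returns -EBUSY at the first busy node,
 * -ENOENT if nothing matched, or 0 after deleting all matches.
 */
static int sketch_delete_all(struct sketch_node **head, const char *name)
{
	struct sketch_node **link = head;   /* pointer that points at node */
	struct sketch_node *node, *next;
	int ret = -ENOENT;

	for (node = *head; node; node = next) {
		next = node->next;              /* saved before any free() */

		if (strcmp(node->name, name)) {
			link = &node->next;
			continue;
		}

		if (node->refs > 0)
			return -EBUSY;

		*link = next;                   /* unlink, then free */
		free(node);
		ret = 0;
	}

	return ret;
}
```

As with the loop described above, events deleted before a busy one stay
deleted, so -EBUSY can mean a partial delete.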
Update user_fields_match() to cover list_empty() scenarios instead of
each callsite doing it now that find_user_event() uses it directly.
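Folding the empty case into the match helper means argc == 0 only
matches an event without fields. A simplified model, assuming each field
is one normalized string such as "u32 count" (the kernel instead consumes
argv tokens per field); sketch_fields_match() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the match rule, not the kernel implementation. */
static bool sketch_fields_match(const char **fields, int nfields,
				int argc, const char **argv)
{
	/* No arguments only matches an event that has no fields. */
	if (argc == 0)
		return nfields == 0;

	if (argc != nfields)
		return false;

	/* Every field must match in order. */
	for (int i = 0; i < argc; i++)
		if (strcmp(fields[i], argv[i]))
			return false;

	return true;
}
```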
Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 106 +++++++++++++++++--------------
1 file changed, 58 insertions(+), 48 deletions(-)
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 9365ce407426..0480579ba563 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -202,6 +202,8 @@ static struct user_event_mm *user_event_mm_get(struct user_event_mm *mm);
static struct user_event_mm *user_event_mm_get_all(struct user_event *user);
static void user_event_mm_put(struct user_event_mm *mm);
static int destroy_user_event(struct user_event *user);
+static bool user_fields_match(struct user_event *user, int argc,
+ const char **argv);
static u32 user_event_key(char *name)
{
@@ -1493,17 +1495,24 @@ static int destroy_user_event(struct user_event *user)
}
static struct user_event *find_user_event(struct user_event_group *group,
- char *name, u32 *outkey)
+ char *name, int argc, const char **argv,
+ u32 flags, u32 *outkey)
{
struct user_event *user;
u32 key = user_event_key(name);
*outkey = key;
- hash_for_each_possible(group->register_table, user, node, key)
- if (!strcmp(EVENT_NAME(user), name))
+ hash_for_each_possible(group->register_table, user, node, key) {
+ if (strcmp(EVENT_NAME(user), name))
+ continue;
+
+ if (user_fields_match(user, argc, argv))
return user_event_get(user);
+ return ERR_PTR(-EADDRINUSE);
+ }
+
return NULL;
}
@@ -1860,6 +1869,9 @@ static bool user_fields_match(struct user_event *user, int argc,
struct list_head *head = &user->fields;
int i = 0;
+ if (argc == 0)
+ return list_empty(head);
+
list_for_each_entry_reverse(field, head, link) {
if (!user_field_match(field, argc, argv, &i))
return false;
@@ -1880,10 +1892,8 @@ static bool user_event_match(const char *system, const char *event,
match = strcmp(EVENT_NAME(user), event) == 0 &&
(!system || strcmp(system, USER_EVENTS_SYSTEM) == 0);
- if (match && argc > 0)
+ if (match)
match = user_fields_match(user, argc, argv);
- else if (match && argc == 0)
- match = list_empty(&user->fields);
return match;
}
@@ -1922,11 +1932,11 @@ static int user_event_parse(struct user_event_group *group, char *name,
char *args, char *flags,
struct user_event **newuser, int reg_flags)
{
- int ret;
- u32 key;
struct user_event *user;
+ char **argv = NULL;
int argc = 0;
- char **argv;
+ int ret;
+ u32 key;
/* Currently don't support any text based flags */
if (flags != NULL)
@@ -1935,41 +1945,34 @@ static int user_event_parse(struct user_event_group *group, char *name,
if (!user_event_capable(reg_flags))
return -EPERM;
+ if (args) {
+ argv = argv_split(GFP_KERNEL, args, &argc);
+
+ if (!argv)
+ return -ENOMEM;
+ }
+
/* Prevent dyn_event from racing */
mutex_lock(&event_mutex);
- user = find_user_event(group, name, &key);
+ user = find_user_event(group, name, argc, (const char **)argv,
+ reg_flags, &key);
mutex_unlock(&event_mutex);
- if (user) {
- if (args) {
- argv = argv_split(GFP_KERNEL, args, &argc);
- if (!argv) {
- ret = -ENOMEM;
- goto error;
- }
+ if (argv)
+ argv_free(argv);
- ret = user_fields_match(user, argc, (const char **)argv);
- argv_free(argv);
-
- } else
- ret = list_empty(&user->fields);
-
- if (ret) {
- *newuser = user;
- /*
- * Name is allocated by caller, free it since it already exists.
- * Caller only worries about failure cases for freeing.
- */
- kfree(name);
- } else {
- ret = -EADDRINUSE;
- goto error;
- }
+ if (IS_ERR(user))
+ return PTR_ERR(user);
+
+ if (user) {
+ *newuser = user;
+ /*
+ * Name is allocated by caller, free it since it already exists.
+ * Caller only worries about failure cases for freeing.
+ */
+ kfree(name);
return 0;
-error:
- user_event_put(user, false);
- return ret;
}
user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT);
@@ -2052,25 +2055,32 @@ static int user_event_parse(struct user_event_group *group, char *name,
}
/*
- * Deletes a previously created event if it is no longer being used.
+ * Deletes previously created events if they are no longer being used.
*/
static int delete_user_event(struct user_event_group *group, char *name)
{
- u32 key;
- struct user_event *user = find_user_event(group, name, &key);
+ struct user_event *user;
+ struct hlist_node *tmp;
+ u32 key = user_event_key(name);
+ int ret = -ENOENT;
- if (!user)
- return -ENOENT;
+ /* Attempt to delete all event(s) with the name passed in */
+ hash_for_each_possible_safe(group->register_table, user, tmp, node, key) {
+ if (strcmp(EVENT_NAME(user), name))
+ continue;
- user_event_put(user, true);
+ if (!user_event_last_ref(user))
+ return -EBUSY;
- if (!user_event_last_ref(user))
- return -EBUSY;
+ if (!user_event_capable(user->reg_flags))
+ return -EPERM;
- if (!user_event_capable(user->reg_flags))
- return -EPERM;
+ ret = destroy_user_event(user);
- return destroy_user_event(user);
+ if (ret)
+ goto out;
+ }
+out:
+ return ret;
}
/*
--
2.34.1
Currently user_events supports only one event per name, and that event
must have the exact same format when referenced by multiple programs.
This opens an opportunity for malicious or poorly thought-through
programs to create events that others use with different formats.
Another scenario is user programs wishing to use the same event name but
add more fields later when the software updates. Various versions of a
program may be running side-by-side, which the current single-format
requirement prevents.
Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
the user program wishes to use the same user_event name, but may have
several different formats of the event in the future. When this flag is
used, create the underlying tracepoint backing the user_event with a
unique name per-version of the format. It's important that existing ABI
users do not get this logic automatically, even if one of the
multi-format events matches the format. This ensures existing programs
that create events and assume the tracepoint name will match exactly
continue to work as expected. During find, only compare multi-format
events with other multi-format events, and single-format events with
other single-format events.
Change the system name of the multi-format event tracepoints to ensure
that multi-format events are completely isolated from single-format events.
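A sketch of how the system name could follow the register flags, and
where the resulting event directory would appear in tracefs. The macro
values are copied from the patched uapi header as assumptions, the
tracefs mount point is assumed to be /sys/kernel/tracing, and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values assumed from the patched include/uapi/linux/user_events.h. */
#define SKETCH_EVENTS_SYSTEM       "user_events"
#define SKETCH_EVENTS_MULTI_SYSTEM "user_events_multi"
#define SKETCH_REG_MULTI_FORMAT    (1U << 1)

/* Hypothetical helper: picks the system name from the register flags. */
static const char *sketch_event_system(unsigned int reg_flags)
{
	return (reg_flags & SKETCH_REG_MULTI_FORMAT) ?
		SKETCH_EVENTS_MULTI_SYSTEM : SKETCH_EVENTS_SYSTEM;
}

/* Hypothetical helper: builds the tracefs event directory path. */
static int sketch_event_dir(char *buf, size_t size, unsigned int reg_flags,
			    const char *tp_name)
{
	int len = snprintf(buf, size, "/sys/kernel/tracing/events/%s/%s",
			   sketch_event_system(reg_flags), tp_name);

	/* Treat truncation as an error. */
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```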
Add a register_name (reg_name) to the user_event struct which allows for
split naming of events. We now have the name that was used to register
within user_events as well as the unique name for the tracepoint. When
registering events, match first on the reg_name, then on the fields and
format of the event. This allows for multiple events
with the same registered name to have different formats. The underlying
tracepoint will have a unique name in the format of {reg_name}:[unique_id].
For example, if both "test u32 count" and "test u64 count" are registered
with the USER_EVENT_REG_MULTI_FORMAT flag, the system would have two
unique tracepoints. The dynamic_events file would then show the following:
  u:test u64 count
  u:test u32 count
The actual tracepoint names look like this:
  test:[d5874fdac44]
  test:[d5914662cd4]
Both would be under the new user_events_multi system name to prevent the
older ABI from being used to squat on multi-formatted events and block
their use.
Deleting events via "!u:test u64 count" would only delete the first
tracepoint that matched that format. When the delete ABI is used,
deletion is attempted for all events with the same name. If per-version
deletion is required, user programs should either not use persistent
events or delete them via dynamic_events.
Signed-off-by: Beau Belgrave <[email protected]>
---
include/uapi/linux/user_events.h | 6 +-
kernel/trace/trace_events_user.c | 118 +++++++++++++++++++++++++++----
2 files changed, 111 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/user_events.h b/include/uapi/linux/user_events.h
index f74f3aedd49c..a03de03dccbc 100644
--- a/include/uapi/linux/user_events.h
+++ b/include/uapi/linux/user_events.h
@@ -12,6 +12,7 @@
#include <linux/ioctl.h>
#define USER_EVENTS_SYSTEM "user_events"
+#define USER_EVENTS_MULTI_SYSTEM "user_events_multi"
#define USER_EVENTS_PREFIX "u:"
/* Create dynamic location entry within a 32-bit value */
@@ -22,8 +23,11 @@ enum user_reg_flag {
/* Event will not delete upon last reference closing */
USER_EVENT_REG_PERSIST = 1U << 0,
+ /* Event will be allowed to have multiple formats */
+ USER_EVENT_REG_MULTI_FORMAT = 1U << 1,
+
/* This value or above is currently non-ABI */
- USER_EVENT_REG_MAX = 1U << 1,
+ USER_EVENT_REG_MAX = 1U << 2,
};
/*
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 0480579ba563..f9c0781285b6 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -34,7 +34,8 @@
/* Limit how long of an event name plus args within the subsystem. */
#define MAX_EVENT_DESC 512
-#define EVENT_NAME(user_event) ((user_event)->tracepoint.name)
+#define EVENT_NAME(user_event) ((user_event)->reg_name)
+#define EVENT_TP_NAME(user_event) ((user_event)->tracepoint.name)
#define MAX_FIELD_ARRAY_SIZE 1024
/*
@@ -54,10 +55,13 @@
* allows isolation for events by various means.
*/
struct user_event_group {
- char *system_name;
- struct hlist_node node;
- struct mutex reg_mutex;
+ char *system_name;
+ char *system_multi_name;
+ struct hlist_node node;
+ struct mutex reg_mutex;
DECLARE_HASHTABLE(register_table, 8);
+ /* ID that moves forward within the group for multi-event names */
+ u64 multi_id;
};
/* Group for init_user_ns mapping, top-most group */
@@ -78,6 +82,7 @@ static unsigned int current_user_events;
*/
struct user_event {
struct user_event_group *group;
+ char *reg_name;
struct tracepoint tracepoint;
struct trace_event_call call;
struct trace_event_class class;
@@ -127,6 +132,8 @@ struct user_event_enabler {
#define ENABLE_BIT(e) ((int)((e)->values & ENABLE_VAL_BIT_MASK))
+#define EVENT_MULTI_FORMAT(f) ((f) & USER_EVENT_REG_MULTI_FORMAT)
+
/* Used for asynchronous faulting in of pages */
struct user_event_enabler_fault {
struct work_struct work;
@@ -330,6 +337,7 @@ static void user_event_put(struct user_event *user, bool locked)
static void user_event_group_destroy(struct user_event_group *group)
{
kfree(group->system_name);
+ kfree(group->system_multi_name);
kfree(group);
}
@@ -348,6 +356,21 @@ static char *user_event_group_system_name(void)
return system_name;
}
+static char *user_event_group_system_multi_name(void)
+{
+ char *system_name;
+ int len = sizeof(USER_EVENTS_MULTI_SYSTEM) + 1;
+
+ system_name = kmalloc(len, GFP_KERNEL);
+
+ if (!system_name)
+ return NULL;
+
+ snprintf(system_name, len, "%s", USER_EVENTS_MULTI_SYSTEM);
+
+ return system_name;
+}
+
static struct user_event_group *current_user_event_group(void)
{
return init_group;
@@ -367,6 +390,11 @@ static struct user_event_group *user_event_group_create(void)
if (!group->system_name)
goto error;
+ group->system_multi_name = user_event_group_system_multi_name();
+
+ if (!group->system_multi_name)
+ goto error;
+
mutex_init(&group->reg_mutex);
hash_init(group->register_table);
@@ -1482,6 +1510,11 @@ static int destroy_user_event(struct user_event *user)
hash_del(&user->node);
user_event_destroy_validators(user);
+
+ /* If we have different names, both must be freed */
+ if (EVENT_NAME(user) != EVENT_TP_NAME(user))
+ kfree(EVENT_TP_NAME(user));
+
kfree(user->call.print_fmt);
kfree(EVENT_NAME(user));
kfree(user);
@@ -1504,12 +1537,24 @@ static struct user_event *find_user_event(struct user_event_group *group,
*outkey = key;
hash_for_each_possible(group->register_table, user, node, key) {
+ /*
+ * Single-format events shouldn't return multi-format
+ * events. Callers expect the underlying tracepoint to match
+ * the name exactly in these cases. Only check like-formats.
+ */
+ if (EVENT_MULTI_FORMAT(flags) != EVENT_MULTI_FORMAT(user->reg_flags))
+ continue;
+
if (strcmp(EVENT_NAME(user), name))
continue;
if (user_fields_match(user, argc, argv))
return user_event_get(user);
+ /* Scan others if this is a multi-format event */
+ if (EVENT_MULTI_FORMAT(flags))
+ continue;
+
return ERR_PTR(-EADDRINUSE);
}
@@ -1889,8 +1934,12 @@ static bool user_event_match(const char *system, const char *event,
struct user_event *user = container_of(ev, struct user_event, devent);
bool match;
- match = strcmp(EVENT_NAME(user), event) == 0 &&
- (!system || strcmp(system, USER_EVENTS_SYSTEM) == 0);
+ match = strcmp(EVENT_NAME(user), event) == 0;
+
+ if (match && system) {
+ match = strcmp(system, user->group->system_name) == 0 ||
+ strcmp(system, user->group->system_multi_name) == 0;
+ }
if (match)
match = user_fields_match(user, argc, argv);
@@ -1923,6 +1972,39 @@ static int user_event_trace_register(struct user_event *user)
return ret;
}
+static int user_event_set_tp_name(struct user_event *user)
+{
+ lockdep_assert_held(&user->group->reg_mutex);
+
+ if (EVENT_MULTI_FORMAT(user->reg_flags)) {
+ char *multi_name;
+ int len;
+
+ len = snprintf(NULL, 0, "%s:[%llx]", user->reg_name,
+ user->group->multi_id) + 1;
+
+ multi_name = kzalloc(len, GFP_KERNEL_ACCOUNT);
+
+ if (!multi_name)
+ return -ENOMEM;
+
+ snprintf(multi_name, len, "%s:[%llx]", user->reg_name,
+ user->group->multi_id);
+
+ user->call.name = multi_name;
+ user->tracepoint.name = multi_name;
+
+ /* Inc to ensure unique multi-event name next time */
+ user->group->multi_id++;
+ } else {
+ /* Non Multi-format uses register name */
+ user->call.name = user->reg_name;
+ user->tracepoint.name = user->reg_name;
+ }
+
+ return 0;
+}
+
/*
* Parses the event name, arguments and flags then registers if successful.
* The name buffer lifetime is owned by this method for success cases only.
@@ -1985,7 +2067,13 @@ static int user_event_parse(struct user_event_group *group, char *name,
INIT_LIST_HEAD(&user->validators);
user->group = group;
- user->tracepoint.name = name;
+ user->reg_name = name;
+ user->reg_flags = reg_flags;
+
+ ret = user_event_set_tp_name(user);
+
+ if (ret)
+ goto put_user;
ret = user_event_parse_fields(user, args);
@@ -1999,11 +2087,14 @@ static int user_event_parse(struct user_event_group *group, char *name,
user->call.data = user;
user->call.class = &user->class;
- user->call.name = name;
user->call.flags = TRACE_EVENT_FL_TRACEPOINT;
user->call.tp = &user->tracepoint;
user->call.event.funcs = &user_event_funcs;
- user->class.system = group->system_name;
+
+ if (EVENT_MULTI_FORMAT(user->reg_flags))
+ user->class.system = group->system_multi_name;
+ else
+ user->class.system = group->system_name;
user->class.fields_array = user_event_fields_array;
user->class.get_fields = user_event_get_fields;
@@ -2025,8 +2116,6 @@ static int user_event_parse(struct user_event_group *group, char *name,
if (ret)
goto put_user_lock;
- user->reg_flags = reg_flags;
-
if (user->reg_flags & USER_EVENT_REG_PERSIST) {
/* Ensure we track self ref and caller ref (2) */
refcount_set(&user->refcnt, 2);
@@ -2050,6 +2139,11 @@ static int user_event_parse(struct user_event_group *group, char *name,
user_event_destroy_fields(user);
user_event_destroy_validators(user);
kfree(user->call.print_fmt);
+
+ /* Caller frees reg_name on error, but not multi-name */
+ if (EVENT_NAME(user) != EVENT_TP_NAME(user))
+ kfree(EVENT_TP_NAME(user));
+
kfree(user);
return ret;
}
@@ -2640,7 +2734,7 @@ static int user_seq_show(struct seq_file *m, void *p)
hash_for_each(group->register_table, i, user, node) {
status = user->status;
- seq_printf(m, "%s", EVENT_NAME(user));
+ seq_printf(m, "%s", EVENT_TP_NAME(user));
if (status != 0)
seq_puts(m, " #");
--
2.34.1
On Tue, 23 Jan 2024 22:08:41 +0000
Beau Belgrave <[email protected]> wrote:
> The current code for finding and deleting events assumes that there will
> never be cases when user_events are registered with the same name, but
> different formats. In the future this scenario will exist to ensure
> user programs can be updated or modify their events and run different
> versions of their programs side-by-side without being blocked.
Ah, this is a very important point. The kernel always has only one
instance, but a user program doesn't, so it can define the same event name.
For a similar problem, uprobe events assume that the user (here, the
admin) will define a different group name to avoid it. But a user
event is embedded in the program, hmm.
>
> This change does not yet allow for multi-format events. If user_events
> are registered with the same name but different arguments the programs
> see the same return values as before. This change simply makes it
> possible to easily accomodate for this in future changes.
>
> Update find_user_event() to take in argument parameters and register
> flags to accomodate future multi-format event scenarios. Have find
> validate argument matching and return error pointers to cover address
> in use cases, or allocation errors. Update callers to handle error
> pointer logic.
Understood; that is similar to what probe events do.
>
> Move delete_user_event() to use hash walking directly now that find has
> changed. Delete all events found that match the register name, stop
> if an error occurs and report back to the user.
What happens if we run two different versions of the application and
terminate one of them? Will the event that is still used by the other be kept?
Thank you,
>
> Update user_fields_match() to cover list_empty() scenarios instead of
> each callsite doing it now that find_user_event() uses it directly.
>
> Signed-off-by: Beau Belgrave <[email protected]>
> {
> - u32 key;
> - struct user_event *user = find_user_event(group, name, &key);
> + struct user_event *user;
> + u32 key = user_event_key(name);
> + int ret = -ENOENT;
>
> - if (!user)
> - return -ENOENT;
> + /* Attempt to delete all event(s) with the name passed in */
> + hash_for_each_possible(group->register_table, user, node, key) {
> + if (strcmp(EVENT_NAME(user), name))
> + continue;
>
> - user_event_put(user, true);
> + if (!user_event_last_ref(user))
> + return -EBUSY;
>
> - if (!user_event_last_ref(user))
> - return -EBUSY;
> + if (!user_event_capable(user->reg_flags))
> + return -EPERM;
>
> - if (!user_event_capable(user->reg_flags))
> - return -EPERM;
> + ret = destroy_user_event(user);
>
> - return destroy_user_event(user);
> + if (ret)
> + goto out;
> + }
> +out:
> + return ret;
> }
>
> /*
> --
> 2.34.1
>
--
Masami Hiramatsu (Google) <[email protected]>
On Thu, Jan 25, 2024 at 09:59:03AM +0900, Masami Hiramatsu wrote:
> On Tue, 23 Jan 2024 22:08:41 +0000
> Beau Belgrave <[email protected]> wrote:
>
> > The current code for finding and deleting events assumes that there will
> > never be cases when user_events are registered with the same name, but
> > different formats. In the future this scenario will exist to ensure
> > user programs can be updated or modify their events and run different
> > versions of their programs side-by-side without being blocked.
>
> Ah, this is a very important point. The kernel always has only one
> instance, but a user program doesn't, so several instances can define the
> same event name. For a similar problem, uprobe events assume that the user
> (here, the admin) will define a different group name to avoid it. But for
> user events, the name is embedded in the program, hmm.
>
Yes, the series handles the case where multiple processes use the same
name: we find a matching version of that name within the user_event group.
If there isn't one, a new one is created. Each version is backed by an
independent tracepoint, which matches how uprobe does it. This came up in
the tracefs meetings we've had, and there seemed to be wide agreement that
this is the best way to handle it.
> >
> > This change does not yet allow for multi-format events. If user_events
> > are registered with the same name but different arguments, the programs
> > see the same return values as before. This change simply makes it
> > possible to easily accommodate this in future changes.
> >
> > Update find_user_event() to take in argument parameters and register
> > flags to accommodate future multi-format event scenarios. Have find
> > validate argument matching and return error pointers to cover
> > address-in-use cases or allocation errors. Update callers to handle
> > error pointer logic.
>
> Understand, that is similar to what probe events do.
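As a rough illustration of the new find_user_event() contract (the existing event on a full match, ERR_PTR(-EADDRINUSE) when the name exists with another format, NULL when the name is free), here is a hedged userspace model. It assumes a plain array instead of the kernel hash table, compares one field string per argv entry instead of the kernel's token-by-token field parsing, and re-implements ERR_PTR()/IS_ERR()/PTR_ERR() locally; `model_event`, `model_fields_match()` and `model_find()` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified event: a registered name plus field strings. */
struct model_event {
	const char *name;
	int nfields;
	const char **fields;
};

/* As in the patch, argc == 0 only matches an event with no fields. */
static bool model_fields_match(const struct model_event *ev, int argc,
			       const char **argv)
{
	if (argc == 0)
		return ev->nfields == 0;

	if (argc != ev->nfields)
		return false;

	for (int i = 0; i < argc; i++)
		if (strcmp(ev->fields[i], argv[i]))
			return false;

	return true;
}

/*
 * Returns the matching event, ERR_PTR(-EADDRINUSE) if the name exists
 * with a different format, or NULL if the name is not registered.
 */
static struct model_event *model_find(struct model_event *events, int n,
				      const char *name, int argc,
				      const char **argv)
{
	for (int i = 0; i < n; i++) {
		if (strcmp(events[i].name, name))
			continue;

		if (model_fields_match(&events[i], argc, argv))
			return &events[i];

		return ERR_PTR(-EADDRINUSE);
	}

	return NULL;
}
```

A caller checks IS_ERR() first and only then treats a non-NULL result as an existing event, which is the order user_event_parse() follows in the patch.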
>
> >
> > Move delete_user_event() to use hash walking directly now that find has
> > changed. Delete all events found that match the register name, stopping
> > if an error occurs and reporting it back to the user.
>
> What happens if we run 2 different versions of the application and
> terminate one of them? Will the event which is used by the other be kept?
>
Each unique version of a user_event has its own ref-count. If one
version is unused but another version is in use, only the unused version
will get deleted. The version that is in use will return -EBUSY when the
enumeration reaches it.
While we only have a single tracepoint per version, we have several
user_event structures in memory that have the same name yet different
formats. Each has its own lifetime, enablers and ref-counts to keep them
isolated from each other.
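To illustrate the per-version lifetimes described above, here is a hedged userspace model of deleting every version registered under one name. Assumptions: a singly linked list stands in for the kernel hash bucket, a plain `refs` counter stands in for the real ref-count, and `model_ver`, `model_push()` and `model_delete_all()` are hypothetical. It follows the patch's early -EBUSY return, and it saves the next pointer before freeing the current entry, in the spirit of the kernel's hash_for_each_possible_safe(), since freeing the current entry would otherwise invalidate the iterator.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-version entry; each version has its own ref-count. */
struct model_ver {
	char name[32];
	int refs;
	struct model_ver *next;
};

/* Push a new version onto the front of the list; aborts on OOM. */
static struct model_ver *model_push(struct model_ver *head, const char *name,
				    int refs)
{
	struct model_ver *v = calloc(1, sizeof(*v));

	if (!v)
		abort();

	strncpy(v->name, name, sizeof(v->name) - 1);
	v->refs = refs;
	v->next = head;
	return v;
}

/*
 * Delete every version named @name. Like the patch: -ENOENT if none
 * match, and stop with -EBUSY at the first version still in use.
 */
static int model_delete_all(struct model_ver **head, const char *name)
{
	struct model_ver **link = head;
	struct model_ver *cur, *next;
	int ret = -ENOENT;

	for (cur = *head; cur; cur = next) {
		next = cur->next;	/* saved before cur may be freed */

		if (strcmp(cur->name, name)) {
			link = &cur->next;
			continue;
		}

		if (cur->refs > 0)
			return -EBUSY;

		*link = next;		/* unlink, then free this version */
		free(cur);
		ret = 0;
	}

	return ret;
}
```

Because the walk stops at the first busy version, whether an unused version of the same name is deleted in that call depends on where it sits in the bucket relative to the busy one.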
Thanks,
-Beau
> Thank you,
>
> >
> > Update user_fields_match() to cover list_empty() scenarios instead of
> > each callsite doing it now that find_user_event() uses it directly.
> >
> > Signed-off-by: Beau Belgrave <[email protected]>
> > ---
> > kernel/trace/trace_events_user.c | 106 +++++++++++++++++--------------
> > 1 file changed, 58 insertions(+), 48 deletions(-)
> >
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>
It appears an outdated cover sheet was put onto this series.
Below is the updated cover sheet that reflects the changes made:
Currently user_events supports only one event with a given name, and that
event must have the exact same format when referenced by multiple
programs. This opens an opportunity for malicious or poorly thought-out
programs to create events that others use with different formats. Another
scenario is user programs wishing to use the same event name but add more
fields later when the software updates. Various versions of a program may
be running side-by-side, which the current single-format requirement
prevents.

Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
the user program wishes to use the same user_event name, but may have
several different formats of the event in the future. When this flag is
used, create the underlying tracepoint backing the user_event with a
unique name per version of the format. It's important that existing ABI
users do not get this logic automatically, even if one of the
multi-format events matches the format. This ensures existing programs
that create events and assume the tracepoint name will match exactly
continue to work as expected. Add logic so that, during find,
multi-format events are only checked against other multi-format events
and single-format events only against single-format events.

Change the system name of the multi-format event tracepoint to ensure
that multi-format events are completely isolated from single-format
events.
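A small sketch of this isolation rule, assuming a hypothetical flag bit `MODEL_MULTI_FORMAT` in place of the real USER_EVENT_REG_MULTI_FORMAT (whose value is defined in include/uapi/linux/user_events.h and is not reproduced here). A plain format-string comparison stands in for field matching, and only the flag filter and the system-name choice are modelled; the real find path also returns -EADDRINUSE for single-format name clashes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for USER_EVENT_REG_MULTI_FORMAT. */
#define MODEL_MULTI_FORMAT (1U << 0)

/* Hypothetical registration record: name, format string and flags. */
struct model_reg {
	const char *reg_name;
	const char *format;
	unsigned int flags;
};

static bool model_is_multi(unsigned int flags)
{
	return (flags & MODEL_MULTI_FORMAT) != 0;
}

/*
 * Only compare like with like: a multi-format registration never matches
 * a single-format event, even if the name and format are identical.
 */
static bool model_may_match(const struct model_reg *existing,
			    const char *reg_name, const char *format,
			    unsigned int flags)
{
	if (model_is_multi(existing->flags) != model_is_multi(flags))
		return false;

	return !strcmp(existing->reg_name, reg_name) &&
	       !strcmp(existing->format, format);
}

/* System name of the backing tracepoint for each kind of event. */
static const char *model_system(unsigned int flags)
{
	return model_is_multi(flags) ? "user_events_multi" : "user_events";
}
```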

Add a register_name (reg_name) to the user_event struct which allows for
split naming of events. We now have the name that was used to register
within user_events as well as the unique name for the tracepoint. When
registering events, match first on the reg_name, then on the fields and
format of the event. This allows multiple events with the same registered
name to have different formats. The underlying tracepoint will have a
unique name in the format of {reg_name}:[unique_id].

For example, if both "test u32 value" and "test u64 value" are registered
with the USER_EVENT_REG_MULTI_FORMAT flag, the system would have 2 unique
tracepoints. The dynamic_events file would then show the following:
u:test u64 value
u:test u32 value

The actual tracepoint names look like this:
test:[d5874fdac44]
test:[d5914662cd4]

Both would be under the new user_events_multi system name to prevent the
older ABI from being used to squat on multi-format events and block
their use.

Deleting events via "!u:test u64 value" would only delete the first
tracepoint that matched that format. When the delete ABI is used, the
kernel attempts to delete all events with the same name. If per-version
deletion is required, user programs should either not use persistent
events or delete them via dynamic_events.
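The {reg_name}:[unique_id] naming can be sketched in userspace with the same two-pass snprintf() sizing that the patch's user_event_set_tp_name() uses: a first call with a NULL buffer to get the length, then a second call to fill an exactly sized buffer. `model_tp_name()` is a hypothetical helper, malloc() stands in for kzalloc(), and the unique ID is simply passed in by the caller.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "{reg_name}:[unique_id]" with the ID in hex.
 * Returns a malloc()ed string the caller must free(), or NULL on failure.
 */
static char *model_tp_name(const char *reg_name, unsigned long long id)
{
	/* First pass: a NULL buffer of size 0 just reports the length. */
	int len = snprintf(NULL, 0, "%s:[%llx]", reg_name, id);
	char *name;

	if (len < 0)
		return NULL;

	/* +1 for the terminating NUL, as in the patch. */
	name = malloc((size_t)len + 1);
	if (!name)
		return NULL;

	/* Second pass: fill the exactly sized buffer. */
	snprintf(name, (size_t)len + 1, "%s:[%llx]", reg_name, id);
	return name;
}
```

Because the suffix is always appended, a registered name that already ends in something like ":[0]" still gets its own suffix, giving "test:[0]:[1]" for an ID of 1.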
Thanks,
-Beau
On Tue, 23 Jan 2024 22:08:42 +0000
Beau Belgrave <[email protected]> wrote:
> Add a register_name (reg_name) to the user_event struct which allows for
> split naming of events. We now have the name that was used to register
> within user_events as well as the unique name for the tracepoint. Upon
> registering events ensure matches based on first the reg_name, followed
> by the fields and format of the event. This allows for multiple events
> with the same registered name to have different formats. The underlying
> tracepoint will have a unique name in the format of {reg_name}:[unique_id].
>
> For example, if both "test u32 value" and "test u64 value" are used with
> the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> tracepoints. The dynamic_events file would then show the following:
> u:test u64 value
> u:test u32 value
>
> The actual tracepoint names look like this:
> test:[d5874fdac44]
> test:[d5914662cd4]
>
> Both would be under the new user_events_multi system name to prevent the
> older ABI from being used to squat on multi-formatted events and block
> their use.
[...]
> @@ -1923,6 +1972,39 @@ static int user_event_trace_register(struct user_event *user)
> return ret;
> }
>
> +static int user_event_set_tp_name(struct user_event *user)
> +{
> + lockdep_assert_held(&user->group->reg_mutex);
> +
> + if (EVENT_MULTI_FORMAT(user->reg_flags)) {
> + char *multi_name;
> + int len;
> +
> + len = snprintf(NULL, 0, "%s:[%llx]", user->reg_name,
> + user->group->multi_id) + 1;
> +
> + multi_name = kzalloc(len, GFP_KERNEL_ACCOUNT);
> +
> + if (!multi_name)
> + return -ENOMEM;
> +
> + snprintf(multi_name, len, "%s:[%llx]", user->reg_name,
> + user->group->multi_id);
OK, so each different event has a suffixed name. But this will
introduce a name that is not a valid C variable name.
Steve, do you think your library can handle these symbols? It will
be something like "event:[1]" as the event name.
Personally I like the "event.1" style. (Of course we need to ensure the
user-given event name does NOT include such suffix numbers.)
Thank you.
--
Masami Hiramatsu (Google) <[email protected]>
On Sat, Jan 27, 2024 at 12:01:04AM +0900, Masami Hiramatsu wrote:
> On Tue, 23 Jan 2024 22:08:42 +0000
> Beau Belgrave <[email protected]> wrote:
>
> OK, so the each different event has suffixed name. But this will
> introduce non C-variable name.
>
> Steve, do you think your library can handle these symbols? It will
> be something like "event:[1]" as the event name.
> Personally I like "event.1" style. (of course we need to ensure the
> user given event name is NOT including such suffix numbers)
>
Just to clarify regarding events that include a suffix number: this is
why multi-format events use the "user_events_multi" system name and
single-format events use just "user_events".
Even if a user program did include a suffix, the suffix would still get
appended. For example, registering "test" and "test:[0]" as multi-format
events would result in two tracepoints, "test:[0]" and "test:[0]:[1]"
respectively (assuming these are the first multi-format events on the
system).
I'm with you, we really don't want any spoofing or squatting to be
possible. I believe using different system names and always appending
the suffix covers this.
Looking forward to hearing Steven's thoughts on this as well.
Thanks,
-Beau
> Thank you.
>
> --
> Masami Hiramatsu (Google) <[email protected]>
On Fri, 26 Jan 2024 11:10:07 -0800
Beau Belgrave <[email protected]> wrote:
> > OK, so the each different event has suffixed name. But this will
> > introduce non C-variable name.
> >
> > Steve, do you think your library can handle these symbols? It will
> > be something like "event:[1]" as the event name.
> > Personally I like "event.1" style. (of course we need to ensure the
> > user given event name is NOT including such suffix numbers)
> >
>
> Just to clarify around events including a suffix number. This is why
> multi-events use "user_events_multi" system name and the single-events
> using just "user_events".
>
> Even if a user program did include a suffix, the suffix would still get
> appended. An example is "test" vs "test:[0]" using multi-format would
> result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively
> (assuming these are the first multi-events on the system).
>
> I'm with you, we really don't want any spoofing or squatting possible.
> By using different system names and always appending the suffix I
> believe covers this.
>
> Looking forward to hearing Steven's thoughts on this as well.
I'm leaning towards Masami's suggestion to use dots, as that won't conflict
with special characters from bash, as '[' and ']' do.
-- Steve
On Fri, Jan 26, 2024 at 03:04:45PM -0500, Steven Rostedt wrote:
> On Fri, 26 Jan 2024 11:10:07 -0800
> Beau Belgrave <[email protected]> wrote:
>
> > > OK, so the each different event has suffixed name. But this will
> > > introduce non C-variable name.
> > >
> > > Steve, do you think your library can handle these symbols? It will
> > > be something like "event:[1]" as the event name.
> > > Personally I like "event.1" style. (of course we need to ensure the
> > > user given event name is NOT including such suffix numbers)
> > >
> >
> > Just to clarify around events including a suffix number. This is why
> > multi-events use "user_events_multi" system name and the single-events
> > using just "user_events".
> >
> > Even if a user program did include a suffix, the suffix would still get
> > appended. An example is "test" vs "test:[0]" using multi-format would
> > result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively
> > (assuming these are the first multi-events on the system).
> >
> > I'm with you, we really don't want any spoofing or squatting possible.
> > By using different system names and always appending the suffix I
> > believe covers this.
> >
> > Looking forward to hearing Steven's thoughts on this as well.
>
> I'm leaning towards Masami's suggestion to use dots, as that won't conflict
> with special characters from bash, as '[' and ']' do.
>
Thanks, yeah ideally we wouldn't use special characters.
I'm not picky about this. However, I did want something that lets
recording programs use a clear glob pattern to find all versions of a
given user_event register name. The dot notation will pull in more than
expected if dotted, namespace-style names are used.
An example is "Asserts" and "Asserts.Verbose" from different programs.
If we tried to find all versions of "Asserts" via a glob of "Asserts.*",
it would pull in "Asserts.Verbose.1" in addition to "Asserts.0".
While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
doesn't work if the number is higher, like 128. If we ever decide to
change the ID from a decimal integer to, say, hex to save space, these
globs would break.
Is there some scheme that fits C-variable naming and addresses the
above scenarios? Brackets gave me a simple glob that seemed to prevent a
lot of this ("Asserts.\[*\]" in this case).
Are we confident that we always want to represent the ID as a base-10
integer rather than a base-16 integer? The suffix will be ABI so that
recording programs can find their events easily.
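The over-matching problem with dots, and the bracket alternative, can be checked with POSIX fnmatch(3), which implements shell-style globbing. This is a hedged userspace sketch: `glob_matches()` and `glob_count()` are hypothetical helpers, the event names are the examples from this discussion, and the bracket pattern uses the ':' separator from the proposed {reg_name}:[unique_id] scheme. In C source the glob Asserts:\[*\] needs its backslashes doubled.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* True if @name matches the shell-style glob @pattern (POSIX fnmatch). */
static bool glob_matches(const char *pattern, const char *name)
{
	/* Flags 0: backslash escapes are honoured, '/' is not special. */
	return fnmatch(pattern, name, 0) == 0;
}

/* Count how many of the @n tracepoint names in @names match @pattern. */
static int glob_count(const char *pattern, const char *const *names, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		if (glob_matches(pattern, names[i]))
			count++;

	return count;
}
```

With the dotted names, "Asserts.*" matches both "Asserts.0" and "Asserts.Verbose.1"; with bracketed names, "Asserts:\[*\]" skips "Asserts[MyCoolIndex]:[1]".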
Thanks,
-Beau
> -- Steve
Hi Beau,
On Tue, 23 Jan 2024 22:08:40 +0000
Beau Belgrave <[email protected]> wrote:
> Currently user_events supports 1 event with the same name and must have
> the exact same format when referenced by multiple programs. This opens
> an opportunity for malicious or poorly thought-out programs to
> create events that others use with different formats. Another scenario
> is user programs wishing to use the same event name but add more fields
> later when the software updates. Various versions of a program may be
> running side-by-side, which is prevented by the current single format
> requirement.
>
> Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
> the user program wishes to use the same user_event name, but may have
> several different formats of the event in the future. When this flag is
> used, create the underlying tracepoint backing the user_event with a
> unique name per-version of the format. It's important that existing ABI
> users do not get this logic automatically, even if one of the multi
> format events matches the format. This ensures existing programs that
> create events and assume the tracepoint name will match exactly continue
> to work as expected. Add logic to only check multi-format events with
> other multi-format events and single-format events to only check
> single-format events during find.
Thanks for this work! This will allow many instances to use the same
user-events at the same time.
BTW, can we set this flag by default? My concern is a user program
using this user-event interface inside a container (which may be
possible if we bind-mount it). In this case, the user program can
detect that another program is using the event if this flag is not set.
Moreover, if a malicious program is running in the container, it can
prevent other programs from using the event name even if they are
isolated by the namespace.
Steve suggested that if a user program running in a namespace uses
user-event without this flag, we could reject it by default.
What do you think?
Thank you,
>
> Add a register_name (reg_name) to the user_event struct which allows for
> split naming of events. We now have the name that was used to register
> within user_events as well as the unique name for the tracepoint. Upon
> registering events ensure matches based on first the reg_name, followed
> by the fields and format of the event. This allows for multiple events
> with the same registered name to have different formats. The underlying
> tracepoint will have a unique name in the format of {reg_name}:[unique_id].
> The unique_id is the time, in nanoseconds, of the event creation converted
> to hex. Since this is done under the register mutex, it is extremely
> unlikely for these IDs to ever match. It's also very unlikely a malicious
> program could consistently guess what the name would be and attempt to
> squat on it via the single format ABI.
>
> For example, if both "test u32 value" and "test u64 value" are used with
> the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> tracepoints. The dynamic_events file would then show the following:
> u:test u64 value
> u:test u32 value
>
> The actual tracepoint names look like this:
> test:[d5874fdac44]
> test:[d5914662cd4]
>
> Deleting events via "!u:test u64 value" would only delete the first
> tracepoint that matched that format. When the delete ABI is used, the
> kernel attempts to delete all events with the same name. If per-version
> deletion is required, user programs should either not use persistent
> events or delete them via dynamic_events.
>
> Beau Belgrave (4):
> tracing/user_events: Prepare find/delete for same name events
> tracing/user_events: Introduce multi-format events
> selftests/user_events: Test multi-format events
> tracing/user_events: Document multi-format flag
>
> Documentation/trace/user_events.rst | 23 +-
> include/uapi/linux/user_events.h | 6 +-
> kernel/trace/trace_events_user.c | 224 +++++++++++++-----
> .../testing/selftests/user_events/abi_test.c | 134 +++++++++++
> 4 files changed, 325 insertions(+), 62 deletions(-)
>
>
> base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
> --
> 2.34.1
>
--
Masami Hiramatsu (Google) <[email protected]>
On Mon, 29 Jan 2024 09:29:07 -0800
Beau Belgrave <[email protected]> wrote:
> Thanks, yeah ideally we wouldn't use special characters.
>
> I'm not picky about this. However, I did want something that clearly
> allowed a glob pattern to find all versions of a given register name of
> user_events by user programs that record. The dot notation will pull in
> more than expected if dotted namespace style names are used.
>
> An example is "Asserts" and "Asserts.Verbose" from different programs.
> If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
Do you prevent brackets in names?
>
> While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> doesn't work if the number is higher, like 128. If we ever decide to
> change the ID from an integer to say hex to save space, these globs
> would break.
>
> Is there some scheme that fits the C-variable name that addresses the
> above scenarios? Brackets gave me a simple glob that seemed to prevent a
> lot of this ("Asserts.\[*\]" in this case).
Prevent a lot of what? I'm not sure what your example here is.
>
> Are we confident that we always want to represent the ID as a base-10
> integer vs a base-16 integer? The suffix will be ABI to ensure recording
> programs can find their events easily.
Is there a difference to what we choose?
-- Steve
On Mon, 29 Jan 2024 09:29:07 -0800
Beau Belgrave <[email protected]> wrote:
> Thanks, yeah ideally we wouldn't use special characters.
>
> I'm not picky about this. However, I did want something that clearly
> allowed a glob pattern to find all versions of a given register name of
> user_events by user programs that record. The dot notation will pull in
> more than expected if dotted namespace style names are used.
>
> An example is "Asserts" and "Asserts.Verbose" from different programs.
> If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
If we use a dot for the suffix number, we can prohibit users from using
it in their names. They can still use '_' (or change the group name?).
My concern is just whether existing tools can parse the name, since ':'
is used as a separator between group and event names in some cases (e.g.
tracefs "set_event" uses it, so trace-cmd and perf use it too).
> While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> doesn't work if the number is higher, like 128. If we ever decide to
> change the ID from an integer to say hex to save space, these globs
> would break.
Hmm, why can't we use a regexp?
And if we limit the number of events to 1000 for each same-name event,
we can use fixed-width (zero-padded) numbers, like Asserts.[0-9][0-9][0-9]
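A hedged sketch of the fixed-width suggestion: the ID is zero-padded to three decimal digits and matched with a POSIX extended regular expression via regcomp(3)/regexec(3). The 1000-entry cap, the '.' separator, and the `format_fixed()` and `regex_matches()` names are assumptions taken from the proposal, not an existing interface.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical: format "{name}.NNN" with a zero-padded, fixed-width
 * decimal ID. IDs above 999 are rejected, per the proposed cap.
 */
static int format_fixed(char *buf, size_t size, const char *name,
			unsigned int id)
{
	if (id > 999)
		return -1;

	return snprintf(buf, size, "%s.%03u", name, id);
}

/* True if @tp_name matches the POSIX extended regex @pattern. */
static bool regex_matches(const char *pattern, const char *tp_name)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;

	match = regexec(&re, tp_name, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

The anchored pattern ^Asserts\.[0-9]{3}$ accepts "Asserts.007" but rejects "Asserts.Verbose.128", which is the dotted-namespace case raised earlier.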
Thank you,
>
> Is there some scheme that fits the C-variable name that addresses the
> above scenarios? Brackets gave me a simple glob that seemed to prevent a
> lot of this ("Asserts.\[*\]" in this case).
>
> Are we confident that we always want to represent the ID as a base-10
> integer vs a base-16 integer? The suffix will be ABI to ensure recording
> programs can find their events easily.
>
> Thanks,
> -Beau
>
> > -- Steve
--
Masami Hiramatsu (Google)
On Mon, Jan 29, 2024 at 09:24:07PM -0500, Steven Rostedt wrote:
> On Mon, 29 Jan 2024 09:29:07 -0800
> Beau Belgrave wrote:
>
> > Thanks, yeah ideally we wouldn't use special characters.
> >
> > I'm not picky about this. However, I did want something that clearly
> > allowed a glob pattern to find all versions of a given register name of
> > user_events by user programs that record. The dot notation will pull in
> > more than expected if dotted namespace style names are used.
> >
> > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> > will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
>
> Do you prevent brackets in names?
>
No. However, since brackets have distinct start and end tokens,
finding all versions of your event is trivial compared to a single dot.
Imagine two events:
Asserts
Asserts[MyCoolIndex]
Resolves to tracepoints of:
Asserts:[0]
Asserts[MyCoolIndex]:[1]
Regardless of brackets in the names, a simple glob of Asserts:\[*\] only
finds Asserts:[0]. This is because we have that end bracket in the glob
and the full event name including the start bracket.
If I register another "version" of Asserts, then I'll have:
Asserts:[0]
Asserts[MyCoolIndex]:[1]
Asserts:[2]
The glob of Asserts:\[*\] will return both:
Asserts:[0]
Asserts:[2]
At this point the program can either record all versions or scan further
to find which version of Asserts is wanted.
> >
> > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > doesn't work if the number is higher, like 128. If we ever decide to
> > change the ID from an integer to say hex to save space, these globs
> > would break.
> >
> > Is there some scheme that fits the C-variable name that addresses the
> > above scenarios? Brackets gave me a simple glob that seemed to prevent a
> > lot of this ("Asserts.\[*\]" in this case).
>
> Prevent a lot of what? I'm not sure what your example here is.
>
I'll try again :)
We have 2 events registered via user_events:
Asserts
Asserts.Verbose
Using dot notation these would result in tracepoints of:
user_events_multi/Asserts.0
user_events_multi/Asserts.Verbose.1
Using bracket notation these would result in tracepoints of:
user_events_multi/Asserts:[0]
user_events_multi/Asserts.Verbose:[1]
A recording program only wants to enable the Asserts tracepoint. It does
not want to record the Asserts.Verbose tracepoint.
The program must find the right tracepoint by scanning tracefs under the
user_events_multi system.
A single dot suffix does not allow a simple glob to be used. The glob
Asserts.* will return both Asserts.0 and Asserts.Verbose.1.
A simple glob of Asserts:\[*\] will only find Asserts:[0]; it will not
find Asserts.Verbose:[1].
We could just use brackets and not have the colon (Asserts[0] in this
case). But brackets are still special for bash.
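The glob comparison above can be made concrete with a small C sketch built on POSIX fnmatch(3), which implements shell-style glob matching. The helpers glob_matches() and count_matches() are illustrative names only. The tracepoint names are the hypothetical ones from the examples in this thread, not output from a real system.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * True when a shell-style glob matches a tracepoint name. With flags == 0,
 * fnmatch() honors backslash escapes, so "\\[" in a C string (\[ in the
 * shell) matches a literal '[' instead of opening a bracket expression.
 */
static bool glob_matches(const char *glob, const char *tp_name)
{
	assert(glob != NULL && tp_name != NULL);
	return fnmatch(glob, tp_name, 0) == 0;
}

/* Count how many tracepoint names a glob would select. */
static size_t count_matches(const char *glob, const char *const names[],
			    size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (glob_matches(glob, names[i]))
			hits++;
	return hits;
}
```

With the dot-style names, "Asserts.*" selects both "Asserts.0" and "Asserts.Verbose.1". With the bracket-style names, "Asserts:\\[*\\]" selects "Asserts:[0]" and "Asserts:[2]" but not "Asserts[MyCoolIndex]:[1]" or "Asserts.Verbose:[1]". In C source the backslashes must be doubled. In the shell, the same glob is written Asserts:\[*\].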
> >
> > Are we confident that we always want to represent the ID as a base-10
> > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > programs can find their events easily.
>
> Is there a difference to what we choose?
>
If a simple glob of event_name:\[*\] cannot be used, then we must document
what the suffix format is, so an appropriate regex can be created. If we
start with base-10 then later move to base-16 we will break existing regex
patterns on the recording side.
I prefer, and have in this series, a base-16 output since it saves on
the tracepoint name size.
Either way we go, we need to define how recording programs should find
the events they care about. So we must be very clear, IMHO, about the
format of the tracepoint names in our documentation.
I personally think recording programs are likely to get this wrong
without proper guidance.
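If the suffix format is documented, a recording program can match it with an anchored regular expression instead of a glob. The sketch below uses POSIX regcomp(3)/regexec(3). For illustration only, it assumes a "." followed by hex digits, which combines the dot suggestion with the base-16 preference. The format was not settled at this point in the thread, and is_version_of() is a hypothetical helper. Every multi-format name gets exactly one suffix appended, and '.' is not a hex digit. Anchoring both ends therefore means a dotted name such as "Asserts.Verbose.1" should not be mistaken for a version of "Asserts".

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if tp_name is a multi-format version of
 * reg_name, assuming the suffix is "." followed by one or more hex digits.
 * Characters that are special in POSIX extended regexes are escaped so a
 * registered name such as "Asserts.Verbose" is matched literally.
 */
static bool is_version_of(const char *reg_name, const char *tp_name)
{
	static const char suffix[] = "\\.[0-9a-fA-F]+$";
	char pattern[256];
	size_t len = 0;
	regex_t re;
	bool match;

	assert(reg_name != NULL && tp_name != NULL);
	pattern[len++] = '^';

	for (const char *p = reg_name; *p; p++) {
		/* Leave room for an escape, the character and the suffix. */
		if (len + 2 + sizeof(suffix) > sizeof(pattern))
			return false;
		if (strchr(".[\\()*+?{|^$", *p))
			pattern[len++] = '\\';
		pattern[len++] = *p;
	}

	/* sizeof(suffix) includes the terminating NUL. */
	memcpy(pattern + len, suffix, sizeof(suffix));

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	match = regexec(&re, tp_name, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

Here is how it behaves on the names from the thread:

- is_version_of("Asserts", "Asserts.d5874fdac44") is true.
- is_version_of("Asserts", "Asserts.Verbose.1") is false.

Escaping the registered name matters. Without it, the '.' in a name like "Asserts.Verbose" would match any character.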
Thanks,
-Beau
> -- Steve
On Tue, Jan 30, 2024 at 11:12:22PM +0900, Masami Hiramatsu wrote:
> On Mon, 29 Jan 2024 09:29:07 -0800
> Beau Belgrave wrote:
>
> > On Fri, Jan 26, 2024 at 03:04:45PM -0500, Steven Rostedt wrote:
> > > On Fri, 26 Jan 2024 11:10:07 -0800
> > > Beau Belgrave wrote:
> > >
> > > > > OK, so each different event has a suffixed name. But this will
> > > > > introduce a non-C-variable name.
> > > > >
> > > > > Steve, do you think your library can handle these symbols? It will
> > > > > be something like "event:[1]" as the event name.
> > > > > Personally I like "event.1" style. (of course we need to ensure the
> > > > > user given event name is NOT including such suffix numbers)
> > > > >
> > > >
> > > > Just to clarify around events including a suffix number. This is why
> > > > multi-events use "user_events_multi" system name and the single-events
> > > > using just "user_events".
> > > >
> > > > Even if a user program did include a suffix, the suffix would still get
> > > > appended. An example is "test" vs "test:[0]" using multi-format would
> > > > result in two tracepoints ("test:[0]" and "test:[0]:[1]" respectively)
> > > > (assuming these are the first multi-events on the system).
> > > >
> > > > I'm with you, we really don't want any spoofing or squatting possible.
> > > > By using different system names and always appending the suffix I
> > > > believe covers this.
> > > >
> > > > Looking forward to hearing Steven's thoughts on this as well.
> > >
> > > I'm leaning towards Masami's suggestion to use dots, as that won't conflict
> > > with special characters from bash, as '[' and ']' do.
> > >
> >
> > Thanks, yeah ideally we wouldn't use special characters.
> >
> > I'm not picky about this. However, I did want something that clearly
> > allowed a glob pattern to find all versions of a given register name of
> > user_events by user programs that record. The dot notation will pull in
> > more than expected if dotted namespace style names are used.
> >
> > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> > will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
>
> If we use a dot for the suffix number, we can prohibit users from using
> it in their names. They can still use '_' (or change the group name?)
We could; however, we have user_event integration in OpenTelemetry and
I'm unsure if we should really try to restrict names. We'll also at some
point have libside integration, which might not have the same
restrictions on the user-tracer side as the kernel-tracer side.
I'm trying to restrict the user_event group name from changing outside
of an eventual tracer namespace. I'd like for each container to inherit
a tracer namespace long-term which decides what the actual group name
will be instead of users self-selecting names to prevent squatting or
spoofing of events.
> I'm just concerned about whether the name can be parsed by existing tools,
> since ':' is used as a separator between group and event names in some
> cases (e.g. tracefs "set_event" uses it, so trace-cmd and perf use it too).
>
Good point.
What about just event_name[unique_id], i.e. dropping the colon?
Brackets are still special in bash, but it would prevent simple glob
patterns from matching to incorrect tracepoints under user_events_multi.
> > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > doesn't work if the number is higher, like 128. If we ever decide to
> > change the ID from an integer to say hex to save space, these globs
> > would break.
>
> Hmm, why can't we use regexp?
We can use regex, but we'll need to agree on the suffix format. We won't be
able to change it after that point. I'd prefer a base-16 (hex) suffix,
either in brackets or after a simple dot.
> And if we limit the number of events to 1000 for each event name,
> we can use fixed-width numbers, like Asserts.[0-9][0-9][0-9]
>
I'm always wrong when I guess how many events programs will end up
using. Folks always surprise me. I'd rather have a solution that scales
to the max number of tracepoints allowed on the system (currently 16-bit
max value).
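The arithmetic behind the objections above can be checked with a short C helper. The name digits_in_base() is illustrative and not part of any API.

A fixed-width glob such as Asserts.[0-9][0-9][0-9] covers only the 1000 IDs 000 through 999. The 16-bit limit of 65535 needs 5 decimal digits or 4 hex digits.

If the ID is instead a nanosecond timestamp, as in this series, the width grows further. Assuming the timestamp is a 64-bit value, it may need up to 20 decimal digits versus 16 hex digits. The cover letter's example d5874fdac44 is 11 hex digits.

```c
#include <assert.h>
#include <stdint.h>

/* Digits needed to print value in the given base (0 needs one digit). */
static unsigned int digits_in_base(uint64_t value, unsigned int base)
{
	unsigned int digits = 1;

	assert(base >= 2);
	while (value >= base) {
		value /= base;
		digits++;
	}
	return digits;
}
```

This width variability is one reason a fixed-width glob does not scale. A documented pattern such as "one or more hex digits" does.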
Thanks,
-Beau
> Thank you,
>
> >
> > Is there some scheme that fits the C-variable name that addresses the
> > above scenarios? Brackets gave me a simple glob that seemed to prevent a
> > lot of this ("Asserts.\[*\]" in this case).
> >
> > Are we confident that we always want to represent the ID as a base-10
> > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > programs can find their events easily.
> >
> > Thanks,
> > -Beau
> >
> > > -- Steve
>
>
> --
> Masami Hiramatsu (Google)
On Tue, Jan 30, 2024 at 11:09:33AM +0900, Masami Hiramatsu wrote:
> Hi Beau,
>
> On Tue, 23 Jan 2024 22:08:40 +0000
> Beau Belgrave wrote:
>
> > Currently user_events supports 1 event with the same name and must have
> > the exact same format when referenced by multiple programs. This opens
> > an opportunity for malicious or poorly thought through programs to
> > create events that others use with different formats. Another scenario
> > is user programs wishing to use the same event name but add more fields
> > later when the software updates. Various versions of a program may be
> > running side-by-side, which is prevented by the current single format
> > requirement.
> >
> > Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
> > the user program wishes to use the same user_event name, but may have
> > several different formats of the event in the future. When this flag is
> > used, create the underlying tracepoint backing the user_event with a
> > unique name per-version of the format. It's important that existing ABI
> > users do not get this logic automatically, even if one of the multi
> > format events matches the format. This ensures existing programs that
> > create events and assume the tracepoint name will match exactly continue
> > to work as expected. Add logic to only check multi-format events with
> > other multi-format events and single-format events to only check
> > single-format events during find.
>
> Thanks for this work! This will allow many instances to use the same
> user-events at the same time.
>
> BTW, can we force this flag to be set by default? My concern is if a user
> program uses this user-event interface in a container (maybe it is
> possible if we bind-mount it). In this case, the user program can
> detect that another program is using the event if this flag is not set.
> Moreover, if a malicious program is running in the container,
> it can prevent other programs from using the event name even if it
> is isolated by the namespace.
>
Multi-format events use a different system name (user_events_multi). So you
cannot use the single-format flag to detect multi-format names, etc. You
can only use it to find other single-format names like you could always do.
Likewise, you cannot use the single-event flag to block or prevent
multi-format events from being created.
Changing this behavior to default would break all of our environments.
So I'm pretty sure it would break others. The current environment
expects tracepoints to show up as their registered name under the
user_events system name. If this changed out from under us on a specific
kernel version, that would be bad.
I'd like eventually to have a tracer namespace concept for containers.
Then we would have a user_event_group per tracer namespace. Then all
user_events within the container have a unique system name which fully
isolates them. However, even with that isolation, we still need a way to
allow programs in the same container to have different versions of the
same event name.
Multi-format events fix this problem. I think isolation should be
dealt with via true namespace isolation at the tracing level.
> Steve suggested that if a user program which is running in a namespace
> uses user-event without this flag, we can reject that by default.
>
> What do you think?
>
This would break all of our environments. It would prevent previously
compiled programs using the existing ABI from working as expected.
I'd much prefer that level of isolation to happen at the namespace level,
which is why user_events has plumbing for different groups to achieve this.
It's also why the ABI does not allow programs to state a system name.
I'm trying to reserve the system name for the group/tracer/namespace
isolation work.
Thanks,
-Beau
> Thank you,
>
>
> >
> > Add a register_name (reg_name) to the user_event struct which allows for
> > split naming of events. We now have the name that was used to register
> > within user_events as well as the unique name for the tracepoint. Upon
> > registering events ensure matches based on first the reg_name, followed
> > by the fields and format of the event. This allows for multiple events
> > with the same registered name to have different formats. The underlying
> > tracepoint will have a unique name in the format of {reg_name}:[unique_id].
> > The unique_id is the time, in nanoseconds, of the event creation converted
> > to hex. Since this is done under the register mutex, it is extremely
> > unlikely for these IDs to ever match. It's also very unlikely a malicious
> > program could consistently guess what the name would be and attempt to
> > squat on it via the single format ABI.
> >
> > For example, if both "test u32 value" and "test u64 value" are used with
> > the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> > tracepoints. The dynamic_events file would then show the following:
> > u:test u64 count
> > u:test u32 count
> >
> > The actual tracepoint names look like this:
> > test:[d5874fdac44]
> > test:[d5914662cd4]
> >
> > Deleting events via "!u:test u64 count" would only delete the first
> > tracepoint that matched that format. When the delete ABI is used, the
> > kernel attempts to delete all events with the same name. If
> > per-version deletion is required, user programs should either not use
> > persistent events or delete them via dynamic_events.
> >
> > Beau Belgrave (4):
> > tracing/user_events: Prepare find/delete for same name events
> > tracing/user_events: Introduce multi-format events
> > selftests/user_events: Test multi-format events
> > tracing/user_events: Document multi-format flag
> >
> > Documentation/trace/user_events.rst | 23 +-
> > include/uapi/linux/user_events.h | 6 +-
> > kernel/trace/trace_events_user.c | 224 +++++++++++++-----
> > .../testing/selftests/user_events/abi_test.c | 134 +++++++++++
> > 4 files changed, 325 insertions(+), 62 deletions(-)
> >
> >
> > base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
> > --
> > 2.34.1
> >
>
>
> --
> Masami Hiramatsu (Google)
On Tue, 30 Jan 2024 10:05:15 -0800
Beau Belgrave wrote:
> On Mon, Jan 29, 2024 at 09:24:07PM -0500, Steven Rostedt wrote:
> > On Mon, 29 Jan 2024 09:29:07 -0800
> > Beau Belgrave wrote:
> >
> > > Thanks, yeah ideally we wouldn't use special characters.
> > >
> > > I'm not picky about this. However, I did want something that clearly
> > > allowed a glob pattern to find all versions of a given register name of
> > > user_events by user programs that record. The dot notation will pull in
> > > more than expected if dotted namespace style names are used.
> > >
> > > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > > If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> > > will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
> >
> > Do you prevent brackets in names?
> >
>
> No. However, since brackets have distinct start and end tokens,
> finding all versions of your event is trivial compared to a single dot.
>
> Imagine two events:
> Asserts
> Asserts[MyCoolIndex]
>
> Resolves to tracepoints of:
> Asserts:[0]
> Asserts[MyCoolIndex]:[1]
>
> Regardless of brackets in the names, a simple glob of Asserts:\[*\] only
> finds Asserts:[0]. This is because we have that end bracket in the glob
> and the full event name including the start bracket.
>
> If I register another "version" of Asserts, then I'll have:
> Asserts:[0]
> Asserts[MyCoolIndex]:[1]
> Asserts:[2]
>
> The glob of Asserts:\[*\] will return both:
> Asserts:[0]
> Asserts:[2]
But what if you had registered "Asserts:[MyCoolIndex]:[1]"
Do you prevent colons?
>
> At this point the program can either record all versions or scan further
> to find which version of Asserts is wanted.
>
> > >
> > > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > > doesn't work if the number is higher, like 128. If we ever decide to
> > > change the ID from an integer to say hex to save space, these globs
> > > would break.
> > >
> > > Is there some scheme that fits the C-variable name that addresses the
> > > above scenarios? Brackets gave me a simple glob that seemed to prevent a
> > > lot of this ("Asserts.\[*\]" in this case).
> >
> > Prevent a lot of what? I'm not sure what your example here is.
> >
>
> I'll try again :)
>
> We have 2 events registered via user_events:
> Asserts
> Asserts.Verbose
>
> Using dot notation these would result in tracepoints of:
> user_events_multi/Asserts.0
> user_events_multi/Asserts.Verbose.1
>
> Using bracket notation these would result in tracepoints of:
> user_events_multi/Asserts:[0]
> user_events_multi/Asserts.Verbose:[1]
>
> A recording program only wants to enable the Asserts tracepoint. It does
> not want to record the Asserts.Verbose tracepoint.
>
> The program must find the right tracepoint by scanning tracefs under the
> user_events_multi system.
>
> A single dot suffix does not allow a simple glob to be used. The glob
> Asserts.* will return both Asserts.0 and Asserts.Verbose.1.
>
> A simple glob of Asserts:\[*\] will only find Asserts:[0]; it will not
> find Asserts.Verbose:[1].
>
> We could just use brackets and not have the colon (Asserts[0] in this
> case). But brackets are still special for bash.
Are these shell scripts or programs? I use regex in programs all the time.
And if you have shell scripts, use awk or something.
Unless you prevent something from being added, I don't see the protection.
>
> > >
> > > Are we confident that we always want to represent the ID as a base-10
> > > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > > programs can find their events easily.
> >
> > Is there a difference to what we choose?
> >
>
> If a simple glob of event_name:\[*\] cannot be used, then we must document
> what the suffix format is, so an appropriate regex can be created. If we
> start with base-10 then later move to base-16 we will break existing regex
> patterns on the recording side.
>
> I prefer, and have in this series, a base-16 output since it saves on
> the tracepoint name size.
I honestly don't care which base you use. So if you want to use base 16,
I'm fine with that.
>
> Either way we go, we need to define how recording programs should find
> the events they care about. So we must be very clear, IMHO, about the
> format of the tracepoint names in our documentation.
>
> I personally think recording programs are likely to get this wrong
> without proper guidance.
>
Agreed.
-- Steve
On Tue, Jan 30, 2024 at 01:52:30PM -0500, Steven Rostedt wrote:
> On Tue, 30 Jan 2024 10:05:15 -0800
> Beau Belgrave wrote:
>
> > On Mon, Jan 29, 2024 at 09:24:07PM -0500, Steven Rostedt wrote:
> > > On Mon, 29 Jan 2024 09:29:07 -0800
> > > Beau Belgrave wrote:
> > >
> > > > Thanks, yeah ideally we wouldn't use special characters.
> > > >
> > > > I'm not picky about this. However, I did want something that clearly
> > > > allowed a glob pattern to find all versions of a given register name of
> > > > user_events by user programs that record. The dot notation will pull in
> > > > more than expected if dotted namespace style names are used.
> > > >
> > > > An example is "Asserts" and "Asserts.Verbose" from different programs.
> > > > If we tried to find all versions of "Asserts" via glob of "Asserts.*" it
> > > > will pull in "Asserts.Verbose.1" in addition to "Asserts.0".
> > >
> > > Do you prevent brackets in names?
> > >
> >
> > No. However, since brackets have distinct start and end tokens,
> > finding all versions of your event is trivial compared to a single dot.
> >
> > Imagine two events:
> > Asserts
> > Asserts[MyCoolIndex]
> >
> > Resolves to tracepoints of:
> > Asserts:[0]
> > Asserts[MyCoolIndex]:[1]
> >
> > Regardless of brackets in the names, a simple glob of Asserts:\[*\] only
> > finds Asserts:[0]. This is because we have that end bracket in the glob
> > and the full event name including the start bracket.
> >
> > If I register another "version" of Asserts, then I'll have:
> > Asserts:[0]
> > Asserts[MyCoolIndex]:[1]
> > Asserts:[2]
> >
> > The glob of Asserts:\[*\] will return both:
> > Asserts:[0]
> > Asserts:[2]
>
> But what if you had registered "Asserts:[MyCoolIndex]:[1]"
>
Good point; that name would still get pulled in unless a regex-type
pattern is used instead of the glob.
> Do you prevent colons?
>
No, nothing is prevented at this point.
It seems we could either prevent certain characters to make it easier or
define a good regex that we should document.
I'm leaning toward just doing a simple suffix and documenting the regex
well.
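Steve's counterexample can be reproduced directly. In the glob Asserts:\[*\], the * also spans "MyCoolIndex]:[1", so "Asserts:[MyCoolIndex]:[1]" matches.

The sketch below contrasts that glob with a stricter, hand-written check for the bracket form used in this version of the series. The check requires, in order:

- the registered name, matched exactly,
- ":[",
- one or more hex digits,
- a final "]" that ends the string.

Both helpers are hypothetical. The same approach would work for a dot suffix by changing the separator.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* The glob discussed in the thread; it over-matches names containing ":[". */
static bool asserts_glob_matches(const char *tp_name)
{
	return fnmatch("Asserts:\\[*\\]", tp_name, 0) == 0;
}

/*
 * Hypothetical stricter check for the bracket form "{reg_name}:[hex]":
 * the name must be exactly reg_name, then ":[", then one or more hex
 * digits, then a closing ']' that ends the string.
 */
static bool is_bracket_version_of(const char *reg_name, const char *tp_name)
{
	size_t name_len;
	const char *id;

	assert(reg_name != NULL && tp_name != NULL);
	name_len = strlen(reg_name);

	if (strncmp(tp_name, reg_name, name_len) != 0)
		return false;
	if (strncmp(tp_name + name_len, ":[", 2) != 0)
		return false;

	id = tp_name + name_len + 2;
	if (!isxdigit((unsigned char)*id))
		return false;	/* empty or non-hex ID */
	while (isxdigit((unsigned char)*id))
		id++;

	return strcmp(id, "]") == 0;
}
```

The glob accepts "Asserts:[MyCoolIndex]:[1]", but is_bracket_version_of("Asserts", ...) rejects it because "MyCoolIndex" is not hex. That same name still correctly matches as a version of "Asserts:[MyCoolIndex]".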
> >
> > At this point the program can either record all versions or scan further
> > to find which version of Asserts is wanted.
> >
> > > >
> > > > While a glob of "Asserts.[0-9]" works when the unique ID is 0-9, it
> > > > doesn't work if the number is higher, like 128. If we ever decide to
> > > > change the ID from an integer to say hex to save space, these globs
> > > > would break.
> > > >
> > > > Is there some scheme that fits the C-variable name that addresses the
> > > > above scenarios? Brackets gave me a simple glob that seemed to prevent a
> > > > lot of this ("Asserts.\[*\]" in this case).
> > >
> > > Prevent a lot of what? I'm not sure what your example here is.
> > >
> >
> > I'll try again :)
> >
> > We have 2 events registered via user_events:
> > Asserts
> > Asserts.Verbose
> >
> > Using dot notation these would result in tracepoints of:
> > user_events_multi/Asserts.0
> > user_events_multi/Asserts.Verbose.1
> >
> > Using bracket notation these would result in tracepoints of:
> > user_events_multi/Asserts:[0]
> > user_events_multi/Asserts.Verbose:[1]
> >
> > A recording program only wants to enable the Asserts tracepoint. It does
> > not want to record the Asserts.Verbose tracepoint.
> >
> > The program must find the right tracepoint by scanning tracefs under the
> > user_events_multi system.
> >
> > A single dot suffix does not allow a simple glob to be used. The glob
> > Asserts.* will return both Asserts.0 and Asserts.Verbose.1.
> >
> > A simple glob of Asserts:\[*\] will only find Asserts:[0]; it will not
> > find Asserts.Verbose:[1].
> >
> > We could just use brackets and not have the colon (Asserts[0] in this
> > case). But brackets are still special for bash.
>
> Are these shell scripts or programs? I use regex in programs all the time.
> And if you have shell scripts, use awk or something.
>
They could be both. In our case, it is a program.
> Unless you prevent something from being added, I don't see the protection.
>
Yeah, it just makes it way less likely. Given that, I'm starting to lean
toward just documenting the regex well and not trying to get fancy.
> >
> > > >
> > > > Are we confident that we always want to represent the ID as a base-10
> > > > integer vs a base-16 integer? The suffix will be ABI to ensure recording
> > > > programs can find their events easily.
> > >
> > > Is there a difference to what we choose?
> > >
> >
> > If a simple glob of event_name:\[*\] cannot be used, then we must document
> > what the suffix format is, so an appropriate regex can be created. If we
> > start with base-10 then later move to base-16 we will break existing regex
> > patterns on the recording side.
> >
> > I prefer, and have in this series, a base-16 output since it saves on
> > the tracepoint name size.
>
> I honestly don't care which base you use. So if you want to use base 16,
> I'm fine with that.
>
> >
> > Either way we go, we need to define how recording programs should find
> > the events they care about. So we must be very clear, IMHO, about the
> > format of the tracepoint names in our documentation.
> >
> > I personally think recording programs are likely to get this wrong
> > without proper guidance.
> >
>
> Agreed.
>
> -- Steve
Thanks,
-Beau
On Tue, 30 Jan 2024 10:25:49 -0800
Beau Belgrave wrote:
> On Tue, Jan 30, 2024 at 11:09:33AM +0900, Masami Hiramatsu wrote:
> > Hi Beau,
> >
> > On Tue, 23 Jan 2024 22:08:40 +0000
> > Beau Belgrave wrote:
> >
> > > Currently user_events supports 1 event with the same name and must have
> > > the exact same format when referenced by multiple programs. This opens
> > > an opportunity for malicious or poorly thought through programs to
> > > create events that others use with different formats. Another scenario
> > > is user programs wishing to use the same event name but add more fields
> > > later when the software updates. Various versions of a program may be
> > > running side-by-side, which is prevented by the current single format
> > > requirement.
> > >
> > > Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
> > > the user program wishes to use the same user_event name, but may have
> > > several different formats of the event in the future. When this flag is
> > > used, create the underlying tracepoint backing the user_event with a
> > > unique name per-version of the format. It's important that existing ABI
> > > users do not get this logic automatically, even if one of the multi
> > > format events matches the format. This ensures existing programs that
> > > create events and assume the tracepoint name will match exactly continue
> > > to work as expected. Add logic to only check multi-format events with
> > > other multi-format events and single-format events to only check
> > > single-format events during find.
> >
> > Thanks for this work! This will allow many instances to use the same
> > user-events at the same time.
> >
> > BTW, can we force this flag to be set by default? My concern is if a user
> > program uses this user-event interface in a container (maybe it is
> > possible if we bind-mount it). In this case, the user program can
> > detect that another program is using the event if this flag is not set.
> > Moreover, if a malicious program is running in the container,
> > it can prevent other programs from using the event name even if it
> > is isolated by the namespace.
> >
>
> Multi-format events use a different system name (user_events_multi). So you
> cannot use the single-format flag to detect multi-format names, etc. You
> can only use it to find other single-format names like you could always do.
>
> Likewise, you cannot use the single-event flag to block or prevent
> multi-format events from being created.
Hmm, got it.
>
> Changing this behavior to default would break all of our environments.
> So I'm pretty sure it would break others. The current environment
> expects tracepoints to show up as their registered name under the
> user_events system name. If this changed out from under us on a specific
> kernel version, that would be bad.
>
> I'd like eventually to have a tracer namespace concept for containers.
> Then we would have a user_event_group per tracer namespace. Then all
> user_events within the container have a unique system name which fully
> isolates them. However, even with that isolation, we still need a way to
> allow programs in the same container to have different versions of the
> same event name.
Agreed.
>
> Multi-format events fix this problem. I think isolation should be
> dealt with via true namespace isolation at the tracing level.
>
> > Steve suggested that if a user program which is running in a namespace
> > uses user-event without this flag, we can reject that by default.
> >
> > What do you think?
> >
>
> This would break all of our environments. It would prevent previously
> compiled programs using the existing ABI from working as expected.
>
> I'd much prefer that level of isolation to happen at the namespace level,
> which is why user_events has plumbing for different groups to achieve this.
> It's also why the ABI does not allow programs to state a system name.
> I'm trying to reserve the system name for the group/tracer/namespace
> isolation work.
OK, that's reasonable enough.
Thank you!
>
> Thanks,
> -Beau
>
> > Thank you,
> >
> >
> > >
> > > Add a register_name (reg_name) to the user_event struct which allows for
> > > split naming of events. We now have the name that was used to register
> > > within user_events as well as the unique name for the tracepoint. Upon
> > > registering events ensure matches based on first the reg_name, followed
> > > by the fields and format of the event. This allows for multiple events
> > > with the same registered name to have different formats. The underlying
> > > tracepoint will have a unique name in the format of {reg_name}:[unique_id].
> > > The unique_id is the time, in nanoseconds, of the event creation converted
> > > to hex. Since this is done under the register mutex, it is extremely
> > > unlikely for these IDs to ever match. It's also very unlikely a malicious
> > > program could consistently guess what the name would be and attempt to
> > > squat on it via the single format ABI.
> > >
> > > For example, if both "test u32 value" and "test u64 value" are used with
> > > the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
> > > tracepoints. The dynamic_events file would then show the following:
> > > u:test u64 count
> > > u:test u32 count
> > >
> > > The actual tracepoint names look like this:
> > > test:[d5874fdac44]
> > > test:[d5914662cd4]
> > >
> > > Deleting events via "!u:test u64 count" would only delete the first
> > > tracepoint that matched that format. When the delete ABI is used, the
> > > kernel attempts to delete all events with the same name. If
> > > per-version deletion is required, user programs should either not use
> > > persistent events or delete them via dynamic_events.
> > >
> > > Beau Belgrave (4):
> > > tracing/user_events: Prepare find/delete for same name events
> > > tracing/user_events: Introduce multi-format events
> > > selftests/user_events: Test multi-format events
> > > tracing/user_events: Document multi-format flag
> > >
> > > Documentation/trace/user_events.rst | 23 +-
> > > include/uapi/linux/user_events.h | 6 +-
> > > kernel/trace/trace_events_user.c | 224 +++++++++++++-----
> > > .../testing/selftests/user_events/abi_test.c | 134 +++++++++++
> > > 4 files changed, 325 insertions(+), 62 deletions(-)
> > >
> > >
> > > base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
> > > --
> > > 2.34.1
> > >
> >
> >
> > --
> > Masami Hiramatsu (Google)
--
Masami Hiramatsu (Google)