2024-02-02 19:10:02

by Beau Belgrave

[permalink] [raw]
Subject: [PATCH v2 0/4] tracing/user_events: Introduce multi-format events

Currently user_events supports 1 event with the same name and must have
the exact same format when referenced by multiple programs. This opens
an opportunity for malicous or poorly thought through programs to
create events that others use with different formats. Another scenario
is user programs wishing to use the same event name but add more fields
later when the software updates. Various versions of a program may be
running side-by-side, which is prevented by the current single format
requirement.

Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates
the user program wishes to use the same user_event name, but may have
several different formats of the event in the future. When this flag is
used, create the underlying tracepoint backing the user_event with a
unique name per-version of the format. It's important that existing ABI
users do not get this logic automatically, even if one of the multi
format events matches the format. This ensures existing programs that
create events and assume the tracepoint name will match exactly continue
to work as expected. Add logic to only check multi-format events with
other multi-format events and single-format events to only check
single-format events during find.

Change system name of the multi-format event tracepoint to ensure that
multi-format events are isolated completely from single-format events.
This prevents single-format names from conflicting with multi-format
events if they end with the same suffix as the multi-format events.

Add a register_name (reg_name) to the user_event struct which allows for
split naming of events. We now have the name that was used to register
within user_events as well as the unique name for the tracepoint. Upon
registering events ensure matches based on first the reg_name, followed
by the fields and format of the event. This allows for multiple events
with the same registered name to have different formats. The underlying
tracepoint will have a unique name in the format of {reg_name}.{unique_id}.

For example, if both "test u32 value" and "test u64 value" are used with
the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique
tracepoints. The dynamic_events file would then show the following:
u:test u64 count
u:test u32 count

The actual tracepoint names look like this:
test.0
test.1

Both would be under the new user_events_multi system name to prevent the
older ABI from being used to squat on multi-formatted events and block
their use.

Deleting events via "!u:test u64 count" would only delete the first
tracepoint that matched that format. When the delete ABI is used all
events with the same name will be attempted to be deleted. If
per-version deletion is required, user programs should either not use
persistent events or delete them via dynamic_events.

Changes in v2:
Tracepoint names changed from "name:[id]" to "name.id". Feedback
was the : could conflict with system name formats. []'s are also
special characters for bash.

Updated self-test and docs to reflect the new suffix format.

Updated docs to include a regex example to help guide recording
programs find the correct event in ambiguous cases.

Beau Belgrave (4):
tracing/user_events: Prepare find/delete for same name events
tracing/user_events: Introduce multi-format events
selftests/user_events: Test multi-format events
tracing/user_events: Document multi-format flag

Documentation/trace/user_events.rst | 27 ++-
include/uapi/linux/user_events.h | 6 +-
kernel/trace/trace_events_user.c | 224 +++++++++++++-----
.../testing/selftests/user_events/abi_test.c | 134 +++++++++++
4 files changed, 329 insertions(+), 62 deletions(-)


base-commit: 610a9b8f49fbcf1100716370d3b5f6f884a2835a
--
2.34.1



2024-02-02 19:10:06

by Beau Belgrave

[permalink] [raw]
Subject: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events

The current code for finding and deleting events assumes that there will
never be cases when user_events are registered with the same name, but
different formats. In the future this scenario will exist to ensure
user programs can be updated or modify their events and run different
versions of their programs side-by-side without being blocked.

This change does not yet allow for multi-format events. If user_events
are registered with the same name but different arguments the programs
see the same return values as before. This change simply makes it
possible to easily accomodate for this in future changes.

Update find_user_event() to take in argument parameters and register
flags to accomodate future multi-format event scenarios. Have find
validate argument matching and return error pointers to cover address
in use cases, or allocation errors. Update callers to handle error
pointer logic.

Move delete_user_event() to use hash walking directly now that find has
changed. Delete all events found that match the register name, stop
if an error occurs and report back to the user.

Update user_fields_match() to cover list_empty() scenarios instead of
each callsite doing it now that find_user_event() uses it directly.

Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 106 +++++++++++++++++--------------
1 file changed, 58 insertions(+), 48 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 9365ce407426..0480579ba563 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -202,6 +202,8 @@ static struct user_event_mm *user_event_mm_get(struct user_event_mm *mm);
static struct user_event_mm *user_event_mm_get_all(struct user_event *user);
static void user_event_mm_put(struct user_event_mm *mm);
static int destroy_user_event(struct user_event *user);
+static bool user_fields_match(struct user_event *user, int argc,
+ const char **argv);

static u32 user_event_key(char *name)
{
@@ -1493,17 +1495,24 @@ static int destroy_user_event(struct user_event *user)
}

static struct user_event *find_user_event(struct user_event_group *group,
- char *name, u32 *outkey)
+ char *name, int argc, const char **argv,
+ u32 flags, u32 *outkey)
{
struct user_event *user;
u32 key = user_event_key(name);

*outkey = key;

- hash_for_each_possible(group->register_table, user, node, key)
- if (!strcmp(EVENT_NAME(user), name))
+ hash_for_each_possible(group->register_table, user, node, key) {
+ if (strcmp(EVENT_NAME(user), name))
+ continue;
+
+ if (user_fields_match(user, argc, argv))
return user_event_get(user);

+ return ERR_PTR(-EADDRINUSE);
+ }
+
return NULL;
}

@@ -1860,6 +1869,9 @@ static bool user_fields_match(struct user_event *user, int argc,
struct list_head *head = &user->fields;
int i = 0;

+ if (argc == 0)
+ return list_empty(head);
+
list_for_each_entry_reverse(field, head, link) {
if (!user_field_match(field, argc, argv, &i))
return false;
@@ -1880,10 +1892,8 @@ static bool user_event_match(const char *system, const char *event,
match = strcmp(EVENT_NAME(user), event) == 0 &&
(!system || strcmp(system, USER_EVENTS_SYSTEM) == 0);

- if (match && argc > 0)
+ if (match)
match = user_fields_match(user, argc, argv);
- else if (match && argc == 0)
- match = list_empty(&user->fields);

return match;
}
@@ -1922,11 +1932,11 @@ static int user_event_parse(struct user_event_group *group, char *name,
char *args, char *flags,
struct user_event **newuser, int reg_flags)
{
- int ret;
- u32 key;
struct user_event *user;
+ char **argv = NULL;
int argc = 0;
- char **argv;
+ int ret;
+ u32 key;

/* Currently don't support any text based flags */
if (flags != NULL)
@@ -1935,41 +1945,34 @@ static int user_event_parse(struct user_event_group *group, char *name,
if (!user_event_capable(reg_flags))
return -EPERM;

+ if (args) {
+ argv = argv_split(GFP_KERNEL, args, &argc);
+
+ if (!argv)
+ return -ENOMEM;
+ }
+
/* Prevent dyn_event from racing */
mutex_lock(&event_mutex);
- user = find_user_event(group, name, &key);
+ user = find_user_event(group, name, argc, (const char **)argv,
+ reg_flags, &key);
mutex_unlock(&event_mutex);

- if (user) {
- if (args) {
- argv = argv_split(GFP_KERNEL, args, &argc);
- if (!argv) {
- ret = -ENOMEM;
- goto error;
- }
+ if (argv)
+ argv_free(argv);

- ret = user_fields_match(user, argc, (const char **)argv);
- argv_free(argv);
-
- } else
- ret = list_empty(&user->fields);
-
- if (ret) {
- *newuser = user;
- /*
- * Name is allocated by caller, free it since it already exists.
- * Caller only worries about failure cases for freeing.
- */
- kfree(name);
- } else {
- ret = -EADDRINUSE;
- goto error;
- }
+ if (IS_ERR(user))
+ return PTR_ERR(user);
+
+ if (user) {
+ *newuser = user;
+ /*
+ * Name is allocated by caller, free it since it already exists.
+ * Caller only worries about failure cases for freeing.
+ */
+ kfree(name);

return 0;
-error:
- user_event_put(user, false);
- return ret;
}

user = kzalloc(sizeof(*user), GFP_KERNEL_ACCOUNT);
@@ -2052,25 +2055,32 @@ static int user_event_parse(struct user_event_group *group, char *name,
}

/*
- * Deletes a previously created event if it is no longer being used.
+ * Deletes previously created events if they are no longer being used.
*/
static int delete_user_event(struct user_event_group *group, char *name)
{
- u32 key;
- struct user_event *user = find_user_event(group, name, &key);
+ struct user_event *user;
+ u32 key = user_event_key(name);
+ int ret = -ENOENT;

- if (!user)
- return -ENOENT;
+ /* Attempt to delete all event(s) with the name passed in */
+ hash_for_each_possible(group->register_table, user, node, key) {
+ if (strcmp(EVENT_NAME(user), name))
+ continue;

- user_event_put(user, true);
+ if (!user_event_last_ref(user))
+ return -EBUSY;

- if (!user_event_last_ref(user))
- return -EBUSY;
+ if (!user_event_capable(user->reg_flags))
+ return -EPERM;

- if (!user_event_capable(user->reg_flags))
- return -EPERM;
+ ret = destroy_user_event(user);

- return destroy_user_event(user);
+ if (ret)
+ goto out;
+ }
+out:
+ return ret;
}

/*
--
2.34.1


2024-02-02 19:30:13

by Beau Belgrave

[permalink] [raw]
Subject: [PATCH v2 4/4] tracing/user_events: Document multi-format flag

User programs can now ask user_events to handle the synchronization of
multiple different formats for an event with the same name via the new
USER_EVENT_REG_MULTI_FORMAT flag.

Add a section for USER_EVENT_REG_MULTI_FORMAT that explains the intended
purpose and caveats of using it. Explain how deletion works in these
cases and how to use /sys/kernel/tracing/dynamic_events for per-version
deletion.

Signed-off-by: Beau Belgrave <[email protected]>
---
Documentation/trace/user_events.rst | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index d8f12442aaa6..1d5a7626e6a6 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -92,6 +92,24 @@ The following flags are currently supported.
process closes or unregisters the event. Requires CAP_PERFMON otherwise
-EPERM is returned.

++ USER_EVENT_REG_MULTI_FORMAT: The event can contain multiple formats. This
+ allows programs to prevent themselves from being blocked when their event
+ format changes and they wish to use the same name. When this flag is used the
+ tracepoint name will be in the new format of "name.unique_id" vs the older
+ format of "name". A tracepoint will be created for each unique pair of name
+ and format. This means if several processes use the same name and format,
+ they will use the same tracepoint. If yet another process uses the same name,
+ but a different format than the other processes, it will use a different
+ tracepoint with a new unique id. Recording programs need to scan tracefs for
+ the various different formats of the event name they are interested in
+ recording. The system name of the tracepoint will also use "user_events_multi"
+ instead of "user_events". This prevents single-format event names conflicting
+ with any multi-format event names within tracefs. The unique_id is output as
+ a hex string. Recording programs should ensure the tracepoint name starts with
+ the event name they registered and has a suffix that starts with . and only
+ has hex characters. For example to find all versions of the event "test" you
+ can use the regex "^test\.[0-9a-fA-F]+$".
+
Upon successful registration the following is set.

+ write_index: The index to use for this file descriptor that represents this
@@ -106,6 +124,9 @@ or perf record -e user_events:[name] when attaching/recording.
**NOTE:** The event subsystem name by default is "user_events". Callers should
not assume it will always be "user_events". Operators reserve the right in the
future to change the subsystem name per-process to accommodate event isolation.
+In addition if the USER_EVENT_REG_MULTI_FORMAT flag is used the tracepoint name
+will have a unique id appended to it and the system name will be
+"user_events_multi" as described above.

Command Format
^^^^^^^^^^^^^^
@@ -156,7 +177,11 @@ to request deletes than the one used for registration due to this.
to the event. If programs do not want auto-delete, they must use the
USER_EVENT_REG_PERSIST flag when registering the event. Once that flag is used
the event exists until DIAG_IOCSDEL is invoked. Both register and delete of an
-event that persists requires CAP_PERFMON, otherwise -EPERM is returned.
+event that persists requires CAP_PERFMON, otherwise -EPERM is returned. When
+there are multiple formats of the same event name, all events with the same
+name will be attempted to be deleted. If only a specific version is wanted to
+be deleted then the /sys/kernel/tracing/dynamic_events file should be used for
+that specific format of the event.

Unregistering
-------------
--
2.34.1


2024-02-14 04:14:53

by Oliver Sang

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events



Hello,

kernel test robot noticed "BUG:KASAN:slab-use-after-free_in_user_events_ioctl" on:

commit: fecc001d587ceeeb47043c20353f257e3f01b39f ("[PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events")
url: https://github.com/intel-lab-lkp/linux/commits/Beau-Belgrave/tracing-user_events-Prepare-find-delete-for-same-name-events/20240203-031207
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v2 1/4] tracing/user_events: Prepare find/delete for same name events

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

group: user_events



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 106.969333][ T2278] BUG: KASAN: slab-use-after-free in user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.970079][ T2278] Read of size 8 at addr ffff88816644ef38 by task abi_test/2278
[ 106.970788][ T2278]
[ 106.971058][ T2278] CPU: 2 PID: 2278 Comm: abi_test Not tainted 6.7.0-rc8-00001-gfecc001d587c #1
[ 106.971881][ T2278] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
[ 106.972829][ T2278] Call Trace:
[ 106.973185][ T2278] <TASK>
[ 106.973514][ T2278] dump_stack_lvl (lib/dump_stack.c:108)
[ 106.973966][ T2278] print_address_description+0x2c/0x3a0
[ 106.974597][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.975099][ T2278] print_report (mm/kasan/report.c:476)
[ 106.975542][ T2278] ? kasan_addr_to_slab (mm/kasan/common.c:35)
[ 106.976025][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.976531][ T2278] kasan_report (mm/kasan/report.c:590)
[ 106.976978][ T2278] ? user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.977481][ T2278] user_events_ioctl (kernel/trace/trace_events_user.c:2067 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.977970][ T2278] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
[ 106.978441][ T2278] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 106.978889][ T2278] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 106.979462][ T2278] RIP: 0033:0x7f2e121c8b5b
[ 106.979907][ T2278] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
All code
========
0: 00 48 89 add %cl,-0x77(%rax)
3: 44 24 18 rex.R and $0x18,%al
6: 31 c0 xor %eax,%eax
8: 48 8d 44 24 60 lea 0x60(%rsp),%rax
d: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
14: 48 89 44 24 08 mov %rax,0x8(%rsp)
19: 48 8d 44 24 20 lea 0x20(%rsp),%rax
1e: 48 89 44 24 10 mov %rax,0x10(%rsp)
23: b8 10 00 00 00 mov $0x10,%eax
28: 0f 05 syscall
2a:* 89 c2 mov %eax,%edx <-- trapping instruction
2c: 3d 00 f0 ff ff cmp $0xfffff000,%eax
31: 77 1c ja 0x4f
33: 48 8b 44 24 18 mov 0x18(%rsp),%rax
38: 64 fs
39: 48 rex.W
3a: 2b .byte 0x2b
3b: 04 25 add $0x25,%al
3d: 28 00 sub %al,(%rax)
...

Code starting with the faulting instruction
===========================================
0: 89 c2 mov %eax,%edx
2: 3d 00 f0 ff ff cmp $0xfffff000,%eax
7: 77 1c ja 0x25
9: 48 8b 44 24 18 mov 0x18(%rsp),%rax
e: 64 fs
f: 48 rex.W
10: 2b .byte 0x2b
11: 04 25 add $0x25,%al
13: 28 00 sub %al,(%rax)
...
[ 106.981608][ T2278] RSP: 002b:00007ffcb0ba5ed0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 106.982385][ T2278] RAX: ffffffffffffffda RBX: 00007ffcb0ba6228 RCX: 00007f2e121c8b5b
[ 106.983128][ T2278] RDX: 0000564d434bc6fe RSI: 0000000040082a01 RDI: 0000000000000005
[ 106.983878][ T2278] RBP: 00007ffcb0ba5f40 R08: 0000000000000000 R09: 00007f2e120c9b80
[ 106.984626][ T2278] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 106.986296][ T2278] R13: 00007ffcb0ba6238 R14: 0000564d434bedc8 R15: 00007f2e123cc020
[ 106.987040][ T2278] </TASK>
[ 106.987364][ T2278]
[ 106.987635][ T2278] Allocated by task 2278:
[ 106.988071][ T2278] kasan_save_stack (mm/kasan/common.c:46)
[ 106.988543][ T2278] kasan_set_track (mm/kasan/common.c:52)
[ 106.988999][ T2278] __kasan_kmalloc (mm/kasan/common.c:374 mm/kasan/common.c:383)
[ 106.989465][ T2278] user_event_parse (include/linux/slab.h:600 include/linux/slab.h:721 kernel/trace/trace_events_user.c:1978)
[ 106.989939][ T2278] user_events_ioctl_reg (kernel/trace/trace_events_user.c:2342)
[ 106.990462][ T2278] user_events_ioctl (kernel/trace/trace_events_user.c:2538)
[ 106.990954][ T2278] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
[ 106.991428][ T2278] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 106.991871][ T2278] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 106.992436][ T2278]
[ 106.992705][ T2278] Freed by task 2278:
[ 106.993112][ T2278] kasan_save_stack (mm/kasan/common.c:46)
[ 106.993582][ T2278] kasan_set_track (mm/kasan/common.c:52)
[ 106.994043][ T2278] kasan_save_free_info (mm/kasan/generic.c:524)
[ 106.994544][ T2278] __kasan_slab_free (mm/kasan/common.c:238 mm/kasan/common.c:200 mm/kasan/common.c:244)
[ 106.995028][ T2278] slab_free_freelist_hook (mm/slub.c:1826)
[ 106.995553][ T2278] __kmem_cache_free (mm/slub.c:3809 mm/slub.c:3822)
[ 106.996026][ T2278] destroy_user_event (kernel/trace/trace_events_user.c:1489 kernel/trace/trace_events_user.c:1467)
[ 106.996513][ T2278] user_events_ioctl (kernel/trace/trace_events_user.c:2077 kernel/trace/trace_events_user.c:2401 kernel/trace/trace_events_user.c:2543)
[ 106.997009][ T2278] __x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:871 fs/ioctl.c:857 fs/ioctl.c:857)
[ 106.997483][ T2278] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 106.997926][ T2278] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 106.998496][ T2278]
[ 106.998768][ T2278] The buggy address belongs to the object at ffff88816644ee00
[ 106.998768][ T2278] which belongs to the cache kmalloc-cg-512 of size 512
[ 107.000035][ T2278] The buggy address is located 312 bytes inside of
[ 107.000035][ T2278] freed 512-byte region [ffff88816644ee00, ffff88816644f000)
[ 107.001266][ T2278]
[ 107.001532][ T2278] The buggy address belongs to the physical page:
[ 107.002142][ T2278] page:ffffea0005991200 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88816644b800 pfn:0x166448
[ 107.003179][ T2278] head:ffffea0005991200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 107.003996][ T2278] memcg:ffff888160dfc4e9
[ 107.004425][ T2278] flags: 0x17ffffc0000840(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 107.005189][ T2278] page_type: 0xffffffff()
[ 107.005635][ T2278] raw: 0017ffffc0000840 ffff888100051700 ffffea0004050810 ffff888100043dc8
[ 107.006434][ T2278] raw: ffff88816644b800 0000000000150008 00000001ffffffff ffff888160dfc4e9
[ 107.007223][ T2278] page dumped because: kasan: bad access detected
[ 107.007841][ T2278]
[ 107.008111][ T2278] Memory state around the buggy address:
[ 107.008660][ T2278] ffff88816644ee00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 107.009416][ T2278] ffff88816644ee80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 107.010161][ T2278] >ffff88816644ef00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 107.010907][ T2278] ^
[ 107.011471][ T2278] ffff88816644ef80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 107.012215][ T2278] ffff88816644f000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 107.012967][ T2278] ==================================================================
[ 107.013787][ T2278] Disabling lock debugging due to kernel taint



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240214/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki