2023-05-31 00:18:27

by Beau Belgrave

[permalink] [raw]
Subject: [PATCH 0/5] tracing/user_events: Add auto-del flag for events

As part of the discussions for user_events aligning to be used with eBPF
it became clear [1] we needed a way to delete events without having to rely
upon the delete IOCTL. Steven suggested that we simply have an owner
for the event, however, the event can be held by more than just the
first register FD, such as perf/ftrace or additional registers. In order
to handle all those cases, we must only delete after all references are
gone from both user and kernel space.

This series adds a new register flag, USER_EVENT_REG_AUTO_DEL, which
causes the event to delete itself upon the last put reference. We cannot
fully drop the delete IOCTL, since we still want to enable events to be
registered early via dynamic_events and persist. If the auto delete flag
was used during dynamic_events, the event would delete immediately.

We have a few key events that we enable immediately after boot and are
monitored in our environments. Today this is done via dynamic events,
however, it could also be done directly via the ABI by not passing the
auto delete flag.

NOTE: I'll need to merge this work once we take these [2] [3] patches
into for-next. I'm happy to do so once they land there.

1: https://lore.kernel.org/linux-trace-kernel/[email protected]/
2: https://lore.kernel.org/linux-trace-kernel/[email protected]/
3: https://lore.kernel.org/linux-trace-kernel/[email protected]/

Beau Belgrave (5):
tracing/user_events: Store register flags on events
tracing/user_events: Track refcount consistently via put/get
tracing/user_events: Add flag to auto-delete events
tracing/user_events: Add self-test for auto-del flag
tracing/user_events: Add auto-del flag documentation

Documentation/trace/user_events.rst | 21 +-
include/uapi/linux/user_events.h | 10 +-
kernel/trace/trace_events_user.c | 183 ++++++++++++++----
.../testing/selftests/user_events/abi_test.c | 115 ++++++++++-
4 files changed, 278 insertions(+), 51 deletions(-)


base-commit: 3862f86c1529fa0016de6344eb974877b4cd3838
--
2.25.1



2023-05-31 00:18:48

by Beau Belgrave

[permalink] [raw]
Subject: [PATCH 2/5] tracing/user_events: Track refcount consistently via put/get

Various parts of the code today track user_event's refcnt field directly
via a refcount_add/dec. This makes it hard to modify the behavior of the
last reference decrement in all code paths consistently. For example, in
the future we will auto-delete events upon the last reference going
away. This last reference could happen in many places, but we want it to
be consistently handled.

Add user_event_get() and user_event_put() for the add/dec. Update all
places where direct refcounts are being used to utilize these new
functions. In each location pass if event_mutex is locked or not. This
allows us to drop events automatically in future patches clearly. Ensure
when caller states the lock is held, it really is (or is not) held.

Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 66 +++++++++++++++++++-------------
1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 34aa0a5d8e2a..8f0fb6cb0f33 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -175,6 +175,28 @@ static u32 user_event_key(char *name)
return jhash(name, strlen(name), 0);
}

+static struct user_event *user_event_get(struct user_event *user)
+{
+ refcount_inc(&user->refcnt);
+
+ return user;
+}
+
+static void user_event_put(struct user_event *user, bool locked)
+{
+#ifdef CONFIG_LOCKDEP
+ if (locked)
+ lockdep_assert_held(&event_mutex);
+ else
+ lockdep_assert_not_held(&event_mutex);
+#endif
+
+ if (unlikely(!user))
+ return;
+
+ refcount_dec(&user->refcnt);
+}
+
static void user_event_group_destroy(struct user_event_group *group)
{
kfree(group->system_name);
@@ -258,12 +280,13 @@ static struct user_event_group
return NULL;
};

-static void user_event_enabler_destroy(struct user_event_enabler *enabler)
+static void user_event_enabler_destroy(struct user_event_enabler *enabler,
+ bool locked)
{
list_del_rcu(&enabler->link);

/* No longer tracking the event via the enabler */
- refcount_dec(&enabler->event->refcnt);
+ user_event_put(enabler->event, locked);

kfree(enabler);
}
@@ -325,7 +348,7 @@ static void user_event_enabler_fault_fixup(struct work_struct *work)

/* User asked for enabler to be removed during fault */
if (test_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler))) {
- user_event_enabler_destroy(enabler);
+ user_event_enabler_destroy(enabler, true);
goto out;
}

@@ -489,13 +512,12 @@ static bool user_event_enabler_dup(struct user_event_enabler *orig,
if (!enabler)
return false;

- enabler->event = orig->event;
+ enabler->event = user_event_get(orig->event);
enabler->addr = orig->addr;

/* Only dup part of value (ignore future flags, etc) */
enabler->values = orig->values & ENABLE_VAL_DUP_MASK;

- refcount_inc(&enabler->event->refcnt);
list_add_rcu(&enabler->link, &mm->enablers);

return true;
@@ -595,7 +617,7 @@ static void user_event_mm_destroy(struct user_event_mm *mm)
struct user_event_enabler *enabler, *next;

list_for_each_entry_safe(enabler, next, &mm->enablers, link)
- user_event_enabler_destroy(enabler);
+ user_event_enabler_destroy(enabler, false);

mmdrop(mm->mm);
kfree(mm);
@@ -748,7 +770,7 @@ static struct user_event_enabler
* exit or run exec(), which includes forks and clones.
*/
if (!*write_result) {
- refcount_inc(&enabler->event->refcnt);
+ user_event_get(user);
list_add_rcu(&enabler->link, &user_mm->enablers);
}

@@ -1336,10 +1358,8 @@ static struct user_event *find_user_event(struct user_event_group *group,
*outkey = key;

hash_for_each_possible(group->register_table, user, node, key)
- if (!strcmp(EVENT_NAME(user), name)) {
- refcount_inc(&user->refcnt);
- return user;
- }
+ if (!strcmp(EVENT_NAME(user), name))
+ return user_event_get(user);

return NULL;
}
@@ -1553,12 +1573,12 @@ static int user_event_reg(struct trace_event_call *call,

return ret;
inc:
- refcount_inc(&user->refcnt);
+ user_event_get(user);
update_enable_bit_for(user);
return 0;
dec:
update_enable_bit_for(user);
- refcount_dec(&user->refcnt);
+ user_event_put(user, true);
return 0;
}

@@ -1592,7 +1612,7 @@ static int user_event_create(const char *raw_command)
ret = user_event_parse_cmd(group, name, &user, 0);

if (!ret)
- refcount_dec(&user->refcnt);
+ user_event_put(user, false);

mutex_unlock(&group->reg_mutex);

@@ -1856,7 +1876,7 @@ static int delete_user_event(struct user_event_group *group, char *name)
if (!user)
return -ENOENT;

- refcount_dec(&user->refcnt);
+ user_event_put(user, true);

if (!user_event_last_ref(user))
return -EBUSY;
@@ -2015,9 +2035,7 @@ static int user_events_ref_add(struct user_event_file_info *info,
for (i = 0; i < count; ++i)
new_refs->events[i] = refs->events[i];

- new_refs->events[i] = user;
-
- refcount_inc(&user->refcnt);
+ new_refs->events[i] = user_event_get(user);

rcu_assign_pointer(info->refs, new_refs);

@@ -2131,7 +2149,7 @@ static long user_events_ioctl_reg(struct user_event_file_info *info,
ret = user_events_ref_add(info, user);

/* No longer need parse ref, ref_add either worked or not */
- refcount_dec(&user->refcnt);
+ user_event_put(user, false);

/* Positive number is index and valid */
if (ret < 0)
@@ -2280,7 +2298,7 @@ static long user_events_ioctl_unreg(unsigned long uarg)
set_bit(ENABLE_VAL_FREEING_BIT, ENABLE_BITOPS(enabler));

if (!test_bit(ENABLE_VAL_FAULTING_BIT, ENABLE_BITOPS(enabler)))
- user_event_enabler_destroy(enabler);
+ user_event_enabler_destroy(enabler, true);

/* Removed at least one */
ret = 0;
@@ -2337,7 +2355,6 @@ static int user_events_release(struct inode *node, struct file *file)
struct user_event_file_info *info = file->private_data;
struct user_event_group *group;
struct user_event_refs *refs;
- struct user_event *user;
int i;

if (!info)
@@ -2361,12 +2378,9 @@ static int user_events_release(struct inode *node, struct file *file)
* The underlying user_events are ref counted, and cannot be freed.
* After this decrement, the user_events may be freed elsewhere.
*/
- for (i = 0; i < refs->count; ++i) {
- user = refs->events[i];
+ for (i = 0; i < refs->count; ++i)
+ user_event_put(refs->events[i], false);

- if (user)
- refcount_dec(&user->refcnt);
- }
out:
file->private_data = NULL;

--
2.25.1


2023-05-31 22:02:09

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 0/5] tracing/user_events: Add auto-del flag for events

On Tue, May 30, 2023 at 04:52:59PM -0700, Beau Belgrave wrote:
> As part of the discussions for user_events aligning to be used with eBPF
> it became clear [1] we needed a way to delete events without having to rely
> upon the delete IOCTL. Steven suggested that we simply have an owner

This patch set is not addressing the issues I pointed out earlier.
It adds a new flag and new api. It's not a fix.

> for the event, however, the event can be held by more than just the
> first register FD, such as perf/ftrace or additional registers. In order
> to handle all those cases, we must only delete after all references are
> gone from both user and kernel space.
>
> This series adds a new register flag, USER_EVENT_REG_AUTO_DEL, which
> causes the event to delete itself upon the last put reference. We cannot

Do not introduce a new flag. Make this default.

> fully drop the delete IOCTL, since we still want to enable events to be
> registered early via dynamic_events and persist. If the auto delete flag
> was used during dynamic_events, the event would delete immediately.

You have to delete this broken "delete via ioctl" api.
For persistent events you need a different api in its own name scope,
so it doesn't conflict with per-fd events.

> We have a few key events that we enable immediately after boot and are
> monitored in our environments. Today this is done via dynamic events,
> however, it could also be done directly via the ABI by not passing the
> auto delete flag.
>
> NOTE: I'll need to merge this work once we take these [2] [3] patches
> into for-next. I'm happy to do so once they land there.
>
> 1: https://lore.kernel.org/linux-trace-kernel/[email protected]/
> 2: https://lore.kernel.org/linux-trace-kernel/[email protected]/
> 3: https://lore.kernel.org/linux-trace-kernel/[email protected]/
>
> Beau Belgrave (5):
> tracing/user_events: Store register flags on events
> tracing/user_events: Track refcount consistently via put/get
> tracing/user_events: Add flag to auto-delete events
> tracing/user_events: Add self-test for auto-del flag
> tracing/user_events: Add auto-del flag documentation
>
> Documentation/trace/user_events.rst | 21 +-
> include/uapi/linux/user_events.h | 10 +-
> kernel/trace/trace_events_user.c | 183 ++++++++++++++----
> .../testing/selftests/user_events/abi_test.c | 115 ++++++++++-
> 4 files changed, 278 insertions(+), 51 deletions(-)
>
>
> base-commit: 3862f86c1529fa0016de6344eb974877b4cd3838
> --
> 2.25.1
>

2023-06-01 00:46:31

by Beau Belgrave

[permalink] [raw]
Subject: Re: [PATCH 0/5] tracing/user_events: Add auto-del flag for events

On Wed, May 31, 2023 at 02:44:44PM -0700, Alexei Starovoitov wrote:
> On Tue, May 30, 2023 at 04:52:59PM -0700, Beau Belgrave wrote:
> > As part of the discussions for user_events aligning to be used with eBPF
> > it became clear [1] we needed a way to delete events without having to rely
> > upon the delete IOCTL. Steven suggested that we simply have an owner
>
> This patch set is not addressing the issues I pointed out earlier.
> It adds a new flag and new api. It's not a fix.
>

Can you point out the scenario you are worried about?

For example, if anything is using a per-FD event, it cannot be deleted,
it will return -EBUSY. If perf, ftrace, or any user-process still has a
reference to the event, the delete will not go through (even without
these changes).

I read your previous issues as, we cannot let anyone delete events while
others are using them. And I also heard Steven state, we need to not let
things pile up, since manual deletes are unlikely.

> > for the event, however, the event can be held by more than just the
> > first register FD, such as perf/ftrace or additional registers. In order
> > to handle all those cases, we must only delete after all references are
> > gone from both user and kernel space.
> >
> > This series adds a new register flag, USER_EVENT_REG_AUTO_DEL, which
> > causes the event to delete itself upon the last put reference. We cannot
>
> Do not introduce a new flag. Make this default.
>

If this is to be default, then I would have to have a flag for
persistent events, which seems reasonable.

> > fully drop the delete IOCTL, since we still want to enable events to be
> > registered early via dynamic_events and persist. If the auto delete flag
> > was used during dynamic_events, the event would delete immediately.
>
> You have to delete this broken "delete via ioctl" api.
> For persistent events you need a different api in its own name scope,
> so it doesn't conflict with per-fd events.
>

We have certain events we want persistent, that don't go away if the
process crashes, etc. and we don't yet have a ring buffer up via
perf_events.

In these cases, we want the name to be the same for all processes, since
it's a common event. An example is a common library that emits out
assert messages. We want to watch for any asserts on the system,
regardless of which process emits them.

I'm not sure I understand how you think they would conflict?

Another process cannot come in and register the same event name while
it's in use. They can only do so once everything has been closed down.

If another process uses the same name for an event, it must match the
previous events arguments, and is treated as the same event. If they
don't match then the register fails. The only way to get a conflict is
to delete the event and then create a new one, but that only works if
no one is still using it at all.

Thanks,
-Beau

> > We have a few key events that we enable immediately after boot and are
> > monitored in our environments. Today this is done via dynamic events,
> > however, it could also be done directly via the ABI by not passing the
> > auto delete flag.
> >
> > NOTE: I'll need to merge this work once we take these [2] [3] patches
> > into for-next. I'm happy to do so once they land there.
> >
> > 1: https://lore.kernel.org/linux-trace-kernel/[email protected]/
> > 2: https://lore.kernel.org/linux-trace-kernel/[email protected]/
> > 3: https://lore.kernel.org/linux-trace-kernel/[email protected]/
> >
> > Beau Belgrave (5):
> > tracing/user_events: Store register flags on events
> > tracing/user_events: Track refcount consistently via put/get
> > tracing/user_events: Add flag to auto-delete events
> > tracing/user_events: Add self-test for auto-del flag
> > tracing/user_events: Add auto-del flag documentation
> >
> > Documentation/trace/user_events.rst | 21 +-
> > include/uapi/linux/user_events.h | 10 +-
> > kernel/trace/trace_events_user.c | 183 ++++++++++++++----
> > .../testing/selftests/user_events/abi_test.c | 115 ++++++++++-
> > 4 files changed, 278 insertions(+), 51 deletions(-)
> >
> >
> > base-commit: 3862f86c1529fa0016de6344eb974877b4cd3838
> > --
> > 2.25.1
> >