There are several scenarios that have come up where having a user_event
persist even if the process that registered it exits. The main one is
having a daemon create events on bootup that shouldn't get deleted if
the daemon has to exit or reload. Another is within OpenTelemetry
exporters, they wish to potentially check if a user_event exists on the
system to determine if exporting the data out should occur. The
user_event in this case must exist even in the absence of the owning
process running (such as the above daemon case).
Since persistent events aren't automatically cleaned up, we want to ensure
only trusted users are allowed to do this. It seems reasonable to use
CAP_PERFMON as that boundary, since those users can already do many things
via perf_event_open without requiring full CAP_SYS_ADMIN.
This patchset brings back the ability to use /sys/kernel/tracing/dynamic_events
to create user_events, as persist is now back to being supported. Both the
register and delete of events that persist require CAP_PERFMON, which prevents
a non-perfmon user from making an event go away that a perfmon user decided
should persist.
Change History:
V2:
Added common user_event_capable() function to check access given
the register flags.
Ensure access check is done for dynamic_events when implicitly removing
events, such as "echo 'u:test' > /sys/kernel/tracing/dynamic_events".
Beau Belgrave (3):
tracing/user_events: Allow events to persist for perfmon_capable users
selftests/user_events: Test persist flag cases
tracing/user_events: Document persist event flags
Documentation/trace/user_events.rst | 21 ++++++-
include/uapi/linux/user_events.h | 11 +++-
kernel/trace/trace_events_user.c | 36 +++++++-----
.../testing/selftests/user_events/abi_test.c | 55 ++++++++++++++++++-
.../testing/selftests/user_events/dyn_test.c | 54 +++++++++++++++++-
5 files changed, 157 insertions(+), 20 deletions(-)
base-commit: fc1653abba0d554aad80224e51bcad42b09895ed
--
2.34.1
There are several scenarios that have come up where having a user_event
persist even if the process that registered it exits. The main one is
having a daemon create events on bootup that shouldn't get deleted if
the daemon has to exit or reload. Another is within OpenTelemetry
exporters, they wish to potentially check if a user_event exists on the
system to determine if exporting the data out should occur. The
user_event in this case must exist even in the absence of the owning
process running (such as the above daemon case).
Expose the previously internal flag USER_EVENT_REG_PERSIST to user
processes. Upon register or delete of events with this flag, ensure the
user is perfmon_capable to prevent random user processes with access to
tracefs from creating events that persist after exit.
Signed-off-by: Beau Belgrave <[email protected]>
---
include/uapi/linux/user_events.h | 11 +++++++++-
kernel/trace/trace_events_user.c | 36 +++++++++++++++++++-------------
2 files changed, 32 insertions(+), 15 deletions(-)
diff --git a/include/uapi/linux/user_events.h b/include/uapi/linux/user_events.h
index 2984aae4a2b4..f74f3aedd49c 100644
--- a/include/uapi/linux/user_events.h
+++ b/include/uapi/linux/user_events.h
@@ -17,6 +17,15 @@
/* Create dynamic location entry within a 32-bit value */
#define DYN_LOC(offset, size) ((size) << 16 | (offset))
+/* List of supported registration flags */
+enum user_reg_flag {
+ /* Event will not delete upon last reference closing */
+ USER_EVENT_REG_PERSIST = 1U << 0,
+
+ /* This value or above is currently non-ABI */
+ USER_EVENT_REG_MAX = 1U << 1,
+};
+
/*
* Describes an event registration and stores the results of the registration.
* This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum
@@ -33,7 +42,7 @@ struct user_reg {
/* Input: Enable size in bytes at address */
__u8 enable_size;
- /* Input: Flags for future use, set to 0 */
+ /* Input: Flags to use, if any */
__u16 flags;
/* Input: Address to update when enabled */
diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 6f046650e527..e3f2b8d72e01 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -49,18 +49,6 @@
#define EVENT_STATUS_PERF BIT(1)
#define EVENT_STATUS_OTHER BIT(7)
-/*
- * User register flags are not allowed yet, keep them here until we are
- * ready to expose them out to the user ABI.
- */
-enum user_reg_flag {
- /* Event will not delete upon last reference closing */
- USER_EVENT_REG_PERSIST = 1U << 0,
-
- /* This value or above is currently non-ABI */
- USER_EVENT_REG_MAX = 1U << 1,
-};
-
/*
* Stores the system name, tables, and locks for a group of events. This
* allows isolation for events by various means.
@@ -191,6 +179,17 @@ static u32 user_event_key(char *name)
return jhash(name, strlen(name), 0);
}
+static bool user_event_capable(u16 reg_flags)
+{
+ /* Persistent events require CAP_PERFMON / CAP_SYS_ADMIN */
+ if (reg_flags & USER_EVENT_REG_PERSIST) {
+ if (!perfmon_capable())
+ return false;
+ }
+
+ return true;
+}
+
static struct user_event *user_event_get(struct user_event *user)
{
refcount_inc(&user->refcnt);
@@ -1773,6 +1772,9 @@ static int user_event_free(struct dyn_event *ev)
if (!user_event_last_ref(user))
return -EBUSY;
+ if (!user_event_capable(user->reg_flags))
+ return -EPERM;
+
return destroy_user_event(user);
}
@@ -1888,10 +1890,13 @@ static int user_event_parse(struct user_event_group *group, char *name,
int argc = 0;
char **argv;
- /* User register flags are not ready yet */
- if (reg_flags != 0 || flags != NULL)
+ /* Currently don't support any text based flags */
+ if (flags != NULL)
return -EINVAL;
+ if (!user_event_capable(reg_flags))
+ return -EPERM;
+
/* Prevent dyn_event from racing */
mutex_lock(&event_mutex);
user = find_user_event(group, name, &key);
@@ -2024,6 +2029,9 @@ static int delete_user_event(struct user_event_group *group, char *name)
if (!user_event_last_ref(user))
return -EBUSY;
+ if (!user_event_capable(user->reg_flags))
+ return -EPERM;
+
return destroy_user_event(user);
}
--
2.34.1
Users need to know how to make events persist now that we allow for
that. We also now allow the dynamic_events file to create events by
utilizing the persist flag during event register.
Add back in to documentation how /sys/kernel/tracing/dynamic_events can
be used to create persistent user_events. Add a section under registering
for the currently supported flags (USER_EVENT_REG_PERSIST) and the
required permissions. Add a note under deleting that deleting a
persistent event also requires sufficient permission.
Signed-off-by: Beau Belgrave <[email protected]>
---
Documentation/trace/user_events.rst | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/Documentation/trace/user_events.rst b/Documentation/trace/user_events.rst
index e7b07313550a..576d2c35f22e 100644
--- a/Documentation/trace/user_events.rst
+++ b/Documentation/trace/user_events.rst
@@ -14,6 +14,11 @@ Programs can view status of the events via
/sys/kernel/tracing/user_events_status and can both register and write
data out via /sys/kernel/tracing/user_events_data.
+Programs can also use /sys/kernel/tracing/dynamic_events to register and
+delete user based events via the u: prefix. The format of the command to
+dynamic_events is the same as the ioctl with the u: prefix applied. This
+requires CAP_PERFMON due to the event persisting, otherwise -EPERM is returned.
+
Typically programs will register a set of events that they wish to expose to
tools that can read trace_events (such as ftrace and perf). The registration
process tells the kernel which address and bit to reflect if any tool has
@@ -45,7 +50,7 @@ This command takes a packed struct user_reg as an argument::
/* Input: Enable size in bytes at address */
__u8 enable_size;
- /* Input: Flags for future use, set to 0 */
+ /* Input: Flags to use, if any */
__u16 flags;
/* Input: Address to update when enabled */
@@ -69,7 +74,7 @@ The struct user_reg requires all the above inputs to be set appropriately.
This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed to be
used on 64-bit kernels, however, 32-bit can be used on all kernels.
-+ flags: The flags to use, if any. For the initial version this must be 0.
++ flags: The flags to use, if any.
Callers should first attempt to use flags and retry without flags to ensure
support for lower versions of the kernel. If a flag is not supported -EINVAL
is returned.
@@ -80,6 +85,13 @@ The struct user_reg requires all the above inputs to be set appropriately.
+ name_args: The name and arguments to describe the event, see command format
for details.
+The following flags are currently supported.
+
++ USER_EVENT_REG_PERSIST: The event will not delete upon the last reference
+ closing. Callers may use this if an event should exist even after the
+ process closes or unregisters the event. Requires CAP_PERFMON otherwise
+ -EPERM is returned.
+
Upon successful registration the following is set.
+ write_index: The index to use for this file descriptor that represents this
@@ -141,7 +153,10 @@ event (in both user and kernel space). User programs should use a separate file
to request deletes than the one used for registration due to this.
**NOTE:** By default events will auto-delete when there are no references left
-to the event. Flags in the future may change this logic.
+to the event. If programs do not want auto-delete, they must use the
+USER_EVENT_REG_PERSIST flag when registering the event. Once that flag is used
+the event exists until DIAG_IOCSDEL is invoked. Both register and delete of an
+event that persists requires CAP_PERFMON, otherwise -EPERM is returned.
Unregistering
-------------
--
2.34.1