2022-04-04 10:12:33

by Beau Belgrave

Subject: [PATCH 0/7] tracing/user_events: Update user_events ABI from

This series covers the changes that were brought up once user_events went into
5.18. The largest change is moving away from a byte index to a bit index, as
first suggested by Mathieu Desnoyers.

The other changes are fixes that have either accumulated or were found by Mathieu.

NOTE: The sample and self-tests do not build unless you manually install
user_events.h into usr/include/linux.
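(One way to do this, for example, is to copy include/linux/user_events.h from
the source tree into usr/include/linux/ before building the sample/self-tests.)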

Link: https://lore.kernel.org/all/[email protected]/

Pseudo code example of typical usage with the new ABI:
struct user_reg reg = {0};

/* Paths are relative to the tracefs mount point */
int page_fd = open("user_events_status", O_RDWR);
char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0);
close(page_fd);

int data_fd = open("user_events_data", O_RDWR);

reg.size = sizeof(reg);
reg.name_args = (__u64)"test";

/* Register the event; the output fields of reg are filled in on success */
ioctl(data_fd, DIAG_IOCSREG, &reg);
int status_id = reg.status_index;
int status_mask = reg.status_mask;
int write_id = reg.write_index;

/* First iovec carries the write index, the second the event payload */
char payload[8]; /* illustrative payload data */
struct iovec io[2];
io[0].iov_base = &write_id;
io[0].iov_len = sizeof(write_id);
io[1].iov_base = payload;
io[1].iov_len = sizeof(payload);

/* Only emit when something is attached to the event */
if (page_data[status_id] & status_mask)
	writev(data_fd, io, 2);

Beau Belgrave (7):
tracing/user_events: Fix repeated word in comments
tracing/user_events: Use NULL for strstr checks
tracing/user_events: Use WRITE instead of READ for io vector import
tracing/user_events: Ensure user provided strings are safely formatted
tracing/user_events: Use refcount instead of atomic for ref tracking
tracing/user_events: Use bits vs bytes for enabled status page data
tracing/user_events: Update ABI documentation to align to bits vs
bytes

Documentation/trace/user_events.rst | 46 ++--
include/linux/user_events.h | 19 +-
kernel/trace/trace_events_user.c | 213 ++++++++++++------
samples/user_events/example.c | 12 +-
.../selftests/user_events/ftrace_test.c | 16 +-
.../testing/selftests/user_events/perf_test.c | 6 +-
6 files changed, 187 insertions(+), 125 deletions(-)


base-commit: bfdf01279299f1254561d6c2072f1919e457e23a
--
2.25.1


2022-04-04 15:02:20

by Beau Belgrave

Subject: [PATCH 5/7] tracing/user_events: Use refcount instead of atomic for ref tracking

User processes could open up enough event references to cause rollovers.
These could cause use-after-free scenarios, which we do not want.
Switching to the refcount APIs prevents this, but will leak memory once
saturated.

Once saturated, user processes can still use the events. This prevents
a bad user process from stopping existing telemetry from being emitted.
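
For background, a rough sketch of the refcount_t behavior this relies on
(illustrative only, not part of the patch):

#include <linux/refcount.h>

/*
 * refcount_t pins at a saturation value (with a warning) instead of
 * wrapping like atomic_t, so runaway increments from user space can no
 * longer roll the count back around and free a still-referenced event.
 */
static void refcount_behavior_sketch(void)
{
	refcount_t r = REFCOUNT_INIT(1);

	refcount_inc(&r);	/* count is 2 */
	refcount_dec(&r);	/* back to 1, the object is still owned */

	/*
	 * Past REFCOUNT_MAX the count saturates and stays saturated;
	 * the object is then leaked rather than freed while references
	 * remain.
	 */
}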

Link: https://lore.kernel.org/all/[email protected]/

Reported-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 53 +++++++++++++++-----------------
1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index f9bb7d37d76f..2bcae7abfa81 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -14,6 +14,7 @@
#include <linux/uio.h>
#include <linux/ioctl.h>
#include <linux/jhash.h>
+#include <linux/refcount.h>
#include <linux/trace_events.h>
#include <linux/tracefs.h>
#include <linux/types.h>
@@ -57,7 +58,7 @@ static DECLARE_BITMAP(page_bitmap, MAX_EVENTS);
* within a file a user_event might be created if it does not
* already exist. These are globally used and their lifetime
* is tied to the refcnt member. These cannot go away until the
- * refcnt reaches zero.
+ * refcnt reaches one.
*/
struct user_event {
struct tracepoint tracepoint;
@@ -67,7 +68,7 @@ struct user_event {
struct hlist_node node;
struct list_head fields;
struct list_head validators;
- atomic_t refcnt;
+ refcount_t refcnt;
int index;
int flags;
int min_size;
@@ -105,6 +106,12 @@ static u32 user_event_key(char *name)
return jhash(name, strlen(name), 0);
}

+static __always_inline __must_check
+bool user_event_last_ref(struct user_event *user)
+{
+ return refcount_read(&user->refcnt) == 1;
+}
+
static __always_inline __must_check
size_t copy_nofault(void *addr, size_t bytes, struct iov_iter *i)
{
@@ -662,7 +669,7 @@ static struct user_event *find_user_event(char *name, u32 *outkey)

hash_for_each_possible(register_table, user, node, key)
if (!strcmp(EVENT_NAME(user), name)) {
- atomic_inc(&user->refcnt);
+ refcount_inc(&user->refcnt);
return user;
}

@@ -876,12 +883,12 @@ static int user_event_reg(struct trace_event_call *call,

return ret;
inc:
- atomic_inc(&user->refcnt);
+ refcount_inc(&user->refcnt);
update_reg_page_for(user);
return 0;
dec:
update_reg_page_for(user);
- atomic_dec(&user->refcnt);
+ refcount_dec(&user->refcnt);
return 0;
}

@@ -907,7 +914,7 @@ static int user_event_create(const char *raw_command)
ret = user_event_parse_cmd(name, &user);

if (!ret)
- atomic_dec(&user->refcnt);
+ refcount_dec(&user->refcnt);

mutex_unlock(&reg_mutex);

@@ -951,14 +958,14 @@ static bool user_event_is_busy(struct dyn_event *ev)
{
struct user_event *user = container_of(ev, struct user_event, devent);

- return atomic_read(&user->refcnt) != 0;
+ return !user_event_last_ref(user);
}

static int user_event_free(struct dyn_event *ev)
{
struct user_event *user = container_of(ev, struct user_event, devent);

- if (atomic_read(&user->refcnt) != 0)
+ if (!user_event_last_ref(user))
return -EBUSY;

return destroy_user_event(user);
@@ -1137,8 +1144,8 @@ static int user_event_parse(char *name, char *args, char *flags,

user->index = index;

- /* Ensure we track ref */
- atomic_inc(&user->refcnt);
+ /* Ensure we track self ref and caller ref (2) */
+ refcount_set(&user->refcnt, 2);

dyn_event_init(&user->devent, &user_event_dops);
dyn_event_add(&user->devent, &user->call);
@@ -1164,29 +1171,17 @@ static int user_event_parse(char *name, char *args, char *flags,
static int delete_user_event(char *name)
{
u32 key;
- int ret;
struct user_event *user = find_user_event(name, &key);

if (!user)
return -ENOENT;

- /* Ensure we are the last ref */
- if (atomic_read(&user->refcnt) != 1) {
- ret = -EBUSY;
- goto put_ref;
- }
-
- ret = destroy_user_event(user);
+ refcount_dec(&user->refcnt);

- if (ret)
- goto put_ref;
-
- return ret;
-put_ref:
- /* No longer have this ref */
- atomic_dec(&user->refcnt);
+ if (!user_event_last_ref(user))
+ return -EBUSY;

- return ret;
+ return destroy_user_event(user);
}

/*
@@ -1314,7 +1309,7 @@ static int user_events_ref_add(struct file *file, struct user_event *user)

new_refs->events[i] = user;

- atomic_inc(&user->refcnt);
+ refcount_inc(&user->refcnt);

rcu_assign_pointer(file->private_data, new_refs);

@@ -1374,7 +1369,7 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg)
ret = user_events_ref_add(file, user);

/* No longer need parse ref, ref_add either worked or not */
- atomic_dec(&user->refcnt);
+ refcount_dec(&user->refcnt);

/* Positive number is index and valid */
if (ret < 0)
@@ -1464,7 +1459,7 @@ static int user_events_release(struct inode *node, struct file *file)
user = refs->events[i];

if (user)
- atomic_dec(&user->refcnt);
+ refcount_dec(&user->refcnt);
}
out:
file->private_data = NULL;
--
2.25.1

2022-04-05 01:17:55

by Beau Belgrave

Subject: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

User processes may require many events and when they do the cache
performance of a byte index status check is less ideal than a bit index.
The previous event limit per-page was 4096, the new limit is 32,768.

This change adds a mask property to the user_reg struct. Programs check
that the byte at status_index has a bit set by ANDing the status_mask.
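
With the new fields a user program's fast-path check looks roughly like the
following (mirroring the updated sample and self-tests in this series), where
status_page is the mmap'd user_events_status page and reg is the struct
user_reg filled in by DIAG_IOCSREG:

	if (status_page[reg.status_index] & reg.status_mask)
		writev(data_fd, io, 2);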

Link: https://lore.kernel.org/all/[email protected]/

Suggested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Beau Belgrave <[email protected]>
---
include/linux/user_events.h | 19 +++---
kernel/trace/trace_events_user.c | 58 ++++++++++++++++---
samples/user_events/example.c | 12 ++--
.../selftests/user_events/ftrace_test.c | 16 ++---
.../testing/selftests/user_events/perf_test.c | 6 +-
5 files changed, 77 insertions(+), 34 deletions(-)

diff --git a/include/linux/user_events.h b/include/linux/user_events.h
index 736e05603463..c5051fee26c6 100644
--- a/include/linux/user_events.h
+++ b/include/linux/user_events.h
@@ -20,15 +20,6 @@
#define USER_EVENTS_SYSTEM "user_events"
#define USER_EVENTS_PREFIX "u:"

-/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */
-#define EVENT_BIT_FTRACE 0
-#define EVENT_BIT_PERF 1
-#define EVENT_BIT_OTHER 7
-
-#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE)
-#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF)
-#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER)
-
/* Create dynamic location entry within a 32-bit value */
#define DYN_LOC(offset, size) ((size) << 16 | (offset))

@@ -48,9 +39,17 @@ struct user_reg {
/* Output: Byte index of the event within the status page */
__u32 status_index;

+ /* Output: Mask for the event within the status page byte */
+ __u32 status_mask;
+
/* Output: Index of the event to use when writing data */
__u32 write_index;
-};
+} __attribute__((__packed__));
+
+static inline int user_event_enabled(void *status_data, int index, int mask)
+{
+ return status_data && (((const char *)status_data)[index] & mask);
+}

#define DIAG_IOC_MAGIC '*'

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 2bcae7abfa81..d960b5ea76c4 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -40,17 +40,26 @@
*/
#define MAX_PAGE_ORDER 0
#define MAX_PAGES (1 << MAX_PAGE_ORDER)
-#define MAX_EVENTS (MAX_PAGES * PAGE_SIZE)
+#define MAX_BYTES (MAX_PAGES * PAGE_SIZE)
+#define MAX_EVENTS (MAX_BYTES * 8)

/* Limit how long of an event name plus args within the subsystem. */
#define MAX_EVENT_DESC 512
#define EVENT_NAME(user_event) ((user_event)->tracepoint.name)
#define MAX_FIELD_ARRAY_SIZE 1024

+#define STATUS_BYTE(bit) ((bit) >> 3)
+#define STATUS_MASK(bit) (1 << ((bit) & 7))
+
+/* Internal bits to keep track of connected probes */
+#define EVENT_STATUS_FTRACE (1 << 0)
+#define EVENT_STATUS_PERF (1 << 1)
+#define EVENT_STATUS_OTHER (1 << 7)
+
static char *register_page_data;

static DEFINE_MUTEX(reg_mutex);
-static DEFINE_HASHTABLE(register_table, 4);
+static DEFINE_HASHTABLE(register_table, 8);
static DECLARE_BITMAP(page_bitmap, MAX_EVENTS);

/*
@@ -72,6 +81,7 @@ struct user_event {
int index;
int flags;
int min_size;
+ char status;
};

/*
@@ -106,6 +116,22 @@ static u32 user_event_key(char *name)
return jhash(name, strlen(name), 0);
}

+static __always_inline
+void user_event_register_set(struct user_event *user)
+{
+ int i = user->index;
+
+ register_page_data[STATUS_BYTE(i)] |= STATUS_MASK(i);
+}
+
+static __always_inline
+void user_event_register_clear(struct user_event *user)
+{
+ int i = user->index;
+
+ register_page_data[STATUS_BYTE(i)] &= ~STATUS_MASK(i);
+}
+
static __always_inline __must_check
bool user_event_last_ref(struct user_event *user)
{
@@ -648,7 +674,7 @@ static int destroy_user_event(struct user_event *user)

dyn_event_remove(&user->devent);

- register_page_data[user->index] = 0;
+ user_event_register_clear(user);
clear_bit(user->index, page_bitmap);
hash_del(&user->node);

@@ -827,7 +853,12 @@ static void update_reg_page_for(struct user_event *user)
rcu_read_unlock_sched();
}

- register_page_data[user->index] = status;
+ if (status)
+ user_event_register_set(user);
+ else
+ user_event_register_clear(user);
+
+ user->status = status;
}

/*
@@ -1332,7 +1363,17 @@ static long user_reg_get(struct user_reg __user *ureg, struct user_reg *kreg)
if (size > PAGE_SIZE)
return -E2BIG;

- return copy_struct_from_user(kreg, sizeof(*kreg), ureg, size);
+ if (size < offsetofend(struct user_reg, write_index))
+ return -EINVAL;
+
+ ret = copy_struct_from_user(kreg, sizeof(*kreg), ureg, size);
+
+ if (ret)
+ return ret;
+
+ kreg->size = size;
+
+ return 0;
}

/*
@@ -1376,7 +1417,8 @@ static long user_events_ioctl_reg(struct file *file, unsigned long uarg)
return ret;

put_user((u32)ret, &ureg->write_index);
- put_user(user->index, &ureg->status_index);
+ put_user(STATUS_BYTE(user->index), &ureg->status_index);
+ put_user(STATUS_MASK(user->index), &ureg->status_mask);

return 0;
}
@@ -1485,7 +1527,7 @@ static int user_status_mmap(struct file *file, struct vm_area_struct *vma)
{
unsigned long size = vma->vm_end - vma->vm_start;

- if (size != MAX_EVENTS)
+ if (size != MAX_BYTES)
return -EINVAL;

return remap_pfn_range(vma, vma->vm_start,
@@ -1520,7 +1562,7 @@ static int user_seq_show(struct seq_file *m, void *p)
mutex_lock(&reg_mutex);

hash_for_each(register_table, i, user, node) {
- status = register_page_data[user->index];
+ status = user->status;
flags = user->flags;

seq_printf(m, "%d:%s", user->index, EVENT_NAME(user));
diff --git a/samples/user_events/example.c b/samples/user_events/example.c
index 4f5778e441c0..e72260bf6e49 100644
--- a/samples/user_events/example.c
+++ b/samples/user_events/example.c
@@ -33,7 +33,8 @@ static int event_status(char **status)
return 0;
}

-static int event_reg(int fd, const char *command, int *status, int *write)
+static int event_reg(int fd, const char *command, int *index, int *mask,
+ int *write)
{
struct user_reg reg = {0};

@@ -43,7 +44,8 @@ static int event_reg(int fd, const char *command, int *status, int *write)
if (ioctl(fd, DIAG_IOCSREG, &reg) == -1)
return -1;

- *status = reg.status_index;
+ *index = reg.status_index;
+ *mask = reg.status_mask;
*write = reg.write_index;

return 0;
@@ -51,7 +53,7 @@ static int event_reg(int fd, const char *command, int *status, int *write)

int main(int argc, char **argv)
{
- int data_fd, status, write;
+ int data_fd, index, mask, write;
char *status_page;
struct iovec io[2];
__u32 count = 0;
@@ -61,7 +63,7 @@ int main(int argc, char **argv)

data_fd = open(data_file, O_RDWR);

- if (event_reg(data_fd, "test u32 count", &status, &write) == -1)
+ if (event_reg(data_fd, "test u32 count", &index, &mask, &write) == -1)
return errno;

/* Setup iovec */
@@ -75,7 +77,7 @@ int main(int argc, char **argv)
getchar();

/* Check if anyone is listening */
- if (status_page[status]) {
+ if (user_event_enabled(status_page, index, mask)) {
/* Yep, trace out our data */
writev(data_fd, (const struct iovec *)io, 2);

diff --git a/tools/testing/selftests/user_events/ftrace_test.c b/tools/testing/selftests/user_events/ftrace_test.c
index a80fb5ef61d5..ba7a2757dcbd 100644
--- a/tools/testing/selftests/user_events/ftrace_test.c
+++ b/tools/testing/selftests/user_events/ftrace_test.c
@@ -197,12 +197,12 @@ TEST_F(user, register_events) {
/* Register should work */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);

/* Multiple registers should result in same index */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);

/* Ensure disabled */
self->enable_fd = open(enable_file, O_RDWR);
@@ -212,15 +212,15 @@ TEST_F(user, register_events) {
/* MMAP should work and be zero'd */
ASSERT_NE(MAP_FAILED, status_page);
ASSERT_NE(NULL, status_page);
- ASSERT_EQ(0, status_page[reg.status_index]);
+ ASSERT_EQ(0, status_page[reg.status_index] & reg.status_mask);

/* Enable event and ensure bits updated in status */
ASSERT_NE(-1, write(self->enable_fd, "1", sizeof("1")))
- ASSERT_EQ(EVENT_STATUS_FTRACE, status_page[reg.status_index]);
+ ASSERT_NE(0, status_page[reg.status_index] & reg.status_mask);

/* Disable event and ensure bits updated in status */
ASSERT_NE(-1, write(self->enable_fd, "0", sizeof("0")))
- ASSERT_EQ(0, status_page[reg.status_index]);
+ ASSERT_EQ(0, status_page[reg.status_index] & reg.status_mask);

/* File still open should return -EBUSY for delete */
ASSERT_EQ(-1, ioctl(self->data_fd, DIAG_IOCSDEL, "__test_event"));
@@ -257,7 +257,7 @@ TEST_F(user, write_events) {
/* Register should work */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);

/* Write should fail on invalid slot with ENOENT */
io[0].iov_base = &field2;
@@ -298,7 +298,7 @@ TEST_F(user, write_fault) {
/* Register should work */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);

/* Write should work normally */
ASSERT_NE(-1, writev(self->data_fd, (const struct iovec *)io, 2));
@@ -322,7 +322,7 @@ TEST_F(user, write_validator) {
/* Register should work */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);

io[0].iov_base = &reg.write_index;
io[0].iov_len = sizeof(reg.write_index);
diff --git a/tools/testing/selftests/user_events/perf_test.c b/tools/testing/selftests/user_events/perf_test.c
index 26851d51d6bb..81ceaf71e364 100644
--- a/tools/testing/selftests/user_events/perf_test.c
+++ b/tools/testing/selftests/user_events/perf_test.c
@@ -120,8 +120,8 @@ TEST_F(user, perf_write) {
/* Register should work */
ASSERT_EQ(0, ioctl(self->data_fd, DIAG_IOCSREG, &reg));
ASSERT_EQ(0, reg.write_index);
- ASSERT_NE(0, reg.status_index);
- ASSERT_EQ(0, status_page[reg.status_index]);
+ ASSERT_EQ(0, reg.status_index == 0 && reg.status_mask == 1);
+ ASSERT_EQ(0, status_page[reg.status_index] & reg.status_mask);

/* Id should be there */
id = get_id();
@@ -144,7 +144,7 @@ TEST_F(user, perf_write) {
ASSERT_NE(MAP_FAILED, perf_page);

/* Status should be updated */
- ASSERT_EQ(EVENT_STATUS_PERF, status_page[reg.status_index]);
+ ASSERT_NE(0, status_page[reg.status_index] & reg.status_mask);

event.index = reg.write_index;
event.field1 = 0xc001;
--
2.25.1

2022-04-05 01:20:35

by Beau Belgrave

Subject: [PATCH 3/7] tracing/user_events: Use WRITE instead of READ for io vector import

import_single_range expects the direction/rw to be where it came from,
not the protection/limit. Since the import is in a write path use WRITE.

Link: https://lore.kernel.org/all/[email protected]/

Reported-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 075d694d20e3..15edbf6b1e2e 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -1245,7 +1245,8 @@ static ssize_t user_events_write(struct file *file, const char __user *ubuf,
if (unlikely(*ppos != 0))
return -EFAULT;

- if (unlikely(import_single_range(READ, (char *)ubuf, count, &iov, &i)))
+ if (unlikely(import_single_range(WRITE, (char __user *)ubuf,
+ count, &iov, &i)))
return -EFAULT;

return user_events_write_core(file, &i);
--
2.25.1

2022-04-05 02:17:55

by Beau Belgrave

Subject: [PATCH 4/7] tracing/user_events: Ensure user provided strings are safely formatted

User processes can provide bad strings, for example ones containing format
specifiers, that may cause issues or leak kernel details back out. Don't
trust the content of these strings; never use them directly as format
strings when building the strings used for matching.

This also moves to a consistent dynamic-length string creation model.
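
The dynamic-length creation follows the usual two-pass snprintf() pattern:
a first call with a zero length only measures, then the buffer is allocated
and a second call formats into it. A minimal user-space sketch of the idea,
where build_str() is only a stand-in for the helpers added in this patch:

#include <stdio.h>
#include <stdlib.h>

static int build_str(char *buf, int len, const char *type, const char *name)
{
	int pos = 0;

	/* With len == 0 snprintf() only measures, nothing is written */
	pos += snprintf(buf + pos, len ? len - pos : 0, "%s %s", type, name);

	return pos + 1; /* account for the NUL terminator */
}

int main(void)
{
	int len = build_str(NULL, 0, "u32", "count");	/* pass 1: measure */
	char *buf = malloc(len);

	if (buf) {
		build_str(buf, len, "u32", "count");	/* pass 2: format */
		puts(buf);
		free(buf);
	}

	return 0;
}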

Link: https://lore.kernel.org/all/[email protected]/

Reported-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Beau Belgrave <[email protected]>
---
kernel/trace/trace_events_user.c | 91 +++++++++++++++++++++-----------
1 file changed, 59 insertions(+), 32 deletions(-)

diff --git a/kernel/trace/trace_events_user.c b/kernel/trace/trace_events_user.c
index 15edbf6b1e2e..f9bb7d37d76f 100644
--- a/kernel/trace/trace_events_user.c
+++ b/kernel/trace/trace_events_user.c
@@ -45,7 +45,6 @@
#define MAX_EVENT_DESC 512
#define EVENT_NAME(user_event) ((user_event)->tracepoint.name)
#define MAX_FIELD_ARRAY_SIZE 1024
-#define MAX_FIELD_ARG_NAME 256

static char *register_page_data;

@@ -483,6 +482,48 @@ static bool user_field_is_dyn_string(const char *type, const char **str_func)
}

#define LEN_OR_ZERO (len ? len - pos : 0)
+static int user_dyn_field_set_string(int argc, const char **argv, int *iout,
+ char *buf, int len, bool *colon)
+{
+ int pos = 0, i = *iout;
+
+ *colon = false;
+
+ for (; i < argc; ++i) {
+ if (i != *iout)
+ pos += snprintf(buf + pos, LEN_OR_ZERO, " ");
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", argv[i]);
+
+ if (strchr(argv[i], ';')) {
+ ++i;
+ *colon = true;
+ break;
+ }
+ }
+
+ /* Actual set, advance i */
+ if (len != 0)
+ *iout = i;
+
+ return pos + 1;
+}
+
+static int user_field_set_string(struct ftrace_event_field *field,
+ char *buf, int len, bool colon)
+{
+ int pos = 0;
+
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->type);
+ pos += snprintf(buf + pos, LEN_OR_ZERO, " ");
+ pos += snprintf(buf + pos, LEN_OR_ZERO, "%s", field->name);
+
+ if (colon)
+ pos += snprintf(buf + pos, LEN_OR_ZERO, ";");
+
+ return pos + 1;
+}
+
static int user_event_set_print_fmt(struct user_event *user, char *buf, int len)
{
struct ftrace_event_field *field, *next;
@@ -926,49 +967,35 @@ static int user_event_free(struct dyn_event *ev)
static bool user_field_match(struct ftrace_event_field *field, int argc,
const char **argv, int *iout)
{
- char *field_name, *arg_name;
- int len, pos, i = *iout;
+ char *field_name = NULL, *dyn_field_name = NULL;
bool colon = false, match = false;
+ int dyn_len, len;

- if (i >= argc)
+ if (*iout >= argc)
return false;

- len = MAX_FIELD_ARG_NAME;
- field_name = kmalloc(len, GFP_KERNEL);
- arg_name = kmalloc(len, GFP_KERNEL);
+ dyn_len = user_dyn_field_set_string(argc, argv, iout, dyn_field_name,
+ 0, &colon);

- if (!arg_name || !field_name)
- goto out;
-
- pos = 0;
-
- for (; i < argc; ++i) {
- if (i != *iout)
- pos += snprintf(arg_name + pos, len - pos, " ");
+ len = user_field_set_string(field, field_name, 0, colon);

- pos += snprintf(arg_name + pos, len - pos, argv[i]);
-
- if (strchr(argv[i], ';')) {
- ++i;
- colon = true;
- break;
- }
- }
+ if (dyn_len != len)
+ return false;

- pos = 0;
+ dyn_field_name = kmalloc(dyn_len, GFP_KERNEL);
+ field_name = kmalloc(len, GFP_KERNEL);

- pos += snprintf(field_name + pos, len - pos, field->type);
- pos += snprintf(field_name + pos, len - pos, " ");
- pos += snprintf(field_name + pos, len - pos, field->name);
+ if (!dyn_field_name || !field_name)
+ goto out;

- if (colon)
- pos += snprintf(field_name + pos, len - pos, ";");
+ user_dyn_field_set_string(argc, argv, iout, dyn_field_name,
+ dyn_len, &colon);

- *iout = i;
+ user_field_set_string(field, field_name, len, colon);

- match = strcmp(arg_name, field_name) == 0;
+ match = strcmp(dyn_field_name, field_name) == 0;
out:
- kfree(arg_name);
+ kfree(dyn_field_name);
kfree(field_name);

return match;
--
2.25.1

2022-04-21 10:45:41

by Mathieu Desnoyers

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

----- On Apr 20, 2022, at 4:12 PM, Beau Belgrave [email protected] wrote:

> On Wed, Apr 20, 2022 at 01:53:47PM -0400, Mathieu Desnoyers wrote:
>>
>>
>> ----- On Apr 19, 2022, at 7:48 PM, Beau Belgrave [email protected]
>> wrote:
>>
>> > On Tue, Apr 19, 2022 at 05:26:20PM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave [email protected]
>> >> wrote:
>> >>
>> >> > On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
>> >> >> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
>> >> >>
>> >> >> > User processes may require many events and when they do the cache
>> >> >> > performance of a byte index status check is less ideal than a bit index.
>> >> >> > The previous event limit per-page was 4096, the new limit is 32,768.
>> >> >> >
>> >> >> > This change adds a mask property to the user_reg struct. Programs check
>> >> >> > that the byte at status_index has a bit set by ANDing the status_mask.
>> >> >> >
>> >> >> > Link:
>> >> >> > https://lore.kernel.org/all/[email protected]/
>> >> >> >
>> >> >> > Suggested-by: Mathieu Desnoyers <[email protected]>
>> >> >> > Signed-off-by: Beau Belgrave <[email protected]>
>> >> >>
>> >> >> Hi Beau,
>> >> >>
>> >> >> Considering this will be used in a fast-path, why choose bytewise
>> >> >> loads for the byte at status_index and the status_mask ?
>> >> >>
>> >> >
>> >> > First, thanks for the review!
>> >> >
>> >> > Which loads are you concerned about? The user programs can store the
>> >> > index and mask in another type after registration instead of an int.
>> >>
>> >> I'm concerned about the loads from user-space, considering that
>> >> those are on the fast-path.
>> >>
>> >> Indeed user programs will need to copy the status index and mask
>> >> returned in struct user_reg, so adapting the indexing and mask to
>> >> deal with an array of unsigned long rather than bytes can be done
>> >> at that point, but I wonder how many users will go through that
>> >> extra trouble unless there are helpers to convert the status index
>> >> from byte-wise to long-wise, and convert the status mask from a
>> >> byte-wise mask to a long-wise mask (and associated documentation).
>> >>
>> >
>> > Yeah, do you think it's wise to maybe add inline functions in
>> > user_events.h to do this conversion? I could then add them to our
>> > documentation.
>> >
>> > Hopefully this would make more APIs/people do the better approach?
>> >
>> >>
>> >> >
>> >> > However, you may be referring to something on the kernel side?
>> >>
>> >> No.
>> >>
>> >
>> > [..]
>> >
>> >> >> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
>> >> >> binaries and 64-bit on 64-bit binaries) for both the array access
>> >> >> and the status mask, but this brings extra complexity for 32-bit compat
>> >> >> handling.
>> >> >>
>> >> >
>> >> > User programs can store the index and mask returned into better value
>> >> > types for their architecture.
>> >> >
>> >> > I agree it will cause compat handling issues if it's put into the user
>> >> > facing header as a long.
>> >> >
>> >> > I was hoping APIs, like libtracefs, could abstract many callers from how
>> >> > best to use the returned values. For example, it could save the index
>> >> > and mask as unsigned long for the callers and use those for the
>> >> > enablement checks.
>> >> >
>> >> > Do you think there is a way to enable these native types in the ABI
>> >> > without causing compat handling issues? I used ints to prevent compat
>> >> > issues between 32-bit user mode and 64-bit kernel mode.
>> >>
>> >> I think you are right: this is not an ABI issue, but rather a usability
>> >> issue that can be solved by implementing and documenting user-space library
>> >> helpers to help user applications index the array and apply the mask to an
>> >> unsigned long type.
>> >>
>> >
>> > Great. Let me know if updating user_events.h to do the conversion is a
>> > good idea or not, or if you have other thoughts how to make more people
>> > do the best thing.
>>
>> Usually uapi headers are reserved for exposing the kernel ABI to user-space.
>> I think the helpers we are discussing here do not belong to the uapi because
>> they
>> do not define the ABI, and should probably sit elsewhere in a proper library.
>>
>
> Makes sense.
>
> That likely means I should remove the enablement helper check from
> user_events.h, right?

Yes, I would be tempted to remove it, and document the bitwise index ABI
instead.

>
>> If the status_mask is meant to be modified in some ways by user-space before it
>> can
>> be used as a mask, I wonder why it is exposed as a byte-wise mask at all ?
>>
>> Rather than exposing a byte-wise index and single-byte mask as ABI, the kernel
>> could
>> simply expose a bit-wise index, which can then be used by the application to
>> calculate
>> index and mask, which it should interpret in little endian if it wants to apply
>> the
>> mask on types larger than a single byte.
>>
>> Thoughts ?
>>
>
> Yeah, you're right, we can just expose out the bit-wise index at the
> ABI.
>
> I'll switch over to that model in the next version.

Allright !

Thanks,

Mathieu

>
> Thanks,
> -Beau
>
>> Thanks,
>>
>> Mathieu
>>
>
> [..]

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2022-04-21 21:23:06

by Beau Belgrave

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

On Wed, Apr 20, 2022 at 01:53:47PM -0400, Mathieu Desnoyers wrote:
>
>
> ----- On Apr 19, 2022, at 7:48 PM, Beau Belgrave [email protected] wrote:
>
> > On Tue, Apr 19, 2022 at 05:26:20PM -0400, Mathieu Desnoyers wrote:
> >> ----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave [email protected]
> >> wrote:
> >>
> >> > On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
> >> >> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
> >> >>
> >> >> > User processes may require many events and when they do the cache
> >> >> > performance of a byte index status check is less ideal than a bit index.
> >> >> > The previous event limit per-page was 4096, the new limit is 32,768.
> >> >> >
> >> >> > This change adds a mask property to the user_reg struct. Programs check
> >> >> > that the byte at status_index has a bit set by ANDing the status_mask.
> >> >> >
> >> >> > Link:
> >> >> > https://lore.kernel.org/all/[email protected]/
> >> >> >
> >> >> > Suggested-by: Mathieu Desnoyers <[email protected]>
> >> >> > Signed-off-by: Beau Belgrave <[email protected]>
> >> >>
> >> >> Hi Beau,
> >> >>
> >> >> Considering this will be used in a fast-path, why choose bytewise
> >> >> loads for the byte at status_index and the status_mask ?
> >> >>
> >> >
> >> > First, thanks for the review!
> >> >
> >> > Which loads are you concerned about? The user programs can store the
> >> > index and mask in another type after registration instead of an int.
> >>
> >> I'm concerned about the loads from user-space, considering that
> >> those are on the fast-path.
> >>
> >> Indeed user programs will need to copy the status index and mask
> >> returned in struct user_reg, so adapting the indexing and mask to
> >> deal with an array of unsigned long rather than bytes can be done
> >> at that point, but I wonder how many users will go through that
> >> extra trouble unless there are helpers to convert the status index
> >> from byte-wise to long-wise, and convert the status mask from a
> >> byte-wise mask to a long-wise mask (and associated documentation).
> >>
> >
> > Yeah, do you think it's wise to maybe add inline functions in
> > user_events.h to do this conversion? I could then add them to our
> > documentation.
> >
> > Hopefully this would make more APIs/people do the better approach?
> >
> >>
> >> >
> >> > However, you may be referring to something on the kernel side?
> >>
> >> No.
> >>
> >
> > [..]
> >
> >> >> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
> >> >> binaries and 64-bit on 64-bit binaries) for both the array access
> >> >> and the status mask, but this brings extra complexity for 32-bit compat
> >> >> handling.
> >> >>
> >> >
> >> > User programs can store the index and mask returned into better value
> >> > types for their architecture.
> >> >
> >> > I agree it will cause compat handling issues if it's put into the user
> >> > facing header as a long.
> >> >
> >> > I was hoping APIs, like libtracefs, could abstract many callers from how
> >> > best to use the returned values. For example, it could save the index
> >> > and mask as unsigned long for the callers and use those for the
> >> > enablement checks.
> >> >
> >> > Do you think there is a way to enable these native types in the ABI
> >> > without causing compat handling issues? I used ints to prevent compat
> >> > issues between 32-bit user mode and 64-bit kernel mode.
> >>
> >> I think you are right: this is not an ABI issue, but rather a usability
> >> issue that can be solved by implementing and documenting user-space library
> >> helpers to help user applications index the array and apply the mask to an
> >> unsigned long type.
> >>
> >
> > Great. Let me know if updating user_events.h to do the conversion is a
> > good idea or not, or if you have other thoughts how to make more people
> > do the best thing.
>
> Usually uapi headers are reserved for exposing the kernel ABI to user-space.
> I think the helpers we are discussing here do not belong to the uapi because they
> do not define the ABI, and should probably sit elsewhere in a proper library.
>

Makes sense.

That likely means I should remove the enablement helper check from
user_events.h, right?

> If the status_mask is meant to be modified in some ways by user-space before it can
> be used as a mask, I wonder why it is exposed as a byte-wise mask at all ?
>
> Rather than exposing a byte-wise index and single-byte mask as ABI, the kernel could
> simply expose a bit-wise index, which can then be used by the application to calculate
> index and mask, which it should interpret in little endian if it wants to apply the
> mask on types larger than a single byte.
>
> Thoughts ?
>

Yeah, you're right, we can just expose out the bit-wise index at the
ABI.

I'll switch over to that model in the next version.

Thanks,
-Beau

> Thanks,
>
> Mathieu
>

[..]

2022-04-22 10:14:37

by Mathieu Desnoyers

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave [email protected] wrote:

> On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
>> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
>>
>> > User processes may require many events and when they do the cache
>> > performance of a byte index status check is less ideal than a bit index.
>> > The previous event limit per-page was 4096, the new limit is 32,768.
>> >
>> > This change adds a mask property to the user_reg struct. Programs check
>> > that the byte at status_index has a bit set by ANDing the status_mask.
>> >
>> > Link:
>> > https://lore.kernel.org/all/[email protected]/
>> >
>> > Suggested-by: Mathieu Desnoyers <[email protected]>
>> > Signed-off-by: Beau Belgrave <[email protected]>
>>
>> Hi Beau,
>>
>> Considering this will be used in a fast-path, why choose bytewise
>> loads for the byte at status_index and the status_mask ?
>>
>
> First, thanks for the review!
>
> Which loads are you concerned about? The user programs can store the
> index and mask in another type after registration instead of an int.

I'm concerned about the loads from user-space, considering that
those are on the fast-path.

Indeed user programs will need to copy the status index and mask
returned in struct user_reg, so adapting the indexing and mask to
deal with an array of unsigned long rather than bytes can be done
at that point, but I wonder how many users will go through that
extra trouble unless there are helpers to convert the status index
from byte-wise to long-wise, and convert the status mask from a
byte-wise mask to a long-wise mask (and associated documentation).
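
A minimal sketch of what such user-space helpers could look like, assuming a
little-endian status page (illustrative only, not something in this series):

	/*
	 * Convert the byte-wise status_index/status_mask returned in
	 * struct user_reg into a long-wise index and mask.
	 */
	static inline unsigned long status_long_index(unsigned int byte_index)
	{
		return byte_index / sizeof(unsigned long);
	}

	static inline unsigned long status_long_mask(unsigned int byte_index,
						     unsigned int byte_mask)
	{
		return (unsigned long)byte_mask <<
		       (8 * (byte_index % sizeof(unsigned long)));
	}

	/* Fast path, with page viewed as an array of unsigned long: */
	/* enabled = page[status_long_index(idx)] & status_long_mask(idx, mask); */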


>
> However, you may be referring to something on the kernel side?

No.

>
>> I'm concerned about the performance penalty associated with partial
>> register stalls when working with bytewise ALU operations rather than
>> operations using the entire registers.
>>
>
> On the kernel side these only occur when a registration happens (pretty
> rare compared to enabled checks) or a delete (even rarer). But I have
> the feeling you are more concerned about the user side, right?

Right.

>
>> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
>> binaries and 64-bit on 64-bit binaries) for both the array access
>> and the status mask, but this brings extra complexity for 32-bit compat
>> handling.
>>
>
> User programs can store the index and mask returned into better value
> types for their architecture.
>
> I agree it will cause compat handling issues if it's put into the user
> facing header as a long.
>
> I was hoping APIs, like libtracefs, could abstract many callers from how
> best to use the returned values. For example, it could save the index
> and mask as unsigned long for the callers and use those for the
> enablement checks.
>
> Do you think there is a way to enable these native types in the ABI
> without causing compat handling issues? I used ints to prevent compat
> issues between 32-bit user mode and 64-bit kernel mode.

I think you are right: this is not an ABI issue, but rather a usability
issue that can be solved by implementing and documenting user-space library
helpers to help user applications index the array and apply the mask to an
unsigned long type.

Thanks,

Mathieu

>
>> Thanks,
>>
>> Mathieu
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>
> Thanks,
> -Beau

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2022-04-22 17:51:52

by Beau Belgrave

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

On Tue, Apr 19, 2022 at 05:26:20PM -0400, Mathieu Desnoyers wrote:
> ----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave [email protected] wrote:
>
> > On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
> >> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
> >>
> >> > User processes may require many events and when they do the cache
> >> > performance of a byte index status check is less ideal than a bit index.
> >> > The previous event limit per-page was 4096, the new limit is 32,768.
> >> >
> >> > This change adds a mask property to the user_reg struct. Programs check
> >> > that the byte at status_index has a bit set by ANDing the status_mask.
> >> >
> >> > Link:
> >> > https://lore.kernel.org/all/[email protected]/
> >> >
> >> > Suggested-by: Mathieu Desnoyers <[email protected]>
> >> > Signed-off-by: Beau Belgrave <[email protected]>
> >>
> >> Hi Beau,
> >>
> >> Considering this will be used in a fast-path, why choose bytewise
> >> loads for the byte at status_index and the status_mask ?
> >>
> >
> > First, thanks for the review!
> >
> > Which loads are you concerned about? The user programs can store the
> > index and mask in another type after registration instead of an int.
>
> I'm concerned about the loads from user-space, considering that
> those are on the fast-path.
>
> Indeed user programs will need to copy the status index and mask
> returned in struct user_reg, so adapting the indexing and mask to
> deal with an array of unsigned long rather than bytes can be done
> at that point, but I wonder how many users will go through that
> extra trouble unless there are helpers to convert the status index
> from byte-wise to long-wise, and convert the status mask from a
> byte-wise mask to a long-wise mask (and associated documentation).
>

Yeah, do you think it's wise to maybe add inline functions in
user_events.h to do this conversion? I could then add them to our
documentation.

Hopefully this would make more APIs/people do the better approach?

>
> >
> > However, you may be referring to something on the kernel side?
>
> No.
>

[..]

> >> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
> >> binaries and 64-bit on 64-bit binaries) for both the array access
> >> and the status mask, but this brings extra complexity for 32-bit compat
> >> handling.
> >>
> >
> > User programs can store the index and mask returned into better value
> > types for their architecture.
> >
> > I agree it will cause compat handling issues if it's put into the user
> > facing header as a long.
> >
> > I was hoping APIs, like libtracefs, could abstract many callers from how
> > best to use the returned values. For example, it could save the index
> > and mask as unsigned long for the callers and use those for the
> > enablement checks.
> >
> > Do you think there is a way to enable these native types in the ABI
> > without causing compat handling issues? I used ints to prevent compat
> > issues between 32-bit user mode and 64-bit kernel mode.
>
> I think you are right: this is not an ABI issue, but rather a usability
> issue that can be solved by implementing and documenting user-space library
> helpers to help user applications index the array and apply the mask to an
> unsigned long type.
>

Great. Let me know if updating user_events.h to do the conversion is a
good idea or not, or if you have other thoughts how to make more people
do the best thing.

> Thanks,
>
> Mathieu
>
> >
> >> Thanks,
> >>
> >> Mathieu
> >>
> >> --
> >> Mathieu Desnoyers
> >> EfficiOS Inc.
> >> http://www.efficios.com
> >
> > Thanks,
> > -Beau
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Thanks,
-Beau

2022-04-22 21:21:14

by Mathieu Desnoyers

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:

> User processes may require many events and when they do the cache
> performance of a byte index status check is less ideal than a bit index.
> The previous event limit per-page was 4096, the new limit is 32,768.
>
> This change adds a mask property to the user_reg struct. Programs check
> that the byte at status_index has a bit set by ANDing the status_mask.
>
> Link:
> https://lore.kernel.org/all/[email protected]/
>
> Suggested-by: Mathieu Desnoyers <[email protected]>
> Signed-off-by: Beau Belgrave <[email protected]>

Hi Beau,

Considering this will be used in a fast-path, why choose bytewise
loads for the byte at status_index and the status_mask ?

I'm concerned about the performance penalty associated with partial
register stalls when working with bytewise ALU operations rather than
operations using the entire registers.

Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
binaries and 64-bit on 64-bit binaries) for both the array access
and the status mask, but this brings extra complexity for 32-bit compat
handling.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2022-04-22 21:42:20

by Mathieu Desnoyers

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data



----- On Apr 19, 2022, at 7:48 PM, Beau Belgrave [email protected] wrote:

> On Tue, Apr 19, 2022 at 05:26:20PM -0400, Mathieu Desnoyers wrote:
>> ----- On Apr 19, 2022, at 2:57 PM, Beau Belgrave [email protected]
>> wrote:
>>
>> > On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
>> >>
>> >> > User processes may require many events and when they do the cache
>> >> > performance of a byte index status check is less ideal than a bit index.
>> >> > The previous event limit per-page was 4096, the new limit is 32,768.
>> >> >
>> >> > This change adds a mask property to the user_reg struct. Programs check
>> >> > that the byte at status_index has a bit set by ANDing the status_mask.
>> >> >
>> >> > Link:
>> >> > https://lore.kernel.org/all/[email protected]/
>> >> >
>> >> > Suggested-by: Mathieu Desnoyers <[email protected]>
>> >> > Signed-off-by: Beau Belgrave <[email protected]>
>> >>
>> >> Hi Beau,
>> >>
>> >> Considering this will be used in a fast-path, why choose bytewise
>> >> loads for the byte at status_index and the status_mask ?
>> >>
>> >
>> > First, thanks for the review!
>> >
>> > Which loads are you concerned about? The user programs can store the
>> > index and mask in another type after registration instead of an int.
>>
>> I'm concerned about the loads from user-space, considering that
>> those are on the fast-path.
>>
>> Indeed user programs will need to copy the status index and mask
>> returned in struct user_reg, so adapting the indexing and mask to
>> deal with an array of unsigned long rather than bytes can be done
>> at that point, but I wonder how many users will go through that
>> extra trouble unless there are helpers to convert the status index
>> from byte-wise to long-wise, and convert the status mask from a
>> byte-wise mask to a long-wise mask (and associated documentation).
>>
>
> Yeah, do you think it's wise to maybe add inline functions in
> user_events.h to do this conversion? I could then add them to our
> documentation.
>
> Hopefully this would make more APIs/people do the better approach?
>
>>
>> >
>> > However, you may be referring to something on the kernel side?
>>
>> No.
>>
>
> [..]
>
>> >> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
>> >> binaries and 64-bit on 64-bit binaries) for both the array access
>> >> and the status mask, but this brings extra complexity for 32-bit compat
>> >> handling.
>> >>
>> >
>> > User programs can store the index and mask returned into better value
>> > types for their architecture.
>> >
>> > I agree it will cause compat handling issues if it's put into the user
>> > facing header as a long.
>> >
>> > I was hoping APIs, like libtracefs, could abstract many callers from how
>> > best to use the returned values. For example, it could save the index
>> > and mask as unsigned long for the callers and use those for the
>> > enablement checks.
>> >
>> > Do you think there is a way to enable these native types in the ABI
>> > without causing compat handling issues? I used ints to prevent compat
>> > issues between 32-bit user mode and 64-bit kernel mode.
>>
>> I think you are right: this is not an ABI issue, but rather a usability
>> issue that can be solved by implementing and documenting user-space library
>> helpers to help user applications index the array and apply the mask to an
>> unsigned long type.
>>
>
> Great. Let me know if updating user_events.h to do the conversion is a
> good idea or not, or if you have other thoughts how to make more people
> do the best thing.

Usually uapi headers are reserved for exposing the kernel ABI to user-space.
I think the helpers we are discussing here do not belong to the uapi because they
do not define the ABI, and should probably sit elsewhere in a proper library.

If the status_mask is meant to be modified in some ways by user-space before it can
be used as a mask, I wonder why it is exposed as a byte-wise mask at all ?

Rather than exposing a byte-wise index and single-byte mask as ABI, the kernel could
simply expose a bit-wise index, which can then be used by the application to calculate
index and mask, which it should interpret in little endian if it wants to apply the
mask on types larger than a single byte.
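
For example, from a hypothetical bit-wise index (call it status_bit) the
application could derive either form itself, little-endian assumed:

	unsigned int  byte_index = status_bit / 8;
	unsigned char byte_mask  = 1u << (status_bit % 8);

	/* or, for word-sized loads: */
	unsigned long long_index = status_bit / (8 * sizeof(long));
	unsigned long long_mask  = 1UL << (status_bit % (8 * sizeof(long)));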

Thoughts ?

Thanks,

Mathieu

>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> >> Thanks,
>> >>
>> >> Mathieu
>> >>
>> >> --
>> >> Mathieu Desnoyers
>> >> EfficiOS Inc.
>> >> http://www.efficios.com
>> >
>> > Thanks,
>> > -Beau
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>
> Thanks,
> -Beau

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2022-04-22 22:40:25

by Beau Belgrave

Subject: Re: [PATCH 6/7] tracing/user_events: Use bits vs bytes for enabled status page data

On Tue, Apr 19, 2022 at 10:35:45AM -0400, Mathieu Desnoyers wrote:
> ----- On Apr 1, 2022, at 7:43 PM, Beau Belgrave [email protected] wrote:
>
> > User processes may require many events and when they do the cache
> > performance of a byte index status check is less ideal than a bit index.
> > The previous event limit per-page was 4096, the new limit is 32,768.
> >
> > This change adds a mask property to the user_reg struct. Programs check
> > that the byte at status_index has a bit set by ANDing the status_mask.
> >
> > Link:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Suggested-by: Mathieu Desnoyers <[email protected]>
> > Signed-off-by: Beau Belgrave <[email protected]>
>
> Hi Beau,
>
> Considering this will be used in a fast-path, why choose bytewise
> loads for the byte at status_index and the status_mask ?
>

First, thanks for the review!

Which loads are you concerned about? The user programs can store the
index and mask in another type after registration instead of an int.

However, you may be referring to something on the kernel side?

> I'm concerned about the performance penalty associated with partial
> register stalls when working with bytewise ALU operations rather than
> operations using the entire registers.
>

On the kernel side these only occur when a registration happens (pretty
rare compared to enabled checks) or a delete (even rarer). But I have
the feeling you are more concerned about the user side, right?

> Ideally I would be tempted to use "unsigned long" type (32-bit on 32-bit
> binaries and 64-bit on 64-bit binaries) for both the array access
> and the status mask, but this brings extra complexity for 32-bit compat
> handling.
>

User programs can store the index and mask returned into better value
types for their architecture.

I agree it will cause compat handling issues if it's put into the user
facing header as a long.

I was hoping APIs, like libtracefs, could abstract many callers from how
best to use the returned values. For example, it could save the index
and mask as unsigned long for the callers and use those for the
enablement checks.

Do you think there is a way to enable these native types in the ABI
without causing compat handling issues? I used ints to prevent compat
issues between 32-bit user mode and 64-bit kernel mode.

> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Thanks,
-Beau