2024-04-16 14:11:59

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 00/18] Introduce bpf_wq

This is a followup of sleepable bpf_timer[0].

When discussing sleepable bpf_timer, it was thought that we should give
a try to bpf_wq, as the 2 APIs are similar but distinct enough to
justify a new one.

So here it is.

I tried to keep as much as possible common code in kernel/bpf/helpers.c
but I couldn't get away with code duplication in kernel/bpf/verifier.c.

This series introduces a basic bpf_wq support:
- creation is supported
- assignment is supported
- running a simple bpf_wq is also supported.

We will probably need to extend the API further with:
- a full delayed_work API (can be piggy backed on top with a correct
flag)
- bpf_wq_cancel()
- bpf_wq_cancel_sync() (for sleepable programs)
- documentation

But for now, let's focus on what we currently have to see if it's worth
it compared to sleepable bpf_timer.

FWIW, I still have a couple of concerns with this implementation:
- I'm explicitely declaring the async callback as sleepable or not
(BPF_F_WQ_SLEEPABLE) through a flag. Is it really worth it?
Or should I just consider that any wq is running in a sleepable
context?
- bpf_wq_work() access ->prog without protection, but I think this might
be racing with bpf_wq_set_callback(): if we have the following:

CPU 0 CPU 1
bpf_wq_set_callback()
bpf_start()
bpf_wq_work():
prog = cb->prog;

bpf_wq_set_callback()
cb->prog = prog;
bpf_prog_put(prev)
rcu_assign_ptr(cb->callback_fn,
callback_fn);
callback = READ_ONCE(w->cb.callback_fn);

As I understand callback_fn is fine, prog might be, but we clearly
have an inconstency between "prog" and "callback_fn" as they can come
from 2 different bpf_wq_set_callback() calls.

IMO we should protect this by the async->lock, but I'm not sure if
it's OK or not.

---

For reference, the use cases I have in mind:

---

Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out of proximity event, but the device can not
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and eventually re-inject that event in the stack.

2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.

3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.

Currently we can achieve that by keeping a userspace program around,
raise a bpf event, and let that userspace program inject the events and
commands.
However, we are just keeping that program alive as a daemon for just
scheduling commands. There is no logic in it, so it doesn't really justify
an actual userspace wakeup. So a kernel workqueue seems simpler to handle.

bpf_timers are currently running in a soft IRQ context, this patch
series implements a sleppable context for them.

Cheers,
Benjamin

To: Alexei Starovoitov <[email protected]>
To: Daniel Borkmann <[email protected]>
To: Andrii Nakryiko <[email protected]>
To: Martin KaFai Lau <[email protected]>
To: Eduard Zingerman <[email protected]>
To: Song Liu <[email protected]>
To: Yonghong Song <[email protected]>
To: John Fastabend <[email protected]>
To: KP Singh <[email protected]>
To: Stanislav Fomichev <[email protected]>
To: Hao Luo <[email protected]>
To: Jiri Olsa <[email protected]>
To: Mykola Lysenko <[email protected]>
To: Shuah Khan <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>

[0] https://lore.kernel.org/all/[email protected]/

---
Benjamin Tissoires (18):
bpf: trampoline: export __bpf_prog_enter/exit_recur
bpf: make timer data struct more generic
bpf: replace bpf_timer_init with a generic helper
bpf: replace bpf_timer_set_callback with a generic helper
bpf: replace bpf_timer_cancel_and_free with a generic helper
bpf: add support for bpf_wq user type
tools: sync include/uapi/linux/bpf.h
bpf: add support for KF_ARG_PTR_TO_WORKQUEUE
bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps
selftests/bpf: add bpf_wq tests
bpf: wq: add bpf_wq_init
tools: sync include/uapi/linux/bpf.h
selftests/bpf: wq: add bpf_wq_init() checks
bpf/verifier: add is_sleepable argument to push_callback_call
bpf: wq: add bpf_wq_set_callback_impl
selftests/bpf: add checks for bpf_wq_set_callback()
bpf: add bpf_wq_start
selftests/bpf: wq: add bpf_wq_start() checks

include/linux/bpf.h | 17 +-
include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 13 +
kernel/bpf/arraymap.c | 18 +-
kernel/bpf/btf.c | 17 +
kernel/bpf/hashtab.c | 55 ++-
kernel/bpf/helpers.c | 371 ++++++++++++++++-----
kernel/bpf/syscall.c | 16 +-
kernel/bpf/trampoline.c | 6 +-
kernel/bpf/verifier.c | 195 ++++++++++-
tools/include/uapi/linux/bpf.h | 13 +
tools/testing/selftests/bpf/bpf_experimental.h | 7 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/wq.c | 41 +++
tools/testing/selftests/bpf/progs/wq.c | 192 +++++++++++
tools/testing/selftests/bpf/progs/wq_failures.c | 197 +++++++++++
17 files changed, 1052 insertions(+), 113 deletions(-)
---
base-commit: ffa6b26b4d8a0520b78636ca9373ab842cb3b1a8
change-id: 20240411-bpf_wq-fe24e8d24f5e

Best regards,
--
Benjamin Tissoires <[email protected]>



2024-04-16 14:12:53

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 06/18] bpf: add support for bpf_wq user type

Mostly a copy/paste from the bpf_timer API, without the initialization
and free, as they will be done in a separate patch.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
include/linux/bpf.h | 11 ++++++++++-
include/uapi/linux/bpf.h | 4 ++++
kernel/bpf/btf.c | 17 +++++++++++++++++
kernel/bpf/syscall.c | 6 +++++-
kernel/bpf/verifier.c | 9 +++++++++
5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 551445c47779..45cb13dfd15e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -185,7 +185,7 @@ struct bpf_map_ops {

enum {
/* Support at most 10 fields in a BTF type */
- BTF_FIELDS_MAX = 10,
+ BTF_FIELDS_MAX = 11,
};

enum btf_field_type {
@@ -202,6 +202,7 @@ enum btf_field_type {
BPF_GRAPH_NODE = BPF_RB_NODE | BPF_LIST_NODE,
BPF_GRAPH_ROOT = BPF_RB_ROOT | BPF_LIST_HEAD,
BPF_REFCOUNT = (1 << 9),
+ BPF_WORKQUEUE = (1 << 10),
};

typedef void (*btf_dtor_kfunc_t)(void *);
@@ -238,6 +239,7 @@ struct btf_record {
u32 field_mask;
int spin_lock_off;
int timer_off;
+ int wq_off;
int refcount_off;
struct btf_field fields[];
};
@@ -312,6 +314,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
return "bpf_spin_lock";
case BPF_TIMER:
return "bpf_timer";
+ case BPF_WORKQUEUE:
+ return "bpf_wq";
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
return "kptr";
@@ -340,6 +344,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
return sizeof(struct bpf_spin_lock);
case BPF_TIMER:
return sizeof(struct bpf_timer);
+ case BPF_WORKQUEUE:
+ return sizeof(struct bpf_wq);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
@@ -367,6 +373,8 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
return __alignof__(struct bpf_spin_lock);
case BPF_TIMER:
return __alignof__(struct bpf_timer);
+ case BPF_WORKQUEUE:
+ return __alignof__(struct bpf_wq);
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
@@ -406,6 +414,7 @@ static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
case BPF_SPIN_LOCK:
case BPF_TIMER:
+ case BPF_WORKQUEUE:
case BPF_KPTR_UNREF:
case BPF_KPTR_REF:
case BPF_KPTR_PERCPU:
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cee0a7915c08..e4ae83550fb3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7306,6 +7306,10 @@ struct bpf_timer {
__u64 __opaque[2];
} __attribute__((aligned(8)));

+struct bpf_wq {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 90c4a32d89ff..a9b5b28a2630 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3464,6 +3464,15 @@ static int btf_get_field_type(const char *name, u32 field_mask, u32 *seen_mask,
goto end;
}
}
+ if (field_mask & BPF_WORKQUEUE) {
+ if (!strcmp(name, "bpf_wq")) {
+ if (*seen_mask & BPF_WORKQUEUE)
+ return -E2BIG;
+ *seen_mask |= BPF_WORKQUEUE;
+ type = BPF_WORKQUEUE;
+ goto end;
+ }
+ }
field_mask_test_name(BPF_LIST_HEAD, "bpf_list_head");
field_mask_test_name(BPF_LIST_NODE, "bpf_list_node");
field_mask_test_name(BPF_RB_ROOT, "bpf_rb_root");
@@ -3515,6 +3524,7 @@ static int btf_find_struct_field(const struct btf *btf,
switch (field_type) {
case BPF_SPIN_LOCK:
case BPF_TIMER:
+ case BPF_WORKQUEUE:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
@@ -3582,6 +3592,7 @@ static int btf_find_datasec_var(const struct btf *btf, const struct btf_type *t,
switch (field_type) {
case BPF_SPIN_LOCK:
case BPF_TIMER:
+ case BPF_WORKQUEUE:
case BPF_LIST_NODE:
case BPF_RB_NODE:
case BPF_REFCOUNT:
@@ -3816,6 +3827,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type

rec->spin_lock_off = -EINVAL;
rec->timer_off = -EINVAL;
+ rec->wq_off = -EINVAL;
rec->refcount_off = -EINVAL;
for (i = 0; i < cnt; i++) {
field_type_size = btf_field_type_size(info_arr[i].type);
@@ -3846,6 +3858,11 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
/* Cache offset for faster lookup at runtime */
rec->timer_off = rec->fields[i].offset;
break;
+ case BPF_WORKQUEUE:
+ WARN_ON_ONCE(rec->wq_off >= 0);
+ /* Cache offset for faster lookup at runtime */
+ rec->wq_off = rec->fields[i].offset;
+ break;
case BPF_REFCOUNT:
WARN_ON_ONCE(rec->refcount_off >= 0);
/* Cache offset for faster lookup at runtime */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7d392ec83655..0848e4141b00 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -559,6 +559,7 @@ void btf_record_free(struct btf_record *rec)
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
+ case BPF_WORKQUEUE:
/* Nothing to release */
break;
default:
@@ -608,6 +609,7 @@ struct btf_record *btf_record_dup(const struct btf_record *rec)
case BPF_SPIN_LOCK:
case BPF_TIMER:
case BPF_REFCOUNT:
+ case BPF_WORKQUEUE:
/* Nothing to acquire */
break;
default:
@@ -679,6 +681,8 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
case BPF_TIMER:
bpf_timer_cancel_and_free(field_ptr);
break;
+ case BPF_WORKQUEUE:
+ break;
case BPF_KPTR_UNREF:
WRITE_ONCE(*(u64 *)field_ptr, 0);
break;
@@ -1085,7 +1089,7 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,

map->record = btf_parse_fields(btf, value_type,
BPF_SPIN_LOCK | BPF_TIMER | BPF_KPTR | BPF_LIST_HEAD |
- BPF_RB_ROOT | BPF_REFCOUNT,
+ BPF_RB_ROOT | BPF_REFCOUNT | BPF_WORKQUEUE,
map->value_size);
if (!IS_ERR_OR_NULL(map->record)) {
int i;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2aad6d90550f..deaf2e1ab690 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1838,6 +1838,8 @@ static void mark_ptr_not_null_reg(struct bpf_reg_state *reg)
*/
if (btf_record_has_field(map->inner_map_meta->record, BPF_TIMER))
reg->map_uid = reg->id;
+ if (btf_record_has_field(map->inner_map_meta->record, BPF_WORKQUEUE))
+ reg->map_uid = reg->id;
} else if (map->map_type == BPF_MAP_TYPE_XSKMAP) {
reg->type = PTR_TO_XDP_SOCK;
} else if (map->map_type == BPF_MAP_TYPE_SOCKMAP ||
@@ -18155,6 +18157,13 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
}
}

+ if (btf_record_has_field(map->record, BPF_WORKQUEUE)) {
+ if (is_tracing_prog_type(prog_type)) {
+ verbose(env, "tracing progs cannot use bpf_wq yet\n");
+ return -EINVAL;
+ }
+ }
+
if ((bpf_prog_is_offloaded(prog->aux) || bpf_map_is_offloaded(map)) &&
!bpf_offload_prog_map_match(prog, map)) {
verbose(env, "offload device mismatch between prog and map\n");

--
2.44.0


2024-04-16 14:12:58

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 02/18] bpf: make timer data struct more generic

To be able to add workqueues and reuse most of the timer code, we need
to make bpf_hrtimer more generic.

There is no code change except that the new struct gets a new u64 flags
attribute. We are still below 2 cache lines, so this shouldn't impact
the current running codes.

The ordering is also changed. Everything related to async callback
is now on top of bpf_hrtimer.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/helpers.c | 71 ++++++++++++++++++++++++++++------------------------
1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8cde717137bd..5a069c70b5e6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1079,11 +1079,20 @@ const struct bpf_func_proto bpf_snprintf_proto = {
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
};

+struct bpf_async_cb {
+ struct bpf_map *map;
+ struct bpf_prog *prog;
+ void __rcu *callback_fn;
+ void *value;
+ struct rcu_head rcu;
+ u64 flags;
+};
+
/* BPF map elements can contain 'struct bpf_timer'.
* Such map owns all of its BPF timers.
* 'struct bpf_timer' is allocated as part of map element allocation
* and it's zero initialized.
- * That space is used to keep 'struct bpf_timer_kern'.
+ * That space is used to keep 'struct bpf_async_kern'.
* bpf_timer_init() allocates 'struct bpf_hrtimer', inits hrtimer, and
* remembers 'struct bpf_map *' pointer it's part of.
* bpf_timer_set_callback() increments prog refcnt and assign bpf callback_fn.
@@ -1096,16 +1105,12 @@ const struct bpf_func_proto bpf_snprintf_proto = {
* freeing the timers when inner map is replaced or deleted by user space.
*/
struct bpf_hrtimer {
+ struct bpf_async_cb cb;
struct hrtimer timer;
- struct bpf_map *map;
- struct bpf_prog *prog;
- void __rcu *callback_fn;
- void *value;
- struct rcu_head rcu;
};

/* the actual struct hidden inside uapi struct bpf_timer */
-struct bpf_timer_kern {
+struct bpf_async_kern {
struct bpf_hrtimer *timer;
/* bpf_spin_lock is used here instead of spinlock_t to make
* sure that it always fits into space reserved by struct bpf_timer
@@ -1119,14 +1124,14 @@ static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
{
struct bpf_hrtimer *t = container_of(hrtimer, struct bpf_hrtimer, timer);
- struct bpf_map *map = t->map;
- void *value = t->value;
+ struct bpf_map *map = t->cb.map;
+ void *value = t->cb.value;
bpf_callback_t callback_fn;
void *key;
u32 idx;

BTF_TYPE_EMIT(struct bpf_timer);
- callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held());
+ callback_fn = rcu_dereference_check(t->cb.callback_fn, rcu_read_lock_bh_held());
if (!callback_fn)
goto out;

@@ -1155,7 +1160,7 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
return HRTIMER_NORESTART;
}

-BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
+BPF_CALL_3(bpf_timer_init, struct bpf_async_kern *, timer, struct bpf_map *, map,
u64, flags)
{
clockid_t clockid = flags & (MAX_CLOCKS - 1);
@@ -1163,8 +1168,8 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
int ret = 0;

BUILD_BUG_ON(MAX_CLOCKS != 16);
- BUILD_BUG_ON(sizeof(struct bpf_timer_kern) > sizeof(struct bpf_timer));
- BUILD_BUG_ON(__alignof__(struct bpf_timer_kern) != __alignof__(struct bpf_timer));
+ BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_timer));
+ BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_timer));

if (in_nmi())
return -EOPNOTSUPP;
@@ -1187,10 +1192,10 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
ret = -ENOMEM;
goto out;
}
- t->value = (void *)timer - map->record->timer_off;
- t->map = map;
- t->prog = NULL;
- rcu_assign_pointer(t->callback_fn, NULL);
+ t->cb.value = (void *)timer - map->record->timer_off;
+ t->cb.map = map;
+ t->cb.prog = NULL;
+ rcu_assign_pointer(t->cb.callback_fn, NULL);
hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
t->timer.function = bpf_timer_cb;
WRITE_ONCE(timer->timer, t);
@@ -1222,7 +1227,7 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
+BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
struct bpf_prog_aux *, aux)
{
struct bpf_prog *prev, *prog = aux->prog;
@@ -1237,7 +1242,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EINVAL;
goto out;
}
- if (!atomic64_read(&t->map->usercnt)) {
+ if (!atomic64_read(&t->cb.map->usercnt)) {
/* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be
* running even when bpf prog is detached and user space
@@ -1246,7 +1251,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EPERM;
goto out;
}
- prev = t->prog;
+ prev = t->cb.prog;
if (prev != prog) {
/* Bump prog refcnt once. Every bpf_timer_set_callback()
* can pick different callback_fn-s within the same prog.
@@ -1259,9 +1264,9 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
if (prev)
/* Drop prev prog refcnt when swapping with new prog */
bpf_prog_put(prev);
- t->prog = prog;
+ t->cb.prog = prog;
}
- rcu_assign_pointer(t->callback_fn, callback_fn);
+ rcu_assign_pointer(t->cb.callback_fn, callback_fn);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
@@ -1275,7 +1280,7 @@ static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.arg2_type = ARG_PTR_TO_FUNC,
};

-BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, flags)
+BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, flags)
{
struct bpf_hrtimer *t;
int ret = 0;
@@ -1287,7 +1292,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
- if (!t || !t->prog) {
+ if (!t || !t->cb.prog) {
ret = -EINVAL;
goto out;
}
@@ -1315,18 +1320,18 @@ static const struct bpf_func_proto bpf_timer_start_proto = {
.arg3_type = ARG_ANYTHING,
};

-static void drop_prog_refcnt(struct bpf_hrtimer *t)
+static void drop_prog_refcnt(struct bpf_async_cb *async)
{
- struct bpf_prog *prog = t->prog;
+ struct bpf_prog *prog = async->prog;

if (prog) {
bpf_prog_put(prog);
- t->prog = NULL;
- rcu_assign_pointer(t->callback_fn, NULL);
+ async->prog = NULL;
+ rcu_assign_pointer(async->callback_fn, NULL);
}
}

-BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
+BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
{
struct bpf_hrtimer *t;
int ret = 0;
@@ -1348,7 +1353,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
ret = -EDEADLK;
goto out;
}
- drop_prog_refcnt(t);
+ drop_prog_refcnt(&t->cb);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
/* Cancel the timer and wait for associated callback to finish
@@ -1371,7 +1376,7 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
*/
void bpf_timer_cancel_and_free(void *val)
{
- struct bpf_timer_kern *timer = val;
+ struct bpf_async_kern *timer = val;
struct bpf_hrtimer *t;

/* Performance optimization: read timer->timer without lock first. */
@@ -1383,7 +1388,7 @@ void bpf_timer_cancel_and_free(void *val)
t = timer->timer;
if (!t)
goto out;
- drop_prog_refcnt(t);
+ drop_prog_refcnt(&t->cb);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
@@ -1410,7 +1415,7 @@ void bpf_timer_cancel_and_free(void *val)
*/
if (this_cpu_read(hrtimer_running) != t)
hrtimer_cancel(&t->timer);
- kfree_rcu(t, rcu);
+ kfree_rcu(t, cb.rcu);
}

BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)

--
2.44.0


2024-04-16 14:13:14

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 07/18] tools: sync include/uapi/linux/bpf.h

cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/include/uapi/linux/bpf.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index cee0a7915c08..e4ae83550fb3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7306,6 +7306,10 @@ struct bpf_timer {
__u64 __opaque[2];
} __attribute__((aligned(8)));

+struct bpf_wq {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
struct bpf_dynptr {
__u64 __opaque[2];
} __attribute__((aligned(8)));

--
2.44.0


2024-04-16 14:13:26

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 04/18] bpf: replace bpf_timer_set_callback with a generic helper

In the same way we have a generic __bpf_async_init(), we also need
to share code between timer and workqueue for the set_callback call.

We just add an unused flags parameter, as it will be used for workqueues.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/helpers.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1e6d1011303b..d0a645b09d3d 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1262,22 +1262,23 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
- struct bpf_prog_aux *, aux)
+static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback_fn,
+ struct bpf_prog_aux *aux, unsigned int flags,
+ enum bpf_async_type type)
{
struct bpf_prog *prev, *prog = aux->prog;
- struct bpf_hrtimer *t;
+ struct bpf_async_cb *cb;
int ret = 0;

if (in_nmi())
return -EOPNOTSUPP;
- __bpf_spin_lock_irqsave(&timer->lock);
- t = timer->timer;
- if (!t) {
+ __bpf_spin_lock_irqsave(&async->lock);
+ cb = async->cb;
+ if (!cb) {
ret = -EINVAL;
goto out;
}
- if (!atomic64_read(&t->cb.map->usercnt)) {
+ if (!atomic64_read(&cb->map->usercnt)) {
/* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be
* running even when bpf prog is detached and user space
@@ -1286,7 +1287,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callb
ret = -EPERM;
goto out;
}
- prev = t->cb.prog;
+ prev = cb->prog;
if (prev != prog) {
/* Bump prog refcnt once. Every bpf_timer_set_callback()
* can pick different callback_fn-s within the same prog.
@@ -1299,14 +1300,20 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callb
if (prev)
/* Drop prev prog refcnt when swapping with new prog */
bpf_prog_put(prev);
- t->cb.prog = prog;
+ cb->prog = prog;
}
- rcu_assign_pointer(t->cb.callback_fn, callback_fn);
+ rcu_assign_pointer(cb->callback_fn, callback_fn);
out:
- __bpf_spin_unlock_irqrestore(&timer->lock);
+ __bpf_spin_unlock_irqrestore(&async->lock);
return ret;
}

+BPF_CALL_3(bpf_timer_set_callback, struct bpf_async_kern *, timer, void *, callback_fn,
+ struct bpf_prog_aux *, aux)
+{
+ return __bpf_async_set_callback(timer, callback_fn, aux, 0, BPF_ASYNC_TYPE_TIMER);
+}
+
static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.func = bpf_timer_set_callback,
.gpl_only = true,

--
2.44.0


2024-04-16 14:14:49

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 13/18] selftests/bpf: wq: add bpf_wq_init() checks

Allows to test if allocation/free works

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/testing/selftests/bpf/bpf_experimental.h | 1 +
tools/testing/selftests/bpf/prog_tests/wq.c | 8 +++
tools/testing/selftests/bpf/progs/wq.c | 10 +++
tools/testing/selftests/bpf/progs/wq_failures.c | 82 +++++++++++++++++++++++++
4 files changed, 101 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 3329ea080865..6af6f2a43e6d 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -470,4 +470,5 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;

+extern int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags) __weak __ksym;
#endif
diff --git a/tools/testing/selftests/bpf/prog_tests/wq.c b/tools/testing/selftests/bpf/prog_tests/wq.c
index 9a07b8bc2c52..26ab69796103 100644
--- a/tools/testing/selftests/bpf/prog_tests/wq.c
+++ b/tools/testing/selftests/bpf/prog_tests/wq.c
@@ -2,6 +2,7 @@
/* Copyright (c) 2024 Benjamin Tissoires */
#include <test_progs.h>
#include "wq.skel.h"
+#include "wq_failures.skel.h"

void serial_test_wq(void)
{
@@ -9,3 +10,10 @@ void serial_test_wq(void)

RUN_TESTS(wq);
}
+
+void serial_test_failures_wq(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ RUN_TESTS(wq_failures);
+}
diff --git a/tools/testing/selftests/bpf/progs/wq.c b/tools/testing/selftests/bpf/progs/wq.c
index 8c668ad04059..f92466eb8fb1 100644
--- a/tools/testing/selftests/bpf/progs/wq.c
+++ b/tools/testing/selftests/bpf/progs/wq.c
@@ -52,6 +52,7 @@ struct {
static int test_elem_callback(void *map, int *key)
{
struct elem init = {}, *val;
+ struct bpf_wq *wq;

if (map == &lru &&
bpf_map_update_elem(map, key, &init, 0))
@@ -61,12 +62,17 @@ static int test_elem_callback(void *map, int *key)
if (!val)
return -2;

+ wq = &val->w;
+ if (bpf_wq_init(wq, map, 0) != 0)
+ return -3;
+
return 0;
}

static int test_hmap_elem_callback(void *map, int *key)
{
struct hmap_elem init = {}, *val;
+ struct bpf_wq *wq;

if (bpf_map_update_elem(map, key, &init, 0))
return -1;
@@ -75,6 +81,10 @@ static int test_hmap_elem_callback(void *map, int *key)
if (!val)
return -2;

+ wq = &val->work;
+ if (bpf_wq_init(wq, map, 0) != 0)
+ return -3;
+
return 0;
}

diff --git a/tools/testing/selftests/bpf/progs/wq_failures.c b/tools/testing/selftests/bpf/progs/wq_failures.c
new file mode 100644
index 000000000000..a08b6c93a195
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/wq_failures.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Benjamin Tissoires
+ */
+
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct elem {
+ struct bpf_wq w;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 2);
+ __type(key, int);
+ __type(value, struct elem);
+} array SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_LRU_HASH);
+ __uint(max_entries, 4);
+ __type(key, int);
+ __type(value, struct elem);
+} lru SEC(".maps");
+
+SEC("tc")
+/* test that bpf_wq_init takes a map as a second argument
+ */
+__log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+__failure
+/* check that code path marks r1 as precise */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_wq_init#") /* anchor message */
+__msg("workqueue pointer in R1 map_uid=0 doesn't match map pointer in R2 map_uid=0")
+long test_wq_init_nomap(void *ctx)
+{
+ struct bpf_wq *wq;
+ struct elem *val;
+ int key = 0;
+
+ val = bpf_map_lookup_elem(&array, &key);
+ if (!val)
+ return -1;
+
+ wq = &val->w;
+ if (bpf_wq_init(wq, &key, 0) != 0)
+ return -3;
+
+ return 0;
+}
+
+SEC("tc")
+/* test that the workqueue is part of the map in bpf_wq_init
+ */
+__log_level(2)
+__flag(BPF_F_TEST_STATE_FREQ)
+__failure
+/* check that code path marks r1 as precise */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_wq_init#") /* anchor message */
+__msg("workqueue pointer in R1 map_uid=0 doesn't match map pointer in R2 map_uid=0")
+long test_wq_init_wrong_map(void *ctx)
+{
+ struct bpf_wq *wq;
+ struct elem *val;
+ int key = 0;
+
+ val = bpf_map_lookup_elem(&array, &key);
+ if (!val)
+ return -1;
+
+ wq = &val->w;
+ if (bpf_wq_init(wq, &lru, 0) != 0)
+ return -3;
+
+ return 0;
+}

--
2.44.0


2024-04-16 14:15:39

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 15/18] bpf: wq: add bpf_wq_set_callback_impl

In the verifier, we need to teach it that the mentioned callback is
is an async call.
We look at the provided flags and set the callback to be sleepable if
BPF_F_WQ_SLEEPABLE is set.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/helpers.c | 22 +++++++++++++++++
kernel/bpf/verifier.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9ac1b8bb3a01..e5c8adc44619 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1362,6 +1362,13 @@ static int __bpf_async_set_callback(struct bpf_async_kern *async, void *callback
ret = -EINVAL;
goto out;
}
+ if (type == BPF_ASYNC_TYPE_WQ) {
+ /* check that flags_k is consistent with work->flags */
+ if ((flags ^ cb->flags) & BPF_F_WQ_SLEEPABLE) {
+ ret = -EINVAL;
+ goto out;
+ }
+ }
if (!atomic64_read(&cb->map->usercnt)) {
/* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be
@@ -2721,6 +2728,20 @@ __bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
}

+__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
+ int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
+ unsigned int flags__k,
+ void *aux__ign)
+{
+ struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__ign;
+ struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
+
+ if (flags__k & ~BPF_F_WQ_SLEEPABLE)
+ return -EINVAL;
+
+ return __bpf_async_set_callback(async, callback_fn, aux, flags__k, BPF_ASYNC_TYPE_WQ);
+}
+
__bpf_kfunc_end_defs();

BTF_KFUNCS_START(generic_btf_ids)
@@ -2799,6 +2820,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
BTF_ID_FLAGS(func, bpf_wq_init)
+BTF_ID_FLAGS(func, bpf_wq_set_callback_impl)
BTF_KFUNCS_END(common_btf_ids)

static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6a45d88244c6..947cd544ffab 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -501,8 +501,12 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
}

static bool is_sync_callback_calling_kfunc(u32 btf_id);
+static bool is_async_callback_calling_kfunc(u32 btf_id);
+static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn);

+static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id);
+
static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_for_each_map_elem ||
@@ -530,7 +534,8 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)

static bool is_async_callback_calling_insn(struct bpf_insn *insn)
{
- return bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm);
+ return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm)) ||
+ (bpf_pseudo_kfunc_call(insn) && is_async_callback_calling_kfunc(insn->imm));
}

static bool is_may_goto_insn(struct bpf_insn *insn)
@@ -9536,7 +9541,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
*/
env->subprog_info[subprog].is_cb = true;
if (bpf_pseudo_kfunc_call(insn) &&
- !is_sync_callback_calling_kfunc(insn->imm)) {
+ !is_callback_calling_kfunc(insn->imm)) {
verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
func_id_name(insn->imm), insn->imm);
return -EFAULT;
@@ -9550,7 +9555,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
if (is_async_callback_calling_insn(insn)) {
struct bpf_verifier_state *async_cb;

- /* there is no real recursion here. timer callbacks are async */
+ /* there is no real recursion here. timer and workqueue callbacks are async */
env->subprog_info[subprog].is_async_cb = true;
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog, is_sleepable);
@@ -11042,6 +11047,7 @@ enum special_kfunc_type {
KF_bpf_throw,
KF_bpf_iter_css_task_new,
KF_bpf_wq_init,
+ KF_bpf_wq_set_callback_impl,
};

BTF_SET_START(special_kfunc_set)
@@ -11069,6 +11075,7 @@ BTF_ID(func, bpf_throw)
BTF_ID(func, bpf_iter_css_task_new)
#endif
BTF_ID(func, bpf_wq_init)
+BTF_ID(func, bpf_wq_set_callback_impl)
BTF_SET_END(special_kfunc_set)

BTF_ID_LIST(special_kfunc_list)
@@ -11100,6 +11107,7 @@ BTF_ID(func, bpf_iter_css_task_new)
BTF_ID_UNUSED
#endif
BTF_ID(func, bpf_wq_init)
+BTF_ID(func, bpf_wq_set_callback_impl)

static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -11431,12 +11439,28 @@ static bool is_sync_callback_calling_kfunc(u32 btf_id)
return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
}

+static bool is_async_callback_calling_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
+}
+
static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
{
return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
insn->imm == special_kfunc_list[KF_bpf_throw];
}

+static bool is_bpf_wq_set_callback_impl_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_wq_set_callback_impl];
+}
+
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+ return is_sync_callback_calling_kfunc(btf_id) ||
+ is_async_callback_calling_kfunc(btf_id);
+}
+
static bool is_rbtree_lock_required_kfunc(u32 btf_id)
{
return is_bpf_rbtree_api_kfunc(btf_id);
@@ -12244,6 +12268,22 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}

+ if (is_bpf_wq_set_callback_impl_kfunc(meta.func_id)) {
+ if (!meta.arg_constant.found) {
+ verbose(env, "kfunc %s#%d failed flags verification\n",
+ func_name, meta.func_id);
+ return -EINVAL;
+ }
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ !!(meta.arg_constant.value & BPF_F_WQ_SLEEPABLE),
+ set_timer_callback_state);
+ if (err) {
+ verbose(env, "kfunc %s#%d failed callback verification\n",
+ func_name, meta.func_id);
+ return err;
+ }
+ }
+
rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);

@@ -19681,6 +19721,28 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1;
+ } else if (is_bpf_wq_set_callback_impl_kfunc(desc->func_id)) {
+ /* The verifier will process callback_fn as many times as necessary
+ * with different maps and the register states prepared by
+ * set_timer_callback_state will be accurate.
+ *
+ * The following use case is valid:
+ * map1 is shared by prog1, prog2, prog3.
+ * prog1 calls bpf_timer_init for some map1 elements
+ * prog2 calls bpf_timer_set_callback for some map1 elements.
+ * Those that were not bpf_timer_init-ed will return -EINVAL.
+ * prog3 calls bpf_timer_start for some map1 elements.
+ * Those that were not both bpf_timer_init-ed and
+ * bpf_timer_set_callback-ed will return -EINVAL.
+ */
+ struct bpf_insn ld_addrs[2] = {
+ BPF_LD_IMM64(BPF_REG_4, (long)env->prog->aux),
+ };
+
+ insn_buf[0] = ld_addrs[0];
+ insn_buf[1] = ld_addrs[1];
+ insn_buf[2] = *insn;
+ *cnt = 3;
}
return 0;
}

--
2.44.0


2024-04-16 14:16:05

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 16/18] selftests/bpf: add checks for bpf_wq_set_callback()

We assign the callback and set everything up.
The actual tests of these callbacks will be done when bpf_wq_start() is
available.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/progs/wq.c | 47 +++++++--
tools/testing/selftests/bpf/progs/wq_failures.c | 115 +++++++++++++++++++++
5 files changed, 165 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 6af6f2a43e6d..272604f9c4a5 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -471,4 +471,9 @@ extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;

extern int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags) __weak __ksym;
+extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
+ int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
+ unsigned int flags__k, void *aux__ign) __ksym;
+#define bpf_wq_set_callback(timer, cb, flags) \
+ bpf_wq_set_callback_impl(timer, cb, flags, NULL)
#endif
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 39ad96a18123..eb2b78552ca2 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -494,6 +494,10 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
return arg;
}

+__bpf_kfunc void bpf_kfunc_call_test_sleepable(void)
+{
+}
+
BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
@@ -520,6 +524,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_sleepable, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids)

static int bpf_testmod_ops_init(struct btf *btf)
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
index 7c664dd61059..ce5cd763561c 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
@@ -96,6 +96,7 @@ void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) __ksym;
void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym;

void bpf_kfunc_call_test_destructive(void) __ksym;
+void bpf_kfunc_call_test_sleepable(void) __ksym;

void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p);
struct prog_test_member *bpf_kfunc_call_memb_acquire(void);
diff --git a/tools/testing/selftests/bpf/progs/wq.c b/tools/testing/selftests/bpf/progs/wq.c
index f92466eb8fb1..c0a094c84834 100644
--- a/tools/testing/selftests/bpf/progs/wq.c
+++ b/tools/testing/selftests/bpf/progs/wq.c
@@ -49,7 +49,9 @@ struct {
__type(value, struct elem);
} lru SEC(".maps");

-static int test_elem_callback(void *map, int *key)
+static int test_elem_callback(void *map, int *key,
+ int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
+ u64 callback_flags)
{
struct elem init = {}, *val;
struct bpf_wq *wq;
@@ -63,13 +65,18 @@ static int test_elem_callback(void *map, int *key)
return -2;

wq = &val->w;
- if (bpf_wq_init(wq, map, 0) != 0)
+ if (bpf_wq_init(wq, map, callback_flags) != 0)
return -3;

+ if (bpf_wq_set_callback(wq, callback_fn, callback_flags))
+ return -4;
+
return 0;
}

-static int test_hmap_elem_callback(void *map, int *key)
+static int test_hmap_elem_callback(void *map, int *key,
+ int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
+ u64 callback_flags)
{
struct hmap_elem init = {}, *val;
struct bpf_wq *wq;
@@ -85,6 +92,28 @@ static int test_hmap_elem_callback(void *map, int *key)
if (bpf_wq_init(wq, map, 0) != 0)
return -3;

+ if (bpf_wq_set_callback(wq, callback_fn, callback_flags))
+ return -4;
+
+ return 0;
+}
+
+__u32 ok;
+__u32 ok_sleepable;
+
+/* callback for non sleepable workqueue */
+static int wq_callback(void *map, int *key, struct bpf_wq *work)
+{
+ bpf_kfunc_common_test();
+ ok |= (1 << *key);
+ return 0;
+}
+
+/* callback for sleepable workqueue */
+static int wq_cb_sleepable(void *map, int *key, struct bpf_wq *work)
+{
+ bpf_kfunc_call_test_sleepable();
+ ok_sleepable |= (1 << *key);
return 0;
}

@@ -96,7 +125,8 @@ long test_call_array_sleepable(void *ctx)
{
int key = 0;

- return test_elem_callback(&array, &key);
+ return test_elem_callback(&array, &key, wq_cb_sleepable,
+ BPF_F_WQ_SLEEPABLE);
}

SEC("syscall")
@@ -107,7 +137,8 @@ long test_syscall_array_sleepable(void *ctx)
{
int key = 1;

- return test_elem_callback(&array, &key);
+ return test_elem_callback(&array, &key, wq_cb_sleepable,
+ BPF_F_WQ_SLEEPABLE);
}

SEC("tc")
@@ -118,7 +149,7 @@ long test_call_hash_sleepable(void *ctx)
{
int key = 2;

- return test_hmap_elem_callback(&hmap, &key);
+ return test_hmap_elem_callback(&hmap, &key, wq_callback, 0);
}

SEC("tc")
@@ -130,7 +161,7 @@ long test_call_hash_malloc_sleepable(void *ctx)
{
int key = 3;

- return test_hmap_elem_callback(&hmap_malloc, &key);
+ return test_hmap_elem_callback(&hmap_malloc, &key, wq_callback, 0);
}

SEC("tc")
@@ -141,5 +172,5 @@ long test_call_lru_sleepable(void *ctx)
{
int key = 4;

- return test_elem_callback(&lru, &key);
+ return test_elem_callback(&lru, &key, wq_callback, 0);
}
diff --git a/tools/testing/selftests/bpf/progs/wq_failures.c b/tools/testing/selftests/bpf/progs/wq_failures.c
index a08b6c93a195..efde14516358 100644
--- a/tools/testing/selftests/bpf/progs/wq_failures.c
+++ b/tools/testing/selftests/bpf/progs/wq_failures.c
@@ -27,6 +27,20 @@ struct {
__type(value, struct elem);
} lru SEC(".maps");

+/* callback for non sleepable workqueue */
+static int wq_callback(void *map, int *key, struct bpf_wq *work)
+{
+ bpf_kfunc_common_test();
+ return 0;
+}
+
+/* callback for sleepable workqueue */
+static int wq_cb_sleepable(void *map, int *key, struct bpf_wq *work)
+{
+ bpf_kfunc_call_test_sleepable();
+ return 0;
+}
+
SEC("tc")
/* test that bpf_wq_init takes a map as a second argument
*/
@@ -80,3 +94,104 @@ long test_wq_init_wrong_map(void *ctx)

return 0;
}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that bpf_wq_set_callback() can not be called with a
+ * sleepable callback without BPF_F_WQ_SLEEPABLE
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_kfunc_call_test_sleepable#") /* anchor message */
+__msg("program must be sleepable to call sleepable kfunc bpf_kfunc_call_test_sleepable")
+long test_call_sleepable_callback_missing_flag(void *ctx)
+{
+ int key = 0;
+ struct bpf_wq *wq;
+
+ wq = bpf_map_lookup_elem(&array, &key);
+ if (wq) {
+ bpf_wq_init(wq, &array, 0);
+ bpf_wq_set_callback(wq, wq_cb_sleepable, 0);
+ }
+
+ return 0;
+}
+
+SEC("tc")
+/* check that calling a non sleepable callback with
+ * bpf_wq_set_callback() while providing BPF_F_WQ_SLEEPABLE
+ * fails.
+ */
+__retval(3)
+long test_call_non_sleepable_callback_wrong_flag(void *ctx)
+{
+ int key = 1;
+ struct bpf_wq *wq;
+
+ wq = bpf_map_lookup_elem(&array, &key);
+ if (!wq)
+ return 1;
+
+ if (bpf_wq_init(wq, &array, 0))
+ return 2;
+
+ if (bpf_wq_set_callback(wq, wq_callback, BPF_F_WQ_SLEEPABLE))
+ return 3;
+
+ return -22;
+}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that the first argument of bpf_wq_set_callback()
+ * is a correct bpf_wq pointer.
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_wq_set_callback_impl#") /* anchor message */
+__msg("arg#0 doesn't point to a map value")
+long test_wrong_wq_pointer(void *ctx)
+{
+ int key = 0;
+ struct bpf_wq *wq;
+
+ wq = bpf_map_lookup_elem(&array, &key);
+ if (!wq)
+ return 1;
+
+ if (bpf_wq_init(wq, &array, 0))
+ return 2;
+
+ if (bpf_wq_set_callback((void *)&wq, wq_callback, 0))
+ return 3;
+
+ return -22;
+}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that the first argument of bpf_wq_set_callback()
+ * is a correct bpf_wq pointer.
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_wq_set_callback_impl#") /* anchor message */
+__msg("off 1 doesn't point to 'struct bpf_wq' that is at 0")
+long test_wrong_wq_pointer_offset(void *ctx)
+{
+ int key = 0;
+ struct bpf_wq *wq;
+
+ wq = bpf_map_lookup_elem(&array, &key);
+ if (!wq)
+ return 1;
+
+ if (bpf_wq_init(wq, &array, 0))
+ return 2;
+
+ if (bpf_wq_set_callback((void *)wq + 1, wq_cb_sleepable, 0))
+ return 3;
+
+ return -22;
+}

--
2.44.0


2024-04-16 14:16:57

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 17/18] bpf: add bpf_wq_start

again, copy/paste from bpf_timer_start().

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/helpers.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index e5c8adc44619..ed5309a37eda 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2728,6 +2728,29 @@ __bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
}

+__bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
+{
+ struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
+ struct bpf_work *w;
+ int ret = 0;
+
+ if (in_nmi())
+ return -EOPNOTSUPP;
+ if (flags)
+ return -EINVAL;
+ __bpf_spin_lock_irqsave(&async->lock);
+ w = async->work;
+ if (!w || !w->cb.prog) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ schedule_work(&w->work);
+out:
+ __bpf_spin_unlock_irqrestore(&async->lock);
+ return ret;
+}
+
__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags__k,
@@ -2821,6 +2844,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_clone)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
BTF_ID_FLAGS(func, bpf_wq_init)
BTF_ID_FLAGS(func, bpf_wq_set_callback_impl)
+BTF_ID_FLAGS(func, bpf_wq_start)
BTF_KFUNCS_END(common_btf_ids)

static const struct btf_kfunc_id_set common_kfunc_set = {

--
2.44.0


2024-04-16 14:17:22

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 18/18] selftests/bpf: wq: add bpf_wq_start() checks

Allows to test if allocation/free works

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/testing/selftests/bpf/bpf_experimental.h | 1 +
tools/testing/selftests/bpf/prog_tests/wq.c | 22 ++++++++++++++++++++++
tools/testing/selftests/bpf/progs/wq.c | 22 +++++++++++++++++++---
3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 272604f9c4a5..19dffa32fc08 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -471,6 +471,7 @@ extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;

extern int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags) __weak __ksym;
+extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
unsigned int flags__k, void *aux__ign) __ksym;
diff --git a/tools/testing/selftests/bpf/prog_tests/wq.c b/tools/testing/selftests/bpf/prog_tests/wq.c
index 26ab69796103..8a4a91d944cc 100644
--- a/tools/testing/selftests/bpf/prog_tests/wq.c
+++ b/tools/testing/selftests/bpf/prog_tests/wq.c
@@ -6,9 +6,31 @@

void serial_test_wq(void)
{
+ struct wq *wq_skel = NULL;
+ int err, prog_fd;
+
LIBBPF_OPTS(bpf_test_run_opts, topts);

RUN_TESTS(wq);
+
+ /* re-run the success test to check if the timer was actually executed */
+
+ wq_skel = wq__open_and_load();
+ if (!ASSERT_OK_PTR(wq_skel, "wq_skel_load"))
+ return;
+
+ err = wq__attach(wq_skel);
+ if (!ASSERT_OK(err, "wq_attach"))
+ return;
+
+ prog_fd = bpf_program__fd(wq_skel->progs.test_syscall_array_sleepable);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ usleep(50); /* 10 usecs should be enough, but give it extra */
+
+ ASSERT_EQ(wq_skel->bss->ok_sleepable, (1 << 1), "ok_sleepable");
}

void serial_test_failures_wq(void)
diff --git a/tools/testing/selftests/bpf/progs/wq.c b/tools/testing/selftests/bpf/progs/wq.c
index c0a094c84834..0cdb9d273e60 100644
--- a/tools/testing/selftests/bpf/progs/wq.c
+++ b/tools/testing/selftests/bpf/progs/wq.c
@@ -49,6 +49,11 @@ struct {
__type(value, struct elem);
} lru SEC(".maps");

+#define CLOCK_MONOTONIC 1
+
+__u32 ok;
+__u32 ok_sleepable;
+
static int test_elem_callback(void *map, int *key,
int (callback_fn)(void *map, int *key, struct bpf_wq *wq),
u64 callback_flags)
@@ -56,6 +61,10 @@ static int test_elem_callback(void *map, int *key,
struct elem init = {}, *val;
struct bpf_wq *wq;

+ if ((ok & (1 << *key) ||
+ (ok_sleepable & (1 << *key))))
+ return -22;
+
if (map == &lru &&
bpf_map_update_elem(map, key, &init, 0))
return -1;
@@ -71,6 +80,9 @@ static int test_elem_callback(void *map, int *key,
if (bpf_wq_set_callback(wq, callback_fn, callback_flags))
return -4;

+ if (bpf_wq_start(wq, 0))
+ return -5;
+
return 0;
}

@@ -81,6 +93,10 @@ static int test_hmap_elem_callback(void *map, int *key,
struct hmap_elem init = {}, *val;
struct bpf_wq *wq;

+ if ((ok & (1 << *key) ||
+ (ok_sleepable & (1 << *key))))
+ return -22;
+
if (bpf_map_update_elem(map, key, &init, 0))
return -1;

@@ -95,12 +111,12 @@ static int test_hmap_elem_callback(void *map, int *key,
if (bpf_wq_set_callback(wq, callback_fn, callback_flags))
return -4;

+ if (bpf_wq_start(wq, 0))
+ return -5;
+
return 0;
}

-__u32 ok;
-__u32 ok_sleepable;
-
/* callback for non sleepable workqueue */
static int wq_callback(void *map, int *key, struct bpf_wq *work)
{

--
2.44.0


2024-04-16 14:17:52

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 11/18] bpf: wq: add bpf_wq_init

We need to teach the verifier about the second argument which is declared
as void * but which is of type KF_ARG_PTR_TO_MAP. We could have dropped
this extra case if we declared the second argument as struct bpf_map *,
but that means users will have to do extra casting to have their program
compile.

We also need to duplicate the timer code for the checking if the map
argument is matching the provided workqueue.

Signed-off-by: Benjamin Tissoires <[email protected]>

---

Note that the timer code when matching for the map is checking for
constant map pointers. I wonder if this needs to be enforced too
(being constant?)
---
include/uapi/linux/bpf.h | 9 ++++
kernel/bpf/helpers.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/bpf/verifier.c | 6 +++
3 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e4ae83550fb3..519f6019d158 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7502,4 +7502,13 @@ struct bpf_iter_num {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+/*
+ * Flags to control bpf_wq_init() and bpf_wq_set_callback() behaviour.
+ * - BPF_F_WQ_SLEEPABLE: the callback needs to run in
+ * a sleepable context
+ */
+enum {
+ BPF_F_WQ_SLEEPABLE = (1ULL << 0),
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9fd12d480b8b..9ac1b8bb3a01 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1109,11 +1109,18 @@ struct bpf_hrtimer {
struct hrtimer timer;
};

-/* the actual struct hidden inside uapi struct bpf_timer */
+struct bpf_work {
+ struct bpf_async_cb cb;
+ struct work_struct work;
+ struct work_struct delete_work;
+};
+
+/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
struct bpf_async_kern {
union {
struct bpf_async_cb *cb;
struct bpf_hrtimer *timer;
+ struct bpf_work *work;
};
/* bpf_spin_lock is used here instead of spinlock_t to make
* sure that it always fits into space reserved by struct bpf_timer
@@ -1124,6 +1131,7 @@ struct bpf_async_kern {

enum bpf_async_type {
BPF_ASYNC_TYPE_TIMER = 0,
+ BPF_ASYNC_TYPE_WQ,
};

static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
@@ -1167,11 +1175,75 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
return HRTIMER_NORESTART;
}

+static void bpf_wq_work(struct work_struct *work)
+{
+ struct bpf_work *w = container_of(work, struct bpf_work, work);
+ struct bpf_tramp_run_ctx __maybe_unused run_ctx;
+ struct bpf_prog *prog = w->cb.prog;
+ unsigned int flags = w->cb.flags;
+ struct bpf_map *map = w->cb.map;
+ bpf_callback_t callback_fn;
+ void *value = w->cb.value;
+ void *key;
+ u32 idx;
+
+ BTF_TYPE_EMIT(struct bpf_wq);
+
+ callback_fn = READ_ONCE(w->cb.callback_fn);
+ if (!callback_fn || !prog)
+ return;
+
+ if (map->map_type == BPF_MAP_TYPE_ARRAY) {
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+ /* compute the key */
+ idx = ((char *)value - array->value) / array->elem_size;
+ key = &idx;
+ } else { /* hash or lru */
+ key = value - round_up(map->key_size, 8);
+ }
+
+ run_ctx.bpf_cookie = 0;
+
+ if (flags & BPF_F_WQ_SLEEPABLE) {
+ if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
+ /* recursion detected */
+ __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx);
+ return;
+ }
+ } else {
+ if (!__bpf_prog_enter_recur(prog, &run_ctx)) {
+ /* recursion detected */
+ __bpf_prog_exit_recur(prog, 0, &run_ctx);
+ return;
+ }
+ }
+
+ callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
+ /* The verifier checked that return value is zero. */
+
+ if (flags & BPF_F_WQ_SLEEPABLE)
+ __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */,
+ &run_ctx);
+ else
+ __bpf_prog_exit_recur(prog, 0, &run_ctx);
+}
+
+static void bpf_wq_delete_work(struct work_struct *work)
+{
+ struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
+
+ cancel_work_sync(&w->work);
+
+ kfree_rcu(w, cb.rcu);
+}
+
static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
enum bpf_async_type type)
{
struct bpf_async_cb *cb;
struct bpf_hrtimer *t;
+ struct bpf_work *w;
clockid_t clockid;
size_t size;
int ret = 0;
@@ -1183,6 +1255,9 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
case BPF_ASYNC_TYPE_TIMER:
size = sizeof(struct bpf_hrtimer);
break;
+ case BPF_ASYNC_TYPE_WQ:
+ size = sizeof(struct bpf_work);
+ break;
default:
return -EINVAL;
}
@@ -1201,13 +1276,22 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
goto out;
}

- if (type == BPF_ASYNC_TYPE_TIMER) {
+ switch (type) {
+ case BPF_ASYNC_TYPE_TIMER:
clockid = flags & (MAX_CLOCKS - 1);
t = (struct bpf_hrtimer *)cb;

hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
t->timer.function = bpf_timer_cb;
cb->value = (void *)async - map->record->timer_off;
+ break;
+ case BPF_ASYNC_TYPE_WQ:
+ w = (struct bpf_work *)cb;
+
+ INIT_WORK(&w->work, bpf_wq_work);
+ INIT_WORK(&w->delete_work, bpf_wq_delete_work);
+ cb->value = (void *)async - map->record->wq_off;
+ break;
}
cb->map = map;
cb->prog = NULL;
@@ -1473,7 +1557,19 @@ void bpf_timer_cancel_and_free(void *val)
*/
void bpf_wq_cancel_and_free(void *val)
{
+ struct bpf_work *work;
+
BTF_TYPE_EMIT(struct bpf_wq);
+
+ work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
+ if (!work)
+ return;
+ /* Trigger cancel of the sleepable work, but *do not* wait for
+ * it to finish if it was running as we might not be in a
+ * sleepable context.
+ * kfree will be called once the work has finished.
+ */
+ schedule_work(&work->delete_work);
}

BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
@@ -2612,6 +2708,19 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}

+__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
+{
+ struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
+
+ BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_wq));
+ BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_wq));
+
+ if (flags & ~BPF_F_WQ_SLEEPABLE)
+ return -EINVAL;
+
+ return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
+}
+
__bpf_kfunc_end_defs();

BTF_KFUNCS_START(generic_btf_ids)
@@ -2689,6 +2798,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone)
BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
+BTF_ID_FLAGS(func, bpf_wq_init)
BTF_KFUNCS_END(common_btf_ids)

static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 112faf2cd7e9..5e8c1e65fe8c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11038,6 +11038,7 @@ enum special_kfunc_type {
KF_bpf_percpu_obj_drop_impl,
KF_bpf_throw,
KF_bpf_iter_css_task_new,
+ KF_bpf_wq_init,
};

BTF_SET_START(special_kfunc_set)
@@ -11064,6 +11065,7 @@ BTF_ID(func, bpf_throw)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#endif
+BTF_ID(func, bpf_wq_init)
BTF_SET_END(special_kfunc_set)

BTF_ID_LIST(special_kfunc_list)
@@ -11094,6 +11096,7 @@ BTF_ID(func, bpf_iter_css_task_new)
#else
BTF_ID_UNUSED
#endif
+BTF_ID(func, bpf_wq_init)

static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -11171,6 +11174,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_wq(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_WORKQUEUE;

+ if (meta->func_id == special_kfunc_list[KF_bpf_wq_init] && argno == 1)
+ return KF_ARG_PTR_TO_MAP;
+
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",

--
2.44.0


2024-04-16 14:18:05

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 12/18] tools: sync include/uapi/linux/bpf.h

cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/include/uapi/linux/bpf.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index e4ae83550fb3..519f6019d158 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -7502,4 +7502,13 @@ struct bpf_iter_num {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+/*
+ * Flags to control bpf_wq_init() and bpf_wq_set_callback() behaviour.
+ * - BPF_F_WQ_SLEEPABLE: the callback needs to run in
+ * a sleepable context
+ */
+enum {
+ BPF_F_WQ_SLEEPABLE = (1ULL << 0),
+};
+
#endif /* _UAPI__LINUX_BPF_H__ */

--
2.44.0


2024-04-16 14:18:31

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 14/18] bpf/verifier: add is_sleepable argument to push_callback_call

To support sleepable async callbacks, we need to tell push_callback_call()
whether the cb is sleepable or not.
Doing so while checking for the kfunc arguments (when we call
push_callback_call()) is simpler than adding a check for a function
while inside push_callback_call().

When a callback is tagged as sleepable, the verifier now knows that it is
the case and can allow a sleepable callback to happen.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c | 26 ++++++++++++++++----------
2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 36d19cd32eb5..9db35530c878 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,7 @@ struct bpf_verifier_state {
* while they are still in use.
*/
bool used_as_loop_entry;
+ bool in_sleepable;

/* first and last insn idx of this verifier state */
u32 first_insn_idx;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5e8c1e65fe8c..6a45d88244c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1429,6 +1429,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
}
dst_state->speculative = src->speculative;
dst_state->active_rcu_lock = src->active_rcu_lock;
+ dst_state->in_sleepable = src->in_sleepable;
dst_state->curframe = src->curframe;
dst_state->active_lock.ptr = src->active_lock.ptr;
dst_state->active_lock.id = src->active_lock.id;
@@ -2404,7 +2405,7 @@ static void init_func_state(struct bpf_verifier_env *env,
/* Similar to push_stack(), but for async callbacks */
static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx,
- int subprog)
+ int subprog, bool is_sleepable)
{
struct bpf_verifier_stack_elem *elem;
struct bpf_func_state *frame;
@@ -2431,6 +2432,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
* Initialize it similar to do_check_common().
*/
elem->st.branches = 1;
+ elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL);
if (!frame)
goto err;
@@ -5278,7 +5280,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,

static bool in_sleepable(struct bpf_verifier_env *env)
{
- return env->prog->sleepable;
+ return env->prog->sleepable ||
+ (env->cur_state && env->cur_state->in_sleepable);
}

/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
@@ -9515,7 +9518,7 @@ static int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog,
}

static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
- int insn_idx, int subprog,
+ int insn_idx, int subprog, bool is_sleepable,
set_callee_state_fn set_callee_state_cb)
{
struct bpf_verifier_state *state = env->cur_state, *callback_state;
@@ -9550,7 +9553,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
/* there is no real recursion here. timer callbacks are async */
env->subprog_info[subprog].is_async_cb = true;
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
- insn_idx, subprog);
+ insn_idx, subprog, is_sleepable);
if (!async_cb)
return -EFAULT;
callee = async_cb->frame[0];
@@ -10389,15 +10392,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
}
break;
case BPF_FUNC_for_each_map_elem:
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_map_elem_callback_state);
break;
case BPF_FUNC_timer_set_callback:
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_timer_callback_state);
break;
case BPF_FUNC_find_vma:
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_find_vma_callback_state);
break;
case BPF_FUNC_snprintf:
@@ -10412,7 +10415,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
if (err)
return err;
if (cur_func(env)->callback_depth < regs[BPF_REG_1].umax_value) {
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_loop_callback_state);
} else {
cur_func(env)->callback_depth = 0;
@@ -10515,7 +10518,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
break;
}
case BPF_FUNC_user_ringbuf_drain:
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_user_ringbuf_callback_state);
break;
}
@@ -12232,7 +12235,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return err;

if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
- err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno, false,
set_rbtree_add_callback_state);
if (err) {
verbose(env, "kfunc %s#%d failed callback verification\n",
@@ -17004,6 +17007,9 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_rcu_lock != cur->active_rcu_lock)
return false;

+ if (old->in_sleepable != cur->in_sleepable)
+ return false;
+
/* for states to be equal callsites have to be the same
* and all frame states need to be equivalent
*/

--
2.44.0


2024-04-16 14:23:18

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 05/18] bpf: replace bpf_timer_cancel_and_free with a generic helper

Same reason than most bpf_timer* functions, we need almost the same for
workqueues.
So extract the generic part out of it so bpf_wq_cancel_and_free can reuse
it.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/helpers.c | 42 +++++++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d0a645b09d3d..78847f444f79 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1413,36 +1413,44 @@ static const struct bpf_func_proto bpf_timer_cancel_proto = {
.arg1_type = ARG_PTR_TO_TIMER,
};

-/* This function is called by map_delete/update_elem for individual element and
- * by ops->map_release_uref when the user space reference to a map reaches zero.
- */
-void bpf_timer_cancel_and_free(void *val)
+static struct bpf_async_cb *__bpf_async_cancel_and_free(struct bpf_async_kern *async)
{
- struct bpf_async_kern *timer = val;
- struct bpf_hrtimer *t;
+ struct bpf_async_cb *cb;

- /* Performance optimization: read timer->timer without lock first. */
- if (!READ_ONCE(timer->timer))
- return;
+ /* Performance optimization: read async->cb without lock first. */
+ if (!READ_ONCE(async->cb))
+ return NULL;

- __bpf_spin_lock_irqsave(&timer->lock);
+ __bpf_spin_lock_irqsave(&async->lock);
/* re-read it under lock */
- t = timer->timer;
- if (!t)
+ cb = async->cb;
+ if (!cb)
goto out;
- drop_prog_refcnt(&t->cb);
+ drop_prog_refcnt(cb);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
*/
- WRITE_ONCE(timer->timer, NULL);
+ WRITE_ONCE(async->cb, NULL);
out:
- __bpf_spin_unlock_irqrestore(&timer->lock);
+ __bpf_spin_unlock_irqrestore(&async->lock);
+ return cb;
+}
+
+/* This function is called by map_delete/update_elem for individual element and
+ * by ops->map_release_uref when the user space reference to a map reaches zero.
+ */
+void bpf_timer_cancel_and_free(void *val)
+{
+ struct bpf_hrtimer *t;
+
+ t = (struct bpf_hrtimer *)__bpf_async_cancel_and_free(val);
+
if (!t)
return;
/* Cancel the timer and wait for callback to complete if it was running.
* If hrtimer_cancel() can be safely called it's safe to call kfree(t)
* right after for both preallocated and non-preallocated maps.
- * The timer->timer = NULL was already done and no code path can
+ * The async->cb = NULL was already done and no code path can
* see address 't' anymore.
*
* Check that bpf_map_delete/update_elem() wasn't called from timer
@@ -1451,7 +1459,7 @@ void bpf_timer_cancel_and_free(void *val)
* return -1). Though callback_fn is still running on this cpu it's
* safe to do kfree(t) because bpf_timer_cb() read everything it needed
* from 't'. The bpf subprog callback_fn won't be able to access 't',
- * since timer->timer = NULL was already done. The timer will be
+ * since async->cb = NULL was already done. The timer will be
* effectively cancelled because bpf_timer_cb() will return
* HRTIMER_NORESTART.
*/

--
2.44.0


2024-04-16 14:24:51

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 08/18] bpf: add support for KF_ARG_PTR_TO_WORKQUEUE

Introduce support for KF_ARG_PTR_TO_WORKQUEUE. The kfuncs will use bpf_wq
as argument and that will be recognized as workqueue argument by verifier.
bpf_wq_kern casting can happen inside kfunc, but using bpf_wq in
argument makes life easier for users who work with non-kern type in BPF
progs.

Duplicate process_timer_func into process_wq_func.
meta argument is only needed to ensure bpf_wq_init's workqueue and map
arguments are coming from the same map (map_uid logic is necessary for
correct inner-map handling), so also amend check_kfunc_args() to
match what helpers functions check is doing.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/verifier.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index deaf2e1ab690..112faf2cd7e9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -332,6 +332,10 @@ struct bpf_kfunc_call_arg_meta {
u8 spi;
u8 frameno;
} iter;
+ struct {
+ struct bpf_map *ptr;
+ int uid;
+ } map;
u64 mem_size;
};

@@ -7598,6 +7602,43 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
return 0;
}

+static int process_wq_func(struct bpf_verifier_env *env, int regno,
+ struct bpf_kfunc_call_arg_meta *meta)
+{
+ struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+ bool is_const = tnum_is_const(reg->var_off);
+ struct bpf_map *map = reg->map_ptr;
+ u64 val = reg->var_off.value;
+
+ if (!is_const) {
+ verbose(env,
+ "R%d doesn't have constant offset. bpf_wq has to be at the constant offset\n",
+ regno);
+ return -EINVAL;
+ }
+ if (!map->btf) {
+ verbose(env, "map '%s' has to have BTF in order to use bpf_wq\n",
+ map->name);
+ return -EINVAL;
+ }
+ if (!btf_record_has_field(map->record, BPF_WORKQUEUE)) {
+ verbose(env, "map '%s' has no valid bpf_wq\n", map->name);
+ return -EINVAL;
+ }
+ if (map->record->wq_off != val + reg->off) {
+ verbose(env, "off %lld doesn't point to 'struct bpf_wq' that is at %d\n",
+ val + reg->off, map->record->wq_off);
+ return -EINVAL;
+ }
+ if (meta->map.ptr) {
+ verbose(env, "verifier bug. Two map pointers in a workqueue helper\n");
+ return -EFAULT;
+ }
+ meta->map.uid = reg->map_uid;
+ meta->map.ptr = map;
+ return 0;
+}
+
static int process_kptr_func(struct bpf_verifier_env *env, int regno,
struct bpf_call_arg_meta *meta)
{
@@ -10843,6 +10884,7 @@ enum {
KF_ARG_LIST_NODE_ID,
KF_ARG_RB_ROOT_ID,
KF_ARG_RB_NODE_ID,
+ KF_ARG_WORKQUEUE_ID,
};

BTF_ID_LIST(kf_arg_btf_ids)
@@ -10851,6 +10893,7 @@ BTF_ID(struct, bpf_list_head)
BTF_ID(struct, bpf_list_node)
BTF_ID(struct, bpf_rb_root)
BTF_ID(struct, bpf_rb_node)
+BTF_ID(struct, bpf_wq)

static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type)
@@ -10894,6 +10937,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
}

+static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
+{
+ return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
+}
+
static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
const struct btf_param *arg)
{
@@ -10963,6 +11011,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR,
KF_ARG_PTR_TO_MAP,
+ KF_ARG_PTR_TO_WORKQUEUE,
};

enum special_kfunc_type {
@@ -11119,6 +11168,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_map(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_MAP;

+ if (is_kfunc_arg_wq(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_WORKQUEUE;
+
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11720,6 +11772,30 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_NULL:
continue;
case KF_ARG_PTR_TO_MAP:
+ if (meta->map.ptr) {
+ /* Use map_uid (which is unique id of inner map) to reject:
+ * inner_map1 = bpf_map_lookup_elem(outer_map, key1)
+ * inner_map2 = bpf_map_lookup_elem(outer_map, key2)
+ * if (inner_map1 && inner_map2) {
+ * timer = bpf_map_lookup_elem(inner_map1);
+ * if (timer)
+ * // mismatch would have been allowed
+ * bpf_timer_init(timer, inner_map2);
+ * }
+ *
+ * Comparing map_ptr is enough to distinguish normal and outer maps.
+ */
+ if (meta->map.ptr != reg->map_ptr ||
+ meta->map.uid != reg->map_uid) {
+ verbose(env,
+ "workqueue pointer in R1 map_uid=%d doesn't match map pointer in R2 map_uid=%d\n",
+ meta->map.uid, reg->map_uid);
+ return -EINVAL;
+ }
+ }
+ meta->map.ptr = reg->map_ptr;
+ meta->map.uid = reg->map_uid;
+ fallthrough;
case KF_ARG_PTR_TO_ALLOC_BTF_ID:
case KF_ARG_PTR_TO_BTF_ID:
if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
@@ -11752,6 +11828,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_CALLBACK:
case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR:
+ case KF_ARG_PTR_TO_WORKQUEUE:
/* Trusted by default */
break;
default:
@@ -12038,6 +12115,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret)
return ret;
break;
+ case KF_ARG_PTR_TO_WORKQUEUE:
+ if (reg->type != PTR_TO_MAP_VALUE) {
+ verbose(env, "arg#%d doesn't point to a map value\n", i);
+ return -EINVAL;
+ }
+ ret = process_wq_func(env, regno, meta);
+ if (ret < 0)
+ return ret;
+ break;
}
}


--
2.44.0


2024-04-16 14:25:14

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 09/18] bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps

Currently bpf_wq_cancel_and_free() is just a placeholder as there is
no memory allocation for bpf_wq just yet.

Again, duplication of the bpf_timer approach

Signed-off-by: Benjamin Tissoires <[email protected]>
---
include/linux/bpf.h | 2 ++
kernel/bpf/arraymap.c | 18 ++++++++++-------
kernel/bpf/hashtab.c | 55 ++++++++++++++++++++++++++++++++++++++++-----------
kernel/bpf/helpers.c | 8 ++++++++
kernel/bpf/syscall.c | 10 ++++++++++
5 files changed, 74 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 45cb13dfd15e..9ea50d9c7a0c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -534,6 +534,7 @@ static inline void zero_map_value(struct bpf_map *map, void *dst)
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
bool lock_src);
void bpf_timer_cancel_and_free(void *timer);
+void bpf_wq_cancel_and_free(void *timer);
void bpf_list_head_free(const struct btf_field *field, void *list_head,
struct bpf_spin_lock *spin_lock);
void bpf_rb_root_free(const struct btf_field *field, void *rb_root,
@@ -2208,6 +2209,7 @@ void bpf_map_free_record(struct bpf_map *map);
struct btf_record *btf_record_dup(const struct btf_record *rec);
bool btf_record_equal(const struct btf_record *rec_a, const struct btf_record *rec_b);
void bpf_obj_free_timer(const struct btf_record *rec, void *obj);
+void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj);
void bpf_obj_free_fields(const struct btf_record *rec, void *obj);
void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 8c1e6d7654bb..580d07b15471 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -428,17 +428,21 @@ static void *array_map_vmalloc_addr(struct bpf_array *array)
return (void *)round_down((unsigned long)array, PAGE_SIZE);
}

-static void array_map_free_timers(struct bpf_map *map)
+static void array_map_free_timers_wq(struct bpf_map *map)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
int i;

- /* We don't reset or free fields other than timer on uref dropping to zero. */
- if (!btf_record_has_field(map->record, BPF_TIMER))
- return;
+ /* We don't reset or free fields other than timer and workqueue
+ * on uref dropping to zero.
+ */
+ if (btf_record_has_field(map->record, BPF_TIMER))
+ for (i = 0; i < array->map.max_entries; i++)
+ bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i));

- for (i = 0; i < array->map.max_entries; i++)
- bpf_obj_free_timer(map->record, array_map_elem_ptr(array, i));
+ if (btf_record_has_field(map->record, BPF_WORKQUEUE))
+ for (i = 0; i < array->map.max_entries; i++)
+ bpf_obj_free_workqueue(map->record, array_map_elem_ptr(array, i));
}

/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@@ -782,7 +786,7 @@ const struct bpf_map_ops array_map_ops = {
.map_alloc = array_map_alloc,
.map_free = array_map_free,
.map_get_next_key = array_map_get_next_key,
- .map_release_uref = array_map_free_timers,
+ .map_release_uref = array_map_free_timers_wq,
.map_lookup_elem = array_map_lookup_elem,
.map_update_elem = array_map_update_elem,
.map_delete_elem = array_map_delete_elem,
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 83a9a74260e9..4f8590067c6a 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -240,6 +240,26 @@ static void htab_free_prealloced_timers(struct bpf_htab *htab)
}
}

+static void htab_free_prealloced_wq(struct bpf_htab *htab)
+{
+ u32 num_entries = htab->map.max_entries;
+ int i;
+
+ if (!btf_record_has_field(htab->map.record, BPF_WORKQUEUE))
+ return;
+ if (htab_has_extra_elems(htab))
+ num_entries += num_possible_cpus();
+
+ for (i = 0; i < num_entries; i++) {
+ struct htab_elem *elem;
+
+ elem = get_htab_elem(htab, i);
+ bpf_obj_free_workqueue(htab->map.record,
+ elem->key + round_up(htab->map.key_size, 8));
+ cond_resched();
+ }
+}
+
static void htab_free_prealloced_fields(struct bpf_htab *htab)
{
u32 num_entries = htab->map.max_entries;
@@ -1495,7 +1515,7 @@ static void delete_all_elements(struct bpf_htab *htab)
migrate_enable();
}

-static void htab_free_malloced_timers(struct bpf_htab *htab)
+static void htab_free_malloced_timers_or_wq(struct bpf_htab *htab, bool is_timer)
{
int i;

@@ -1507,24 +1527,35 @@ static void htab_free_malloced_timers(struct bpf_htab *htab)

hlist_nulls_for_each_entry(l, n, head, hash_node) {
/* We only free timer on uref dropping to zero */
- bpf_obj_free_timer(htab->map.record, l->key + round_up(htab->map.key_size, 8));
+ if (is_timer)
+ bpf_obj_free_timer(htab->map.record,
+ l->key + round_up(htab->map.key_size, 8));
+ else
+ bpf_obj_free_workqueue(htab->map.record,
+ l->key + round_up(htab->map.key_size, 8));
}
cond_resched_rcu();
}
rcu_read_unlock();
}

-static void htab_map_free_timers(struct bpf_map *map)
+static void htab_map_free_timers_and_wq(struct bpf_map *map)
{
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);

- /* We only free timer on uref dropping to zero */
- if (!btf_record_has_field(htab->map.record, BPF_TIMER))
- return;
- if (!htab_is_prealloc(htab))
- htab_free_malloced_timers(htab);
- else
- htab_free_prealloced_timers(htab);
+ /* We only free timer and workqueue on uref dropping to zero */
+ if (btf_record_has_field(htab->map.record, BPF_TIMER)) {
+ if (!htab_is_prealloc(htab))
+ htab_free_malloced_timers_or_wq(htab, true);
+ else
+ htab_free_prealloced_timers(htab);
+ }
+ if (btf_record_has_field(htab->map.record, BPF_WORKQUEUE)) {
+ if (!htab_is_prealloc(htab))
+ htab_free_malloced_timers_or_wq(htab, false);
+ else
+ htab_free_prealloced_wq(htab);
+ }
}

/* Called when map->refcnt goes to zero, either from workqueue or from syscall */
@@ -2260,7 +2291,7 @@ const struct bpf_map_ops htab_map_ops = {
.map_alloc = htab_map_alloc,
.map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key,
- .map_release_uref = htab_map_free_timers,
+ .map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_map_lookup_elem,
.map_lookup_and_delete_elem = htab_map_lookup_and_delete_elem,
.map_update_elem = htab_map_update_elem,
@@ -2281,7 +2312,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
.map_alloc = htab_map_alloc,
.map_free = htab_map_free,
.map_get_next_key = htab_map_get_next_key,
- .map_release_uref = htab_map_free_timers,
+ .map_release_uref = htab_map_free_timers_and_wq,
.map_lookup_elem = htab_lru_map_lookup_elem,
.map_lookup_and_delete_elem = htab_lru_map_lookup_and_delete_elem,
.map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys,
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 78847f444f79..9fd12d480b8b 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1468,6 +1468,14 @@ void bpf_timer_cancel_and_free(void *val)
kfree_rcu(t, cb.rcu);
}

+/* This function is called by map_delete/update_elem for individual element and
+ * by ops->map_release_uref when the user space reference to a map reaches zero.
+ */
+void bpf_wq_cancel_and_free(void *val)
+{
+ BTF_TYPE_EMIT(struct bpf_wq);
+}
+
BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
{
unsigned long *kptr = map_value;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0848e4141b00..c793be6b34b2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -661,6 +661,13 @@ void bpf_obj_free_timer(const struct btf_record *rec, void *obj)
bpf_timer_cancel_and_free(obj + rec->timer_off);
}

+void bpf_obj_free_workqueue(const struct btf_record *rec, void *obj)
+{
+ if (WARN_ON_ONCE(!btf_record_has_field(rec, BPF_WORKQUEUE)))
+ return;
+ bpf_wq_cancel_and_free(obj + rec->wq_off);
+}
+
void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
{
const struct btf_field *fields;
@@ -682,6 +689,7 @@ void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
bpf_timer_cancel_and_free(field_ptr);
break;
case BPF_WORKQUEUE:
+ bpf_wq_cancel_and_free(field_ptr);
break;
case BPF_KPTR_UNREF:
WRITE_ONCE(*(u64 *)field_ptr, 0);
@@ -1119,6 +1127,8 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
}
break;
case BPF_TIMER:
+ fallthrough;
+ case BPF_WORKQUEUE:
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_LRU_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY) {

--
2.44.0


2024-04-16 14:25:36

by Benjamin Tissoires

[permalink] [raw]
Subject: [PATCH bpf-next 10/18] selftests/bpf: add bpf_wq tests

We simply try in all supported map types if we can store/load a bpf_wq.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
tools/testing/selftests/bpf/prog_tests/wq.c | 11 +++
tools/testing/selftests/bpf/progs/wq.c | 135 ++++++++++++++++++++++++++++
2 files changed, 146 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/wq.c b/tools/testing/selftests/bpf/prog_tests/wq.c
new file mode 100644
index 000000000000..9a07b8bc2c52
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/wq.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Benjamin Tissoires */
+#include <test_progs.h>
+#include "wq.skel.h"
+
+void serial_test_wq(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ RUN_TESTS(wq);
+}
diff --git a/tools/testing/selftests/bpf/progs/wq.c b/tools/testing/selftests/bpf/progs/wq.c
new file mode 100644
index 000000000000..8c668ad04059
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/wq.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Benjamin Tissoires
+ */
+
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct hmap_elem {
+ int counter;
+ struct bpf_timer timer; /* unused */
+ struct bpf_spin_lock lock; /* unused */
+ struct bpf_wq work;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1000);
+ __type(key, int);
+ __type(value, struct hmap_elem);
+} hmap SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __uint(max_entries, 1000);
+ __type(key, int);
+ __type(value, struct hmap_elem);
+} hmap_malloc SEC(".maps");
+
+struct elem {
+ struct bpf_wq w;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 2);
+ __type(key, int);
+ __type(value, struct elem);
+} array SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_LRU_HASH);
+ __uint(max_entries, 4);
+ __type(key, int);
+ __type(value, struct elem);
+} lru SEC(".maps");
+
+static int test_elem_callback(void *map, int *key)
+{
+ struct elem init = {}, *val;
+
+ if (map == &lru &&
+ bpf_map_update_elem(map, key, &init, 0))
+ return -1;
+
+ val = bpf_map_lookup_elem(map, key);
+ if (!val)
+ return -2;
+
+ return 0;
+}
+
+static int test_hmap_elem_callback(void *map, int *key)
+{
+ struct hmap_elem init = {}, *val;
+
+ if (bpf_map_update_elem(map, key, &init, 0))
+ return -1;
+
+ val = bpf_map_lookup_elem(map, key);
+ if (!val)
+ return -2;
+
+ return 0;
+}
+
+SEC("tc")
+/* test that workqueues can be used from an array
+ */
+__retval(0)
+long test_call_array_sleepable(void *ctx)
+{
+ int key = 0;
+
+ return test_elem_callback(&array, &key);
+}
+
+SEC("syscall")
+/* Same test than above but from a sleepable context.
+ */
+__retval(0)
+long test_syscall_array_sleepable(void *ctx)
+{
+ int key = 1;
+
+ return test_elem_callback(&array, &key);
+}
+
+SEC("tc")
+/* test that workqueues can be used from a hashmap
+ */
+__retval(0)
+long test_call_hash_sleepable(void *ctx)
+{
+ int key = 2;
+
+ return test_hmap_elem_callback(&hmap, &key);
+}
+
+SEC("tc")
+/* test that workqueues can be used from a hashmap
+ * with NO_PREALLOC.
+ */
+__retval(0)
+long test_call_hash_malloc_sleepable(void *ctx)
+{
+ int key = 3;
+
+ return test_hmap_elem_callback(&hmap_malloc, &key);
+}
+
+SEC("tc")
+/* test that workqueues can be used from a LRU map
+ */
+__retval(0)
+long test_call_lru_sleepable(void *ctx)
+{
+ int key = 4;
+
+ return test_elem_callback(&lru, &key);
+}

--
2.44.0


2024-04-18 03:25:42

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH bpf-next 16/18] selftests/bpf: add checks for bpf_wq_set_callback()

On Tue, Apr 16, 2024 at 7:11 AM Benjamin Tissoires <[email protected]> wrote:
[...]

> +SEC("?tc")
> +__log_level(2)
> +__failure
> +/* check that the first argument of bpf_wq_set_callback()
> + * is a correct bpf_wq pointer.
> + */
> +__msg("mark_precise: frame0: regs=r1 stack= before")

This line and some other "mark_precise" lines are causing issues for
test_progs-no_alu32 in the CI. I can reproduce it in my local tests.

I am not quite sure what is the best fix. Maybe we can just
remove it.

Thanks,
Song

> +__msg(": (85) call bpf_wq_set_callback_impl#") /* anchor message */
> +__msg("off 1 doesn't point to 'struct bpf_wq' that is at 0")
> +long test_wrong_wq_pointer_offset(void *ctx)
> +{
> + int key = 0;
> + struct bpf_wq *wq;
> +
> + wq = bpf_map_lookup_elem(&array, &key);
> + if (!wq)
> + return 1;
> +
> + if (bpf_wq_init(wq, &array, 0))
> + return 2;
> +
> + if (bpf_wq_set_callback((void *)wq + 1, wq_cb_sleepable, 0))
> + return 3;
> +
> + return -22;
> +}
>
> --
> 2.44.0
>

2024-04-18 08:55:55

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH bpf-next 16/18] selftests/bpf: add checks for bpf_wq_set_callback()

On Thu, Apr 18, 2024 at 5:25 AM Song Liu <[email protected]> wrote:
>
> On Tue, Apr 16, 2024 at 7:11 AM Benjamin Tissoires <[email protected]> wrote:
> [...]
>
> > +SEC("?tc")
> > +__log_level(2)
> > +__failure
> > +/* check that the first argument of bpf_wq_set_callback()
> > + * is a correct bpf_wq pointer.
> > + */
> > +__msg("mark_precise: frame0: regs=r1 stack= before")
>
> This line and some other "mark_precise" lines are causing issues for
> test_progs-no_alu32 in the CI. I can reproduce it in my local tests.
>

Indeed. I can also reproduce locally. Here, it only happens for
test_wq_init_nomap() and test_wq_init_wrong_map().
TBH, I'm not sure what "precise" means, I just copied the checks from
timer_failures.c.

>
> I am not quite sure what is the best fix. Maybe we can just
> remove it.

Given that most of the code is shared with timer, but given that we
are working with kfuncs, we are not using the same r0 registers.
So yeah, I would think we could rely on the timer tests for precise,
and drop them here...

Cheers,
Benjamin

>
>
> Thanks,
> Song
>
> > +__msg(": (85) call bpf_wq_set_callback_impl#") /* anchor message */
> > +__msg("off 1 doesn't point to 'struct bpf_wq' that is at 0")
> > +long test_wrong_wq_pointer_offset(void *ctx)
> > +{
> > + int key = 0;
> > + struct bpf_wq *wq;
> > +
> > + wq = bpf_map_lookup_elem(&array, &key);
> > + if (!wq)
> > + return 1;
> > +
> > + if (bpf_wq_init(wq, &array, 0))
> > + return 2;
> > +
> > + if (bpf_wq_set_callback((void *)wq + 1, wq_cb_sleepable, 0))
> > + return 3;
> > +
> > + return -22;
> > +}
> >
> > --
> > 2.44.0
> >
>


2024-04-19 05:26:03

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 11/18] bpf: wq: add bpf_wq_init

On Tue, Apr 16, 2024 at 04:08:24PM +0200, Benjamin Tissoires wrote:
> We need to teach the verifier about the second argument which is declared
> as void * but which is of type KF_ARG_PTR_TO_MAP. We could have dropped
> this extra case if we declared the second argument as struct bpf_map *,
> but that means users will have to do extra casting to have their program
> compile.
>
> We also need to duplicate the timer code for the checking if the map
> argument is matching the provided workqueue.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---
>
> Note that the timer code when matching for the map is checking for
> constant map pointers. I wonder if this needs to be enforced too
> (being constant?)
> ---
> include/uapi/linux/bpf.h | 9 ++++
> kernel/bpf/helpers.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++-
> kernel/bpf/verifier.c | 6 +++
> 3 files changed, 127 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e4ae83550fb3..519f6019d158 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7502,4 +7502,13 @@ struct bpf_iter_num {
> __u64 __opaque[1];
> } __attribute__((aligned(8)));
>
> +/*
> + * Flags to control bpf_wq_init() and bpf_wq_set_callback() behaviour.
> + * - BPF_F_WQ_SLEEPABLE: the callback needs to run in
> + * a sleepable context
> + */
> +enum {
> + BPF_F_WQ_SLEEPABLE = (1ULL << 0),
> +};

Just started looking at the patch set. The first reaction that
this flag is odd. Why add it? wq provides sleepable ctx.
Why would the program ask to be non-sleepable in wq?
If it needs to callback to run in rcu cs, it can use bpf_rcu_read_lock() kfunc
as the first call in such callback and it will be equivalent
to not-passing this BPF_F_WQ_SLEEPABLE flag.
It seem it can be dropped and complexity reduced.
The verifier complications in later patches due to this flag too...
I just don't see the point.

> +
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9fd12d480b8b..9ac1b8bb3a01 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1109,11 +1109,18 @@ struct bpf_hrtimer {
> struct hrtimer timer;
> };
>
> -/* the actual struct hidden inside uapi struct bpf_timer */
> +struct bpf_work {
> + struct bpf_async_cb cb;
> + struct work_struct work;
> + struct work_struct delete_work;
> +};
> +
> +/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
> struct bpf_async_kern {
> union {
> struct bpf_async_cb *cb;
> struct bpf_hrtimer *timer;
> + struct bpf_work *work;
> };
> /* bpf_spin_lock is used here instead of spinlock_t to make
> * sure that it always fits into space reserved by struct bpf_timer
> @@ -1124,6 +1131,7 @@ struct bpf_async_kern {
>
> enum bpf_async_type {
> BPF_ASYNC_TYPE_TIMER = 0,
> + BPF_ASYNC_TYPE_WQ,
> };
>
> static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
> @@ -1167,11 +1175,75 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> return HRTIMER_NORESTART;
> }
>
> +static void bpf_wq_work(struct work_struct *work)
> +{
> + struct bpf_work *w = container_of(work, struct bpf_work, work);
> + struct bpf_tramp_run_ctx __maybe_unused run_ctx;
> + struct bpf_prog *prog = w->cb.prog;
> + unsigned int flags = w->cb.flags;
> + struct bpf_map *map = w->cb.map;
> + bpf_callback_t callback_fn;
> + void *value = w->cb.value;
> + void *key;
> + u32 idx;
> +
> + BTF_TYPE_EMIT(struct bpf_wq);
> +
> + callback_fn = READ_ONCE(w->cb.callback_fn);
> + if (!callback_fn || !prog)
> + return;
> +
> + if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> + struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> + /* compute the key */
> + idx = ((char *)value - array->value) / array->elem_size;
> + key = &idx;
> + } else { /* hash or lru */
> + key = value - round_up(map->key_size, 8);
> + }
> +
> + run_ctx.bpf_cookie = 0;
> +
> + if (flags & BPF_F_WQ_SLEEPABLE) {
> + if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
> + /* recursion detected */
> + __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx);
> + return;
> + }
> + } else {
> + if (!__bpf_prog_enter_recur(prog, &run_ctx)) {
> + /* recursion detected */
> + __bpf_prog_exit_recur(prog, 0, &run_ctx);
> + return;
> + }
> + }
> +
> + callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
> + /* The verifier checked that return value is zero. */
> +
> + if (flags & BPF_F_WQ_SLEEPABLE)
> + __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */,
> + &run_ctx);
> + else
> + __bpf_prog_exit_recur(prog, 0, &run_ctx);
> +}
> +
> +static void bpf_wq_delete_work(struct work_struct *work)
> +{
> + struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
> +
> + cancel_work_sync(&w->work);
> +
> + kfree_rcu(w, cb.rcu);
> +}
> +
> static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
> enum bpf_async_type type)
> {
> struct bpf_async_cb *cb;
> struct bpf_hrtimer *t;
> + struct bpf_work *w;
> clockid_t clockid;
> size_t size;
> int ret = 0;
> @@ -1183,6 +1255,9 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> case BPF_ASYNC_TYPE_TIMER:
> size = sizeof(struct bpf_hrtimer);
> break;
> + case BPF_ASYNC_TYPE_WQ:
> + size = sizeof(struct bpf_work);
> + break;
> default:
> return -EINVAL;
> }
> @@ -1201,13 +1276,22 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> goto out;
> }
>
> - if (type == BPF_ASYNC_TYPE_TIMER) {
> + switch (type) {
> + case BPF_ASYNC_TYPE_TIMER:
> clockid = flags & (MAX_CLOCKS - 1);
> t = (struct bpf_hrtimer *)cb;
>
> hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
> t->timer.function = bpf_timer_cb;
> cb->value = (void *)async - map->record->timer_off;
> + break;
> + case BPF_ASYNC_TYPE_WQ:
> + w = (struct bpf_work *)cb;
> +
> + INIT_WORK(&w->work, bpf_wq_work);
> + INIT_WORK(&w->delete_work, bpf_wq_delete_work);
> + cb->value = (void *)async - map->record->wq_off;
> + break;
> }
> cb->map = map;
> cb->prog = NULL;
> @@ -1473,7 +1557,19 @@ void bpf_timer_cancel_and_free(void *val)
> */
> void bpf_wq_cancel_and_free(void *val)
> {
> + struct bpf_work *work;
> +
> BTF_TYPE_EMIT(struct bpf_wq);
> +
> + work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
> + if (!work)
> + return;
> + /* Trigger cancel of the sleepable work, but *do not* wait for
> + * it to finish if it was running as we might not be in a
> + * sleepable context.
> + * kfree will be called once the work has finished.
> + */
> + schedule_work(&work->delete_work);
> }
>
> BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
> @@ -2612,6 +2708,19 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> WARN(1, "A call to BPF exception callback should never return\n");
> }
>
> +__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
> +{
> + struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
> +
> + BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_wq));
> + BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_wq));
> +
> + if (flags & ~BPF_F_WQ_SLEEPABLE)
> + return -EINVAL;
> +
> + return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
> +}
> +
> __bpf_kfunc_end_defs();
>
> BTF_KFUNCS_START(generic_btf_ids)
> @@ -2689,6 +2798,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> BTF_ID_FLAGS(func, bpf_dynptr_size)
> BTF_ID_FLAGS(func, bpf_dynptr_clone)
> BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
> +BTF_ID_FLAGS(func, bpf_wq_init)
> BTF_KFUNCS_END(common_btf_ids)
>
> static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 112faf2cd7e9..5e8c1e65fe8c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -11038,6 +11038,7 @@ enum special_kfunc_type {
> KF_bpf_percpu_obj_drop_impl,
> KF_bpf_throw,
> KF_bpf_iter_css_task_new,
> + KF_bpf_wq_init,
> };
>
> BTF_SET_START(special_kfunc_set)
> @@ -11064,6 +11065,7 @@ BTF_ID(func, bpf_throw)
> #ifdef CONFIG_CGROUPS
> BTF_ID(func, bpf_iter_css_task_new)
> #endif
> +BTF_ID(func, bpf_wq_init)
> BTF_SET_END(special_kfunc_set)
>
> BTF_ID_LIST(special_kfunc_list)
> @@ -11094,6 +11096,7 @@ BTF_ID(func, bpf_iter_css_task_new)
> #else
> BTF_ID_UNUSED
> #endif
> +BTF_ID(func, bpf_wq_init)
>
> static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> {
> @@ -11171,6 +11174,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> if (is_kfunc_arg_wq(meta->btf, &args[argno]))
> return KF_ARG_PTR_TO_WORKQUEUE;
>
> + if (meta->func_id == special_kfunc_list[KF_bpf_wq_init] && argno == 1)
> + return KF_ARG_PTR_TO_MAP;
> +

Hmm. This function has this bit:
if (is_kfunc_arg_map(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_MAP;

Just do:
+__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map__map, ...

It was specifically added for bpf_arena_alloc_pages to pass pointer to a map.
In case of arena map type it's used as:
struct {
__uint(type, BPF_MAP_TYPE_ARENA);
...
} arena SEC(".maps");

page = bpf_arena_alloc_pages(&arena, ...)
libbpf and the verifier do the right thing.
I think it should work here as well.

2024-04-19 06:01:16

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 08/18] bpf: add support for KF_ARG_PTR_TO_WORKQUEUE

On Tue, Apr 16, 2024 at 04:08:21PM +0200, Benjamin Tissoires wrote:
> Introduce support for KF_ARG_PTR_TO_WORKQUEUE. The kfuncs will use bpf_wq
> as argument and that will be recognized as workqueue argument by verifier.
> bpf_wq_kern casting can happen inside kfunc, but using bpf_wq in
> argument makes life easier for users who work with non-kern type in BPF
> progs.
>
> Duplicate process_timer_func into process_wq_func.
> meta argument is only needed to ensure bpf_wq_init's workqueue and map
> arguments are coming from the same map (map_uid logic is necessary for
> correct inner-map handling), so also amend check_kfunc_args() to
> match what helpers functions check is doing.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> kernel/bpf/verifier.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 86 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index deaf2e1ab690..112faf2cd7e9 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -332,6 +332,10 @@ struct bpf_kfunc_call_arg_meta {
> u8 spi;
> u8 frameno;
> } iter;
> + struct {
> + struct bpf_map *ptr;
> + int uid;
> + } map;
> u64 mem_size;
> };
>
> @@ -7598,6 +7602,43 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
> return 0;
> }
>
> +static int process_wq_func(struct bpf_verifier_env *env, int regno,
> + struct bpf_kfunc_call_arg_meta *meta)
> +{
> + struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> + bool is_const = tnum_is_const(reg->var_off);
> + struct bpf_map *map = reg->map_ptr;
> + u64 val = reg->var_off.value;
> +
> + if (!is_const) {
> + verbose(env,
> + "R%d doesn't have constant offset. bpf_wq has to be at the constant offset\n",
> + regno);
> + return -EINVAL;
> + }

This check is unnecessary.
Before calling into this function it goes as:
switch (kf_arg_type) {
case KF_ARG_PTR_TO_MAP:
... your new checks ...
fallthrough;
case KF_ARG_PTR_TO_CTX:
arg_type |= OBJ_RELEASE;
break;

check_func_arg_reg_off()
if (arg_type_is_release(arg_type)
return __check_ptr_off_reg()
// that does offset checks

> + if (!map->btf) {
> + verbose(env, "map '%s' has to have BTF in order to use bpf_wq\n",
> + map->name);
> + return -EINVAL;
> + }
> + if (!btf_record_has_field(map->record, BPF_WORKQUEUE)) {
> + verbose(env, "map '%s' has no valid bpf_wq\n", map->name);
> + return -EINVAL;
> + }

above two are also unnecessary. wq_off will be negative otherwise.

> + if (map->record->wq_off != val + reg->off) {
> + verbose(env, "off %lld doesn't point to 'struct bpf_wq' that is at %d\n",
> + val + reg->off, map->record->wq_off);
> + return -EINVAL;
> + }

only this one is needed, but I need to study the rest of the patches.

> + if (meta->map.ptr) {
> + verbose(env, "verifier bug. Two map pointers in a workqueue helper\n");
> + return -EFAULT;
> + }

this was checked already as well.

> + meta->map.uid = reg->map_uid;
> + meta->map.ptr = map;
> + return 0;
> +}
> +
> static int process_kptr_func(struct bpf_verifier_env *env, int regno,
> struct bpf_call_arg_meta *meta)
> {
> @@ -10843,6 +10884,7 @@ enum {
> KF_ARG_LIST_NODE_ID,
> KF_ARG_RB_ROOT_ID,
> KF_ARG_RB_NODE_ID,
> + KF_ARG_WORKQUEUE_ID,
> };
>
> BTF_ID_LIST(kf_arg_btf_ids)
> @@ -10851,6 +10893,7 @@ BTF_ID(struct, bpf_list_head)
> BTF_ID(struct, bpf_list_node)
> BTF_ID(struct, bpf_rb_root)
> BTF_ID(struct, bpf_rb_node)
> +BTF_ID(struct, bpf_wq)
>
> static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
> const struct btf_param *arg, int type)
> @@ -10894,6 +10937,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
> return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
> }
>
> +static bool is_kfunc_arg_wq(const struct btf *btf, const struct btf_param *arg)
> +{
> + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_WORKQUEUE_ID);
> +}
> +
> static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
> const struct btf_param *arg)
> {
> @@ -10963,6 +11011,7 @@ enum kfunc_ptr_arg_type {
> KF_ARG_PTR_TO_NULL,
> KF_ARG_PTR_TO_CONST_STR,
> KF_ARG_PTR_TO_MAP,
> + KF_ARG_PTR_TO_WORKQUEUE,
> };
>
> enum special_kfunc_type {
> @@ -11119,6 +11168,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> if (is_kfunc_arg_map(meta->btf, &args[argno]))
> return KF_ARG_PTR_TO_MAP;
>
> + if (is_kfunc_arg_wq(meta->btf, &args[argno]))
> + return KF_ARG_PTR_TO_WORKQUEUE;
> +
> if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> if (!btf_type_is_struct(ref_t)) {
> verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> @@ -11720,6 +11772,30 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> case KF_ARG_PTR_TO_NULL:
> continue;
> case KF_ARG_PTR_TO_MAP:
> + if (meta->map.ptr) {
> + /* Use map_uid (which is unique id of inner map) to reject:
> + * inner_map1 = bpf_map_lookup_elem(outer_map, key1)
> + * inner_map2 = bpf_map_lookup_elem(outer_map, key2)
> + * if (inner_map1 && inner_map2) {
> + * timer = bpf_map_lookup_elem(inner_map1);
> + * if (timer)
> + * // mismatch would have been allowed
> + * bpf_timer_init(timer, inner_map2);

adjust the comment too. Here it can only be wq.

> + * }
> + *
> + * Comparing map_ptr is enough to distinguish normal and outer maps.
> + */
> + if (meta->map.ptr != reg->map_ptr ||
> + meta->map.uid != reg->map_uid) {

let's also gate this check with map->record->wq_off.
Otherwise it will trigger for maps that don't have wq field.

Copy paste is a bit unfortunate, but fine. We can clean it up later.

> + verbose(env,
> + "workqueue pointer in R1 map_uid=%d doesn't match map pointer in R2 map_uid=%d\n",
> + meta->map.uid, reg->map_uid);
> + return -EINVAL;
> + }
> + }
> + meta->map.ptr = reg->map_ptr;
> + meta->map.uid = reg->map_uid;
> + fallthrough;
> case KF_ARG_PTR_TO_ALLOC_BTF_ID:
> case KF_ARG_PTR_TO_BTF_ID:
> if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
> @@ -11752,6 +11828,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> case KF_ARG_PTR_TO_CALLBACK:
> case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
> case KF_ARG_PTR_TO_CONST_STR:
> + case KF_ARG_PTR_TO_WORKQUEUE:
> /* Trusted by default */
> break;
> default:
> @@ -12038,6 +12115,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> if (ret)
> return ret;
> break;
> + case KF_ARG_PTR_TO_WORKQUEUE:
> + if (reg->type != PTR_TO_MAP_VALUE) {
> + verbose(env, "arg#%d doesn't point to a map value\n", i);
> + return -EINVAL;
> + }
> + ret = process_wq_func(env, regno, meta);
> + if (ret < 0)
> + return ret;
> + break;
> }
> }
>
>
> --
> 2.44.0
>

2024-04-19 06:02:43

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 06/18] bpf: add support for bpf_wq user type

On Tue, Apr 16, 2024 at 04:08:19PM +0200, Benjamin Tissoires wrote:
> Mostly a copy/paste from the bpf_timer API, without the initialization
> and free, as they will be done in a separate patch.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>

Patches 2-6 look good.

2024-04-19 06:05:23

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 09/18] bpf: allow struct bpf_wq to be embedded in arraymaps and hashmaps

On Tue, Apr 16, 2024 at 04:08:22PM +0200, Benjamin Tissoires wrote:
> WRITE_ONCE(*(u64 *)field_ptr, 0);
> @@ -1119,6 +1127,8 @@ static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
> }
> break;
> case BPF_TIMER:
> + fallthrough;
> + case BPF_WORKQUEUE:

fallthrough is unnecessary when cases are back to back.

2024-04-19 06:19:03

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 17/18] bpf: add bpf_wq_start

On Tue, Apr 16, 2024 at 04:08:30PM +0200, Benjamin Tissoires wrote:
> again, copy/paste from bpf_timer_start().
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> kernel/bpf/helpers.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index e5c8adc44619..ed5309a37eda 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2728,6 +2728,29 @@ __bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
> return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
> }
>
> +__bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
> +{
> + struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
> + struct bpf_work *w;
> + int ret = 0;
> +
> + if (in_nmi())
> + return -EOPNOTSUPP;
> + if (flags)
> + return -EINVAL;
> + __bpf_spin_lock_irqsave(&async->lock);
> + w = async->work;
> + if (!w || !w->cb.prog) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + schedule_work(&w->work);
> +out:
> + __bpf_spin_unlock_irqrestore(&async->lock);

Looks like you're not adding wq_cancel kfunc in this patch set and
it's probably a good thing not to expose async cancel to bpf users,
since it's a foot gun.
Even when we eventually add wq_cancel_sync kfunc it will not be
removing a callback.
So we can drop spinlock here.
READ_ONCE of w and cb would be enough.
Since they cannot get back to NULL once init-ed and cb is set.

2024-04-19 15:13:08

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH bpf-next 11/18] bpf: wq: add bpf_wq_init

On Apr 18 2024, Alexei Starovoitov wrote:
> On Tue, Apr 16, 2024 at 04:08:24PM +0200, Benjamin Tissoires wrote:
> > We need to teach the verifier about the second argument which is declared
> > as void * but which is of type KF_ARG_PTR_TO_MAP. We could have dropped
> > this extra case if we declared the second argument as struct bpf_map *,
> > but that means users will have to do extra casting to have their program
> > compile.
> >
> > We also need to duplicate the timer code for the checking if the map
> > argument is matching the provided workqueue.
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> >
> > ---
> >
> > Note that the timer code when matching for the map is checking for
> > constant map pointers. I wonder if this needs to be enforced too
> > (being constant?)
> > ---
> > include/uapi/linux/bpf.h | 9 ++++
> > kernel/bpf/helpers.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++-
> > kernel/bpf/verifier.c | 6 +++
> > 3 files changed, 127 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index e4ae83550fb3..519f6019d158 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -7502,4 +7502,13 @@ struct bpf_iter_num {
> > __u64 __opaque[1];
> > } __attribute__((aligned(8)));
> >
> > +/*
> > + * Flags to control bpf_wq_init() and bpf_wq_set_callback() behaviour.
> > + * - BPF_F_WQ_SLEEPABLE: the callback needs to run in
> > + * a sleepable context
> > + */
> > +enum {
> > + BPF_F_WQ_SLEEPABLE = (1ULL << 0),
> > +};
>
> Just started looking at the patch set. The first reaction that
> this flag is odd. Why add it? wq provides sleepable ctx.
> Why would the program ask to be non-sleepable in wq?

That was one of my questions: do we need it? Apparently not :)

If not, then that simplifies the series by a lot: patch 1/18 can be
dropped and 14/18 can be stripped down and squashed in 15/18.

> If it needs to callback to run in rcu cs, it can use bpf_rcu_read_lock() kfunc
> as the first call in such callback and it will be equivalent
> to not-passing this BPF_F_WQ_SLEEPABLE flag.
> It seem it can be dropped and complexity reduced.

yep :)

> The verifier complications in later patches due to this flag too...
> I just don't see the point.

It's something I added while adding the tests. And some tests were passing
in case I was having a non sleepable callback. But if we have
bpf_rcu_read_lock(), we are all fine and can reduce the complexity.

>
> > +
> > #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 9fd12d480b8b..9ac1b8bb3a01 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -1109,11 +1109,18 @@ struct bpf_hrtimer {
> > struct hrtimer timer;
> > };
> >
> > -/* the actual struct hidden inside uapi struct bpf_timer */
> > +struct bpf_work {
> > + struct bpf_async_cb cb;
> > + struct work_struct work;
> > + struct work_struct delete_work;
> > +};
> > +
> > +/* the actual struct hidden inside uapi struct bpf_timer and bpf_wq */
> > struct bpf_async_kern {
> > union {
> > struct bpf_async_cb *cb;
> > struct bpf_hrtimer *timer;
> > + struct bpf_work *work;
> > };
> > /* bpf_spin_lock is used here instead of spinlock_t to make
> > * sure that it always fits into space reserved by struct bpf_timer
> > @@ -1124,6 +1131,7 @@ struct bpf_async_kern {
> >
> > enum bpf_async_type {
> > BPF_ASYNC_TYPE_TIMER = 0,
> > + BPF_ASYNC_TYPE_WQ,
> > };
> >
> > static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
> > @@ -1167,11 +1175,75 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> > return HRTIMER_NORESTART;
> > }
> >
> > +static void bpf_wq_work(struct work_struct *work)
> > +{
> > + struct bpf_work *w = container_of(work, struct bpf_work, work);
> > + struct bpf_tramp_run_ctx __maybe_unused run_ctx;
> > + struct bpf_prog *prog = w->cb.prog;
> > + unsigned int flags = w->cb.flags;
> > + struct bpf_map *map = w->cb.map;
> > + bpf_callback_t callback_fn;
> > + void *value = w->cb.value;
> > + void *key;
> > + u32 idx;
> > +
> > + BTF_TYPE_EMIT(struct bpf_wq);
> > +
> > + callback_fn = READ_ONCE(w->cb.callback_fn);
> > + if (!callback_fn || !prog)
> > + return;
> > +
> > + if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> > + struct bpf_array *array = container_of(map, struct bpf_array, map);
> > +
> > + /* compute the key */
> > + idx = ((char *)value - array->value) / array->elem_size;
> > + key = &idx;
> > + } else { /* hash or lru */
> > + key = value - round_up(map->key_size, 8);
> > + }
> > +
> > + run_ctx.bpf_cookie = 0;
> > +
> > + if (flags & BPF_F_WQ_SLEEPABLE) {
> > + if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
> > + /* recursion detected */
> > + __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx);
> > + return;
> > + }
> > + } else {
> > + if (!__bpf_prog_enter_recur(prog, &run_ctx)) {
> > + /* recursion detected */
> > + __bpf_prog_exit_recur(prog, 0, &run_ctx);
> > + return;
> > + }
> > + }
> > +
> > + callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
> > + /* The verifier checked that return value is zero. */
> > +
> > + if (flags & BPF_F_WQ_SLEEPABLE)
> > + __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */,
> > + &run_ctx);
> > + else
> > + __bpf_prog_exit_recur(prog, 0, &run_ctx);
> > +}
> > +
> > +static void bpf_wq_delete_work(struct work_struct *work)
> > +{
> > + struct bpf_work *w = container_of(work, struct bpf_work, delete_work);
> > +
> > + cancel_work_sync(&w->work);
> > +
> > + kfree_rcu(w, cb.rcu);
> > +}
> > +
> > static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
> > enum bpf_async_type type)
> > {
> > struct bpf_async_cb *cb;
> > struct bpf_hrtimer *t;
> > + struct bpf_work *w;
> > clockid_t clockid;
> > size_t size;
> > int ret = 0;
> > @@ -1183,6 +1255,9 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> > case BPF_ASYNC_TYPE_TIMER:
> > size = sizeof(struct bpf_hrtimer);
> > break;
> > + case BPF_ASYNC_TYPE_WQ:
> > + size = sizeof(struct bpf_work);
> > + break;
> > default:
> > return -EINVAL;
> > }
> > @@ -1201,13 +1276,22 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> > goto out;
> > }
> >
> > - if (type == BPF_ASYNC_TYPE_TIMER) {
> > + switch (type) {
> > + case BPF_ASYNC_TYPE_TIMER:
> > clockid = flags & (MAX_CLOCKS - 1);
> > t = (struct bpf_hrtimer *)cb;
> >
> > hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
> > t->timer.function = bpf_timer_cb;
> > cb->value = (void *)async - map->record->timer_off;
> > + break;
> > + case BPF_ASYNC_TYPE_WQ:
> > + w = (struct bpf_work *)cb;
> > +
> > + INIT_WORK(&w->work, bpf_wq_work);
> > + INIT_WORK(&w->delete_work, bpf_wq_delete_work);
> > + cb->value = (void *)async - map->record->wq_off;
> > + break;
> > }
> > cb->map = map;
> > cb->prog = NULL;
> > @@ -1473,7 +1557,19 @@ void bpf_timer_cancel_and_free(void *val)
> > */
> > void bpf_wq_cancel_and_free(void *val)
> > {
> > + struct bpf_work *work;
> > +
> > BTF_TYPE_EMIT(struct bpf_wq);
> > +
> > + work = (struct bpf_work *)__bpf_async_cancel_and_free(val);
> > + if (!work)
> > + return;
> > + /* Trigger cancel of the sleepable work, but *do not* wait for
> > + * it to finish if it was running as we might not be in a
> > + * sleepable context.
> > + * kfree will be called once the work has finished.
> > + */
> > + schedule_work(&work->delete_work);
> > }
> >
> > BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
> > @@ -2612,6 +2708,19 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> > WARN(1, "A call to BPF exception callback should never return\n");
> > }
> >
> > +__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
> > +{
> > + struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
> > +
> > + BUILD_BUG_ON(sizeof(struct bpf_async_kern) > sizeof(struct bpf_wq));
> > + BUILD_BUG_ON(__alignof__(struct bpf_async_kern) != __alignof__(struct bpf_wq));
> > +
> > + if (flags & ~BPF_F_WQ_SLEEPABLE)
> > + return -EINVAL;
> > +
> > + return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
> > +}
> > +
> > __bpf_kfunc_end_defs();
> >
> > BTF_KFUNCS_START(generic_btf_ids)
> > @@ -2689,6 +2798,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> > BTF_ID_FLAGS(func, bpf_dynptr_size)
> > BTF_ID_FLAGS(func, bpf_dynptr_clone)
> > BTF_ID_FLAGS(func, bpf_modify_return_test_tp)
> > +BTF_ID_FLAGS(func, bpf_wq_init)
> > BTF_KFUNCS_END(common_btf_ids)
> >
> > static const struct btf_kfunc_id_set common_kfunc_set = {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 112faf2cd7e9..5e8c1e65fe8c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -11038,6 +11038,7 @@ enum special_kfunc_type {
> > KF_bpf_percpu_obj_drop_impl,
> > KF_bpf_throw,
> > KF_bpf_iter_css_task_new,
> > + KF_bpf_wq_init,
> > };
> >
> > BTF_SET_START(special_kfunc_set)
> > @@ -11064,6 +11065,7 @@ BTF_ID(func, bpf_throw)
> > #ifdef CONFIG_CGROUPS
> > BTF_ID(func, bpf_iter_css_task_new)
> > #endif
> > +BTF_ID(func, bpf_wq_init)
> > BTF_SET_END(special_kfunc_set)
> >
> > BTF_ID_LIST(special_kfunc_list)
> > @@ -11094,6 +11096,7 @@ BTF_ID(func, bpf_iter_css_task_new)
> > #else
> > BTF_ID_UNUSED
> > #endif
> > +BTF_ID(func, bpf_wq_init)
> >
> > static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
> > {
> > @@ -11171,6 +11174,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> > if (is_kfunc_arg_wq(meta->btf, &args[argno]))
> > return KF_ARG_PTR_TO_WORKQUEUE;
> >
> > + if (meta->func_id == special_kfunc_list[KF_bpf_wq_init] && argno == 1)
> > + return KF_ARG_PTR_TO_MAP;
> > +
>
> Hmm. This function has this bit:
> if (is_kfunc_arg_map(meta->btf, &args[argno]))
> return KF_ARG_PTR_TO_MAP;
>
> Just do:
> +__bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map__map, ...

Oh, nice. I haven't noticed that. I'll try it in v2.

Cheers,
Benjamin

>
> It was specifically added for bpf_arena_alloc_pages to pass pointer to a map.
> In case of arena map type it's used as:
> struct {
> __uint(type, BPF_MAP_TYPE_ARENA);
> ...
> } arena SEC(".maps");
>
> page = bpf_arena_alloc_pages(&arena, ...)
> libbpf and the verifier do the right thing.
> I think it should work here as well.

2024-04-19 15:15:07

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH bpf-next 17/18] bpf: add bpf_wq_start

On Apr 18 2024, Alexei Starovoitov wrote:
> On Tue, Apr 16, 2024 at 04:08:30PM +0200, Benjamin Tissoires wrote:
> > again, copy/paste from bpf_timer_start().
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> > ---
> > kernel/bpf/helpers.c | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index e5c8adc44619..ed5309a37eda 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -2728,6 +2728,29 @@ __bpf_kfunc int bpf_wq_init(struct bpf_wq *wq, void *map, unsigned int flags)
> > return __bpf_async_init(async, map, flags, BPF_ASYNC_TYPE_WQ);
> > }
> >
> > +__bpf_kfunc int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
> > +{
> > + struct bpf_async_kern *async = (struct bpf_async_kern *)wq;
> > + struct bpf_work *w;
> > + int ret = 0;
> > +
> > + if (in_nmi())
> > + return -EOPNOTSUPP;
> > + if (flags)
> > + return -EINVAL;
> > + __bpf_spin_lock_irqsave(&async->lock);
> > + w = async->work;
> > + if (!w || !w->cb.prog) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + schedule_work(&w->work);
> > +out:
> > + __bpf_spin_unlock_irqrestore(&async->lock);
>
> Looks like you're not adding wq_cancel kfunc in this patch set and
> it's probably a good thing not to expose async cancel to bpf users,
> since it's a foot gun.

Honestly I just felt the patch series was big enough for a PoC and
comparison with sleepable bpf_timer. But if we think this needs not to
be added, I guess that works too :)

> Even when we eventually add wq_cancel_sync kfunc it will not be
> removing a callback.

Yeah, I understood that bit :)

> So we can drop spinlock here.
> READ_ONCE of w and cb would be enough.
> Since they cannot get back to NULL once init-ed and cb is set.

Great, thanks for the review (and the other patches).

I'll work toward v2.

Cheers,
Benjamin

2024-04-19 15:34:50

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 11/18] bpf: wq: add bpf_wq_init

On Fri, Apr 19, 2024 at 8:12 AM Benjamin Tissoires <[email protected]> wrote:
>
>
> It's something I added while adding the tests. And some tests were passing
> in case I was having a non sleepable callback. But if we have
> bpf_rcu_read_lock(), we are all fine and can reduce the complexity.

Not quite following what was the issue.
Since the verifier is unconditionally verifying such callback as sleepable
the callback won't be able to access rcu pointers without doing
explicit bpf_rcu_read_lock() first (and few other code patterns
might be rejected), but that's a good thing.
Maybe next to set_cb kfunc add a comment that wq callbacks are sleepable.
I think bpf prog writers are often with kernel background,
so it will be their natural assumption that cb is sleepable.

2024-04-19 15:51:18

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next 17/18] bpf: add bpf_wq_start

On Fri, Apr 19, 2024 at 8:14 AM Benjamin Tissoires <[email protected]> wrote:
>
>
> Honestly I just felt the patch series was big enough for a PoC and
> comparison with sleepable bpf_timer. But if we think this needs not to
> be added, I guess that works too :)

It certainly did its job to compare the two and imo bpf_wq with kfunc approach
looks cleaner overall and will be easier to extend in the long term.

I mean that we'll be adding 3 kfuncs initially:
bpf_wq_init, bpf_wq_start, bpf_wq_set_callback.

imo that's good enough to land it and get some exposure.
I'll be using it right away to refactor bpf_arena_alloc.h into
actual arena allocator for bpf progs that is not just a selftest.

I'm currently working on locks for bpf_arena.
Kumar has a patch set that adds bpf_preempt_disble kfunc and
coupled with bpf_wq we'll have all mechanisms to build
arbitrary data structures/algorithms as bpf programs.

2024-04-19 15:55:39

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH bpf-next 11/18] bpf: wq: add bpf_wq_init

On Apr 19 2024, Alexei Starovoitov wrote:
> On Fri, Apr 19, 2024 at 8:12 AM Benjamin Tissoires <[email protected]> wrote:
> >
> >
> > It's something I added while adding the tests. And some tests were passing
> > in case I was having a non sleepable callback. But if we have
> > bpf_rcu_read_lock(), we are all fine and can reduce the complexity.
>
> Not quite following what was the issue.
> Since the verifier is unconditionally verifying such callback as sleepable
> the callback won't be able to access rcu pointers without doing
> explicit bpf_rcu_read_lock() first (and few other code patterns
> might be rejected), but that's a good thing.

Oh, I missed that. Well, given that the verifier enforces everything, I
guess we are good :)

> Maybe next to set_cb kfunc add a comment that wq callbacks are sleepable.
> I think bpf prog writers are often with kernel background,
> so it will be their natural assumption that cb is sleepable.

I assume so as well.

Cheers,
Benjamin

2024-04-19 16:01:45

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH bpf-next 17/18] bpf: add bpf_wq_start

On Apr 19 2024, Alexei Starovoitov wrote:
> On Fri, Apr 19, 2024 at 8:14 AM Benjamin Tissoires <[email protected]> wrote:
> >
> >
> > Honestly I just felt the patch series was big enough for a PoC and
> > comparison with sleepable bpf_timer. But if we think this needs not to
> > be added, I guess that works too :)
>
> It certainly did its job to compare the two and imo bpf_wq with kfunc approach
> looks cleaner overall and will be easier to extend in the long term.

Yeah. I agree. I'm also glad we pick the bpf_wq approach as I gave it a
lot more care :)

Talking about extending, I think I'll need delayed_work soon enough.
Most of the time when I receive an input event, the device is preventing
any communication with it, and with plain bpf_wq, it's likely that when
the code kicks in the device won't have processed the current input,
meaning to a useless retry. With delayed_works, I can schedule it
slightly later, and have a higher chance of not having to retry.

I've got a quick hack locally that I can submit once this series get
merged.

>
> I mean that we'll be adding 3 kfuncs initially:
> bpf_wq_init, bpf_wq_start, bpf_wq_set_callback.
>
> imo that's good enough to land it and get some exposure.

sounds good to me.

> I'll be using it right away to refactor bpf_arena_alloc.h into
> actual arena allocator for bpf progs that is not just a selftest.
>
> I'm currently working on locks for bpf_arena.
> Kumar has a patch set that adds bpf_preempt_disble kfunc and
> coupled with bpf_wq we'll have all mechanisms to build
> arbitrary data structures/algorithms as bpf programs.

Oh. I did not realize that it was that needed for outside of my
playground. That's good to hear :)

Cheers,
Benjamin