2024-04-08 08:09:59

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 0/6] sleepable bpf_timer (was: allow HID-BPF to do device IOs)

New version of the sleepable bpf_timer code.

I'm posting this because it is the result of the previous review, so we
have a baseline to compare against.

The plan is now to introduce a new user API, struct bpf_wq, as the timer
API, which runs in softIRQ context, turns out to be quite far away from
a workqueue.

For reference, the use cases I have in mind:

---

Basically, I need to be able to defer a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out-of-proximity event, but the device cannot
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds and eventually re-inject that event into the stack.

2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch a
faulty behavior, if a device forgets to send a release event.

3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example a device might send us an event saying that it came back
from sleeping state and needs to be re-initialized.

Currently we can achieve that by keeping a userspace program around,
raising a bpf event, and letting that userspace program inject the
events and commands.
However, we are keeping that program alive as a daemon just to schedule
commands. There is no logic in it, so it doesn't really justify an
actual userspace wakeup. A kernel workqueue seems simpler to handle.

bpf_timers currently run in soft IRQ context; this patch series
implements a sleepable context for them. A sketch of the intended usage
from the BPF side follows.
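
For illustration, a minimal sketch of that usage, distilled from the
selftests in patch 6/6 (the map layout, section and callback body are
illustrative only, and the sketch assumes the
bpf_timer_set_sleepable_cb() wrapper from bpf_experimental.h):

	struct elem {
		struct bpf_timer t;
	};

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 1);
		__type(key, int);
		__type(value, struct elem);
	} timer_map SEC(".maps");

	/* runs from a workqueue, so sleepable kfuncs are allowed here */
	static int wakeup_cb(void *map, int *key, struct bpf_timer *timer)
	{
		return 0;
	}

	SEC("tc")
	long defer_work(void *ctx)
	{
		struct bpf_timer *timer;
		int key = 0;

		timer = bpf_map_lookup_elem(&timer_map, &key);
		if (!timer)
			return 1;
		if (bpf_timer_init(timer, &timer_map,
				   CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
			return 2;
		if (bpf_timer_set_sleepable_cb(timer, wakeup_cb))
			return 3;
		/* the delay must be 0: sleepable timers are offloaded
		 * immediately in this version, not delayed
		 */
		return bpf_timer_start(timer, 0, 0);
	}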

Cheers,
Benjamin

To: Alexei Starovoitov <[email protected]>
To: Daniel Borkmann <[email protected]>
To: Andrii Nakryiko <[email protected]>
To: Martin KaFai Lau <[email protected]>
To: Eduard Zingerman <[email protected]>
To: Song Liu <[email protected]>
To: Yonghong Song <[email protected]>
To: John Fastabend <[email protected]>
To: KP Singh <[email protected]>
To: Stanislav Fomichev <[email protected]>
To: Hao Luo <[email protected]>
To: Jiri Olsa <[email protected]>
To: Mykola Lysenko <[email protected]>
To: Shuah Khan <[email protected]>
Cc: Benjamin Tissoires <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>

---
Changes in v6:
- Use of a workqueue to clean up sleepable timers
- integrated Kumar's patch instead of mine
- Link to v5: https://lore.kernel.org/r/[email protected]

Changes in v5:
- took various reviews into account
- rewrote the tests to be separate so as not to have an ugly include
- Link to v4: https://lore.kernel.org/r/[email protected]

Changes in v4:
- dropped the HID changes, they can go independently from bpf-core
- addressed Alexei's and Eduard's remarks
- added selftests
- Link to v3: https://lore.kernel.org/r/[email protected]

Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/[email protected]

Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- still not implemented global subprogs
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/[email protected]

---
Benjamin Tissoires (5):
bpf/helpers: introduce sleepable bpf_timers
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
tools: sync include/uapi/linux/bpf.h
selftests/bpf: add sleepable timer tests

Kumar Kartikeya Dwivedi (1):
bpf: Add support for KF_ARG_PTR_TO_TIMER

include/linux/bpf_verifier.h | 1 +
include/uapi/linux/bpf.h | 13 ++
kernel/bpf/helpers.c | 202 ++++++++++++++++---
kernel/bpf/verifier.c | 98 +++++++++-
tools/include/uapi/linux/bpf.h | 20 +-
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 213 +++++++++++++++++++++
10 files changed, 553 insertions(+), 39 deletions(-)
---
base-commit: 61df575632d6b39213f47810c441bddbd87c3606
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4

Best regards,
--
Benjamin Tissoires <[email protected]>



2024-04-08 08:10:17

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

They are implemented as a workqueue, which means that there are no
guarantees of timing or ordering; see the sketch below.
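
In practice this means bpf_timer_start() turns into an immediate
schedule_work() for sleepable timers. Condensed from the diff below
(illustrative, not the literal kernel code):

	/* inside bpf_timer_start(), once t is validated */
	if (t->flags & BPF_F_TIMER_SLEEPABLE) {
		if (nsecs)
			return -EINVAL;	/* no delayed execution in this version */
		schedule_work(&t->work);	/* bpf_timer_work_cb() runs in process context */
	} else {
		hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
	}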

Signed-off-by: Benjamin Tissoires <[email protected]>

---

changes in v6:
- dropped extra spinlock
- implement cancel_and_free of a sleepable timer through
a dedicated workqueue

no changes in v5

changes in v4:
- dropped __bpf_timer_compute_key()
- use a spin_lock instead of a semaphore
- ensure bpf_timer_cancel_and_free is not complaining about
non-sleepable context and use cancel_work() instead of
cancel_work_sync()
- return -EINVAL if a delay is given to bpf_timer_start() with
BPF_F_TIMER_SLEEPABLE

changes in v3:
- extracted the implementation in bpf_timer only, without
bpf_timer_set_sleepable_cb()
- rely on schedule_work() only, from bpf_timer_start()
- add semaphore to ensure bpf_timer_work_cb() is accessing
consistent data

changes in v2 (compared to the one attaches to v1 0/9):
- make use of a kfunc
- add a (non-used) BPF_F_TIMER_SLEEPABLE
- the callback is *not* called; it makes the kernel crash
---
include/uapi/linux/bpf.h | 13 ++++
kernel/bpf/helpers.c | 159 ++++++++++++++++++++++++++++++++++++++---------
2 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9585f5345353..f1890eed213a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7469,6 +7469,19 @@ enum {
BPF_F_TIMER_CPU_PIN = (1ULL << 1),
};

+/* Extra flags to control bpf_timer_init() behaviour, in addition to CLOCK_*.
+ * - BPF_F_TIMER_SLEEPABLE: Timer will run in a workqueue in a sleepable
+ * context, with no guarantees of ordering nor timing (consider this as
+ * being just offloaded immediately).
+ */
+enum {
+ /* CLOCK_* are using bits 0 to 3 */
+ BPF_F_TIMER_SLEEPABLE = (1ULL << 4),
+ __MAX_BPF_F_TIMER_INIT,
+};
+
+#define MAX_BPF_F_TIMER_INIT __MAX_BPF_F_TIMER_INIT
+
/* BPF numbers iterator state */
struct bpf_iter_num {
/* opaque iterator state; having __u64 here allows to preserve correct
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9234174ccb21..fd05d4358b31 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
* freeing the timers when inner map is replaced or deleted by user space.
*/
struct bpf_hrtimer {
- struct hrtimer timer;
+ union {
+ struct hrtimer timer;
+ struct work_struct work;
+ };
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
- struct rcu_head rcu;
+ union {
+ struct rcu_head rcu;
+ struct work_struct sync_work;
+ };
+ u64 flags;
};

/* the actual struct hidden inside uapi struct bpf_timer */
@@ -1114,6 +1121,58 @@ struct bpf_timer_kern {
struct bpf_spin_lock lock;
} __attribute__((aligned(8)));

+static void bpf_timer_work_cb(struct work_struct *work)
+{
+ struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
+ struct bpf_tramp_run_ctx __maybe_unused run_ctx;
+ struct bpf_prog *prog = t->prog;
+ struct bpf_map *map = t->map;
+ bpf_callback_t callback_fn;
+ void *value = t->value;
+ unsigned long flags;
+ void *key;
+ u32 idx;
+
+ BTF_TYPE_EMIT(struct bpf_timer);
+
+ callback_fn = READ_ONCE(t->callback_fn);
+ if (!callback_fn || !prog)
+ return;
+
+ if (map->map_type == BPF_MAP_TYPE_ARRAY) {
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+ /* compute the key */
+ idx = ((char *)value - array->value) / array->elem_size;
+ key = &idx;
+ } else { /* hash or lru */
+ key = value - round_up(map->key_size, 8);
+ }
+
+ run_ctx.bpf_cookie = 0;
+
+ if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
+ /* recursion detected */
+ __bpf_prog_exit_sleepable_recur(prog, 0, &run_ctx);
+ return;
+ }
+
+ callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
+ /* The verifier checked that return value is zero. */
+
+ __bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */,
+ &run_ctx);
+}
+
+static void bpf_timer_sync_work_cb(struct work_struct *work)
+{
+ struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
+
+ cancel_work_sync(&t->work);
+
+ kfree_rcu(t, rcu);
+}
+
static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);

static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
@@ -1155,10 +1214,14 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
return HRTIMER_NORESTART;
}

+#define BPF_TIMER_CLOCK_MASK (MAX_CLOCKS - 1) /* 0xf */
+#define BPF_TIMER_FLAGS_MASK GENMASK_ULL(63, 4)
+
BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
u64, flags)
{
- clockid_t clockid = flags & (MAX_CLOCKS - 1);
+ clockid_t clockid = flags & BPF_TIMER_CLOCK_MASK;
+ u64 bpf_timer_flags = flags & BPF_TIMER_FLAGS_MASK;
struct bpf_hrtimer *t;
int ret = 0;

@@ -1169,7 +1232,7 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
if (in_nmi())
return -EOPNOTSUPP;

- if (flags >= MAX_CLOCKS ||
+ if (bpf_timer_flags & ~(BPF_F_TIMER_SLEEPABLE) ||
/* similar to timerfd except _ALARM variants are not supported */
(clockid != CLOCK_MONOTONIC &&
clockid != CLOCK_REALTIME &&
@@ -1190,8 +1253,14 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
t->value = (void *)timer - map->record->timer_off;
t->map = map;
t->prog = NULL;
+ t->flags = flags;
rcu_assign_pointer(t->callback_fn, NULL);
- hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
+ if (flags & BPF_F_TIMER_SLEEPABLE) {
+ INIT_WORK(&t->work, bpf_timer_work_cb);
+ INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb);
+ } else {
+ hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
+ }
t->timer.function = bpf_timer_cb;
WRITE_ONCE(timer->timer, t);
/* Guarantee the order between timer->timer and map->usercnt. So
@@ -1292,6 +1361,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
goto out;
}

+ if ((t->flags & BPF_F_TIMER_SLEEPABLE) && nsecs) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (flags & BPF_F_TIMER_ABS)
mode = HRTIMER_MODE_ABS_SOFT;
else
@@ -1300,7 +1374,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
if (flags & BPF_F_TIMER_CPU_PIN)
mode |= HRTIMER_MODE_PINNED;

- hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
+ if (t->flags & BPF_F_TIMER_SLEEPABLE)
+ schedule_work(&t->work);
+ else
+ hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
@@ -1329,6 +1406,7 @@ static void drop_prog_refcnt(struct bpf_hrtimer *t)
BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
{
struct bpf_hrtimer *t;
+ u64 flags = 0;
int ret = 0;

if (in_nmi())
@@ -1340,6 +1418,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
ret = -EINVAL;
goto out;
}
+ flags = t->flags;
if (this_cpu_read(hrtimer_running) == t) {
/* If bpf callback_fn is trying to bpf_timer_cancel()
* its own timer the hrtimer_cancel() will deadlock
@@ -1351,10 +1430,20 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
drop_prog_refcnt(t);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
- /* Cancel the timer and wait for associated callback to finish
- * if it was running.
- */
- ret = ret ?: hrtimer_cancel(&t->timer);
+
+ if (flags & BPF_F_TIMER_SLEEPABLE) {
+ /* Cancel the sleepable work, but *do not* wait for
+ * it to finish if it was running as we might not be in a
+ * sleepable context
+ */
+ ret = ret ?: cancel_work(&t->work);
+ } else {
+ /* Cancel the timer and wait for associated callback to finish
+ * if it was running.
+ */
+ ret = ret ?: hrtimer_cancel(&t->timer);
+ }
+
rcu_read_unlock();
return ret;
}
@@ -1373,6 +1462,7 @@ void bpf_timer_cancel_and_free(void *val)
{
struct bpf_timer_kern *timer = val;
struct bpf_hrtimer *t;
+ u64 flags;

/* Performance optimization: read timer->timer without lock first. */
if (!READ_ONCE(timer->timer))
@@ -1383,6 +1473,7 @@ void bpf_timer_cancel_and_free(void *val)
t = timer->timer;
if (!t)
goto out;
+ flags = t->flags;
drop_prog_refcnt(t);
/* The subsequent bpf_timer_start/cancel() helpers won't be able to use
* this timer, since it won't be initialized.
@@ -1392,25 +1483,35 @@ void bpf_timer_cancel_and_free(void *val)
__bpf_spin_unlock_irqrestore(&timer->lock);
if (!t)
return;
- /* Cancel the timer and wait for callback to complete if it was running.
- * If hrtimer_cancel() can be safely called it's safe to call kfree(t)
- * right after for both preallocated and non-preallocated maps.
- * The timer->timer = NULL was already done and no code path can
- * see address 't' anymore.
- *
- * Check that bpf_map_delete/update_elem() wasn't called from timer
- * callback_fn. In such case don't call hrtimer_cancel() (since it will
- * deadlock) and don't call hrtimer_try_to_cancel() (since it will just
- * return -1). Though callback_fn is still running on this cpu it's
- * safe to do kfree(t) because bpf_timer_cb() read everything it needed
- * from 't'. The bpf subprog callback_fn won't be able to access 't',
- * since timer->timer = NULL was already done. The timer will be
- * effectively cancelled because bpf_timer_cb() will return
- * HRTIMER_NORESTART.
- */
- if (this_cpu_read(hrtimer_running) != t)
- hrtimer_cancel(&t->timer);
- kfree_rcu(t, rcu);
+
+ if (flags & BPF_F_TIMER_SLEEPABLE) {
+ /* Trigger cancel of the sleepable work, but *do not* wait for
+ * it to finish if it was running as we might not be in a
+ * sleepable context.
+ * kfree will be called once the work has finished.
+ */
+ schedule_work(&t->sync_work);
+ } else {
+ /* Cancel the timer and wait for callback to complete if it was running.
+ * If hrtimer_cancel() can be safely called it's safe to call kfree(t)
+ * right after for both preallocated and non-preallocated maps.
+ * The timer->timer = NULL was already done and no code path can
+ * see address 't' anymore.
+ *
+ * Check that bpf_map_delete/update_elem() wasn't called from timer
+ * callback_fn. In such case don't call hrtimer_cancel() (since it will
+ * deadlock) and don't call hrtimer_try_to_cancel() (since it will just
+ * return -1). Though callback_fn is still running on this cpu it's
+ * safe to do kfree(t) because bpf_timer_cb() read everything it needed
+ * from 't'. The bpf subprog callback_fn won't be able to access 't',
+ * since timer->timer = NULL was already done. The timer will be
+ * effectively cancelled because bpf_timer_cb() will return
+ * HRTIMER_NORESTART.
+ */
+ if (this_cpu_read(hrtimer_running) != t)
+ hrtimer_cancel(&t->timer);
+ kfree_rcu(t, rcu);
+ }
}

BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)

--
2.44.0


2024-04-08 08:10:49

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 2/6] bpf: Add support for KF_ARG_PTR_TO_TIMER

From: Kumar Kartikeya Dwivedi <[email protected]>

Introduce support for KF_ARG_PTR_TO_TIMER. Kfuncs can now take bpf_timer
as an argument, and the verifier will recognize it as a timer argument.
The cast to bpf_timer_kern can happen inside the kfunc; using bpf_timer
in the prototype makes life easier for users who work with the non-kern
type in BPF programs.

Fix up process_timer_func's meta argument usage (ignore it if NULL) so
that we can share the same checks between helpers and kfuncs. The meta
argument is only needed to ensure that bpf_timer_init's timer and map
arguments come from the same map (the map_uid logic is necessary for
correct inner-map handling).

No such concern applies to kfuncs, as timer initialization happens
through helpers; hence pass NULL to process_timer_func from the kfunc
argument handling code to ignore it.
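
Concretely, the kernel-side prototype uses bpf_timer_kern while the
BPF-side declaration uses the uapi struct bpf_timer. A sketch of the
two halves (bpf_timer_some_kfunc is a made-up name for illustration;
bpf_timer_set_sleepable_cb_impl() in the next patch is the real user):

	/* kernel side: the verifier matched the argument against
	 * KF_ARG_PTR_TO_TIMER (a PTR_TO_MAP_VALUE at the map's
	 * timer_off), so using the kern type is safe here
	 */
	__bpf_kfunc int bpf_timer_some_kfunc(struct bpf_timer_kern *timer)
	{
		return 0;
	}

	/* BPF program side: declared with the uapi type */
	extern int bpf_timer_some_kfunc(struct bpf_timer *timer) __ksym;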

Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>

---

changes in v6:
- used Kumar's version of the patch
- reverted `+BTF_ID(struct, bpf_timer_kern)`

changes in v5:
- also check for the reg offset

changes in v4:
- enforce KF_ARG_PTR_TO_TIMER to be of type PTR_TO_MAP_VALUE

new in v3 (split from v2 02/10)
---
kernel/bpf/verifier.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ca6cacf7b42f..ccfe9057d8dc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7568,12 +7568,16 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
val + reg->off, map->record->timer_off);
return -EINVAL;
}
+ /* meta is only needed for bpf_timer_init to match timer and map */
+ if (!meta)
+ goto out;
if (meta->map_ptr) {
verbose(env, "verifier bug. Two map pointers in a timer helper\n");
return -EFAULT;
}
meta->map_uid = reg->map_uid;
meta->map_ptr = map;
+out:
return 0;
}

@@ -10826,6 +10830,7 @@ enum {
KF_ARG_LIST_NODE_ID,
KF_ARG_RB_ROOT_ID,
KF_ARG_RB_NODE_ID,
+ KF_ARG_TIMER_ID,
};

BTF_ID_LIST(kf_arg_btf_ids)
@@ -10834,6 +10839,7 @@ BTF_ID(struct, bpf_list_head)
BTF_ID(struct, bpf_list_node)
BTF_ID(struct, bpf_rb_root)
BTF_ID(struct, bpf_rb_node)
+BTF_ID(struct, bpf_timer_kern)

static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type)
@@ -10877,6 +10883,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
}

+static bool is_kfunc_arg_timer(const struct btf *btf, const struct btf_param *arg)
+{
+ return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_TIMER_ID);
+}
+
static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
const struct btf_param *arg)
{
@@ -10946,6 +10957,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR,
KF_ARG_PTR_TO_MAP,
+ KF_ARG_PTR_TO_TIMER,
};

enum special_kfunc_type {
@@ -11102,6 +11114,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_map(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_MAP;

+ if (is_kfunc_arg_timer(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_TIMER;
+
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11735,6 +11750,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_CALLBACK:
case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR:
+ case KF_ARG_PTR_TO_TIMER:
/* Trusted by default */
break;
default:
@@ -12021,6 +12037,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret)
return ret;
break;
+ case KF_ARG_PTR_TO_TIMER:
+ if (reg->type != PTR_TO_MAP_VALUE) {
+ verbose(env, "arg#%d doesn't point to a map value\n", i);
+ return -EINVAL;
+ }
+ ret = process_timer_func(env, regno, NULL);
+ if (ret < 0)
+ return ret;
+ break;
}
}


--
2.44.0


2024-04-08 08:11:19

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 3/6] bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc

In this patch, bpf_timer_set_sleepable_cb() is functionally equivalent
to bpf_timer_set_callback(), with the exception that it requires the
timer to have been initialized with BPF_F_TIMER_SLEEPABLE.

But given that bpf_timer_set_callback() is a helper while
bpf_timer_set_sleepable_cb() is a kfunc, we need to teach the verifier
about its attached callback.
Marking that callback as sleepable is done in a separate patch.
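
On the BPF program side the kfunc is wrapped so that the extra
verifier-filled argument stays hidden; this mirrors the
bpf_experimental.h hunk in patch 6/6:

	/* aux__ign is ignored by the argument checks and later filled
	 * with env->prog->aux by fixup_kfunc_call() (see the verifier
	 * hunk below), so programs simply pass NULL through the macro
	 */
	extern int bpf_timer_set_sleepable_cb_impl(struct bpf_timer *timer,
			int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
			void *aux__ign) __ksym;
	#define bpf_timer_set_sleepable_cb(timer, cb) \
		bpf_timer_set_sleepable_cb_impl(timer, cb, NULL)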

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>

---

changes in v6:
- adapted for flags being set during timer_init

changes in v5:
- enforced sleepable timers to only have BPF_F_TIMER_SLEEPABLE
- use is_bpf_timer_set_sleepable_cb_impl_kfunc() instead of generic
is_async_cb()

changes in v4:
- added a new (ignored) argument to the kfunc so that we do not
need to walk the stack

new in v3 (split from v2 02/10)
---
kernel/bpf/helpers.c | 43 ++++++++++++++++++++++++++++++++++--
kernel/bpf/verifier.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index fd05d4358b31..d6528359b3f4 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1291,8 +1291,8 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};

-BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
- struct bpf_prog_aux *, aux)
+static int __bpf_timer_set_callback(struct bpf_timer_kern *timer, void *callback_fn,
+ struct bpf_prog_aux *aux, bool is_sleepable)
{
struct bpf_prog *prev, *prog = aux->prog;
struct bpf_hrtimer *t;
@@ -1306,6 +1306,10 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EINVAL;
goto out;
}
+ if (!!(t->flags & BPF_F_TIMER_SLEEPABLE) != is_sleepable) {
+ ret = -EINVAL;
+ goto out;
+ }
if (!atomic64_read(&t->map->usercnt)) {
/* maps with timers must be either held by user space
* or pinned in bpffs. Otherwise timer might still be
@@ -1336,6 +1340,12 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
return ret;
}

+BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
+ struct bpf_prog_aux *, aux)
+{
+ return __bpf_timer_set_callback(timer, callback_fn, aux, false);
+}
+
static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.func = bpf_timer_set_callback,
.gpl_only = true,
@@ -2650,6 +2660,34 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}

+/**
+ * bpf_timer_set_sleepable_cb_impl() - Configure the timer to call %callback_fn
+ * static function in a sleepable context.
+ * @timer: The bpf_timer that needs to be configured
+ * @callback_fn: a static bpf function
+ *
+ * @returns %0 on success. %-EINVAL if %timer was not initialized with
+ * bpf_timer_init() earlier. %-EPERM if %timer is in a map that doesn't
+ * have any user references.
+ * The user space should either hold a file descriptor to a map with timers
+ * or pin such map in bpffs. When map is unpinned or file descriptor is
+ * closed all timers in the map will be cancelled and freed.
+ *
+ * This kfunc is equivalent to %bpf_timer_set_callback except that it tells
+ * the verifier that the target callback is run in a sleepable context.
+ */
+__bpf_kfunc int bpf_timer_set_sleepable_cb_impl(struct bpf_timer_kern *timer,
+ int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
+ void *aux__ign)
+{
+ struct bpf_prog_aux *aux = (struct bpf_prog_aux *)aux__ign;
+
+ if (!aux)
+ return -EINVAL;
+
+ return __bpf_timer_set_callback(timer, (void *)callback_fn, aux, true);
+}
+
__bpf_kfunc_end_defs();

BTF_KFUNCS_START(generic_btf_ids)
@@ -2726,6 +2764,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone)
+BTF_ID_FLAGS(func, bpf_timer_set_sleepable_cb_impl)
BTF_KFUNCS_END(common_btf_ids)

static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ccfe9057d8dc..00ac3a3a5f01 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -501,8 +501,12 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
}

static bool is_sync_callback_calling_kfunc(u32 btf_id);
+static bool is_async_callback_calling_kfunc(u32 btf_id);
+static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn);

+static bool is_bpf_timer_set_sleepable_cb_impl_kfunc(u32 btf_id);
+
static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_for_each_map_elem ||
@@ -530,7 +534,8 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)

static bool is_async_callback_calling_insn(struct bpf_insn *insn)
{
- return bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm);
+ return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm)) ||
+ (bpf_pseudo_kfunc_call(insn) && is_async_callback_calling_kfunc(insn->imm));
}

static bool is_may_goto_insn(struct bpf_insn *insn)
@@ -9475,7 +9480,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
*/
env->subprog_info[subprog].is_cb = true;
if (bpf_pseudo_kfunc_call(insn) &&
- !is_sync_callback_calling_kfunc(insn->imm)) {
+ !is_callback_calling_kfunc(insn->imm)) {
verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
func_id_name(insn->imm), insn->imm);
return -EFAULT;
@@ -10984,6 +10989,7 @@ enum special_kfunc_type {
KF_bpf_percpu_obj_drop_impl,
KF_bpf_throw,
KF_bpf_iter_css_task_new,
+ KF_bpf_timer_set_sleepable_cb_impl,
};

BTF_SET_START(special_kfunc_set)
@@ -11010,6 +11016,7 @@ BTF_ID(func, bpf_throw)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#endif
+BTF_ID(func, bpf_timer_set_sleepable_cb_impl)
BTF_SET_END(special_kfunc_set)

BTF_ID_LIST(special_kfunc_list)
@@ -11040,6 +11047,7 @@ BTF_ID(func, bpf_iter_css_task_new)
#else
BTF_ID_UNUSED
#endif
+BTF_ID(func, bpf_timer_set_sleepable_cb_impl)

static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -11368,12 +11376,28 @@ static bool is_sync_callback_calling_kfunc(u32 btf_id)
return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
}

+static bool is_async_callback_calling_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_timer_set_sleepable_cb_impl];
+}
+
static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
{
return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
insn->imm == special_kfunc_list[KF_bpf_throw];
}

+static bool is_bpf_timer_set_sleepable_cb_impl_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_timer_set_sleepable_cb_impl];
+}
+
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+ return is_sync_callback_calling_kfunc(btf_id) ||
+ is_async_callback_calling_kfunc(btf_id);
+}
+
static bool is_rbtree_lock_required_kfunc(u32 btf_id)
{
return is_bpf_rbtree_api_kfunc(btf_id);
@@ -12157,6 +12181,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}

+ if (is_bpf_timer_set_sleepable_cb_impl_kfunc(meta.func_id)) {
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ set_timer_callback_state);
+ if (err) {
+ verbose(env, "kfunc %s#%d failed callback verification\n",
+ func_name, meta.func_id);
+ return err;
+ }
+ }
+
rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);

@@ -19559,6 +19593,28 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1;
+ } else if (is_bpf_timer_set_sleepable_cb_impl_kfunc(desc->func_id)) {
+ /* The verifier will process callback_fn as many times as necessary
+ * with different maps and the register states prepared by
+ * set_timer_callback_state will be accurate.
+ *
+ * The following use case is valid:
+ * map1 is shared by prog1, prog2, prog3.
+ * prog1 calls bpf_timer_init for some map1 elements
+ * prog2 calls bpf_timer_set_callback for some map1 elements.
+ * Those that were not bpf_timer_init-ed will return -EINVAL.
+ * prog3 calls bpf_timer_start for some map1 elements.
+ * Those that were not both bpf_timer_init-ed and
+ * bpf_timer_set_callback-ed will return -EINVAL.
+ */
+ struct bpf_insn ld_addrs[2] = {
+ BPF_LD_IMM64(BPF_REG_3, (long)env->prog->aux),
+ };
+
+ insn_buf[0] = ld_addrs[0];
+ insn_buf[1] = ld_addrs[1];
+ insn_buf[2] = *insn;
+ *cnt = 3;
}
return 0;
}

--
2.44.0


2024-04-08 08:11:35

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 4/6] bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable

Now that we have bpf_timer_set_sleepable_cb() available and working, we
can tag the attached callback as sleepable and let the verifier check
calls and kfuncs in the correct context.
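
The practical effect is that a callback registered via
bpf_timer_set_sleepable_cb() may call sleepable kfuncs even when the
program arming the timer is not itself sleepable. A minimal sketch,
using the test kfunc added in patch 6/6:

	/* verified with in_sleepable() == true: the KF_SLEEPABLE kfunc
	 * call below is accepted here, while the same call from the
	 * non-sleepable program body would be rejected
	 */
	static int timer_cb_sleepable(void *map, int *key, struct bpf_timer *timer)
	{
		bpf_kfunc_call_test_sleepable();
		return 0;
	}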

Signed-off-by: Benjamin Tissoires <[email protected]>

---

no changes in v6

no changes in v5

changes in v4:
- use a function parameter to forward the sleepable information

new in v3 (split from v2 02/10)
---
include/linux/bpf_verifier.h | 1 +
kernel/bpf/verifier.c | 13 ++++++++++---
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7cb1b75eee38..14e4ee67b694 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,7 @@ struct bpf_verifier_state {
* while they are still in use.
*/
bool used_as_loop_entry;
+ bool in_sleepable;

/* first and last insn idx of this verifier state */
u32 first_insn_idx;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 00ac3a3a5f01..b75a8aca6ec9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1434,6 +1434,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
}
dst_state->speculative = src->speculative;
dst_state->active_rcu_lock = src->active_rcu_lock;
+ dst_state->in_sleepable = src->in_sleepable;
dst_state->curframe = src->curframe;
dst_state->active_lock.ptr = src->active_lock.ptr;
dst_state->active_lock.id = src->active_lock.id;
@@ -2407,7 +2408,7 @@ static void init_func_state(struct bpf_verifier_env *env,
/* Similar to push_stack(), but for async callbacks */
static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
int insn_idx, int prev_insn_idx,
- int subprog)
+ int subprog, bool is_sleepable)
{
struct bpf_verifier_stack_elem *elem;
struct bpf_func_state *frame;
@@ -2434,6 +2435,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
* Initialize it similar to do_check_common().
*/
elem->st.branches = 1;
+ elem->st.in_sleepable = is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL);
if (!frame)
goto err;
@@ -5279,7 +5281,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,

static bool in_sleepable(struct bpf_verifier_env *env)
{
- return env->prog->sleepable;
+ return env->prog->sleepable ||
+ (env->cur_state && env->cur_state->in_sleepable);
}

/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
@@ -9497,7 +9500,8 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
/* there is no real recursion here. timer callbacks are async */
env->subprog_info[subprog].is_async_cb = true;
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
- insn_idx, subprog);
+ insn_idx, subprog,
+ is_bpf_timer_set_sleepable_cb_impl_kfunc(insn->imm));
if (!async_cb)
return -EFAULT;
callee = async_cb->frame[0];
@@ -16947,6 +16951,9 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_rcu_lock != cur->active_rcu_lock)
return false;

+ if (old->in_sleepable != cur->in_sleepable)
+ return false;
+
/* for states to be equal callsites have to be the same
* and all frame states need to be equivalent
*/

--
2.44.0


2024-04-08 08:12:17

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 5/6] tools: sync include/uapi/linux/bpf.h

cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h

Signed-off-by: Benjamin Tissoires <[email protected]>

---

updated in v6

no changes in v5

new in v4
---
tools/include/uapi/linux/bpf.h | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bf80b614c4db..f1890eed213a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1662,9 +1662,10 @@ union bpf_attr {
} query;

struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
- __u64 name;
- __u32 prog_fd;
- __aligned_u64 cookie;
+ __u64 name;
+ __u32 prog_fd;
+ __u32 :32;
+ __aligned_u64 cookie;
} raw_tracepoint;

struct { /* anonymous struct for BPF_BTF_LOAD */
@@ -7468,6 +7469,19 @@ enum {
BPF_F_TIMER_CPU_PIN = (1ULL << 1),
};

+/* Extra flags to control bpf_timer_init() behaviour, in addition to CLOCK_*.
+ * - BPF_F_TIMER_SLEEPABLE: Timer will run in a workqueue in a sleepable
+ * context, with no guarantees of ordering nor timing (consider this as
+ * being just offloaded immediately).
+ */
+enum {
+ /* CLOCK_* are using bits 0 to 3 */
+ BPF_F_TIMER_SLEEPABLE = (1ULL << 4),
+ __MAX_BPF_F_TIMER_INIT,
+};
+
+#define MAX_BPF_F_TIMER_INIT __MAX_BPF_F_TIMER_INIT
+
/* BPF numbers iterator state */
struct bpf_iter_num {
/* opaque iterator state; having __u64 here allows to preserve correct

--
2.44.0


2024-04-08 08:13:13

by Benjamin Tissoires

Subject: [PATCH RFC bpf-next v6 6/6] selftests/bpf: add sleepable timer tests

bpf_experimental.h and ../bpf_testmod/bpf_testmod_kfunc.h both include
vmlinux.h, which is not compatible with including time.h or
bpf_tcp_helpers.h.

So keep the sleepable tests in a separate bpf source file.

The first correct test is run twice for convenience:
- first through RUN_TESTS
- then we ensure that the timer was actually executed, in a manual
load/attach/run

Signed-off-by: Benjamin Tissoires <[email protected]>

---

changes in v6:
- use of new _init API with BPF_F_TIMER_SLEEPABLE

changes in v5b:
- do not use SEC(syscall) as they are run in a sleepable context
already
- duplicate test_call_sleepable() so we can test if the callback gets
called

changes in v5:
- keep sleepable test separate

new in v4
---
tools/testing/selftests/bpf/bpf_experimental.h | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 5 +
.../selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h | 1 +
tools/testing/selftests/bpf/prog_tests/timer.c | 34 ++++
.../testing/selftests/bpf/progs/timer_sleepable.c | 213 +++++++++++++++++++++
5 files changed, 258 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index a5b9df38c162..06ed98b3bc51 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -459,4 +459,9 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;

+extern int bpf_timer_set_sleepable_cb_impl(struct bpf_timer *timer,
+ int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
+ void *aux__ign) __ksym;
+#define bpf_timer_set_sleepable_cb(timer, cb) \
+ bpf_timer_set_sleepable_cb_impl(timer, cb, NULL)
#endif
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 39ad96a18123..eb2b78552ca2 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -494,6 +494,10 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused
return arg;
}

+__bpf_kfunc void bpf_kfunc_call_test_sleepable(void)
+{
+}
+
BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids)
BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc)
BTF_ID_FLAGS(func, bpf_kfunc_call_test1)
@@ -520,6 +524,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_sleepable, KF_SLEEPABLE)
BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids)

static int bpf_testmod_ops_init(struct btf *btf)
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
index 7c664dd61059..ce5cd763561c 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h
@@ -96,6 +96,7 @@ void bpf_kfunc_call_test_pass2(struct prog_test_pass2 *p) __ksym;
void bpf_kfunc_call_test_mem_len_fail2(__u64 *mem, int len) __ksym;

void bpf_kfunc_call_test_destructive(void) __ksym;
+void bpf_kfunc_call_test_sleepable(void) __ksym;

void bpf_kfunc_call_test_offset(struct prog_test_ref_kfunc *p);
struct prog_test_member *bpf_kfunc_call_memb_acquire(void);
diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
index d66687f1ee6a..c6c7c623b31c 100644
--- a/tools/testing/selftests/bpf/prog_tests/timer.c
+++ b/tools/testing/selftests/bpf/prog_tests/timer.c
@@ -3,6 +3,7 @@
#include <test_progs.h>
#include "timer.skel.h"
#include "timer_failure.skel.h"
+#include "timer_sleepable.skel.h"

#define NUM_THR 8

@@ -95,3 +96,36 @@ void serial_test_timer(void)

RUN_TESTS(timer_failure);
}
+
+/* isolate sleepable tests from main timer tests
+ * because if test_timer fails, it spams the console
+ * with 10000 * 5 "spin_lock_thread:PASS:test_run_opts retval 0 nsec"
+ */
+void serial_test_sleepable_timer(void)
+{
+ struct timer_sleepable *timer_sleepable_skel = NULL;
+ int err, prog_fd;
+
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ RUN_TESTS(timer_sleepable);
+
+ /* re-run the success test to check if the timer was actually executed */
+
+ timer_sleepable_skel = timer_sleepable__open_and_load();
+ if (!ASSERT_OK_PTR(timer_sleepable_skel, "timer_sleepable_skel_load"))
+ return;
+
+ err = timer_sleepable__attach(timer_sleepable_skel);
+ if (!ASSERT_OK(err, "timer_sleepable_attach"))
+ return;
+
+ prog_fd = bpf_program__fd(timer_sleepable_skel->progs.test_syscall_sleepable);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(topts.retval, 0, "test_run");
+
+ usleep(50); /* 10 usecs should be enough, but give it extra */
+
+ ASSERT_EQ(timer_sleepable_skel->bss->ok_sleepable, 1, "ok_sleepable");
+}
diff --git a/tools/testing/selftests/bpf/progs/timer_sleepable.c b/tools/testing/selftests/bpf/progs/timer_sleepable.c
new file mode 100644
index 000000000000..fc7829d8b6c4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/timer_sleepable.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook
+ * Copyright (c) 2024 Benjamin Tissoires
+ */
+
+#include "bpf_experimental.h"
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "../bpf_testmod/bpf_testmod_kfunc.h"
+
+char _license[] SEC("license") = "GPL";
+
+#define CLOCK_MONOTONIC 1
+
+struct elem {
+ struct bpf_timer t;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, 3);
+ __type(key, int);
+ __type(value, struct elem);
+} timer_map SEC(".maps");
+
+__u32 ok_sleepable;
+
+/* callback for sleepable timer */
+static int timer_cb_sleepable(void *map, int *key, struct bpf_timer *timer)
+{
+ bpf_kfunc_call_test_sleepable();
+ ok_sleepable |= (1 << *key);
+ return 0;
+}
+
+SEC("tc")
+/* check that a timer initialized with BPF_F_TIMER_SLEEPABLE and armed
+ * with a sleepable callback works
+ */
+__retval(0)
+long test_call_sleepable(void *ctx)
+{
+ int key = 0;
+ struct bpf_timer *timer;
+
+ if (ok_sleepable & 1)
+ return -1;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (timer) {
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE) != 0)
+ return -2;
+ bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable);
+ if (bpf_timer_start(timer, 0, 0))
+ return -3;
+ } else {
+ return -4;
+ }
+
+ return 0;
+}
+
+SEC("syscall")
+/* check that a timer initialized with BPF_F_TIMER_SLEEPABLE and armed
+ * with a sleepable callback also works from a syscall program.
+ */
+__retval(0)
+long test_syscall_sleepable(void *ctx)
+{
+ int key = 0;
+ struct bpf_timer *timer;
+
+ if (ok_sleepable & 1)
+ return -1;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (timer) {
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE) != 0)
+ return -2;
+ bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable);
+ if (bpf_timer_start(timer, 0, 0))
+ return -3;
+ } else {
+ return -4;
+ }
+
+ return 0;
+}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that bpf_timer_set_callback() cannot be called with a
+ * sleepable callback
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_kfunc_call_test_sleepable#") /* anchor message */
+__msg("program must be sleepable to call sleepable kfunc bpf_kfunc_call_test_sleepable")
+long test_non_sleepable_sleepable_callback(void *ctx)
+{
+ int key = 0;
+ struct bpf_timer *timer;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (timer) {
+ bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE);
+ bpf_timer_set_callback(timer, timer_cb_sleepable);
+ bpf_timer_start(timer, 0, 0);
+ }
+
+ return 0;
+}
+
+SEC("tc")
+/* check that calling bpf_timer_set_sleepable_cb() on a timer that was
+ * not initialized with BPF_F_TIMER_SLEEPABLE returns -EINVAL
+ */
+__retval(3)
+long test_call_sleepable_missing_flag(void *ctx)
+{
+ int key = 1;
+ struct bpf_timer *timer;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer)
+ return 1;
+
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC))
+ return 2;
+
+ if (bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable))
+ return 3;
+
+ return bpf_timer_start(timer, 0, 0);
+}
+
+SEC("tc")
+/* check that calling bpf_timer_start() with a non-zero delay on a
+ * sleepable timer returns -EINVAL
+ */
+__retval(-22)
+long test_call_sleepable_delay(void *ctx)
+{
+ int key = 2;
+ struct bpf_timer *timer;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer)
+ return 1;
+
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
+ return 2;
+
+ if (bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable))
+ return 3;
+
+ return bpf_timer_start(timer, 1, 0);
+}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that the first argument of bpf_timer_set_sleepable_cb()
+ * is a correct bpf_timer pointer.
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_timer_set_sleepable_cb_impl#") /* anchor message */
+__msg("arg#0 doesn't point to a map value")
+long test_wrong_pointer(void *ctx)
+{
+ int key = 0;
+ struct bpf_timer *timer;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer)
+ return 1;
+
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
+ return 2;
+
+ if (bpf_timer_set_sleepable_cb((void *)&timer, timer_cb_sleepable))
+ return 3;
+
+ return -22;
+}
+
+SEC("?tc")
+__log_level(2)
+__failure
+/* check that the first argument of bpf_timer_set_sleepable_cb()
+ * is a correct bpf_timer pointer.
+ */
+__msg("mark_precise: frame0: regs=r1 stack= before")
+__msg(": (85) call bpf_timer_set_sleepable_cb_impl#") /* anchor message */
+__msg("off 1 doesn't point to 'struct bpf_timer' that is at 0")
+long test_wrong_pointer_offset(void *ctx)
+{
+ int key = 0;
+ struct bpf_timer *timer;
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer)
+ return 1;
+
+ if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
+ return 2;
+
+ if (bpf_timer_set_sleepable_cb((void *)timer + 1, timer_cb_sleepable))
+ return 3;
+
+ return -22;
+}

--
2.44.0


2024-04-08 14:25:15

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 3/6] bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc

On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
[...]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index fd05d4358b31..d6528359b3f4 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c

[...]

> @@ -2726,6 +2764,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> BTF_ID_FLAGS(func, bpf_dynptr_size)
> BTF_ID_FLAGS(func, bpf_dynptr_clone)
> +BTF_ID_FLAGS(func, bpf_timer_set_sleepable_cb_impl)

Note:
this hunk does not apply cleanly on top of the current master.
The line 'BTF_ID_FLAGS(func, bpf_modify_return_test_tp)'
was added to the list since the last time the current patch set was merged.

> BTF_KFUNCS_END(common_btf_ids)
>
> static const struct btf_kfunc_id_set common_kfunc_set = {

2024-04-08 17:08:45

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:

[...]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9234174ccb21..fd05d4358b31 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
> * freeing the timers when inner map is replaced or deleted by user space.
> */
> struct bpf_hrtimer {
> - struct hrtimer timer;
> + union {
> + struct hrtimer timer;
> + struct work_struct work;
> + };
> struct bpf_map *map;
> struct bpf_prog *prog;
> void __rcu *callback_fn;
> void *value;
> - struct rcu_head rcu;
> + union {
> + struct rcu_head rcu;
> + struct work_struct sync_work;

Nit:
I find this name very confusing, the field is used to cancel timer
execution, is it a convention to call such things '...sync...'?

> + };
> + u64 flags;
> };
>

[...]

> +static void bpf_timer_sync_work_cb(struct work_struct *work)
> +{
> + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
> +
> + cancel_work_sync(&t->work);
> +
> + kfree_rcu(t, rcu);

Sorry, I might be wrong, but this looks suspicious.
The 'rcu' field of 'bpf_hrtimer' is defined as follows:

struct bpf_hrtimer {
...
union {
struct rcu_head rcu;
struct work_struct sync_work;
};
...
};

And for sleepable timers the 'sync_work' field is set as follows:

BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
u64, flags)
{
...
INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb);
...
}

So, it looks like 'kfree_rcu' would be called for a non-rcu pointer.

> +}
> +



2024-04-08 17:18:08

by Benjamin Tissoires

Subject: Re: [PATCH RFC bpf-next v6 3/6] bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc

On Mon, Apr 8, 2024 at 4:31 PM Eduard Zingerman <[email protected]> wrote:
>
> On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> [...]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index fd05d4358b31..d6528359b3f4 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
>
> [...]
>
> > @@ -2726,6 +2764,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> > BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> > BTF_ID_FLAGS(func, bpf_dynptr_size)
> > BTF_ID_FLAGS(func, bpf_dynptr_clone)
> > +BTF_ID_FLAGS(func, bpf_timer_set_sleepable_cb_impl)
>
> Note:
> this hunk does not apply cleanly on top of current master.
> The line 'BTF_ID_FLAGS(func, bpf_modify_return_test_tp)'
> was added to the list since last time current patch-set was merged.


Oops, thanks for the update.

Just to be clear, I already mentioned it in the cover letter, but this
series is not intended to be merged just now (thus RFC again). The
plan is to add a new bpf_wq API on the side and compare it with this
v6 to see which one is best, because I am trying to force the
workqueue API into a timer while the two are drifting further and
further apart.

Cheers,
Benjamin


>
>
> > BTF_KFUNCS_END(common_btf_ids)
> >
> > static const struct btf_kfunc_id_set common_kfunc_set = {
>


2024-04-08 17:20:54

by Benjamin Tissoires

Subject: Re: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

On Mon, Apr 8, 2024 at 7:08 PM Eduard Zingerman <[email protected]> wrote:
>
> On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
>
> [...]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 9234174ccb21..fd05d4358b31 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
> > * freeing the timers when inner map is replaced or deleted by user space.
> > */
> > struct bpf_hrtimer {
> > - struct hrtimer timer;
> > + union {
> > + struct hrtimer timer;
> > + struct work_struct work;
> > + };
> > struct bpf_map *map;
> > struct bpf_prog *prog;
> > void __rcu *callback_fn;
> > void *value;
> > - struct rcu_head rcu;
> > + union {
> > + struct rcu_head rcu;
> > + struct work_struct sync_work;
>
> Nit:
> I find this name very confusing, the field is used to cancel timer
> execution, is it a convention to call such things '...sync...'?
>
> > + };
> > + u64 flags;
> > };
> >
>
> [...]
>
> > +static void bpf_timer_sync_work_cb(struct work_struct *work)
> > +{
> > + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
> > +
> > + cancel_work_sync(&t->work);
> > +
> > + kfree_rcu(t, rcu);
>
> Sorry, I might be wrong, but this looks suspicious.
> The 'rcu' field of 'bpf_hrtimer' is defined as follows:
>
> struct bpf_hrtimer {
> ...
> union {
> struct rcu_head rcu;
> struct work_struct sync_work;
> };
> ...
> };
>
> And for sleepable timers the 'sync_work' field is set as follows:
>
> BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
> u64, flags)
> {
> ...
> INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb);
> ...
> }
>
> So, it looks like 'kfree_rcu' would be called for a non-rcu pointer.

That was my initial assumption too, but Alexei told me it was fine.
And I think he is correct because kfree_rcu doesn't need the rcu_head
to be initialized.

So in the end, we initialize the memory as a work_struct, and when
that work kicks in, we reuse that exact same memory as the rcu_head.
This is fine because that work will never be reused.

If I understand correctly, this is to save a few bytes, as this is a
critical struct used in high-rate programs, and every byte counts.
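
In code terms, the lifecycle is (sketch of the teardown path from
patch 1/6):

	/* bpf_timer_cancel_and_free() cannot sleep, so it only does
	 * schedule_work(&t->sync_work) and returns; at that point the
	 * union memory is still being used as a work_struct
	 */
	static void bpf_timer_sync_work_cb(struct work_struct *work)
	{
		struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);

		/* sleepable context now: wait for the callback to finish */
		cancel_work_sync(&t->work);
		/* the same bytes are reused as the rcu_head; fine because
		 * kfree_rcu() does not require an initialized rcu_head and
		 * sync_work will never run again
		 */
		kfree_rcu(t, rcu);
	}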

Cheers,
Benjamin

>
> > +}
> > +
>
>


2024-04-08 21:18:19

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

On Mon, 2024-04-08 at 19:20 +0200, Benjamin Tissoires wrote:

[...]

> That was my initial assumption too, but Alexei told me it was fine.
> And I think he is correct because kfree_rcu doesn't need the rcu_head
> to be initialized.
>
> So in the end, we initialize the memory as a work_struct, and when
> that work kicks in, we reuse that exact same memory as the rcu_head.
> This is fine because that work will never be reused.

Oh, I get it, thank you for explanation.

Thanks,
Eduard

2024-04-08 21:56:08

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 2/6] bpf: Add support for KF_ARG_PTR_TO_TIMER

On Mon, 2024-04-08 at 10:09 +0200, [email protected] wrote:
> From: Kumar Kartikeya Dwivedi <[email protected]>
>
> Introduce support for KF_ARG_PTR_TO_TIMER. The kfuncs will use bpf_timer
> as argument and that will be recognized as timer argument by verifier.
> bpf_timer_kern casting can happen inside kfunc, but using bpf_timer in
> argument makes life easier for users who work with non-kern type in BPF
> progs.
>
> Fix up process_timer_func's meta argument usage (ignore if NULL) so that
> we can share the same checks for helpers and kfuncs. meta argument is
> only needed to ensure bpf_timer_init's timer and map arguments are
> coming from the same map (map_uid logic is necessary for correct
> inner-map handling).
>
> No such concerns will be necessary for kfuncs as timer initialization
> happens using helpers, hence pass NULL to process_timer_func from kfunc
> argument handling code to ignore it.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---

Acked-by: Eduard Zingerman <[email protected]>

2024-04-08 22:36:14

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 4/6] bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable

On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> Now that we have bpf_timer_set_sleepable_cb() available and working, we
> can tag the attached callback as sleepable, and let the verifier check
> in the correct context the calls and kfuncs.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---

I think this patch is fine with one nit regarding in_sleepable().
Acked-by: Eduard Zingerman <[email protected]>

> @@ -5279,7 +5281,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>
> static bool in_sleepable(struct bpf_verifier_env *env)
> {
> - return env->prog->sleepable;
> + return env->prog->sleepable ||
> + (env->cur_state && env->cur_state->in_sleepable);
> }

Sorry, I already raised this before.
As far as I understand the 'env->cur_state' check is needed because
this function is used from do_misc_fixups():

if (is_storage_get_function(insn->imm)) {
if (!in_sleepable(env) ||
env->insn_aux_data[i + delta].storage_get_func_atomic)
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
else
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
insn_buf[1] = *insn;
cnt = 2;
...
}

For a timer callback function 'env->prog->sleepable' would be false,
which means that inside a sleepable callback function GFP_ATOMIC would
be used in cases where GFP_KERNEL would be sufficient.
An alternative would be to check (and set) sleepable flag not for a
full program but for a subprogram.

Whether or not this is something worth addressing I don't know.

[...]

2024-04-08 23:01:20

by Eduard Zingerman

Subject: Re: [PATCH RFC bpf-next v6 6/6] selftests/bpf: add sleepable timer tests

On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> bpf_experimental.h and ../bpf_testmod/bpf_testmod_kfunc.h are both
> including vmlinux.h, which is not compatible with including time.h
> or bpf_tcp_helpers.h.
>
> So keep sleepable tests in a separate bpf source file.
>
> The first correct test is run twice for convenience:
> - first through RUN_TESTS
> - then we ensure that the timer was actually executed, in a manual
> load/attach/run
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---

Acked-by: Eduard Zingerman <[email protected]>
(With a few nitpicks)

> diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c
> index d66687f1ee6a..c6c7c623b31c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/timer.c
> +++ b/tools/testing/selftests/bpf/prog_tests/timer.c

[...]

> +void serial_test_sleepable_timer(void)
> +{
> + struct timer_sleepable *timer_sleepable_skel = NULL;
> + int err, prog_fd;
> +
> + LIBBPF_OPTS(bpf_test_run_opts, topts);
> +
> + RUN_TESTS(timer_sleepable);
> +
> + /* re-run the success test to check if the timer was actually executed */
> +
> + timer_sleepable_skel = timer_sleepable__open_and_load();
> + if (!ASSERT_OK_PTR(timer_sleepable_skel, "timer_sleepable_skel_load"))
> + return;
> +
> + err = timer_sleepable__attach(timer_sleepable_skel);
> + if (!ASSERT_OK(err, "timer_sleepable_attach"))
> + return;

Nit: this should call timer_sleepable__destroy();

> +
> + prog_fd = bpf_program__fd(timer_sleepable_skel->progs.test_syscall_sleepable);
> + err = bpf_prog_test_run_opts(prog_fd, &topts);
> + ASSERT_OK(err, "test_run");
> + ASSERT_EQ(topts.retval, 0, "test_run");
> +
> + usleep(50); /* 10 usecs should be enough, but give it extra */
> +
> + ASSERT_EQ(timer_sleepable_skel->bss->ok_sleepable, 1, "ok_sleepable");

Nit: same as above.

> +}
> diff --git a/tools/testing/selftests/bpf/progs/timer_sleepable.c b/tools/testing/selftests/bpf/progs/timer_sleepable.c
> new file mode 100644
> index 000000000000..fc7829d8b6c4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/timer_sleepable.c

[...]

> +SEC("tc")
> +/* check that calling bpf_timer_start() with BPF_F_TIMER_SLEEPABLE on a sleepable
> + * callback works
> + */
> +__retval(0)
> +long test_call_sleepable(void *ctx)
> +{
> + int key = 0;
> + struct bpf_timer *timer;
> +
> + if (ok_sleepable & 1)
> + return -1;
> +
> + timer = bpf_map_lookup_elem(&timer_map, &key);
> + if (timer) {
> + if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE) != 0)
> + return -2;
> + bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable);
> + if (bpf_timer_start(timer, 0, 0))
> + return -3;
> + } else {
> + return -4;
> + }
> +
> + return 0;
> +}
> +
> +SEC("syscall")
> +/* check that calling bpf_timer_start() with BPF_F_TIMER_SLEEPABLE on a sleepable
> + * callback works.
> + */
> +__retval(0)
> +long test_syscall_sleepable(void *ctx)
> +{

Nit: the body of this function is the same as in test_call_sleepable(),
maybe factor it out as an auxiliary static function? A sketch follows
the quoted body below.
(also, should these tests use different 'key' ?)

> + int key = 0;
> + struct bpf_timer *timer;
> +
> + if (ok_sleepable & 1)
> + return -1;
> +
> + timer = bpf_map_lookup_elem(&timer_map, &key);
> + if (timer) {
> + if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE) != 0)
> + return -2;
> + bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable);
> + if (bpf_timer_start(timer, 0, 0))
> + return -3;
> + } else {
> + return -4;
> + }
> +
> + return 0;
> +}
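
For illustration, a sketch of the factoring Eduard suggests; the helper
name start_sleepable_timer() and the use of two distinct keys are
hypothetical, the bodies come from the quoted diff:

	static long start_sleepable_timer(int key)
	{
		struct bpf_timer *timer;

		if (ok_sleepable & 1)
			return -1;

		timer = bpf_map_lookup_elem(&timer_map, &key);
		if (!timer)
			return -4;

		if (bpf_timer_init(timer, &timer_map,
				   CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE) != 0)
			return -2;
		bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable);
		if (bpf_timer_start(timer, 0, 0))
			return -3;

		return 0;
	}

	SEC("tc")
	__retval(0)
	long test_call_sleepable(void *ctx)
	{
		return start_sleepable_timer(0);
	}

	SEC("syscall")
	__retval(0)
	long test_syscall_sleepable(void *ctx)
	{
		return start_sleepable_timer(1);
	}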

[...]

> +SEC("tc")
> +/* check that calling bpf_timer_start() with a delay on a sleepable
> + * callback is returning -EINVAL
> + */
> +__retval(-22)
> +long test_call_sleepable_delay(void *ctx)
> +{
> + int key = 2;
> + struct bpf_timer *timer;
> +
> + timer = bpf_map_lookup_elem(&timer_map, &key);
> + if (!timer)
> + return 1;
> +
> + if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
> + return 2;
> +
> + if (bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable))
> + return 3;
> +
> + return bpf_timer_start(timer, 1, 0);

Q: should verifier statically check that 3rd parameter is zero for sleepable timers?
(same question for call to bpf_timer_set_sleepable_cb() with non-sleepable map)

[...]



2024-04-09 03:04:43

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 1/6] bpf/helpers: introduce sleepable bpf_timers

On Mon, Apr 8, 2024 at 10:20 AM Benjamin Tissoires
<[email protected]> wrote:
>
> On Mon, Apr 8, 2024 at 7:08 PM Eduard Zingerman <[email protected]> wrote:
> >
> > On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> >
> > [...]
> >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 9234174ccb21..fd05d4358b31 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -1096,12 +1096,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
> > > * freeing the timers when inner map is replaced or deleted by user space.
> > > */
> > > struct bpf_hrtimer {
> > > - struct hrtimer timer;
> > > + union {
> > > + struct hrtimer timer;
> > > + struct work_struct work;
> > > + };
> > > struct bpf_map *map;
> > > struct bpf_prog *prog;
> > > void __rcu *callback_fn;
> > > void *value;
> > > - struct rcu_head rcu;
> > > + union {
> > > + struct rcu_head rcu;
> > > + struct work_struct sync_work;
> >
> > Nit:
> > I find this name very confusing, the field is used to cancel timer
> > execution, is it a convention to call such things '...sync...'?
> >
> > > + };
> > > + u64 flags;
> > > };
> > >
> >
> > [...]
> >
> > > +static void bpf_timer_sync_work_cb(struct work_struct *work)
> > > +{
> > > + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);
> > > +
> > > + cancel_work_sync(&t->work);
> > > +
> > > + kfree_rcu(t, rcu);
> >
> > Sorry, I might be wrong, but this looks suspicious.
> > The 'rcu' field of 'bpf_hrtimer' is defined as follows:
> >
> > struct bpf_hrtimer {
> > ...
> > union {
> > struct rcu_head rcu;
> > struct work_struct sync_work;
> > };
> > ...
> > };
> >
> > And for sleepable timers the 'sync_work' field is set as follows:
> >
> > BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map,
> > u64, flags)
> > {
> > ...
> > INIT_WORK(&t->sync_work, bpf_timer_sync_work_cb);
> > ...
> > }
> >
> > So, it looks like 'kfree_rcu' would be called for a non-rcu pointer.
>
> That was my initial assumption too, but Alexei told me it was fine.
> And I think he is correct because kfree_rcu doesn't need the rcu_head
> to be initialized.
>
> So in the end, we initialize the memory as a work_struct, and when
> that work kicks in, we reuse that exact same memory as the rcu_head.
> This is fine because that work will never be reused.
>
> If I understand correctly, this is to save a few bytes as this is a
> critical struct used in programs with a high rate usage, and every
> byte counts.

Yes. All correct.
It probably makes sense to add a comment before the kfree_rcu() line
in bpf_timer_sync_work_cb() noting that kfree_rcu() will wait for
bpf_timer_cancel() to finish, as was done in commit 0281b919e175
("bpf: Fix racing between bpf_timer_cancel_and_free and
bpf_timer_cancel").

I suspect that's what confused Eduard.
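
For reference, a sketch of the commented version (the function body is
from patch 1 as quoted above; the comment wording is only a suggestion):

	static void bpf_timer_sync_work_cb(struct work_struct *work)
	{
		struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, sync_work);

		cancel_work_sync(&t->work);

		/* t->sync_work and t->rcu share a union, and this work item
		 * will never be reused, so its memory can serve as the
		 * rcu_head. kfree_rcu() also waits for a concurrent
		 * bpf_timer_cancel() to finish, as in commit 0281b919e175
		 * ("bpf: Fix racing between bpf_timer_cancel_and_free and
		 * bpf_timer_cancel").
		 */
		kfree_rcu(t, rcu);
	}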

Patch 1 looks great overall.

If we're going to keep this combined bpf_timer_* API for both wq
and timer we'd need to add a flags compatibility check to
bpf_timer_start() too.
We can disallow this flag in the 'flags' argument and use the one from
t->flags, which kinda makes sense as it keeps bpf_timer_start() less
verbose.
Alternatively we can allow bpf_timer_start() to take it, but then we'd
need to check that it is actually passed.
Either way the patch needs an extra check in bpf_timer_start();
just ignoring BPF_F_TIMER_SLEEPABLE in bpf_timer_start() doesn't look
right.
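
For example, the first option could be as small as the following
(hypothetical placement, next to the existing flags validation at the
top of bpf_timer_start()):

	/* the sleepable flag is fixed at bpf_timer_init() time and
	 * recorded in t->flags, so reject it in the per-call flags */
	if (flags & BPF_F_TIMER_SLEEPABLE)
		return -EINVAL;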

2024-04-09 03:14:26

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 4/6] bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable

On Mon, Apr 8, 2024 at 3:36 PM Eduard Zingerman <[email protected]> wrote:
>
> On Mon, 2024-04-08 at 10:09 +0200, Benjamin Tissoires wrote:
> > Now that we have bpf_timer_set_sleepable_cb() available and working, we
> > can tag the attached callback as sleepable, and let the verifier check
> > in the correct context the calls and kfuncs.
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> >
> > ---
>
> I think this patch is fine with one nit regarding in_sleepable().
> Acked-by: Eduard Zingerman <[email protected]>
>
> > @@ -5279,7 +5281,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >
> > static bool in_sleepable(struct bpf_verifier_env *env)
> > {
> > - return env->prog->sleepable;
> > + return env->prog->sleepable ||
> > + (env->cur_state && env->cur_state->in_sleepable);
> > }
>
> Sorry, I already raised this before.
> As far as I understand the 'env->cur_state' check is needed because
> this function is used from do_misc_fixups():
>
> if (is_storage_get_function(insn->imm)) {
> if (!in_sleepable(env) ||
> env->insn_aux_data[i + delta].storage_get_func_atomic)
> insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
> else
> insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
> insn_buf[1] = *insn;
> cnt = 2;
> ...
> }
>
> For a timer callback function 'env->prog->sleepable' would be false.
> Which means that inside sleepable callback function GFP_ATOMIC would
> be used in cases where GFP_KERNEL would be sufficient.
> An alternative would be to check (and set) sleepable flag not for a
> full program but for a subprogram.

At this point all subprograms are still part of the main program.
jit_subprogs() hasn't been called yet.
So there is only one 'prog' object.
So we can't really set prog->sleepable for the callback subprog.

But you've raised a good point.
We can remove "!in_sleepable(env)" part in do_misc_fixups() with:
- if (in_sleepable(env) && is_storage_get_function(func_id))
- 	env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
+ if (is_storage_get_function(func_id))
+ 	env->insn_aux_data[insn_idx].storage_get_func_atomic = !in_sleepable(env);

2024-04-09 03:15:50

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 5/6] tools: sync include/uapi/linux/bpf.h

On Mon, Apr 8, 2024 at 1:10 AM Benjamin Tissoires <[email protected]> wrote:
>
> cp include/uapi/linux/bpf.h tools/include/uapi/linux/bpf.h
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---
>
> updated in v6
>
> no changes in v5
>
> new in v4
> ---
> tools/include/uapi/linux/bpf.h | 20 +++++++++++++++++---
> 1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index bf80b614c4db..f1890eed213a 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1662,9 +1662,10 @@ union bpf_attr {
> } query;
>
> struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> - __u64 name;
> - __u32 prog_fd;
> - __aligned_u64 cookie;
> + __u64 name;
> + __u32 prog_fd;
> + __u32 :32;
> + __aligned_u64 cookie;
> } raw_tracepoint;

Something is wrong with your tree; there is no such issue in bpf-next.

2024-04-09 03:18:02

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 0/6] sleepable bpf_timer (was: allow HID-BPF to do device IOs)

On Mon, Apr 8, 2024 at 1:09 AM Benjamin Tissoires <[email protected]> wrote:
>
> New version of the sleepable bpf_timer code.
>
> I'm posting this as this is the result of the previous review, so we can
> have a baseline to compare to.
>
> The plan is now to introduce a new user API struct bpf_wq, as the timer
> API working on softIRQ seems to be quite far away from a wq.

Looking forward. I think it's almost there.

2024-04-09 03:23:38

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 6/6] selftests/bpf: add sleepable timer tests

On Mon, Apr 8, 2024 at 4:01 PM Eduard Zingerman <[email protected]> wrote:
>
> > +SEC("tc")
> > +/* check that calling bpf_timer_start() with a delay on a sleepable
> > + * callback is returning -EINVAL
> > + */
> > +__retval(-22)
> > +long test_call_sleepable_delay(void *ctx)
> > +{
> > + int key = 2;
> > + struct bpf_timer *timer;
> > +
> > + timer = bpf_map_lookup_elem(&timer_map, &key);
> > + if (!timer)
> > + return 1;
> > +
> > + if (bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC | BPF_F_TIMER_SLEEPABLE))
> > + return 2;
> > +
> > + if (bpf_timer_set_sleepable_cb(timer, timer_cb_sleepable))
> > + return 3;
> > +
> > + return bpf_timer_start(timer, 1, 0);
>
> Q: should verifier statically check that 3rd parameter is zero for sleepable timers?
> (same question for call to bpf_timer_set_sleepable_cb() with non-sleepable map)

It can, but that sounds like more work for the verifier, which gives
more reason to use new kfuncs and a clean start with bpf_wq.
A rough sketch of what such a check would involve follows below.
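
For illustration only, a rough sketch of what such a static check might
look like in check_helper_call() (hypothetical; the harder part is that
the verifier would also need to know the timer was initialized as
sleepable, which is map state rather than register state):

	struct bpf_reg_state *regs = cur_regs(env);
	struct bpf_reg_state *flags_reg = &regs[BPF_REG_3];

	/* require the flags argument of bpf_timer_start() to be a
	 * statically-known zero for sleepable timers */
	if (!tnum_is_const(flags_reg->var_off) || flags_reg->var_off.value != 0) {
		verbose(env, "sleepable timer requires constant zero flags\n");
		return -EINVAL;
	}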

2024-04-12 08:14:58

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [PATCH RFC bpf-next v6 2/6] bpf: Add support for KF_ARG_PTR_TO_TIMER

On Apr 08 2024, [email protected] wrote:
> From: Kumar Kartikeya Dwivedi <[email protected]>
>
> Introduce support for KF_ARG_PTR_TO_TIMER. The kfuncs will use bpf_timer
> as argument and that will be recognized as timer argument by verifier.
> bpf_timer_kern casting can happen inside kfunc, but using bpf_timer in
> argument makes life easier for users who work with non-kern type in BPF
> progs.
>
> Fix up process_timer_func's meta argument usage (ignore if NULL) so that
> we can share the same checks for helpers and kfuncs. meta argument is
> only needed to ensure bpf_timer_init's timer and map arguments are
> coming from the same map (map_uid logic is necessary for correct
> inner-map handling).
>
> No such concerns will be necessary for kfuncs as timer initialization
> happens using helpers, hence pass NULL to process_timer_func from kfunc
> argument handling code to ignore it.
>
> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---
>
> changes in v6:
> - used Kumar's version of the patch
> - reverted `+BTF_ID(struct, bpf_timer_kern)`

My bad. While working on bpf_wq I realized I shouldn't have touched this
part. See below:

>
> changes in v5:
> - also check for the reg offset
>
> changes in v4:
> - enforce KF_ARG_PTR_TO_TIMER to be of type PTR_TO_MAP_VALUE
>
> new in v3 (split from v2 02/10)
> ---
> kernel/bpf/verifier.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ca6cacf7b42f..ccfe9057d8dc 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7568,12 +7568,16 @@ static int process_timer_func(struct bpf_verifier_env *env, int regno,
> val + reg->off, map->record->timer_off);
> return -EINVAL;
> }
> + /* meta is only needed for bpf_timer_init to match timer and map */
> + if (!meta)
> + goto out;
> if (meta->map_ptr) {
> verbose(env, "verifier bug. Two map pointers in a timer helper\n");
> return -EFAULT;
> }
> meta->map_uid = reg->map_uid;
> meta->map_ptr = map;
> +out:
> return 0;
> }
>
> @@ -10826,6 +10830,7 @@ enum {
> KF_ARG_LIST_NODE_ID,
> KF_ARG_RB_ROOT_ID,
> KF_ARG_RB_NODE_ID,
> + KF_ARG_TIMER_ID,
> };
>
> BTF_ID_LIST(kf_arg_btf_ids)
> @@ -10834,6 +10839,7 @@ BTF_ID(struct, bpf_list_head)
> BTF_ID(struct, bpf_list_node)
> BTF_ID(struct, bpf_rb_root)
> BTF_ID(struct, bpf_rb_node)
> +BTF_ID(struct, bpf_timer_kern)

As Kumar originally put it, this should be BTF_ID(struct, bpf_timer),
and he explained everything in the commit message. I was just too dumb
to understand it properly.

(Adding a comment here in case we want to extend the bpf_timer API in
the future, in which case this patch will be useful).
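
i.e., the one-line fixup implied above:

	-BTF_ID(struct, bpf_timer_kern)
	+BTF_ID(struct, bpf_timer)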

Cheers,
Benjamin

>
> static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
> const struct btf_param *arg, int type)
> @@ -10877,6 +10883,11 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
> return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
> }
>
> +static bool is_kfunc_arg_timer(const struct btf *btf, const struct btf_param *arg)
> +{
> + return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_TIMER_ID);
> +}
> +
> static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
> const struct btf_param *arg)
> {
> @@ -10946,6 +10957,7 @@ enum kfunc_ptr_arg_type {
> KF_ARG_PTR_TO_NULL,
> KF_ARG_PTR_TO_CONST_STR,
> KF_ARG_PTR_TO_MAP,
> + KF_ARG_PTR_TO_TIMER,
> };
>
> enum special_kfunc_type {
> @@ -11102,6 +11114,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> if (is_kfunc_arg_map(meta->btf, &args[argno]))
> return KF_ARG_PTR_TO_MAP;
>
> + if (is_kfunc_arg_timer(meta->btf, &args[argno]))
> + return KF_ARG_PTR_TO_TIMER;
> +
> if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
> if (!btf_type_is_struct(ref_t)) {
> verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
> @@ -11735,6 +11750,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> case KF_ARG_PTR_TO_CALLBACK:
> case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
> case KF_ARG_PTR_TO_CONST_STR:
> + case KF_ARG_PTR_TO_TIMER:
> /* Trusted by default */
> break;
> default:
> @@ -12021,6 +12037,15 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> if (ret)
> return ret;
> break;
> + case KF_ARG_PTR_TO_TIMER:
> + if (reg->type != PTR_TO_MAP_VALUE) {
> + verbose(env, "arg#%d doesn't point to a map value\n", i);
> + return -EINVAL;
> + }
> + ret = process_timer_func(env, regno, NULL);
> + if (ret < 0)
> + return ret;
> + break;
> }
> }
>
>
> --
> 2.44.0
>