[Partly an RFC, partly a formal submission: there are still FIXMEs in the code]
[Also using bpf-next as the base tree for HID changes as there will
be conflicting changes otherwise, so I'm personally fine with the HID
commits going through bpf-next]
IMO, patches 1-3 and 9-14 are ready to go; the rest is still pending review.
For reference, the use cases I have in mind:
---
Basically, I need to be able to defer work from a HID-BPF program for the
following reasons (from the aforementioned patch):
1. defer an event:
Sometimes we receive an out-of-proximity event, but the device cannot
be trusted enough, and we need to ensure that we won't receive another
one in the following n milliseconds. So we need to wait those n
milliseconds, and then possibly re-inject that event into the stack.
2. inject new events in reaction to one given event:
We might want to transform one given event into several. This is the
case for macro keys, where a single key press is supposed to send
a sequence of key presses. But this could also be used to patch
faulty behavior, e.g. if a device forgets to send a release event.
3. communicate with the device in reaction to one event:
We might want to communicate back to the device after a given event.
For example, a device might send us an event saying that it came back
from its sleeping state and needs to be re-initialized.
Currently we can achieve that by keeping a userspace program around,
raising a bpf event, and letting that userspace program inject the
events and commands.
However, we are then keeping that program alive as a daemon just for
scheduling commands. There is no logic in it, so it doesn't really
justify an actual userspace wakeup, and a kernel workqueue seems
simpler to handle.
The other part I'm not sure about is whether we can say that BPF maps of
type queue/stack can be used in a sleepable context.
I don't see any warnings when running the test programs, but that's
probably not a guarantee that I'm doing things properly :)
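To make the intended flow concrete, here is a minimal sketch of the
pattern (condensed from the bpf_timer selftest later in the series; the
timer is assumed to have been bpf_timer_init()-ed beforehand, e.g. from
a syscall program, and error handling is elided):

struct test_report { __u8 data[6]; };
struct elem { struct bpf_timer t; };

struct {
	__uint(type, BPF_MAP_TYPE_QUEUE);
	__uint(max_entries, 8);
	__type(value, struct test_report);
} queue SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, u32);
	__type(value, struct elem);
} timer_map SEC(".maps");

/* runs from a workqueue, so sleepable kfuncs are allowed here */
static int timer_cb(void *map, int *key, struct bpf_timer *timer)
{
	struct hid_bpf_ctx *ctx;
	struct test_report buf;

	if (bpf_map_pop_elem(&queue, &buf))
		return 0;

	ctx = hid_bpf_allocate_context(*key);
	if (!ctx)
		return 0;

	/* device IO in reaction to the deferred event */
	hid_bpf_input_report(ctx, HID_INPUT_REPORT, buf.data, sizeof(buf.data));
	hid_bpf_release_context(ctx);
	return 0;
}

SEC("fmod_ret/hid_bpf_device_event")
int BPF_PROG(defer_event, struct hid_bpf_ctx *hctx)
{
	struct test_report buf = { .data = {2, 3, 4, 5, 6, 7} };
	int key = hctx->hid->id;
	struct bpf_timer *timer;

	/* IRQ context: stash the event and offload the actual work */
	bpf_map_push_elem(&queue, &buf, BPF_ANY);

	timer = bpf_map_lookup_elem(&timer_map, &key);
	if (!timer)
		return 0;

	bpf_timer_set_sleepable_cb(timer, timer_cb);
	bpf_timer_start(timer, 0, BPF_F_TIMER_SLEEPABLE);

	return -1; /* discard the original event */
}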
Cheers,
Benjamin
To: Alexei Starovoitov <[email protected]>
To: Daniel Borkmann <[email protected]>
To: John Fastabend <[email protected]>
To: Andrii Nakryiko <[email protected]>
To: Martin KaFai Lau <[email protected]>
To: Eduard Zingerman <[email protected]>
To: Song Liu <[email protected]>
To: Yonghong Song <[email protected]>
To: KP Singh <[email protected]>
To: Stanislav Fomichev <[email protected]>
To: Hao Luo <[email protected]>
To: Jiri Olsa <[email protected]>
To: Jiri Kosina <[email protected]>
To: Benjamin Tissoires <[email protected]>
To: Jonathan Corbet <[email protected]>
To: Shuah Khan <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>
---
Changes in v3:
- fixed the crash from v2
- changed the API to have only BPF_F_TIMER_SLEEPABLE for
bpf_timer_start()
- split the new kfuncs/verifier patch into several sub-patches, for
easier reviews
- Link to v2: https://lore.kernel.org/r/[email protected]
Changes in v2:
- make use of bpf_timer (and dropped the custom HID handling)
- implemented bpf_timer_set_sleepable_cb as a kfunc
- global subprogs still not implemented
- no sleepable bpf_timer selftests yet
- Link to v1: https://lore.kernel.org/r/[email protected]
---
Benjamin Tissoires (16):
bpf/verifier: allow more maps in sleepable bpf programs
bpf/verifier: introduce in_sleepable() helper
bpf/verifier: add is_async_callback_calling_insn() helper
bpf/helpers: introduce sleepable bpf_timers
bpf/verifier: add bpf_timer as a kfunc capable type
bpf/helpers: introduce bpf_timer_set_sleepable_cb() kfunc
bpf/helpers: mark the callback of bpf_timer_set_sleepable_cb() as sleepable
bpf/verifier: do_misc_fixups for is_bpf_timer_set_sleepable_cb_kfunc
HID: bpf/dispatch: regroup kfuncs definitions
HID: bpf: export hid_hw_output_report as a BPF kfunc
selftests/hid: Add test for hid_bpf_hw_output_report
HID: bpf: allow to inject HID event from BPF
selftests/hid: add tests for hid_bpf_input_report
HID: bpf: allow to use bpf_timer_set_sleepable_cb() in tracing callbacks.
selftests/hid: add test for bpf_timer
selftests/hid: add KASAN to the VM tests
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 232 ++++++++++++++-------
drivers/hid/hid-core.c | 2 +
include/linux/bpf_verifier.h | 2 +
include/linux/hid_bpf.h | 3 +
include/uapi/linux/bpf.h | 4 +
kernel/bpf/helpers.c | 140 +++++++++++--
kernel/bpf/verifier.c | 114 ++++++++--
tools/testing/selftests/hid/config.common | 1 +
tools/testing/selftests/hid/hid_bpf.c | 195 ++++++++++++++++-
tools/testing/selftests/hid/progs/hid.c | 198 ++++++++++++++++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 8 +
12 files changed, 795 insertions(+), 106 deletions(-)
---
base-commit: 5c331823b3fc52ffd27524bf5b7e0d137114f470
change-id: 20240205-hid-bpf-sleepable-c01260fd91c4
Best regards,
--
Benjamin Tissoires <[email protected]>
These 2 map types are required for HID-BPF when a user wants to do
IO with a device from a sleepable tracing point.
Allowing BPF_MAP_TYPE_QUEUE (and therefore BPF_MAP_TYPE_STACK) lets
a BPF program prepare, from an IRQ, the list of HID commands to send
back to the device; these commands can then be retrieved from the
sleepable trace point.
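As a sketch of the pattern this enables (struct hid_cmd, cmd and ctx
are illustrative, and the two fragments live in different programs):

struct hid_cmd { __u8 data[8]; };

struct {
	__uint(type, BPF_MAP_TYPE_QUEUE);
	__uint(max_entries, 8);
	__type(value, struct hid_cmd);
} commands SEC(".maps");

	/* in the IRQ-context program: stash the command */
	bpf_map_push_elem(&commands, &cmd, BPF_ANY);

	/* later, in the sleepable trace point: retrieve and send it */
	if (!bpf_map_pop_elem(&commands, &cmd))
		hid_bpf_hw_request(ctx, cmd.data, sizeof(cmd.data),
				   HID_FEATURE_REPORT, HID_REQ_SET_REPORT);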
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
changes in v2:
- dropped BPF_MAP_TYPE_PROG_ARRAY from the list
---
kernel/bpf/verifier.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 011d54a1dc53..88e9d2e4c29f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18022,6 +18022,8 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
case BPF_MAP_TYPE_SK_STORAGE:
case BPF_MAP_TYPE_TASK_STORAGE:
case BPF_MAP_TYPE_CGRP_STORAGE:
+ case BPF_MAP_TYPE_QUEUE:
+ case BPF_MAP_TYPE_STACK:
break;
default:
verbose(env,
--
2.43.0
No code change, but it'll allow us to have a single place to change
when we add in_sleepable to cur_state.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
changes in v2 (compared to the one attached to v1 0/9):
- dropped the cur_state flag, so it can be put first
---
kernel/bpf/verifier.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 88e9d2e4c29f..7a4b19bea2ac 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5255,6 +5255,11 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
return -EINVAL;
}
+static bool in_sleepable(struct bpf_verifier_env *env)
+{
+ return env->prog->aux->sleepable;
+}
+
/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
* can dereference RCU protected pointers and result is PTR_TRUSTED.
*/
@@ -5262,7 +5267,7 @@ static bool in_rcu_cs(struct bpf_verifier_env *env)
{
return env->cur_state->active_rcu_lock ||
env->cur_state->active_lock.ptr ||
- !env->prog->aux->sleepable;
+ !in_sleepable(env);
}
/* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */
@@ -10164,7 +10169,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EINVAL;
}
- if (!env->prog->aux->sleepable && fn->might_sleep) {
+ if (!in_sleepable(env) && fn->might_sleep) {
verbose(env, "helper call might sleep in a non-sleepable prog\n");
return -EINVAL;
}
@@ -10194,7 +10199,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
return -EINVAL;
}
- if (env->prog->aux->sleepable && is_storage_get_function(func_id))
+ if (in_sleepable(env) && is_storage_get_function(func_id))
env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
}
@@ -11535,7 +11540,7 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
return true;
fallthrough;
default:
- return env->prog->aux->sleepable;
+ return in_sleepable(env);
}
}
@@ -12056,7 +12061,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
sleepable = is_kfunc_sleepable(&meta);
- if (sleepable && !env->prog->aux->sleepable) {
+ if (sleepable && !in_sleepable(env)) {
verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name);
return -EACCES;
}
@@ -18193,7 +18198,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
return -E2BIG;
}
- if (env->prog->aux->sleepable)
+ if (in_sleepable(env))
atomic64_inc(&map->sleepable_refcnt);
/* hold the map. If the program is rejected by verifier,
* the map will be released by release_maps() or it
@@ -19669,7 +19674,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
}
if (is_storage_get_function(insn->imm)) {
- if (!env->prog->aux->sleepable ||
+ if (!in_sleepable(env) ||
env->insn_aux_data[i + delta].storage_get_func_atomic)
insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
else
--
2.43.0
Sleepable timers are implemented on top of a workqueue, which means
that there are no guarantees of timing or ordering.
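The expected usage from a BPF program, as a sketch (val being a map
element embedding a struct bpf_timer; error handling elided):

	bpf_timer_init(&val->t, &timer_map, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&val->t, sleepable_cb);
	/* nsecs is effectively ignored: the work is scheduled immediately */
	bpf_timer_start(&val->t, 0, BPF_F_TIMER_SLEEPABLE);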
Signed-off-by: Benjamin Tissoires <[email protected]>
---
changes in v3:
- extracted the implementation into bpf_timer only, without
bpf_timer_set_sleepable_cb()
- rely on schedule_work() only, from bpf_timer_start()
- add a semaphore to ensure bpf_timer_work_cb() is accessing
consistent data
changes in v2 (compared to the one attached to v1 0/9):
- make use of a kfunc
- add a (non-used) BPF_F_TIMER_SLEEPABLE
- the callback is *not* called, it makes the kernel crash
---
include/uapi/linux/bpf.h | 4 +++
kernel/bpf/helpers.c | 92 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 82 insertions(+), 14 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d96708380e52..1fc7ecbd9d33 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7421,10 +7421,14 @@ struct bpf_core_relo {
* - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is
* relative to current time.
* - BPF_F_TIMER_CPU_PIN: Timer will be pinned to the CPU of the caller.
+ * - BPF_F_TIMER_SLEEPABLE: Timer will run in a sleepable context, with
+ * no guarantees of ordering nor timing (consider this as being just
+ * offloaded immediately).
*/
enum {
BPF_F_TIMER_ABS = (1ULL << 0),
BPF_F_TIMER_CPU_PIN = (1ULL << 1),
+ BPF_F_TIMER_SLEEPABLE = (1ULL << 2),
};
/* BPF numbers iterator state */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 93edf730d288..f9add0abe40a 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -23,6 +23,7 @@
#include <linux/btf_ids.h>
#include <linux/bpf_mem_alloc.h>
#include <linux/kasan.h>
+#include <linux/semaphore.h>
#include "../../lib/kstrtox.h"
@@ -1094,13 +1095,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
* bpf_timer_cancel() cancels the timer and decrements prog's refcnt.
* Inner maps can contain bpf timers as well. ops->map_release_uref is
* freeing the timers when inner map is replaced or deleted by user space.
+ *
+ * sleepable_lock protects only the setup of the workqueue, not the callback
+ * itself. This is done to ensure that a free of the callback or of the
+ * associated program cannot run concurrently.
*/
struct bpf_hrtimer {
struct hrtimer timer;
+ struct work_struct work;
struct bpf_map *map;
struct bpf_prog *prog;
void __rcu *callback_fn;
void *value;
+ struct semaphore sleepable_lock;
};
/* the actual struct hidden inside uapi struct bpf_timer */
@@ -1113,6 +1120,55 @@ struct bpf_timer_kern {
struct bpf_spin_lock lock;
} __attribute__((aligned(8)));
+static u32 __bpf_timer_compute_key(struct bpf_hrtimer *timer)
+{
+ struct bpf_map *map = timer->map;
+ void *value = timer->value;
+
+ if (map->map_type == BPF_MAP_TYPE_ARRAY) {
+ struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+ /* compute the key */
+ return ((char *)value - array->value) / array->elem_size;
+ }
+
+ /* hash or lru */
+ return *(u32 *)(value - round_up(map->key_size, 8));
+}
+
+static void bpf_timer_work_cb(struct work_struct *work)
+{
+ struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
+ struct bpf_map *map = t->map;
+ void *value = t->value;
+ bpf_callback_t callback_fn;
+ u32 key;
+
+ BTF_TYPE_EMIT(struct bpf_timer);
+
+ down(&t->sleepable_lock);
+
+ callback_fn = READ_ONCE(t->callback_fn);
+ if (!callback_fn) {
+ up(&t->sleepable_lock);
+ return;
+ }
+
+ key = __bpf_timer_compute_key(t);
+
+	/* prevent the callback from being freed by bpf_timer_cancel()
+	 * while it is running, so we can release the semaphore
+	 */
+ bpf_prog_inc(t->prog);
+
+ up(&t->sleepable_lock);
+
+ callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)value, 0, 0);
+ /* The verifier checked that return value is zero. */
+
+ bpf_prog_put(t->prog);
+}
+
static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
@@ -1121,8 +1177,7 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
struct bpf_map *map = t->map;
void *value = t->value;
bpf_callback_t callback_fn;
- void *key;
- u32 idx;
+ u32 key;
BTF_TYPE_EMIT(struct bpf_timer);
callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held());
@@ -1136,17 +1191,9 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
* bpf_map_delete_elem() on the same timer.
*/
this_cpu_write(hrtimer_running, t);
- if (map->map_type == BPF_MAP_TYPE_ARRAY) {
- struct bpf_array *array = container_of(map, struct bpf_array, map);
-
- /* compute the key */
- idx = ((char *)value - array->value) / array->elem_size;
- key = &idx;
- } else { /* hash or lru */
- key = value - round_up(map->key_size, 8);
- }
+ key = __bpf_timer_compute_key(t);
- callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
+ callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)value, 0, 0);
/* The verifier checked that return value is zero. */
this_cpu_write(hrtimer_running, NULL);
@@ -1191,6 +1238,8 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
t->prog = NULL;
rcu_assign_pointer(t->callback_fn, NULL);
hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
+ INIT_WORK(&t->work, bpf_timer_work_cb);
+ sema_init(&t->sleepable_lock, 1);
t->timer.function = bpf_timer_cb;
WRITE_ONCE(timer->timer, t);
/* Guarantee the order between timer->timer and map->usercnt. So
@@ -1245,6 +1294,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
ret = -EPERM;
goto out;
}
+ down(&t->sleepable_lock);
prev = t->prog;
if (prev != prog) {
/* Bump prog refcnt once. Every bpf_timer_set_callback()
@@ -1261,6 +1311,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
t->prog = prog;
}
rcu_assign_pointer(t->callback_fn, callback_fn);
+ up(&t->sleepable_lock);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
@@ -1282,7 +1333,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
if (in_nmi())
return -EOPNOTSUPP;
- if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
+ if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN | BPF_F_TIMER_SLEEPABLE))
return -EINVAL;
__bpf_spin_lock_irqsave(&timer->lock);
t = timer->timer;
@@ -1299,7 +1350,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
if (flags & BPF_F_TIMER_CPU_PIN)
mode |= HRTIMER_MODE_PINNED;
- hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
+ if (flags & BPF_F_TIMER_SLEEPABLE)
+ schedule_work(&t->work);
+ else
+ hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
@@ -1346,13 +1400,21 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
ret = -EDEADLK;
goto out;
}
+ down(&t->sleepable_lock);
drop_prog_refcnt(t);
+ up(&t->sleepable_lock);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
/* Cancel the timer and wait for associated callback to finish
* if it was running.
*/
ret = ret ?: hrtimer_cancel(&t->timer);
+
+ /* also cancel the sleepable work, but *do not* wait for
+ * it to finish if it was running as we might not be in a
+ * sleepable context
+ */
+ ret = ret ?: cancel_work(&t->work);
return ret;
}
@@ -1407,6 +1469,8 @@ void bpf_timer_cancel_and_free(void *val)
*/
if (this_cpu_read(hrtimer_running) != t)
hrtimer_cancel(&t->timer);
+
+ cancel_work_sync(&t->work);
kfree(t);
}
--
2.43.0
We need to extend the bpf_timer API, but the way forward relies on kfuncs.
So make bpf_timer known to kfuncs from the verifier's point of view.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
new in v3 (split from v2 02/10)
---
kernel/bpf/verifier.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f81c799b2c80..2b11687063ff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5444,6 +5444,26 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
return -EACCES;
}
break;
+ case BPF_TIMER:
+ /* FIXME: kptr does the above, should we use the same? */
+ if (src != ACCESS_DIRECT) {
+ verbose(env, "bpf_timer cannot be accessed indirectly by helper\n");
+ return -EACCES;
+ }
+ if (!tnum_is_const(reg->var_off)) {
+ verbose(env, "bpf_timer access cannot have variable offset\n");
+ return -EACCES;
+ }
+ if (p != off + reg->var_off.value) {
+ verbose(env, "bpf_timer access misaligned expected=%u off=%llu\n",
+ p, off + reg->var_off.value);
+ return -EACCES;
+ }
+ if (size != bpf_size_to_bytes(BPF_DW)) {
+ verbose(env, "bpf_timer access size must be BPF_DW\n");
+ return -EACCES;
+ }
+ break;
default:
verbose(env, "%s cannot be accessed directly by load/store\n",
btf_field_type_name(field->type));
@@ -10789,6 +10809,7 @@ enum {
KF_ARG_LIST_NODE_ID,
KF_ARG_RB_ROOT_ID,
KF_ARG_RB_NODE_ID,
+ KF_ARG_TIMER_ID,
};
BTF_ID_LIST(kf_arg_btf_ids)
@@ -10797,6 +10818,7 @@ BTF_ID(struct, bpf_list_head)
BTF_ID(struct, bpf_list_node)
BTF_ID(struct, bpf_rb_root)
BTF_ID(struct, bpf_rb_node)
+BTF_ID(struct, bpf_timer_kern)
static bool __is_kfunc_ptr_arg_type(const struct btf *btf,
const struct btf_param *arg, int type)
@@ -10840,6 +10862,12 @@ static bool is_kfunc_arg_rbtree_node(const struct btf *btf, const struct btf_par
return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_RB_NODE_ID);
}
+static bool is_kfunc_arg_timer(const struct btf *btf, const struct btf_param *arg)
+{
+	/* the check itself is shared with the other special kfunc arguments */
+	return __is_kfunc_ptr_arg_type(btf, arg, KF_ARG_TIMER_ID);
+}
+
static bool is_kfunc_arg_callback(struct bpf_verifier_env *env, const struct btf *btf,
const struct btf_param *arg)
{
@@ -10908,6 +10936,7 @@ enum kfunc_ptr_arg_type {
KF_ARG_PTR_TO_RB_NODE,
KF_ARG_PTR_TO_NULL,
KF_ARG_PTR_TO_CONST_STR,
+ KF_ARG_PTR_TO_TIMER,
};
enum special_kfunc_type {
@@ -11061,6 +11090,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (is_kfunc_arg_const_str(meta->btf, &args[argno]))
return KF_ARG_PTR_TO_CONST_STR;
+ if (is_kfunc_arg_timer(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_TIMER;
+
if ((base_type(reg->type) == PTR_TO_BTF_ID || reg2btf_ids[base_type(reg->type)])) {
if (!btf_type_is_struct(ref_t)) {
verbose(env, "kernel function %s args#%d pointer type %s %s is not supported\n",
@@ -11693,6 +11725,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
case KF_ARG_PTR_TO_CALLBACK:
case KF_ARG_PTR_TO_REFCOUNTED_KPTR:
case KF_ARG_PTR_TO_CONST_STR:
+ case KF_ARG_PTR_TO_TIMER:
/* Trusted by default */
break;
default:
@@ -11973,6 +12006,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
if (ret)
return ret;
break;
+ case KF_ARG_PTR_TO_TIMER:
+ /* FIXME: should we do anything here? */
+ break;
}
}
--
2.43.0
This is still a WIP: I think this can be dropped, as we never
get to this instruction. So what should we do here?
Signed-off-by: Benjamin Tissoires <[email protected]>
---
kernel/bpf/verifier.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4766c43606c4..8a9f268c4ee2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -19720,7 +19720,8 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
continue;
}
- if (insn->imm == BPF_FUNC_timer_set_callback) {
+ if (insn->imm == BPF_FUNC_timer_set_callback ||
+ is_bpf_timer_set_sleepable_cb_kfunc(insn->imm)) {
/* The verifier will process callback_fn as many times as necessary
* with different maps and the register states prepared by
* set_timer_callback_state will be accurate.
--
2.43.0
No code change, just move down the hid_bpf_get_data() kfunc definition
so that we have only one block of __bpf_kfunc_start/end_defs().
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
no changes in v2
---
drivers/hid/bpf/hid_bpf_dispatch.c | 80 ++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 42 deletions(-)
diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index e630caf644e8..52abb27426f4 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -143,48 +143,6 @@ u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *s
}
EXPORT_SYMBOL_GPL(call_hid_bpf_rdesc_fixup);
-/* Disables missing prototype warnings */
-__bpf_kfunc_start_defs();
-
-/**
- * hid_bpf_get_data - Get the kernel memory pointer associated with the context @ctx
- *
- * @ctx: The HID-BPF context
- * @offset: The offset within the memory
- * @rdwr_buf_size: the const size of the buffer
- *
- * @returns %NULL on error, an %__u8 memory pointer on success
- */
-__bpf_kfunc __u8 *
-hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr_buf_size)
-{
- struct hid_bpf_ctx_kern *ctx_kern;
-
- if (!ctx)
- return NULL;
-
- ctx_kern = container_of(ctx, struct hid_bpf_ctx_kern, ctx);
-
- if (rdwr_buf_size + offset > ctx->allocated_size)
- return NULL;
-
- return ctx_kern->data + offset;
-}
-__bpf_kfunc_end_defs();
-
-/*
- * The following set contains all functions we agree BPF programs
- * can use.
- */
-BTF_KFUNCS_START(hid_bpf_kfunc_ids)
-BTF_ID_FLAGS(func, hid_bpf_get_data, KF_RET_NULL)
-BTF_KFUNCS_END(hid_bpf_kfunc_ids)
-
-static const struct btf_kfunc_id_set hid_bpf_kfunc_set = {
- .owner = THIS_MODULE,
- .set = &hid_bpf_kfunc_ids,
-};
-
static int device_match_id(struct device *dev, const void *id)
{
struct hid_device *hdev = to_hid_device(dev);
@@ -281,6 +239,31 @@ static int do_hid_bpf_attach_prog(struct hid_device *hdev, int prog_fd, struct b
/* Disables missing prototype warnings */
__bpf_kfunc_start_defs();
+/**
+ * hid_bpf_get_data - Get the kernel memory pointer associated with the context @ctx
+ *
+ * @ctx: The HID-BPF context
+ * @offset: The offset within the memory
+ * @rdwr_buf_size: the const size of the buffer
+ *
+ * @returns %NULL on error, an %__u8 memory pointer on success
+ */
+__bpf_kfunc __u8 *
+hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr_buf_size)
+{
+ struct hid_bpf_ctx_kern *ctx_kern;
+
+ if (!ctx)
+ return NULL;
+
+ ctx_kern = container_of(ctx, struct hid_bpf_ctx_kern, ctx);
+
+ if (rdwr_buf_size + offset > ctx->allocated_size)
+ return NULL;
+
+ return ctx_kern->data + offset;
+}
+
/**
* hid_bpf_attach_prog - Attach the given @prog_fd to the given HID device
*
@@ -474,6 +457,19 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
}
__bpf_kfunc_end_defs();
+/*
+ * The following set contains all functions we agree BPF programs
+ * can use.
+ */
+BTF_KFUNCS_START(hid_bpf_kfunc_ids)
+BTF_ID_FLAGS(func, hid_bpf_get_data, KF_RET_NULL)
+BTF_KFUNCS_END(hid_bpf_kfunc_ids)
+
+static const struct btf_kfunc_id_set hid_bpf_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &hid_bpf_kfunc_ids,
+};
+
/* our HID-BPF entrypoints */
BTF_SET8_START(hid_bpf_fmodret_ids)
BTF_ID_FLAGS(func, hid_bpf_device_event)
--
2.43.0
We currently only export hid_hw_raw_request() as a BPF kfunc.
However, some devices require an explicit write to their Output Report
instead of going through the control channel.
So also export hid_hw_output_report() to BPF.
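Usage from a syscall HID-BPF program then looks like the following
sketch (mirroring the selftest added in the next patch; hid_id is
illustrative):

	struct hid_bpf_ctx *ctx;
	__u8 buf[3] = {1 /* report ID */, 2, 42};
	int ret;

	ctx = hid_bpf_allocate_context(hid_id);
	if (!ctx)
		return -1; /* EPERM check */

	ret = hid_bpf_hw_output_report(ctx, buf, sizeof(buf));
	hid_bpf_release_context(ctx);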
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
no changes in v2
---
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 112 +++++++++++++++++++++++++++----------
drivers/hid/hid-core.c | 1 +
include/linux/hid_bpf.h | 1 +
4 files changed, 86 insertions(+), 30 deletions(-)
diff --git a/Documentation/hid/hid-bpf.rst b/Documentation/hid/hid-bpf.rst
index 4fad83a6ebc3..a575004d9025 100644
--- a/Documentation/hid/hid-bpf.rst
+++ b/Documentation/hid/hid-bpf.rst
@@ -179,7 +179,7 @@ Available API that can be used in syscall HID-BPF programs:
-----------------------------------------------------------
.. kernel-doc:: drivers/hid/bpf/hid_bpf_dispatch.c
- :functions: hid_bpf_attach_prog hid_bpf_hw_request hid_bpf_allocate_context hid_bpf_release_context
+ :functions: hid_bpf_attach_prog hid_bpf_hw_request hid_bpf_hw_output_report hid_bpf_allocate_context hid_bpf_release_context
General overview of a HID-BPF program
=====================================
diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index 52abb27426f4..a5b88b491b80 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -376,6 +376,46 @@ hid_bpf_release_context(struct hid_bpf_ctx *ctx)
put_device(&hid->dev);
}
+static int
+__hid_bpf_hw_check_params(struct hid_bpf_ctx *ctx, __u8 *buf, size_t *buf__sz,
+ enum hid_report_type rtype)
+{
+ struct hid_report_enum *report_enum;
+ struct hid_report *report;
+ struct hid_device *hdev;
+ u32 report_len;
+
+ /* check arguments */
+ if (!ctx || !hid_bpf_ops || !buf)
+ return -EINVAL;
+
+ switch (rtype) {
+ case HID_INPUT_REPORT:
+ case HID_OUTPUT_REPORT:
+ case HID_FEATURE_REPORT:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (*buf__sz < 1)
+ return -EINVAL;
+
+ hdev = (struct hid_device *)ctx->hid; /* discard const */
+
+ report_enum = hdev->report_enum + rtype;
+ report = hid_bpf_ops->hid_get_report(report_enum, buf);
+ if (!report)
+ return -EINVAL;
+
+ report_len = hid_report_len(report);
+
+ if (*buf__sz > report_len)
+ *buf__sz = report_len;
+
+ return 0;
+}
+
/**
* hid_bpf_hw_request - Communicate with a HID device
*
@@ -392,24 +432,14 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
enum hid_report_type rtype, enum hid_class_request reqtype)
{
struct hid_device *hdev;
- struct hid_report *report;
- struct hid_report_enum *report_enum;
+ size_t size = buf__sz;
u8 *dma_data;
- u32 report_len;
int ret;
/* check arguments */
- if (!ctx || !hid_bpf_ops || !buf)
- return -EINVAL;
-
- switch (rtype) {
- case HID_INPUT_REPORT:
- case HID_OUTPUT_REPORT:
- case HID_FEATURE_REPORT:
- break;
- default:
- return -EINVAL;
- }
+ ret = __hid_bpf_hw_check_params(ctx, buf, &size, rtype);
+ if (ret)
+ return ret;
switch (reqtype) {
case HID_REQ_GET_REPORT:
@@ -423,29 +453,16 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
return -EINVAL;
}
- if (buf__sz < 1)
- return -EINVAL;
-
hdev = (struct hid_device *)ctx->hid; /* discard const */
- report_enum = hdev->report_enum + rtype;
- report = hid_bpf_ops->hid_get_report(report_enum, buf);
- if (!report)
- return -EINVAL;
-
- report_len = hid_report_len(report);
-
- if (buf__sz > report_len)
- buf__sz = report_len;
-
- dma_data = kmemdup(buf, buf__sz, GFP_KERNEL);
+ dma_data = kmemdup(buf, size, GFP_KERNEL);
if (!dma_data)
return -ENOMEM;
ret = hid_bpf_ops->hid_hw_raw_request(hdev,
dma_data[0],
dma_data,
- buf__sz,
+ size,
rtype,
reqtype);
@@ -455,6 +472,42 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
kfree(dma_data);
return ret;
}
+
+/**
+ * hid_bpf_hw_output_report - Send an output report to a HID device
+ *
+ * @ctx: the HID-BPF context previously allocated in hid_bpf_allocate_context()
+ * @buf: a %PTR_TO_MEM buffer
+ * @buf__sz: the size of the data to transfer
+ *
+ * @returns the number of bytes transferred on success, a negative error code otherwise.
+ */
+__bpf_kfunc int
+hid_bpf_hw_output_report(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz)
+{
+ struct hid_device *hdev;
+ size_t size = buf__sz;
+ u8 *dma_data;
+ int ret;
+
+ /* check arguments */
+ ret = __hid_bpf_hw_check_params(ctx, buf, &size, HID_OUTPUT_REPORT);
+ if (ret)
+ return ret;
+
+ hdev = (struct hid_device *)ctx->hid; /* discard const */
+
+ dma_data = kmemdup(buf, size, GFP_KERNEL);
+ if (!dma_data)
+ return -ENOMEM;
+
+ ret = hid_bpf_ops->hid_hw_output_report(hdev,
+ dma_data,
+ size);
+
+ kfree(dma_data);
+ return ret;
+}
__bpf_kfunc_end_defs();
/*
@@ -488,6 +541,7 @@ BTF_ID_FLAGS(func, hid_bpf_attach_prog)
BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE)
BTF_ID_FLAGS(func, hid_bpf_hw_request)
+BTF_ID_FLAGS(func, hid_bpf_hw_output_report)
BTF_KFUNCS_END(hid_bpf_syscall_kfunc_ids)
static const struct btf_kfunc_id_set hid_bpf_syscall_kfunc_set = {
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index de7a477d6665..1243595890ba 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -2974,6 +2974,7 @@ EXPORT_SYMBOL_GPL(hid_check_keys_pressed);
static struct hid_bpf_ops hid_ops = {
.hid_get_report = hid_get_report,
.hid_hw_raw_request = hid_hw_raw_request,
+ .hid_hw_output_report = hid_hw_output_report,
.owner = THIS_MODULE,
.bus_type = &hid_bus_type,
};
diff --git a/include/linux/hid_bpf.h b/include/linux/hid_bpf.h
index 7118ac28d468..5c7ff93dc73e 100644
--- a/include/linux/hid_bpf.h
+++ b/include/linux/hid_bpf.h
@@ -103,6 +103,7 @@ struct hid_bpf_ops {
unsigned char reportnum, __u8 *buf,
size_t len, enum hid_report_type rtype,
enum hid_class_request reqtype);
+ int (*hid_hw_output_report)(struct hid_device *hdev, __u8 *buf, size_t len);
struct module *owner;
const struct bus_type *bus_type;
};
--
2.43.0
This time we need to ensure that uhid receives the output report, hence
the new mutex and condition.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
no changes in v2
---
tools/testing/selftests/hid/hid_bpf.c | 63 ++++++++++++++++++++++
tools/testing/selftests/hid/progs/hid.c | 24 +++++++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 2 +
3 files changed, 89 insertions(+)
diff --git a/tools/testing/selftests/hid/hid_bpf.c b/tools/testing/selftests/hid/hid_bpf.c
index 2cf96f818f25..8332014838b0 100644
--- a/tools/testing/selftests/hid/hid_bpf.c
+++ b/tools/testing/selftests/hid/hid_bpf.c
@@ -16,6 +16,11 @@
#define SHOW_UHID_DEBUG 0
+#define min(a, b) \
+ ({ __typeof__(a) _a = (a); \
+ __typeof__(b) _b = (b); \
+ _a < _b ? _a : _b; })
+
static unsigned char rdesc[] = {
0x06, 0x00, 0xff, /* Usage Page (Vendor Defined Page 1) */
0x09, 0x21, /* Usage (Vendor Usage 0x21) */
@@ -111,6 +116,10 @@ struct hid_hw_request_syscall_args {
static pthread_mutex_t uhid_started_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t uhid_started = PTHREAD_COND_INITIALIZER;
+static pthread_mutex_t uhid_output_mtx = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t uhid_output_cond = PTHREAD_COND_INITIALIZER;
+static unsigned char output_report[10];
+
/* no need to protect uhid_stopped, only one thread accesses it */
static bool uhid_stopped;
@@ -205,6 +214,13 @@ static int uhid_event(struct __test_metadata *_metadata, int fd)
break;
case UHID_OUTPUT:
UHID_LOG("UHID_OUTPUT from uhid-dev");
+
+ pthread_mutex_lock(&uhid_output_mtx);
+ memcpy(output_report,
+ ev.u.output.data,
+ min(ev.u.output.size, sizeof(output_report)));
+ pthread_cond_signal(&uhid_output_cond);
+ pthread_mutex_unlock(&uhid_output_mtx);
break;
case UHID_GET_REPORT:
UHID_LOG("UHID_GET_REPORT from uhid-dev");
@@ -733,6 +749,53 @@ TEST_F(hid_bpf, test_hid_change_report)
ASSERT_EQ(buf[2], 0) TH_LOG("leftovers_from_previous_test");
}
+/*
+ * Call hid_bpf_hw_output_report against the given uhid device,
+ * check that the program is called and does the expected.
+ */
+TEST_F(hid_bpf, test_hid_user_output_report_call)
+{
+ struct hid_hw_request_syscall_args args = {
+ .retval = -1,
+ .size = 10,
+ };
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, tattrs,
+ .ctx_in = &args,
+ .ctx_size_in = sizeof(args),
+ );
+ int err, cond_err, prog_fd;
+ struct timespec time_to_wait;
+
+ LOAD_BPF;
+
+ args.hid = self->hid_id;
+	args.data[0] = 1; /* report ID */
+	args.data[1] = 2; /* payload */
+	args.data[2] = 42; /* payload */
+
+ prog_fd = bpf_program__fd(self->skel->progs.hid_user_output_report);
+
+ pthread_mutex_lock(&uhid_output_mtx);
+
+ memset(output_report, 0, sizeof(output_report));
+ clock_gettime(CLOCK_REALTIME, &time_to_wait);
+ time_to_wait.tv_sec += 2;
+
+ err = bpf_prog_test_run_opts(prog_fd, &tattrs);
+ cond_err = pthread_cond_timedwait(&uhid_output_cond, &uhid_output_mtx, &time_to_wait);
+
+ ASSERT_OK(err) TH_LOG("error while calling bpf_prog_test_run_opts");
+ ASSERT_OK(cond_err) TH_LOG("error while calling waiting for the condition");
+
+ ASSERT_EQ(args.retval, 3);
+
+ ASSERT_EQ(output_report[0], 1);
+ ASSERT_EQ(output_report[1], 2);
+ ASSERT_EQ(output_report[2], 42);
+
+ pthread_mutex_unlock(&uhid_output_mtx);
+}
+
/*
* Attach hid_user_raw_request to the given uhid device,
* call the bpf program from userspace
diff --git a/tools/testing/selftests/hid/progs/hid.c b/tools/testing/selftests/hid/progs/hid.c
index 1e558826b809..2c2b679a83b1 100644
--- a/tools/testing/selftests/hid/progs/hid.c
+++ b/tools/testing/selftests/hid/progs/hid.c
@@ -101,6 +101,30 @@ int hid_user_raw_request(struct hid_hw_request_syscall_args *args)
return 0;
}
+SEC("syscall")
+int hid_user_output_report(struct hid_hw_request_syscall_args *args)
+{
+ struct hid_bpf_ctx *ctx;
+ const size_t size = args->size;
+ int i, ret = 0;
+
+ if (size > sizeof(args->data))
+ return -7; /* -E2BIG */
+
+ ctx = hid_bpf_allocate_context(args->hid);
+ if (!ctx)
+ return -1; /* EPERM check */
+
+ ret = hid_bpf_hw_output_report(ctx,
+ args->data,
+ size);
+ args->retval = ret;
+
+ hid_bpf_release_context(ctx);
+
+ return 0;
+}
+
static const __u8 rdesc[] = {
0x05, 0x01, /* USAGE_PAGE (Generic Desktop) */
0x09, 0x32, /* USAGE (Z) */
diff --git a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
index 65e657ac1198..50c6a0d5765e 100644
--- a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
+++ b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
@@ -94,5 +94,7 @@ extern int hid_bpf_hw_request(struct hid_bpf_ctx *ctx,
size_t buf__sz,
enum hid_report_type type,
enum hid_class_request reqtype) __ksym;
+extern int hid_bpf_hw_output_report(struct hid_bpf_ctx *ctx,
+ __u8 *buf, size_t buf__sz) __ksym;
#endif /* __HID_BPF_HELPERS_H */
--
2.43.0
It can be interesting to inject events from BPF as if they came
from the device.
For example, some multitouch devices do not always send a proximity-out
event, and we might want to synthesize it for the physical device.
Compared to uhid, we can now inject events on any physical device, not
just uhid virtual ones.
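As a sketch (mirroring the selftest added in the next patch; hid_id is
illustrative), injecting an input report from a syscall program looks
like:

	struct hid_bpf_ctx *ctx;
	__u8 buf[6] = {1 /* report ID */, 2, 42, 0, 0, 0};
	int ret;

	ctx = hid_bpf_allocate_context(hid_id);
	if (!ctx)
		return -1; /* EPERM check */

	ret = hid_bpf_input_report(ctx, HID_INPUT_REPORT, buf, sizeof(buf));
	hid_bpf_release_context(ctx);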
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
no changes in v2
---
Documentation/hid/hid-bpf.rst | 2 +-
drivers/hid/bpf/hid_bpf_dispatch.c | 29 +++++++++++++++++++++++++++++
drivers/hid/hid-core.c | 1 +
include/linux/hid_bpf.h | 2 ++
4 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/Documentation/hid/hid-bpf.rst b/Documentation/hid/hid-bpf.rst
index a575004d9025..0765b3298ecf 100644
--- a/Documentation/hid/hid-bpf.rst
+++ b/Documentation/hid/hid-bpf.rst
@@ -179,7 +179,7 @@ Available API that can be used in syscall HID-BPF programs:
-----------------------------------------------------------
.. kernel-doc:: drivers/hid/bpf/hid_bpf_dispatch.c
- :functions: hid_bpf_attach_prog hid_bpf_hw_request hid_bpf_hw_output_report hid_bpf_allocate_context hid_bpf_release_context
+ :functions: hid_bpf_attach_prog hid_bpf_hw_request hid_bpf_hw_output_report hid_bpf_input_report hid_bpf_allocate_context hid_bpf_release_context
General overview of a HID-BPF program
=====================================
diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index a5b88b491b80..e1a650f4a626 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -508,6 +508,34 @@ hid_bpf_hw_output_report(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz)
kfree(dma_data);
return ret;
}
+
+/**
+ * hid_bpf_input_report - Inject a HID report in the kernel from a HID device
+ *
+ * @ctx: the HID-BPF context previously allocated in hid_bpf_allocate_context()
+ * @type: the type of the report (%HID_INPUT_REPORT, %HID_FEATURE_REPORT, %HID_OUTPUT_REPORT)
+ * @buf: a %PTR_TO_MEM buffer
+ * @buf__sz: the size of the data to transfer
+ *
+ * @returns %0 on success, a negative error code otherwise.
+ */
+__bpf_kfunc int
+hid_bpf_input_report(struct hid_bpf_ctx *ctx, enum hid_report_type type, u8 *buf,
+ const size_t buf__sz)
+{
+ struct hid_device *hdev;
+ size_t size = buf__sz;
+ int ret;
+
+ /* check arguments */
+ ret = __hid_bpf_hw_check_params(ctx, buf, &size, type);
+ if (ret)
+ return ret;
+
+ hdev = (struct hid_device *)ctx->hid; /* discard const */
+
+ return hid_input_report(hdev, type, buf, size, 0);
+}
__bpf_kfunc_end_defs();
/*
@@ -542,6 +570,7 @@ BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE)
BTF_ID_FLAGS(func, hid_bpf_hw_request)
BTF_ID_FLAGS(func, hid_bpf_hw_output_report)
+BTF_ID_FLAGS(func, hid_bpf_input_report)
BTF_KFUNCS_END(hid_bpf_syscall_kfunc_ids)
static const struct btf_kfunc_id_set hid_bpf_syscall_kfunc_set = {
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index 1243595890ba..b1fa0378e8f4 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -2975,6 +2975,7 @@ static struct hid_bpf_ops hid_ops = {
.hid_get_report = hid_get_report,
.hid_hw_raw_request = hid_hw_raw_request,
.hid_hw_output_report = hid_hw_output_report,
+ .hid_input_report = hid_input_report,
.owner = THIS_MODULE,
.bus_type = &hid_bus_type,
};
diff --git a/include/linux/hid_bpf.h b/include/linux/hid_bpf.h
index 5c7ff93dc73e..17b08f500098 100644
--- a/include/linux/hid_bpf.h
+++ b/include/linux/hid_bpf.h
@@ -104,6 +104,8 @@ struct hid_bpf_ops {
size_t len, enum hid_report_type rtype,
enum hid_class_request reqtype);
int (*hid_hw_output_report)(struct hid_device *hdev, __u8 *buf, size_t len);
+ int (*hid_input_report)(struct hid_device *hid, enum hid_report_type type,
+ u8 *data, u32 size, int interrupt);
struct module *owner;
const struct bus_type *bus_type;
};
--
2.43.0
Usual way of testing: we call the function and ensure we receive
the event.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
no changes in v2
---
tools/testing/selftests/hid/hid_bpf.c | 49 +++++++++++++++++++++-
tools/testing/selftests/hid/progs/hid.c | 22 ++++++++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 4 ++
3 files changed, 73 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/hid/hid_bpf.c b/tools/testing/selftests/hid/hid_bpf.c
index 8332014838b0..f825623e3edc 100644
--- a/tools/testing/selftests/hid/hid_bpf.c
+++ b/tools/testing/selftests/hid/hid_bpf.c
@@ -749,6 +749,52 @@ TEST_F(hid_bpf, test_hid_change_report)
ASSERT_EQ(buf[2], 0) TH_LOG("leftovers_from_previous_test");
}
+/*
+ * Call hid_bpf_input_report against the given uhid device,
+ * check that the program is called and does the expected.
+ */
+TEST_F(hid_bpf, test_hid_user_input_report_call)
+{
+ struct hid_hw_request_syscall_args args = {
+ .retval = -1,
+ .size = 10,
+ };
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, tattrs,
+ .ctx_in = &args,
+ .ctx_size_in = sizeof(args),
+ );
+ __u8 buf[10] = {0};
+ int err, prog_fd;
+
+ LOAD_BPF;
+
+ args.hid = self->hid_id;
+	args.data[0] = 1; /* report ID */
+	args.data[1] = 2; /* payload */
+	args.data[2] = 42; /* payload */
+
+ prog_fd = bpf_program__fd(self->skel->progs.hid_user_input_report);
+
+ /* check that there is no data to read from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, -1) TH_LOG("read_hidraw");
+
+ err = bpf_prog_test_run_opts(prog_fd, &tattrs);
+
+ ASSERT_OK(err) TH_LOG("error while calling bpf_prog_test_run_opts");
+
+ ASSERT_EQ(args.retval, 0);
+
+ /* read the data from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, 6) TH_LOG("read_hidraw");
+ ASSERT_EQ(buf[0], 1);
+ ASSERT_EQ(buf[1], 2);
+ ASSERT_EQ(buf[2], 42);
+}
+
/*
* Call hid_bpf_hw_output_report against the given uhid device,
* check that the program is called and does the expected.
@@ -797,8 +843,7 @@ TEST_F(hid_bpf, test_hid_user_output_report_call)
}
/*
- * Attach hid_user_raw_request to the given uhid device,
- * call the bpf program from userspace
+ * Call hid_hw_raw_request against the given uhid device,
* check that the program is called and does the expected.
*/
TEST_F(hid_bpf, test_hid_user_raw_request_call)
diff --git a/tools/testing/selftests/hid/progs/hid.c b/tools/testing/selftests/hid/progs/hid.c
index 2c2b679a83b1..f67d35def142 100644
--- a/tools/testing/selftests/hid/progs/hid.c
+++ b/tools/testing/selftests/hid/progs/hid.c
@@ -125,6 +125,28 @@ int hid_user_output_report(struct hid_hw_request_syscall_args *args)
return 0;
}
+SEC("syscall")
+int hid_user_input_report(struct hid_hw_request_syscall_args *args)
+{
+ struct hid_bpf_ctx *ctx;
+ const size_t size = args->size;
+ int i, ret = 0;
+
+ if (size > sizeof(args->data))
+ return -7; /* -E2BIG */
+
+ ctx = hid_bpf_allocate_context(args->hid);
+ if (!ctx)
+ return -1; /* EPERM check */
+
+ ret = hid_bpf_input_report(ctx, HID_INPUT_REPORT, args->data, size);
+ args->retval = ret;
+
+ hid_bpf_release_context(ctx);
+
+ return 0;
+}
+
static const __u8 rdesc[] = {
0x05, 0x01, /* USAGE_PAGE (Generic Desktop) */
0x09, 0x32, /* USAGE (Z) */
diff --git a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
index 50c6a0d5765e..9cd56821d0f1 100644
--- a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
+++ b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
@@ -96,5 +96,9 @@ extern int hid_bpf_hw_request(struct hid_bpf_ctx *ctx,
enum hid_class_request reqtype) __ksym;
extern int hid_bpf_hw_output_report(struct hid_bpf_ctx *ctx,
__u8 *buf, size_t buf__sz) __ksym;
+extern int hid_bpf_input_report(struct hid_bpf_ctx *ctx,
+ enum hid_report_type type,
+ __u8 *data,
+ size_t buf__sz) __ksym;
#endif /* __HID_BPF_HELPERS_H */
--
2.43.0
Export the sleepable kfuncs we have in HID-BPF to tracing BPF programs,
on the condition that they are used in a sleepable context.
This allows bpf_timer to be used in a sleepable context through
bpf_timer_set_sleepable_cb(), initiating work from a device event.
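A sleepable timer callback attached from a device event can then do
device IO directly, as in this sketch (buf is illustrative):

static int timer_cb(void *map, int *key, struct bpf_timer *timer)
{
	struct hid_bpf_ctx *ctx;
	__u8 buf[2] = {1 /* report ID */, 0};

	ctx = hid_bpf_allocate_context(*key);
	if (!ctx)
		return 0;

	/* allowed here because the callback runs in a sleepable context */
	hid_bpf_hw_output_report(ctx, buf, sizeof(buf));
	hid_bpf_release_context(ctx);
	return 0;
}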
Signed-off-by: Benjamin Tissoires <[email protected]>
---
no changes in v3
new in v2
---
drivers/hid/bpf/hid_bpf_dispatch.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
index e1a650f4a626..275f2057c48d 100644
--- a/drivers/hid/bpf/hid_bpf_dispatch.c
+++ b/drivers/hid/bpf/hid_bpf_dispatch.c
@@ -544,6 +544,11 @@ __bpf_kfunc_end_defs();
*/
BTF_KFUNCS_START(hid_bpf_kfunc_ids)
BTF_ID_FLAGS(func, hid_bpf_get_data, KF_RET_NULL)
+BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_hw_request, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_hw_output_report, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_input_report, KF_SLEEPABLE)
BTF_KFUNCS_END(hid_bpf_kfunc_ids)
static const struct btf_kfunc_id_set hid_bpf_kfunc_set = {
@@ -566,11 +571,11 @@ static const struct btf_kfunc_id_set hid_bpf_fmodret_set = {
/* for syscall HID-BPF */
BTF_KFUNCS_START(hid_bpf_syscall_kfunc_ids)
BTF_ID_FLAGS(func, hid_bpf_attach_prog)
-BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL)
-BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE)
-BTF_ID_FLAGS(func, hid_bpf_hw_request)
-BTF_ID_FLAGS(func, hid_bpf_hw_output_report)
-BTF_ID_FLAGS(func, hid_bpf_input_report)
+BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE | KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_hw_request, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_hw_output_report, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, hid_bpf_input_report, KF_SLEEPABLE)
BTF_KFUNCS_END(hid_bpf_syscall_kfunc_ids)
static const struct btf_kfunc_id_set hid_bpf_syscall_kfunc_set = {
--
2.43.0
This test checks that we can actually delay a workload in a sleepable
context through bpf_timer.
When an event is injected, we push it onto a map of type queue and
schedule a work item.
When that work item kicks in, it pulls the event from the queue and
wakes up userspace through a ring buffer.
The ring buffer is used to avoid sleeps in userspace, because we have
no guarantee of when those jobs will be called.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
changes in v3:
- amended for the new API changes in v3
new in v2
---
tools/testing/selftests/hid/hid_bpf.c | 83 +++++++++++
tools/testing/selftests/hid/progs/hid.c | 152 +++++++++++++++++++++
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 2 +
3 files changed, 237 insertions(+)
diff --git a/tools/testing/selftests/hid/hid_bpf.c b/tools/testing/selftests/hid/hid_bpf.c
index f825623e3edc..c16efb43dd91 100644
--- a/tools/testing/selftests/hid/hid_bpf.c
+++ b/tools/testing/selftests/hid/hid_bpf.c
@@ -875,6 +875,89 @@ TEST_F(hid_bpf, test_hid_user_raw_request_call)
ASSERT_EQ(args.data[1], 2);
}
+static __u8 workload_data;
+
+static int handle_event(void *ctx, void *data, size_t data_sz)
+{
+ const __u8 *e = data;
+
+ workload_data = *e;
+
+ return 0;
+}
+
+TEST_F(hid_bpf, test_hid_schedule_work_defer_events_2)
+{
+ struct hid_hw_request_syscall_args args = {
+ .retval = -1,
+ .size = 10,
+ };
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, tattrs,
+ .ctx_in = &args,
+ .ctx_size_in = sizeof(args),
+ );
+ const struct test_program progs[] = {
+ { .name = "hid_defer_bpf_timer" },
+ };
+ struct ring_buffer *rb = NULL;
+ __u8 buf[10] = {0};
+ int prog_fd, err;
+
+ LOAD_PROGRAMS(progs);
+
+ /* Set up ring buffer polling */
+ rb = ring_buffer__new(bpf_map__fd(self->skel->maps.rb), handle_event, NULL, NULL);
+ ASSERT_OK_PTR(rb) TH_LOG("Failed to create ring buffer");
+ ASSERT_EQ(workload_data, 0);
+
+ args.hid = self->hid_id;
+ prog_fd = bpf_program__fd(self->skel->progs.hid_setup_timer);
+
+ err = bpf_prog_test_run_opts(prog_fd, &tattrs);
+
+ ASSERT_OK(err) TH_LOG("error while calling bpf_prog_test_run_opts");
+
+ /* check that there is no data to read from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, -1) TH_LOG("read_hidraw");
+
+ /* inject one event */
+ buf[0] = 1;
+ buf[1] = 47;
+ buf[2] = 50;
+ uhid_send_event(_metadata, self->uhid_fd, buf, 6);
+
+ err = ring_buffer__poll(rb, 100 /* timeout, ms */);
+ ASSERT_EQ(err, 1) TH_LOG("error while calling ring_buffer__poll");
+ ASSERT_EQ(workload_data, 3);
+
+ /* read the data from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, 6) TH_LOG("read_hidraw");
+ ASSERT_EQ(buf[0], 2);
+ ASSERT_EQ(buf[1], 3);
+ ASSERT_EQ(buf[2], 4) TH_LOG("leftovers_from_previous_test");
+
+ err = ring_buffer__poll(rb, 100 /* timeout, ms */);
+ ASSERT_EQ(err, 1) TH_LOG("error while calling ring_buffer__poll");
+ ASSERT_EQ(workload_data, 4);
+
+ /* read the data from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, 6) TH_LOG("read_hidraw");
+ ASSERT_EQ(buf[0], 2);
+ ASSERT_EQ(buf[1], 4);
+ ASSERT_EQ(buf[2], 6);
+
+ /* read the data from hidraw */
+ memset(buf, 0, sizeof(buf));
+ err = read(self->hidraw_fd, buf, sizeof(buf));
+ ASSERT_EQ(err, -1) TH_LOG("read_hidraw");
+}
+
/*
* Attach hid_insert{0,1,2} to the given uhid device,
* retrieve and open the matching hidraw node,
diff --git a/tools/testing/selftests/hid/progs/hid.c b/tools/testing/selftests/hid/progs/hid.c
index f67d35def142..7afcc77bcc29 100644
--- a/tools/testing/selftests/hid/progs/hid.c
+++ b/tools/testing/selftests/hid/progs/hid.c
@@ -250,3 +250,155 @@ int BPF_PROG(hid_test_insert3, struct hid_bpf_ctx *hid_ctx)
return 0;
}
+
+struct test_report {
+ __u8 data[6];
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_QUEUE);
+ __uint(max_entries, 8);
+ __type(value, struct test_report);
+} queue SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_RINGBUF);
+ __uint(max_entries, 8);
+} rb SEC(".maps");
+
+struct elem {
+ struct bpf_timer t;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1024);
+ __type(key, u32);
+ __type(value, struct elem);
+} timer_map SEC(".maps");
+
+/* callback for timer_map timers */
+
+static int timer_cb1(void *map, int *key, struct bpf_timer *timer)
+{
+ struct hid_bpf_ctx *hid_ctx;
+ struct test_report buf;
+ __u8 *rb_elem;
+ int err;
+ int i, ret = 0;
+
+	/* do not pop the event yet: it'll be done below when
+	 * notifying user space; this also allows us to retry sending it
+	 * if hid_bpf_input_report() fails
+	 */
+ if (bpf_map_peek_elem(&queue, &buf))
+ return 0;
+
+ hid_ctx = hid_bpf_allocate_context(*key);
+ if (!hid_ctx)
+ return 0; /* EPERM check */
+
+ buf.data[0] = 2;
+
+ /* re-inject the modified event into the HID stack */
+ err = hid_bpf_input_report(hid_ctx, HID_INPUT_REPORT, buf.data, sizeof(buf.data));
+ if (err == -16 /* -EBUSY */) {
+ /*
+ * This happens when we schedule the work with a 0 delay:
+ * the thread immediately starts but the current input
+ * processing hasn't finished yet. So the semaphore is
+ * already taken, and hid_input_report returns -EBUSY
+ */
+ /* schedule another attempt */
+ bpf_timer_start(timer, 0, BPF_F_TIMER_SLEEPABLE);
+
+ goto out;
+ }
+
+ if (bpf_map_pop_elem(&queue, &buf))
+ goto out;
+
+ rb_elem = bpf_ringbuf_reserve(&rb, sizeof(*rb_elem), 0);
+ if (!rb_elem)
+ goto out;
+
+ *rb_elem = buf.data[1];
+
+ bpf_ringbuf_submit(rb_elem, 0);
+
+	/* call ourselves once again until there are no more events in the queue */
+ bpf_timer_start(timer, 0, BPF_F_TIMER_SLEEPABLE);
+
+ out:
+ hid_bpf_release_context(hid_ctx);
+ return 0;
+}
+
+#define CLOCK_MONOTONIC 1
+
+SEC("?fmod_ret/hid_bpf_device_event")
+int BPF_PROG(hid_defer_bpf_timer, struct hid_bpf_ctx *hctx)
+{
+ __u8 *data = hid_bpf_get_data(hctx, 0 /* offset */, 4 /* size */);
+ struct test_report buf = {
+ .data = {2, 3, 4, 5, 6, 7},
+ };
+ struct bpf_timer *timer;
+ int key = hctx->hid->id;
+ struct elem init = {};
+
+ if (!data)
+ return 0; /* EPERM check */
+
+ /* Only schedule a delayed work when reportID is 1, otherwise
+ * simply forward it to hidraw
+ */
+ if (data[0] != 1)
+ return 0;
+
+ bpf_map_push_elem(&queue, &buf, BPF_ANY);
+ buf.data[0] = 2;
+ buf.data[1] = 4;
+ buf.data[2] = 6;
+ bpf_map_push_elem(&queue, &buf, BPF_ANY);
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer)
+ return 3;
+
+ bpf_timer_set_sleepable_cb(timer, timer_cb1);
+
+ if (bpf_timer_start(timer, 0, BPF_F_TIMER_SLEEPABLE) != 0)
+ return 2;
+
+ return -1; /* discard the event */
+}
+
+SEC("syscall")
+int hid_setup_timer(struct hid_hw_request_syscall_args *args)
+{
+ struct hid_bpf_ctx *ctx;
+ struct bpf_timer *timer;
+ struct elem init = {};
+ int key = args->hid;
+ int i, ret = 0;
+
+ ctx = hid_bpf_allocate_context(args->hid);
+ if (!ctx)
+ return -1; /* EPERM check */
+
+ bpf_map_update_elem(&timer_map, &key, &init, 0);
+
+ timer = bpf_map_lookup_elem(&timer_map, &key);
+ if (!timer) {
+ hid_bpf_release_context(ctx);
+ return 1;
+ }
+
+ bpf_timer_init(timer, &timer_map, CLOCK_MONOTONIC);
+
+ hid_bpf_release_context(ctx);
+
+ return 0;
+}
+
diff --git a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
index 9cd56821d0f1..8235a28e7dee 100644
--- a/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
+++ b/tools/testing/selftests/hid/progs/hid_bpf_helpers.h
@@ -100,5 +100,7 @@ extern int hid_bpf_input_report(struct hid_bpf_ctx *ctx,
enum hid_report_type type,
__u8 *data,
size_t buf__sz) __ksym;
+extern int bpf_timer_set_sleepable_cb(struct bpf_timer *timer,
+ int (callback_fn)(void *map, int *key, struct bpf_timer *timer)) __ksym;
#endif /* __HID_BPF_HELPERS_H */
--
2.43.0
It's always a good idea to have KASAN in tests.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
new in v3
---
tools/testing/selftests/hid/config.common | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/hid/config.common b/tools/testing/selftests/hid/config.common
index 0f456dbab62f..45b5570441ce 100644
--- a/tools/testing/selftests/hid/config.common
+++ b/tools/testing/selftests/hid/config.common
@@ -238,3 +238,4 @@ CONFIG_VLAN_8021Q=y
CONFIG_XFRM_SUB_POLICY=y
CONFIG_XFRM_USER=y
CONFIG_ZEROPLUS_FF=y
+CONFIG_KASAN=y
--
2.43.0
Currently we have a special case for BPF_FUNC_timer_set_callback;
let's introduce a helper that we can extend for the kfunc coming in
a later patch.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
new in v3
---
kernel/bpf/verifier.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7a4b19bea2ac..f81c799b2c80 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -528,6 +528,11 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)
(bpf_pseudo_kfunc_call(insn) && is_sync_callback_calling_kfunc(insn->imm));
}
+static bool is_async_callback_calling_insn(struct bpf_insn *insn)
+{
+ return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm));
+}
+
static bool is_storage_get_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_sk_storage_get ||
@@ -9445,9 +9450,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
return -EFAULT;
}
- if (insn->code == (BPF_JMP | BPF_CALL) &&
- insn->src_reg == 0 &&
- insn->imm == BPF_FUNC_timer_set_callback) {
+ if (is_async_callback_calling_insn(insn)) {
struct bpf_verifier_state *async_cb;
/* there is no real recursion here. timer callbacks are async */
@@ -15588,7 +15591,7 @@ static int visit_insn(int t, struct bpf_verifier_env *env)
return DONE_EXPLORING;
case BPF_CALL:
- if (insn->src_reg == 0 && insn->imm == BPF_FUNC_timer_set_callback)
+ if (is_async_callback_calling_insn(insn))
/* Mark this call insn as a prune point to trigger
* is_state_visited() check before call itself is
* processed by __check_func_call(). Otherwise new
--
2.43.0
Now that we have bpf_timer_set_sleepable_cb() available and working, we
can tag the attached callback as sleepable, and let the verifier check
the calls and kfuncs in the correct context.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
new in v3 (split from v2 02/10)
---
include/linux/bpf_verifier.h | 2 ++
kernel/bpf/verifier.c | 16 +++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 84365e6dd85d..789ef5fec547 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -426,6 +426,7 @@ struct bpf_verifier_state {
* while they are still in use.
*/
bool used_as_loop_entry;
+ bool in_sleepable;
/* first and last insn idx of this verifier state */
u32 first_insn_idx;
@@ -626,6 +627,7 @@ struct bpf_subprog_info {
bool is_async_cb: 1;
bool is_exception_cb: 1;
bool args_cached: 1;
+ bool is_sleepable: 1;
u8 arg_cnt;
struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 91e583c6feba..4766c43606c4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -505,6 +505,8 @@ static bool is_async_callback_calling_kfunc(u32 btf_id);
static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn);
+static bool is_bpf_timer_set_sleepable_cb_kfunc(u32 btf_id);
+
static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
{
return func_id == BPF_FUNC_for_each_map_elem ||
@@ -1422,6 +1424,7 @@ static int copy_verifier_state(struct bpf_verifier_state *dst_state,
}
dst_state->speculative = src->speculative;
dst_state->active_rcu_lock = src->active_rcu_lock;
+ dst_state->in_sleepable = src->in_sleepable;
dst_state->curframe = src->curframe;
dst_state->active_lock.ptr = src->active_lock.ptr;
dst_state->active_lock.id = src->active_lock.id;
@@ -2421,6 +2424,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
* Initialize it similar to do_check_common().
*/
elem->st.branches = 1;
+ elem->st.in_sleepable = env->subprog_info[subprog].is_sleepable;
frame = kzalloc(sizeof(*frame), GFP_KERNEL);
if (!frame)
goto err;
@@ -5265,7 +5269,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
static bool in_sleepable(struct bpf_verifier_env *env)
{
- return env->prog->aux->sleepable;
+ return env->prog->aux->sleepable ||
+ (env->cur_state && env->cur_state->in_sleepable);
}
/* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
@@ -9478,6 +9483,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
/* there is no real recursion here. timer callbacks are async */
env->subprog_info[subprog].is_async_cb = true;
+ env->subprog_info[subprog].is_sleepable = is_bpf_timer_set_sleepable_cb_kfunc(insn->imm);
async_cb = push_async_cb(env, env->subprog_info[subprog].start,
insn_idx, subprog);
if (!async_cb)
@@ -11361,6 +11367,11 @@ static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
insn->imm == special_kfunc_list[KF_bpf_throw];
}
+static bool is_bpf_timer_set_sleepable_cb_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_timer_set_sleepable_cb];
+}
+
static bool is_callback_calling_kfunc(u32 btf_id)
{
return is_sync_callback_calling_kfunc(btf_id) ||
@@ -16830,6 +16841,9 @@ static bool states_equal(struct bpf_verifier_env *env,
if (old->active_rcu_lock != cur->active_rcu_lock)
return false;
+ if (old->in_sleepable != cur->in_sleepable)
+ return false;
+
/* for states to be equal callsites have to be the same
* and all frame states need to be equivalent
*/
--
2.43.0
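To illustrate what this state tracking enables (a sketch only, assuming
the whole series is applied): sleepable-only helpers become usable from
a callback attached with bpf_timer_set_sleepable_cb(), because
push_async_cb() marks the callback's verifier state with in_sleepable.
The user_ptr global below is hypothetical, just to have something to
read from:

const void *user_ptr; /* hypothetical, set elsewhere by the program */

static int cb_sleepable(void *map, int *key, struct bpf_timer *timer)
{
        char buf[8];

        /* bpf_copy_from_user() may sleep and is rejected in
         * non-sleepable programs; the verifier accepts it here only
         * because this callback's state has in_sleepable set
         */
        bpf_copy_from_user(buf, sizeof(buf), user_ptr);
        return 0;
}

The same call in a callback attached with plain bpf_timer_set_callback()
would still be rejected, since that callback's state would not have
in_sleepable set.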
In this patch, bpf_timer_set_sleepable_cb() is functionally equivalent
to bpf_timer_set_callback(), with the exception that it enforces that
the timer is started with BPF_F_TIMER_SLEEPABLE.
But given that bpf_timer_set_callback() is a helper while
bpf_timer_set_sleepable_cb() is a kfunc, we need to teach the verifier
about its attached callback.
Marking that callback as sleepable will be done in a separate patch.
Signed-off-by: Benjamin Tissoires <[email protected]>
---
new in v3 (split from v2 02/10)
---
kernel/bpf/helpers.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
kernel/bpf/verifier.c | 31 +++++++++++++++++++++++++++++--
2 files changed, 75 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index f9add0abe40a..2c6dc3d0ffff 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1108,6 +1108,7 @@ struct bpf_hrtimer {
void __rcu *callback_fn;
void *value;
struct semaphore sleepable_lock;
+ bool is_sleepable;
};
/* the actual struct hidden inside uapi struct bpf_timer */
@@ -1270,8 +1271,8 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
.arg3_type = ARG_ANYTHING,
};
-BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
- struct bpf_prog_aux *, aux)
+static int __bpf_timer_set_callback(struct bpf_timer_kern *timer, void *callback_fn,
+ struct bpf_prog_aux *aux, bool is_sleepable)
{
struct bpf_prog *prev, *prog = aux->prog;
struct bpf_hrtimer *t;
@@ -1311,12 +1312,19 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
t->prog = prog;
}
rcu_assign_pointer(t->callback_fn, callback_fn);
+ t->is_sleepable = is_sleepable;
up(&t->sleepable_lock);
out:
__bpf_spin_unlock_irqrestore(&timer->lock);
return ret;
}
+BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
+ struct bpf_prog_aux *, aux)
+{
+ return __bpf_timer_set_callback(timer, callback_fn, aux, false);
+}
+
static const struct bpf_func_proto bpf_timer_set_callback_proto = {
.func = bpf_timer_set_callback,
.gpl_only = true,
@@ -1342,6 +1350,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
goto out;
}
+ if (t->is_sleepable && !(flags & BPF_F_TIMER_SLEEPABLE)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (flags & BPF_F_TIMER_ABS)
mode = HRTIMER_MODE_ABS_SOFT;
else
@@ -2606,6 +2619,36 @@ __bpf_kfunc void bpf_throw(u64 cookie)
WARN(1, "A call to BPF exception callback should never return\n");
}
+/**
+ * bpf_timer_set_sleepable_cb() - Configure the timer to call %callback_fn
+ * static function in a sleepable context.
+ * @timer: The bpf_timer that needs to be configured
+ * @callback_fn: a static bpf function
+ *
+ * @returns %0 on success. %-EINVAL if %timer was not initialized with
+ * bpf_timer_init() earlier. %-EPERM if %timer is in a map that doesn't
+ * have any user references.
+ * The user space should either hold a file descriptor to a map with timers
+ * or pin such map in bpffs. When map is unpinned or file descriptor is
+ * closed all timers in the map will be cancelled and freed.
+ *
+ * This kfunc is equivalent to %bpf_timer_set_callback except that it tells
+ * the verifier that the target callback is run in a sleepable context.
+ */
+__bpf_kfunc int bpf_timer_set_sleepable_cb(struct bpf_timer_kern *timer,
+ int (callback_fn)(void *map, int *key, struct bpf_timer *timer))
+{
+ struct bpf_throw_ctx ctx = {};
+
+ arch_bpf_stack_walk(bpf_stack_walker, &ctx);
+ WARN_ON_ONCE(!ctx.aux);
+
+ if (!ctx.aux)
+ return -EINVAL;
+
+ return __bpf_timer_set_callback(timer, (void *)callback_fn, ctx.aux, true);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(generic_btf_ids)
@@ -2682,6 +2725,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
BTF_ID_FLAGS(func, bpf_dynptr_size)
BTF_ID_FLAGS(func, bpf_dynptr_clone)
+BTF_ID_FLAGS(func, bpf_timer_set_sleepable_cb)
BTF_KFUNCS_END(common_btf_ids)
static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2b11687063ff..91e583c6feba 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -501,6 +501,8 @@ static bool is_dynptr_ref_function(enum bpf_func_id func_id)
}
static bool is_sync_callback_calling_kfunc(u32 btf_id);
+static bool is_async_callback_calling_kfunc(u32 btf_id);
+static bool is_callback_calling_kfunc(u32 btf_id);
static bool is_bpf_throw_kfunc(struct bpf_insn *insn);
static bool is_sync_callback_calling_function(enum bpf_func_id func_id)
@@ -530,7 +532,8 @@ static bool is_sync_callback_calling_insn(struct bpf_insn *insn)
static bool is_async_callback_calling_insn(struct bpf_insn *insn)
{
- return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm));
+ return (bpf_helper_call(insn) && is_async_callback_calling_function(insn->imm)) ||
+ (bpf_pseudo_kfunc_call(insn) && is_async_callback_calling_kfunc(insn->imm));
}
static bool is_storage_get_function(enum bpf_func_id func_id)
@@ -9459,7 +9462,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
*/
env->subprog_info[subprog].is_cb = true;
if (bpf_pseudo_kfunc_call(insn) &&
- !is_sync_callback_calling_kfunc(insn->imm)) {
+ !is_callback_calling_kfunc(insn->imm)) {
verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n",
func_id_name(insn->imm), insn->imm);
return -EFAULT;
@@ -10963,6 +10966,7 @@ enum special_kfunc_type {
KF_bpf_percpu_obj_drop_impl,
KF_bpf_throw,
KF_bpf_iter_css_task_new,
+ KF_bpf_timer_set_sleepable_cb,
};
BTF_SET_START(special_kfunc_set)
@@ -10989,6 +10993,7 @@ BTF_ID(func, bpf_throw)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#endif
+BTF_ID(func, bpf_timer_set_sleepable_cb)
BTF_SET_END(special_kfunc_set)
BTF_ID_LIST(special_kfunc_list)
@@ -11019,6 +11024,7 @@ BTF_ID(func, bpf_iter_css_task_new)
#else
BTF_ID_UNUSED
#endif
+BTF_ID(func, bpf_timer_set_sleepable_cb)
static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
{
@@ -11344,12 +11350,23 @@ static bool is_sync_callback_calling_kfunc(u32 btf_id)
return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl];
}
+static bool is_async_callback_calling_kfunc(u32 btf_id)
+{
+ return btf_id == special_kfunc_list[KF_bpf_timer_set_sleepable_cb];
+}
+
static bool is_bpf_throw_kfunc(struct bpf_insn *insn)
{
return bpf_pseudo_kfunc_call(insn) && insn->off == 0 &&
insn->imm == special_kfunc_list[KF_bpf_throw];
}
+static bool is_callback_calling_kfunc(u32 btf_id)
+{
+ return is_sync_callback_calling_kfunc(btf_id) ||
+ is_async_callback_calling_kfunc(btf_id);
+}
+
static bool is_rbtree_lock_required_kfunc(u32 btf_id)
{
return is_bpf_rbtree_api_kfunc(btf_id);
@@ -12120,6 +12137,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}
+ if (is_async_callback_calling_kfunc(meta.func_id)) {
+ err = push_callback_call(env, insn, insn_idx, meta.subprogno,
+ set_timer_callback_state);
+ if (err) {
+ verbose(env, "kfunc %s#%d failed callback verification\n",
+ func_name, meta.func_id);
+ return err;
+ }
+ }
+
rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
--
2.43.0
On Feb 21 2024, Benjamin Tissoires wrote:
> They are implemented as a workqueue, which means that there are no
> guarantees of timing nor ordering.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---
>
> changes in v3:
> - extracted the implementation in bpf_timer only, without
> bpf_timer_set_sleepable_cb()
> - rely on schedule_work() only, from bpf_timer_start()
> - add semaphore to ensure bpf_timer_work_cb() is accessing
> consistent data
>
> changes in v2 (compared to the one attaches to v1 0/9):
> - make use of a kfunc
> - add a (non-used) BPF_F_TIMER_SLEEPABLE
> - the callback is *not* called, it makes the kernel crashes
> ---
> include/uapi/linux/bpf.h | 4 +++
> kernel/bpf/helpers.c | 92 ++++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 82 insertions(+), 14 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d96708380e52..1fc7ecbd9d33 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7421,10 +7421,14 @@ struct bpf_core_relo {
> * - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is
> * relative to current time.
> * - BPF_F_TIMER_CPU_PIN: Timer will be pinned to the CPU of the caller.
> + * - BPF_F_TIMER_SLEEPABLE: Timer will run in a sleepable context, with
> + * no guarantees of ordering nor timing (consider this as being just
> + * offloaded immediately).
> */
> enum {
> BPF_F_TIMER_ABS = (1ULL << 0),
> BPF_F_TIMER_CPU_PIN = (1ULL << 1),
> + BPF_F_TIMER_SLEEPABLE = (1ULL << 2),
> };
>
> /* BPF numbers iterator state */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 93edf730d288..f9add0abe40a 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -23,6 +23,7 @@
> #include <linux/btf_ids.h>
> #include <linux/bpf_mem_alloc.h>
> #include <linux/kasan.h>
> +#include <linux/semaphore.h>
>
> #include "../../lib/kstrtox.h"
>
> @@ -1094,13 +1095,19 @@ const struct bpf_func_proto bpf_snprintf_proto = {
> * bpf_timer_cancel() cancels the timer and decrements prog's refcnt.
> * Inner maps can contain bpf timers as well. ops->map_release_uref is
> * freeing the timers when inner map is replaced or deleted by user space.
> + *
> + * sleepable_lock protects only the setup of the workqueue, not the callback
> + * itself. This is done to ensure we don't run concurrently a free of the
> + * callback or the associated program.
> */
> struct bpf_hrtimer {
> struct hrtimer timer;
> + struct work_struct work;
> struct bpf_map *map;
> struct bpf_prog *prog;
> void __rcu *callback_fn;
> void *value;
> + struct semaphore sleepable_lock;
> };
>
> /* the actual struct hidden inside uapi struct bpf_timer */
> @@ -1113,6 +1120,55 @@ struct bpf_timer_kern {
> struct bpf_spin_lock lock;
> } __attribute__((aligned(8)));
>
> +static u32 __bpf_timer_compute_key(struct bpf_hrtimer *timer)
> +{
> + struct bpf_map *map = timer->map;
> + void *value = timer->value;
> +
> + if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> + struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> + /* compute the key */
> + return ((char *)value - array->value) / array->elem_size;
> + }
> +
> + /* hash or lru */
> + return *(u32 *)(value - round_up(map->key_size, 8));
> +}
> +
> +static void bpf_timer_work_cb(struct work_struct *work)
> +{
> + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
> + struct bpf_map *map = t->map;
> + void *value = t->value;
> + bpf_callback_t callback_fn;
> + u32 key;
> +
> + BTF_TYPE_EMIT(struct bpf_timer);
> +
> + down(&t->sleepable_lock);
> +
> + callback_fn = READ_ONCE(t->callback_fn);
> + if (!callback_fn) {
> + up(&t->sleepable_lock);
> + return;
> + }
> +
> + key = __bpf_timer_compute_key(t);
> +
> + /* prevent the callback to be freed by bpf_timer_cancel() while running
> + * so we can release the semaphore
> + */
> + bpf_prog_inc(t->prog);
> +
> + up(&t->sleepable_lock);
> +
> + callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)value, 0, 0);
> + /* The verifier checked that return value is zero. */
> +
> + bpf_prog_put(t->prog);
> +}
> +
> static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
>
> static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> @@ -1121,8 +1177,7 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> struct bpf_map *map = t->map;
> void *value = t->value;
> bpf_callback_t callback_fn;
> - void *key;
> - u32 idx;
> + u32 key;
>
> BTF_TYPE_EMIT(struct bpf_timer);
> callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held());
> @@ -1136,17 +1191,9 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> * bpf_map_delete_elem() on the same timer.
> */
> this_cpu_write(hrtimer_running, t);
> - if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> - struct bpf_array *array = container_of(map, struct bpf_array, map);
> -
> - /* compute the key */
> - idx = ((char *)value - array->value) / array->elem_size;
> - key = &idx;
> - } else { /* hash or lru */
> - key = value - round_up(map->key_size, 8);
> - }
> + key = __bpf_timer_compute_key(t);
>
> - callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);
> + callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)value, 0, 0);
> /* The verifier checked that return value is zero. */
>
> this_cpu_write(hrtimer_running, NULL);
> @@ -1191,6 +1238,8 @@ BPF_CALL_3(bpf_timer_init, struct bpf_timer_kern *, timer, struct bpf_map *, map
> t->prog = NULL;
> rcu_assign_pointer(t->callback_fn, NULL);
> hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
> + INIT_WORK(&t->work, bpf_timer_work_cb);
> + sema_init(&t->sleepable_lock, 1);
> t->timer.function = bpf_timer_cb;
> WRITE_ONCE(timer->timer, t);
> /* Guarantee the order between timer->timer and map->usercnt. So
> @@ -1245,6 +1294,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
> ret = -EPERM;
> goto out;
> }
> + down(&t->sleepable_lock);
> prev = t->prog;
> if (prev != prog) {
> /* Bump prog refcnt once. Every bpf_timer_set_callback()
> @@ -1261,6 +1311,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
> t->prog = prog;
> }
> rcu_assign_pointer(t->callback_fn, callback_fn);
> + up(&t->sleepable_lock);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> return ret;
> @@ -1282,7 +1333,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
>
> if (in_nmi())
> return -EOPNOTSUPP;
> - if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
> + if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN | BPF_F_TIMER_SLEEPABLE))
> return -EINVAL;
> __bpf_spin_lock_irqsave(&timer->lock);
> t = timer->timer;
> @@ -1299,7 +1350,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> if (flags & BPF_F_TIMER_CPU_PIN)
> mode |= HRTIMER_MODE_PINNED;
>
> - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> + if (flags & BPF_F_TIMER_SLEEPABLE)
> + schedule_work(&t->work);
> + else
> + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> return ret;
> @@ -1346,13 +1400,21 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
> ret = -EDEADLK;
> goto out;
> }
> + down(&t->sleepable_lock);
Sigh. I initially used a semaphore because here I wanted to have a
down_trylock() to mimic the behavior of hrtimer. However, this doesn't
work because we don't know who is actually calling bpf_timer_cancel(),
and we might not be able to cancel the timer from other threads. And
actually it doesn't matter, because the semaphore only protects the
setup of the callback, not the sleepable callback itself, so it's fine
to call bpf_timer_cancel() from within the callback itself: the timer
will be freed but the callback will not, because the associated prog's
refcount is bumped before entering the callback.
Anyway, I'd better change this to a simple spinlock (or bpf_spinlock).
Also, I realized that I still have the RFC in the prefix.
I can repost a v4 with the spinlock change if it is better not to have
the RFC.
Cheers,
Benjamin
> drop_prog_refcnt(t);
> + up(&t->sleepable_lock);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> /* Cancel the timer and wait for associated callback to finish
> * if it was running.
> */
> ret = ret ?: hrtimer_cancel(&t->timer);
> +
> + /* also cancel the sleepable work, but *do not* wait for
> + * it to finish if it was running as we might not be in a
> + * sleepable context
> + */
> + ret = ret ?: cancel_work(&t->work);
> return ret;
> }
>
> @@ -1407,6 +1469,8 @@ void bpf_timer_cancel_and_free(void *val)
> */
> if (this_cpu_read(hrtimer_running) != t)
> hrtimer_cancel(&t->timer);
> +
> + cancel_work_sync(&t->work);
> kfree(t);
> }
>
>
> --
> 2.43.0
>
Benjamin Tissoires <[email protected]> writes:
> @@ -1245,6 +1294,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
> ret = -EPERM;
> goto out;
> }
> + down(&t->sleepable_lock);
> prev = t->prog;
> if (prev != prog) {
> /* Bump prog refcnt once. Every bpf_timer_set_callback()
> @@ -1261,6 +1311,7 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
> t->prog = prog;
> }
> rcu_assign_pointer(t->callback_fn, callback_fn);
> + up(&t->sleepable_lock);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> return ret;
> @@ -1282,7 +1333,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
>
> if (in_nmi())
> return -EOPNOTSUPP;
> - if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
> + if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN | BPF_F_TIMER_SLEEPABLE))
> return -EINVAL;
> __bpf_spin_lock_irqsave(&timer->lock);
> t = timer->timer;
> @@ -1299,7 +1350,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> if (flags & BPF_F_TIMER_CPU_PIN)
> mode |= HRTIMER_MODE_PINNED;
>
> - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> + if (flags & BPF_F_TIMER_SLEEPABLE)
> + schedule_work(&t->work);
> + else
> + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> return ret;
I think it's a little weird to just ignore the timeout parameter when
called with the sleepable flag. But I guess it can work at least as a
first pass; however, in that case we should enforce that the caller
passes in a timeout of 0, so that if we do add support for a timeout for
sleepable timers in the future, callers will be able to detect this.
-Toke
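Concretely, Toke's suggestion could look like this in bpf_timer_start()
(a sketch on top of the patch above, not tested):

        if (flags & BPF_F_TIMER_SLEEPABLE) {
                /* no delay support yet: reject non-zero timeouts so
                 * that callers can detect it if support is added later
                 */
                if (nsecs) {
                        ret = -EINVAL;
                        goto out;
                }
                schedule_work(&t->work);
        } else {
                hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
        }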
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
> index e630caf644e8..52abb27426f4 100644
> --- a/drivers/hid/bpf/hid_bpf_dispatch.c
> +++ b/drivers/hid/bpf/hid_bpf_dispatch.c
> @@ -143,48 +143,6 @@ u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *s
> }
> EXPORT_SYMBOL_GPL(call_hid_bpf_rdesc_fixup);
>
> -/* Disables missing prototype warnings */
> -__bpf_kfunc_start_defs();
Note:
this patch does not apply on top of current bpf-next [0] because
__bpf_kfunc_start_defs and __bpf_kfunc are not present in [0].
[0] commit 58fd62e0aa50 ("bpf: Clarify batch lookup/lookup_and_delete semantics")
> -
> -/**
> - * hid_bpf_get_data - Get the kernel memory pointer associated with the context @ctx
> - *
> - * @ctx: The HID-BPF context
> - * @offset: The offset within the memory
> - * @rdwr_buf_size: the const size of the buffer
> - *
> - * @returns %NULL on error, an %__u8 memory pointer on success
> - */
> -__bpf_kfunc __u8 *
> -hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr_buf_size)
> -{
> - struct hid_bpf_ctx_kern *ctx_kern;
> -
> - if (!ctx)
> - return NULL;
[...]
On Wed, Feb 21, 2024 at 8:25 AM Benjamin Tissoires <[email protected]> wrote:
>
> In this patch, bpf_timer_set_sleepable_cb() is functionally equivalent
> to bpf_timer_set_callback(), with the exception that it enforces that
> the timer is started with BPF_F_TIMER_SLEEPABLE.
>
> But given that bpf_timer_set_callback() is a helper while
> bpf_timer_set_sleepable_cb() is a kfunc, we need to teach the verifier
> about its attached callback.
> Marking that callback as sleepable will be done in a separate patch.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
>
> ---
>
> new in v3 (split from v2 02/10)
> ---
> kernel/bpf/helpers.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/bpf/verifier.c | 31 +++++++++++++++++++++++++++++--
> 2 files changed, 75 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index f9add0abe40a..2c6dc3d0ffff 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1108,6 +1108,7 @@ struct bpf_hrtimer {
> void __rcu *callback_fn;
> void *value;
> struct semaphore sleepable_lock;
> + bool is_sleepable;
> };
>
> /* the actual struct hidden inside uapi struct bpf_timer */
> @@ -1270,8 +1271,8 @@ static const struct bpf_func_proto bpf_timer_init_proto = {
> .arg3_type = ARG_ANYTHING,
> };
>
> -BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
> - struct bpf_prog_aux *, aux)
> +static int __bpf_timer_set_callback(struct bpf_timer_kern *timer, void *callback_fn,
> + struct bpf_prog_aux *aux, bool is_sleepable)
> {
> struct bpf_prog *prev, *prog = aux->prog;
> struct bpf_hrtimer *t;
> @@ -1311,12 +1312,19 @@ BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callb
> t->prog = prog;
> }
> rcu_assign_pointer(t->callback_fn, callback_fn);
> + t->is_sleepable = is_sleepable;
> up(&t->sleepable_lock);
> out:
> __bpf_spin_unlock_irqrestore(&timer->lock);
> return ret;
> }
>
> +BPF_CALL_3(bpf_timer_set_callback, struct bpf_timer_kern *, timer, void *, callback_fn,
> + struct bpf_prog_aux *, aux)
> +{
> + return __bpf_timer_set_callback(timer, callback_fn, aux, false);
> +}
> +
> static const struct bpf_func_proto bpf_timer_set_callback_proto = {
> .func = bpf_timer_set_callback,
> .gpl_only = true,
> @@ -1342,6 +1350,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> goto out;
> }
>
> + if (t->is_sleepable && !(flags & BPF_F_TIMER_SLEEPABLE)) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> if (flags & BPF_F_TIMER_ABS)
> mode = HRTIMER_MODE_ABS_SOFT;
> else
> @@ -2606,6 +2619,36 @@ __bpf_kfunc void bpf_throw(u64 cookie)
> WARN(1, "A call to BPF exception callback should never return\n");
> }
>
> +/**
> + * bpf_timer_set_sleepable_cb() - Configure the timer to call %callback_fn
> + * static function in a sleepable context.
> + * @timer: The bpf_timer that needs to be configured
> + * @callback_fn: a static bpf function
> + *
> + * @returns %0 on success. %-EINVAL if %timer was not initialized with
> + * bpf_timer_init() earlier. %-EPERM if %timer is in a map that doesn't
> + * have any user references.
> + * The user space should either hold a file descriptor to a map with timers
> + * or pin such map in bpffs. When map is unpinned or file descriptor is
> + * closed all timers in the map will be cancelled and freed.
> + *
> + * This kfunc is equivalent to %bpf_timer_set_callback except that it tells
> + * the verifier that the target callback is run in a sleepable context.
> + */
> +__bpf_kfunc int bpf_timer_set_sleepable_cb(struct bpf_timer_kern *timer,
> + int (callback_fn)(void *map, int *key, struct bpf_timer *timer))
> +{
> + struct bpf_throw_ctx ctx = {};
> +
> + arch_bpf_stack_walk(bpf_stack_walker, &ctx);
> + WARN_ON_ONCE(!ctx.aux);
Sorry. Why such complexity?
Please see how do_misc_fixups() handles BPF_FUNC_timer_set_callback.
On Wed, Feb 21, 2024 at 8:25 AM Benjamin Tissoires <[email protected]> wrote:
> /* the actual struct hidden inside uapi struct bpf_timer */
> @@ -1113,6 +1120,55 @@ struct bpf_timer_kern {
> struct bpf_spin_lock lock;
> } __attribute__((aligned(8)));
>
> +static u32 __bpf_timer_compute_key(struct bpf_hrtimer *timer)
> +{
> + struct bpf_map *map = timer->map;
> + void *value = timer->value;
> +
> + if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> + struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> + /* compute the key */
> + return ((char *)value - array->value) / array->elem_size;
> + }
> +
> + /* hash or lru */
> + return *(u32 *)(value - round_up(map->key_size, 8));
> +}
> +
> +static void bpf_timer_work_cb(struct work_struct *work)
> +{
> + struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
> + struct bpf_map *map = t->map;
> + void *value = t->value;
> + bpf_callback_t callback_fn;
> + u32 key;
> +
> + BTF_TYPE_EMIT(struct bpf_timer);
> +
> + down(&t->sleepable_lock);
> +
> + callback_fn = READ_ONCE(t->callback_fn);
> + if (!callback_fn) {
> + up(&t->sleepable_lock);
> + return;
> + }
> +
> + key = __bpf_timer_compute_key(t);
> +
> +
> + callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)value, 0, 0);
> + /* The verifier checked that return value is zero. */
> +
> + bpf_prog_put(t->prog);
> +}
> +
> static DEFINE_PER_CPU(struct bpf_hrtimer *, hrtimer_running);
>
> static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> @@ -1121,8 +1177,7 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> struct bpf_map *map = t->map;
> void *value = t->value;
> bpf_callback_t callback_fn;
> - void *key;
> - u32 idx;
> + u32 key;
>
> BTF_TYPE_EMIT(struct bpf_timer);
> callback_fn = rcu_dereference_check(t->callback_fn, rcu_read_lock_bh_held());
> @@ -1136,17 +1191,9 @@ static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer)
> * bpf_map_delete_elem() on the same timer.
> */
> this_cpu_write(hrtimer_running, t);
> - if (map->map_type == BPF_MAP_TYPE_ARRAY) {
> - struct bpf_array *array = container_of(map, struct bpf_array, map);
> -
> - /* compute the key */
> - idx = ((char *)value - array->value) / array->elem_size;
> - key = &idx;
> - } else { /* hash or lru */
> - key = value - round_up(map->key_size, 8);
> - }
> + key = __bpf_timer_compute_key(t);
Please don't mix such "cleanup" with the main changes.
It's buggy for a hash map.
Instead of passing a pointer to the real key into the bpf prog,
you're reading the first 4 bytes of the key, copying them into a temp
var, and passing an address to that.
It would have been very painful to debug such a bug if it slipped through,
since the bpf prog would sort-of work for 4-byte keys.
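In other words, the callback must keep receiving a pointer to the full
key stored in the map element, not a 4-byte copy of it. That is the
pre-existing bpf_timer_cb() logic, kept here for reference:

        void *key;
        u32 idx;

        if (map->map_type == BPF_MAP_TYPE_ARRAY) {
                struct bpf_array *array = container_of(map, struct bpf_array, map);

                /* array keys are u32 indices, so a temporary is fine */
                idx = ((char *)value - array->value) / array->elem_size;
                key = &idx;
        } else {
                /* hash or lru: the full key (map->key_size bytes) sits
                 * right before the value, so pass its real address
                 */
                key = value - round_up(map->key_size, 8);
        }

        callback_fn((u64)(long)map, (u64)(long)key, (u64)(long)value, 0, 0);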
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> @@ -1282,7 +1333,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
>
> if (in_nmi())
> return -EOPNOTSUPP;
> - if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
> + if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN | BPF_F_TIMER_SLEEPABLE))
> return -EINVAL;
> __bpf_spin_lock_irqsave(&timer->lock);
> t = timer->timer;
> @@ -1299,7 +1350,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> if (flags & BPF_F_TIMER_CPU_PIN)
> mode |= HRTIMER_MODE_PINNED;
>
> - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> + if (flags & BPF_F_TIMER_SLEEPABLE)
> + schedule_work(&t->work);
> + else
> + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
It looks like nsecs is simply ignored for sleepable timers.
Should this be an hrtimer_start() that waits nsecs and then schedules
the work, or schedule_delayed_work()? (But it takes the delay in
jiffies, which is probably too coarse.) Sorry if I'm missing something.
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f81c799b2c80..2b11687063ff 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5444,6 +5444,26 @@ static int check_map_access(struct bpf_verifier_env *env, u32 regno,
> return -EACCES;
> }
> break;
> + case BPF_TIMER:
> + /* FIXME: kptr does the above, should we use the same? */
I don't think so.
Basically this allows double-word reads/writes from the timer address,
which probably should not be allowed.
The ACCESS_DIRECT is passed to check_map_access() from
check_mem_access(), and I don't see a point where a check_mem_access()
call would be triggered for a pointer parameter of a kfunc
(unless it is accompanied by a size parameter).
I tried the following simple program and it verifies fine:
struct elem {
        struct bpf_timer t;
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 2);
        __type(key, int);
        __type(value, struct elem);
} array SEC(".maps");

int bpf_timer_set_sleepable_cb
        (struct bpf_timer *timer,
         int (callback_fn)(void *map, int *key, struct bpf_timer *timer))
        __ksym __weak;

static int cb_sleepable(void *map, int *key, struct bpf_timer *timer)
{
        return 0;
}

SEC("fentry/bpf_fentry_test5")
int BPF_PROG2(test_sleepable, int, a)
{
        struct bpf_timer *arr_timer;
        int array_key = ARRAY;

        arr_timer = bpf_map_lookup_elem(&array, &array_key);
        if (!arr_timer)
                return 0;
        bpf_timer_init(arr_timer, &array, CLOCK_MONOTONIC);
        bpf_timer_set_sleepable_cb(arr_timer, cb_sleepable);
        bpf_timer_start(arr_timer, 0, 0);
        return 0;
}
(in general, it would be easier to review if there were some test
cases to play with).
> + if (src != ACCESS_DIRECT) {
> + verbose(env, "bpf_timer cannot be accessed indirectly by helper\n");
> + return -EACCES;
> + }
> + if (!tnum_is_const(reg->var_off)) {
> + verbose(env, "bpf_timer access cannot have variable offset\n");
> + return -EACCES;
> + }
> + if (p != off + reg->var_off.value) {
> + verbose(env, "bpf_timer access misaligned expected=%u off=%llu\n",
> + p, off + reg->var_off.value);
> + return -EACCES;
> + }
> + if (size != bpf_size_to_bytes(BPF_DW)) {
> + verbose(env, "bpf_timer access size must be BPF_DW\n");
> + return -EACCES;
> + }
> + break;
> default:
> verbose(env, "%s cannot be accessed directly by load/store\n",
> btf_field_type_name(field->type));
[...]
On Fri, 2024-02-23 at 02:22 +0200, Eduard Zingerman wrote:
[...]
> > + case BPF_TIMER:
> > + /* FIXME: kptr does the above, should we use the same? */
[...]
> I tried the following simple program and it verifies fine:
Sorry, I meant that I tried it with the above check removed.
On Wed, Feb 21, 2024 at 8:25 AM Benjamin Tissoires <[email protected]> wrote:
> @@ -18193,7 +18198,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> return -E2BIG;
> }
>
> - if (env->prog->aux->sleepable)
> + if (in_sleepable(env))
> atomic64_inc(&map->sleepable_refcnt);
this one doesn't look correct.
The verifier didn't start its main loop when resolve_pseudo_ldimm64()
is called.
It also loses symmetry with other sleepable_refcnt operations
in syscall.c and core.c
I reverted this hunk and applied patches 1,2,3
with minor edits, like removing unnecessary parens in patch 3,
and patch subject rewords.
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> @@ -11973,6 +12006,9 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> if (ret)
> return ret;
> break;
> + case KF_ARG_PTR_TO_TIMER:
> + /* FIXME: should we do anything here? */
> + break;
I think that here it is necessary to enforce that R1
is PTR_TO_MAP_VALUE and that it points to the timer field of the map value.
As is, the following program leads to in-kernel page fault when
printing verifier log:
--- 8< ----------------------------
struct elem {
        struct bpf_timer t;
};

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 2);
        __type(key, int);
        __type(value, struct elem);
} array SEC(".maps");

int bpf_timer_set_sleepable_cb
        (struct bpf_timer *timer,
         int (callback_fn)(void *map, int *key, struct bpf_timer *timer))
        __ksym __weak;

static int cb_sleepable(void *map, int *key, struct bpf_timer *timer)
{
        return 0;
}

SEC("fentry/bpf_fentry_test5")
int BPF_PROG2(test_sleepable, int, a)
{
        struct bpf_timer *arr_timer;
        int array_key = 1;

        arr_timer = bpf_map_lookup_elem(&array, &array_key);
        if (!arr_timer)
                return 0;
        bpf_timer_init(arr_timer, &array, CLOCK_MONOTONIC);
        bpf_timer_set_sleepable_cb((void *)&arr_timer, // note incorrect pointer type!
                                   cb_sleepable);
        bpf_timer_start(arr_timer, 0, 0);
        return 0;
}
---------------------------- >8 ---
I get the page fault when doing:
$ ./veristat -l7 -vvv -f test_sleepable timer.bpf.o
[ 21.014886] BUG: kernel NULL pointer dereference, address: 0000000000000060
..
[ 21.015780] RIP: 0010:print_reg_state (kernel/bpf/log.c:715)
And here is a relevant fragment of print_reg_state():
713         if (type_is_map_ptr(t)) {
714                 if (reg->map_ptr->name[0])
715                         verbose_a("map=%s", reg->map_ptr->name);
716                 verbose_a("ks=%d,vs=%d",
717                           reg->map_ptr->key_size,
718                           reg->map_ptr->value_size);
719         }
The error is caused by reg->map_ptr being NULL.
The code in check_kfunc_args() allows anything in R1, including
registers whose type is not a pointer to a map and whose reg->map_ptr
is NULL.
Later, when check_kfunc_call() is done, it does push_callback_call():
12152         err = push_callback_call(env, insn, insn_idx, meta.subprogno,
12153                                  set_timer_callback_state);
which calls set_timer_callback_state(), and that sets bogus state for R{1,2,3}:
9683 static int set_timer_callback_state(...)
9684 {
9685         struct bpf_map *map_ptr = caller->regs[BPF_REG_1].map_ptr;
9687
9688         /* bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);
9689          * callback_fn(struct bpf_map *map, void *key, void *value);
9690          */
9691         callee->regs[BPF_REG_1].type = CONST_PTR_TO_MAP;
9692         __mark_reg_known_zero(&callee->regs[BPF_REG_1]);
9693         callee->regs[BPF_REG_1].map_ptr = map_ptr;
                                               ^^^^^^^^^
This is NULL!
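A minimal version of the missing check in check_kfunc_args() would be
something like this (a sketch, just enough to reject the program above;
the exact verbose() wording is a placeholder):

        case KF_ARG_PTR_TO_TIMER:
                if (reg->type != PTR_TO_MAP_VALUE) {
                        verbose(env, "arg#%d doesn't point to a map value\n", i);
                        return -EINVAL;
                }
                /* ideally also validate reg->off against the map's
                 * recorded timer field offset, the way
                 * process_timer_func() does for the bpf_timer_* helpers
                 */
                break;

With reg->type pinned to PTR_TO_MAP_VALUE, reg->map_ptr is guaranteed
to be set by the time set_timer_callback_state() reads it.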
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> @@ -12120,6 +12137,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> }
> }
>
> + if (is_async_callback_calling_kfunc(meta.func_id)) {
I think that it's better to check specific kfunc id here:
meta.func_id == special_kfunc_list[KF_bpf_timer_set_sleepable_cb_impl]
In case some new async-callback-calling kfunc is added later,
for which set_timer_callback_state() won't be correct.
> + err = push_callback_call(env, insn, insn_idx, meta.subprogno,
> + set_timer_callback_state);
> + if (err) {
> + verbose(env, "kfunc %s#%d failed callback verification\n",
> + func_name, meta.func_id);
> + return err;
> + }
> + }
> +
> rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
> rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
>
>
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
[...]
> @@ -626,6 +627,7 @@ struct bpf_subprog_info {
> bool is_async_cb: 1;
> bool is_exception_cb: 1;
> bool args_cached: 1;
> + bool is_sleepable: 1;
>
> u8 arg_cnt;
> struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
[...]
> @@ -2421,6 +2424,7 @@ static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
> * Initialize it similar to do_check_common().
> */
> elem->st.branches = 1;
> + elem->st.in_sleepable = env->subprog_info[subprog].is_sleepable;
> frame = kzalloc(sizeof(*frame), GFP_KERNEL);
> if (!frame)
> goto err;
[...]
> @@ -9478,6 +9483,7 @@ static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *ins
>
> /* there is no real recursion here. timer callbacks are async */
> env->subprog_info[subprog].is_async_cb = true;
> + env->subprog_info[subprog].is_sleepable = is_bpf_timer_set_sleepable_cb_kfunc(insn->imm);
> async_cb = push_async_cb(env, env->subprog_info[subprog].start,
> insn_idx, subprog);
I'd make is_sleepable a parameter of push_async_cb() instead of a field
in struct bpf_subprog_info.
I had to spend some time convincing myself that bpf_subprog_info->is_sleepable
does not have to be computed before do_check() in check_cfg(), and
wondering what would happen if the same callback is passed as both a
sleepable and a non-sleepable callback. These questions won't arise if
this is a parameter.
[...]
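For reference, the suggested shape would be roughly (a sketch):

static struct bpf_verifier_state *push_async_cb(struct bpf_verifier_env *env,
                                                int insn_idx, int prev_insn_idx,
                                                int subprog, bool is_sleepable)
{
        ...
        elem->st.branches = 1;
        /* sleepability comes straight from the call site instead of
         * being cached in bpf_subprog_info
         */
        elem->st.in_sleepable = is_sleepable;
        ...
}

and, at the call site in push_callback_call():

        async_cb = push_async_cb(env, env->subprog_info[subprog].start,
                                 insn_idx, subprog,
                                 is_bpf_timer_set_sleepable_cb_kfunc(insn->imm));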
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
> This is still a WIP, but I think this can be dropped as we never
> get to this instruction. So what should we do here?
As Alexei replied in a separate sub-thread, you probably want this
for sleepable timers. Here is the full source code block:
        if (insn->imm == BPF_FUNC_timer_set_callback ||
            is_bpf_timer_set_sleepable_cb_kfunc(insn->imm)) {
                ...
                struct bpf_insn ld_addrs[2] = {
                        BPF_LD_IMM64(BPF_REG_3, (long)prog->aux),
                };

                insn_buf[0] = ld_addrs[0];
                insn_buf[1] = ld_addrs[1];
                insn_buf[2] = *insn;
                cnt = 3;

                new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
                ...
        }
Effectively, it sets up the third function call parameter (R3)
of timer_set_callback() to be prog->aux.
E.g. before bpf_patch_insn_data():
r1 = ... timer ...
r2 = ... callback address ...
call timer_set_callback
After bpf_patch_insn_data():
r1 = ... timer ...
r2 = ... callback address ...
r3 = prog->aux ll
call timer_set_callback
This way it won't be necessary to walk the stack in search of ctx.aux
in bpf_timer_set_sleepable_cb().
On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
> [Partly a RFC/formal submission: there are still FIXMEs in the code]
> [Also using bpf-next as the base tree for HID changes as there will
> be conflicting changes otherwise, so I'm personaly fine for the HID
> commits to go through bpf-next]
[...]
Could you please also add verifier selftests, e.g. extend
tools/testing/selftests/bpf/progs/timer.c (bpf side) and
tools/testing/selftests/bpf/prog_tests/timer.c (userspace side triggering
the bpf side)?
Negative tests could be added in
tools/testing/selftests/bpf/progs/timer_failure.c
Please let me know if you need any help setting up a local BPF test
environment; I have a short writeup on how to set it up in a chroot.
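For the negative side, a timer_failure.c entry could be as simple as
attaching a callback with the non-sleepable bpf_timer_set_callback()
and calling a sleepable-only helper from it (a sketch; the __msg string
is assumed to match the verifier's existing error for sleepable
helpers):

static int cb_non_sleepable(void *map, int *key, struct bpf_timer *timer)
{
        char buf[8];

        /* attached with plain bpf_timer_set_callback(), so this
         * callback is not sleepable and the call below must be rejected
         */
        bpf_copy_from_user(buf, sizeof(buf), NULL);
        return 0;
}

SEC("fentry/bpf_fentry_test1")
__failure __msg("helper call might sleep in a non-sleepable prog")
int BPF_PROG(test_non_sleepable_cb, int a)
{
        ...
        bpf_timer_set_callback(timer, cb_non_sleepable);
        return 0;
}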
Hi,
On Feb 23 2024, Eduard Zingerman wrote:
> On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
> > [Partly a RFC/formal submission: there are still FIXMEs in the code]
> > [Also using bpf-next as the base tree for HID changes as there will
> > be conflicting changes otherwise, so I'm personaly fine for the HID
> > commits to go through bpf-next]
>
> [...]
>
> Could you please also add verifier selftests, e.g. extend
> tools/testing/selftests/bpf/progs/timer.c (bpf side) and
> tools/testing/selftests/bpf/prog_tests/timer.c (userspace side triggering
> the bpf side)?
> Negative tests could be added in
> tools/testing/selftests/bpf/progs/timer_failure.c
>
> Please let me know if you need any help setting up a local BPF test
> environment; I have a short writeup on how to set it up in a chroot.
Thanks a lot for your review (and Alexei's). I was actually off today
and will be off next Monday too, but I'll work on those tests next week.
Cheers,
Benjamin
On Feb 22 2024, Eduard Zingerman wrote:
> On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
>
> [...]
>
> > diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c
> > index e630caf644e8..52abb27426f4 100644
> > --- a/drivers/hid/bpf/hid_bpf_dispatch.c
> > +++ b/drivers/hid/bpf/hid_bpf_dispatch.c
> > @@ -143,48 +143,6 @@ u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *s
> > }
> > EXPORT_SYMBOL_GPL(call_hid_bpf_rdesc_fixup);
> >
> > -/* Disables missing prototype warnings */
> > -__bpf_kfunc_start_defs();
>
> Note:
> this patch does not apply on top of current bpf-next [0] because
> __bpf_kfunc_start_defs and __bpf_kfunc are not present in [0].
>
> [0] commit 58fd62e0aa50 ("bpf: Clarify batch lookup/lookup_and_delete semantics")
Right... this was in Linus' tree as a late 6.8-rcx addition. Depending
on how bpf-next will be rebased/merged, I'll see if I merge this
subseries through the HID tree or the BPF one.
Cheers,
Benjamin
>
> > -
> > -/**
> > - * hid_bpf_get_data - Get the kernel memory pointer associated with the context @ctx
> > - *
> > - * @ctx: The HID-BPF context
> > - * @offset: The offset within the memory
> > - * @rdwr_buf_size: the const size of the buffer
> > - *
> > - * @returns %NULL on error, an %__u8 memory pointer on success
> > - */
> > -__bpf_kfunc __u8 *
> > -hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr_buf_size)
> > -{
> > - struct hid_bpf_ctx_kern *ctx_kern;
> > -
> > - if (!ctx)
> > - return NULL;
>
> [...]
On Feb 22 2024, Alexei Starovoitov wrote:
> On Wed, Feb 21, 2024 at 8:25 AM Benjamin Tissoires <[email protected]> wrote:
> > @@ -18193,7 +18198,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> > return -E2BIG;
> > }
> >
> > - if (env->prog->aux->sleepable)
> > + if (in_sleepable(env))
> > atomic64_inc(&map->sleepable_refcnt);
>
> this one doesn't look correct.
> The verifier didn't start its main loop when resolve_pseudo_ldimm64()
> is called.
> It also loses symmetry with other sleepable_refcnt operations
> in syscall.c and core.c
>
> I reverted this hunk and applied patches 1,2,3
> with minor edits, like removing unnecessary parens in patch 3,
> and patch subject rewords.
Thanks a lot. I'll work on the rest of the series next week.
Cheers,
Benjamin
On Feb 23 2024, Eduard Zingerman wrote:
> On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
>
> [...]
>
> > @@ -1282,7 +1333,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> >
> > if (in_nmi())
> > return -EOPNOTSUPP;
> > - if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN))
> > + if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN | BPF_F_TIMER_SLEEPABLE))
> > return -EINVAL;
> > __bpf_spin_lock_irqsave(&timer->lock);
> > t = timer->timer;
> > @@ -1299,7 +1350,10 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
> > if (flags & BPF_F_TIMER_CPU_PIN)
> > mode |= HRTIMER_MODE_PINNED;
> >
> > - hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
> > + if (flags & BPF_F_TIMER_SLEEPABLE)
> > + schedule_work(&t->work);
> > + else
> > + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode);
>
> It looks like nsecs is simply ignored for sleepable timers.
> Should this be an hrtimer_start() that waits nsecs and then schedules
> the work, or schedule_delayed_work()? (But it takes the delay in
> jiffies, which is probably too coarse.) Sorry if I'm missing something.
Yeah, I agree it's confusing, but as mentioned by Toke in his reply, we
should return -EINVAL if a timer value is provided (for now).
Alexei mentioned[0] that he didn't want to mix hrtimer delays with
workqueues, as the latter are non-deterministic. So AFAIU, I should add
the only guarantee we can provide for now: a sleepable context; proper
delays in sleepable contexts will be added once we have a better
workqueue selection available.
Cheers,
Benjamin
[0] https://lore.kernel.org/bpf/CAO-hwJKz+eRA+BFLANTrEqz2jQAOANTE3c7eqNJ6wDqJR7jMiQ@mail.gmail.com/T/#md15e431cbcddec9fcaddf1c305234523ed26f7ce
On Feb 23 2024, Eduard Zingerman wrote:
> On Wed, 2024-02-21 at 17:25 +0100, Benjamin Tissoires wrote:
> > This is still a WIP, but I think this can be dropped as we never
> > get to this instruction. So what should we do here?
>
> As Alexei replied in a separate sub-thread you probably want this
> for sleepable timers. Here is full source code block:
>
>         if (insn->imm == BPF_FUNC_timer_set_callback ||
>             is_bpf_timer_set_sleepable_cb_kfunc(insn->imm)) {
>                 ...
>                 struct bpf_insn ld_addrs[2] = {
>                         BPF_LD_IMM64(BPF_REG_3, (long)prog->aux),
>                 };
>
>                 insn_buf[0] = ld_addrs[0];
>                 insn_buf[1] = ld_addrs[1];
>                 insn_buf[2] = *insn;
>                 cnt = 3;
>
>                 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt);
>                 ...
>         }
>
> Effectively, it sets up the third function call parameter (R3)
> of timer_set_callback() to be prog->aux.
> E.g. before bpf_patch_insn_data():
>
> r1 = ... timer ...
> r2 = ... callback address ...
> call timer_set_callback
>
> After bpf_patch_insn_data():
>
> r1 = ... timer ...
> r2 = ... callback address ...
> r3 = prog->aux ll
> call timer_set_callback
>
> This way it won't be necessary to walk the stack in search of ctx.aux
> in bpf_timer_set_sleepable_cb().
Hmm, I must still be missing a piece of the puzzle:
if I declare bpf_timer_set_sleepable_cb() to take a third "aux"
argument, given that it is declared as kfunc, I also must declare it in
my bpf program, or I get the following:
# libbpf: extern (func ksym) 'bpf_timer_set_sleepable_cb': func_proto [264] incompatible with vmlinux [18151]
And if I declare it, then I don't know what to pass, given that this is
purely added by the verifier:
43: (85) call bpf_timer_set_sleepable_cb#18152
arg#2 pointer type STRUCT bpf_prog_aux must point to scalar, or struct with scalar
Maybe I should teach the verifier that this kfunc only takes 2
arguments, and the third one is virtual, but that also means that when
the kfunc definitions are to be included in vmlinux.h, they would also
have this special case.
(I just tried with a blank u64 instead of the struct bpf_prog_aux*, but
it crashes with KASAN complaining).
Cheers,
Benjamin
On Tue, Feb 27, 2024 at 5:36 PM Eduard Zingerman <[email protected]> wrote:
>
> On Tue, 2024-02-27 at 17:18 +0100, Benjamin Tissoires wrote:
> [...]
>
> > Hmm, I must still be missing a piece of the puzzle:
> > if I declare bpf_timer_set_sleepable_cb() to take a third "aux"
> > argument, given that it is declared as kfunc, I also must declare it in
> > my bpf program, or I get the following:
> >
> > # libbpf: extern (func ksym) 'bpf_timer_set_sleepable_cb': func_proto [264] incompatible with vmlinux [18151]
> >
> > And if I declare it, then I don't know what to pass, given that this is
> > purely added by the verifier:
> >
> > 43: (85) call bpf_timer_set_sleepable_cb#18152
> > arg#2 pointer type STRUCT bpf_prog_aux must point to scalar, or struct with scalar
>
> Right, something has to be done about the number of arguments and we don't
> have a convenient mechanism for this afaik.
>
> The simplest way would be to have two kfuncs:
> - one with 2 arguments, used from the bpf program;
> - another with 3 arguments, used at runtime;
> - replace the former by the latter during rewrite.
It's hacky but seems interesting enough to be tested :)
>
> > Maybe I should teach the verifier that this kfunc only takes 2
> > arguments, and the third one is virtual, but that also means that when
> > the kfunc definitions are to be included in vmlinux.h, they would also
> > have this special case.
>
> It might be a somewhat generic mechanism, e.g. btf_decl_tag("hidden")
> for kfunc parameter.
We also could use the suffix (like __uninit, __k, etc...), but it
might introduce more headaches than the 2 kfuncs you are proposing.
>
> imho, having two kfuncs is less hacky.
>
> > (I just tried with a blank u64 instead of the struct bpf_prog_aux*, but
> > it crashes with KASAN complaining).
>
> For my understanding:
> - you added a 3rd param (void *) to kfunc;
it was struct bpf_prog_aux *, but yes
> - passed it as zero in BPF program;
> - applied the above rewrite, so that r3 equals to prog->aux;
> - and now KASAN complains, right?
yep, but see below
>
> Could you please provide more details on what exactly it complains about?
>
Well, there is a simple reason: that code is never reached because, in
that function, there is an `if (insn->src_reg ==
BPF_PSEUDO_KFUNC_CALL)` above that unconditionally terminates with a
`continue`. So basically this part of the code is never hit.
I'll include that new third argument and the dual kfunc call in
fixup_kfunc_call() and report if it works from here.
Cheers,
Benjamin
On Tue, 2024-02-27 at 17:18 +0100, Benjamin Tissoires wrote:
[...]
> Hmm, I must still be missing a piece of the puzzle:
> if I declare bpf_timer_set_sleepable_cb() to take a third "aux"
> argument, given that it is declared as kfunc, I also must declare it in
> my bpf program, or I get the following:
>
> # libbpf: extern (func ksym) 'bpf_timer_set_sleepable_cb': func_proto [264] incompatible with vmlinux [18151]
>
> And if I declare it, then I don't know what to pass, given that this is
> purely added by the verifier:
>
> 43: (85) call bpf_timer_set_sleepable_cb#18152
> arg#2 pointer type STRUCT bpf_prog_aux must point to scalar, or struct with scalar
Right, something has to be done about the number of arguments and we don't
have a convenient mechanism for this afaik.
The simplest way would be to have two kfuncs:
- one with 2 arguments, used from the bpf program;
- another with 3 arguments, used at runtime;
- replace the former by the latter during rewrite.
> Maybe I should teach the verifier that this kfunc only takes 2
> arguments, and the third one is virtual, but that also means that when
> the kfunc definitions are to be included in vmlinux.h, they would also
> have this special case.
It might be a somewhat generic mechanism, e.g. btf_decl_tag("hidden")
for kfunc parameter.
imho, having two kfuncs is less hacky.
> (I just tried with a blank u64 instead of the struct bpf_prog_aux*, but
> it crashes with KASAN complaining).
For my understanding:
- you added a 3rd param (void *) to kfunc;
- passed it as zero in BPF program;
- applied the above rewrite, so that r3 equals to prog->aux;
- and now KASAN complains, right?
Could you please provide more details on what exactly it complains about?
On Tue, Feb 27, 2024 at 8:51 AM Benjamin Tissoires
<[email protected]> wrote:
>
> On Tue, Feb 27, 2024 at 5:36 PM Eduard Zingerman <[email protected]> wrote:
> >
> > On Tue, 2024-02-27 at 17:18 +0100, Benjamin Tissoires wrote:
> > [...]
> >
> > > Hmm, I must still be missing a piece of the puzzle:
> > > if I declare bpf_timer_set_sleepable_cb() to take a third "aux"
> > > argument, given that it is declared as kfunc, I also must declare it in
> > > my bpf program, or I get the following:
> > >
> > > # libbpf: extern (func ksym) 'bpf_timer_set_sleepable_cb': func_proto [264] incompatible with vmlinux [18151]
> > >
> > > And if I declare it, then I don't know what to pass, given that this is
> > > purely added by the verifier:
> > >
> > > 43: (85) call bpf_timer_set_sleepable_cb#18152
> > > arg#2 pointer type STRUCT bpf_prog_aux must point to scalar, or struct with scalar
> >
> > Right, something has to be done about the number of arguments and we don't
> > have a convenient mechanism for this afaik.
> >
> > The simplest way would be to have two kfuncs:
> > - one with 2 arguments, used from the bpf program;
> > - another with 3 arguments, used at runtime;
> > - replace the former by the latter during rewrite.
>
> It's hacky but seems interesting enough to be tested :)
Too hacky imo :)
Let's follow the existing pattern.
See:
__bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
__ign suffix tells the verifier to ignore it.
Then we do:
#define bpf_obj_new(type) \
((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
and later the verifier replaces arg2 with the correct pointer.
> We also could use the suffix (like __uninit, __k, etc...), but it
> might introduce more headaches than the 2 kfuncs you are proposing.
Only one kfunc pls. Let's not make it more complex than necessary.
We cannot easily add a suffix to tell libbpf to ignore that arg,
since bpf_core_types_are_compat() compares types and there are
no argument names in the types.
So it will be a significant surgery for libbpf to find the arg name
in vmlinux BTF and strcmp the suffix.
>
> >
> > Could you please provide more details on what exactly it complains about?
> >
>
> Well, there is a simple reason: that code is never reached because, in
> that function, there is an `if (insn->src_reg ==
> BPF_PSEUDO_KFUNC_CALL)` above that unconditionally terminates with a
> `continue`. So basically this part of the code is never hit.
>
> I'll include that new third argument and the dual kfunc call in
> fixup_kfunc_call() and report if it works from here.
Something is wrong. fixup_kfunc_call() can rewrite args with whatever
it wants.
Are you sure you've added bpf_timer_set_sleepable_cb to special_kfunc_list?
On Wed, Feb 28, 2024 at 2:49 AM Alexei Starovoitov
<[email protected]> wrote:
>
> On Tue, Feb 27, 2024 at 8:51 AM Benjamin Tissoires
> <[email protected]> wrote:
> >
> > On Tue, Feb 27, 2024 at 5:36 PM Eduard Zingerman <[email protected]> wrote:
> > >
> > > On Tue, 2024-02-27 at 17:18 +0100, Benjamin Tissoires wrote:
> > > [...]
> > >
> > > > Hmm, I must still be missing a piece of the puzzle:
> > > > if I declare bpf_timer_set_sleepable_cb() to take a third "aux"
> > > > argument, given that it is declared as kfunc, I also must declare it in
> > > > my bpf program, or I get the following:
> > > >
> > > > # libbpf: extern (func ksym) 'bpf_timer_set_sleepable_cb': func_proto [264] incompatible with vmlinux [18151]
> > > >
> > > > And if I declare it, then I don't know what to pass, given that this is
> > > > purely added by the verifier:
> > > >
> > > > 43: (85) call bpf_timer_set_sleepable_cb#18152
> > > > arg#2 pointer type STRUCT bpf_prog_aux must point to scalar, or struct with scalar
> > >
> > > Right, something has to be done about the number of arguments and we don't
> > > have a convenient mechanism for this afaik.
> > >
> > > The simplest way would be to have two kfuncs:
> > > - one with 2 arguments, used from the bpf program;
> > > - another with 3 arguments, used at runtime;
> > > - replace the former by the latter during rewrite.
> >
> > It's hacky but seems interesting enough to be tested :)
>
> Too hacky imo :)
>
> Let's follow the existing pattern.
> See:
> __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign)
>
> __ign suffix tells the verifier to ignore it.
>
> Then we do:
> #define bpf_obj_new(type) \
> ((type *)bpf_obj_new_impl(bpf_core_type_id_local(type), NULL))
>
> and later the verifier replaces arg2 with the correct pointer.
\o/ Thanks, it works :)
>
> > We also could use the suffix (like __uninit, __k, etc...), but it
> > might introduce more headaches than the 2 kfuncs you are proposing.
>
> Only one kfunc pls. Let's not make it more complex than necessary.
>
> We cannot easily add a suffix to tell libbpf to ignore that arg,
> since bpf_core_types_are_compat() compares types and there are
> no argument names in the types.
> So it will be a significant surgery for libbpf to find the arg name
> in vmlinux BTF and strcmp the suffix.
Yeah, I guessed so. Having a single #define is fine, especially given
that there are already a lot of them for the same purpose.
>
> >
> > >
> > > Could you please provide more details on what exactly it complains about?
> > >
> >
> > Well, there is a simple reason: that code is never reached because, in
> > that function, there is an `if (insn->src_reg ==
> > BPF_PSEUDO_KFUNC_CALL)` above that unconditionally terminates with a
> > `continue`. So basically this part of the code is never hit.
> >
> > I'll include that new third argument and the dual kfunc call in
> > fixup_kfunc_call() and report if it works from here.
>
> Something is wrong. fixup_kfunc_call() can rewrite args with whatever
> it wants.
> Are you sure you've added bpf_timer_set_sleepable_cb to special_kfunc_list?
>
Yeah, but as I mentioned, I wasn't hacking in the correct place. I was
not making the changes in fixup_kfunc_call() but in the helper
processing, so that path was not hit.
But with your instructions it works.
I have a couple of changes to do and the selftests to add and the
series will be ready.
Cheers,
Benjamin
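For completeness, the shape this converges to with the __ign pattern
would be something like the following (a sketch; final naming and the
exact header may differ in v4):

/* kernel side: the aux argument is filled in by the verifier during
 * fixup_kfunc_call() and is invisible to bpf programs
 */
__bpf_kfunc int bpf_timer_set_sleepable_cb_impl(struct bpf_timer_kern *timer,
                int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
                struct bpf_prog_aux *aux__ign)
{
        return __bpf_timer_set_callback(timer, (void *)callback_fn,
                                        aux__ign, true);
}

/* bpf program side, e.g. in bpf_experimental.h: */
extern int bpf_timer_set_sleepable_cb_impl(struct bpf_timer *timer,
                int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
                void *aux__ign) __ksym;

#define bpf_timer_set_sleepable_cb(timer, cb) \
        bpf_timer_set_sleepable_cb_impl(timer, cb, NULL)

This removes the arch_bpf_stack_walk() hack from the earlier version of
the kfunc, which is the point Alexei was making.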