Now that BPF supports adding new kernel functions with kfuncs, and
storing kernel objects in maps with kptrs, we can add a set of kfuncs
which allow struct task_struct objects to be stored in maps as
referenced kptrs.
The possible use cases for doing this are plentiful. During tracing,
for example, it would be useful to be able to collect some tasks that
performed a certain operation, and then periodically summarize who they
are, which cgroup they're in, how much CPU time they've utilized, etc.
Doing this now would require storing the tasks' pids along with some
relevant data to be exported to user space, and later associating the
pids to tasks in other event handlers where the data is recorded.
Another useful by-product of this is that it allows a program to pin a
task in a BPF program, and by proxy therefore also e.g. pin its task
local storage.
In order to support this, we'll need to expand KF_TRUSTED_ARGS to
support receiving trusted, non-refcounted pointers. It currently only
supports either PTR_TO_CTX pointers, or refcounted pointers. What this
means in terms of the implementation is that btf_check_func_arg_match()
would have to add another condition to its logic for checking if
a ptr needs a refcount to also require that the pointer has at least one
type modifier, such as a new modifier we're adding called PTR_WALKED
(described below). Note that PTR_UNTRUSTED is insufficient for this
purpose, as it does not cover all of the possible pointers we need to
watch out for, though. For example, a pointer obtained from walking a
struct is considered "trusted" (or at least, not PTR_UNTRUSTED). To
account for this and enable us to expand KF_TRUSTED_ARGS, this patch set
also introduces a new PTR_WALKED type flag modifier which records if a
pointer was obtained from walking a struct.
Additionally, while loosening the restrictions of KF_TRUSTED_ARGS does
expand the availability of the flag to more possible kfunc variants, it
also removes some restrictions that are required by some existing
kfuncs. For example, kfuncs like bpf_ct_{g,s}et_status() expect a struct
nf_conn___init * argument that was allocated on behalf of the program by
bpf_skb_ct_alloc(), rather than which was allocated and passed by the
kernel in a tracepoint handler. We do not want to allow other struct
nf_conn___init pointers that are not "owned" by the BPF program to be
passed to these kfuncs. To address this, we also add a KF_OWNED_ARGS
kfunc flag which is identical to KF_TRUSTED_ARGS, but is more
restrictive in that it also requires the BPF program to own a reference
to an object.
Authors please note, in my opinion it's not 100% clear whether
KF_OWNED_ARGS is really the right abstraction. Consider, for example, if
two KF_ACQUIRE kfuncs are exposed which return a reference to the same
type of object, with one of them actually allocating the object, and the
other just taking a reference on an object that was acquired by the main
kernel. This flag would not sufficiently express that only the object
that was returned by the kfunc that _allocated_ the object should be
allowed to be passed to KF_OWNED_ARGS. On the other hand, it seems like
a reasonable flag for providing the equvialent semantics to the existing
version of KF_TRUSTED_ARGS. Much of how all this works will likely
change over time as well. For example, eventually it would be nice if we
could specify constraints per-arg, rather than on the whole kfunc, using
something like type tagging (not yet available in gcc).
In closing, this patch set:
1. Adds the new PTR_WALKED register type modifier flag, and updates the
verifier and existing selftests accordingly.
2. Expands KF_TRUSTED_ARGS to also include trusted pointers that were
not obtained from walking structs.
3. Adds the new KF_OWNED_ARGS kfunc flag to specify kfuncs that can only
accept trusted, refcounted objects. Updates the existing kfuncs that
require these semantics to use this new kfunc flag.
4. Adds a new set of kfuncs that allows struct task_struct* objects to be
used as kptrs.
5. Adds a new selftest suite to validate these new task kfuncs.
--
Changelog:
v5 -> v6:
- Add a new KF_OWNED_ARGS kfunc flag which may be used by kfuncs to
express that they require trusted, refcounted args (Kumar)
- Rename PTR_NESTED -> PTR_WALKED in the verifier (Kumar)
- Convert reg_type_str() prefixes to use snprintf() instead of strncpy()
(Kumar)
- Add PTR_TO_BTF_ID | PTR_WALKED to missing struct btf_reg_type
instances -- specifically btf_id_sock_common_types, and
percpu_btf_ptr_types.
- Add a missing PTR_TO_BTF_ID | PTR_WALKED switch case entry in
check_func_arg_reg_off(), which is required when validating helper
calls (Kumar)
- Update reg_type_mismatch_ok() to check base types for the registers
(i.e. to accommodate type modifiers). Additionally, add a lengthy
comment that explains why this is being done (Kumar)
- Update convert_ctx_accesses() to also issue probe reads for
PTR_TO_BTF_ID | PTR_WALKED (Kumar)
- Update selftests to expect new prefix reg type strings.
- Rename task_kfunc_acquire_trusted_nested testcase to
task_kfunc_acquire_trusted_walked, and fix a comment (Kumar)
- Remove KF_TRUSTED_ARGS from bpf_task_release(), which already includes
KF_RELEASE (Kumar)
- Add bpf-next in patch subject lines (Kumar)
v4 -> v5:
- Fix an improperly formatted patch title.
v3 -> v4:
- Remove an unnecessary check from my repository that I forgot to remove
after debugging something.
v2 -> v3:
- Make bpf_task_acquire() check for NULL, and include KF_RET_NULL
(Martin)
- Include new PTR_NESTED register modifier type flag which specifies
whether a pointer was obtained from walking a struct. Use this to
expand the meaning of KF_TRUSTED_ARGS to include trusted pointers that
were passed from the kernel (Kumar)
- Add more selftests to the task_kfunc selftest suite which verify that
you cannot pass a walked pointer to bpf_task_acquire().
- Update bpf_task_acquire() to also specify KF_TRUSTED_ARGS.
v1 -> v2:
- Rename tracing_btf_ids to generic_kfunc_btf_ids, and add the new
kfuncs to that list instead of making a separate btf id list (Alexei).
- Don't run the new selftest suite on s390x, which doesn't appear to
support invoking kfuncs.
- Add a missing __diag_ignore block for -Wmissing-prototypes
([email protected]).
- Fix formatting on some of the SPDX-License-Identifier tags.
- Clarified the function header comment a bit on bpf_task_kptr_get().
David Vernet (3):
bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs
bpf: Add kfuncs for storing struct task_struct * as a kptr
bpf/selftests: Add selftests for new task kfuncs
Documentation/bpf/kfuncs.rst | 49 ++-
include/linux/bpf.h | 6 +
include/linux/btf.h | 60 +++-
kernel/bpf/btf.c | 18 +-
kernel/bpf/helpers.c | 86 ++++-
kernel/bpf/verifier.c | 66 +++-
net/bpf/test_run.c | 2 +-
net/netfilter/nf_conntrack_bpf.c | 8 +-
net/netfilter/nf_nat_bpf.c | 2 +-
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../selftests/bpf/prog_tests/map_kptr.c | 2 +-
.../selftests/bpf/prog_tests/task_kfunc.c | 160 +++++++++
.../selftests/bpf/progs/task_kfunc_common.h | 83 +++++
.../selftests/bpf/progs/task_kfunc_failure.c | 315 ++++++++++++++++++
.../selftests/bpf/progs/task_kfunc_success.c | 132 ++++++++
tools/testing/selftests/bpf/verifier/calls.c | 4 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
17 files changed, 943 insertions(+), 53 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
--
2.38.0
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.
The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.
There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:
SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
struct task_struct *task,
u64 clone_flags)
{
struct task_struct *acquired, *nested;
nested = task->last_wakee;
/* Would not be rejected by the verifier. */
acquired = bpf_task_acquire(nested);
if (!acquired)
return 0;
bpf_task_release(acquired);
return 0;
}
To address this, this patch defines a new type flag called PTR_WALKED
which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
a struct. A pointer passed directly from the kernel begins with
(PTR_WALKED & type) == 0, meaning of course that it is not obtained from
walking another struct. Any pointer received from walking that object,
however, would inherit that flag and become a walked pointer.
Additionally, because some kfuncs still only want BPF programs to be
able to send them an arg that they "own" (i.e. which they own a refcount
on) another kfunc arg flag called KF_OWNED_ARGS is added which is
identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
the arg must also have a refcount.
A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will validate this feature by ensuring that the
verifier rejects a kfunc invocation with a nested pointer.
Signed-off-by: David Vernet <[email protected]>
---
Documentation/bpf/kfuncs.rst | 50 ++++++++++----
include/linux/bpf.h | 6 ++
include/linux/btf.h | 57 ++++++++++++++--
kernel/bpf/btf.c | 18 ++++-
kernel/bpf/verifier.c | 66 ++++++++++++++-----
net/bpf/test_run.c | 2 +-
net/netfilter/nf_conntrack_bpf.c | 8 +--
net/netfilter/nf_nat_bpf.c | 2 +-
.../selftests/bpf/prog_tests/map_kptr.c | 2 +-
tools/testing/selftests/bpf/verifier/calls.c | 4 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
11 files changed, 169 insertions(+), 48 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0f858156371d..8e2825150a8d 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -137,30 +137,54 @@ KF_ACQUIRE and KF_RET_NULL flags.
--------------------------
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments will always have a guaranteed lifetime,
-and pointers to kernel objects are always passed to helpers in their unmodified
-form (as obtained from acquire kfuncs).
+indicates that the all pointer arguments will always have a guaranteed
+lifetime, and pointers to kernel objects are always passed to helpers in their
+unmodified form (either as passed by the main kernel, or as obtained from
+acquire kfuncs).
-It can be used to enforce that a pointer to a refcounted object acquired from a
-kfunc or BPF helper is passed as an argument to this kfunc without any
-modifications (e.g. pointer arithmetic) such that it is trusted and points to
-the original object.
+It can be used to enforce that a safe pointer passed to the program by the
+kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
+as an argument to this kfunc without any modifications (e.g. pointer
+arithmetic) such that it is trusted and points to the original object.
Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
but those can have a non-zero offset.
-This flag is often used for kfuncs that operate (change some property, perform
-some operation) on an object that was obtained using an acquire kfunc. Such
-kfuncs need an unchanged pointer to ensure the integrity of the operation being
-performed on the expected object.
+This flag is often used for kfuncs that receive a trusted pointer from the
+kernel, and which do not require a reference to be held by the program. For
+example, if there's a kernel object that was allocated by the main kernel, and
+which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
+be used to ensure that the pointer is actually a trusted kernel pointer before
+a reference is acquired on it in a KF_ACQUIRE kfunc.
+
+2.4.6 KF_OWNED_ARGS flag
+------------------------
+
+The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
+more restrictive in that it also requires the BPF program to hold a reference
+on the object.
-2.4.6 KF_SLEEPABLE flag
+In other words, it can be used to enforce that a pointer to a refcounted object
+acquired from a kfunc or BPF helper is passed as an argument to this kfunc
+without any modifications (e.g. pointer arithmetic) such that it is trusted and
+points to the original object that was allocated or owned by the BPF program.
+
+This flag is often used for kfuncs that operate (change some property, perform
+some operation) on an object that was obtained using an acquire kfunc. For
+example, if an acquire kfunc allocates an object on behalf of a program,
+KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
+allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
+would likely not be sufficiently restrictive as the kfunc does not want to
+allow the BPF program to mutate another instance of the same object type which
+was allocated by the main kernel.
+
+2.4.7 KF_SLEEPABLE flag
-----------------------
The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
be called by sleepable BPF programs (BPF_F_SLEEPABLE).
-2.4.7 KF_DESTRUCTIVE flag
+2.4.8 KF_DESTRUCTIVE flag
--------------------------
The KF_DESTRUCTIVE flag is used to indicate functions calling which is
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e7d46d16032..ccdbefd72a95 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -457,6 +457,12 @@ enum bpf_type_flag {
/* Size is known at compile time. */
MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
+ /* PTR was obtained from walking a struct. This is used with
+ * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
+ * kfunc with KF_TRUSTED_ARGS.
+ */
+ PTR_WALKED = BIT(11 + BPF_BASE_TYPE_BITS),
+
__BPF_TYPE_FLAG_MAX,
__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
};
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9aababc5d78..7f5a438196a2 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,9 +17,48 @@
#define KF_RELEASE (1 << 1) /* kfunc is a release function */
#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */
#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */
-/* Trusted arguments are those which are meant to be referenced arguments with
- * unchanged offset. It is used to enforce that pointers obtained from acquire
- * kfuncs remain unmodified when being passed to helpers taking trusted args.
+/* Trusted arguments are those which are meant to be guaranteed valid
+ * arguments, with an unchanged offset. It is used to enforce that pointers
+ * obtained from either acquire kfuncs or the main kernel remain unmodified
+ * when being passed to helpers taking trusted args.
+ *
+ * Consider, for example, the following task tracepoint:
+ *
+ * SEC("tp_btf/task_newtask")
+ * int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
+ * {
+ * ...
+ * }
+ *
+ * And the following kfunc:
+ *
+ * BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
+ *
+ * All invocations to the kfunc must pass the unmodified, unwalked task:
+ *
+ * bpf_task_acquire(task); // Allowed
+ * bpf_task_acquire(task->last_wakee); // Rejected, walked task
+ *
+ * Users may also pass referenced tasks directly to the kfunc:
+ *
+ * struct task_struct *acquired;
+ *
+ * acquired = bpf_task_acquire(task); // Allowed, same as above
+ * bpf_task_acquire(acquired); // Allowed
+ * bpf_task_acquire(task); // Allowed
+ * bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
+ *
+ * If users wish to only allow referenced objects to be passed to a kfunc, they
+ * may instead specify the KF_OWNED_ARGS flag.
+ */
+#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
+#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
+#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
+/* Owned arguments are similar to trusted arguments, but are even more
+ * restrictive. Owned arguments are arguments which are "owned" by the BPF
+ * program, meaning it has acquired a reference to the object via an acquire
+ * kfunc. Just as with trusted arguments, the verifier enforces that owned
+ * arguments have an unchanged offset when they're passed to kfuncs.
*
* Consider
* struct foo {
@@ -36,7 +75,7 @@
* struct bar *b = alloc_bar(); // Acquire kfunc
*
* If a kfunc set_foo_data() wants to operate only on the allocated object, it
- * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+ * will set the KR_ARGS_OWNED flag, which will prevent unsafe usage like:
*
* set_foo_data(f, 42); // Allowed
* set_foo_data(f->next, 42); // Rejected, non-referenced pointer
@@ -47,10 +86,14 @@
* by looking at the type of the member at the offset, but due to the
* requirement of trusted argument, this deduction will be strict and not done
* for this case.
+ *
+ * Note as well that if tracepoints existed which took a struct foo *f argument
+ * that was passed from the kernel, the verifier would also reject
+ * set_foo_bar(f, 42) on it, as the BPF program had not acquired a reference on
+ * it. If the kfunc had instead specified KF_TRUSTED_ARGS, this would be
+ * permitted.
*/
-#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
-#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
-#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
+#define KF_OWNED_ARGS (1 << 7) /* kfunc performs destructive actions */
/*
* Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index eba603cec2c5..a3712abae108 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6227,7 +6227,7 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
bool processing_call)
{
enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
- bool rel = false, kptr_get = false, trusted_args = false;
+ bool rel = false, kptr_get = false, trusted_args = false, owned_args = false;
bool sleepable = false;
struct bpf_verifier_log *log = &env->log;
u32 i, nargs, ref_id, ref_obj_id = 0;
@@ -6265,7 +6265,8 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
/* Only kfunc can be release func */
rel = kfunc_meta->flags & KF_RELEASE;
kptr_get = kfunc_meta->flags & KF_KPTR_GET;
- trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
+ owned_args = kfunc_meta->flags & KF_OWNED_ARGS;
+ trusted_args = owned_args || (kfunc_meta->flags & KF_TRUSTED_ARGS);
sleepable = kfunc_meta->flags & KF_SLEEPABLE;
}
@@ -6333,8 +6334,19 @@ static int btf_check_func_arg_match(struct bpf_verifier_env *env,
/* Check if argument must be a referenced pointer, args + i has
* been verified to be a pointer (after skipping modifiers).
* PTR_TO_CTX is ok without having non-zero ref_obj_id.
+ *
+ * All object pointers must be refcounted, other than:
+ * - PTR_TO_CTX
+ * - Trusted pointers (i.e. pointers with no type modifiers).
+ * Kfuncs that have specified KF_OWNED_ARGS require
+ * references even if a pointer is otherwise trusted.
*/
- if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
+ if (is_kfunc &&
+ trusted_args &&
+ obj_ptr &&
+ base_type(reg->type) != PTR_TO_CTX &&
+ (owned_args || type_flag(reg->type)) &&
+ !reg->ref_obj_id) {
bpf_log(log, "R%d must be referenced\n", regno);
return -EINVAL;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f6d2d511c06..7b8b20feeb62 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -541,7 +541,7 @@ static bool is_cmpxchg_insn(const struct bpf_insn *insn)
static const char *reg_type_str(struct bpf_verifier_env *env,
enum bpf_reg_type type)
{
- char postfix[16] = {0}, prefix[32] = {0};
+ char postfix[16] = {0}, prefix[64] = {0};
static const char * const str[] = {
[NOT_INIT] = "?",
[SCALAR_VALUE] = "scalar",
@@ -573,16 +573,14 @@ static const char *reg_type_str(struct bpf_verifier_env *env,
strncpy(postfix, "_or_null", 16);
}
- if (type & MEM_RDONLY)
- strncpy(prefix, "rdonly_", 32);
- if (type & MEM_ALLOC)
- strncpy(prefix, "alloc_", 32);
- if (type & MEM_USER)
- strncpy(prefix, "user_", 32);
- if (type & MEM_PERCPU)
- strncpy(prefix, "percpu_", 32);
- if (type & PTR_UNTRUSTED)
- strncpy(prefix, "untrusted_", 32);
+ snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
+ type & MEM_RDONLY ? "rdonly_" : "",
+ type & MEM_ALLOC ? "alloc_" : "",
+ type & MEM_USER ? "user_" : "",
+ type & MEM_PERCPU ? "percpu_" : "",
+ type & PTR_UNTRUSTED ? "untrusted_" : "",
+ type & PTR_WALKED ? "walked_" : ""
+ );
snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
prefix, str[base_type(type)], postfix);
@@ -4558,6 +4556,9 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
if (type_flag(reg->type) & PTR_UNTRUSTED)
flag |= PTR_UNTRUSTED;
+ /* Mark this and any future pointers as having been obtained from walking a struct. */
+ flag |= PTR_WALKED;
+
if (atype == BPF_READ && value_regno >= 0)
mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
@@ -5661,6 +5662,7 @@ static const struct bpf_reg_types btf_id_sock_common_types = {
PTR_TO_TCP_SOCK,
PTR_TO_XDP_SOCK,
PTR_TO_BTF_ID,
+ PTR_TO_BTF_ID | PTR_WALKED,
},
.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
};
@@ -5694,9 +5696,19 @@ static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
+static const struct bpf_reg_types btf_ptr_types = {
+ .types = {
+ PTR_TO_BTF_ID,
+ PTR_TO_BTF_ID | PTR_WALKED
+ },
+};
static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
-static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
+static const struct bpf_reg_types percpu_btf_ptr_types = {
+ .types = {
+ PTR_TO_BTF_ID | MEM_PERCPU,
+ PTR_TO_BTF_ID | MEM_PERCPU | PTR_WALKED,
+ }
+};
static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
@@ -5860,6 +5872,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
* fixed offset.
*/
case PTR_TO_BTF_ID:
+ case PTR_TO_BTF_ID | PTR_WALKED:
/* When referenced PTR_TO_BTF_ID is passed to release function,
* it's fixed offset must be 0. In the other cases, fixed offset
* can be non-zero.
@@ -12136,8 +12149,30 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
*/
static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
{
- return src != prev && (!reg_type_mismatch_ok(src) ||
- !reg_type_mismatch_ok(prev));
+ /* Compare only the base types of the registers, to avoid confusing the
+ * verifier with the following type of code:
+ *
+ * struct fib6_nh *fib6_nh;
+ * struct nexthop *nh;
+ *
+ * fib6_nh = &rt->fib6_nh[0];
+ *
+ * nh = rt->nh;
+ * if (nh)
+ * fib6_nh = &nh->nh_info->fib6_nh;
+ *
+ * If we did not compare base types, the verifier would reject this
+ * because the register in the former branch will have PTR_TO_BTF_ID,
+ * whereas the latter branch will have PTR_TO_BTF_ID | PTR_WALKED.
+ *
+ * The safety of the memory access is validated in check_mem_access()
+ * before this function is called. The intention here is rather to
+ * prevent a program from doing something like using PTR_TO_BTF_ID in
+ * one path, and PTR_TO_CTX in another, as it would cause the
+ * convert_ctx_access() handling to be incorrect.
+ */
+ return base_type(src) != base_type(prev) &&
+ (!reg_type_mismatch_ok(src) || !reg_type_mismatch_ok(prev));
}
static int do_check(struct bpf_verifier_env *env)
@@ -13499,6 +13534,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
break;
case PTR_TO_BTF_ID:
case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+ case PTR_TO_BTF_ID | PTR_WALKED:
if (type == BPF_READ) {
insn->code = BPF_LDX | BPF_PROBE_MEM |
BPF_SIZE((insn)->code);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..a298bef13e12 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -758,7 +758,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
-BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_OWNED_ARGS)
BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
BTF_SET8_END(test_sk_check_kfunc_ids)
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8639e7efd0e2..ef937f4b4fe4 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -479,10 +479,10 @@ BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE)
BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_status, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_status, KF_OWNED_ARGS)
BTF_SET8_END(nf_ct_kfunc_set)
static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c
index 0fa5a0bbb0ff..69d6668756c4 100644
--- a/net/netfilter/nf_nat_bpf.c
+++ b/net/netfilter/nf_nat_bpf.c
@@ -57,7 +57,7 @@ int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
__diag_pop()
BTF_SET8_START(nf_nat_kfunc_set)
-BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_OWNED_ARGS)
BTF_SET8_END(nf_nat_kfunc_set)
static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = {
diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
index 0d66b1524208..1070ce936d32 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -20,7 +20,7 @@ struct {
{ "reject_bad_type_match", "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc" },
{ "marked_as_untrusted_or_null", "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_" },
{ "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 32 size 4" },
- { "inherit_untrusted_on_walk", "R1 type=untrusted_ptr_ expected=percpu_ptr_" },
+ { "inherit_untrusted_on_walk", "R1 type=untrusted_walked_ptr_ expected=percpu_ptr_" },
{ "reject_kptr_xchg_on_unref", "off=8 kptr isn't referenced kptr" },
{ "reject_kptr_get_no_map_val", "arg#0 expected pointer to map value" },
{ "reject_kptr_get_no_null_map_val", "arg#0 expected pointer to map value" },
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index e1a937277b54..3327b3e75ce8 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -181,7 +181,7 @@
},
.result_unpriv = REJECT,
.result = REJECT,
- .errstr = "negative offset ptr_ ptr R1 off=-4 disallowed",
+ .errstr = "negative offset walked_ptr_ ptr R1 off=-4 disallowed",
},
{
"calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset",
@@ -243,7 +243,7 @@
},
.result_unpriv = REJECT,
.result = REJECT,
- .errstr = "R1 must be referenced",
+ .errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point to scalar",
},
{
"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index 6914904344c0..003bae55d79e 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -242,7 +242,7 @@
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
.fixup_map_kptr = { 1 },
.result = REJECT,
- .errstr = "R1 type=untrusted_ptr_ expected=percpu_ptr_",
+ .errstr = "R1 type=untrusted_walked_ptr_ expected=percpu_ptr_",
},
{
"map_kptr: unref: no reference state created",
--
2.38.0
Now that BPF supports adding new kernel functions with kfuncs, and
storing kernel objects in maps with kptrs, we can add a set of kfuncs
which allow struct task_struct objects to be stored in maps as
referenced kptrs. The possible use cases for doing this are plentiful.
During tracing, for example, it would be useful to be able to collect
some tasks that performed a certain operation, and then periodically
summarize who they are, which cgroup they're in, how much CPU time
they've utilized, etc.
In order to enable this, this patch adds three new kfuncs:
struct task_struct *bpf_task_acquire(struct task_struct *p);
struct task_struct *bpf_task_kptr_get(struct task_struct **pp);
void bpf_task_release(struct task_struct *p);
A follow-on patch will add selftests validating these kfuncs.
Signed-off-by: David Vernet <[email protected]>
---
kernel/bpf/helpers.c | 86 +++++++++++++++++++++++++++++++++++++++++---
1 file changed, 81 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a6b04faed282..bf2b1679aea8 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1700,20 +1700,96 @@ bpf_base_func_proto(enum bpf_func_id func_id)
}
}
-BTF_SET8_START(tracing_btf_ids)
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+ "Global functions as their definitions will be in vmlinux BTF");
+
+/**
+ * bpf_task_acquire - Acquire a reference to a task. A task acquired by this
+ * kfunc which is not stored in a map as a kptr, must be released by calling
+ * bpf_task_release().
+ * @p: The task on which a reference is being acquired.
+ */
+__used noinline
+struct task_struct *bpf_task_acquire(struct task_struct *p)
+{
+ if (!p)
+ return NULL;
+
+ refcount_inc(&p->rcu_users);
+ return p;
+}
+
+/**
+ * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
+ * kptr acquired by this kfunc which is not subsequently stored in a map, must
+ * be released by calling bpf_task_release().
+ * @pp: A pointer to a task kptr on which a reference is being acquired.
+ */
+__used noinline
+struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
+{
+ struct task_struct *p;
+
+ rcu_read_lock();
+ p = READ_ONCE(*pp);
+ if (p && !refcount_inc_not_zero(&p->rcu_users))
+ p = NULL;
+ rcu_read_unlock();
+
+ return p;
+}
+
+/**
+ * bpf_task_release - Release the reference acquired on a struct task_struct *.
+ * If this kfunc is invoked in an RCU read region, the task_struct is
+ * guaranteed to not be freed until the current grace period has ended, even if
+ * its refcount drops to 0.
+ * @p: The task on which a reference is being released.
+ */
+__used noinline void bpf_task_release(struct task_struct *p)
+{
+ if (!p)
+ return;
+
+ put_task_struct_rcu_user(p);
+}
+
+__diag_pop();
+
+BTF_SET8_START(generic_kfunc_btf_ids)
#ifdef CONFIG_KEXEC_CORE
BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE)
#endif
-BTF_SET8_END(tracing_btf_ids)
+BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
+BTF_SET8_END(generic_kfunc_btf_ids)
-static const struct btf_kfunc_id_set tracing_kfunc_set = {
+static const struct btf_kfunc_id_set generic_kfunc_set = {
.owner = THIS_MODULE,
- .set = &tracing_btf_ids,
+ .set = &generic_kfunc_btf_ids,
};
+BTF_ID_LIST(generic_kfunc_dtor_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(func, bpf_task_release)
+
static int __init kfunc_init(void)
{
- return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &tracing_kfunc_set);
+ int ret;
+ const struct btf_id_dtor_kfunc generic_kfunc_dtors[] = {
+ {
+ .btf_id = generic_kfunc_dtor_ids[0],
+ .kfunc_btf_id = generic_kfunc_dtor_ids[1]
+ },
+ };
+
+ ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &generic_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &generic_kfunc_set);
+ return ret ?: register_btf_id_dtor_kfuncs(generic_kfunc_dtors,
+ ARRAY_SIZE(generic_kfunc_dtors),
+ THIS_MODULE);
}
late_initcall(kfunc_init);
--
2.38.0
A previous change added a series of kfuncs for storing struct
task_struct objects as referenced kptrs. This patch adds a new
task_kfunc test suite for validating their expected behavior.
Signed-off-by: David Vernet <[email protected]>
---
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../selftests/bpf/prog_tests/task_kfunc.c | 160 +++++++++
.../selftests/bpf/progs/task_kfunc_common.h | 81 +++++
.../selftests/bpf/progs/task_kfunc_failure.c | 315 ++++++++++++++++++
.../selftests/bpf/progs/task_kfunc_success.c | 132 ++++++++
5 files changed, 689 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_kfunc.c
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_common.h
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_failure.c
create mode 100644 tools/testing/selftests/bpf/progs/task_kfunc_success.c
diff --git a/tools/testing/selftests/bpf/DENYLIST.s390x b/tools/testing/selftests/bpf/DENYLIST.s390x
index 520f12229b98..323a0e312b3d 100644
--- a/tools/testing/selftests/bpf/DENYLIST.s390x
+++ b/tools/testing/selftests/bpf/DENYLIST.s390x
@@ -52,6 +52,7 @@ skc_to_unix_sock # could not attach BPF object unexpecte
socket_cookie # prog_attach unexpected error: -524 (trampoline)
stacktrace_build_id # compare_map_keys stackid_hmap vs. stackmap err -2 errno 2 (?)
tailcalls # tail_calls are not allowed in non-JITed programs with bpf-to-bpf calls (?)
+task_kfunc # JIT does not support calling kernel function
task_local_storage # failed to auto-attach program 'trace_exit_creds': -524 (trampoline)
test_bpffs # bpffs test failed 255 (iterator)
test_bprm_opts # failed to auto-attach program 'secure_exec': -524 (trampoline)
diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
new file mode 100644
index 000000000000..0be90106fd96
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
@@ -0,0 +1,160 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#define _GNU_SOURCE
+#include <sys/wait.h>
+#include <test_progs.h>
+#include <unistd.h>
+
+#include "task_kfunc_failure.skel.h"
+#include "task_kfunc_success.skel.h"
+
+static size_t log_buf_sz = 1 << 20; /* 1 MB */
+static char obj_log_buf[1048576];
+
+static struct task_kfunc_success *open_load_task_kfunc_skel(void)
+{
+ struct task_kfunc_success *skel;
+ int err;
+
+ skel = task_kfunc_success__open();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ return NULL;
+
+ skel->bss->pid = getpid();
+
+ err = task_kfunc_success__load(skel);
+ if (!ASSERT_OK(err, "skel_load"))
+ goto cleanup;
+
+ return skel;
+
+cleanup:
+ task_kfunc_success__destroy(skel);
+ return NULL;
+}
+
+static void run_success_test(const char *prog_name)
+{
+ struct task_kfunc_success *skel;
+ int status;
+ pid_t child_pid;
+ struct bpf_program *prog;
+ struct bpf_link *link = NULL;
+
+ skel = open_load_task_kfunc_skel();
+ if (!ASSERT_OK_PTR(skel, "open_load_skel"))
+ return;
+
+ if (!ASSERT_OK(skel->bss->err, "pre_spawn_err"))
+ goto cleanup;
+
+ prog = bpf_object__find_program_by_name(skel->obj, prog_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+
+ link = bpf_program__attach(prog);
+ if (!ASSERT_OK_PTR(link, "attached_link"))
+ goto cleanup;
+
+ child_pid = fork();
+ if (!ASSERT_GT(child_pid, -1, "child_pid"))
+ goto cleanup;
+ if (child_pid == 0)
+ _exit(0);
+ waitpid(child_pid, &status, 0);
+
+ ASSERT_OK(skel->bss->err, "post_wait_err");
+
+cleanup:
+ bpf_link__destroy(link);
+ task_kfunc_success__destroy(skel);
+}
+
+static const char * const success_tests[] = {
+ "test_task_acquire_release_argument",
+ "test_task_acquire_release_current",
+ "test_task_acquire_leave_in_map",
+ "test_task_xchg_release",
+ "test_task_get_release",
+};
+
+static struct {
+ const char *prog_name;
+ const char *expected_err_msg;
+} failure_tests[] = {
+ {"task_kfunc_acquire_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_acquire_no_null_check", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_acquire_trusted_walked", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_acquire_unreleased", "Unreleased reference"},
+ {"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
+ {"task_kfunc_get_non_kptr_acquired", "arg#0 expected pointer to map value"},
+ {"task_kfunc_get_null", "arg#0 expected pointer to map value"},
+ {"task_kfunc_xchg_unreleased", "Unreleased reference"},
+ {"task_kfunc_get_unreleased", "Unreleased reference"},
+ {"task_kfunc_release_untrusted", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_release_fp", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_release_null", "arg#0 pointer type STRUCT task_struct must point"},
+ {"task_kfunc_release_unacquired", "release kernel function bpf_task_release expects"},
+};
+
+static void verify_fail(const char *prog_name, const char *expected_err_msg)
+{
+ LIBBPF_OPTS(bpf_object_open_opts, opts);
+ struct task_kfunc_failure *skel;
+ int err, i;
+
+ opts.kernel_log_buf = obj_log_buf;
+ opts.kernel_log_size = log_buf_sz;
+ opts.kernel_log_level = 1;
+
+ skel = task_kfunc_failure__open_opts(&opts);
+ if (!ASSERT_OK_PTR(skel, "task_kfunc_failure__open_opts"))
+ goto cleanup;
+
+ skel->bss->pid = getpid();
+
+ for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
+ struct bpf_program *prog;
+ const char *curr_name = failure_tests[i].prog_name;
+
+ prog = bpf_object__find_program_by_name(skel->obj, curr_name);
+ if (!ASSERT_OK_PTR(prog, "bpf_object__find_program_by_name"))
+ goto cleanup;
+
+ bpf_program__set_autoload(prog, !strcmp(curr_name, prog_name));
+ }
+
+ err = task_kfunc_failure__load(skel);
+ if (!ASSERT_ERR(err, "unexpected load success"))
+ goto cleanup;
+
+ if (!ASSERT_OK_PTR(strstr(obj_log_buf, expected_err_msg), "expected_err_msg")) {
+ fprintf(stderr, "Expected err_msg: %s\n", expected_err_msg);
+ fprintf(stderr, "Verifier output: %s\n", obj_log_buf);
+ }
+
+cleanup:
+ task_kfunc_failure__destroy(skel);
+}
+
+void test_task_kfunc(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(success_tests); i++) {
+ if (!test__start_subtest(success_tests[i]))
+ continue;
+
+ run_success_test(success_tests[i]);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(failure_tests); i++) {
+ if (!test__start_subtest(failure_tests[i].prog_name))
+ continue;
+
+ verify_fail(failure_tests[i].prog_name, failure_tests[i].expected_err_msg);
+ }
+}
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_common.h b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
new file mode 100644
index 000000000000..a5bb7604505a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_common.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#ifndef _TASK_KFUNC_COMMON_H
+#define _TASK_KFUNC_COMMON_H
+
+#include <errno.h>
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct __tasks_kfunc_map_value {
+ struct task_struct __kptr_ref * task;
+};
+
+struct hash_map {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __type(key, int);
+ __type(value, struct __tasks_kfunc_map_value);
+ __uint(max_entries, 1);
+} __tasks_kfunc_map SEC(".maps");
+
+struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
+struct task_struct *bpf_task_kptr_get(struct task_struct **pp) __ksym;
+void bpf_task_release(struct task_struct *p) __ksym;
+
+/* The pid of the test process used to determine if a newly created task is the test task. */
+int pid;
+
+static inline struct __tasks_kfunc_map_value *tasks_kfunc_map_value_lookup(struct task_struct *p)
+{
+ s32 pid;
+ long status;
+
+ status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
+ if (status)
+ return NULL;
+
+ return bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+}
+
+static inline int tasks_kfunc_map_insert(struct task_struct *p)
+{
+ struct __tasks_kfunc_map_value local, *v;
+ long status;
+ struct task_struct *acquired, *old;
+ s32 pid;
+
+ status = bpf_probe_read_kernel(&pid, sizeof(pid), &p->pid);
+ if (status)
+ return status;
+
+ local.task = NULL;
+ status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
+ if (status)
+ return status;
+
+ v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+ if (!v) {
+ bpf_map_delete_elem(&__tasks_kfunc_map, &pid);
+ return status;
+ }
+
+ acquired = bpf_task_acquire(p);
+ old = bpf_kptr_xchg(&v->task, acquired);
+ if (old) {
+ bpf_task_release(old);
+ return -EEXIST;
+ }
+
+ return 0;
+}
+
+static inline bool is_test_kfunc_task(void)
+{
+ int cur_pid = bpf_get_current_pid_tgid() >> 32;
+
+ return pid == cur_pid;
+}
+
+#endif /* _TASK_KFUNC_COMMON_H */
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_failure.c b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
new file mode 100644
index 000000000000..892ee73a928c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_failure.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+/* Prototype for all of the program trace events below:
+ *
+ * TRACE_EVENT(task_newtask,
+ * TP_PROTO(struct task_struct *p, u64 clone_flags)
+ */
+
+static struct __tasks_kfunc_map_value *insert_lookup_task(struct task_struct *task)
+{
+ int status;
+
+ status = tasks_kfunc_map_insert(task);
+ if (status)
+ return NULL;
+
+ return tasks_kfunc_map_value_lookup(task);
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_untrusted, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired;
+ struct __tasks_kfunc_map_value *v;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ v = insert_lookup_task(task);
+ if (!v)
+ return 0;
+
+ /* Can't invoke bpf_task_acquire() on an untrusted pointer. */
+ acquired = bpf_task_acquire(v->task);
+ if (!acquired)
+ return 0;
+
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_fp, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired, *stack_task = (struct task_struct *)&clone_flags;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Can't invoke bpf_task_acquire() on a random frame pointer. */
+ acquired = bpf_task_acquire((struct task_struct *)&stack_task);
+ if (!acquired)
+ return 0;
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_no_null_check, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ acquired = bpf_task_acquire(task);
+ /* Can't release a bpf_task_acquire()'d task without a NULL check. */
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_trusted_walked, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Can't invoke bpf_task_acquire() on a trusted pointer obtained from walking a struct. */
+ acquired = bpf_task_acquire(task->last_wakee);
+ if (!acquired)
+ return 0;
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_null, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Can't invoke bpf_task_acquire() on a NULL pointer. */
+ acquired = bpf_task_acquire(NULL);
+ if (!acquired)
+ return 0;
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_acquire_unreleased, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ acquired = bpf_task_acquire(task);
+
+ /* Acquired task is never released. */
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_non_kptr_param, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Cannot use bpf_task_kptr_get() on a non-kptr, even on a valid task. */
+ kptr = bpf_task_kptr_get(&task);
+ if (!kptr)
+ return 0;
+
+ bpf_task_release(kptr);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_non_kptr_acquired, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr, *acquired;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ acquired = bpf_task_acquire(task);
+ if (!acquired)
+ return 0;
+
+ /* Cannot use bpf_task_kptr_get() on a non-kptr, even if it was acquired. */
+ kptr = bpf_task_kptr_get(&acquired);
+ bpf_task_release(acquired);
+ if (!kptr)
+ return 0;
+
+ bpf_task_release(kptr);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_null, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Cannot use bpf_task_kptr_get() on a NULL pointer. */
+ kptr = bpf_task_kptr_get(NULL);
+ if (!kptr)
+ return 0;
+
+ bpf_task_release(kptr);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_xchg_unreleased, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+ struct __tasks_kfunc_map_value *v;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ v = insert_lookup_task(task);
+ if (!v)
+ return 0;
+
+ kptr = bpf_kptr_xchg(&v->task, NULL);
+ if (!kptr)
+ return 0;
+
+ /* Kptr retrieved from map is never released. */
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_get_unreleased, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+ struct __tasks_kfunc_map_value *v;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ v = insert_lookup_task(task);
+ if (!v)
+ return 0;
+
+ kptr = bpf_task_kptr_get(&v->task);
+ if (!kptr)
+ return 0;
+
+ /* Kptr acquired above is never released. */
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_untrusted, struct task_struct *task, u64 clone_flags)
+{
+ struct __tasks_kfunc_map_value *v;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ v = insert_lookup_task(task);
+ if (!v)
+ return 0;
+
+ /* Can't invoke bpf_task_release() on an untrusted pointer. */
+ bpf_task_release(v->task);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_fp, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *acquired = (struct task_struct *)&clone_flags;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Cannot release random frame pointer. */
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_null, struct task_struct *task, u64 clone_flags)
+{
+ struct __tasks_kfunc_map_value local, *v;
+ long status;
+ struct task_struct *acquired, *old;
+ s32 pid;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ status = bpf_probe_read_kernel(&pid, sizeof(pid), &task->pid);
+ if (status)
+ return 0;
+
+ local.task = NULL;
+ status = bpf_map_update_elem(&__tasks_kfunc_map, &pid, &local, BPF_NOEXIST);
+ if (status)
+ return status;
+
+ v = bpf_map_lookup_elem(&__tasks_kfunc_map, &pid);
+ if (!v)
+ return status;
+
+ acquired = bpf_task_acquire(task);
+ if (!acquired)
+ return 0;
+
+ old = bpf_kptr_xchg(&v->task, acquired);
+
+ /* old cannot be passed to bpf_task_release() without a NULL check. */
+ bpf_task_release(old);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(task_kfunc_release_unacquired, struct task_struct *task, u64 clone_flags)
+{
+ if (!is_test_kfunc_task())
+ return 0;
+
+ /* Cannot release trusted task pointer which was not acquired. */
+ bpf_task_release(task);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/task_kfunc_success.c b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
new file mode 100644
index 000000000000..8d5c05b41d53
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_kfunc_success.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */
+
+#include <vmlinux.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_helpers.h>
+
+#include "task_kfunc_common.h"
+
+char _license[] SEC("license") = "GPL";
+
+int err;
+
+/* Prototype for all of the program trace events below:
+ *
+ * TRACE_EVENT(task_newtask,
+ * TP_PROTO(struct task_struct *p, u64 clone_flags)
+ */
+
+static int test_acquire_release(struct task_struct *task)
+{
+ struct task_struct *acquired;
+
+ acquired = bpf_task_acquire(task);
+ if (!acquired) {
+ err = 1;
+ return 0;
+ }
+
+ bpf_task_release(acquired);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_release_argument, struct task_struct *task, u64 clone_flags)
+{
+ if (!is_test_kfunc_task())
+ return 0;
+
+ return test_acquire_release(task);
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_release_current, struct task_struct *task, u64 clone_flags)
+{
+ if (!is_test_kfunc_task())
+ return 0;
+
+ return test_acquire_release(bpf_get_current_task_btf());
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_acquire_leave_in_map, struct task_struct *task, u64 clone_flags)
+{
+ long status;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ status = tasks_kfunc_map_insert(task);
+ if (status)
+ err = 1;
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_xchg_release, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+ struct __tasks_kfunc_map_value *v;
+ long status;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ status = tasks_kfunc_map_insert(task);
+ if (status) {
+ err = 1;
+ return 0;
+ }
+
+ v = tasks_kfunc_map_value_lookup(task);
+ if (!v) {
+ err = 2;
+ return 0;
+ }
+
+ kptr = bpf_kptr_xchg(&v->task, NULL);
+ if (!kptr) {
+ err = 3;
+ return 0;
+ }
+
+ bpf_task_release(kptr);
+
+ return 0;
+}
+
+SEC("tp_btf/task_newtask")
+int BPF_PROG(test_task_get_release, struct task_struct *task, u64 clone_flags)
+{
+ struct task_struct *kptr;
+ struct __tasks_kfunc_map_value *v;
+ long status;
+
+ if (!is_test_kfunc_task())
+ return 0;
+
+ status = tasks_kfunc_map_insert(task);
+ if (status) {
+ err = 1;
+ return 0;
+ }
+
+ v = tasks_kfunc_map_value_lookup(task);
+ if (!v) {
+ err = 2;
+ return 0;
+ }
+
+ kptr = bpf_task_kptr_get(&v->task);
+ if (!kptr) {
+ err = 3;
+ return 0;
+ }
+
+ bpf_task_release(kptr);
+
+ return 0;
+}
--
2.38.0
On Thu, Oct 20, 2022 at 05:24:14PM -0500, David Vernet wrote:
> Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> to the verifier that it should enforce that a BPF program passes it a
> "safe", trusted pointer. Currently, "safe" means that the pointer is
> either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> the kernel passes a BPF program a safe / trusted pointer to an object
> that the BPF program wishes to use as a kptr, but because the object
> does not yet have a ref_obj_id from the perspective of the verifier, the
> program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> kfunc.
>
> The solution is to expand the set of pointers that are considered
> trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> with these pointers without getting rejected by the verifier.
>
> There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> such as when a BPF program reads a kptr directly from a map
> without performing a bpf_kptr_xchg() call. These pointers of course can
> and should be rejected by the verifier. Unfortunately, however,
> PTR_UNTRUSTED does not cover all the cases for safety that need to
> be addressed to adequately protect kfuncs. Specifically, pointers
> obtained by a BPF program "walking" a struct are _not_ considered
> PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> allow the following unsafe BPF program to be loaded:
>
> SEC("tp_btf/task_newtask")
> int BPF_PROG(unsafe_acquire_task,
> struct task_struct *task,
> u64 clone_flags)
> {
> struct task_struct *acquired, *nested;
>
> nested = task->last_wakee;
>
> /* Would not be rejected by the verifier. */
> acquired = bpf_task_acquire(nested);
> if (!acquired)
> return 0;
>
> bpf_task_release(acquired);
> return 0;
> }
>
> To address this, this patch defines a new type flag called PTR_WALKED
> which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
> a struct. A pointer passed directly from the kernel begins with
> (PTR_WALKED & type) == 0, meaning of course that it is not obtained from
> walking another struct. Any pointer received from walking that object,
> however, would inherit that flag and become a walked pointer.
>
> Additionally, because some kfuncs still only want BPF programs to be
> able to send them an arg that they "own" (i.e. which they own a refcount
> on) another kfunc arg flag called KF_OWNED_ARGS is added which is
> identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
> the arg must also have a refcount.
>
> A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> and then another patch will validate this feature by ensuring that the
> verifier rejects a kfunc invocation with a nested pointer.
>
> Signed-off-by: David Vernet <[email protected]>
> ---
> Documentation/bpf/kfuncs.rst | 50 ++++++++++----
> include/linux/bpf.h | 6 ++
> include/linux/btf.h | 57 ++++++++++++++--
> kernel/bpf/btf.c | 18 ++++-
> kernel/bpf/verifier.c | 66 ++++++++++++++-----
> net/bpf/test_run.c | 2 +-
> net/netfilter/nf_conntrack_bpf.c | 8 +--
> net/netfilter/nf_nat_bpf.c | 2 +-
> .../selftests/bpf/prog_tests/map_kptr.c | 2 +-
> tools/testing/selftests/bpf/verifier/calls.c | 4 +-
> .../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
> 11 files changed, 169 insertions(+), 48 deletions(-)
>
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 0f858156371d..8e2825150a8d 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -137,30 +137,54 @@ KF_ACQUIRE and KF_RET_NULL flags.
> --------------------------
>
> The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> -indicates that the all pointer arguments will always have a guaranteed lifetime,
> -and pointers to kernel objects are always passed to helpers in their unmodified
> -form (as obtained from acquire kfuncs).
> +indicates that the all pointer arguments will always have a guaranteed
> +lifetime, and pointers to kernel objects are always passed to helpers in their
> +unmodified form (either as passed by the main kernel, or as obtained from
> +acquire kfuncs).
>
> -It can be used to enforce that a pointer to a refcounted object acquired from a
> -kfunc or BPF helper is passed as an argument to this kfunc without any
> -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> -the original object.
> +It can be used to enforce that a safe pointer passed to the program by the
> +kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
> +as an argument to this kfunc without any modifications (e.g. pointer
> +arithmetic) such that it is trusted and points to the original object.
>
> Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
> but those can have a non-zero offset.
>
> -This flag is often used for kfuncs that operate (change some property, perform
> -some operation) on an object that was obtained using an acquire kfunc. Such
> -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> -performed on the expected object.
> +This flag is often used for kfuncs that receive a trusted pointer from the
> +kernel, and which do not require a reference to be held by the program. For
> +example, if there's a kernel object that was allocated by the main kernel, and
> +which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
> +be used to ensure that the pointer is actually a trusted kernel pointer before
> +a reference is acquired on it in a KF_ACQUIRE kfunc.
> +
> +2.4.6 KF_OWNED_ARGS flag
> +------------------------
> +
> +The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
> +more restrictive in that it also requires the BPF program to hold a reference
> +on the object.
>
> -2.4.6 KF_SLEEPABLE flag
> +In other words, it can be used to enforce that a pointer to a refcounted object
> +acquired from a kfunc or BPF helper is passed as an argument to this kfunc
> +without any modifications (e.g. pointer arithmetic) such that it is trusted and
> +points to the original object that was allocated or owned by the BPF program.
> +
> +This flag is often used for kfuncs that operate (change some property, perform
> +some operation) on an object that was obtained using an acquire kfunc. For
> +example, if an acquire kfunc allocates an object on behalf of a program,
> +KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
> +allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
> +would likely not be sufficiently restrictive as the kfunc does not want to
> +allow the BPF program to mutate another instance of the same object type which
> +was allocated by the main kernel.
> +
> +2.4.7 KF_SLEEPABLE flag
> -----------------------
>
> The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
> be called by sleepable BPF programs (BPF_F_SLEEPABLE).
>
> -2.4.7 KF_DESTRUCTIVE flag
> +2.4.8 KF_DESTRUCTIVE flag
> --------------------------
>
> The KF_DESTRUCTIVE flag is used to indicate functions calling which is
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..ccdbefd72a95 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -457,6 +457,12 @@ enum bpf_type_flag {
> /* Size is known at compile time. */
> MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
>
> + /* PTR was obtained from walking a struct. This is used with
> + * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
> + * kfunc with KF_TRUSTED_ARGS.
> + */
> + PTR_WALKED = BIT(11 + BPF_BASE_TYPE_BITS),
> +
> __BPF_TYPE_FLAG_MAX,
> __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
> };
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index f9aababc5d78..7f5a438196a2 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -17,9 +17,48 @@
> #define KF_RELEASE (1 << 1) /* kfunc is a release function */
> #define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */
> #define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */
> -/* Trusted arguments are those which are meant to be referenced arguments with
> - * unchanged offset. It is used to enforce that pointers obtained from acquire
> - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> +/* Trusted arguments are those which are meant to be guaranteed valid
> + * arguments, with an unchanged offset. It is used to enforce that pointers
> + * obtained from either acquire kfuncs or the main kernel remain unmodified
> + * when being passed to helpers taking trusted args.
> + *
> + * Consider, for example, the following task tracepoint:
> + *
> + * SEC("tp_btf/task_newtask")
> + * int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> + * {
> + * ...
> + * }
> + *
> + * And the following kfunc:
> + *
> + * BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> + *
> + * All invocations to the kfunc must pass the unmodified, unwalked task:
> + *
> + * bpf_task_acquire(task); // Allowed
> + * bpf_task_acquire(task->last_wakee); // Rejected, walked task
> + *
> + * Users may also pass referenced tasks directly to the kfunc:
> + *
> + * struct task_struct *acquired;
> + *
> + * acquired = bpf_task_acquire(task); // Allowed, same as above
> + * bpf_task_acquire(acquired); // Allowed
> + * bpf_task_acquire(task); // Allowed
> + * bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> + *
> + * If users wish to only allow referenced objects to be passed to a kfunc, they
> + * may instead specify the KF_OWNED_ARGS flag.
> + */
> +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
> +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
> +/* Owned arguments are similar to trusted arguments, but are even more
> + * restrictive. Owned arguments are arguments which are "owned" by the BPF
> + * program, meaning it has acquired a reference to the object via an acquire
> + * kfunc. Just as with trusted arguments, the verifier enforces that owned
> + * arguments have an unchanged offset when they're passed to kfuncs.
I don't think the kfunc writers will be able to use KF_OWNED_ARGS vs KF_TRUSTED_ARGS properly.
refcnt-ed or not is not a property that they should worry about.
Let's evaluate this patch set without KF_OWNED_ARGS and bpf_ct_* are still KF_TRUSTED_ARGS
and the other side of the verifier is relaxed to accept non-refcnted PTR_TO_BTF_ID
into kfunc.
What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
We've introduced nf_conn___init vs nf_conf specifically to express the relationship
between allocated nf_conn and other nf_conn-s via different types.
Why is this not enough?
I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
Separately...
I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
This PTR_WALKED looks like new thing.
If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
as PTR_WALKED is doing.
I mean we can introduce PTR_TRUSTED and add this flag to return value
of bpf_get_current_task_btf() and arguments of tracepoints.
As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
backward compat behavior of PTR_TO_BTF_ID.
PTR_WALKED is sort-of doing the same, but not conservative enough.
Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
I might have missed earlier discussions on this patch set. Apologies if so.
On Mon, Oct 31, 2022 at 05:02:39PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 20, 2022 at 05:24:14PM -0500, David Vernet wrote:
> > Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> > to the verifier that it should enforce that a BPF program passes it a
> > "safe", trusted pointer. Currently, "safe" means that the pointer is
> > either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> > the kernel passes a BPF program a safe / trusted pointer to an object
> > that the BPF program wishes to use as a kptr, but because the object
> > does not yet have a ref_obj_id from the perspective of the verifier, the
> > program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> > kfunc.
> >
> > The solution is to expand the set of pointers that are considered
> > trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> > with these pointers without getting rejected by the verifier.
> >
> > There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> > such as when a BPF program reads a kptr directly from a map
> > without performing a bpf_kptr_xchg() call. These pointers of course can
> > and should be rejected by the verifier. Unfortunately, however,
> > PTR_UNTRUSTED does not cover all the cases for safety that need to
> > be addressed to adequately protect kfuncs. Specifically, pointers
> > obtained by a BPF program "walking" a struct are _not_ considered
> > PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> > kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> > acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> > that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> > allow the following unsafe BPF program to be loaded:
> >
> > SEC("tp_btf/task_newtask")
> > int BPF_PROG(unsafe_acquire_task,
> > struct task_struct *task,
> > u64 clone_flags)
> > {
> > struct task_struct *acquired, *nested;
> >
> > nested = task->last_wakee;
> >
> > /* Would not be rejected by the verifier. */
> > acquired = bpf_task_acquire(nested);
> > if (!acquired)
> > return 0;
> >
> > bpf_task_release(acquired);
> > return 0;
> > }
> >
> > To address this, this patch defines a new type flag called PTR_WALKED
> > which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
> > a struct. A pointer passed directly from the kernel begins with
> > (PTR_WALKED & type) == 0, meaning of course that it is not obtained from
> > walking another struct. Any pointer received from walking that object,
> > however, would inherit that flag and become a walked pointer.
> >
> > Additionally, because some kfuncs still only want BPF programs to be
> > able to send them an arg that they "own" (i.e. which they own a refcount
> > on) another kfunc arg flag called KF_OWNED_ARGS is added which is
> > identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
> > the arg must also have a refcount.
> >
> > A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> > and then another patch will validate this feature by ensuring that the
> > verifier rejects a kfunc invocation with a nested pointer.
> >
> > Signed-off-by: David Vernet <[email protected]>
> > ---
> > Documentation/bpf/kfuncs.rst | 50 ++++++++++----
> > include/linux/bpf.h | 6 ++
> > include/linux/btf.h | 57 ++++++++++++++--
> > kernel/bpf/btf.c | 18 ++++-
> > kernel/bpf/verifier.c | 66 ++++++++++++++-----
> > net/bpf/test_run.c | 2 +-
> > net/netfilter/nf_conntrack_bpf.c | 8 +--
> > net/netfilter/nf_nat_bpf.c | 2 +-
> > .../selftests/bpf/prog_tests/map_kptr.c | 2 +-
> > tools/testing/selftests/bpf/verifier/calls.c | 4 +-
> > .../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
> > 11 files changed, 169 insertions(+), 48 deletions(-)
> >
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 0f858156371d..8e2825150a8d 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -137,30 +137,54 @@ KF_ACQUIRE and KF_RET_NULL flags.
> > --------------------------
> >
> > The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> > -indicates that the all pointer arguments will always have a guaranteed lifetime,
> > -and pointers to kernel objects are always passed to helpers in their unmodified
> > -form (as obtained from acquire kfuncs).
> > +indicates that the all pointer arguments will always have a guaranteed
> > +lifetime, and pointers to kernel objects are always passed to helpers in their
> > +unmodified form (either as passed by the main kernel, or as obtained from
> > +acquire kfuncs).
> >
> > -It can be used to enforce that a pointer to a refcounted object acquired from a
> > -kfunc or BPF helper is passed as an argument to this kfunc without any
> > -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> > -the original object.
> > +It can be used to enforce that a safe pointer passed to the program by the
> > +kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
> > +as an argument to this kfunc without any modifications (e.g. pointer
> > +arithmetic) such that it is trusted and points to the original object.
> >
> > Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
> > but those can have a non-zero offset.
> >
> > -This flag is often used for kfuncs that operate (change some property, perform
> > -some operation) on an object that was obtained using an acquire kfunc. Such
> > -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> > -performed on the expected object.
> > +This flag is often used for kfuncs that receive a trusted pointer from the
> > +kernel, and which do not require a reference to be held by the program. For
> > +example, if there's a kernel object that was allocated by the main kernel, and
> > +which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
> > +be used to ensure that the pointer is actually a trusted kernel pointer before
> > +a reference is acquired on it in a KF_ACQUIRE kfunc.
> > +
> > +2.4.6 KF_OWNED_ARGS flag
> > +------------------------
> > +
> > +The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
> > +more restrictive in that it also requires the BPF program to hold a reference
> > +on the object.
> >
> > -2.4.6 KF_SLEEPABLE flag
> > +In other words, it can be used to enforce that a pointer to a refcounted object
> > +acquired from a kfunc or BPF helper is passed as an argument to this kfunc
> > +without any modifications (e.g. pointer arithmetic) such that it is trusted and
> > +points to the original object that was allocated or owned by the BPF program.
> > +
> > +This flag is often used for kfuncs that operate (change some property, perform
> > +some operation) on an object that was obtained using an acquire kfunc. For
> > +example, if an acquire kfunc allocates an object on behalf of a program,
> > +KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
> > +allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
> > +would likely not be sufficiently restrictive as the kfunc does not want to
> > +allow the BPF program to mutate another instance of the same object type which
> > +was allocated by the main kernel.
> > +
> > +2.4.7 KF_SLEEPABLE flag
> > -----------------------
> >
> > The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
> > be called by sleepable BPF programs (BPF_F_SLEEPABLE).
> >
> > -2.4.7 KF_DESTRUCTIVE flag
> > +2.4.8 KF_DESTRUCTIVE flag
> > --------------------------
> >
> > The KF_DESTRUCTIVE flag is used to indicate functions calling which is
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9e7d46d16032..ccdbefd72a95 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -457,6 +457,12 @@ enum bpf_type_flag {
> > /* Size is known at compile time. */
> > MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS),
> >
> > + /* PTR was obtained from walking a struct. This is used with
> > + * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
> > + * kfunc with KF_TRUSTED_ARGS.
> > + */
> > + PTR_WALKED = BIT(11 + BPF_BASE_TYPE_BITS),
> > +
> > __BPF_TYPE_FLAG_MAX,
> > __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
> > };
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index f9aababc5d78..7f5a438196a2 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -17,9 +17,48 @@
> > #define KF_RELEASE (1 << 1) /* kfunc is a release function */
> > #define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */
> > #define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */
> > -/* Trusted arguments are those which are meant to be referenced arguments with
> > - * unchanged offset. It is used to enforce that pointers obtained from acquire
> > - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> > +/* Trusted arguments are those which are meant to be guaranteed valid
> > + * arguments, with an unchanged offset. It is used to enforce that pointers
> > + * obtained from either acquire kfuncs or the main kernel remain unmodified
> > + * when being passed to helpers taking trusted args.
> > + *
> > + * Consider, for example, the following task tracepoint:
> > + *
> > + * SEC("tp_btf/task_newtask")
> > + * int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> > + * {
> > + * ...
> > + * }
> > + *
> > + * And the following kfunc:
> > + *
> > + * BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> > + *
> > + * All invocations to the kfunc must pass the unmodified, unwalked task:
> > + *
> > + * bpf_task_acquire(task); // Allowed
> > + * bpf_task_acquire(task->last_wakee); // Rejected, walked task
> > + *
> > + * Users may also pass referenced tasks directly to the kfunc:
> > + *
> > + * struct task_struct *acquired;
> > + *
> > + * acquired = bpf_task_acquire(task); // Allowed, same as above
> > + * bpf_task_acquire(acquired); // Allowed
> > + * bpf_task_acquire(task); // Allowed
> > + * bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> > + *
> > + * If users wish to only allow referenced objects to be passed to a kfunc, they
> > + * may instead specify the KF_OWNED_ARGS flag.
> > + */
> > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> > +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */
> > +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */
> > +/* Owned arguments are similar to trusted arguments, but are even more
> > + * restrictive. Owned arguments are arguments which are "owned" by the BPF
> > + * program, meaning it has acquired a reference to the object via an acquire
> > + * kfunc. Just as with trusted arguments, the verifier enforces that owned
> > + * arguments have an unchanged offset when they're passed to kfuncs.
>
> I don't think the kfunc writers will be able to use KF_OWNED_ARGS vs KF_TRUSTED_ARGS properly.
> refcnt-ed or not is not a property that they should worry about.
Ok, I think I agree with you that making a separate type as we did with
struct nf_conn___init is a better solution. I'll drop this in v7. First
though, we'll have to get aligned on PTR_WALKED vs. PTR_TRUSTED.
> Let's evaluate this patch set without KF_OWNED_ARGS and bpf_ct_* are still KF_TRUSTED_ARGS
> and the other side of the verifier is relaxed to accept non-refcnted PTR_TO_BTF_ID
> into kfunc.
Sounds good, assuming Kumar agrees.
> What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> We've introduced nf_conn___init vs nf_conf specifically to express the relationship
> between allocated nf_conn and other nf_conn-s via different types.
> Why is this not enough?
Kumar should have more context here (he originally suggested this in
[0]), but AFAICT you're correct that this should be sufficient. I added
a negative test case that correctly fails if a BPF program tries to call
these helpers with a struct nf_conn* rather than a struct
nf_conn__init*.
[0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/
> I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
[1], we'll have to figure out how to avoid trace progs with unsafe args
from calling these kfuncs. Maybe the right thing to do is allow-listing
rather than deny-listing, as you pointed out.
[1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/
> Separately...
> I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
That would be nice if we could do it. I assume that the issue is we're
breaking backwards compat if we do, so I'd be curious to hear what the
plan was if you're aware. The only plan that I've seen so far is what
Kumar spelled out above in [1] above.
> This PTR_WALKED looks like new thing.
> If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> as PTR_WALKED is doing.
> I mean we can introduce PTR_TRUSTED and add this flag to return value
> of bpf_get_current_task_btf() and arguments of tracepoints.
> As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> backward compat behavior of PTR_TO_BTF_ID.
> PTR_WALKED is sort-of doing the same, but not conservative enough.
> Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
I very much prefer the idea of allowlisting instead of denylisting,
though I wish we'd taken that approach from the start rather than going
with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
(and currently does) imply PTR_TRUSTED.
If we're going to go with an allowlist approach, then I think we should
just get rid of PTR_UNTRUSTED altogether. Is that what you're
suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
PTR_WALKED seems like a more natural type modifier addition.
> I might have missed earlier discussions on this patch set. Apologies if so.
Just FYI, the main initial thread where this was all discussed was [2].
[2]: https://lore.kernel.org/all/CAP01T76OR3J_P8YMq4ZgKHBpuZyA0zgsPy+tq9htbX=j6AVyOg@mail.gmail.com/
On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
>
> > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > We've introduced / vs nf_conf specifically to express the relationship
> > between allocated nf_conn and other nf_conn-s via different types.
> > Why is this not enough?
>
> Kumar should have more context here (he originally suggested this in
> [0]),
Quoting:
"
Unfortunately a side effect of this change is that now since
PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
functions would begin working with tp_btf args.
"
I couldn't find any tracepoint that has nf_conn___init as an argument.
The whole point of that new type was to return it to bpf prog,
so the verifier type matches it when it's passed into bpf_ct_*
in turn.
So I don't see a need for a new OWNED flag still.
If nf_conn___init is passed into tracepoint it's a bug and
we gotta fix it.
> but AFAICT you're correct that this should be sufficient. I added
> a negative test case that correctly fails if a BPF program tries to call
> these helpers with a struct nf_conn* rather than a struct
> nf_conn__init*.
>
> [0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/
>
> > I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> > and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
>
> Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
> [1], we'll have to figure out how to avoid trace progs with unsafe args
> from calling these kfuncs. Maybe the right thing to do is allow-listing
> rather than deny-listing, as you pointed out.
>
> [1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/
That is still the plan. more or less.
> > Separately...
> > I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
>
> That would be nice if we could do it. I assume that the issue is we're
> breaking backwards compat if we do, so I'd be curious to hear what the
> plan was if you're aware. The only plan that I've seen so far is what
> Kumar spelled out above in [1] above.
Right. Backward compat with existing usage of PTR_TO_BTF_ID
is the main issue.
>
> > This PTR_WALKED looks like new thing.
> > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > as PTR_WALKED is doing.
> > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > of bpf_get_current_task_btf() and arguments of tracepoints.
> > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > backward compat behavior of PTR_TO_BTF_ID.
> > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
>
> I very much prefer the idea of allowlisting instead of denylisting,
> though I wish we'd taken that approach from the start rather than going
> with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> (and currently does) imply PTR_TRUSTED.
I kind agree, but we gotta have both because of backward compat.
We cannot change PTR_TO_BTF_ID as a whole right now.
Note PTR_TO_BTF_ID appears in kfuncs too.
I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
only in tracepoint args and as return value from
certain helpers like bpf_get_current_task_btf().
afaik it's all safe. There is no uaf here.
uaf is for kfunc. Especially fexit.
Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
>
> If we're going to go with an allowlist approach, then I think we should
> just get rid of PTR_UNTRUSTED altogether. Is that what you're
> suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
> PTR_WALKED seems like a more natural type modifier addition.
Eventually either PTR_TRUSTED or PTR_UNTRUSTED will be removed.
> > I might have missed earlier discussions on this patch set. Apologies if so.
>
> Just FYI, the main initial thread where this was all discussed was [2].
>
> [2]: https://lore.kernel.org/all/CAP01T76OR3J_P8YMq4ZgKHBpuZyA0zgsPy+tq9htbX=j6AVyOg@mail.gmail.com/
On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> >
> > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > We've introduced / vs nf_conf specifically to express the relationship
> > > between allocated nf_conn and other nf_conn-s via different types.
> > > Why is this not enough?
> >
> > Kumar should have more context here (he originally suggested this in
> > [0]),
>
> Quoting:
> "
> Unfortunately a side effect of this change is that now since
> PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> functions would begin working with tp_btf args.
> "
> I couldn't find any tracepoint that has nf_conn___init as an argument.
> The whole point of that new type was to return it to bpf prog,
> so the verifier type matches it when it's passed into bpf_ct_*
> in turn.
> So I don't see a need for a new OWNED flag still.
> If nf_conn___init is passed into tracepoint it's a bug and
> we gotta fix it.
Yep, this is what I'm seeing as well. I think you're right that
KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
types is the way to enable an ownership model like this.
> > but AFAICT you're correct that this should be sufficient. I added
> > a negative test case that correctly fails if a BPF program tries to call
> > these helpers with a struct nf_conn* rather than a struct
> > nf_conn__init*.
> >
> > [0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/
> >
> > > I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> > > and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
> >
> > Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
> > [1], we'll have to figure out how to avoid trace progs with unsafe args
> > from calling these kfuncs. Maybe the right thing to do is allow-listing
> > rather than deny-listing, as you pointed out.
> >
> > [1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/
>
> That is still the plan. more or less.
>
> > > Separately...
> > > I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
> >
> > That would be nice if we could do it. I assume that the issue is we're
> > breaking backwards compat if we do, so I'd be curious to hear what the
> > plan was if you're aware. The only plan that I've seen so far is what
> > Kumar spelled out above in [1] above.
>
> Right. Backward compat with existing usage of PTR_TO_BTF_ID
> is the main issue.
>
> >
> > > This PTR_WALKED looks like new thing.
> > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > as PTR_WALKED is doing.
> > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > backward compat behavior of PTR_TO_BTF_ID.
> > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> >
> > I very much prefer the idea of allowlisting instead of denylisting,
> > though I wish we'd taken that approach from the start rather than going
> > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > (and currently does) imply PTR_TRUSTED.
>
> I kind agree, but we gotta have both because of backward compat.
> We cannot change PTR_TO_BTF_ID as a whole right now.
>
> Note PTR_TO_BTF_ID appears in kfuncs too.
> I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> only in tracepoint args and as return value from
> certain helpers like bpf_get_current_task_btf().
> afaik it's all safe. There is no uaf here.
> uaf is for kfunc. Especially fexit.
> Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
Ok, this feels like the right approach to me. Unless I'm missing
something, modulo doing our due diligence and checking if there are any
existing kfuncs that are relying on different behavior, once this lands
I think we could maybe even make KF_TRUSTED_ARGS the default for all
kfuncs? That should probably be done in a separate patch set though.
> > If we're going to go with an allowlist approach, then I think we should
> > just get rid of PTR_UNTRUSTED altogether. Is that what you're
> > suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
> > PTR_WALKED seems like a more natural type modifier addition.
>
> Eventually either PTR_TRUSTED or PTR_UNTRUSTED will be removed.
In that case I'm fine with doing PTR_TRUSTED. Would ideally like to get
Kumar's input before doing it in v7 to avoid too much more churn.
On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
>
> On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > >
> > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > Why is this not enough?
> > >
> > > Kumar should have more context here (he originally suggested this in
> > > [0]),
> >
> > Quoting:
> > "
> > Unfortunately a side effect of this change is that now since
> > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > functions would begin working with tp_btf args.
> > "
> > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > The whole point of that new type was to return it to bpf prog,
> > so the verifier type matches it when it's passed into bpf_ct_*
> > in turn.
> > So I don't see a need for a new OWNED flag still.
> > If nf_conn___init is passed into tracepoint it's a bug and
> > we gotta fix it.
>
> Yep, this is what I'm seeing as well. I think you're right that
> KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> types is the way to enable an ownership model like this.
>
It's not just nf_conn___init. Some CT helpers also take nf_conn.
e.g. bpf_ct_change_timeout, bpf_ct_change_status.
Right now they are only allowed in XDP and TC programs, so the tracing
args part is not a problem _right now_.
So currently it may not be possible to pass such a trusted but
ref_obj_id == 0 nf_conn to those helpers.
But based on changes unrelated to this, it may become possible in the
future to obtain such a trusted nf_conn pointer.
It is hard to then go and audit all possible cases where this can be
passed into helpers/kfuncs.
It is a requirement of those kfuncs that the nf_conn has its refcount
held while they are called.
KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
It seems better to me to keep that restriction instead of relaxing it,
if it is part of the contract.
It is fine to not require people to dive into these details and just
use KF_TRUSTED_ARGS in general, but we need something to cover special
cases like these where the object is only stable while we hold an
active refcount, RCU protection is not enough against reuse.
It could be 'expert only' __ref suffix on the nf_conn arg, or
KF_OWNED_ARGS, or something else.
> > > [...]
> > >
> > > > This PTR_WALKED looks like new thing.
> > > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > > as PTR_WALKED is doing.
> > > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > > backward compat behavior of PTR_TO_BTF_ID.
> > > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> > >
> > > I very much prefer the idea of allowlisting instead of denylisting,
> > > though I wish we'd taken that approach from the start rather than going
> > > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > > (and currently does) imply PTR_TRUSTED.
> >
> > I kind agree, but we gotta have both because of backward compat.
> > We cannot change PTR_TO_BTF_ID as a whole right now.
> >
> > Note PTR_TO_BTF_ID appears in kfuncs too.
> > I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> > only in tracepoint args and as return value from
> > certain helpers like bpf_get_current_task_btf().
> > afaik it's all safe. There is no uaf here.
> > uaf is for kfunc. Especially fexit.
> > Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
>
> Ok, this feels like the right approach to me. Unless I'm missing
> something, modulo doing our due diligence and checking if there are any
> existing kfuncs that are relying on different behavior, once this lands
> I think we could maybe even make KF_TRUSTED_ARGS the default for all
> kfuncs? That should probably be done in a separate patch set though.
>
I do like the allowlist vs denylist point from Alexei. It was also
what I originally suggested in [0], but when I went looking, pointer
walking is really the only case that was problematic, which was being
marked by PTR_WALKED. The other case of handling fexit is unrelated to
both.
But it's always better to be safe than sorry.
[0]: https://lore.kernel.org/bpf/CAP01T76zg0kABh36ekC4FTxDsdiYBaP7agErO=YadfFmaJ1LKQ@mail.gmail.com
On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
> >
> > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > > >
> > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > Why is this not enough?
> > > >
> > > > Kumar should have more context here (he originally suggested this in
> > > > [0]),
> > >
> > > Quoting:
> > > "
> > > Unfortunately a side effect of this change is that now since
> > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > functions would begin working with tp_btf args.
> > > "
> > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > The whole point of that new type was to return it to bpf prog,
> > > so the verifier type matches it when it's passed into bpf_ct_*
> > > in turn.
> > > So I don't see a need for a new OWNED flag still.
> > > If nf_conn___init is passed into tracepoint it's a bug and
> > > we gotta fix it.
> >
> > Yep, this is what I'm seeing as well. I think you're right that
> > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > types is the way to enable an ownership model like this.
> >
>
> It's not just nf_conn___init. Some CT helpers also take nf_conn.
> e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> Right now they are only allowed in XDP and TC programs, so the tracing
> args part is not a problem _right now_.
... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> So currently it may not be possible to pass such a trusted but
> ref_obj_id == 0 nf_conn to those helpers.
> But based on changes unrelated to this, it may become possible in the
> future to obtain such a trusted nf_conn pointer.
From kfunc's pov trusted pointer means valid pointer.
It doesn't need to be ref_obj_id refcounted from the verifier pov.
It can be refcounted on the kernel side and it will be trusted.
The code that calls trace_*() passes only trusted pointers into tp-s.
If there is a tracepoint somewhere in the kernel that uses a volatile
pointer to potentially uaf kernel object it's a bug that should be fixed.
> It is a requirement of those kfuncs that the nf_conn has its refcount
> held while they are called.
and it will be. Just not by the verifier.
> KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> It seems better to me to keep that restriction instead of relaxing it,
> if it is part of the contract.
Disagree as explained above.
> It is fine to not require people to dive into these details and just
> use KF_TRUSTED_ARGS in general, but we need something to cover special
> cases like these where the object is only stable while we hold an
> active refcount, RCU protection is not enough against reuse.
This is not related to RCU. Let's not mix RCU concerns in here.
It's a different topic.
> It could be 'expert only' __ref suffix on the nf_conn arg, or
> KF_OWNED_ARGS, or something else.
I'm still against that.
> > > > [...]
> > > >
> > > > > This PTR_WALKED looks like new thing.
> > > > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > > > as PTR_WALKED is doing.
> > > > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > > > backward compat behavior of PTR_TO_BTF_ID.
> > > > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> > > >
> > > > I very much prefer the idea of allowlisting instead of denylisting,
> > > > though I wish we'd taken that approach from the start rather than going
> > > > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > > > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > > > (and currently does) imply PTR_TRUSTED.
> > >
> > > I kind agree, but we gotta have both because of backward compat.
> > > We cannot change PTR_TO_BTF_ID as a whole right now.
> > >
> > > Note PTR_TO_BTF_ID appears in kfuncs too.
> > > I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> > > only in tracepoint args and as return value from
> > > certain helpers like bpf_get_current_task_btf().
> > > afaik it's all safe. There is no uaf here.
> > > uaf is for kfunc. Especially fexit.
> > > Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
> >
> > Ok, this feels like the right approach to me. Unless I'm missing
> > something, modulo doing our due diligence and checking if there are any
> > existing kfuncs that are relying on different behavior, once this lands
> > I think we could maybe even make KF_TRUSTED_ARGS the default for all
> > kfuncs? That should probably be done in a separate patch set though.
> >
>
> I do like the allowlist vs denylist point from Alexei. It was also
> what I originally suggested in [0], but when I went looking, pointer
> walking is really the only case that was problematic, which was being
> marked by PTR_WALKED. The other case of handling fexit is unrelated to
> both.
Args of fentry and fexit will not have PTR_TO_BTF_ID | PTR_TRUSTED.
So not an issue.
We can allowlist certain hooks. Like all of bpf-lsm hooks and many others.
But not all of them.
On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
<[email protected]> wrote:
>
> On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
> > >
> > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > > > >
> > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > Why is this not enough?
> > > > >
> > > > > Kumar should have more context here (he originally suggested this in
> > > > > [0]),
> > > >
> > > > Quoting:
> > > > "
> > > > Unfortunately a side effect of this change is that now since
> > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > functions would begin working with tp_btf args.
> > > > "
> > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > The whole point of that new type was to return it to bpf prog,
> > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > in turn.
> > > > So I don't see a need for a new OWNED flag still.
> > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > we gotta fix it.
> > >
> > > Yep, this is what I'm seeing as well. I think you're right that
> > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > types is the way to enable an ownership model like this.
> > >
> >
> > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > Right now they are only allowed in XDP and TC programs, so the tracing
> > args part is not a problem _right now_.
>
> ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
>
> > So currently it may not be possible to pass such a trusted but
> > ref_obj_id == 0 nf_conn to those helpers.
> > But based on changes unrelated to this, it may become possible in the
> > future to obtain such a trusted nf_conn pointer.
>
> From kfunc's pov trusted pointer means valid pointer.
> It doesn't need to be ref_obj_id refcounted from the verifier pov.
> It can be refcounted on the kernel side and it will be trusted.
> The code that calls trace_*() passes only trusted pointers into tp-s.
> If there is a tracepoint somewhere in the kernel that uses a volatile
> pointer to potentially uaf kernel object it's a bug that should be fixed.
>
This is all fine. I'm asking you to distinguish between
trusted-not-refcounted and trusted-and-refcounted.
It is necessary for nf_conn, since the object can be reused if the
refcount is not held.
Some other CPU could be reusing the same memory and allocating a new
nf_conn on it while we change its status.
So it's not ok to call bpf_ct_change_timeout/status on trusted
nf_conn, but only on trusted+refcounted nf_conn.
Trusted doesn't capture the difference between 'valid' vs 'valid and
owned by prog' anymore with the new definition
for PTR_TO_BTF_ID.
Yes, in most cases the tracepoints/tracing functions whitelisted will
have the caller ensure that,
but we should then allow trusted nf_conn in those hooks explicitly,
not implicitly by default everywhere.
Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
> > It is a requirement of those kfuncs that the nf_conn has its refcount
> > held while they are called.
>
> and it will be. Just not by the verifier.
>
> > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > It seems better to me to keep that restriction instead of relaxing it,
> > if it is part of the contract.
>
> Disagree as explained above.
>
> > It is fine to not require people to dive into these details and just
> > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > cases like these where the object is only stable while we hold an
> > active refcount, RCU protection is not enough against reuse.
>
> This is not related to RCU. Let's not mix RCU concerns in here.
> It's a different topic.
>
What I meant is that in the normal case, usually objects aren't reused
while the RCU read lock is held.
In case of nf_conn, the refcount needs to be held to ensure that,
since it uses SLAB_TYPESAFE_BY_RCU.
This is why bpf_ct_lookup needs to bump the refcount and match the key
after that again, and cannot just return the looked up ct directly.
> > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > KF_OWNED_ARGS, or something else.
>
> I'm still against that.
>
I understand (and agree) that you don't want to complicate things further.
It's fine if you want to deal with this later when the above concern
materializes. But it will be yet another thing to keep in mind for the
future.
On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <[email protected]> wrote:
>
> On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> <[email protected]> wrote:
> >
> > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
> > > >
> > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > > > > >
> > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > Why is this not enough?
> > > > > >
> > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > [0]),
> > > > >
> > > > > Quoting:
> > > > > "
> > > > > Unfortunately a side effect of this change is that now since
> > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > functions would begin working with tp_btf args.
> > > > > "
> > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > The whole point of that new type was to return it to bpf prog,
> > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > in turn.
> > > > > So I don't see a need for a new OWNED flag still.
> > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > we gotta fix it.
> > > >
> > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > types is the way to enable an ownership model like this.
> > > >
> > >
> > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > args part is not a problem _right now_.
> >
> > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> >
> > > So currently it may not be possible to pass such a trusted but
> > > ref_obj_id == 0 nf_conn to those helpers.
> > > But based on changes unrelated to this, it may become possible in the
> > > future to obtain such a trusted nf_conn pointer.
> >
> > From kfunc's pov trusted pointer means valid pointer.
> > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > It can be refcounted on the kernel side and it will be trusted.
> > The code that calls trace_*() passes only trusted pointers into tp-s.
> > If there is a tracepoint somewhere in the kernel that uses a volatile
> > pointer to potentially uaf kernel object it's a bug that should be fixed.
> >
>
> This is all fine. I'm asking you to distinguish between
> trusted-not-refcounted and trusted-and-refcounted.
That's not what you're asking :)
> It is necessary for nf_conn, since the object can be reused if the
> refcount is not held.
of course. No one argues the opposite.
> Some other CPU could be reusing the same memory and allocating a new
> nf_conn on it while we change its status.
> So it's not ok to call bpf_ct_change_timeout/status on trusted
> nf_conn, but only on trusted+refcounted nf_conn.
and here we start to disagree.
> Trusted doesn't capture the difference between 'valid' vs 'valid and
> owned by prog' anymore with the new definition
> for PTR_TO_BTF_ID.
and here we disagree completely.
You're asking to distinguish refcnt++ done by the program
and recognized by the verifier as ref_obj_id > 0 vs
refcnt++ done by the kernel code before it calls into tracepoint.
That's odd, right?
I don't think people adding kfuncs should care what piece
of code before kfunc did refcnt++.
> Yes, in most cases the tracepoints/tracing functions whitelisted will
> have the caller ensure that,
> but we should then allow trusted nf_conn in those hooks explicitly,
> not implicitly by default everywhere.
> Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
>
> > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > held while they are called.
> >
> > and it will be. Just not by the verifier.
> >
> > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > It seems better to me to keep that restriction instead of relaxing it,
> > > if it is part of the contract.
> >
> > Disagree as explained above.
> >
> > > It is fine to not require people to dive into these details and just
> > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > cases like these where the object is only stable while we hold an
> > > active refcount, RCU protection is not enough against reuse.
> >
> > This is not related to RCU. Let's not mix RCU concerns in here.
> > It's a different topic.
> >
>
> What I meant is that in the normal case, usually objects aren't reused
> while the RCU read lock is held.
> In case of nf_conn, the refcount needs to be held to ensure that,
> since it uses SLAB_TYPESAFE_BY_RCU.
> This is why bpf_ct_lookup needs to bump the refcount and match the key
> after that again, and cannot just return the looked up ct directly.
bpf_ct_lookup needs to bump a refcnt?!
bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
that calls nf_conntrack_find_get() that does
the search and incs the refcnt in a generic kernel code.
There is nothing bpf specific stuff here. bpf kfunc didn't
add any special refcnt incs.
There are no tracepoints in netfilter, so this discussion
is all theoretical, but if there was then the code
should have made sure that refcnt is held before
passing nf_conn into tracepoint.
> > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > KF_OWNED_ARGS, or something else.
> >
> > I'm still against that.
> >
>
> I understand (and agree) that you don't want to complicate things further.
> It's fine if you want to deal with this later when the above concern
> materializes. But it will be yet another thing to keep in mind for the
> future.
I don't share the concern.
With nf_conn there is none, right?
But imagine there is only RCU protected pointer that
is passed into tracepoint somewhere.
The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
the kernel code doesn't do refcnt++ either.
But it's still safe and this arg should still be
PTR_TO_BTF_ID | PTR_TRUSTED.
The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
Since RCU is held before calling into tracepoint the bpf prog
has to be non sleepable. Additional rcu_read_lock done by
the prog is redundant, but doesn't hurt.
When prog is calling kfunc the pointer is still valid and
kfunc can safely operate on it assuming that object is not going away.
That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
You documented it yourself :)
"
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that the all pointer arguments will always have a guaranteed lifetime,
"
On Wed, Nov 02, 2022 at 08:04:57AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <[email protected]> wrote:
> >
> > On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> > <[email protected]> wrote:
> > >
> > > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
> > > > >
> > > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > > > > > >
> > > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > > Why is this not enough?
> > > > > > >
> > > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > > [0]),
> > > > > >
> > > > > > Quoting:
> > > > > > "
> > > > > > Unfortunately a side effect of this change is that now since
> > > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > > functions would begin working with tp_btf args.
> > > > > > "
> > > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > > The whole point of that new type was to return it to bpf prog,
> > > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > > in turn.
> > > > > > So I don't see a need for a new OWNED flag still.
> > > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > > we gotta fix it.
> > > > >
> > > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > > types is the way to enable an ownership model like this.
> > > > >
> > > >
> > > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > > args part is not a problem _right now_.
> > >
> > > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> > >
> > > > So currently it may not be possible to pass such a trusted but
> > > > ref_obj_id == 0 nf_conn to those helpers.
> > > > But based on changes unrelated to this, it may become possible in the
> > > > future to obtain such a trusted nf_conn pointer.
> > >
> > > From kfunc's pov trusted pointer means valid pointer.
> > > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > > It can be refcounted on the kernel side and it will be trusted.
> > > The code that calls trace_*() passes only trusted pointers into tp-s.
> > > If there is a tracepoint somewhere in the kernel that uses a volatile
> > > pointer to potentially uaf kernel object it's a bug that should be fixed.
> > >
> >
> > This is all fine. I'm asking you to distinguish between
> > trusted-not-refcounted and trusted-and-refcounted.
>
> That's not what you're asking :)
>
> > It is necessary for nf_conn, since the object can be reused if the
> > refcount is not held.
>
> of course. No one argues the opposite.
>
> > Some other CPU could be reusing the same memory and allocating a new
> > nf_conn on it while we change its status.
> > So it's not ok to call bpf_ct_change_timeout/status on trusted
> > nf_conn, but only on trusted+refcounted nf_conn.
>
> and here we start to disagree.
>
> > Trusted doesn't capture the difference between 'valid' vs 'valid and
> > owned by prog' anymore with the new definition
> > for PTR_TO_BTF_ID.
>
> and here we disagree completely.
> You're asking to distinguish refcnt++ done by the program
> and recognized by the verifier as ref_obj_id > 0 vs
> refcnt++ done by the kernel code before it calls into tracepoint.
> That's odd, right?
> I don't think people adding kfuncs should care what piece
> of code before kfunc did refcnt++.
>
I think we're talking past each other. Your point is that whenever PTR_TRUSTED
is set (tracepoint args, etc.) the kernel itself should ensure it holds the
refcount on behalf of the program. There shouldn't be a scenario where
PTR_TRUSTED nf_conn is obtained and the program or the kernel isn't holding
an active refcount during its use with helpers/kfuncs.
I was not making that assumption, but it's not wrong. Which is why I wanted to
limit their use to only when verifier is tracking a reference in the program
for the nf_conn returned from an acquire kfunc.
> > Yes, in most cases the tracepoints/tracing functions whitelisted will
> > have the caller ensure that,
> > but we should then allow trusted nf_conn in those hooks explicitly,
> > not implicitly by default everywhere.
> > Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
> >
> > > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > > held while they are called.
> > >
> > > and it will be. Just not by the verifier.
> > >
> > > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > > It seems better to me to keep that restriction instead of relaxing it,
> > > > if it is part of the contract.
> > >
> > > Disagree as explained above.
> > >
> > > > It is fine to not require people to dive into these details and just
> > > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > > cases like these where the object is only stable while we hold an
> > > > active refcount, RCU protection is not enough against reuse.
> > >
> > > This is not related to RCU. Let's not mix RCU concerns in here.
> > > It's a different topic.
> > >
> >
> > What I meant is that in the normal case, usually objects aren't reused
> > while the RCU read lock is held.
> > In case of nf_conn, the refcount needs to be held to ensure that,
> > since it uses SLAB_TYPESAFE_BY_RCU.
> > This is why bpf_ct_lookup needs to bump the refcount and match the key
> > after that again, and cannot just return the looked up ct directly.
>
> bpf_ct_lookup needs to bump a refcnt?!
> bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
> that calls nf_conntrack_find_get() that does
> the search and incs the refcnt in a generic kernel code.
> There is nothing bpf specific stuff here. bpf kfunc didn't
> add any special refcnt incs.
>
> There are no tracepoints in netfilter, so this discussion
> is all theoretical, but if there was then the code
> should have made sure that refcnt is held before
> passing nf_conn into tracepoint.
>
Right, if so, it makes sense. Thanks for explaining.
> > > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > > KF_OWNED_ARGS, or something else.
> > >
> > > I'm still against that.
> > >
> >
> > I understand (and agree) that you don't want to complicate things further.
> > It's fine if you want to deal with this later when the above concern
> > materializes. But it will be yet another thing to keep in mind for the
> > future.
>
> I don't share the concern.
> With nf_conn there is none, right?
> But imagine there is only RCU protected pointer that
> is passed into tracepoint somewhere.
> The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
> the kernel code doesn't do refcnt++ either.
> But it's still safe and this arg should still be
> PTR_TO_BTF_ID | PTR_TRUSTED.
> The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
> Since RCU is held before calling into tracepoint the bpf prog
> has to be non sleepable. Additional rcu_read_lock done by
> the prog is redundant, but doesn't hurt.
> When prog is calling kfunc the pointer is still valid and
> kfunc can safely operate on it assuming that object is not going away.
> That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
> You documented it yourself :)
> "
> The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> indicates that the all pointer arguments will always have a guaranteed lifetime,
> "
So based on the above, verifier should only mark nf_conn as PTR_TRUSTED in cases
where the kernel already holds the refcount on behalf of the program, otherwise
for the non-PTR_TRUSTED case program must have ref_obj_id > 0?
I.e. 'the guaranteed lifetime' for the nf_conn case also means the refcount is
held in context where program can obtain PTR_TRUSTED nf_conn pointer.
It will surely be true in the tracepoint case if there ever is one. Won't work
easily for fentry/fexit but you already said that PTR_TRUSTED shouldn't be set
in that case, atleast by default and without allowlisting those BTF IDs.
On Thu, Nov 03, 2022 at 02:18:35AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, Nov 02, 2022 at 08:04:57AM IST, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <[email protected]> wrote:
> > >
> > > On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > On Wed, 2 Nov 2022 at 03:06, David Vernet <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > > > Why is this not enough?
> > > > > > > >
> > > > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > > > [0]),
> > > > > > >
> > > > > > > Quoting:
> > > > > > > "
> > > > > > > Unfortunately a side effect of this change is that now since
> > > > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > > > functions would begin working with tp_btf args.
> > > > > > > "
> > > > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > > > The whole point of that new type was to return it to bpf prog,
> > > > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > > > in turn.
> > > > > > > So I don't see a need for a new OWNED flag still.
> > > > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > > > we gotta fix it.
> > > > > >
> > > > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > > > types is the way to enable an ownership model like this.
> > > > > >
> > > > >
> > > > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > > > args part is not a problem _right now_.
> > > >
> > > > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> > > >
> > > > > So currently it may not be possible to pass such a trusted but
> > > > > ref_obj_id == 0 nf_conn to those helpers.
> > > > > But based on changes unrelated to this, it may become possible in the
> > > > > future to obtain such a trusted nf_conn pointer.
> > > >
> > > > From kfunc's pov trusted pointer means valid pointer.
> > > > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > > > It can be refcounted on the kernel side and it will be trusted.
> > > > The code that calls trace_*() passes only trusted pointers into tp-s.
> > > > If there is a tracepoint somewhere in the kernel that uses a volatile
> > > > pointer to potentially uaf kernel object it's a bug that should be fixed.
> > > >
> > >
> > > This is all fine. I'm asking you to distinguish between
> > > trusted-not-refcounted and trusted-and-refcounted.
> >
> > That's not what you're asking :)
> >
> > > It is necessary for nf_conn, since the object can be reused if the
> > > refcount is not held.
> >
> > of course. No one argues the opposite.
> >
> > > Some other CPU could be reusing the same memory and allocating a new
> > > nf_conn on it while we change its status.
> > > So it's not ok to call bpf_ct_change_timeout/status on trusted
> > > nf_conn, but only on trusted+refcounted nf_conn.
> >
> > and here we start to disagree.
> >
> > > Trusted doesn't capture the difference between 'valid' vs 'valid and
> > > owned by prog' anymore with the new definition
> > > for PTR_TO_BTF_ID.
> >
> > and here we disagree completely.
> > You're asking to distinguish refcnt++ done by the program
> > and recognized by the verifier as ref_obj_id > 0 vs
> > refcnt++ done by the kernel code before it calls into tracepoint.
> > That's odd, right?
> > I don't think people adding kfuncs should care what piece
> > of code before kfunc did refcnt++.
> >
>
> I think we're talking past each other. Your point is that whenever PTR_TRUSTED
> is set (tracepoint args, etc.) the kernel itself should ensure it holds the
> refcount on behalf of the program. There shouldn't be a scenario where
> PTR_TRUSTED nf_conn is obtained and the program or the kernel isn't holding
> an active refcount during its use with helpers/kfuncs.
correct.
> I was not making that assumption, but it's not wrong. Which is why I wanted to
> limit their use to only when verifier is tracking a reference in the program
> for the nf_conn returned from an acquire kfunc.
limit the use for PTR_TRUSTED to verifier only?
and introduce PTR_TRUSTED_BECAUSE_IT_CAME_FROM_KERNEL as well?
But the patch we're discussing will be treating them the same way...
Or you're proposing the kfunc writer also suppose to add two flags to a kfunc:
KF_TRUSTED_ARGS | KF_TRUSTED_BECAUSE_KERNEL_ARGS ?
and that's where it comes back to the point I keep stressing in this thread:
The kfunc writer should not be exposed to such details.
The little people need to understand about the verifier when the write bpf
programs or provide kfuncs the better.
> > > Yes, in most cases the tracepoints/tracing functions whitelisted will
> > > have the caller ensure that,
> > > but we should then allow trusted nf_conn in those hooks explicitly,
> > > not implicitly by default everywhere.
> > > Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
> > >
> > > > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > > > held while they are called.
> > > >
> > > > and it will be. Just not by the verifier.
> > > >
> > > > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > > > It seems better to me to keep that restriction instead of relaxing it,
> > > > > if it is part of the contract.
> > > >
> > > > Disagree as explained above.
> > > >
> > > > > It is fine to not require people to dive into these details and just
> > > > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > > > cases like these where the object is only stable while we hold an
> > > > > active refcount, RCU protection is not enough against reuse.
> > > >
> > > > This is not related to RCU. Let's not mix RCU concerns in here.
> > > > It's a different topic.
> > > >
> > >
> > > What I meant is that in the normal case, usually objects aren't reused
> > > while the RCU read lock is held.
> > > In case of nf_conn, the refcount needs to be held to ensure that,
> > > since it uses SLAB_TYPESAFE_BY_RCU.
> > > This is why bpf_ct_lookup needs to bump the refcount and match the key
> > > after that again, and cannot just return the looked up ct directly.
> >
> > bpf_ct_lookup needs to bump a refcnt?!
> > bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
> > that calls nf_conntrack_find_get() that does
> > the search and incs the refcnt in a generic kernel code.
> > There is nothing bpf specific stuff here. bpf kfunc didn't
> > add any special refcnt incs.
> >
> > There are no tracepoints in netfilter, so this discussion
> > is all theoretical, but if there was then the code
> > should have made sure that refcnt is held before
> > passing nf_conn into tracepoint.
> >
>
> Right, if so, it makes sense. Thanks for explaining.
>
> > > > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > > > KF_OWNED_ARGS, or something else.
> > > >
> > > > I'm still against that.
> > > >
> > >
> > > I understand (and agree) that you don't want to complicate things further.
> > > It's fine if you want to deal with this later when the above concern
> > > materializes. But it will be yet another thing to keep in mind for the
> > > future.
> >
> > I don't share the concern.
> > With nf_conn there is none, right?
> > But imagine there is only RCU protected pointer that
> > is passed into tracepoint somewhere.
> > The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
> > the kernel code doesn't do refcnt++ either.
> > But it's still safe and this arg should still be
> > PTR_TO_BTF_ID | PTR_TRUSTED.
> > The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
> > Since RCU is held before calling into tracepoint the bpf prog
> > has to be non sleepable. Additional rcu_read_lock done by
> > the prog is redundant, but doesn't hurt.
> > When prog is calling kfunc the pointer is still valid and
> > kfunc can safely operate on it assuming that object is not going away.
> > That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
> > You documented it yourself :)
> > "
> > The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> > indicates that the all pointer arguments will always have a guaranteed lifetime,
> > "
>
> So based on the above, verifier should only mark nf_conn as PTR_TRUSTED in cases
> where the kernel already holds the refcount on behalf of the program,
yes, but 'holding the reference' is not the only option.
Holding rcu_read_lock is fine too.
Or any other mechanism that ensures that the pointer has guaranteed lifetime.
and back to my earlier point...
using your logic we'd need PTR_TRUSTED_BECAUSE_RCU in the verifier and
corresponding KF_TRUSTED_ARGS_BECAUSE_RCU flag on kfunc.
That just doesn't scale.
PTR_RCU flag might actually be needed for some other reason, but
KF_TRUSTED_ARGS_BECAUSE_RCU is "too much info" for kfunc providers.
> otherwise
> for the non-PTR_TRUSTED case program must have ref_obj_id > 0?
>
> I.e. 'the guaranteed lifetime' for the nf_conn case also means the refcount is
> held in context where program can obtain PTR_TRUSTED nf_conn pointer.
yes.
> It will surely be true in the tracepoint case if there ever is one. Won't work
> easily for fentry/fexit but you already said that PTR_TRUSTED shouldn't be set
> in that case, atleast by default and without allowlisting those BTF IDs.
Correct. allowlisting kernel func's btf_ids is one approach.
I might argue that fentry hook on any kernel func that has 'struct nf_conn *'
as an argument is also safe (barring recursion into bpf_ct_* kfuncs).
Dual compiler of the kernel (native + bpf) will help this kind of safety analysis.