LinuxLists.cc - [RFC PATCH bpf-next 0/2] BTF support for ksyms

2020-07-15 21:43:56

Subject: [RFC PATCH bpf-next 0/2] BTF support for ksyms

This patch series extends the previously add __ksym externs with btf
info.

Right now the __ksym externs are treated as pure 64-bit scalar value.
Libbpf replaces ld_imm64 insn of __ksym by its kernel address at load
time. This patch series extend those extern with their btf info. Note
that btf support for __ksym must come with the btf that has VARs encoded
to work properly. Therefore, these patches are tested against a btf
generated by a patched pahole, whose change will available in released
pahole soon.

There are a couple of design choices that I would like feedbacks from
bpf/btf experts.

1. Because the newly added pseudo_btf_id needs to carry both a kernel
address (64 bits) and a btf id (32 bits), I used the 'off' fields
of ld_imm insn to carry btf id. I wonder if this breaks anything or
if there is a better idea.
2. Since only a subset of vars are going to be encoded into the new
btf, if a ksym that doesn't find its btf id, it doesn't get
converted into pseudo_btf_id. It is still treated as pure scalar
value. But we require kernel btf to be loaded in libbpf if there is
any ksym in the bpf prog.

This is RFC as it requires pahole changes that encode kernel vars into
btf.

Hao Luo (2):
bpf: BTF support for __ksym externs
selftests/bpf: Test __ksym externs with BTF

include/uapi/linux/bpf.h | 37 ++++++++++----
kernel/bpf/verifier.c | 26 ++++++++--
tools/include/uapi/linux/bpf.h | 37 ++++++++++----
tools/lib/bpf/libbpf.c | 50 ++++++++++++++++++-
.../testing/selftests/bpf/prog_tests/ksyms.c | 2 +
.../testing/selftests/bpf/progs/test_ksyms.c | 14 ++++++
6 files changed, 143 insertions(+), 23 deletions(-)

--
2.27.0.389.gc38d7665816-goog

2020-07-15 21:44:15

by Hao Luo

[permalink] [raw]

Subject: [RFC PATCH bpf-next 1/2] bpf: BTF support for __ksym externs

Previous commits:

commit 1c0c7074fefd ("libbpf: Add support for extracting kernel symbol addresses")
commit 2e33efe32e01 ("libbpf: Generalize libbpf externs support")

have introduced a new type of extern variable ksyms to access kernel
global variables. This patch extends that work by adding btf info
for ksyms. In more details, in addition to the existing type btf_types,
pahole is going to encode a certain global variables in kernel btf
(percpu variables at this moment). With the extended kernel btf, we
can associate btf id to the ksyms to improve the performance of
accessing those vars by using direct load instructions.

More specifically, libbpf can scan the kernel btf to find the btf id
of a ksym at extern resolution. During relocation, it will replace
"ld_imm64 rX, foo" with BPF_PSEUDO_BTF_ID. From the verifier point of
view "ld_imm64 rX, foo // pseudo_btf_id" will be similar to ld_imm64
with pseudo_map_fd and pseudo_map_value. The verifier will check btf_id
and replace that with actual kernel address at program load time. It
will also know that exact type of 'rX' from there on.

Note that since only a subset of kernel symbols are encoded in btf right
now, finding btf_id for ksyms is only best effort. If a ksym does not
have a btf id, we do not rewrite its ld_imm64 to pseudo_btf_id. In that
case, it is treated as loading from a scalar value, which is the current
default behavior for ksyms.

Also note since we need to carry the ksym's address (64bits) as well as
its btf_id (32bits), pseudo_btf_id uses ld_imm64's both imm and off
fields.

Signed-off-by: Hao Luo <[email protected]>
---
include/uapi/linux/bpf.h | 37 +++++++++++++++++++------
kernel/bpf/verifier.c | 26 +++++++++++++++---
tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++------
tools/lib/bpf/libbpf.c | 50 +++++++++++++++++++++++++++++++++-
4 files changed, 127 insertions(+), 23 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 5e386389913a..7490005acdba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -334,18 +334,37 @@ enum bpf_link_type {
#define BPF_F_TEST_STATE_FREQ (1U << 3)

/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
- * two extensions:
- *
- * insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
- * insn[0].imm: map fd map fd
- * insn[1].imm: 0 offset into value
- * insn[0].off: 0 0
- * insn[1].off: 0 0
- * ldimm64 rewrite: address of map address of map[0]+offset
- * verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
+ * three extensions:
+ *
+ * insn[0].src_reg: BPF_PSEUDO_MAP_FD
+ * insn[0].imm: map fd
+ * insn[1].imm: 0
+ * insn[0].off: 0
+ * insn[1].off: 0
+ * ldimm64 rewrite: address of map
+ * verifier type: CONST_PTR_TO_MAP
*/
#define BPF_PSEUDO_MAP_FD 1
+/*
+ * insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm: map fd
+ * insn[1].imm: offset into value
+ * insn[0].off: 0
+ * insn[1].off: 0
+ * ldimm64 rewrite: address of map[0]+offset
+ * verifier type: PTR_TO_MAP_VALUE
+ */
#define BPF_PSEUDO_MAP_VALUE 2
+/*
+ * insn[0].src_reg: BPF_PSEUDO_BTF_ID
+ * insn[0].imm: lower 32 bits of address
+ * insn[1].imm: higher 32 bits of address
+ * insn[0].off: lower 16 bits of btf id
+ * insn[1].off: higher 16 bits of btf id
+ * ldimm64 rewrite: address of kernel symbol
+ * verifier type: PTR_TO_BTF_ID
+ */
+#define BPF_PSEUDO_BTF_ID 3

/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c1efc9d08fd..3c925957b9b6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7131,15 +7131,29 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
verbose(env, "invalid BPF_LD_IMM insn\n");
return -EINVAL;
}
+ err = check_reg_arg(env, insn->dst_reg, DST_OP);
+ if (err)
+ return err;
+
+ /*
+ * BPF_PSEUDO_BTF_ID insn's off fields carry the ksym's btf_id, so its
+ * handling has to come before the reserved field check.
+ */
+ if (insn->src_reg == BPF_PSEUDO_BTF_ID) {
+ u32 id = ((u32)(insn + 1)->off << 16) | (u32)insn->off;
+ const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);
+
+ mark_reg_known_zero(env, regs, insn->dst_reg);
+ regs[insn->dst_reg].type = PTR_TO_BTF_ID;
+ regs[insn->dst_reg].btf_id = t->type;
+ return 0;
+ }
+
if (insn->off != 0) {
verbose(env, "BPF_LD_IMM64 uses reserved fields\n");
return -EINVAL;
}

- err = check_reg_arg(env, insn->dst_reg, DST_OP);
- if (err)
- return err;
-
if (insn->src_reg == 0) {
u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;

@@ -9166,6 +9180,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
/* valid generic load 64-bit imm */
goto next_insn;

+ if (insn[0].src_reg == BPF_PSEUDO_BTF_ID)
+ /* pseudo_btf_id load 64-bit imm */
+ goto next_insn;
+
/* In final convert_pseudo_ld_imm64() step, this is
* converted into regular 64-bit imm load insn.
*/
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5e386389913a..7490005acdba 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -334,18 +334,37 @@ enum bpf_link_type {
#define BPF_F_TEST_STATE_FREQ (1U << 3)

/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
- * two extensions:
- *
- * insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
- * insn[0].imm: map fd map fd
- * insn[1].imm: 0 offset into value
- * insn[0].off: 0 0
- * insn[1].off: 0 0
- * ldimm64 rewrite: address of map address of map[0]+offset
- * verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
+ * three extensions:
+ *
+ * insn[0].src_reg: BPF_PSEUDO_MAP_FD
+ * insn[0].imm: map fd
+ * insn[1].imm: 0
+ * insn[0].off: 0
+ * insn[1].off: 0
+ * ldimm64 rewrite: address of map
+ * verifier type: CONST_PTR_TO_MAP
*/
#define BPF_PSEUDO_MAP_FD 1
+/*
+ * insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm: map fd
+ * insn[1].imm: offset into value
+ * insn[0].off: 0
+ * insn[1].off: 0
+ * ldimm64 rewrite: address of map[0]+offset
+ * verifier type: PTR_TO_MAP_VALUE
+ */
#define BPF_PSEUDO_MAP_VALUE 2
+/*
+ * insn[0].src_reg: BPF_PSEUDO_BTF_ID
+ * insn[0].imm: lower 32 bits of address
+ * insn[1].imm: higher 32 bits of address
+ * insn[0].off: lower 16 bits of btf id
+ * insn[1].off: higher 16 bits of btf id
+ * ldimm64 rewrite: address of kernel symbol
+ * verifier type: PTR_TO_BTF_ID
+ */
+#define BPF_PSEUDO_BTF_ID 3

/* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
* offset to another bpf function
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4489f95f1d1a..23f9710b3d7e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -363,6 +363,7 @@ struct extern_desc {
} kcfg;
struct {
unsigned long long addr;
+ int vmlinux_btf_id;
} ksym;
};
};
@@ -387,6 +388,7 @@ struct bpf_object {

bool loaded;
bool has_pseudo_calls;
+ bool has_ksyms;

/*
* Information when doing elf related work. Only valid if fd
@@ -2500,6 +2502,10 @@ static int bpf_object__load_vmlinux_btf(struct bpf_object *obj)
if (obj->btf_ext && obj->btf_ext->field_reloc_info.len)
need_vmlinux_btf = true;

+ /* Support for ksyms needs kernel BTF */
+ if (obj->has_ksyms)
+ need_vmlinux_btf = true;
+
bpf_object__for_each_program(prog, obj) {
if (!prog->load)
continue;
@@ -2946,6 +2952,9 @@ static int bpf_object__collect_externs(struct bpf_object *obj)
pr_warn("extern (ksym) '%s' is not typeless (void)\n", ext_name);
return -ENOTSUP;
}
+
+ if (!obj->has_ksyms)
+ obj->has_ksyms = true;
} else {
pr_warn("unrecognized extern section '%s'\n", sec_name);
return -ENOTSUP;
@@ -5123,6 +5132,18 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
} else /* EXT_KSYM */ {
insn[0].imm = (__u32)ext->ksym.addr;
insn[1].imm = ext->ksym.addr >> 32;
+
+ /*
+ * If the ksym has found btf id in the vmlinux
+ * btf, it gets mapped to a BPF_PSEUDO_BTF_ID.
+ * The verifier will then convert the dst reg
+ * into PTR_TO_BTF_ID instead of SCALAR_VALUE.
+ */
+ if (ext->ksym.vmlinux_btf_id) {
+ insn[0].src_reg = BPF_PSEUDO_BTF_ID;
+ insn[0].off = (__s16)ext->ksym.vmlinux_btf_id;
+ insn[1].off = ext->ksym.vmlinux_btf_id >> 16;
+ }
}
break;
case RELO_CALL:
@@ -5800,6 +5821,30 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
return err;
}

+static int bpf_object__resolve_ksyms_btf_id(struct bpf_object *obj)
+{
+ struct extern_desc *ext;
+ int i, id;
+
+ if (!obj->btf_vmlinux) {
+ pr_warn("warn: support of ksyms needs kernel btf.\n");
+ return -1;
+ }
+
+ for (i = 0; i < obj->nr_extern; i++) {
+ ext = &obj->externs[i];
+ if (ext->type != EXT_KSYM)
+ continue;
+
+ id = btf__find_by_name_kind(obj->btf_vmlinux, ext->name,
+ BTF_KIND_VAR);
+ ext->ksym.vmlinux_btf_id = id > 0 ? id : 0;
+ pr_debug("extern (ksym) #%d: name %s, vmlinux_btf_id: %d\n",
+ i, ext->name, ext->ksym.vmlinux_btf_id);
+ }
+ return 0;
+}
+
static int bpf_object__resolve_externs(struct bpf_object *obj,
const char *extra_kconfig)
{
@@ -5862,6 +5907,9 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
err = bpf_object__read_kallsyms_file(obj);
if (err)
return -EINVAL;
+ err = bpf_object__resolve_ksyms_btf_id(obj);
+ if (err)
+ return -EINVAL;
}
for (i = 0; i < obj->nr_extern; i++) {
ext = &obj->externs[i];
@@ -5896,10 +5944,10 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)

err = bpf_object__probe_loading(obj);
err = err ? : bpf_object__probe_caps(obj);
+ err = err ? : bpf_object__load_vmlinux_btf(obj);
err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
err = err ? : bpf_object__sanitize_and_load_btf(obj);
err = err ? : bpf_object__sanitize_maps(obj);
- err = err ? : bpf_object__load_vmlinux_btf(obj);
err = err ? : bpf_object__init_kern_struct_ops_maps(obj);
err = err ? : bpf_object__create_maps(obj);
err = err ? : bpf_object__relocate(obj, attr->target_btf_path);
--
2.27.0.389.gc38d7665816-goog

2020-07-15 21:44:27

by Hao Luo

[permalink] [raw]

Subject: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

Extend ksyms.c selftest to make sure BTF enables direct loads of ksyms.

Note that test is done against the kernel btf extended with kernel VARs.

Signed-off-by: Hao Luo <[email protected]>
---
tools/testing/selftests/bpf/prog_tests/ksyms.c | 2 ++
tools/testing/selftests/bpf/progs/test_ksyms.c | 14 ++++++++++++++
2 files changed, 16 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c b/tools/testing/selftests/bpf/prog_tests/ksyms.c
index e3d6777226a8..0e7f3bc3b0ae 100644
--- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
+++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
@@ -65,6 +65,8 @@ void test_ksyms(void)
"got %llu, exp %llu\n", data->out__btf_size, btf_size);
CHECK(data->out__per_cpu_start != 0, "__per_cpu_start",
"got %llu, exp %llu\n", data->out__per_cpu_start, (__u64)0);
+ CHECK(data->out__rq_cpu != 0, "rq_cpu",
+ "got %llu, exp %llu\n", data->out__rq_cpu, (__u64)0);

cleanup:
test_ksyms__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c b/tools/testing/selftests/bpf/progs/test_ksyms.c
index 6c9cbb5a3bdf..e777603757e5 100644
--- a/tools/testing/selftests/bpf/progs/test_ksyms.c
+++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */

+#include "vmlinux.h"
#include <stdbool.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
@@ -9,11 +10,13 @@ __u64 out__bpf_link_fops = -1;
__u64 out__bpf_link_fops1 = -1;
__u64 out__btf_size = -1;
__u64 out__per_cpu_start = -1;
+__u64 out__rq_cpu = -1;

extern const void bpf_link_fops __ksym;
extern const void __start_BTF __ksym;
extern const void __stop_BTF __ksym;
extern const void __per_cpu_start __ksym;
+extern const void runqueues __ksym;
/* non-existing symbol, weak, default to zero */
extern const void bpf_link_fops1 __ksym __weak;

@@ -29,4 +32,15 @@ int handler(const void *ctx)
return 0;
}

+SEC("tp_btf/sys_enter")
+int handler_tp_btf(const void *ctx)
+{
+ const struct rq *rq = &runqueues;
+
+ /* rq now points to the runqueue of cpu 0. */
+ out__rq_cpu = rq->cpu;
+
+ return 0;
+}
+
char _license[] SEC("license") = "GPL";
--
2.27.0.389.gc38d7665816-goog

2020-07-20 05:23:03

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 1/2] bpf: BTF support for __ksym externs

On Wed, Jul 15, 2020 at 2:45 PM Hao Luo <[email protected]> wrote:
>
> Previous commits:
>
> commit 1c0c7074fefd ("libbpf: Add support for extracting kernel symbol addresses")
> commit 2e33efe32e01 ("libbpf: Generalize libbpf externs support")
>
> have introduced a new type of extern variable ksyms to access kernel
> global variables. This patch extends that work by adding btf info
> for ksyms. In more details, in addition to the existing type btf_types,
> pahole is going to encode a certain global variables in kernel btf
> (percpu variables at this moment). With the extended kernel btf, we
> can associate btf id to the ksyms to improve the performance of
> accessing those vars by using direct load instructions.

This is a step in the right direction, thanks for working on this. See
below for a few problems, though.

Also, in the next version, please split kernel part and libbpf part
into separate patches.

>
> More specifically, libbpf can scan the kernel btf to find the btf id
> of a ksym at extern resolution. During relocation, it will replace
> "ld_imm64 rX, foo" with BPF_PSEUDO_BTF_ID. From the verifier point of
> view "ld_imm64 rX, foo // pseudo_btf_id" will be similar to ld_imm64
> with pseudo_map_fd and pseudo_map_value. The verifier will check btf_id
> and replace that with actual kernel address at program load time. It
> will also know that exact type of 'rX' from there on.
>
> Note that since only a subset of kernel symbols are encoded in btf right
> now, finding btf_id for ksyms is only best effort. If a ksym does not
> have a btf id, we do not rewrite its ld_imm64 to pseudo_btf_id. In that
> case, it is treated as loading from a scalar value, which is the current
> default behavior for ksyms.

I don't think that's the right approach. It can't be the best effort.
It's actually pretty clear when a user wants a BTF-based variable with
ability to do direct memory access vs __ksym address that we have
right now: variable type info. In your patch you are only looking up
variable by name, but it needs to be more elaborate logic:

1. if variable type is `extern void` -- do what we do today (no BTF required)
2. if the variable type is anything but `extern void`, then find that
variable in BTF. If no BTF or variable is not found -- hard error with
detailed enough message about what we expected to find in kernel BTF.
3. If such a variable is found in the kernel, then might be a good
idea to additionally check type compatibility (e.g., struct/union
should match struct/union, int should match int, typedefs should get
resolved to underlying type, etc). I don't think deep comparison of
structs is right, though, due to CO-RE, so just high-level
compatibility checks to prevent the most obvious mistakes.

>
> Also note since we need to carry the ksym's address (64bits) as well as
> its btf_id (32bits), pseudo_btf_id uses ld_imm64's both imm and off
> fields.

For BTF-enabled ksyms, libbpf doesn't need to provide symbol address,
kernel will find it and substitute it, so BTF ID is the only
parameter. Thus it can just go into the imm field (and simplify
ldimm64 validation logic a bit).

>
> Signed-off-by: Hao Luo <[email protected]>
> ---
> include/uapi/linux/bpf.h | 37 +++++++++++++++++++------
> kernel/bpf/verifier.c | 26 +++++++++++++++---
> tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++------
> tools/lib/bpf/libbpf.c | 50 +++++++++++++++++++++++++++++++++-
> 4 files changed, 127 insertions(+), 23 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 5e386389913a..7490005acdba 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -334,18 +334,37 @@ enum bpf_link_type {
> #define BPF_F_TEST_STATE_FREQ (1U << 3)
>
> /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> - * two extensions:
> - *
> - * insn[0].src_reg: BPF_PSEUDO_MAP_FD BPF_PSEUDO_MAP_VALUE
> - * insn[0].imm: map fd map fd
> - * insn[1].imm: 0 offset into value
> - * insn[0].off: 0 0
> - * insn[1].off: 0 0
> - * ldimm64 rewrite: address of map address of map[0]+offset
> - * verifier type: CONST_PTR_TO_MAP PTR_TO_MAP_VALUE
> + * three extensions:
> + *
> + * insn[0].src_reg: BPF_PSEUDO_MAP_FD
> + * insn[0].imm: map fd
> + * insn[1].imm: 0
> + * insn[0].off: 0
> + * insn[1].off: 0
> + * ldimm64 rewrite: address of map
> + * verifier type: CONST_PTR_TO_MAP
> */
> #define BPF_PSEUDO_MAP_FD 1
> +/*
> + * insn[0].src_reg: BPF_PSEUDO_MAP_VALUE
> + * insn[0].imm: map fd
> + * insn[1].imm: offset into value
> + * insn[0].off: 0
> + * insn[1].off: 0
> + * ldimm64 rewrite: address of map[0]+offset
> + * verifier type: PTR_TO_MAP_VALUE
> + */
> #define BPF_PSEUDO_MAP_VALUE 2
> +/*
> + * insn[0].src_reg: BPF_PSEUDO_BTF_ID
> + * insn[0].imm: lower 32 bits of address
> + * insn[1].imm: higher 32 bits of address
> + * insn[0].off: lower 16 bits of btf id
> + * insn[1].off: higher 16 bits of btf id
> + * ldimm64 rewrite: address of kernel symbol
> + * verifier type: PTR_TO_BTF_ID
> + */
> +#define BPF_PSEUDO_BTF_ID 3
>
> /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
> * offset to another bpf function
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c1efc9d08fd..3c925957b9b6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -7131,15 +7131,29 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> verbose(env, "invalid BPF_LD_IMM insn\n");
> return -EINVAL;
> }
> + err = check_reg_arg(env, insn->dst_reg, DST_OP);
> + if (err)
> + return err;
> +
> + /*
> + * BPF_PSEUDO_BTF_ID insn's off fields carry the ksym's btf_id, so its
> + * handling has to come before the reserved field check.
> + */
> + if (insn->src_reg == BPF_PSEUDO_BTF_ID) {
> + u32 id = ((u32)(insn + 1)->off << 16) | (u32)insn->off;
> + const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);
> +

This is the kernel, we should be paranoid and assume the hackers want
to do bad things. So check t for NULL. Check that it's actually a
BTF_KIND_VAR. Check the name, find ksym addr, etc.

> + mark_reg_known_zero(env, regs, insn->dst_reg);
> + regs[insn->dst_reg].type = PTR_TO_BTF_ID;
> + regs[insn->dst_reg].btf_id = t->type;
> + return 0;
> + }
> +
> if (insn->off != 0) {
> verbose(env, "BPF_LD_IMM64 uses reserved fields\n");
> return -EINVAL;
> }
>
> - err = check_reg_arg(env, insn->dst_reg, DST_OP);
> - if (err)
> - return err;
> -
> if (insn->src_reg == 0) {
> u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
>

[...]

2020-07-20 05:30:56

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

On Wed, Jul 15, 2020 at 2:46 PM Hao Luo <[email protected]> wrote:
>
> Extend ksyms.c selftest to make sure BTF enables direct loads of ksyms.
>
> Note that test is done against the kernel btf extended with kernel VARs.
>
> Signed-off-by: Hao Luo <[email protected]>
> ---
> tools/testing/selftests/bpf/prog_tests/ksyms.c | 2 ++
> tools/testing/selftests/bpf/progs/test_ksyms.c | 14 ++++++++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/ksyms.c b/tools/testing/selftests/bpf/prog_tests/ksyms.c
> index e3d6777226a8..0e7f3bc3b0ae 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ksyms.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ksyms.c
> @@ -65,6 +65,8 @@ void test_ksyms(void)
> "got %llu, exp %llu\n", data->out__btf_size, btf_size);
> CHECK(data->out__per_cpu_start != 0, "__per_cpu_start",
> "got %llu, exp %llu\n", data->out__per_cpu_start, (__u64)0);
> + CHECK(data->out__rq_cpu != 0, "rq_cpu",
> + "got %llu, exp %llu\n", data->out__rq_cpu, (__u64)0);
>
> cleanup:
> test_ksyms__destroy(skel);
> diff --git a/tools/testing/selftests/bpf/progs/test_ksyms.c b/tools/testing/selftests/bpf/progs/test_ksyms.c
> index 6c9cbb5a3bdf..e777603757e5 100644
> --- a/tools/testing/selftests/bpf/progs/test_ksyms.c
> +++ b/tools/testing/selftests/bpf/progs/test_ksyms.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> /* Copyright (c) 2019 Facebook */
>
> +#include "vmlinux.h"
> #include <stdbool.h>
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
> @@ -9,11 +10,13 @@ __u64 out__bpf_link_fops = -1;
> __u64 out__bpf_link_fops1 = -1;
> __u64 out__btf_size = -1;
> __u64 out__per_cpu_start = -1;
> +__u64 out__rq_cpu = -1;
>
> extern const void bpf_link_fops __ksym;
> extern const void __start_BTF __ksym;
> extern const void __stop_BTF __ksym;
> extern const void __per_cpu_start __ksym;
> +extern const void runqueues __ksym;

This should ideally look like a real global variable extern:

extern const struct rq runqueues __ksym;

But that's the case for non-per-cpu variables. You didn't seem to
address per-CPU variables in this patch set. How did you intend to
handle that? We should look at a possible BPF helper to access such
variables as well and how the verifier will prevent direct memory
accesses for such variables.

We should have some BPF helper that accepts per-CPU PTR_TO_BTF_ID, and
returns PTR_TO_BTF_ID, but adjusted to desired CPU. And verifier
ideally would allow direct memory access on that resulting
PTR_TO_BTF_ID, but not on per-CPU one. Not sure yet how this should
look like, but the verifier probably needs to know that variable
itself is per-cpu, no?

> /* non-existing symbol, weak, default to zero */
> extern const void bpf_link_fops1 __ksym __weak;
>
> @@ -29,4 +32,15 @@ int handler(const void *ctx)
> return 0;
> }
>
> +SEC("tp_btf/sys_enter")
> +int handler_tp_btf(const void *ctx)
> +{
> + const struct rq *rq = &runqueues;
> +
> + /* rq now points to the runqueue of cpu 0. */
> + out__rq_cpu = rq->cpu;
> +
> + return 0;
> +}
> +
> char _license[] SEC("license") = "GPL";
> --
> 2.27.0.389.gc38d7665816-goog
>

2020-07-20 17:06:16

by Hao Luo

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 1/2] bpf: BTF support for __ksym externs

Andrii,

Thanks for taking a look at this. You comments are clear, I will fix them in v2.

> Also, in the next version, please split kernel part and libbpf part
> into separate patches.
>

Got it. Will do.

> I don't think that's the right approach. It can't be the best effort.
> It's actually pretty clear when a user wants a BTF-based variable with
> ability to do direct memory access vs __ksym address that we have
> right now: variable type info. In your patch you are only looking up
> variable by name, but it needs to be more elaborate logic:
>
> 1. if variable type is `extern void` -- do what we do today (no BTF required)
> 2. if the variable type is anything but `extern void`, then find that
> variable in BTF. If no BTF or variable is not found -- hard error with
> detailed enough message about what we expected to find in kernel BTF.
> 3. If such a variable is found in the kernel, then might be a good
> idea to additionally check type compatibility (e.g., struct/union
> should match struct/union, int should match int, typedefs should get
> resolved to underlying type, etc). I don't think deep comparison of
> structs is right, though, due to CO-RE, so just high-level
> compatibility checks to prevent the most obvious mistakes.
>

Ack.

> >
> > Also note since we need to carry the ksym's address (64bits) as well as
> > its btf_id (32bits), pseudo_btf_id uses ld_imm64's both imm and off
> > fields.
>
> For BTF-enabled ksyms, libbpf doesn't need to provide symbol address,
> kernel will find it and substitute it, so BTF ID is the only
> parameter. Thus it can just go into the imm field (and simplify
> ldimm64 validation logic a bit).
>

Ack.

> > /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
> > * offset to another bpf function
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 3c1efc9d08fd..3c925957b9b6 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -7131,15 +7131,29 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
> > verbose(env, "invalid BPF_LD_IMM insn\n");
> > return -EINVAL;
> > }
> > + err = check_reg_arg(env, insn->dst_reg, DST_OP);
> > + if (err)
> > + return err;
> > +
> > + /*
> > + * BPF_PSEUDO_BTF_ID insn's off fields carry the ksym's btf_id, so its
> > + * handling has to come before the reserved field check.
> > + */
> > + if (insn->src_reg == BPF_PSEUDO_BTF_ID) {
> > + u32 id = ((u32)(insn + 1)->off << 16) | (u32)insn->off;
> > + const struct btf_type *t = btf_type_by_id(btf_vmlinux, id);
> > +
>
> This is the kernel, we should be paranoid and assume the hackers want
> to do bad things. So check t for NULL. Check that it's actually a
> BTF_KIND_VAR. Check the name, find ksym addr, etc.
>

Ack.

2020-07-20 20:29:11

by Hao Luo

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

>
> This should ideally look like a real global variable extern:
>
> extern const struct rq runqueues __ksym;
>
>
> But that's the case for non-per-cpu variables. You didn't seem to
> address per-CPU variables in this patch set. How did you intend to
> handle that? We should look at a possible BPF helper to access such
> variables as well and how the verifier will prevent direct memory
> accesses for such variables.
>
> We should have some BPF helper that accepts per-CPU PTR_TO_BTF_ID, and
> returns PTR_TO_BTF_ID, but adjusted to desired CPU. And verifier
> ideally would allow direct memory access on that resulting
> PTR_TO_BTF_ID, but not on per-CPU one. Not sure yet how this should
> look like, but the verifier probably needs to know that variable
> itself is per-cpu, no?
>

Yes, that's what I was unclear about, so I don't have that part in
this patchset. But your explanation helped me organize my thoughts. :)

Actually, the verifier can tell whether a var is percpu from the
DATASEC, since we have encoded "percpu" DATASEC in btf. I think the
following should work:

We may introduce a new PTR_TO_BTF_VAR_ID. In ld_imm, libbpf replaces
ksyms with btf_id. The btf id points to a KIND_VAR. If the pointed VAR
is found in the "percpu" DATASEC, dst_reg is set to PTR_TO_BTF_VAR_ID;
otherwise, it will be a PTR_TO_BTF_ID. For PTR_TO_BTF_VAR_ID,
reg->btf_id is the id of the VAR. For PTR_TO_BTF_ID, reg->btf_id is
the id of the actual kernel type. The verifier would reject direct
memory access on PTR_TO_BTF_VAR_ID, but the new BPF helper can convert
a PTR_TO_BTF_VAR_ID to PTR_TO_BTF_ID.

Hao

2020-07-21 02:39:53

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

On Mon, Jul 20, 2020 at 1:28 PM Hao Luo <[email protected]> wrote:
>
> >
> > This should ideally look like a real global variable extern:
> >
> > extern const struct rq runqueues __ksym;
> >
> >
> > But that's the case for non-per-cpu variables. You didn't seem to
> > address per-CPU variables in this patch set. How did you intend to
> > handle that? We should look at a possible BPF helper to access such
> > variables as well and how the verifier will prevent direct memory
> > accesses for such variables.
> >
> > We should have some BPF helper that accepts per-CPU PTR_TO_BTF_ID, and
> > returns PTR_TO_BTF_ID, but adjusted to desired CPU. And verifier
> > ideally would allow direct memory access on that resulting
> > PTR_TO_BTF_ID, but not on per-CPU one. Not sure yet how this should
> > look like, but the verifier probably needs to know that variable
> > itself is per-cpu, no?
> >
>
> Yes, that's what I was unclear about, so I don't have that part in
> this patchset. But your explanation helped me organize my thoughts. :)
>
> Actually, the verifier can tell whether a var is percpu from the
> DATASEC, since we have encoded "percpu" DATASEC in btf. I think the
> following should work:
>
> We may introduce a new PTR_TO_BTF_VAR_ID. In ld_imm, libbpf replaces
> ksyms with btf_id. The btf id points to a KIND_VAR. If the pointed VAR
> is found in the "percpu" DATASEC, dst_reg is set to PTR_TO_BTF_VAR_ID;
> otherwise, it will be a PTR_TO_BTF_ID. For PTR_TO_BTF_VAR_ID,
> reg->btf_id is the id of the VAR. For PTR_TO_BTF_ID, reg->btf_id is
> the id of the actual kernel type. The verifier would reject direct
> memory access on PTR_TO_BTF_VAR_ID, but the new BPF helper can convert
> a PTR_TO_BTF_VAR_ID to PTR_TO_BTF_ID.

Sounds good to me as a plan, except that PTR_TO_BTF_VAR_ID is a
misleading name. It's always a variable. The per-CPU part is crucial,
though, so maybe something like PTR_TO_PERCPU_BTF_ID?

>
> Hao

2020-07-21 16:48:46

by Hao Luo

[permalink] [raw]

Subject: Re: [RFC PATCH bpf-next 2/2] selftests/bpf: Test __ksym externs with BTF

Ack. Will have that in v2.

Hao

On Mon, Jul 20, 2020 at 7:37 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Mon, Jul 20, 2020 at 1:28 PM Hao Luo <[email protected]> wrote:
> >
> > >
> > > This should ideally look like a real global variable extern:
> > >
> > > extern const struct rq runqueues __ksym;
> > >
> > >
> > > But that's the case for non-per-cpu variables. You didn't seem to
> > > address per-CPU variables in this patch set. How did you intend to
> > > handle that? We should look at a possible BPF helper to access such
> > > variables as well and how the verifier will prevent direct memory
> > > accesses for such variables.
> > >
> > > We should have some BPF helper that accepts per-CPU PTR_TO_BTF_ID, and
> > > returns PTR_TO_BTF_ID, but adjusted to desired CPU. And verifier
> > > ideally would allow direct memory access on that resulting
> > > PTR_TO_BTF_ID, but not on per-CPU one. Not sure yet how this should
> > > look like, but the verifier probably needs to know that variable
> > > itself is per-cpu, no?
> > >
> >
> > Yes, that's what I was unclear about, so I don't have that part in
> > this patchset. But your explanation helped me organize my thoughts. :)
> >
> > Actually, the verifier can tell whether a var is percpu from the
> > DATASEC, since we have encoded "percpu" DATASEC in btf. I think the
> > following should work:
> >
> > We may introduce a new PTR_TO_BTF_VAR_ID. In ld_imm, libbpf replaces
> > ksyms with btf_id. The btf id points to a KIND_VAR. If the pointed VAR
> > is found in the "percpu" DATASEC, dst_reg is set to PTR_TO_BTF_VAR_ID;
> > otherwise, it will be a PTR_TO_BTF_ID. For PTR_TO_BTF_VAR_ID,
> > reg->btf_id is the id of the VAR. For PTR_TO_BTF_ID, reg->btf_id is
> > the id of the actual kernel type. The verifier would reject direct
> > memory access on PTR_TO_BTF_VAR_ID, but the new BPF helper can convert
> > a PTR_TO_BTF_VAR_ID to PTR_TO_BTF_ID.
>
> Sounds good to me as a plan, except that PTR_TO_BTF_VAR_ID is a
> misleading name. It's always a variable. The per-CPU part is crucial,
> though, so maybe something like PTR_TO_PERCPU_BTF_ID?
>
> >
> > Hao