2022-08-22 18:30:09

by Daniel Xu

[permalink] [raw]
Subject: [PATCH bpf-next v4 0/5] Support direct writes to nf_conn:mark

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.

Past discussion:
- v3: https://lore.kernel.org/bpf/[email protected]/
- v2: https://lore.kernel.org/bpf/CAP01T74Sgn354dXGiFWFryu4vg+o8b9s9La1d9zEbC4LGvH4qg@mail.gmail.com/T/
- v1: https://lore.kernel.org/bpf/[email protected]/

Changes since v3:
- Use a mutex to protect module load/unload critical section

Changes since v2:
- Remove use of NOT_INIT for btf_struct_access write path
- Disallow nf_conn writing when nf_conntrack module not loaded
- Support writing to nf_conn___init:mark

Changes since v1:
- Add unimplemented stub for when !CONFIG_BPF_SYSCALL

Daniel Xu (5):
bpf: Remove duplicate PTR_TO_BTF_ID RO check
bpf: Add stub for btf_struct_access()
bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes
bpf: Add support for writing to nf_conn:mark
selftests/bpf: Add tests for writing to nf_conn:mark

include/linux/bpf.h | 9 ++
include/net/netfilter/nf_conntrack_bpf.h | 22 +++++
kernel/bpf/verifier.c | 3 -
net/core/filter.c | 34 +++++++
net/ipv4/bpf_tcp_ca.c | 2 +-
net/netfilter/nf_conntrack_bpf.c | 91 ++++++++++++++++++-
net/netfilter/nf_conntrack_core.c | 1 +
.../testing/selftests/bpf/prog_tests/bpf_nf.c | 2 +
.../testing/selftests/bpf/progs/test_bpf_nf.c | 9 +-
.../selftests/bpf/progs/test_bpf_nf_fail.c | 14 +++
10 files changed, 180 insertions(+), 7 deletions(-)

--
2.37.1


2022-08-22 18:30:47

by Daniel Xu

[permalink] [raw]
Subject: [PATCH bpf-next v4 5/5] selftests/bpf: Add tests for writing to nf_conn:mark

Add a simple extension to the existing selftest to write to
nf_conn:mark. Also add a failure test for writing to unsupported field.

Signed-off-by: Daniel Xu <[email protected]>
---
tools/testing/selftests/bpf/prog_tests/bpf_nf.c | 2 ++
tools/testing/selftests/bpf/progs/test_bpf_nf.c | 9 +++++++--
.../testing/selftests/bpf/progs/test_bpf_nf_fail.c | 14 ++++++++++++++
3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
index 544bf90ac2a7..ab9117ae7545 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c
@@ -17,6 +17,7 @@ struct {
{ "set_status_after_insert", "kernel function bpf_ct_set_status args#0 expected pointer to STRUCT nf_conn___init but" },
{ "change_timeout_after_alloc", "kernel function bpf_ct_change_timeout args#0 expected pointer to STRUCT nf_conn but" },
{ "change_status_after_alloc", "kernel function bpf_ct_change_status args#0 expected pointer to STRUCT nf_conn but" },
+ { "write_not_allowlisted_field", "no write support to nf_conn at off" },
};

enum {
@@ -113,6 +114,7 @@ static void test_bpf_nf_ct(int mode)
ASSERT_LE(skel->bss->test_delta_timeout, 10, "Test for max ct timeout update");
/* expected status is IPS_SEEN_REPLY */
ASSERT_EQ(skel->bss->test_status, 2, "Test for ct status update ");
+ ASSERT_EQ(skel->bss->test_insert_lookup_mark, 77, "Test for insert and lookup mark value");
ASSERT_EQ(skel->data->test_exist_lookup, 0, "Test existing connection lookup");
ASSERT_EQ(skel->bss->test_exist_lookup_mark, 43, "Test existing connection lookup ctmark");
end:
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
index 2722441850cc..b5e7079701e8 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf.c
@@ -23,6 +23,7 @@ int test_insert_entry = -EAFNOSUPPORT;
int test_succ_lookup = -ENOENT;
u32 test_delta_timeout = 0;
u32 test_status = 0;
+u32 test_insert_lookup_mark = 0;
__be32 saddr = 0;
__be16 sport = 0;
__be32 daddr = 0;
@@ -144,6 +145,7 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,

bpf_ct_set_timeout(ct, 10000);
bpf_ct_set_status(ct, IPS_CONFIRMED);
+ ct->mark = 77;

ct_ins = bpf_ct_insert_entry(ct);
if (ct_ins) {
@@ -157,6 +159,7 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
test_delta_timeout = ct_lk->timeout - bpf_jiffies64();
test_delta_timeout /= CONFIG_HZ;
test_status = IPS_SEEN_REPLY;
+ test_insert_lookup_mark = ct_lk->mark;
bpf_ct_change_status(ct_lk, IPS_SEEN_REPLY);
bpf_ct_release(ct_lk);
test_succ_lookup = 0;
@@ -175,8 +178,10 @@ nf_ct_test(struct nf_conn *(*lookup_fn)(void *, struct bpf_sock_tuple *, u32,
sizeof(opts_def));
if (ct) {
test_exist_lookup = 0;
- if (ct->mark == 42)
- test_exist_lookup_mark = 43;
+ if (ct->mark == 42) {
+ ct->mark++;
+ test_exist_lookup_mark = ct->mark;
+ }
bpf_ct_release(ct);
} else {
test_exist_lookup = opts_def.error;
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c b/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c
index bf79af15c808..0e4759ab38ff 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_nf_fail.c
@@ -69,6 +69,20 @@ int lookup_insert(struct __sk_buff *ctx)
return 0;
}

+SEC("?tc")
+int write_not_allowlisted_field(struct __sk_buff *ctx)
+{
+ struct bpf_ct_opts___local opts = {};
+ struct bpf_sock_tuple tup = {};
+ struct nf_conn *ct;
+
+ ct = bpf_skb_ct_lookup(ctx, &tup, sizeof(tup.ipv4), &opts, sizeof(opts));
+ if (!ct)
+ return 0;
+ ct->status = 0xF00;
+ return 0;
+}
+
SEC("?tc")
int set_timeout_after_insert(struct __sk_buff *ctx)
{
--
2.37.1

2022-08-22 18:39:10

by Daniel Xu

[permalink] [raw]
Subject: [PATCH bpf-next v4 3/5] bpf: Use 0 instead of NOT_INIT for btf_struct_access() writes

Returning a bpf_reg_type only makes sense in the context of a BPF_READ.
For writes, prefer to explicitly return 0 for clarity.

Note that is non-functional change as it just so happened that NOT_INIT
== 0.

Signed-off-by: Daniel Xu <[email protected]>
---
net/ipv4/bpf_tcp_ca.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 85a9e500c42d..6da16ae6a962 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -124,7 +124,7 @@ static int bpf_tcp_ca_btf_struct_access(struct bpf_verifier_log *log,
return -EACCES;
}

- return NOT_INIT;
+ return 0;
}

BPF_CALL_2(bpf_tcp_send_ack, struct tcp_sock *, tp, u32, rcv_nxt)
--
2.37.1

2022-08-22 18:39:11

by Daniel Xu

[permalink] [raw]
Subject: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.

Signed-off-by: Daniel Xu <[email protected]>
---
include/net/netfilter/nf_conntrack_bpf.h | 22 ++++++
net/core/filter.c | 34 +++++++++
net/netfilter/nf_conntrack_bpf.c | 91 +++++++++++++++++++++++-
net/netfilter/nf_conntrack_core.c | 1 +
4 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/include/net/netfilter/nf_conntrack_bpf.h b/include/net/netfilter/nf_conntrack_bpf.h
index a473b56842c5..6fc03066846b 100644
--- a/include/net/netfilter/nf_conntrack_bpf.h
+++ b/include/net/netfilter/nf_conntrack_bpf.h
@@ -3,6 +3,7 @@
#ifndef _NF_CONNTRACK_BPF_H
#define _NF_CONNTRACK_BPF_H

+#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/kconfig.h>

@@ -10,6 +11,13 @@
(IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))

extern int register_nf_conntrack_bpf(void);
+extern void cleanup_nf_conntrack_bpf(void);
+extern int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag);

#else

@@ -18,6 +26,20 @@ static inline int register_nf_conntrack_bpf(void)
return 0;
}

+static inline void cleanup_nf_conntrack_bpf(void)
+{
+}
+
+static inline int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag)
+{
+ return -EACCES;
+}
+
#endif

#endif /* _NF_CONNTRACK_BPF_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index 1acfaffeaf32..25bdbf6dc76b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -18,6 +18,7 @@
*/

#include <linux/atomic.h>
+#include <linux/bpf_verifier.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/mm.h>
@@ -55,6 +56,7 @@
#include <net/sock_reuseport.h>
#include <net/busy_poll.h>
#include <net/tcp.h>
+#include <net/netfilter/nf_conntrack_bpf.h>
#include <net/xfrm.h>
#include <net/udp.h>
#include <linux/bpf_trace.h>
@@ -8628,6 +8630,21 @@ static bool tc_cls_act_is_valid_access(int off, int size,
return bpf_skb_is_valid_access(off, size, type, prog, info);
}

+static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag)
+{
+ if (atype == BPF_READ)
+ return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+ flag);
+
+ return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
+ next_btf_id, flag);
+}
+
static bool __is_valid_xdp_access(int off, int size)
{
if (off < 0 || off >= sizeof(struct xdp_md))
@@ -8687,6 +8704,21 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
}
EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);

+static int xdp_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag)
+{
+ if (atype == BPF_READ)
+ return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
+ flag);
+
+ return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
+ next_btf_id, flag);
+}
+
static bool sock_addr_is_valid_access(int off, int size,
enum bpf_access_type type,
const struct bpf_prog *prog,
@@ -10581,6 +10613,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
.convert_ctx_access = tc_cls_act_convert_ctx_access,
.gen_prologue = tc_cls_act_prologue,
.gen_ld_abs = bpf_gen_ld_abs,
+ .btf_struct_access = tc_cls_act_btf_struct_access,
};

const struct bpf_prog_ops tc_cls_act_prog_ops = {
@@ -10592,6 +10625,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
.is_valid_access = xdp_is_valid_access,
.convert_ctx_access = xdp_convert_ctx_access,
.gen_prologue = bpf_noop_prologue,
+ .btf_struct_access = xdp_btf_struct_access,
};

const struct bpf_prog_ops xdp_prog_ops = {
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 1cd87b28c9b0..da54355927d4 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -6,8 +6,10 @@
* are exposed through to BPF programs is explicitly unstable.
*/

+#include <linux/bpf_verifier.h>
#include <linux/bpf.h>
#include <linux/btf.h>
+#include <linux/mutex.h>
#include <linux/types.h>
#include <linux/btf_ids.h>
#include <linux/net_namespace.h>
@@ -184,6 +186,79 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
return ct;
}

+BTF_ID_LIST(btf_nf_conn_ids)
+BTF_ID(struct, nf_conn)
+BTF_ID(struct, nf_conn___init)
+
+static DEFINE_MUTEX(btf_access_lock);
+static int (*nfct_bsa)(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag);
+
+/* Check writes into `struct nf_conn` */
+static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag)
+{
+ const struct btf_type *ncit;
+ const struct btf_type *nct;
+ size_t end;
+
+ ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
+ nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
+
+ if (t != nct && t != ncit) {
+ bpf_log(log, "only read is supported\n");
+ return -EACCES;
+ }
+
+ /* `struct nf_conn` and `struct nf_conn___init` have the same layout
+ * so we are safe to simply merge offset checks here
+ */
+ switch (off) {
+#if defined(CONFIG_NF_CONNTRACK_MARK)
+ case offsetof(struct nf_conn, mark):
+ end = offsetofend(struct nf_conn, mark);
+ break;
+#endif
+ default:
+ bpf_log(log, "no write support to nf_conn at off %d\n", off);
+ return -EACCES;
+ }
+
+ if (off + size > end) {
+ bpf_log(log,
+ "write access at off %d with size %d beyond the member of nf_conn ended at %zu\n",
+ off, size, end);
+ return -EACCES;
+ }
+
+ return 0;
+}
+
+int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off,
+ int size, enum bpf_access_type atype,
+ u32 *next_btf_id,
+ enum bpf_type_flag *flag)
+{
+ int ret = -EACCES;
+
+ mutex_lock(&btf_access_lock);
+ if (nfct_bsa)
+ ret = nfct_bsa(log, btf, t, off, size, atype, next_btf_id, flag);
+ mutex_unlock(&btf_access_lock);
+
+ return ret;
+}
+
__diag_push();
__diag_ignore_all("-Wmissing-prototypes",
"Global functions as their definitions will be in nf_conntrack BTF");
@@ -449,5 +524,19 @@ int register_nf_conntrack_bpf(void)
int ret;

ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &nf_conntrack_kfunc_set);
- return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
+ ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &nf_conntrack_kfunc_set);
+ if (!ret) {
+ mutex_lock(&btf_access_lock);
+ nfct_bsa = _nf_conntrack_btf_struct_access;
+ mutex_unlock(&btf_access_lock);
+ }
+
+ return ret;
+}
+
+void cleanup_nf_conntrack_bpf(void)
+{
+ mutex_lock(&btf_access_lock);
+ nfct_bsa = NULL;
+ mutex_unlock(&btf_access_lock);
}
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index da65c6e8eeeb..0195f60fc43b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2512,6 +2512,7 @@ static int kill_all(struct nf_conn *i, void *data)

void nf_conntrack_cleanup_start(void)
{
+ cleanup_nf_conntrack_bpf();
conntrack_gc_work.exiting = true;
}

--
2.37.1

2022-08-22 18:40:00

by Daniel Xu

[permalink] [raw]
Subject: [PATCH bpf-next v4 2/5] bpf: Add stub for btf_struct_access()

Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n

Signed-off-by: Daniel Xu <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
---
include/linux/bpf.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 39bd36359c1e..fcde14ae6e60 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2157,6 +2157,15 @@ static inline struct bpf_prog *bpf_prog_by_id(u32 id)
return ERR_PTR(-ENOTSUPP);
}

+static inline int btf_struct_access(struct bpf_verifier_log *log,
+ const struct btf *btf,
+ const struct btf_type *t, int off, int size,
+ enum bpf_access_type atype,
+ u32 *next_btf_id, enum bpf_type_flag *flag)
+{
+ return -EACCES;
+}
+
static inline const struct bpf_func_proto *
bpf_base_func_proto(enum bpf_func_id func_id)
{
--
2.37.1

2022-08-23 00:59:55

by Kumar Kartikeya Dwivedi

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

On Mon, 22 Aug 2022 at 20:26, Daniel Xu <[email protected]> wrote:
>
> Support direct writes to nf_conn:mark from TC and XDP prog types. This
> is useful when applications want to store per-connection metadata. This
> is also particularly useful for applications that run both bpf and
> iptables/nftables because the latter can trivially access this metadata.
>
> One example use case would be if a bpf prog is responsible for advanced
> packet classification and iptables/nftables is later used for routing
> due to pre-existing/legacy code.
>
> Signed-off-by: Daniel Xu <[email protected]>
> ---
> include/net/netfilter/nf_conntrack_bpf.h | 22 ++++++
> net/core/filter.c | 34 +++++++++
> net/netfilter/nf_conntrack_bpf.c | 91 +++++++++++++++++++++++-
> net/netfilter/nf_conntrack_core.c | 1 +
> 4 files changed, 147 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/netfilter/nf_conntrack_bpf.h b/include/net/netfilter/nf_conntrack_bpf.h
> index a473b56842c5..6fc03066846b 100644
> --- a/include/net/netfilter/nf_conntrack_bpf.h
> +++ b/include/net/netfilter/nf_conntrack_bpf.h
> @@ -3,6 +3,7 @@
> #ifndef _NF_CONNTRACK_BPF_H
> #define _NF_CONNTRACK_BPF_H
>
> +#include <linux/bpf.h>
> #include <linux/btf.h>
> #include <linux/kconfig.h>
>
> @@ -10,6 +11,13 @@
> (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
>
> extern int register_nf_conntrack_bpf(void);
> +extern void cleanup_nf_conntrack_bpf(void);
> +extern int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag);
>
> #else
>
> @@ -18,6 +26,20 @@ static inline int register_nf_conntrack_bpf(void)
> return 0;
> }
>
> +static inline void cleanup_nf_conntrack_bpf(void)
> +{
> +}
> +
> +static inline int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag)
> +{
> + return -EACCES;
> +}
> +
> #endif
>
> #endif /* _NF_CONNTRACK_BPF_H */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 1acfaffeaf32..25bdbf6dc76b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -18,6 +18,7 @@
> */
>
> #include <linux/atomic.h>
> +#include <linux/bpf_verifier.h>
> #include <linux/module.h>
> #include <linux/types.h>
> #include <linux/mm.h>
> @@ -55,6 +56,7 @@
> #include <net/sock_reuseport.h>
> #include <net/busy_poll.h>
> #include <net/tcp.h>
> +#include <net/netfilter/nf_conntrack_bpf.h>
> #include <net/xfrm.h>
> #include <net/udp.h>
> #include <linux/bpf_trace.h>
> @@ -8628,6 +8630,21 @@ static bool tc_cls_act_is_valid_access(int off, int size,
> return bpf_skb_is_valid_access(off, size, type, prog, info);
> }
>
> +static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag)
> +{
> + if (atype == BPF_READ)
> + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> + flag);
> +
> + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> + next_btf_id, flag);
> +}
> +
> static bool __is_valid_xdp_access(int off, int size)
> {
> if (off < 0 || off >= sizeof(struct xdp_md))
> @@ -8687,6 +8704,21 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
> }
> EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
>
> +static int xdp_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag)
> +{
> + if (atype == BPF_READ)
> + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> + flag);
> +
> + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> + next_btf_id, flag);
> +}
> +
> static bool sock_addr_is_valid_access(int off, int size,
> enum bpf_access_type type,
> const struct bpf_prog *prog,
> @@ -10581,6 +10613,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
> .convert_ctx_access = tc_cls_act_convert_ctx_access,
> .gen_prologue = tc_cls_act_prologue,
> .gen_ld_abs = bpf_gen_ld_abs,
> + .btf_struct_access = tc_cls_act_btf_struct_access,
> };
>
> const struct bpf_prog_ops tc_cls_act_prog_ops = {
> @@ -10592,6 +10625,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
> .is_valid_access = xdp_is_valid_access,
> .convert_ctx_access = xdp_convert_ctx_access,
> .gen_prologue = bpf_noop_prologue,
> + .btf_struct_access = xdp_btf_struct_access,
> };
>
> const struct bpf_prog_ops xdp_prog_ops = {
> diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
> index 1cd87b28c9b0..da54355927d4 100644
> --- a/net/netfilter/nf_conntrack_bpf.c
> +++ b/net/netfilter/nf_conntrack_bpf.c
> @@ -6,8 +6,10 @@
> * are exposed through to BPF programs is explicitly unstable.
> */
>
> +#include <linux/bpf_verifier.h>
> #include <linux/bpf.h>
> #include <linux/btf.h>
> +#include <linux/mutex.h>
> #include <linux/types.h>
> #include <linux/btf_ids.h>
> #include <linux/net_namespace.h>
> @@ -184,6 +186,79 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
> return ct;
> }
>
> +BTF_ID_LIST(btf_nf_conn_ids)
> +BTF_ID(struct, nf_conn)
> +BTF_ID(struct, nf_conn___init)
> +
> +static DEFINE_MUTEX(btf_access_lock);
> +static int (*nfct_bsa)(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag);
> +
> +/* Check writes into `struct nf_conn` */
> +static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag)
> +{
> + const struct btf_type *ncit;
> + const struct btf_type *nct;
> + size_t end;
> +
> + ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
> + nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
> +
> + if (t != nct && t != ncit) {
> + bpf_log(log, "only read is supported\n");
> + return -EACCES;
> + }
> +
> + /* `struct nf_conn` and `struct nf_conn___init` have the same layout
> + * so we are safe to simply merge offset checks here
> + */
> + switch (off) {
> +#if defined(CONFIG_NF_CONNTRACK_MARK)
> + case offsetof(struct nf_conn, mark):
> + end = offsetofend(struct nf_conn, mark);
> + break;
> +#endif
> + default:
> + bpf_log(log, "no write support to nf_conn at off %d\n", off);
> + return -EACCES;
> + }
> +
> + if (off + size > end) {
> + bpf_log(log,
> + "write access at off %d with size %d beyond the member of nf_conn ended at %zu\n",
> + off, size, end);
> + return -EACCES;
> + }
> +
> + return 0;
> +}
> +
> +int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> + const struct btf *btf,
> + const struct btf_type *t, int off,
> + int size, enum bpf_access_type atype,
> + u32 *next_btf_id,
> + enum bpf_type_flag *flag)
> +{
> + int ret = -EACCES;
> +
> + mutex_lock(&btf_access_lock);
> + if (nfct_bsa)
> + ret = nfct_bsa(log, btf, t, off, size, atype, next_btf_id, flag);
> + mutex_unlock(&btf_access_lock);
> +
> + return ret;
> +}

Did you test this for CONFIG_NF_CONNTRACK=m? For me it isn't building :P.

It won't work like this. When nf_conntrack is a module, the vmlinux.o
of the kernel isn't linked to the object file nf_conntrack_bpf.o.
Hence it would be an undefined reference error. You don't see it in
BPF CI as we set CONFIG_NF_CONNTRACK=y (to simplify testing).

So you need to have code that locks and checks the cb pointer when
calling it outside the module, which means the global lock variable
and global cb pointer also need to be in the kernel. The module then
takes the same lock and sets cb pointer when loading. During unload,
it takes the same lock and sets it back to NULL.

You can have global variables in vmlinux that you reference from
modules. The compiler will emit a relocation for the module object
file which will be handled by the kernel during module load.

So please test it once with nf_conntrack built as a module before
sending the next revision. The only thing you need to do before
running ./test_progs -t bpf_nf is loading the module nf_conntrack.ko
(and its dependencies, nf_defrag_ipv{4,6}.ko).

2022-08-23 02:39:08

by Daniel Xu

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

On Tue, Aug 23, 2022 at 02:16:42AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Mon, 22 Aug 2022 at 20:26, Daniel Xu <[email protected]> wrote:
> >
> > Support direct writes to nf_conn:mark from TC and XDP prog types. This
> > is useful when applications want to store per-connection metadata. This
> > is also particularly useful for applications that run both bpf and
> > iptables/nftables because the latter can trivially access this metadata.
> >
> > One example use case would be if a bpf prog is responsible for advanced
> > packet classification and iptables/nftables is later used for routing
> > due to pre-existing/legacy code.
> >
> > Signed-off-by: Daniel Xu <[email protected]>
> > ---
> > include/net/netfilter/nf_conntrack_bpf.h | 22 ++++++
> > net/core/filter.c | 34 +++++++++
> > net/netfilter/nf_conntrack_bpf.c | 91 +++++++++++++++++++++++-
> > net/netfilter/nf_conntrack_core.c | 1 +
> > 4 files changed, 147 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/netfilter/nf_conntrack_bpf.h b/include/net/netfilter/nf_conntrack_bpf.h
> > index a473b56842c5..6fc03066846b 100644
> > --- a/include/net/netfilter/nf_conntrack_bpf.h
> > +++ b/include/net/netfilter/nf_conntrack_bpf.h
> > @@ -3,6 +3,7 @@
> > #ifndef _NF_CONNTRACK_BPF_H
> > #define _NF_CONNTRACK_BPF_H
> >
> > +#include <linux/bpf.h>
> > #include <linux/btf.h>
> > #include <linux/kconfig.h>
> >
> > @@ -10,6 +11,13 @@
> > (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> >
> > extern int register_nf_conntrack_bpf(void);
> > +extern void cleanup_nf_conntrack_bpf(void);
> > +extern int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag);
> >
> > #else
> >
> > @@ -18,6 +26,20 @@ static inline int register_nf_conntrack_bpf(void)
> > return 0;
> > }
> >
> > +static inline void cleanup_nf_conntrack_bpf(void)
> > +{
> > +}
> > +
> > +static inline int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag)
> > +{
> > + return -EACCES;
> > +}
> > +
> > #endif
> >
> > #endif /* _NF_CONNTRACK_BPF_H */
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 1acfaffeaf32..25bdbf6dc76b 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -18,6 +18,7 @@
> > */
> >
> > #include <linux/atomic.h>
> > +#include <linux/bpf_verifier.h>
> > #include <linux/module.h>
> > #include <linux/types.h>
> > #include <linux/mm.h>
> > @@ -55,6 +56,7 @@
> > #include <net/sock_reuseport.h>
> > #include <net/busy_poll.h>
> > #include <net/tcp.h>
> > +#include <net/netfilter/nf_conntrack_bpf.h>
> > #include <net/xfrm.h>
> > #include <net/udp.h>
> > #include <linux/bpf_trace.h>
> > @@ -8628,6 +8630,21 @@ static bool tc_cls_act_is_valid_access(int off, int size,
> > return bpf_skb_is_valid_access(off, size, type, prog, info);
> > }
> >
> > +static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag)
> > +{
> > + if (atype == BPF_READ)
> > + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> > + flag);
> > +
> > + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> > + next_btf_id, flag);
> > +}
> > +
> > static bool __is_valid_xdp_access(int off, int size)
> > {
> > if (off < 0 || off >= sizeof(struct xdp_md))
> > @@ -8687,6 +8704,21 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
> > }
> > EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
> >
> > +static int xdp_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag)
> > +{
> > + if (atype == BPF_READ)
> > + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> > + flag);
> > +
> > + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> > + next_btf_id, flag);
> > +}
> > +
> > static bool sock_addr_is_valid_access(int off, int size,
> > enum bpf_access_type type,
> > const struct bpf_prog *prog,
> > @@ -10581,6 +10613,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
> > .convert_ctx_access = tc_cls_act_convert_ctx_access,
> > .gen_prologue = tc_cls_act_prologue,
> > .gen_ld_abs = bpf_gen_ld_abs,
> > + .btf_struct_access = tc_cls_act_btf_struct_access,
> > };
> >
> > const struct bpf_prog_ops tc_cls_act_prog_ops = {
> > @@ -10592,6 +10625,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
> > .is_valid_access = xdp_is_valid_access,
> > .convert_ctx_access = xdp_convert_ctx_access,
> > .gen_prologue = bpf_noop_prologue,
> > + .btf_struct_access = xdp_btf_struct_access,
> > };
> >
> > const struct bpf_prog_ops xdp_prog_ops = {
> > diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
> > index 1cd87b28c9b0..da54355927d4 100644
> > --- a/net/netfilter/nf_conntrack_bpf.c
> > +++ b/net/netfilter/nf_conntrack_bpf.c
> > @@ -6,8 +6,10 @@
> > * are exposed through to BPF programs is explicitly unstable.
> > */
> >
> > +#include <linux/bpf_verifier.h>
> > #include <linux/bpf.h>
> > #include <linux/btf.h>
> > +#include <linux/mutex.h>
> > #include <linux/types.h>
> > #include <linux/btf_ids.h>
> > #include <linux/net_namespace.h>
> > @@ -184,6 +186,79 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
> > return ct;
> > }
> >
> > +BTF_ID_LIST(btf_nf_conn_ids)
> > +BTF_ID(struct, nf_conn)
> > +BTF_ID(struct, nf_conn___init)
> > +
> > +static DEFINE_MUTEX(btf_access_lock);
> > +static int (*nfct_bsa)(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag);
> > +
> > +/* Check writes into `struct nf_conn` */
> > +static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag)
> > +{
> > + const struct btf_type *ncit;
> > + const struct btf_type *nct;
> > + size_t end;
> > +
> > + ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
> > + nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
> > +
> > + if (t != nct && t != ncit) {
> > + bpf_log(log, "only read is supported\n");
> > + return -EACCES;
> > + }
> > +
> > + /* `struct nf_conn` and `struct nf_conn___init` have the same layout
> > + * so we are safe to simply merge offset checks here
> > + */
> > + switch (off) {
> > +#if defined(CONFIG_NF_CONNTRACK_MARK)
> > + case offsetof(struct nf_conn, mark):
> > + end = offsetofend(struct nf_conn, mark);
> > + break;
> > +#endif
> > + default:
> > + bpf_log(log, "no write support to nf_conn at off %d\n", off);
> > + return -EACCES;
> > + }
> > +
> > + if (off + size > end) {
> > + bpf_log(log,
> > + "write access at off %d with size %d beyond the member of nf_conn ended at %zu\n",
> > + off, size, end);
> > + return -EACCES;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > + const struct btf *btf,
> > + const struct btf_type *t, int off,
> > + int size, enum bpf_access_type atype,
> > + u32 *next_btf_id,
> > + enum bpf_type_flag *flag)
> > +{
> > + int ret = -EACCES;
> > +
> > + mutex_lock(&btf_access_lock);
> > + if (nfct_bsa)
> > + ret = nfct_bsa(log, btf, t, off, size, atype, next_btf_id, flag);
> > + mutex_unlock(&btf_access_lock);
> > +
> > + return ret;
> > +}
>
> Did you test this for CONFIG_NF_CONNTRACK=m? For me it isn't building :P.
>
> It won't work like this. When nf_conntrack is a module, the vmlinux.o
> of the kernel isn't linked to the object file nf_conntrack_bpf.o.
> Hence it would be an undefined reference error. You don't see it in
> BPF CI as we set CONFIG_NF_CONNTRACK=y (to simplify testing).

Sorry about that. Will make sure to test that config setting.

> So you need to have code that locks and checks the cb pointer when
> calling it outside the module, which means the global lock variable
> and global cb pointer also need to be in the kernel. The module then
> takes the same lock and sets cb pointer when loading. During unload,
> it takes the same lock and sets it back to NULL.
>
> You can have global variables in vmlinux that you reference from
> modules. The compiler will emit a relocation for the module object
> file which will be handled by the kernel during module load.

Sure, I'll take a look. I was trying to keep conntrack symbols out of
the core object files as much as possible but looks like that won't be
possible.

However, I think to keep the globals symbols in vmlinux we'll need to
EXPORT_SYMBOL_GPL() some symbols. Hopefully that is OK.

There's also some other issues I'm uncovering with duplicate BTF IDs for
nf_conn. Might have to do a lookup by name instead of the BTF_ID_LIST().

> So please test it once with nf_conntrack built as a module before
> sending the next revision. The only thing you need to do before
> running ./test_progs -t bpf_nf is loading the module nf_conntrack.ko
> (and its dependencies, nf_defrag_ipv{4,6}.ko).

Will do.

Thanks again for the reviews,
Daniel

2022-08-23 02:41:28

by Kumar Kartikeya Dwivedi

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

On Tue, 23 Aug 2022 at 04:19, Daniel Xu <[email protected]> wrote:
>
> On Tue, Aug 23, 2022 at 02:16:42AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 22 Aug 2022 at 20:26, Daniel Xu <[email protected]> wrote:
> > >
> > > Support direct writes to nf_conn:mark from TC and XDP prog types. This
> > > is useful when applications want to store per-connection metadata. This
> > > is also particularly useful for applications that run both bpf and
> > > iptables/nftables because the latter can trivially access this metadata.
> > >
> > > One example use case would be if a bpf prog is responsible for advanced
> > > packet classification and iptables/nftables is later used for routing
> > > due to pre-existing/legacy code.
> > >
> > > Signed-off-by: Daniel Xu <[email protected]>
> > > ---
> > > include/net/netfilter/nf_conntrack_bpf.h | 22 ++++++
> > > net/core/filter.c | 34 +++++++++
> > > net/netfilter/nf_conntrack_bpf.c | 91 +++++++++++++++++++++++-
> > > net/netfilter/nf_conntrack_core.c | 1 +
> > > 4 files changed, 147 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/net/netfilter/nf_conntrack_bpf.h b/include/net/netfilter/nf_conntrack_bpf.h
> > > index a473b56842c5..6fc03066846b 100644
> > > --- a/include/net/netfilter/nf_conntrack_bpf.h
> > > +++ b/include/net/netfilter/nf_conntrack_bpf.h
> > > @@ -3,6 +3,7 @@
> > > #ifndef _NF_CONNTRACK_BPF_H
> > > #define _NF_CONNTRACK_BPF_H
> > >
> > > +#include <linux/bpf.h>
> > > #include <linux/btf.h>
> > > #include <linux/kconfig.h>
> > >
> > > @@ -10,6 +11,13 @@
> > > (IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES))
> > >
> > > extern int register_nf_conntrack_bpf(void);
> > > +extern void cleanup_nf_conntrack_bpf(void);
> > > +extern int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag);
> > >
> > > #else
> > >
> > > @@ -18,6 +26,20 @@ static inline int register_nf_conntrack_bpf(void)
> > > return 0;
> > > }
> > >
> > > +static inline void cleanup_nf_conntrack_bpf(void)
> > > +{
> > > +}
> > > +
> > > +static inline int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag)
> > > +{
> > > + return -EACCES;
> > > +}
> > > +
> > > #endif
> > >
> > > #endif /* _NF_CONNTRACK_BPF_H */
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 1acfaffeaf32..25bdbf6dc76b 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -18,6 +18,7 @@
> > > */
> > >
> > > #include <linux/atomic.h>
> > > +#include <linux/bpf_verifier.h>
> > > #include <linux/module.h>
> > > #include <linux/types.h>
> > > #include <linux/mm.h>
> > > @@ -55,6 +56,7 @@
> > > #include <net/sock_reuseport.h>
> > > #include <net/busy_poll.h>
> > > #include <net/tcp.h>
> > > +#include <net/netfilter/nf_conntrack_bpf.h>
> > > #include <net/xfrm.h>
> > > #include <net/udp.h>
> > > #include <linux/bpf_trace.h>
> > > @@ -8628,6 +8630,21 @@ static bool tc_cls_act_is_valid_access(int off, int size,
> > > return bpf_skb_is_valid_access(off, size, type, prog, info);
> > > }
> > >
> > > +static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag)
> > > +{
> > > + if (atype == BPF_READ)
> > > + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> > > + flag);
> > > +
> > > + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> > > + next_btf_id, flag);
> > > +}
> > > +
> > > static bool __is_valid_xdp_access(int off, int size)
> > > {
> > > if (off < 0 || off >= sizeof(struct xdp_md))
> > > @@ -8687,6 +8704,21 @@ void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog,
> > > }
> > > EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
> > >
> > > +static int xdp_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag)
> > > +{
> > > + if (atype == BPF_READ)
> > > + return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
> > > + flag);
> > > +
> > > + return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
> > > + next_btf_id, flag);
> > > +}
> > > +
> > > static bool sock_addr_is_valid_access(int off, int size,
> > > enum bpf_access_type type,
> > > const struct bpf_prog *prog,
> > > @@ -10581,6 +10613,7 @@ const struct bpf_verifier_ops tc_cls_act_verifier_ops = {
> > > .convert_ctx_access = tc_cls_act_convert_ctx_access,
> > > .gen_prologue = tc_cls_act_prologue,
> > > .gen_ld_abs = bpf_gen_ld_abs,
> > > + .btf_struct_access = tc_cls_act_btf_struct_access,
> > > };
> > >
> > > const struct bpf_prog_ops tc_cls_act_prog_ops = {
> > > @@ -10592,6 +10625,7 @@ const struct bpf_verifier_ops xdp_verifier_ops = {
> > > .is_valid_access = xdp_is_valid_access,
> > > .convert_ctx_access = xdp_convert_ctx_access,
> > > .gen_prologue = bpf_noop_prologue,
> > > + .btf_struct_access = xdp_btf_struct_access,
> > > };
> > >
> > > const struct bpf_prog_ops xdp_prog_ops = {
> > > diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
> > > index 1cd87b28c9b0..da54355927d4 100644
> > > --- a/net/netfilter/nf_conntrack_bpf.c
> > > +++ b/net/netfilter/nf_conntrack_bpf.c
> > > @@ -6,8 +6,10 @@
> > > * are exposed through to BPF programs is explicitly unstable.
> > > */
> > >
> > > +#include <linux/bpf_verifier.h>
> > > #include <linux/bpf.h>
> > > #include <linux/btf.h>
> > > +#include <linux/mutex.h>
> > > #include <linux/types.h>
> > > #include <linux/btf_ids.h>
> > > #include <linux/net_namespace.h>
> > > @@ -184,6 +186,79 @@ static struct nf_conn *__bpf_nf_ct_lookup(struct net *net,
> > > return ct;
> > > }
> > >
> > > +BTF_ID_LIST(btf_nf_conn_ids)
> > > +BTF_ID(struct, nf_conn)
> > > +BTF_ID(struct, nf_conn___init)
> > > +
> > > +static DEFINE_MUTEX(btf_access_lock);
> > > +static int (*nfct_bsa)(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag);
> > > +
> > > +/* Check writes into `struct nf_conn` */
> > > +static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag)
> > > +{
> > > + const struct btf_type *ncit;
> > > + const struct btf_type *nct;
> > > + size_t end;
> > > +
> > > + ncit = btf_type_by_id(btf, btf_nf_conn_ids[1]);
> > > + nct = btf_type_by_id(btf, btf_nf_conn_ids[0]);
> > > +
> > > + if (t != nct && t != ncit) {
> > > + bpf_log(log, "only read is supported\n");
> > > + return -EACCES;
> > > + }
> > > +
> > > + /* `struct nf_conn` and `struct nf_conn___init` have the same layout
> > > + * so we are safe to simply merge offset checks here
> > > + */
> > > + switch (off) {
> > > +#if defined(CONFIG_NF_CONNTRACK_MARK)
> > > + case offsetof(struct nf_conn, mark):
> > > + end = offsetofend(struct nf_conn, mark);
> > > + break;
> > > +#endif
> > > + default:
> > > + bpf_log(log, "no write support to nf_conn at off %d\n", off);
> > > + return -EACCES;
> > > + }
> > > +
> > > + if (off + size > end) {
> > > + bpf_log(log,
> > > + "write access at off %d with size %d beyond the member of nf_conn ended at %zu\n",
> > > + off, size, end);
> > > + return -EACCES;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +int nf_conntrack_btf_struct_access(struct bpf_verifier_log *log,
> > > + const struct btf *btf,
> > > + const struct btf_type *t, int off,
> > > + int size, enum bpf_access_type atype,
> > > + u32 *next_btf_id,
> > > + enum bpf_type_flag *flag)
> > > +{
> > > + int ret = -EACCES;
> > > +
> > > + mutex_lock(&btf_access_lock);
> > > + if (nfct_bsa)
> > > + ret = nfct_bsa(log, btf, t, off, size, atype, next_btf_id, flag);
> > > + mutex_unlock(&btf_access_lock);
> > > +
> > > + return ret;
> > > +}
> >
> > Did you test this for CONFIG_NF_CONNTRACK=m? For me it isn't building :P.
> >
> > It won't work like this. When nf_conntrack is a module, the vmlinux.o
> > of the kernel isn't linked to the object file nf_conntrack_bpf.o.
> > Hence it would be an undefined reference error. You don't see it in
> > BPF CI as we set CONFIG_NF_CONNTRACK=y (to simplify testing).
>
> Sorry about that. Will make sure to test that config setting.
>

No worries. We should probably think about being able to test it
automatically in CI, but not sure how that would be possible...

> > So you need to have code that locks and checks the cb pointer when
> > calling it outside the module, which means the global lock variable
> > and global cb pointer also need to be in the kernel. The module then
> > takes the same lock and sets cb pointer when loading. During unload,
> > it takes the same lock and sets it back to NULL.
> >
> > You can have global variables in vmlinux that you reference from
> > modules. The compiler will emit a relocation for the module object
> > file which will be handled by the kernel during module load.
>
> Sure, I'll take a look. I was trying to keep conntrack symbols out of
> the core object files as much as possible but looks like that won't be
> possible.

Yeah, for now hardcoding is ok I figure, later when we have more
modules doing this we can generalize it.

>
> However, I think to keep the globals symbols in vmlinux we'll need to
> EXPORT_SYMBOL_GPL() some symbols. Hopefully that is OK.
>

Yes, that should be ok.

> There's also some other issues I'm uncovering with duplicate BTF IDs for
> nf_conn. Might have to do a lookup by name instead of the BTF_ID_LIST().
>

I think I also hit this problem back when I was working on these
patches, is it similar to this?
https://lore.kernel.org/bpf/[email protected]

I think this might be a bug in the BTF generation, since there should
only be one BTF ID for a type, either in vmlinux or the module BTF.
Maybe Andrii would be able to confirm.

> > So please test it once with nf_conntrack built as a module before
> > sending the next revision. The only thing you need to do before
> > running ./test_progs -t bpf_nf is loading the module nf_conntrack.ko
> > (and its dependencies, nf_defrag_ipv{4,6}.ko).
>
> Will do.
>
> Thanks again for the reviews,
> Daniel

2022-08-23 19:08:19

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Xu/bpf-Remove-duplicate-PTR_TO_BTF_ID-RO-check/20220823-032643
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220823/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-5) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/496ec6d75c8e8758f93c6b987eee83713c911a05
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Daniel-Xu/bpf-Remove-duplicate-PTR_TO_BTF_ID-RO-check/20220823-032643
git checkout 496ec6d75c8e8758f93c6b987eee83713c911a05
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

ld: net/core/filter.o: in function `tc_cls_act_btf_struct_access':
net/core/filter.c:8644: undefined reference to `nf_conntrack_btf_struct_access'
ld: net/core/filter.o: in function `xdp_btf_struct_access':
>> include/net/net_namespace.h:369: undefined reference to `nf_conntrack_btf_struct_access'
pahole: .tmp_vmlinux.btf: Invalid argument
.btf.vmlinux.bin.o: file not recognized: file format not recognized


vim +369 include/net/net_namespace.h

8f424b5f32d78b Eric Dumazet 2008-11-12 365
0c5c9fb5510633 Eric W. Biederman 2015-03-11 366 static inline struct net *read_pnet(const possible_net_t *pnet)
8f424b5f32d78b Eric Dumazet 2008-11-12 367 {
0c5c9fb5510633 Eric W. Biederman 2015-03-11 368 #ifdef CONFIG_NET_NS
0c5c9fb5510633 Eric W. Biederman 2015-03-11 @369 return pnet->net;
8f424b5f32d78b Eric Dumazet 2008-11-12 370 #else
0c5c9fb5510633 Eric W. Biederman 2015-03-11 371 return &init_net;
8f424b5f32d78b Eric Dumazet 2008-11-12 372 #endif
0c5c9fb5510633 Eric W. Biederman 2015-03-11 373 }
5d1e4468a7705d Denis V. Lunev 2008-04-16 374

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-09-02 12:22:07

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url: https://github.com/intel-lab-lkp/linux/commits/Daniel-Xu/bpf-Remove-duplicate-PTR_TO_BTF_ID-RO-check/20220823-032643
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-rhel-8.3-kselftests (https://download.01.org/0day-ci/archive/20220902/[email protected]/config)
compiler: gcc-11 (Debian 11.3.0-5) 11.3.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/496ec6d75c8e8758f93c6b987eee83713c911a05
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Daniel-Xu/bpf-Remove-duplicate-PTR_TO_BTF_ID-RO-check/20220823-032643
git checkout 496ec6d75c8e8758f93c6b987eee83713c911a05
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

ld: net/core/filter.o: in function `tc_cls_act_btf_struct_access':
>> net/core/filter.c:8644: undefined reference to `nf_conntrack_btf_struct_access'
ld: net/core/filter.o: in function `xdp_btf_struct_access':
include/net/net_namespace.h:369: undefined reference to `nf_conntrack_btf_struct_access'
pahole: .tmp_vmlinux.btf: No such file or directory
.btf.vmlinux.bin.o: file not recognized: file format not recognized


vim +8644 net/core/filter.c

8632
8633 static int tc_cls_act_btf_struct_access(struct bpf_verifier_log *log,
8634 const struct btf *btf,
8635 const struct btf_type *t, int off,
8636 int size, enum bpf_access_type atype,
8637 u32 *next_btf_id,
8638 enum bpf_type_flag *flag)
8639 {
8640 if (atype == BPF_READ)
8641 return btf_struct_access(log, btf, t, off, size, atype, next_btf_id,
8642 flag);
8643
> 8644 return nf_conntrack_btf_struct_access(log, btf, t, off, size, atype,
8645 next_btf_id, flag);
8646 }
8647

--
0-DAY CI Kernel Test Service
https://01.org/lkp

2022-09-07 02:50:27

by Daniel Xu

[permalink] [raw]
Subject: Re: [PATCH bpf-next v4 4/5] bpf: Add support for writing to nf_conn:mark

Hi Kumar,

On Tue, Aug 23, 2022 at 04:29:17AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Tue, 23 Aug 2022 at 04:19, Daniel Xu <[email protected]> wrote:

[...]

> > There's also some other issues I'm uncovering with duplicate BTF IDs for
> > nf_conn. Might have to do a lookup by name instead of the BTF_ID_LIST().
> >
>
> I think I also hit this problem back when I was working on these
> patches, is it similar to this?
> https://lore.kernel.org/bpf/[email protected]

Yes, identical I think.

>
> I think this might be a bug in the BTF generation, since there should
> only be one BTF ID for a type, either in vmlinux or the module BTF.
> Maybe Andrii would be able to confirm.

Had to put out some fires last week.

I chased this down a bit today and best I can tell was the `nf_conn`
definitions in BTF were all slightly different.

For example, here were the 3 definitions in nf_conntrack.ko alone:

[88439] STRUCT 'nf_conn' size=296 vlen=11
'ct_general' type_id=67058 bits_offset=0
'lock' type_id=373 bits_offset=64
'timeout' type_id=160 bits_offset=576
'tuplehash' type_id=67235 bits_offset=640
'status' type_id=1 bits_offset=1536
'ct_net' type_id=4298 bits_offset=1600
'__nfct_init_offset' type_id=4213 bits_offset=1664
'master' type_id=88438 bits_offset=1664
'mark' type_id=67192 bits_offset=1728
'ext' type_id=67236 bits_offset=1792
'proto' type_id=67234 bits_offset=1856

[90882] STRUCT 'nf_conn' size=296 vlen=11
'ct_general' type_id=67058 bits_offset=0
'lock' type_id=373 bits_offset=64
'timeout' type_id=160 bits_offset=576
'tuplehash' type_id=67235 bits_offset=640
'status' type_id=1 bits_offset=1536
'ct_net' type_id=90574 bits_offset=1600
'__nfct_init_offset' type_id=4213 bits_offset=1664
'master' type_id=90881 bits_offset=1664
'mark' type_id=67192 bits_offset=1728
'ext' type_id=67236 bits_offset=1792
'proto' type_id=67234 bits_offset=1856

[92469] STRUCT 'nf_conn' size=296 vlen=11
'ct_general' type_id=67058 bits_offset=0
'lock' type_id=373 bits_offset=64
'timeout' type_id=160 bits_offset=576
'tuplehash' type_id=67235 bits_offset=640
'status' type_id=1 bits_offset=1536
'ct_net' type_id=92160 bits_offset=1600
'__nfct_init_offset' type_id=4213 bits_offset=1664
'master' type_id=92468 bits_offset=1664
'mark' type_id=67192 bits_offset=1728
'ext' type_id=67236 bits_offset=1792
'proto' type_id=67234 bits_offset=1856

Note how `master` and `ct_net` all have different BTF IDs. Best I can
tell is that there's some kind of subtle difference in BTF types and
it's confusing the dedup algorithm.

I went and upgraded to latest pahole (built from today's source tree) to
chase the issue down further but the problem went away.

Figured I'd write this up in case someone stumbles onto this in the
future.

[...]

Thanks,
Daniel