2021-04-19 12:22:12

by Kumar Kartikeya Dwivedi

Subject: [PATCH bpf-next v2 0/4] Add TC-BPF API

This is the second version of the TC-BPF series.

It adds a simple API that uses netlink to attach the tc filter and its bpf
classifier. Currently, a user needs to shell out to the tc command line to be
able to create filters and attach SCHED_CLS programs as classifiers. With the
help of this API, it will be possible to use libbpf for doing all parts of bpf
program setup and attach.

Direct action is now the default, and currently no way to disable it has been
provided. This also means that SCHED_ACT programs are a lot less useful, so
support for them has been removed for now. In the future, if someone comes up
with a convincing use case where the direct action mode doesn't serve their
needs, a simple extension that allows disabling direct action mode and passing
the list of actions that would be bound to the classifier can be added.

In an effort to keep discussion focused, this series doesn't have the high level
TC-BPF API. It was clear that there is a need for a bpf_link API in the kernel,
hence that will be submitted as a separate patchset.

The individual commit messages contain more details, and also a brief summary of
the API.

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/[email protected]

* netlink helpers have been renamed to object_action style.
* attach_id now only contains attributes that are not explicitly set. Only
the bare minimum info is kept in it.
* protocol is now optional and defaults to ETH_P_ALL.
* direct-action mode is default, and cannot be unset for now.
* skip_sw and skip_hw options have also been removed.
* bpf_tc_cls_info struct now also returns the bpf program tag and id, as
available in the netlink response. This came up as a requirement during
discussion with people wanting to use this functionality.
* support for attaching SCHED_ACT programs has been dropped, as it isn't
useful without any support for binding loaded actions to a classifier.
* the distinction between dev and block API has been dropped. There is now
a single set of functions, and the user has to pass the special ifindex value
themselves to indicate operation on a shared filter block.
* The high level API returning a bpf_link is gone. This was already non-
functional for pinning and typical ownership semantics. Instead, a separate
patchset will be sent adding a bpf_link API for attaching SCHED_CLS progs to
the kernel, and its corresponding libbpf API.
* The clsact qdisc is now set up automatically in a best-effort fashion whenever
the user passes in the clsact ingress or egress parent id. This is done in
exclusive mode, such that if an ingress or clsact qdisc is already set up,
we skip the setup and move on with filter creation.
* Other minor changes that came up during the course of discussion and rework.
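
As an illustration of the exclusive-mode clsact setup described above, here is
a minimal sketch of how such a request could be prepared. It is not the code
from this series: the names qdisc_req, prep_clsact_req, clsact_setup_result,
CLSACT_INGRESS_PARENT and CLSACT_EGRESS_PARENT are hypothetical, and the
request is only filled in, never sent. It assumes the usual convention of
giving the clsact qdisc the parent TC_H_CLSACT and the handle
TC_H_MAKE(TC_H_CLSACT, 0).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/pkt_sched.h>
#include <linux/rtnetlink.h>

/* Hypothetical names for the clsact ingress/egress parent ids. */
#define CLSACT_INGRESS_PARENT TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
#define CLSACT_EGRESS_PARENT  TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)

struct qdisc_req {
	struct nlmsghdr nh;
	struct tcmsg t;
	char buf[64];
};

/* Prepare (but do not send) an exclusive "create clsact qdisc" request. */
static void prep_clsact_req(struct qdisc_req *req, int ifindex)
{
	static const char kind[] = "clsact";
	struct nlattr *nla;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
	req->nh.nlmsg_type = RTM_NEWQDISC;
	/* NLM_F_EXCL makes the kernel fail with EEXIST if the qdisc exists. */
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK |
			      NLM_F_CREATE | NLM_F_EXCL;
	req->t.tcm_family = AF_UNSPEC;
	req->t.tcm_ifindex = ifindex;
	req->t.tcm_parent = TC_H_CLSACT;
	req->t.tcm_handle = TC_H_MAKE(TC_H_CLSACT, 0);

	/* The attribute must fit in the space left in the request. */
	assert(NLMSG_ALIGN(req->nh.nlmsg_len) +
	       NLA_ALIGN(NLA_HDRLEN + sizeof(kind)) <= sizeof(*req));

	/* Append TCA_KIND = "clsact" (including the trailing NUL). */
	nla = (struct nlattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
	nla->nla_type = TCA_KIND;
	nla->nla_len = NLA_HDRLEN + sizeof(kind);
	memcpy((char *)nla + NLA_HDRLEN, kind, sizeof(kind));
	req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) +
			    NLA_ALIGN(nla->nla_len);
}

/* Best-effort: an already existing qdisc (-EEXIST) is not an error. */
static int clsact_setup_result(int err)
{
	return err == -EEXIST ? 0 : err;
}
```

A caller would send the prepared request over a NETLINK_ROUTE socket, pass the
acked error through clsact_setup_result(), and then go on to create the filter
under one of the two parent ids.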

Kumar Kartikeya Dwivedi (4):
tools: pkt_cls.h: sync with kernel sources
libbpf: add helpers for preparing netlink attributes
libbpf: add low level TC-BPF API
libbpf: add selftests for TC-BPF API

tools/include/uapi/linux/pkt_cls.h | 174 +++++++-
tools/lib/bpf/Makefile | 3 +
tools/lib/bpf/libbpf.h | 52 +++
tools/lib/bpf/libbpf.map | 5 +
tools/lib/bpf/netlink.c | 414 ++++++++++++++++--
tools/lib/bpf/nlattr.h | 48 ++
.../selftests/bpf/prog_tests/test_tc_bpf.c | 112 +++++
.../selftests/bpf/progs/test_tc_bpf_kern.c | 12 +
8 files changed, 789 insertions(+), 31 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

--
2.30.2


2021-04-19 13:09:09

by Kumar Kartikeya Dwivedi

Subject: [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes

This change introduces a few helpers to wrap open coded attribute
preparation in netlink.c.

Every nested attribute must be closed using the helper
nlattr_end_nested, which sets its length properly. NLA_F_NESTED is
enforced by the nlattr_begin_nested helper. Other simple attributes
can be added directly.

The maxsz parameter corresponds to the size of the request structure
which is being filled in, so for instance with req being:

struct {
struct nlmsghdr nh;
struct tcmsg t;
char buf[4096];
} req;

Then, maxsz should be sizeof(req).

This change also replaces the open coded attribute preparation with
calls to the helpers. Note that the only failure the nested helper's
internal call to nlattr_add can produce is -EMSGSIZE, hence that is what
we return to our caller.
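
To make the calling convention concrete, here is a small standalone sketch of
the begin/add/end sequence. The helpers are local, lightly adapted copies of
the ones added to nlattr.h (the assert in nlattr_end_nested is an addition for
illustration), and xdp_req and build_xdp_req are hypothetical names that are
not part of this patch. Nothing is sent over a socket; the sketch only fills in
the request.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

static struct nlattr *nh_tail(struct nlmsghdr *nh)
{
	return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
}

/* Append one attribute; maxsz is the size of the whole request struct. */
static int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
		      const void *data, int len)
{
	struct nlattr *nla;

	if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
		return -EMSGSIZE;
	if ((!data && len) || (data && !len))
		return -EINVAL;

	nla = nh_tail(nh);
	nla->nla_type = type;
	nla->nla_len = NLA_HDRLEN + len;
	if (data)
		memcpy((char *)nla + NLA_HDRLEN, data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
	return 0;
}

/* Open a nested attribute: an empty header with NLA_F_NESTED set. */
static struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh, size_t maxsz,
					  int type)
{
	struct nlattr *tail = nh_tail(nh);

	if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
		return NULL;
	return tail;
}

/* Close it: the length covers everything added since begin_nested. */
static void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
{
	assert((char *)nh_tail(nh) >= (char *)tail);
	tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
}

struct xdp_req {
	struct nlmsghdr nh;
	struct ifinfomsg ifinfo;
	char attrbuf[64];
};

/* Fill req with a nested IFLA_XDP attribute holding one IFLA_XDP_FD. */
static int build_xdp_req(struct xdp_req *req, int ifindex, int fd)
{
	struct nlattr *nla;
	int ret;

	memset(req, 0, sizeof(*req));
	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req->nh.nlmsg_type = RTM_SETLINK;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req->ifinfo.ifi_family = AF_UNSPEC;
	req->ifinfo.ifi_index = ifindex;

	nla = nlattr_begin_nested(&req->nh, sizeof(*req), IFLA_XDP);
	if (!nla)
		return -EMSGSIZE;

	ret = nlattr_add(&req->nh, sizeof(*req), IFLA_XDP_FD, &fd, sizeof(fd));
	if (ret < 0)
		return ret;

	nlattr_end_nested(&req->nh, nla);
	return 0;
}
```

Because every call is passed sizeof(*req), an attribute that would run past the
end of the request is rejected with -EMSGSIZE instead of being written.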

Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
---
tools/lib/bpf/netlink.c | 37 ++++++++++++++-----------------
tools/lib/bpf/nlattr.h | 48 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index d2cb28e9ef52..c79e30484e81 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -135,7 +135,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
__u32 flags)
{
int sock, seq = 0, ret;
- struct nlattr *nla, *nla_xdp;
+ struct nlattr *nla;
struct {
struct nlmsghdr nh;
struct ifinfomsg ifinfo;
@@ -157,36 +157,31 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
req.ifinfo.ifi_index = ifindex;

/* started nested attribute for XDP */
- nla = (struct nlattr *)(((char *)&req)
- + NLMSG_ALIGN(req.nh.nlmsg_len));
- nla->nla_type = NLA_F_NESTED | IFLA_XDP;
- nla->nla_len = NLA_HDRLEN;
+ nla = nlattr_begin_nested(&req.nh, sizeof(req), IFLA_XDP);
+ if (!nla) {
+ ret = -EMSGSIZE;
+ goto cleanup;
+ }

/* add XDP fd */
- nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
- nla_xdp->nla_type = IFLA_XDP_FD;
- nla_xdp->nla_len = NLA_HDRLEN + sizeof(int);
- memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd));
- nla->nla_len += nla_xdp->nla_len;
+ ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FD, &fd, sizeof(fd));
+ if (ret < 0)
+ goto cleanup;

/* if user passed in any flags, add those too */
if (flags) {
- nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
- nla_xdp->nla_type = IFLA_XDP_FLAGS;
- nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags);
- memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags));
- nla->nla_len += nla_xdp->nla_len;
+ ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_FLAGS, &flags, sizeof(flags));
+ if (ret < 0)
+ goto cleanup;
}

if (flags & XDP_FLAGS_REPLACE) {
- nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len);
- nla_xdp->nla_type = IFLA_XDP_EXPECTED_FD;
- nla_xdp->nla_len = NLA_HDRLEN + sizeof(old_fd);
- memcpy((char *)nla_xdp + NLA_HDRLEN, &old_fd, sizeof(old_fd));
- nla->nla_len += nla_xdp->nla_len;
+ ret = nlattr_add(&req.nh, sizeof(req), IFLA_XDP_EXPECTED_FD, &old_fd, sizeof(old_fd));
+ if (ret < 0)
+ goto cleanup;
}

- req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len);
+ nlattr_end_nested(&req.nh, nla);

if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
ret = -errno;
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 6cc3ac91690f..1c94cdb6e89d 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -10,7 +10,10 @@
#define __LIBBPF_NLATTR_H

#include <stdint.h>
+#include <string.h>
+#include <errno.h>
#include <linux/netlink.h>
+
/* avoid multiple definition of netlink features */
#define __LINUX_NETLINK_H

@@ -103,4 +106,49 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,

int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);

+static inline struct nlattr *nla_data(struct nlattr *nla)
+{
+ return (struct nlattr *)((char *)nla + NLA_HDRLEN);
+}
+
+static inline struct nlattr *nh_tail(struct nlmsghdr *nh)
+{
+ return (struct nlattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
+}
+
+static inline int nlattr_add(struct nlmsghdr *nh, size_t maxsz, int type,
+ const void *data, int len)
+{
+ struct nlattr *nla;
+
+ if (NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(NLA_HDRLEN + len) > maxsz)
+ return -EMSGSIZE;
+ if ((!data && len) || (data && !len))
+ return -EINVAL;
+
+ nla = nh_tail(nh);
+ nla->nla_type = type;
+ nla->nla_len = NLA_HDRLEN + len;
+ if (data)
+ memcpy(nla_data(nla), data, len);
+ nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
+ return 0;
+}
+
+static inline struct nlattr *nlattr_begin_nested(struct nlmsghdr *nh,
+ size_t maxsz, int type)
+{
+ struct nlattr *tail;
+
+ tail = nh_tail(nh);
+ if (nlattr_add(nh, maxsz, type | NLA_F_NESTED, NULL, 0))
+ return NULL;
+ return tail;
+}
+
+static inline void nlattr_end_nested(struct nlmsghdr *nh, struct nlattr *tail)
+{
+ tail->nla_len = (char *)nh_tail(nh) - (char *)tail;
+}
+
#endif /* __LIBBPF_NLATTR_H */
--
2.30.2

2021-04-19 13:56:40

by Kumar Kartikeya Dwivedi

Subject: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API

This adds some basic tests for the low level bpf_tc_cls_* API.

Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
---
.../selftests/bpf/prog_tests/test_tc_bpf.c | 112 ++++++++++++++++++
.../selftests/bpf/progs/test_tc_bpf_kern.c | 12 ++
2 files changed, 124 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
new file mode 100644
index 000000000000..945f3a1a72f8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <linux/err.h>
+#include <linux/limits.h>
+#include <bpf/libbpf.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <test_progs.h>
+#include <linux/if_ether.h>
+
+#define LO_IFINDEX 1
+
+static int test_tc_cls_internal(int fd, __u32 parent_id)
+{
+ DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
+ .class_id = TC_H_MAKE(1UL << 16, 1),
+ .chain_index = 5);
+ struct bpf_tc_cls_attach_id id = {};
+ struct bpf_tc_cls_info info = {};
+ int ret;
+
+ ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
+ if (CHECK_FAIL(ret < 0))
+ return ret;
+
+ ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ ret = -1;
+
+ if (CHECK_FAIL(info.id.handle != id.handle) ||
+ CHECK_FAIL(info.id.chain_index != id.chain_index) ||
+ CHECK_FAIL(info.id.priority != id.priority) ||
+ CHECK_FAIL(info.id.handle != 1) ||
+ CHECK_FAIL(info.id.priority != 10) ||
+ CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
+ CHECK_FAIL(info.id.chain_index != 5))
+ goto end;
+
+ ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
+ if (CHECK_FAIL(ret < 0))
+ return ret;
+
+ if (CHECK_FAIL(info.id.handle != 1) ||
+ CHECK_FAIL(info.id.priority != 10) ||
+ CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
+ goto end;
+
+ /* Demonstrate changing attributes */
+ opts.class_id = TC_H_MAKE(1UL << 16, 2);
+
+ ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
+ goto end;
+ if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
+ goto end;
+
+end:
+ ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
+ CHECK_FAIL(ret < 0);
+ return ret;
+}
+
+void test_test_tc_bpf(void)
+{
+ const char *file = "./test_tc_bpf_kern.o";
+ struct bpf_program *clsp;
+ struct bpf_object *obj;
+ int cls_fd, ret;
+
+ obj = bpf_object__open(file);
+ if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
+ return;
+
+ clsp = bpf_object__find_program_by_title(obj, "classifier");
+ if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
+ goto end;
+
+ ret = bpf_object__load(obj);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ cls_fd = bpf_program__fd(clsp);
+
+ system("tc qdisc del dev lo clsact");
+
+ ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
+ goto end;
+
+ ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
+ if (CHECK_FAIL(ret < 0))
+ goto end;
+
+ CHECK_FAIL(system("tc qdisc del dev lo clsact"));
+
+end:
+ bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
new file mode 100644
index 000000000000..3dd40e21af8e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+// Dummy prog to test TC-BPF API
+
+SEC("classifier")
+int cls(struct __sk_buff *skb)
+{
+ return 0;
+}
--
2.30.2

2021-04-20 04:39:42

by Andrii Nakryiko

Subject: Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API

On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi
<[email protected]> wrote:
>
> This adds some basic tests for the low level bpf_tc_cls_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen <[email protected]>
> Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
> ---
> .../selftests/bpf/prog_tests/test_tc_bpf.c | 112 ++++++++++++++++++
> .../selftests/bpf/progs/test_tc_bpf_kern.c | 12 ++
> 2 files changed, 124 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index 000000000000..945f3a1a72f8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <linux/err.h>
> +#include <linux/limits.h>
> +#include <bpf/libbpf.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <test_progs.h>
> +#include <linux/if_ether.h>
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_cls_internal(int fd, __u32 parent_id)
> +{
> + DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
> + .class_id = TC_H_MAKE(1UL << 16, 1),
> + .chain_index = 5);
> + struct bpf_tc_cls_attach_id id = {};
> + struct bpf_tc_cls_info info = {};
> + int ret;
> +
> + ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> + if (CHECK_FAIL(ret < 0))
> + return ret;
> +
> + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + ret = -1;
> +
> + if (CHECK_FAIL(info.id.handle != id.handle) ||
> + CHECK_FAIL(info.id.chain_index != id.chain_index) ||
> + CHECK_FAIL(info.id.priority != id.priority) ||
> + CHECK_FAIL(info.id.handle != 1) ||
> + CHECK_FAIL(info.id.priority != 10) ||
> + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
> + CHECK_FAIL(info.id.chain_index != 5))
> + goto end;
> +
> + ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
> + if (CHECK_FAIL(ret < 0))
> + return ret;
> +
> + if (CHECK_FAIL(info.id.handle != 1) ||
> + CHECK_FAIL(info.id.priority != 10) ||
> + CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
> + goto end;
> +
> + /* Demonstrate changing attributes */
> + opts.class_id = TC_H_MAKE(1UL << 16, 2);
> +
> + ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &info.id);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
> + goto end;
> + if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
> + goto end;
> +
> +end:
> + ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
> + CHECK_FAIL(ret < 0);
> + return ret;
> +}
> +
> +void test_test_tc_bpf(void)
> +{
> + const char *file = "./test_tc_bpf_kern.o";
> + struct bpf_program *clsp;
> + struct bpf_object *obj;
> + int cls_fd, ret;
> +
> + obj = bpf_object__open(file);
> + if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
> + return;
> +
> + clsp = bpf_object__find_program_by_title(obj, "classifier");
> + if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
> + goto end;
> +
> + ret = bpf_object__load(obj);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + cls_fd = bpf_program__fd(clsp);
> +
> + system("tc qdisc del dev lo clsact");
> +
> + ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
> + goto end;
> +
> + ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> + if (CHECK_FAIL(ret < 0))
> + goto end;
> +
> + CHECK_FAIL(system("tc qdisc del dev lo clsact"));

Please don't use CHECK_FAIL, and prefer ASSERT_xxx over CHECK().

> +
> +end:
> + bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index 000000000000..3dd40e21af8e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +// Dummy prog to test TC-BPF API

No C++-style comments, please (except for the SPDX header, of course).

> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> + return 0;
> +}
> --
> 2.30.2
>