From: Daniel Xu
To: daniel@iogearbox.net, kadlec@netfilter.org, ast@kernel.org, pablo@netfilter.org, kuba@kernel.org, davem@davemloft.net, andrii@kernel.org, edumazet@google.com, pabeni@redhat.com, fw@strlen.de, alexei.starovoitov@gmail.com
Cc: martin.lau@linux.dev, song@kernel.org,
	yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, dsahern@kernel.org
Subject: [PATCH bpf-next v5 2/5] netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
Date: Thu, 20 Jul 2023 14:57:36 -0600
Message-ID: <690a1b09db84547b0f0c73654df3f4950f1262b7.1689884827.git.dxu@dxuuu.xyz>
X-Mailer: git-send-email 2.41.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Mailing-List: linux-kernel@vger.kernel.org

This commit adds support for enabling IP defrag using pre-existing
netfilter defrag support. Basically all the flag does is bump a refcnt
while the link is active. Checks are also added to ensure the prog
requesting defrag support is run _after_ netfilter defrag hooks.

We also take care to avoid any issues w.r.t. module unloading -- while
defrag is active on a link, the module is prevented from unloading.

Signed-off-by: Daniel Xu
---
 include/uapi/linux/bpf.h       |   5 ++
 net/netfilter/nf_bpf_link.c    | 116 +++++++++++++++++++++++++++++----
 tools/include/uapi/linux/bpf.h |   5 ++
 3 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 739c15906a65..12a5480314a2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1187,6 +1187,11 @@ enum bpf_perf_event_type {
  */
 #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
 
+/* link_create.netfilter.flags used in LINK_CREATE command for
+ * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
+ */
+#define BPF_F_NETFILTER_IP_DEFRAG	(1U << 0)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index c36da56d756f..95d71b35f81f 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 #include
 #include
+#include
+#include
 #include
 
 #include
@@ -23,8 +25,88 @@ struct bpf_nf_link {
 	struct nf_hook_ops hook_ops;
 	struct net *net;
 	u32 dead;
+	const struct nf_defrag_hook *defrag_hook;
 };
 
+static const struct nf_defrag_hook *
+get_proto_defrag_hook(struct bpf_nf_link *link,
+		      const struct nf_defrag_hook __rcu *global_hook,
+		      const char *mod)
+{
+	const struct nf_defrag_hook *hook;
+	int err;
+
+	/* RCU protects us from races against module unloading */
+	rcu_read_lock();
+	hook = rcu_dereference(global_hook);
+	if (!hook) {
+		rcu_read_unlock();
+		err = request_module(mod);
+		if (err)
+			return ERR_PTR(err < 0 ? err : -EINVAL);
+
+		rcu_read_lock();
+		hook = rcu_dereference(global_hook);
+	}
+
+	if (hook && try_module_get(hook->owner)) {
+		/* Once we have a refcnt on the module, we no longer need RCU */
+		hook = rcu_pointer_handoff(hook);
+	} else {
+		WARN_ONCE(!hook, "%s has bad registration", mod);
+		hook = ERR_PTR(-ENOENT);
+	}
+	rcu_read_unlock();
+
+	if (!IS_ERR(hook)) {
+		err = hook->enable(link->net);
+		if (err) {
+			module_put(hook->owner);
+			hook = ERR_PTR(err);
+		}
+	}
+
+	return hook;
+}
+
+static int bpf_nf_enable_defrag(struct bpf_nf_link *link)
+{
+	const struct nf_defrag_hook __maybe_unused *hook;
+
+	switch (link->hook_ops.pf) {
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
+	case NFPROTO_IPV4:
+		hook = get_proto_defrag_hook(link, nf_defrag_v4_hook, "nf_defrag_ipv4");
+		if (IS_ERR(hook))
+			return PTR_ERR(hook);
+
+		link->defrag_hook = hook;
+		return 0;
+#endif
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
+	case NFPROTO_IPV6:
+		hook = get_proto_defrag_hook(link, nf_defrag_v6_hook, "nf_defrag_ipv6");
+		if (IS_ERR(hook))
+			return PTR_ERR(hook);
+
+		link->defrag_hook = hook;
+		return 0;
+#endif
+	default:
+		return -EAFNOSUPPORT;
+	}
+}
+
+static void bpf_nf_disable_defrag(struct bpf_nf_link *link)
+{
+	const struct nf_defrag_hook *hook = link->defrag_hook;
+
+	if (!hook)
+		return;
+	hook->disable(link->net);
+	module_put(hook->owner);
+}
+
 static void bpf_nf_link_release(struct bpf_link *link)
 {
 	struct bpf_nf_link *nf_link = container_of(link, struct bpf_nf_link, link);
@@ -37,6 +119,8 @@ static void bpf_nf_link_release(struct bpf_link *link)
 	 */
 	if (!cmpxchg(&nf_link->dead, 0, 1))
 		nf_unregister_net_hook(nf_link->net, &nf_link->hook_ops);
+
+	bpf_nf_disable_defrag(nf_link);
 }
 
 static void bpf_nf_link_dealloc(struct bpf_link *link)
@@ -92,6 +176,8 @@ static const struct bpf_link_ops bpf_nf_link_lops = {
 
 static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr)
 {
+	int prio;
+
 	switch (attr->link_create.netfilter.pf) {
 	case NFPROTO_IPV4:
 	case NFPROTO_IPV6:
@@ -102,19 +188,18 @@ static int bpf_nf_check_pf_and_hooks(const union bpf_attr *attr)
 		return -EAFNOSUPPORT;
 	}
 
-	if (attr->link_create.netfilter.flags)
+	if (attr->link_create.netfilter.flags & ~BPF_F_NETFILTER_IP_DEFRAG)
 		return -EOPNOTSUPP;
 
-	/* make sure conntrack confirm is always last.
-	 *
-	 * In the future, if userspace can e.g. request defrag, then
-	 * "defrag_requested && prio before NF_IP_PRI_CONNTRACK_DEFRAG"
-	 * should fail.
-	 */
-	switch (attr->link_create.netfilter.priority) {
-	case NF_IP_PRI_FIRST: return -ERANGE; /* sabotage_in and other warts */
-	case NF_IP_PRI_LAST: return -ERANGE; /* e.g. conntrack confirm */
-	}
+	/* make sure conntrack confirm is always last */
+	prio = attr->link_create.netfilter.priority;
+	if (prio == NF_IP_PRI_FIRST)
+		return -ERANGE;  /* sabotage_in and other warts */
+	else if (prio == NF_IP_PRI_LAST)
+		return -ERANGE;  /* e.g. conntrack confirm */
+	else if ((attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) &&
+		 prio <= NF_IP_PRI_CONNTRACK_DEFRAG)
+		return -ERANGE;  /* cannot use defrag if prog runs before nf_defrag */
 
 	return 0;
 }
@@ -149,6 +234,7 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 
 	link->net = net;
 	link->dead = false;
+	link->defrag_hook = NULL;
 
 	err = bpf_link_prime(&link->link, &link_primer);
 	if (err) {
@@ -156,6 +242,14 @@ int bpf_nf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 		return err;
 	}
 
+	if (attr->link_create.netfilter.flags & BPF_F_NETFILTER_IP_DEFRAG) {
+		err = bpf_nf_enable_defrag(link);
+		if (err) {
+			bpf_link_cleanup(&link_primer);
+			return err;
+		}
+	}
+
 	err = nf_register_net_hook(net, &link->hook_ops);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 739c15906a65..12a5480314a2 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1187,6 +1187,11 @@ enum bpf_perf_event_type {
  */
 #define BPF_F_KPROBE_MULTI_RETURN	(1U << 0)
 
+/* link_create.netfilter.flags used in LINK_CREATE command for
+ * BPF_PROG_TYPE_NETFILTER to enable IP packet defragmentation.
+ */
+#define BPF_F_NETFILTER_IP_DEFRAG	(1U << 0)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
-- 
2.41.0
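
For anyone wanting to reason about which (flags, priority) pairs the reworked bpf_nf_check_pf_and_hooks() accepts, the following is a minimal standalone sketch in userspace C. It is not kernel code and not part of the patch. The helper names demo_check_flags_and_prio() and demo_selftest() are hypothetical. The constants are redefined locally so the snippet builds on its own; the values are assumed to match the UAPI netfilter headers (NF_IP_PRI_CONNTRACK_DEFRAG as -400) and the define added by this patch. The protocol family check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Local copies for illustration only; assumed to match the kernel UAPI. */
#define BPF_F_NETFILTER_IP_DEFRAG	(1U << 0)
#define NF_IP_PRI_FIRST			INT_MIN
#define NF_IP_PRI_CONNTRACK_DEFRAG	(-400)
#define NF_IP_PRI_LAST			INT_MAX

/*
 * Hypothetical helper mirroring the flag and priority checks in the
 * patched bpf_nf_check_pf_and_hooks(). Returns 0 or a negative errno.
 */
int demo_check_flags_and_prio(unsigned int flags, int prio)
{
	/* Any bit other than the defrag flag is rejected. */
	if (flags & ~BPF_F_NETFILTER_IP_DEFRAG)
		return -EOPNOTSUPP;

	/* The first and last priorities stay reserved. */
	if (prio == NF_IP_PRI_FIRST || prio == NF_IP_PRI_LAST)
		return -ERANGE;

	/* A prog requesting defrag must run after the nf_defrag hook. */
	if ((flags & BPF_F_NETFILTER_IP_DEFRAG) &&
	    prio <= NF_IP_PRI_CONNTRACK_DEFRAG)
		return -ERANGE;

	return 0;
}

/* Hypothetical self-check exercising the boundary around -400. */
void demo_selftest(void)
{
	assert(demo_check_flags_and_prio(0, NF_IP_PRI_CONNTRACK_DEFRAG) == 0);
	assert(demo_check_flags_and_prio(BPF_F_NETFILTER_IP_DEFRAG,
					 NF_IP_PRI_CONNTRACK_DEFRAG) == -ERANGE);
	assert(demo_check_flags_and_prio(BPF_F_NETFILTER_IP_DEFRAG,
					 NF_IP_PRI_CONNTRACK_DEFRAG + 1) == 0);
}
```

Note the comparison is `<=`, so a prog asking for defrag at exactly NF_IP_PRI_CONNTRACK_DEFRAG is refused, while the same priority without the flag is still accepted.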