Subject: Re: [PATCH bpf-next 3/6] bpf, net, frags: Add bpf_ip_check_defrag() kfunc
To: Daniel Xu, "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: ppenkov@aviatrix.com, dbird@aviatrix.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org
References: <1f48a340a898c4d22d65e0e445dbf15f72081b9a.1671049840.git.dxu@dxuuu.xyz>
From: Daniel Borkmann
Message-ID: <451b291a-7798-cfe2-84da-815937b54f70@iogearbox.net>
Date: Thu, 15 Dec 2022 23:31:52 +0100
In-Reply-To: <1f48a340a898c4d22d65e0e445dbf15f72081b9a.1671049840.git.dxu@dxuuu.xyz>

Hi Daniel,

Thanks for working on this!

On 12/15/22 12:25 AM, Daniel Xu wrote:
[...]
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in ip_fragment BTF");
> +
> +/* bpf_ip_check_defrag - Defragment an ipv4 packet
> + *
> + * This helper takes an skb as input. If this skb successfully reassembles
> + * the original packet, the skb is updated to contain the original, reassembled
> + * packet.
> + *
> + * Otherwise (on error or incomplete reassembly), the input skb remains
> + * unmodified.
> + *
> + * Parameters:
> + * @ctx		- Pointer to program context (skb)
> + * @netns	- Child network namespace id. If value is a negative signed
> + *		  32-bit integer, the netns of the device in the skb is used.
> + *
> + * Return:
> + * 0 on successfully reassembly or non-fragmented packet. Negative value on
> + * error or incomplete reassembly.
> + */
> +int bpf_ip_check_defrag(struct __sk_buff *ctx, u64 netns)

Small nit: for the sk lookup helpers we've used u32 netns_id, would be nice
to have this consistent here as well.

> +{
> +	struct sk_buff *skb = (struct sk_buff *)ctx;
> +	struct sk_buff *skb_cpy, *skb_out;
> +	struct net *caller_net;
> +	struct net *net;
> +	int mac_len;
> +	void *mac;
> +
> +	if (unlikely(!((s32)netns < 0 || netns <= S32_MAX)))
> +		return -EINVAL;
> +
> +	caller_net = skb->dev ? dev_net(skb->dev) : sock_net(skb->sk);
> +	if ((s32)netns < 0) {
> +		net = caller_net;
> +	} else {
> +		net = get_net_ns_by_id(caller_net, netns);
> +		if (unlikely(!net))
> +			return -EINVAL;
> +	}
> +
> +	mac_len = skb->mac_len;
> +	skb_cpy = skb_copy(skb, GFP_ATOMIC);
> +	if (!skb_cpy)
> +		return -ENOMEM;

Given that this is slow path, the copy is expensive but okay. Maybe in future
it could be avoided, though it might be a bigger lift to teach the verifier
that the input ctx cannot be accessed anymore. But then frags are very much
discouraged either way, and bpf_ip_check_defrag() might only apply in
corner-case situations (like DNS, etc).

> +	skb_out = ip_check_defrag(net, skb_cpy, IP_DEFRAG_BPF);
> +	if (IS_ERR(skb_out))
> +		return PTR_ERR(skb_out);

Looks like ip_check_defrag() can gracefully handle an IPv6 packet. It will
just return the skb_cpy pointer back in that case. However, this brings me to
my main complaint: I don't think we should merge anything IPv4-related
without also having equivalent IPv6 support, otherwise we're building up tech
debt, so pls also add support for the latter.

> +	skb_morph(skb, skb_out);
> +	kfree_skb(skb_out);
> +
> +	/* ip_check_defrag() does not maintain mac header, so push empty header
> +	 * in so prog sees the correct layout. The empty mac header will be
> +	 * later pulled from cls_bpf.
> +	 */
> +	mac = skb_push(skb, mac_len);
> +	memset(mac, 0, mac_len);
> +	bpf_compute_data_pointers(skb);
> +
> +	return 0;
> +}
> +

Thanks,
Daniel