From: Ilya Maximets
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
	"David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jason Wang, Stefan Hajnoczi, Ilya Maximets
Subject: [PATCH bpf-next] xsk: honor SO_BINDTODEVICE on bind
Date: Mon, 3 Jul 2023 19:53:29 +0200
Message-Id: <20230703175329.3259672-1-i.maximets@ovn.org>
X-Mailer: git-send-email 2.40.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Initial creation of an AF_XDP socket requires the CAP_NET_RAW
capability. A privileged process might create the socket and pass it
to a non-privileged process for later use. However, that process will
be able to bind the socket to any network interface. Even though it
will not be able to receive any traffic without modification of the
BPF map, the situation is not ideal.

Sockets already have a mechanism that can be used to restrict which
interface they can be attached to. That is SO_BINDTODEVICE.

To change the SO_BINDTODEVICE binding the process will need
CAP_NET_RAW.

Make xsk_bind() honor SO_BINDTODEVICE in order to allow a safer
workflow when a non-privileged process is using AF_XDP.

The intended workflow is as follows:

  1. First process creates a bare socket with socket(AF_XDP, ...).
  2. First process loads the XSK program to the interface.
  3. First process adds the socket fd to a BPF map.
  4. First process ties socket fd to a particular interface using
     SO_BINDTODEVICE.
  5. First process sends socket fd to a second process.
  6. Second process allocates UMEM.
  7. Second process binds socket to the interface with bind(...).
  8. Second process sends/receives the traffic.

All the steps above are possible today if the first process is
privileged and the second one has sufficient RLIMIT_MEMLOCK and no
capabilities. However, the second process will be able to bind the
socket to any interface it wants on step 7 and send traffic from it.
With the proposed change, the second process will be able to bind
the socket only to a specific interface chosen by the first process
at step 4.

Acked-by: Magnus Karlsson
Signed-off-by: Ilya Maximets
---

RFC --> PATCH:
  * Better explained intended workflow in a commit message.
  * Added ACK from Magnus.

 Documentation/networking/af_xdp.rst | 9 +++++++++
 net/xdp/xsk.c                       | 6 ++++++
 2 files changed, 15 insertions(+)

diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
index 247c6c4127e9..1cc35de336a4 100644
--- a/Documentation/networking/af_xdp.rst
+++ b/Documentation/networking/af_xdp.rst
@@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the
 application to use. The final option is the flags field, but it will
 be dealt with in separate sections for each UMEM flag.
 
+SO_BINDTODEVICE setsockopt
+--------------------------
+
+This is a generic SOL_SOCKET option that can be used to tie AF_XDP
+socket to a particular network interface. It is useful when a socket
+is created by a privileged process and passed to a non-privileged one.
+Once the option is set, kernel will refuse attempts to bind that socket
+to a different interface. Updating the value requires CAP_NET_RAW.
+
 XDP_STATISTICS getsockopt
 -------------------------
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5a8c0dd250af..386ff641db0f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 	struct net_device *dev;
+	int bound_dev_if;
 	u32 flags, qid;
 	int err = 0;
 
@@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		      XDP_USE_NEED_WAKEUP))
 		return -EINVAL;
 
+	bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
+
+	if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex)
+		return -EINVAL;
+
 	rtnl_lock();
 	mutex_lock(&xs->mutex);
 	if (xs->state != XSK_READY) {
-- 
2.40.1
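
For reference, the decision made by the new check in xsk_bind() can be
modelled in plain userspace C. This is only an illustrative sketch: the
function name xsk_bind_allowed() is hypothetical, and the model leaves
out READ_ONCE() and everything else the kernel function does. It shows
that a socket without a SO_BINDTODEVICE binding (ifindex 0) may still
bind to any interface, while a bound socket may bind only to the
matching ifindex.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check added to xsk_bind().
 * bound_dev_if:  ifindex stored by SO_BINDTODEVICE, 0 if never set.
 * sxdp_ifindex:  ifindex requested in the bind() address.
 * Returns true where the kernel check lets bind() proceed and false
 * where it returns -EINVAL.
 */
static bool xsk_bind_allowed(int bound_dev_if, int sxdp_ifindex)
{
	if (bound_dev_if && bound_dev_if != sxdp_ifindex)
		return false;

	return true;
}
```
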