In-Reply-To: <20230525081923.8596-1-lmb@isovalent.com>
From: Joe Stringer
Date: Thu, 25 May 2023 22:56:50 -0700
Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
To: Lorenz Bauer
Cc: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern, Willem de Bruijn, Joe Stringer, Joe Stringer, Martin KaFai Lau, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org

On Thu, May 25, 2023 at 1:19 AM Lorenz Bauer wrote:
>
> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
> sockets. This means we can't use the helper to steer traffic to Envoy, which
> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing
> TPROXY from our setup.
>
> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup
> helpers don't execute SK_REUSEPORT programs. Instead, one of the
> reuseport sockets is selected by hash. This could cause dispatch to the
> "wrong" socket:
>
>   sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>   bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
>
> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
> helpers unfortunately. In the tc context, L2 headers are at the start
> of the skb, while SK_REUSEPORT expects L3 headers instead.
>
> Instead, we execute the SK_REUSEPORT program when the assigned socket
> is pulled out of the skb, further up the stack.
> This creates some trickiness with regards to refcounting as bpf_sk_assign
> will put both refcounted and RCU freed sockets in skb->sk. reuseport
> sockets are RCU freed. We can infer that the sk_assigned socket is RCU
> freed if the reuseport lookup succeeds, but convincing yourself of this
> fact isn't straight forward. Therefore we defensively check refcounting
> on the sk_assign sock even though it's probably not required in practice.
>
> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign")
> Fixes: cf7fbe6 ("bpf: Add socket assign support")
> Co-developed-by: Daniel Borkmann
> Signed-off-by: Daniel Borkmann
> Signed-off-by: Lorenz Bauer
> Cc: Joe Stringer
> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/

Nice approach to fix this issue, wish I'd thought of it :)

I pulled this and tested it out in a little-vm-helper environment with kind
and Cilium's examples/kubernetes/connectivity-check proxy suite, as well as
cilium-cli's connectivity tests, and the L7 features seem to be working as
expected with SO_REUSEPORT.

Tested-by: Joe Stringer

I also glanced through the commit, and the various protocols seem to be
handled consistently at the very least, though I agree it'd be simpler for
review and bisecting if it were broken down into more incremental changes.
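For anyone following along, the dispatch and refcounting issue described in the quoted patch can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not the patch's code: struct toy_sock, toy_lookup_by_hash(), toy_steal_sock() and toy_steer_to_last() are hypothetical names, a plain modulo stands in for the kernel's reuseport hash, and a function pointer stands in for an attached SK_REUSEPORT program. It shows hash-only selection at lookup time, program-driven selection deferred to the point where the socket is taken back out of the skb, and the defensive drop of a reference on the originally assigned socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code; all names are hypothetical. */
struct toy_sock {
    int id;
    bool in_reuseport_group; /* SO_REUSEPORT socket that belongs to a group */
    bool rcu_freed;          /* models SOCK_RCU_FREE: no reference is held */
    int refcnt;              /* only meaningful when !rcu_freed */
};

struct toy_group {
    struct toy_sock *socks;
    size_t n;
};

/* Stand-in for an attached SK_REUSEPORT program: returns an index into
 * the group, or -1 to keep the socket that was already chosen. */
typedef int (*toy_reuseport_prog)(uint32_t hash, size_t n);

/* Stand-in for the lookup helpers: pick a group member by hash only. */
static struct toy_sock *toy_lookup_by_hash(struct toy_group *g, uint32_t hash)
{
    assert(g->n > 0);
    return &g->socks[hash % g->n];
}

/* Deferred selection: when the assigned socket is taken back out of the
 * skb, run the program and possibly switch to another group member. */
static struct toy_sock *toy_steal_sock(struct toy_group *g,
                                       struct toy_sock *assigned,
                                       uint32_t hash, toy_reuseport_prog prog,
                                       bool *refcounted)
{
    *refcounted = !assigned->rcu_freed;
    if (!assigned->in_reuseport_group || prog == NULL)
        return assigned;

    int idx = prog(hash, g->n);
    if (idx < 0 || (size_t)idx >= g->n)
        return assigned; /* program made no decision: keep the hash pick */

    struct toy_sock *chosen = &g->socks[idx];
    if (chosen == assigned)
        return assigned;

    /* Group members are assumed RCU-freed, so the new pick needs no
     * reference. Defensively drop the one held on the old socket, if any. */
    if (*refcounted) {
        assigned->refcnt--;
        *refcounted = false;
    }
    return chosen;
}

/* Example policy: always steer to the last member of the group. */
static int toy_steer_to_last(uint32_t hash, size_t n)
{
    (void)hash;
    return (int)n - 1;
}
```

With three RCU-freed group members and a flow hash of 7, toy_lookup_by_hash() returns member 1 (7 % 3), while toy_steal_sock() with toy_steer_to_last() returns member 2: the kind of mismatch the quoted two-line pseudocode warns about when SK_REUSEPORT is skipped.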