Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp1482723rwd; Thu, 25 May 2023 13:19:40 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6QFRx6OCE7YkSodJLD2irO5SoidLuu39AEuQ7USn+i+W+da04x4OrqpWYzdSET5OT9r+F3 X-Received: by 2002:a17:90a:6301:b0:255:ea26:9b0 with SMTP id e1-20020a17090a630100b00255ea2609b0mr2780553pjj.47.1685045980335; Thu, 25 May 2023 13:19:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685045980; cv=none; d=google.com; s=arc-20160816; b=IoIXkNCYRlHYGR3mdSX0lajAPZSVAWjA4MDHvWnmDhJGOkC7TP8XqtTKqsvt4nt6OF M/etBVY6N1OP9mJeLwtXOL5v7BeL+zTVDemRVJMGAvJY3esTzL+0ICQoAoqhTE+Gidw8 H1LXOKAkKDPpJhOqQI55obixMVtdNaMJd0qmAfZnIQsYbO8hqA/JiW2je2vVZhQoMDuH zZnM6In83RKSv/NK3F9OjX2V+RFHVP2s/hjmUFpjbByLLaILGPr4Q7NOI+M1lNwsBzVY jDHbHc683tqrY6H9eflqCSLh8sXXMG5WpuMeaL6XCrRf9hCwXiZqlY6OkmcTIxXGKrKw xRlQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=nqfhm3/Yv5prAaAJ3hLzCTN2dKkIp+erbKr/gQI7qtE=; b=z8fF4zBTYvztrYYu1uOuV+O6PoVfNTWMMHP5molWY60jZmoIfL/LUcyGHhzfzDlln3 mzcJBN5Pc4cw3HBrkqVyUMk5iPIP1264AAWbGBWSFYCL3Oi7hHqj2fyOt4xSAnslkMLQ 9zX4zRikgcTQ92zwDUcJiz8bvc8piITByzFmRxKKuJpGedocBFvtyIHGlgAysvEncBiR kHlnBjLiJec82GNX+pf78rhFTTrXdqmpXUcxHLURlUbp/5XVVbsJX17Uw7kSmYXNaVIz sjhP2RIzxCRGrFcHumB2ohnJbDBpN4IcERjt3koJ42jWxPS9Mvwirll8pGxV9f+uuSeN Wgcg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@iogearbox.net header.s=default2302 header.b=HDLoT+2z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=iogearbox.net Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v2-20020a637a02000000b0053ef1799a71si1899794pgc.399.2023.05.25.13.19.27; Thu, 25 May 2023 13:19:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@iogearbox.net header.s=default2302 header.b=HDLoT+2z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=iogearbox.net Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241380AbjEYUB0 (ORCPT + 99 others); Thu, 25 May 2023 16:01:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242009AbjEYUBW (ORCPT ); Thu, 25 May 2023 16:01:22 -0400 Received: from www62.your-server.de (www62.your-server.de [213.133.104.62]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7757895; Thu, 25 May 2023 13:01:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=iogearbox.net; s=default2302; h=Content-Transfer-Encoding:Content-Type: In-Reply-To:MIME-Version:Date:Message-ID:From:References:Cc:To:Subject:Sender :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID; bh=nqfhm3/Yv5prAaAJ3hLzCTN2dKkIp+erbKr/gQI7qtE=; b=HDLoT+2z2oyn1lpV6boByRvq0B ssnHAwUL8qfkkR0wmfBHbhRowm5fQdTtJqQ9zqhRmuz1XaWy6Ln7+rAu5po+Lmpx9sBXTVHdZETE/ Mk5yY/q6l7oaKOWnK16xkeHVczkDp0Ndpl/lf5fveWhxxq5oft2IikFtcuzeGW3EaokONEGiN/Opg tcaJYmrmOvH9EHlQaywLLzT1ag+S4i4vMvMfl/9s3MvkWj/6a8KXQumR8quO3e4cOjlPdq35CNqlX q5+N8wIeHk9VImG40lii0oygOVjO39OSgFLOEMcvslER6QxS067b7aJKJ9EcPmeHKe9GMnHnvUvmM bnOWDWcA==; Received: from sslproxy05.your-server.de ([78.46.172.2]) by www62.your-server.de with esmtpsa (TLS1.3) tls TLS_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1q2H93-0001Bo-V8; Thu, 25 May 2023 22:01:09 +0200 Received: from [178.197.248.42] (helo=linux.home) by sslproxy05.your-server.de with esmtpsa (TLSv1.3:TLS_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1q2H92-000XIa-Sz; Thu, 25 May 2023 22:01:08 +0200 Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign To: Kuniyuki Iwashima , lmb@isovalent.com Cc: andrii@kernel.org, ast@kernel.org, bpf@vger.kernel.org, davem@davemloft.net, dsahern@kernel.org, edumazet@google.com, haoluo@google.com, joe@cilium.io, joe@wand.net.nz, john.fastabend@gmail.com, jolsa@kernel.org, kafai@fb.com, kpsingh@kernel.org, kuba@kernel.org, linux-kernel@vger.kernel.org, martin.lau@linux.dev, netdev@vger.kernel.org, pabeni@redhat.com, sdf@google.com, song@kernel.org, willemdebruijn.kernel@gmail.com, yhs@fb.com References: <20230525081923.8596-1-lmb@isovalent.com> <20230525174131.4706-1-kuniyu@amazon.com> From: Daniel Borkmann Message-ID: <99681548-fa79-0607-d574-db61818cab78@iogearbox.net> Date: Thu, 25 May 2023 22:01:08 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <20230525174131.4706-1-kuniyu@amazon.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Authenticated-Sender: daniel@iogearbox.net X-Virus-Scanned: Clear (ClamAV 0.103.8/26918/Thu May 25 09:25:14 2023) X-Spam-Status: No, score=-2.2 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/25/23 7:41 PM, Kuniyuki Iwashima wrote: > From: Lorenz Bauer > Date: Thu, 25 May 2023 09:19:22 +0100 >> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT >> sockets. This means we can't use the helper to steer traffic to Envoy, which >> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing >> TPROXY from our setup. >> >> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup >> helpers don't execute SK_REUSEPORT programs. Instead, one of the >> reuseport sockets is selected by hash. This could cause dispatch to the >> "wrong" socket: >> >> sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash >> bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed >> >> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup >> helpers unfortunately. In the tc context, L2 headers are at the start >> of the skb, while SK_REUSEPORT expects L3 headers instead. >> >> Instead, we execute the SK_REUSEPORT program when the assigned socket >> is pulled out of the skb, further up the stack. This creates some >> trickiness with regards to refcounting as bpf_sk_assign will put both >> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU >> freed. We can infer that the sk_assigned socket is RCU freed if the >> reuseport lookup succeeds, but convincing yourself of this fact isn't >> straight forward. Therefore we defensively check refcounting on the >> sk_assign sock even though it's probably not required in practice. >> >> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign") >> Fixes: cf7fbe6 ("bpf: Add socket assign support") > > Please use 12 chars of hash. > > $ cat ~/.gitconfig > [core] > abbrev = 12 > [pretty] > fixes = Fixes: %h (\"%s\") > > $ git show 8e368dc --pretty=fixes | head -n 1 > Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign") Yeap, not quite sure what happened here but the 12 chars is clear. Will be fixed up in v2, too, ofc. >> Co-developed-by: Daniel Borkmann >> Signed-off-by: Daniel Borkmann >> Signed-off-by: Lorenz Bauer >> Cc: Joe Stringer >> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ >> --- >> include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++----- >> include/net/inet_hashtables.h | 27 +++++++++++++++++++++++-- >> include/net/sock.h | 7 +++++-- >> include/uapi/linux/bpf.h | 3 --- >> net/core/filter.c | 2 -- >> net/ipv4/inet_hashtables.c | 15 +++++++------- >> net/ipv4/udp.c | 23 +++++++++++++++++++--- >> net/ipv6/inet6_hashtables.c | 19 +++++++++--------- >> net/ipv6/udp.c | 23 +++++++++++++++++++--- >> tools/include/uapi/linux/bpf.h | 3 --- >> 10 files changed, 119 insertions(+), 39 deletions(-) >> [...] >> @@ -85,14 +92,33 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, >> int iif, int sdif, >> bool *refcounted) >> { >> - struct sock *sk = skb_steal_sock(skb, refcounted); >> - >> + bool prefetched; >> + struct sock *sk = skb_steal_sock(skb, refcounted, &prefetched); >> + struct net *net = dev_net(skb_dst(skb)->dev); >> + const struct ipv6hdr *ip6h = ipv6_hdr(skb); > > nit: Reverse Xmas Tree order. Same for other chunks. It is, the prefetched bool is simply used one line below. I don't think this is much different than most other code from style pov.. >> + >> + if (prefetched) { >> + struct sock *reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, >> + &ip6h->saddr, sport, >> + &ip6h->daddr, ntohs(dport)); >> + if (reuse_sk) { >> + if (reuse_sk != sk) { >> + if (*refcounted) { >> + sock_put(sk); >> + *refcounted = false; >> + } >> + if (IS_ERR(reuse_sk)) >> + return NULL; >> + } >> + return reuse_sk; >> + } > > Maybe we can add a hepler to avoid this duplication ? We'll check if it can be made a bit nicer and integrate this into the v2. Thanks, Daniel