Received: by 2002:a05:6358:c692:b0:131:369:b2a3 with SMTP id fe18csp1174982rwb; Wed, 26 Jul 2023 08:35:39 -0700 (PDT) X-Google-Smtp-Source: APBJJlExLAKoZEDIuEzUmzXc1OqchDdaBAptFZe9HWhj3IuR1Apq+boqYzPCnrsY98Js2m8GYIRB X-Received: by 2002:a17:902:c3d1:b0:1b8:c1f:fd2c with SMTP id j17-20020a170902c3d100b001b80c1ffd2cmr1781167plj.33.1690385739047; Wed, 26 Jul 2023 08:35:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690385739; cv=none; d=google.com; s=arc-20160816; b=dXlxqP4irjtXh3OWROqW2FRZfZpH6WdAEwzut5UCg4KPim03VDFzaFhXGWbwehjLAd vqCB5I4WxtywHND/pI6ndDtQne8qIvt5pLZkZn9mj6e8giAKSpwNZdY0hMJg92rvt35j fKY2OTSvPsLVpuPMxzrlpeMPw8NWfTjMl1ljYsyBF1ejVIfTOg/Zh72nqkYLBGS+MhSQ Nab40hgcnYSonK+iQQGo2KS9MEilbQzmVPkmPFU8l2itUoTxyiqn3U5b9vMxEwg3zSpQ 6yWio2o1sFc7H6RD970Rm/v7NA3ycervgRcdotF4SUCnOXpogfm3Jx+e1zA08mrMl5hp b7fQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=pkT7MMdb9UBp/NLJEI4W0DlkPyX5NSU+s0eXsj5hKk4=; fh=Ke5maScHwjN6J59cocsTKLu/qb0KK7yOTgvF55t1yFE=; b=XOjUZ8Ylj2hZHllSvbTVJxkDnyuSE6/IuuIiXs/ObyXNuHAD/ifmHb66mxI0eFdX+q uwuiIT1uOc/bGX+NALi0UPdboD4Q84T4gE+cHjoJU+HN3zRimMt3fRLcj9glRucubrfJ PphgqxE+zi+P3AHD6da599hWhBGD8QmPNmEVHc7GSQKRmmsnS5VCneR5W2CuBh6wYt7x wQhsHmJ6jV8MXGoMvKxBBBtf4haUUAzpjBClkrWpayumSeLOKCi/6QjC7+OSLMpaHOSq gChxf62YUiKbW5zC3P4MvuKnvw5939nowBxb191JsHbkU2n9KwdBpqGU3ocj3DJMy3BG focA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=ArqoWPeA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id t1-20020a1709027fc100b001b3d2717f7bsi12151541plb.61.2023.07.26.08.35.25; Wed, 26 Jul 2023 08:35:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=ArqoWPeA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234536AbjGZOQi (ORCPT + 99 others); Wed, 26 Jul 2023 10:16:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233740AbjGZOQS (ORCPT ); Wed, 26 Jul 2023 10:16:18 -0400 Received: from mail-qk1-x72a.google.com (mail-qk1-x72a.google.com [IPv6:2607:f8b0:4864:20::72a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0D9330ED for ; Wed, 26 Jul 2023 07:15:11 -0700 (PDT) Received: by mail-qk1-x72a.google.com with SMTP id af79cd13be357-7680e3910dfso689072185a.0 for ; Wed, 26 Jul 2023 07:15:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; t=1690380899; x=1690985699; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=pkT7MMdb9UBp/NLJEI4W0DlkPyX5NSU+s0eXsj5hKk4=; b=ArqoWPeAyncX4vD74ueZ5Js2tYqx0nA4v2zayia01BhK4FXcrSsDaORXb7e/OUEIip 4koEGH7ywBK8hSNYR0G2pR0BpUajVfz4DXKEiEwAcPq0Fm3UubKlGIOQv3zTmxpvbuCN KxbB2jgQ59Oa+0WN0ov8gk9koKANE2QuzG75Q= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690380899; x=1690985699; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=pkT7MMdb9UBp/NLJEI4W0DlkPyX5NSU+s0eXsj5hKk4=; b=SKbgVy+nA06g2Kr1C4vL11CFkeDT5ueQLeq3xDcMpdutDmGKv4rhVzGDZS4mwcfFun p+bczsPdfXI/BNHPg93T2CJjjZTzKXsQ5dqTKxPp+7/YdyECLn3qOZXBD7gbFiTK7YAQ lm7cvOVcDe9PIqD5ZQeQGrebvN30R0vWUunu7+4GEdw/8Pdnd+RmUAImwhYSNLFSaeuJ a/+NunbMkVi4tPo/kgBhmFrj6yKgCDvA5QA8OY0NF4mi7umKJ/HF2dR79/MJeqCnfHTk O+V1wCjl9yZxLFlFR+rzj6Trcee/UlUopJf+1OLzQdlD/+WVNS8mJWZ1+iuI0ixkZ4yh Cw/w== X-Gm-Message-State: ABy/qLbF4pRTtrGNrdmZIBggfr6+9GcWU0p3+pcHXMMN6FCsZroaTYfJ K9fHz8ckk7SCTyXIMieUE3RuNQ== X-Received: by 2002:a05:620a:2a01:b0:76c:4d4c:7942 with SMTP id o1-20020a05620a2a0100b0076c4d4c7942mr2763882qkp.21.1690380899352; Wed, 26 Jul 2023 07:14:59 -0700 (PDT) Received: from debian.debian ([140.141.197.139]) by smtp.gmail.com with ESMTPSA id t4-20020a05620a004400b00767c8308329sm1160072qkt.25.2023.07.26.07.14.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 Jul 2023 07:14:58 -0700 (PDT) Date: Wed, 26 Jul 2023 07:14:56 -0700 From: Yan Zhai To: Dan Carpenter Cc: Yan Zhai , bpf@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Mykola Lysenko , Shuah Khan , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@cloudflare.com, Jordan Griege , Markus Elfring , Jakub Sitnicki Subject: Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > I'm not positive I understand the code in ip_finish_output2(). I think > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > instead? > I considered about changing lwt side logic. But it would bring larger impact since there are multiple types of encaps on this hook, not just bpf redirect. Changing bpf return values is a minimum change on the other hand. In addition, returning value of NET_RX_DROP and NET_XMIT_CN are the same, so if we don't do something in bpf redirect, there is no way to distinguish them later: the former is considered as an error, while "CN" is considered as non-error. > Also there seems to be a leak in lwtunnel_xmit(). Should that return > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? > > Something like the following? > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 11652e464f5d..375790b672bc 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); > #define NET_XMIT_CN 0x02 /* congestion notification */ > #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ > > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS > +#define LWTUNNEL_XMIT_CONTINUE 0x3 > + > /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It > * indicates that the device will soon be dropping packets, or already drops > * some packets of the same priority; prompting us to send less aggressively. */ > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h > index 6f15e6fa154e..8ab032ee04d0 100644 > --- a/include/net/lwtunnel.h > +++ b/include/net/lwtunnel.h > @@ -16,12 +16,6 @@ > #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) > #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) > > -enum { > - LWTUNNEL_XMIT_DONE, > - LWTUNNEL_XMIT_CONTINUE, > -}; > - > - > struct lwtunnel_state { > __u16 type; > __u16 flags; > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > index 711cd3b4347a..732415d1287d 100644 > --- a/net/core/lwtunnel.c > +++ b/net/core/lwtunnel.c > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) > > if (lwtstate->type == LWTUNNEL_ENCAP_NONE || > lwtstate->type > LWTUNNEL_ENCAP_MAX) > - return 0; > + return LWTUNNEL_XMIT_CONTINUE; You are correct this path would leak skb. Return continue (or drop) would avoid the leak. Personally I'd prefer drop instead to signal the error setup. Since this is a separate issue, do you want to send a separate patch on this? Or I am happy to do it if you prefer. > > ret = -EOPNOTSUPP; > rcu_read_lock(); > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 6e70839257f7..4be50a211b14 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > int res = lwtunnel_xmit(skb); > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > + if (res != LWTUNNEL_XMIT_CONTINUE) > return res; Unfortunately we cannot return res directly here when res > 0. This is the final reason why I didn't patch here. Return values here can be propagated back to sendmsg syscall, so returning a positive value would break the syscall convention. best, Yan > } > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index 1e8c90e97608..016b0a513259 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -113,7 +113,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > int res = lwtunnel_xmit(skb); > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > + if (res != LWTUNNEL_XMIT_CONTINUE) > return res; > } >