Received: by 2002:a05:6358:c692:b0:131:369:b2a3 with SMTP id fe18csp1221000rwb; Wed, 26 Jul 2023 09:11:58 -0700 (PDT) X-Google-Smtp-Source: APBJJlG7CoeDuql3EnTc5M/UthXaH7LHiiVKpVxEVybKosRccX99+U/hDL0y1zgjVJaXl5QhJagb X-Received: by 2002:a05:6a20:7f9e:b0:132:c7de:f71c with SMTP id d30-20020a056a207f9e00b00132c7def71cmr2761548pzj.59.1690387918288; Wed, 26 Jul 2023 09:11:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690387918; cv=none; d=google.com; s=arc-20160816; b=egd6+8yu0Mtg8FSu/+qGI/+27c30XyV2Ux7Fe+FKKO+VPnpun0BC8Qd6BkBh2F0MIw B58TQrz5xm03XSqONMskmfwaJ6Xxv2mDbvdkE65Y+/fXIgEp0Kty4LrjB1oVNu5LFYpk OykEy745XAQ8QDwTaRpMUvR0qqglS8PMP3w/8RkJXsIHdjkbdg4YjvDdjpA46pPk3pLE Wy0M01J5Yher1VCHBXIQEJ+9TrgzJoa2aHPVTw+IY599l3i+M8ekB9s9Lw+qU4A3v5vW q5lIhEj04Yl0VAbBQ4oFBjGOwZ4K1fGUq679Hx5jQnK21+ZLXw9eU5dlSP/Sqt/QZ7h/ teRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=7zM+YG+vsWbtMUD6XdWA2JUgtaWQiytdCsuMze1m6/k=; fh=H6UQKSAqGHjb54pZqph1tH1MP6c6rOoC25f6obiXGjY=; b=rZ9bpFtczW39felUn08hsnkDyjMlAowh19+EK4DrnGdNZ72IGE3U1xErpZpejtFWrp cRuIhDuQiLQ81vmu76Mr8htG5SeddWFJinJjea0A80Hz6PQr7B4aTHbSvKz/jhlaL92P isFiLLLhCxd5RO3TCrzs9pPPGA5HideVVjsxqmnHAUD5wITy5olydRLZv3bFkcaysvCW LqZsWV5vXkI4ud88WlvWuaKtJstoYaQrsAu3I9/IU8cmkhY4CD8yMgGc2vVe05T0W3nV R4z+ORUaGBbEWjs9rryKFYlqXixOMgCU2IOtoJRYYcsqN52kjrApjITZkRx/RSC+iCf7 cQEg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="xN+/DtpX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bv69-20020a632e48000000b0055bbc74626esi1015236pgb.217.2023.07.26.09.11.45; Wed, 26 Jul 2023 09:11:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="xN+/DtpX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232569AbjGZPBL (ORCPT + 99 others); Wed, 26 Jul 2023 11:01:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232709AbjGZPBK (ORCPT ); Wed, 26 Jul 2023 11:01:10 -0400 Received: from mail-wr1-x433.google.com (mail-wr1-x433.google.com [IPv6:2a00:1450:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AEFF212D for ; Wed, 26 Jul 2023 08:01:06 -0700 (PDT) Received: by mail-wr1-x433.google.com with SMTP id ffacd0b85a97d-317716a4622so1380049f8f.1 for ; Wed, 26 Jul 2023 08:01:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1690383665; x=1690988465; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=7zM+YG+vsWbtMUD6XdWA2JUgtaWQiytdCsuMze1m6/k=; b=xN+/DtpXH07sL2bRF9VUZv/uyejUHbIwAWXONp9i6bShF/IthrYdyTZLT21dJM/qr8 I5ew8CLX5YeZDjgGUd0yCTtM771EB4OKUZjDuYcLHEaWfVdyDLZGq/VJCmjTyislOOH7 /bdLpRikQFGNfjB9iskGIEUSrPpeNj2QJIV9AGMxkZ1jJVqVEtpT9G2eBVLNr+aJGdEj 1eTC3DrbPKi9DAuK/CaO6tyHt6309JGswVakuEOJv5B2n9LzhB8lGoll4rZFuKDZc+Jj 6zJ1a1mF6OY1s5ZHeY+fsVt7T9UdyA65pK+Zijy+e6VHka5DD7sHx0yTUV0npYpashkR 8RrA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690383665; x=1690988465; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=7zM+YG+vsWbtMUD6XdWA2JUgtaWQiytdCsuMze1m6/k=; b=GkYelm9fKU1GjpM8HZZh2iPWhfvOlmxgsFMi5sx2bujuZ0jYmfd+Z4hA7bCakOrNVh E6yC+frhRc8EpVD89IF8pAOignMdlryrfD63K2cp14iYeO806UXcWzBznHHeFM+M92aE O2PyLFRDWNm8gM+fPgEz3PG1v56yVjtr7E5G9c9KVmhRGSrwcUX6NFIjTelBAbpKp57M 7PII+it3Xf68rt3t3+myLRAe6PJKCpk/hlbYlc6omAX2HDw62wN2pENo1EG3TdHDTiBr QrbB0JjrnVFKm2jAE8cRVkzTiLweC/BpafPYTmOQHcUML7yXW2XbjymRh/9eyeLxSxq0 MU3A== X-Gm-Message-State: ABy/qLb1SORnMAdNRExqgm3QlhOrqojooiYT5LrfoFJU6tFpRnUM1rY7 gLtcCPLZQD4aH0ujrMafBFLOVw== X-Received: by 2002:adf:f786:0:b0:317:49e9:c57 with SMTP id q6-20020adff786000000b0031749e90c57mr1591410wrp.43.1690383664792; Wed, 26 Jul 2023 08:01:04 -0700 (PDT) Received: from localhost ([102.36.222.112]) by smtp.gmail.com with ESMTPSA id o20-20020a5d58d4000000b0031433443265sm20096799wrf.53.2023.07.26.08.01.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 26 Jul 2023 08:01:03 -0700 (PDT) Date: Wed, 26 Jul 2023 18:01:00 +0300 From: Dan Carpenter To: Yan Zhai Cc: bpf@vger.kernel.org, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , Mykola Lysenko , Shuah Khan , linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org, kernel-team@cloudflare.com, Jordan Griege , Markus Elfring , Jakub Sitnicki Subject: Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote: > On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > > I'm not positive I understand the code in ip_finish_output2(). I think > > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > > instead? > > > I considered about changing lwt side logic. But it would bring larger > impact since there are multiple types of encaps on this hook, not just > bpf redirect. Changing bpf return values is a minimum change on the > other hand. In addition, returning value of NET_RX_DROP and > NET_XMIT_CN are the same, so if we don't do something in bpf redirect, > there is no way to distinguish them later: the former is considered as > an error, while "CN" is considered as non-error. Uh, NET_RX/XMIT_DROP values are 1. NET_XMIT_CN is 2. I'm not an expert but I think what happens is that we treat NET_XMIT_CN as success so that it takes a while for the resend to happen. Eventually the TCP layer will detect it as a dropped packet. > > > Also there seems to be a leak in lwtunnel_xmit(). Should that return > > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? > > > > Something like the following? > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 11652e464f5d..375790b672bc 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); > > #define NET_XMIT_CN 0x02 /* congestion notification */ > > #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ > > > > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS > > +#define LWTUNNEL_XMIT_CONTINUE 0x3 > > + > > /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It > > * indicates that the device will soon be dropping packets, or already drops > > * some packets of the same priority; prompting us to send less aggressively. */ > > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h > > index 6f15e6fa154e..8ab032ee04d0 100644 > > --- a/include/net/lwtunnel.h > > +++ b/include/net/lwtunnel.h > > @@ -16,12 +16,6 @@ > > #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) > > #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) > > > > -enum { > > - LWTUNNEL_XMIT_DONE, > > - LWTUNNEL_XMIT_CONTINUE, > > -}; > > - > > - > > struct lwtunnel_state { > > __u16 type; > > __u16 flags; > > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > > index 711cd3b4347a..732415d1287d 100644 > > --- a/net/core/lwtunnel.c > > +++ b/net/core/lwtunnel.c > > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) > > > > if (lwtstate->type == LWTUNNEL_ENCAP_NONE || > > lwtstate->type > LWTUNNEL_ENCAP_MAX) > > - return 0; > > + return LWTUNNEL_XMIT_CONTINUE; > > You are correct this path would leak skb. Return continue (or drop) > would avoid the leak. Personally I'd prefer drop instead to signal the > error setup. Since this is a separate issue, do you want to send a > separate patch on this? Or I am happy to do it if you prefer. > I don't know which makes sense so I'll leave that up to you. > > > > ret = -EOPNOTSUPP; > > rcu_read_lock(); > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 6e70839257f7..4be50a211b14 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s > > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > > int res = lwtunnel_xmit(skb); > > > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > > + if (res != LWTUNNEL_XMIT_CONTINUE) > > return res; > > Unfortunately we cannot return res directly here when res > 0. This is > the final reason why I didn't patch here. Return values here can be > propagated back to sendmsg syscall, so returning a positive value > would break the syscall convention. The neigh_output() function is going to return NET_XMIT_DROP so this already happens. Is that not what we want to happen? I guess my concern is that eventually people will eventually new introduce bugs. Fixing incorrect error codes is something that I do several times per week. :P regards, dan carpenter