From: Eric Dumazet
Date: Thu, 17 Nov 2022 13:29:19 -0800
Subject: Re: [PATCH rcu/dev 3/3] net: Use call_rcu_flush() for dst_destroy_rcu
To: Joel Fernandes
Cc: paulmck@kernel.org, linux-kernel@vger.kernel.org, Cong Wang, David Ahern, "David S.
Miller", Hideaki YOSHIFUJI, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, netdev@vger.kernel.org, Paolo Abeni, rcu@vger.kernel.org, rostedt@goodmis.org, fweisbec@gmail.com, jiejiang@google.com, Thomas Gleixner
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 17, 2022 at 1:16 PM Joel Fernandes wrote:
>
> > On Nov 17, 2022, at 2:29 PM, Paul E. McKenney wrote:
> >
> > On Thu, Nov 17, 2022 at 05:40:40PM +0000, Joel Fernandes wrote:
> >>> On Thu, Nov 17, 2022 at 5:38 PM Joel Fernandes wrote:
> >>>
> >>> On Thu, Nov 17, 2022 at 5:17 PM Eric Dumazet wrote:
> >>>>
> >>>> On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes wrote:
> >>>>>
> >>>>> Hello Eric,
> >>>>>
> >>>>> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> >>>>>> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> >>>>>>> causes a networking test to fail in the teardown phase.
> >>>>>>>
> >>>>>>> The failure happens during: ip netns del
> >>>>>>
> >>>>>> And ? What happens then next ?
> >>>>>
> >>>>> The test is doing the 'ip netns del ' and then polling for the
> >>>>> disappearance of a network interface name for up to 5 seconds. I believe it is
> >>>>> using netlink to get a table of interfaces. That polling is timing out.
> >>>>>
> >>>>> Here are some more details from the test's owner (copy pasting from another
> >>>>> bug report):
> >>>>> In the cleanup, we remove the netns, and thus will cause the veth pair being
> >>>>> removed automatically, so we use a poll to check that if the veth in the root
> >>>>> netns still exists to know whether the cleanup is done.
> >>>>>
> >>>>> Here is a public link to the code that is failing (it's in golang):
> >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
> >>>>>
> >>>>> Here is a public link to the line of code in the actual test leading up to the above
> >>>>> path (this is the test that is run:
> >>>>> network.RoutingFallthrough.ipv4_only_primary) :
> >>>>> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
> >>>>>
> >>>>>>> Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> >>>>>>> call_rcu_flush() to revert to the old behavior. With that, the test passes.
> >>>>>>
> >>>>>> What is this test about ? What barrier was used to make it not flaky ?
> >>>>>
> >>>>> I provided the links above, let me know if you have any questions.
> >>>>>
> >>>>>> Was it depending on some undocumented RCU behavior ?
> >>>>>
> >>>>> This is a new RCU feature posted here for significant power-savings on
> >>>>> battery-powered devices:
> >>>>> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
> >>>>>
> >>>>> There is also an LPC presentation about the same, I can dig the link if you
> >>>>> are interested.
> >>>>>
> >>>>>> Maybe adding a sysctl to force the flush would be better for functional tests ?
> >>>>>>
> >>>>>> I would rather change the test(s), than adding call_rcu_flush(),
> >>>>>> adding merge conflicts to future backports.
> >>>>>
> >>>>> I am not too sure about that, I think a user might expect the network
> >>>>> interface to disappear from the networking tables quickly enough without
> >>>>> dealing with barriers or kernel internals. However, I added the authors of the
> >>>>> test to this email in the hopes he can provide his point of view as well.
> >>>>>
> >>>>> The general approach we are taking with this sort of thing is to use
> >>>>> call_rcu_flush() which is basically the same as call_rcu() for systems with
> >>>>> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> >>>>> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so it's only
> >>>>> Android and ChromeOS that are using it. I am adding Jie to share any input,
> >>>>> he is from the networking team and knows this test well.
> >>>>>
> >>>>>
> >>>>
> >>>> I do not know what is this RCU_LAZY thing, but IMO this should be opt-in
> >>>
> >>> You should read the links I sent you. We did already try opt-in,
> >>> Thomas Gleixner made a point at LPC that we should not add new APIs
> >>> for this purpose and confuse kernel developers.
> >>>
> >>>> For instance, only kfree_rcu() should use it.
> >>>
> >>> No. Most of the call_rcu() usages are for freeing memory, so the
> >>> consensus is we should apply this as opt out and fix issues along the
> >>> way. We already did a lot of research/diligence on seeing which users
> >>> need conversion.
> >>>
> >>>> We can not review hundreds of call_rcu() call sites and decide if
> >>>> adding arbitrary delays could hurt.
> >>>
> >>> That work has already been done as much as possible, please read the
> >>> links I sent.
> >>
> >> Also just to add, this test is a bit weird / corner case, as in anyone
> >> expecting a quick response from call_rcu() is broken by design.
> >> However, for these callbacks, it does not matter much which API they
> >> use as they are quite infrequent for power savings.
> >
> > The "broken by design" is a bit strong. Some of those call_rcu()
> > invocations have been around for the better part of 20 years, after all.
> >
> > That aside, I do hope that we can arrive at something that will enhance
> > battery lifetime while avoiding unnecessary disruption. But we are
> > unlikely to be able to completely avoid disruption. As this email
> > thread illustrates. ;-)
>
> Another approach, with these 3 patches could be to keep the call_rcu() but add an rcu_barrier() after them. I think people running ip netns del should not have to wait for their RCU cb to take too long to run and remove user visible state. But I would need suggestions from networking experts which CBs of these 3, to do this for. Or for all of them.
>
> Alternatively, we can also patch just the test with a new knob that does rcu_barrier. But I dislike that as it does not fix it for all users. Probably the ip utilities will also need a patch then.
>

Normally we already have an rcu_barrier() in the netns dismantle path, at a
strategic location (in cleanup_net()).

Maybe the issue here is that some particular layers need another one.
Or we need to release a blocking reference before the call_rcu().
Some call_rcu() usages might not be optimal in this respect.

We should not add an rcu_barrier() after each call_rcu(); we prefer
factoring these expensive operations.