References: <20221117031551.1142289-1-joel@joelfernandes.org> <20221117031551.1142289-3-joel@joelfernandes.org>
From: Eric Dumazet
Date: Thu, 17 Nov 2022 09:17:20 -0800
Subject: Re: [PATCH rcu/dev 3/3] net: Use call_rcu_flush() for dst_destroy_rcu
To: Joel Fernandes
Cc: linux-kernel@vger.kernel.org, Cong Wang, David Ahern, "David S. Miller", Hideaki YOSHIFUJI, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko, netdev@vger.kernel.org, Paolo Abeni, rcu@vger.kernel.org, rostedt@goodmis.org, paulmck@kernel.org, fweisbec@gmail.com, jiejiang@google.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 17, 2022 at 7:58 AM Joel Fernandes wrote:
>
> Hello Eric,
>
> On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> > On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> > wrote:
> > >
> > > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > > causes a networking test to fail in the teardown phase.
> > >
> > > The failure happens during: ip netns del
> >
> > And ? What happens then next ?
>
> The test is doing the 'ip netns del ' and then polling for the
> disappearance of a network interface name for up to 5 seconds. I believe it is
> using netlink to get a table of interfaces. That polling is timing out.
>
> Here are some more details from the test's owner (copy pasting from another
> bug report):
> In the cleanup, we remove the netns, and thus will cause the veth pair being
> removed automatically, so we use a poll to check that if the veth in the root
> netns still exists to know whether the cleanup is done.
>
> Here is a public link to the code that is failing (it's in golang):
> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161
>
> Here is a public link to the line of code in the actual test leading up to the above
> path (this is the test that is run:
> network.RoutingFallthrough.ipv4_only_primary):
> https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52
>
> > > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > > call_rcu_flush() to revert to the old behavior. With that, the test passes.
> >
> > What is this test about ? What barrier was used to make it not flaky ?
>
> I provided the links above, let me know if you have any questions.
>
> > Was it depending on some undocumented RCU behavior ?
>
> This is a new RCU feature posted here for significant power-savings on
> battery-powered devices:
> https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a
>
> There is also an LPC presentation about the same, I can dig the link if you
> are interested.
>
> > Maybe adding a sysctl to force the flush would be better for functional tests ?
> >
> > I would rather change the test(s), than adding call_rcu_flush(),
> > adding merge conflicts to future backports.
>
> I am not too sure about that, I think a user might expect the network
> interface to disappear from the networking tables quickly enough without
> dealing with barriers or kernel internals. However, I added the authors of the
> test to this email in the hopes he can provide his point of view as well.
>
> The general approach we are taking with this sort of thing is to use
> call_rcu_flush() which is basically the same as call_rcu() for systems with
> CALL_RCU_LAZY=n. You can see some examples of that in the patch series link
> above. Just to note, CALL_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU so it's only
> Android and ChromeOS that are using it. I am adding Jie to share any input,
> he is from the networking team and knows this test well.
>
>

I do not know what is this RCU_LAZY thing, but IMO this should be opt-in.

For instance, only kfree_rcu() should use it.

We cannot review hundreds of call_rcu() call sites and decide if
adding arbitrary delays could hurt.
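
For readers following the thread, the teardown check described by the test owner (poll for up to 5 seconds until the veth name is gone from the root netns) can be sketched roughly as below. This is an illustrative Python sketch, not the actual Go test code linked above. The function name `wait_for_interface_gone` and the `list_names` parameter are hypothetical, and the sketch reads the interface list through `socket.if_nameindex()` rather than through netlink.

```python
import socket
import time


def wait_for_interface_gone(ifname, timeout=5.0, interval=0.1, list_names=None):
    """Poll until `ifname` is no longer listed, or until `timeout` seconds pass.

    Returns True if the interface disappeared in time, False on timeout.
    `list_names` is a hypothetical hook (for illustration and testing) that
    returns the set of current interface names.
    """
    if list_names is None:
        # Default: ask the OS for (index, name) pairs of current interfaces.
        def list_names():
            return {name for _, name in socket.if_nameindex()}

    deadline = time.monotonic() + timeout
    while True:
        if ifname not in list_names():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A check of this shape only passes if the kernel finishes tearing down the interface within the timeout, which is why, as reported in the thread, deferring the relevant RCU callbacks can turn it into a timeout.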