Date: Thu, 17 Nov 2022 15:58:23 +0000
From: Joel Fernandes
To: Eric Dumazet
Cc: linux-kernel@vger.kernel.org, Cong Wang, David Ahern, "David S.
Miller", Hideaki YOSHIFUJI, Jakub Kicinski, Jamal Hadi Salim, Jiri Pirko,
netdev@vger.kernel.org, Paolo Abeni, rcu@vger.kernel.org, rostedt@goodmis.org,
paulmck@kernel.org, fweisbec@gmail.com, jiejiang@google.com
Subject: Re: [PATCH rcu/dev 3/3] net: Use call_rcu_flush() for dst_destroy_rcu
References: <20221117031551.1142289-1-joel@joelfernandes.org>
 <20221117031551.1142289-3-joel@joelfernandes.org>

Hello Eric,

On Wed, Nov 16, 2022 at 07:44:41PM -0800, Eric Dumazet wrote:
> On Wed, Nov 16, 2022 at 7:16 PM Joel Fernandes (Google)
> wrote:
> >
> > In a networking test on ChromeOS, we find that using the new CONFIG_RCU_LAZY
> > causes a networking test to fail in the teardown phase.
> >
> > The failure happens during: ip netns del
>
> And ? What happens then next ?

The test is doing the 'ip netns del ' and then polling for the disappearance
of a network interface name for up to 5 seconds. I believe it is using netlink
to get a table of interfaces. That polling is timing out.

Here are some more details from the test's owner (copy pasting from another bug
report):

In the cleanup, we remove the netns, and thus will cause the veth pair being
removed automatically, so we use a poll to check that if the veth in the root
netns still exists to know whether the cleanup is done.
Here is a public link to the code that is failing (it's in golang):
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/network/virtualnet/env/env.go;drc=6c2841d6cc3eadd23e07912ec331943ee33d7de8;l=161

Here is a public link to the line of code in the actual test leading up to the
above path (this is the test that is run:
network.RoutingFallthrough.ipv4_only_primary):
https://source.chromium.org/chromiumos/chromiumos/codesearch/+/main:src/platform/tast-tests/src/chromiumos/tast/local/bundles/cros/network/routing_fallthrough.go;drc=8fbf2c53960bc8917a6a01fda5405cad7c17201e;l=52

> > Using ftrace, I found the callbacks it was queuing which this series fixes. Use
> > call_rcu_flush() to revert to the old behavior. With that, the test passes.
>
> What is this test about ? What barrier was used to make it not flaky ?

I provided the links above, let me know if you have any questions.

> Was it depending on some undocumented RCU behavior ?

This is a new RCU feature posted here for significant power-savings on
battery-powered devices:
https://lore.kernel.org/rcu/20221017140726.GG5600@paulmck-ThinkPad-P17-Gen-1/T/#m7a54809b8903b41538850194d67eb34f203c752a

There is also an LPC presentation about the same, I can dig up the link if you
are interested.

> Maybe adding a sysctl to force the flush would be better for functional tests ?
>
> I would rather change the test(s), than adding call_rcu_flush(),
> adding merge conflicts to future backports.

I am not too sure about that, I think a user might expect the network interface
to disappear from the networking tables quickly enough without dealing with
barriers or kernel internals. However, I added the author of the test to this
email in the hope that he can provide his point of view as well.

The general approach we are taking with this sort of thing is to use
call_rcu_flush(), which is basically the same as call_rcu() for systems with
CONFIG_RCU_LAZY=n.
You can see some examples of that in the patch series link above. Just to
note, CONFIG_RCU_LAZY depends on CONFIG_RCU_NOCB_CPU, so it's only Android and
ChromeOS that are using it. I am adding Jie to share any input; he is from the
networking team and knows this test well.

thanks,

 - Joel