From: Joel Fernandes
Date: Thu, 18 Aug 2022 21:21:56 -0400
Subject: Re: [PATCH v3 resend 4/6] fs: Move call_rcu() to call_rcu_lazy() in some paths
To: LKML
Cc: Rushikesh S Kadam, "Uladzislau Rezki (Sony)", Neeraj upadhyay,
    Frederic Weisbecker, "Paul E. McKenney", Steven Rostedt, rcu
References: <20220809034517.3867176-1-joel@joelfernandes.org>
    <20220809034517.3867176-5-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 18, 2022 at 7:05 PM Joel Fernandes wrote:
>
> On Thu, Aug 18, 2022 at 1:23 PM Joel Fernandes wrote:
> >
> > [Sorry, adding back the CC list]
> >
> > On Mon, Aug 8, 2022 at 11:45 PM Joel Fernandes (Google)
> > wrote:
> > >
> > > This is required to prevent callbacks triggering RCU machinery too
> > > quickly and too often, which adds more power to the system.
> > >
> > > When testing, we found that these paths were invoked often when the
> > > system is not doing anything (screen is ON but otherwise idle).
> >
> > Unfortunately, I am seeing a slow down in ChromeOS boot performance
> > after applying this particular patch. It is the first time I could
> > test ChromeOS boot times with the series since it was hard to find a
> > ChromeOS device that runs the upstream kernel.
> >
> > Anyway, Vlad, Neeraj, do you guys also see slower boot times with this
> > patch? I wonder if the issue is with wake up interaction with the nocb
> > GP threads.
> >
> > We ought to disable lazy RCU during boot since it would have little
> > benefit anyway. But I am also concerned about some deeper problem I
> > did not catch before.
> >
> > I'll look into tracing the fs paths to see if I can narrow down what's
> > causing it. Will also try a newer kernel, I am currently testing on
> > 5.19-rc4.
>
> I got somewhere with this. It looks like queuing CBs as lazy CBs
> instead of normal CBs, are triggering expedited stalls during the boot
> process:
>
> [ 39.949198] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks: { } 28 jiffies s: 69 root: 0x0/.
>
> No idea how/why lazy RCU CBs would be related to expedited GP issues,
> but maybe something hangs and causes that side-effect.
>
> initcall_debug did not help, as it seems initcalls all work fine, and
> then 8 seconds after the boot, it starts slowing down a lot, followed
> by the RCU stall messages. As a next step I'll enable ftrace during
> the boot to see if I can get more insight. But I believe, its not the
> FS layer, the FS layer just triggers lazy CBs, but there is something
> wrong with the core lazy-RCU work itself.
>
> This kernel is 5.19-rc4. I'll also try to rebase ChromeOS on more
> recent kernels and debug.

More digging: thanks to the trace_event= boot option, I find that the
boot process does have some synchronous waits. Though these are
"non-lazy", for some reason the lazy CBs that were queued earlier are
making them wait for the *full* lazy duration. That points to a likely
bug in the lazy RCU logic.
These synchronous CBs should never be waiting like the lazy ones:

[ 17.715904] => trace_dump_stack
[ 17.715904] => __wait_rcu_gp
[ 17.715904] => synchronize_rcu
[ 17.715904] => selinux_netcache_avc_callback
[ 17.715904] => avc_ss_reset
[ 17.715904] => sel_write_enforce
[ 17.715904] => vfs_write
[ 17.715904] => ksys_write
[ 17.715904] => do_syscall_64
[ 17.715904] => entry_SYSCALL_64_after_hwframe

I'm tired so I'll resume the debug later.
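
To make the suspected failure mode concrete, here is a toy Python model of it. This is not kernel code and not how the lazy-RCU patches are implemented. The class `ToyCallbackQueue`, the flag `expedite_non_lazy`, and the constants `LAZY_DELAY` and `GRACE_PERIOD` are all hypothetical names and values picked for illustration. It only shows why a synchronous waiter such as the `synchronize_rcu` caller in the trace above would stall if a non-lazy callback inherited the flush deadline set by lazy callbacks queued before it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values, in jiffies; not taken from the kernel or the patch series.
LAZY_DELAY = 10_000   # how long lazy callbacks may sit before being flushed
GRACE_PERIOD = 3      # time from flush until callbacks are invoked


@dataclass
class ToyCallbackQueue:
    """Toy model of one callback queue with a deferred (lazy) flush deadline."""

    expedite_non_lazy: bool          # True: a non-lazy callback forces a flush
    now: int = 0                     # current time in jiffies
    flush_at: Optional[int] = None   # deadline at which the queue is flushed

    def advance(self, jiffies: int) -> None:
        """Move the model clock forward."""
        self.now += jiffies

    def enqueue(self, lazy: bool) -> None:
        """Queue one callback and update the flush deadline."""
        if self.flush_at is None:
            # Empty queue: a lazy callback arms the long timer,
            # a non-lazy one asks for an immediate flush.
            self.flush_at = self.now + (LAZY_DELAY if lazy else 0)
        elif not lazy and self.expedite_non_lazy:
            # Expected behaviour: a non-lazy callback pulls the deadline in.
            self.flush_at = min(self.flush_at, self.now)
        # Otherwise the existing (possibly lazy) deadline is kept. With
        # expedite_non_lazy=False this is the suspected bug: the non-lazy
        # callback waits out the lazy timer.

    def synchronous_wait(self) -> int:
        """Queue a non-lazy callback and return how long the caller blocks."""
        self.enqueue(lazy=False)
        return (self.flush_at - self.now) + GRACE_PERIOD
```

In this model, queuing a lazy callback, advancing five jiffies, and then calling `synchronous_wait()` blocks for `GRACE_PERIOD` jiffies when `expedite_non_lazy` is true, and for nearly the whole `LAZY_DELAY` when it is false. The second case is the shape of the boot slowdown described above.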