Date: Thu, 30 Mar 2023 12:01:31 -0700
From: "Paul E. McKenney"
To: Joel Fernandes
Cc: "Zhang, Qiang1", Uladzislau Rezki, "Zhuo, Qiuxu", RCU,
	quic_neeraju@quicinc.com, Boqun Feng, LKML, Oleksiy Avramchenko,
	Steven Rostedt, Frederic Weisbecker
Subject: Re: [PATCH 1/1] Reduce synchronize_rcu() waiting time
Reply-To: paulmck@kernel.org
In-Reply-To: <20230330151115.GC2114899@google.com>
References: <2cd8f407-2b77-48b1-9f17-9aa8e4ce9c64@paulmck-laptop>
	<20230330151115.GC2114899@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 30, 2023 at 03:11:15PM +0000, Joel Fernandes wrote:
> On Tue, Mar 28, 2023 at 03:14:32PM -0700, Paul E. McKenney wrote:
> > On Tue, Mar 28, 2023 at 08:26:13AM -0700, Paul E. McKenney wrote:
> > > On Mon, Mar 27, 2023 at 10:29:31PM -0400, Joel Fernandes wrote:
> > > > Hello,
> > > >
> > > > > On Mar 27, 2023, at 9:06 PM, Paul E. McKenney wrote:
> > > > >
> > > > > On Mon, Mar 27, 2023 at 11:21:23AM +0000, Zhang, Qiang1 wrote:
> > > > >>>> From: Uladzislau Rezki (Sony)
> > > > >>>> Sent: Tuesday, March 21, 2023 6:28 PM
> > > > >>>> [...]
> > > > >>>> Subject: [PATCH 1/1] Reduce synchronize_rcu() waiting time
> > > > >>>>
> > > > >>>> A call to synchronize_rcu() can be expensive from a time point of
> > > > >>>> view. Different workloads can be affected by this, especially the
> > > > >>>> ones which use this API in their time-critical sections.
> > > > >>>>
> > > > >>>
> > > > >>> This is interesting and meaningful research. ;-)
> > > > >>>
> > > > >>>> For example, in the case of the NOCB scenario, the
> > > > >>>> wakeme_after_rcu() callback invocation depends on where in a
> > > > >>>> nocb-list it is located. Below is an example when it was the
> > > > >>>> last out of ~3600 callbacks:
> > > > >>>
> > > > >>
> > > > >> Can it be implemented separately as follows? It seems that the
> > > > >> code is simpler (only personal opinion). :-)
> > > > >>
> > > > >> But I didn't test whether this reduces synchronize_rcu() waiting time.
> > > > >>
> > > > >> +static void rcu_poll_wait_gp(struct rcu_tasks *rtp)
> > > > >> +{
> > > > >> +	unsigned long gp_snap;
> > > > >> +
> > > > >> +	gp_snap = start_poll_synchronize_rcu();
> > > > >> +	while (!poll_state_synchronize_rcu(gp_snap))
> > > > >> +		schedule_timeout_idle(1);
> > > > >
> > > > > I could be wrong, but my guess is that the guys working with
> > > > > battery-powered devices are not going to be very happy with this loop.
> > > > >
> > > > > All those wakeups by all tasks waiting for a grace period end up
> > > > > consuming a surprisingly large amount of energy.
> > > >
> > > > Is that really the common case? On the general topic of wake-ups:
> > > > most of the time there should be only one task waiting synchronously
> > > > on a GP to end. If that is true, then it feels like waking up nocb
> > > > kthreads, which indirectly wake other threads, is doing more work
> > > > than usual?
> > >
> > > A good question, and the number of outstanding synchronize_rcu()
> > > calls will of course be limited by the number of tasks in the system.
> > > But I myself have raised the ire of battery-powered embedded folks
> > > with a rather small number of wakeups, so...
> > >
> > > And on larger systems there can be a tradeoff between contention on
> > > the one hand and number of wakeups on the other.
> > >
> > > The original nocb implementation in fact had the grace-period kthread
> > > waking up all of what are now called rcuoc kthreads. The indirect
> > > scheme reduced the total number of wakeups by up to 50% and also
> > > reduced the CPU consumption of the grace-period kthread, which
> > > otherwise would have become a bottleneck on large systems.
> > >
> > > And also, a scheme that directly wakes tasks waiting in
> > > synchronize_rcu() might well use the same ->nocb_gp_wq[] waitqueues
> > > that are used by the rcuog kthreads, if that is what you were
> > > getting at.
> >
> > And on small systems, you might of course have the rcuog kthread
> > directly invoke callbacks if there are not very many of them. This
> > would of course need to be done quite carefully to avoid any number
> > of races with the rcuoc kthreads. You could do the same thing on a
> > large system, but on a per-rcuog basis.
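As an aside for readers following the thread: Zqiang's snippet above is
built on the mainline polling grace-period API
(start_poll_synchronize_rcu() and friends), which already lets a caller
overlap other work with a grace period instead of blocking in
synchronize_rcu(). A minimal sketch of the non-busy-waiting usage
pattern follows; the enclosing function is hypothetical, but the API
calls are the existing ones:

	#include <linux/rcupdate.h>

	/*
	 * Illustrative only: snapshot grace-period state, do other work,
	 * then block at most once for the remainder of the grace period.
	 */
	static void overlap_work_with_gp(void)
	{
		unsigned long gp_snap;

		/* Snapshot GP state and make sure a grace period is underway. */
		gp_snap = start_poll_synchronize_rcu();

		/* ... do other useful work while the grace period runs ... */

		/*
		 * No-op if the grace period has already elapsed; otherwise
		 * blocks once, instead of waking up repeatedly the way the
		 * schedule_timeout_idle() loop above does.
		 */
		cond_synchronize_rcu(gp_snap);
	}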
> > I vaguely recall discussing this in one of our sessions, but who knows?
> >
> > Would this really be of benefit? Or did you have something else in mind?
>
> Yes, this is what I was also referring to.
>
> Not sure about benefit, depends on workloads and measurement.

There are of course potential downsides, including slower handling of
callback floods and tuning difficulties due to there being no good way
thus far to estimate how much time a given RCU callback will consume.
But if there are significant benefits to battery-powered systems, then
enabling this sort of thing only on those systems could make a lot of
sense.

						Thanx, Paul

> thanks,
>
>  - Joel
>
> > Thanx, Paul
> >
> > > > I am curious to measure how much Vlad's patch reduces wakeups in
> > > > the common case.
> > >
> > > Sounds like a good thing to measure!
> > >
> > > > I was also wondering how Vlad's patch affects RCU-barrier ordering.
> > > > I guess we want the wakeup to happen in the order of the other
> > > > callbacks also waiting.
> > >
> > > OK, I will bite. Why would rcu_barrier() need to care about the
> > > synchronize_rcu() invocations if they no longer used call_rcu()?
> > >
> > > > One last note, most battery-powered systems are perhaps already
> > > > using expedited RCU ;-)
> > >
> > > Good point. And that does raise the question of exactly what
> > > workloads and systems want faster wakeups from synchronize_rcu()
> > > and cannot get this effect from expedited grace periods.
> > >
> > > Thanx, Paul
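For context on that last exchange: expedited grace periods need no new
API on the caller's side. A minimal sketch, in which the surrounding
function and struct are hypothetical while synchronize_rcu_expedited()
itself is the existing primitive:

	#include <linux/rculist.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		struct list_head list;
		/* ... payload ... */
	};

	/* Illustrative only: trade IPIs and energy for wakeup latency. */
	static void remove_and_free(struct foo *p)
	{
		list_del_rcu(&p->list);		/* readers may still hold references */
		synchronize_rcu_expedited();	/* fast, but IPIs non-idle CPUs */
		kfree(p);
	}

Booting with rcupdate.rcu_expedited=1 instead makes all normal grace
periods behave this way, which is presumably what Joel is alluding to
for battery-powered systems.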
> > > > Thoughts?
> > > >
> > > > - Joel
> > > >
> > > > > Thanx, Paul
> > > > >
> > > > >> +}
> > > > >> +
> > > > >> +void call_rcu_poll(struct rcu_head *rhp, rcu_callback_t func);
> > > > >> +DEFINE_RCU_TASKS(rcu_poll, rcu_poll_wait_gp, call_rcu_poll,
> > > > >> +	"RCU Poll");
> > > > >> +void call_rcu_poll(struct rcu_head *rhp, rcu_callback_t func)
> > > > >> +{
> > > > >> +	call_rcu_tasks_generic(rhp, func, &rcu_poll);
> > > > >> +}
> > > > >> +EXPORT_SYMBOL_GPL(call_rcu_poll);
> > > > >> +
> > > > >> +void synchronize_rcu_poll(void)
> > > > >> +{
> > > > >> +	synchronize_rcu_tasks_generic(&rcu_poll);
> > > > >> +}
> > > > >> +EXPORT_SYMBOL_GPL(synchronize_rcu_poll);
> > > > >> +
> > > > >> +static int __init rcu_spawn_poll_kthread(void)
> > > > >> +{
> > > > >> +	cblist_init_generic(&rcu_poll);
> > > > >> +	rcu_poll.gp_sleep = HZ / 10;
> > > > >> +	rcu_spawn_tasks_kthread_generic(&rcu_poll);
> > > > >> +	return 0;
> > > > >> +}
> > > > >>
> > > > >> Thanks
> > > > >> Zqiang
> > > > >>
> > > > >>
> > > > >>>>
> > > > >>>> <...>-29  [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
> > > > >>>> ...
> > > > >>>> <...>-29  [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
> > > > >>>> <...>-29  [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
> > > > >>>> <...>-29  [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
> > > > >>>> <...>-29  [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
> > > > >>>> <...>-29  [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
> > > > >>>> <...>-29  [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
> > > > >>>> <...>-29  [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
> > > > >>>>
> > > > >>>
> > > > >>> Did the results above tell us that CBs-invoked=3612 during the
> > > > >>> time 21950.145313 ~ 21950.152625?
> > > > >>>
> > > > >>> Yes.
> > > > >>>
> > > > >>> If possible, may I know the steps, commands, and related
> > > > >>> parameters used to produce the results above? Thank you!
> > > > >>>
> > > > >>> Build the kernel with the CONFIG_RCU_TRACE configuration. Update
> > > > >>> your "set_event" file with the appropriate trace events:
> > > > >>>
> > > > >>> XQ-DQ54:/sys/kernel/tracing # echo rcu:rcu_batch_start rcu:rcu_batch_end rcu:rcu_invoke_callback > set_event
> > > > >>> XQ-DQ54:/sys/kernel/tracing # cat set_event
> > > > >>> rcu:rcu_batch_start
> > > > >>> rcu:rcu_invoke_callback
> > > > >>> rcu:rcu_batch_end
> > > > >>> XQ-DQ54:/sys/kernel/tracing #
> > > > >>>
> > > > >>> Collect as many traces as you want:
> > > > >>>
> > > > >>> XQ-DQ54:/sys/kernel/tracing # echo 1 > tracing_on; sleep 10; echo 0 > tracing_on
> > > > >>>
> > > > >>> The next problem is how to parse them. Of course you will not be
> > > > >>> able to parse megabytes of traces by hand. For that purpose I use
> > > > >>> a special C trace parser. If you need an example, please let me
> > > > >>> know and I can show it here.
> > > > >>>
> > > > >>> --
> > > > >>> Uladzislau Rezki
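Uladzislau's parser is not shown in the thread, but a minimal sketch of
what such a tool might look like is below. It is illustrative only (his
actual parser is presumably more capable) and assumes the default ftrace
text layout visible in the excerpt above, where the timestamp is the
fourth whitespace-separated field:

	#include <stdio.h>
	#include <string.h>

	/*
	 * Read an ftrace text dump on stdin and, for each
	 * rcu_batch_start ... rcu_batch_end pair, print the number of
	 * rcu_invoke_callback events seen and the elapsed time.
	 */
	int main(void)
	{
		char line[1024];
		double start_ts = 0.0, ts;
		long invoked = 0;
		int in_batch = 0;

		while (fgets(line, sizeof(line), stdin)) {
			/* Timestamp is the fourth whitespace-separated field. */
			if (sscanf(line, "%*s %*s %*s %lf", &ts) != 1)
				continue;

			if (strstr(line, ": rcu_batch_start:")) {
				start_ts = ts;
				invoked = 0;
				in_batch = 1;
			} else if (in_batch && strstr(line, ": rcu_invoke_callback:")) {
				invoked++;
			} else if (in_batch && strstr(line, ": rcu_batch_end:")) {
				printf("batch: %ld callbacks in %.6f s\n",
				       invoked, ts - start_ts);
				in_batch = 0;
			}
		}
		return 0;
	}

Compiled and fed the raw trace (for example, gcc -O2 -o batchstat
batchstat.c && ./batchstat < trace, with "batchstat" being a made-up
name), this would print one line per batch, such as the 3612-callback,
~7.3 ms batch shown in the excerpt above.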