Date: Fri, 11 Nov 2022 01:56:27 +0000
From: Joel Fernandes
To: Uladzislau Rezki
Cc: linux-kernel@vger.kernel.org, paulmck@kernel.org, rcu@vger.kernel.org
Subject: Re: [PATCH v2] rcu/kfree: Do not request RCU when not needed
References: <20221109024758.2644936-1-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 10, 2022 at 03:01:30PM +0100, Uladzislau Rezki wrote:
> > Hi,
> >
> > On Thu, Nov 10, 2022 at 8:05 AM Uladzislau Rezki wrote:
> > >
> > > > On ChromeOS, using this
> > > > with the increased timeout, we see that we almost always
> > > > never need to initiate a new grace period. Testing also shows this frees
> > > > large amounts of unreclaimed memory, under intense kfree_rcu() pressure.
> > > >
> > > > Signed-off-by: Joel Fernandes (Google)
> > > > ---
> > > > v1->v2: Same logic but use polled grace periods instead of sampling gp_seq.
> > > >
> > > >  kernel/rcu/tree.c | 8 +++++++-
> > > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 591187b6352e..ed41243f7a49 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -2935,6 +2935,7 @@ struct kfree_rcu_cpu_work {
> > > >
> > > >  /**
> > > >   * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
> > > > + * @gp_snap: The GP snapshot recorded at the last scheduling of monitor work.
> > > >   * @head: List of kfree_rcu() objects not yet waiting for a grace period
> > > >   * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
> > > >   * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
> > > > @@ -2964,6 +2965,7 @@ struct kfree_rcu_cpu {
> > > >     struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
> > > >     raw_spinlock_t lock;
> > > >     struct delayed_work monitor_work;
> > > > +   unsigned long gp_snap;
> > > >     bool initialized;
> > > >     int count;
> > > >
> > > > @@ -3167,6 +3169,7 @@ schedule_delayed_monitor_work(struct kfree_rcu_cpu *krcp)
> > > >         mod_delayed_work(system_wq, &krcp->monitor_work, delay);
> > > >         return;
> > > >     }
> > > > +   krcp->gp_snap = get_state_synchronize_rcu();
> > > >     queue_delayed_work(system_wq, &krcp->monitor_work, delay);
> > > >  }
> > >
> > > How do you guarantee a full grace period for objects which proceed
> > > to be placed into an input stream that is not yet detached?
> >
> > Just replying from phone as I'm OOO today.
> > > > Hmm, so you’re saying that objects can be queued after the delayed work has > > been queued, but processed when the delayed work is run? I’m looking at > > this code after few years so I may have missed something. > > > > That’s a good point and I think I missed that. I think your version did too > > but I’ll have to double check. > > > > The fix then is to sample the clock for the latest object queued, not for > > when the delayed work is queued. > > > The patch i sent gurantee it. Just in case see v2: You are right and thank you! CBs can be queued while the monitor timer is in progress. So we need to sample unconditionally. I think my approach is still better since I take advantage of multiple seconds (I update snapshot on every CB queue monitor and sample in the monitor handler). Whereas your patch is snapshotting before queuing the regular work and when the work is executed (This is a much shorter duration and I bet you would be blocking in cond_synchronize..() more often). As you pointed, I was sampling too late, and should be fixed below. Thoughts? ---8<----------------------- From: "Joel Fernandes (Google)" Subject: [PATCH v3] rcu/kfree: Do not request RCU when not needed On ChromeOS, using this with the increased timeout, we see that we almost always never need to initiate a new grace period. Testing also shows this frees large amounts of unreclaimed memory, under intense kfree_rcu() pressure. Signed-off-by: Joel Fernandes (Google) --- kernel/rcu/tree.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 591187b6352e..499e6ab56fbf 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2935,6 +2935,7 @@ struct kfree_rcu_cpu_work { /** * struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period + * @gp_snap: The GP snapshot recorded at the last scheduling of monitor work. 
* @head: List of kfree_rcu() objects not yet waiting for a grace period * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period * @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period @@ -2964,6 +2965,7 @@ struct kfree_rcu_cpu { struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES]; raw_spinlock_t lock; struct delayed_work monitor_work; + unsigned long gp_snap; bool initialized; int count; @@ -3217,7 +3219,10 @@ static void kfree_rcu_monitor(struct work_struct *work) // be that the work is in the pending state when // channels have been detached following by each // other. - queue_rcu_work(system_wq, &krwp->rcu_work); + if (poll_state_synchronize_rcu(krcp->gp_snap)) + queue_work(system_wq, &krwp->rcu_work.work); + else + queue_rcu_work(system_wq, &krwp->rcu_work); } } @@ -3409,6 +3414,9 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) WRITE_ONCE(krcp->count, krcp->count + 1); + // Snapshot the GP clock for the latest callback. + krcp->gp_snap = get_state_synchronize_rcu(); + // Set timer to drain after KFREE_DRAIN_JIFFIES. if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING) schedule_delayed_monitor_work(krcp); -- 2.38.1.493.g58b659f92b-goog