Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp468655ybl; Fri, 9 Aug 2019 08:40:30 -0700 (PDT) X-Google-Smtp-Source: APXvYqy1CNKizsz1NeVj92s8CzOrwT+GNzLYSR/oFZW2LoPmZfSZsbAkEIMMbE5shoBTMmv9Quhn X-Received: by 2002:a17:902:ba8f:: with SMTP id k15mr14590776pls.38.1565365230886; Fri, 09 Aug 2019 08:40:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1565365230; cv=none; d=google.com; s=arc-20160816; b=0fn8a0/MvkMX4cyR7dRzbLWnh0Wz63BVj4GR4TW5z14ylxodaJhhMzWW6UqLume/Y5 fHPqysqcjoFooaaXxba08uHIDKIHOju1k4/X+5YNOm82IPXSjmnrwMks4XVxo5fGyNrw 7U1bu2JsRhyisWKaHe17eS295LNFX2SNxbzr7HuXy9+58Tb//U1EKpPOgteltO0HCEko SEMc/bC0lzTBEh/wdp0WY9x2lc/vkaGcJRcQO4AFfBNxS5ttqj9JVfc9uaEwXlcDrurq cvpY+DbbXB/zoMq6ABtLXpSZi0H7VXuKWP9cGaHd2eq9reAKzvQrTQf9uQksTiDJRTwe 1xpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=BWGiOMo6Oo+hB+6Iqb/+xexv7iVbAnlBrvAjyVUrnGU=; b=YapIbciS1xPh+mAdLp3467+6Ylc5DWBGLPkjx+jM7L2eqBToMSY8+8lf1HNDF/3YZ+ XzdlU3KV9npI3oTKJZWjO+1r9dlU5y90DZmF//OcrnfgMzauux/xXbsvWc/BkxHgu7ks YgZ/3CLs9U5RfkDxG5s6oSUoVHfi90DlmaBayBopDVh+2yHb3+dr+pjbB3cR51T6H7F1 vznagfLb3BHHDHwyts6tqjdEnCLMLa6xt7wU6BGzofOSEmkOPdm5XAntRMzQ5bTEjQxg xYB7rEahXvPAoF4BQkNlN5zpLdSg2gNDBvvgbue2MViwxNbnvXpMdOPZRV5rjfwd1j+9 Hhfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b="UiDW//jf"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i10si59992578pfr.130.2019.08.09.08.40.14; Fri, 09 Aug 2019 08:40:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@joelfernandes.org header.s=google header.b="UiDW//jf"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2436741AbfHIPj2 (ORCPT + 99 others); Fri, 9 Aug 2019 11:39:28 -0400 Received: from mail-pf1-f193.google.com ([209.85.210.193]:33708 "EHLO mail-pf1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2436687AbfHIPj1 (ORCPT ); Fri, 9 Aug 2019 11:39:27 -0400 Received: by mail-pf1-f193.google.com with SMTP id g2so46258466pfq.0 for ; Fri, 09 Aug 2019 08:39:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=BWGiOMo6Oo+hB+6Iqb/+xexv7iVbAnlBrvAjyVUrnGU=; b=UiDW//jfXDbHnMeP/t661vQybdTVh1f0WdabueMcgF7vjn5qW7cTxVBRLNWm5Bzfb5 kVfWGccBIQESBP72iCU2FIuZK2LVU6/kezBpWaIRpUJTqKacCYrzDfXv0ghuItToJIM8 3TQ9GmA9ufThdnSTzJegdrPD7S2m/heaMLMTE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=BWGiOMo6Oo+hB+6Iqb/+xexv7iVbAnlBrvAjyVUrnGU=; b=SQHMkVPXhWwSuluoe7ISqeQXsCIFAaBj/l2cE1Frbg3V5lD+e6YjYdyuCMCAzfGmFA /KfG3Eik9GPS1hxLH7eBiq+xZmejUsSe78J9rQxbSpZqcgEmQMkRtocAlsp0kTMLDHXR YIo9BWolccyo7rTOUYs7IaXfMgvJGAu2mesXCni7ucL03LF5FrSxDC6UXvmkTfOepivv xIkW5rwFjaMeinYQq+fxvWWerrGIvA0oDvzuPkztcG1oR4P9i0sBGIWTlGaFGX5Rdg+9 R7nGMol73BoRc/uda8UodwI+bpGtV9S4eLh7FiDFRbKKgkaD7K5FKxQNu/v+ZDukh/Fx lekw== X-Gm-Message-State: APjAAAWIYkkbNA6WIcQFvJUbiRmDBpsyKYOPuDISDh39bUd8ie2let9S KmMZoYBmpBE/t9tB4DustdOuog== X-Received: by 2002:a17:90a:8a15:: with SMTP id w21mr10084347pjn.134.1565365166702; Fri, 09 Aug 2019 08:39:26 -0700 (PDT) Received: from localhost ([2620:15c:6:12:9c46:e0da:efbf:69cc]) by smtp.gmail.com with ESMTPSA id p2sm139275661pfb.118.2019.08.09.08.39.25 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Fri, 09 Aug 2019 08:39:25 -0700 (PDT) Date: Fri, 9 Aug 2019 11:39:24 -0400 From: Joel Fernandes To: "Paul E. McKenney" Cc: Byungchul Park , linux-kernel@vger.kernel.org, Rao Shoaib , max.byungchul.park@gmail.com, kernel-team@android.com, kernel-team@lge.com, Davidlohr Bueso , Josh Triplett , Lai Jiangshan , Mathieu Desnoyers , rcu@vger.kernel.org, Steven Rostedt Subject: Re: [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching Message-ID: <20190809153924.GB211412@google.com> References: <20190806212041.118146-1-joel@joelfernandes.org> <20190806235631.GU28441@linux.ibm.com> <20190807094504.GB169551@google.com> <20190807175215.GE28441@linux.ibm.com> <20190808095232.GA30401@X58A-UD3R> <20190808125607.GB261256@google.com> <20190808233014.GA184373@google.com> <20190809151619.GD28441@linux.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190809151619.GD28441@linux.ibm.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 09, 2019 at 08:16:19AM -0700, Paul E. McKenney wrote: > On Thu, Aug 08, 2019 at 07:30:14PM -0400, Joel Fernandes wrote: [snip] > > > But I could make it something like: > > > 1. Letting ->head grow if ->head_free busy > > > 2. If head_free is busy, then just queue/requeue the monitor to try again. > > > > > > This would even improve performance, but will still risk going out of memory. > > > > It seems I can indeed hit an out of memory condition once I changed it to > > "letting list grow" (diff is below which applies on top of this patch) while > > at the same time removing the schedule_timeout(2) and replacing it with > > cond_resched() in the rcuperf test. I think the reason is the rcuperf test > > starves the worker threads that are executing in workqueue context after a > > grace period and those are unable to get enough CPU time to kfree things fast > > enough. But I am not fully sure about it and need to test/trace more to > > figure out why this is happening. > > > > If I add back the schedule_uninterruptibe_timeout(2) call, the out of memory > > situation goes away. > > > > Clearly we need to do more work on this patch. > > > > In the regular kfree_rcu_no_batch() case, I don't hit this issue. I believe > > that since the kfree is happening in softirq context in the _no_batch() case, > > it fares better. The question then I guess is how do we run the rcu_work in a > > higher priority context so it is not starved and runs often enough. I'll > > trace more. > > > > Perhaps I can also lower the priority of the rcuperf threads to give the > > worker thread some more room to run and see if anything changes. But I am not > > sure then if we're preparing the code for the real world with such > > modifications. > > > > Any thoughts? > > Several! With luck, perhaps some are useful. ;-) > > o Increase the memory via kvm.sh "--memory 1G" or more. The > default is "--memory 500M". Thanks, this definitely helped. > o Leave a CPU free to run things like the RCU grace-period kthread. > You might also need to bind that kthread to that CPU. > > o Alternatively, use the "rcutree.kthread_prio=" boot parameter to > boost the RCU kthreads to real-time priority. This won't do > anything for ksoftirqd, though. I will try these as well. > > o Along with the above boot parameter, use "rcutree.use_softirq=0" > to cause RCU to use kthreads instead of softirq. (You might well > find issues in priority setting as well, but might as well find > them now if so!) Doesn't think one actually reduce the priority of the core RCU work? softirq will always have higher priority than any there. So wouldn't that have the effect of not reclaiming things fast enough? (Or, in my case not scheduling the rcu_work which does the reclaim). > o With any of the above, invoke rcu_momentary_dyntick_idle() along > with cond_resched() in your kfree_rcu() loop. This simulates > a trip to userspace for nohz_full CPUs, so if this helps for > non-nohz_full CPUs, adjustments to the kernel might be called for. Ok, will try it. Save these bullet points for future reference! ;-) thanks, - Joel > > Probably others, but this should do for a start. > > Thanx, Paul > > > thanks, > > > > - Joel > > > > ---8<----------------------- > > > > >From 098d62e5a1b84a11139236c9b1f59e7f32289b40 Mon Sep 17 00:00:00 2001 > > From: "Joel Fernandes (Google)" > > Date: Thu, 8 Aug 2019 16:29:58 -0400 > > Subject: [PATCH] Let list grow > > > > Signed-off-by: Joel Fernandes (Google) > > --- > > kernel/rcu/rcuperf.c | 2 +- > > kernel/rcu/tree.c | 52 +++++++++++++++++++------------------------- > > 2 files changed, 23 insertions(+), 31 deletions(-) > > > > diff --git a/kernel/rcu/rcuperf.c b/kernel/rcu/rcuperf.c > > index 34658760da5e..7dc831db89ae 100644 > > --- a/kernel/rcu/rcuperf.c > > +++ b/kernel/rcu/rcuperf.c > > @@ -654,7 +654,7 @@ kfree_perf_thread(void *arg) > > } > > } > > > > - schedule_timeout_uninterruptible(2); > > + cond_resched(); > > } while (!torture_must_stop() && ++l < kfree_loops); > > > > kfree(alloc_ptrs); > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index bdbd483606ce..bab77220d8ac 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -2595,7 +2595,7 @@ EXPORT_SYMBOL_GPL(call_rcu); > > > > > > /* Maximum number of jiffies to wait before draining batch */ > > -#define KFREE_DRAIN_JIFFIES 50 > > +#define KFREE_DRAIN_JIFFIES (HZ / 20) > > > > /* > > * Maximum number of kfree(s) to batch, if limit is hit > > @@ -2684,27 +2684,19 @@ static void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krc, > > { > > struct rcu_head *head, *next; > > > > - /* It is time to do bulk reclaim after grace period */ > > - krc->monitor_todo = false; > > + /* It is time to do bulk reclaim after grace period. */ > > if (queue_kfree_rcu_work(krc)) { > > spin_unlock_irqrestore(&krc->lock, flags); > > return; > > } > > > > - /* > > - * Use non-batch regular call_rcu for kfree_rcu in case things are too > > - * busy and batching of kfree_rcu could not be used. > > - */ > > - head = krc->head; > > - krc->head = NULL; > > - krc->kfree_batch_len = 0; > > - spin_unlock_irqrestore(&krc->lock, flags); > > - > > - for (; head; head = next) { > > - next = head->next; > > - head->next = NULL; > > - __call_rcu(head, head->func, -1, 1); > > + /* Previous batch did not get free yet, let us try again soon. */ > > + if (krc->monitor_todo == false) { > > + schedule_delayed_work_on(smp_processor_id(), > > + &krc->monitor_work, KFREE_DRAIN_JIFFIES/4); > > + krc->monitor_todo = true; > > } > > + spin_unlock_irqrestore(&krc->lock, flags); > > } > > > > /* > > -- > > 2.23.0.rc1.153.gdeed80330f-goog > > >