Date: Sat, 4 Apr 2020 15:51:29 -0400
From: Joel Fernandes
To: "Uladzislau Rezki (Sony)"
Cc: LKML, "Paul E. McKenney", RCU, linux-mm@kvack.org, Andrew Morton,
	Steven Rostedt, Oleksiy Avramchenko
Subject: Re: [PATCH 1/1] rcu/tree: add emergency pool for headless case
Message-ID: <20200404195129.GA83565@google.com>
References: <20200403173051.4081-1-urezki@gmail.com>
In-Reply-To: <20200403173051.4081-1-urezki@gmail.com>

On Fri, Apr 03, 2020 at 07:30:51PM +0200, Uladzislau Rezki (Sony) wrote:
> Maintain an emergency pool for each CPU with some extra objects. There is
> a read-only sysfs attribute named "rcu_nr_emergency_objs" that reflects
> the size of the pool; for now the default value is 3.
>
> The pool is populated when a low-memory condition is detected. Please note
> it is only for the headless case: when the regular SLAB allocator is not
> able to serve a request, the pool is used instead.
>
> Signed-off-by: Uladzislau Rezki (Sony)

Hi Vlad,

One concern I have is that this moves the problem a bit further down. My
belief is that we should avoid the likelihood of even needing an rcu_head
allocated for the headless case to begin with, rather than trying to do
damage control when it does happen. The only way we would end up needing
an rcu_head is if we could not allocate an array.
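As a refresher, that fallback relies on allocating one extra word in front
of a dynamically-allocated rcu_head and storing a back-pointer to the
object there, so the free path can recover the object later. Below is a
minimal userspace sketch of just that layout; all names are made up for
illustration, the real code is in the quoted patch further down.

/*
 * Sketch of the "headless" back-pointer layout:
 *
 *   [ back-pointer word ][ rcu_head ]
 *                          ^-- pointer handed to the RCU machinery
 */
#include <stdlib.h>

struct fake_rcu_head {
	void *next;
	void *func;
};

/* Allocate the back-pointer word plus the head; return the head part. */
static struct fake_rcu_head *attach_head(void *obj)
{
	unsigned long *rho = malloc(sizeof(unsigned long) +
				    sizeof(struct fake_rcu_head));

	if (!rho)
		return NULL;

	rho[0] = (unsigned long) obj;	/* back-pointer to the object */
	return (struct fake_rcu_head *) (rho + 1);
}

/* Free path: step back one word to find the object, then free both. */
static void free_headless(struct fake_rcu_head *head)
{
	unsigned long *rho = (unsigned long *) head - 1;

	free((void *) rho[0]);	/* the object itself */
	free(rho);		/* the back-pointer word plus the head */
}

int main(void)
{
	void *obj = malloc(64);
	struct fake_rcu_head *head = obj ? attach_head(obj) : NULL;

	if (head)
		free_headless(head);
	else
		free(obj);
	return 0;
}

Every headless free that misses the bulk array needs one such allocation,
which is exactly the case I think we should be making rarer.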
So instead of adding a pool for rcu_head allocations, how do you feel about
pre-allocating the per-cpu cache array instead, which has the same effect
as you are intending? This has 4 benefits:

1. It scales with the number of CPUs; no configuration is needed.
2. It makes the first kfree_rcu() faster and less dependent on an
   allocation succeeding.
3. Much simpler code, no new structures or special handling.
4. In the future we can extend it to allocate more than 2 pages per CPU
   using the same caching mechanism.

The obvious drawback is that it costs 2 pages per CPU, but at least it
scales with the number of CPUs.

Something like the following (just lightly tested):

---8<-----------------------

From: "Joel Fernandes (Google)"
Subject: [PATCH] rcu/tree: Preallocate the per-cpu cache for kfree_rcu()

In recent changes, we have made it possible to use kfree_rcu() without
embedding an rcu_head in the object being free'd. This requires dynamic
allocation. In case dynamic allocation fails due to memory pressure, we
would end up synchronously waiting for an RCU grace period, thus hurting
kfree_rcu() latency.

To make this less probable, let us pre-allocate the per-cpu cache so we
depend less on the dynamic allocation succeeding. This also has the
effect of making kfree_rcu() slightly faster at run time.

Signed-off-by: Joel Fernandes (Google)
---
 kernel/rcu/tree.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 6172e6296dd7d..9fbdeb4048425 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4251,6 +4251,11 @@ static void __init kfree_rcu_batch_init(void)
 			krcp->krw_arr[i].krcp = krcp;
 		}
 
+		krcp->bkvcache[0] = (struct kvfree_rcu_bulk_data *)
+			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+		krcp->bkvcache[1] = (struct kvfree_rcu_bulk_data *)
+			__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
 		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
 		krcp->initialized = true;
 	}
-- 
2.26.0.292.g33ef6b2f38-goog

> ---
>  kernel/rcu/tree.c | 133 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 97 insertions(+), 36 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 5e26145e9ead..f9f1f935ab0b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -114,6 +114,14 @@ int rcu_num_lvls __read_mostly = RCU_NUM_LVLS;
>  int rcu_kfree_nowarn;
>  module_param(rcu_kfree_nowarn, int, 0444);
>
> +/*
> + * For the headless variant. Under memory pressure an
> + * emergency pool can be used if the regular SLAB
> + * allocator is not able to serve some memory for us.
> + */
> +int rcu_nr_emergency_objs = 3;
> +module_param(rcu_nr_emergency_objs, int, 0444);
> +
>  /* Number of rcu_nodes at specified level. */
>  int num_rcu_lvl[] = NUM_RCU_LVL_INIT;
>  int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */
> @@ -2877,6 +2885,12 @@ struct kfree_rcu_cpu {
>  	bool initialized;
>  	// Number of objects for which GP not started
>  	int count;
> +
> +	/*
> +	 * Reserved emergency pool for the headless variant.
> +	 */
> +	int nr_emergency;
> +	void **emergency;
>  };
>
>  static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
> @@ -2892,6 +2906,27 @@ debug_rcu_bhead_unqueue(struct kvfree_rcu_bulk_data *bhead)
>  #endif
>  }
>
> +static inline struct kfree_rcu_cpu *
> +krc_this_cpu_lock(unsigned long *flags)
> +{
> +	struct kfree_rcu_cpu *krcp;
> +
> +	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
> +	krcp = this_cpu_ptr(&krc);
> +	if (likely(krcp->initialized))
> +		spin_lock(&krcp->lock);
> +
> +	return krcp;
> +}
> +
> +static inline void
> +krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
> +{
> +	if (likely(krcp->initialized))
> +		spin_unlock(&krcp->lock);
> +	local_irq_restore(flags);
> +}
> +
>  /*
>   * This function is invoked in workqueue context after a grace period.
>   * It frees all the objects queued on ->bhead_free or ->head_free.
> @@ -2974,6 +3009,7 @@ static void kfree_rcu_work(struct work_struct *work)
>  	 */
>  	for (; head; head = next) {
>  		unsigned long offset = (unsigned long)head->func;
> +		unsigned long flags;
>  		bool headless;
>  		void *ptr;
>
> @@ -2991,10 +3027,23 @@ static void kfree_rcu_work(struct work_struct *work)
>  		trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
>
>  		if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset))) {
> -			if (headless)
> +			if (headless) {
>  				kvfree((void *) *((unsigned long *) ptr));
>
> -			kvfree(ptr);
> +				krcp = krc_this_cpu_lock(&flags);
> +				if (krcp->emergency) {
> +					if (krcp->nr_emergency < rcu_nr_emergency_objs) {
> +						krcp->emergency[krcp->nr_emergency++] = ptr;
> +
> +						/* Bypass freeing of it, it is in the emergency pool. */
> +						ptr = NULL;
> +					}
> +				}
> +				krc_this_cpu_unlock(krcp, flags);
> +			}
> +
> +			if (ptr)
> +				kvfree(ptr);
>  		}
>
>  		rcu_lock_release(&rcu_callback_map);
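Also, just to check my reading of the refill path above: stripped of the
kernel specifics, the emergency pool is a small fixed-size LIFO guarded by
the per-CPU lock, roughly along these lines. This is a standalone sketch
with a plain mutex standing in for krc_this_cpu_lock() and all names
invented; it is not the patch's code:

/*
 * Sketch of the emergency-pool pattern: a fixed-size stack of reserved
 * objects, popped when the allocator fails and refilled from the free
 * path once the objects are no longer needed.
 */
#include <pthread.h>
#include <stdlib.h>

#define POOL_CAPACITY 3			/* default of rcu_nr_emergency_objs */

struct emergency_pool {
	pthread_mutex_t lock;		/* stands in for the per-CPU lock */
	int nr;				/* slots currently occupied */
	void *slots[POOL_CAPACITY];
};

/* Free path: donate obj to the pool if there is room; returns 1 if taken. */
int pool_refill(struct emergency_pool *p, void *obj)
{
	int taken = 0;

	pthread_mutex_lock(&p->lock);
	if (p->nr < POOL_CAPACITY) {
		p->slots[p->nr++] = obj;
		taken = 1;
	}
	pthread_mutex_unlock(&p->lock);
	return taken;
}

/* Allocation path: pop a reserved object, or NULL if the pool is empty. */
void *pool_pop(struct emergency_pool *p)
{
	void *obj = NULL;

	pthread_mutex_lock(&p->lock);
	if (p->nr > 0) {
		obj = p->slots[--p->nr];
		p->slots[p->nr] = NULL;	/* same hygiene as the patch */
	}
	pthread_mutex_unlock(&p->lock);
	return obj;
}

/* Overall flow of the patch: try the allocator first, then the pool. */
void *alloc_with_emergency(struct emergency_pool *p, size_t size)
{
	void *obj = malloc(size);

	return obj ? obj : pool_pop(p);
}

If that reading is right, the pool never grows beyond rcu_nr_emergency_objs
entries and refilling is strictly opportunistic, which still leaves the
first headless free on a cold pool exposed.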
> @@ -3144,40 +3193,26 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
>  	return true;
>  }
>
> -static inline struct rcu_head *attach_rcu_head_to_object(void *obj)
> +static inline struct rcu_head *
> +set_ptr_in_rcu_head_obj(void *ptr, unsigned long *rho)
> +{
> +	rho[0] = (unsigned long) ptr;
> +	return ((struct rcu_head *) ++rho);
> +}
> +
> +static inline struct rcu_head *
> +alloc_rcu_head_obj(void *ptr)
>  {
> -	unsigned long *ptr;
> +	unsigned long *rho;
>
>  	/* Try hard to get the memory. */
> -	ptr = kmalloc(sizeof(unsigned long *) +
> +	rho = kmalloc(sizeof(unsigned long *) +
>  		sizeof(struct rcu_head), GFP_KERNEL |
>  			__GFP_ATOMIC | __GFP_HIGH | __GFP_RETRY_MAYFAIL);
> -	if (!ptr)
> +	if (!rho)
>  		return NULL;
>
> -	ptr[0] = (unsigned long) obj;
> -	return ((struct rcu_head *) ++ptr);
> -}
> -
> -static inline struct kfree_rcu_cpu *
> -krc_this_cpu_lock(unsigned long *flags)
> -{
> -	struct kfree_rcu_cpu *krcp;
> -
> -	local_irq_save(*flags);	// For safely calling this_cpu_ptr().
> -	krcp = this_cpu_ptr(&krc);
> -	if (likely(krcp->initialized))
> -		spin_lock(&krcp->lock);
> -
> -	return krcp;
> -}
> -
> -static inline void
> -krc_this_cpu_unlock(struct kfree_rcu_cpu *krcp, unsigned long flags)
> -{
> -	if (likely(krcp->initialized))
> -		spin_unlock(&krcp->lock);
> -	local_irq_restore(flags);
> +	return set_ptr_in_rcu_head_obj(ptr, rho);
>  }
>
>  /*
> @@ -3237,15 +3272,31 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  	if (!success) {
>  		/* Is headless object? */
>  		if (head == NULL) {
> -			/* Drop the lock. */
> +			/*
> +			 * Drop the lock to use more permissive
> +			 * parameters, after that take it back.
> +			 */
>  			krc_this_cpu_unlock(krcp, flags);
> +			head = alloc_rcu_head_obj(ptr);
> +			krcp = krc_this_cpu_lock(&flags);
>
> -			head = attach_rcu_head_to_object(ptr);
> -			if (head == NULL)
> -				goto inline_return;
> +			/*
> +			 * Use the emergency pool if it still fails.
> +			 */
> +			if (head == NULL) {
> +				if (!krcp->nr_emergency)
> +					goto unlock_return;
>
> -			/* Take it back. */
> -			krcp = krc_this_cpu_lock(&flags);
> +				head = set_ptr_in_rcu_head_obj(ptr,
> +					krcp->emergency[--krcp->nr_emergency]);
> +
> +				/*
> +				 * We do not strictly need to do this, but
> +				 * just in case set the pulled slot to NULL
> +				 * to avoid surprises.
> +				 */
> +				krcp->emergency[krcp->nr_emergency] = NULL;
> +			}
>
>  			/*
>  			 * Tag the headless object. Such objects have a back-pointer
> @@ -3282,7 +3333,6 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  unlock_return:
>  	krc_this_cpu_unlock(krcp, flags);
>
> -inline_return:
>  	/*
>  	 * High memory pressure, so inline kvfree() after
>  	 * synchronize_rcu(). We can do it from might_sleep()
> @@ -4272,6 +4322,17 @@ static void __init kfree_rcu_batch_init(void)
>  		}
>
>  		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
> +
> +		/*
> +		 * The pool will be populated when a low-memory condition
> +		 * is detected. Therefore we do not fill it in here.
> +		 */
> +		krcp->emergency = kmalloc_array(rcu_nr_emergency_objs,
> +			sizeof(void *), GFP_NOWAIT);
> +
> +		if (!krcp->emergency)
> +			pr_err("Failed to create emergency pool for CPU %d!\n", cpu);
> +
>  		krcp->initialized = true;
>  	}
>  	if (register_shrinker(&kfree_rcu_shrinker))
> --
> 2.20.1