Date: Sun, 29 Mar 2020 20:56:37 -0400
From: Joel Fernandes
To: "Uladzislau Rezki (Sony)"
Cc: LKML, "Paul E . McKenney", RCU, Andrew Morton, Steven Rostedt, Oleksiy Avramchenko
Subject: Re: [PATCH 6/7] rcu/tiny: support reclaim for head-less object
Message-ID: <20200330005637.GA138004@google.com>
References: <20200323113621.12048-1-urezki@gmail.com> <20200323113621.12048-7-urezki@gmail.com>
In-Reply-To: <20200323113621.12048-7-urezki@gmail.com>

On Mon, Mar 23, 2020 at 12:36:20PM +0100, Uladzislau Rezki (Sony) wrote:
> Make a kvfree_call_rcu() function to support head-less
> freeing. Same as for tree-RCU, for such purpose we store
> pointers in array. SLAB and vmalloc ptrs. are mixed and
> coexist together.
>
> Under high memory pressure it can be that maintaining of
> arrays becomes impossible. Objects with an rcu_head are
> released via call_rcu(). When it comes to the head-less
> variant, the kvfree() call is directly inlined, i.e. we
> do the same as for tree-RCU:
>     a) wait until a grace period has elapsed;
>     b) direct inlining of the kvfree() call.
>
> Thus the current context has to follow might_sleep()
> annotation. Also please note that for tiny-RCU any
> call of synchronize_rcu() is actually a quiescent
> state, therefore (a) does nothing.

Hmm, Ok. So on -tiny, if there's any allocation failure ever, we immediately
revert to call_rcu() (or, for a head-less object, to synchronize_rcu()
followed by kvfree()). I guess we could also create a regular (non-array)
queue for objects with an rcu_head and queue it on that (since it does not
need allocation) in case of array allocation failure, however that may not be
worth it. So this LGTM. Thanks!

For entire series:
Reviewed-by: Joel Fernandes (Google)

(I will submit a follow-up to fix the tagging, please let me submit Vlad's
entire series with some patches on top -- I also did a bit of wordsmithing in
the commit messages of this series).

Loved the might_sleep() idea btw, I suppose if atomic context wants to do
kvfree_rcu(), then we could also have kfree_rcu() defer the kvfree_rcu() to
execute from a workqueue. Thoughts? We can then allow poor insomniacs to
call this API :)

thanks,

 - Joel

> Signed-off-by: Uladzislau Rezki (Sony)
> ---
>  kernel/rcu/tiny.c | 157 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 156 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
> index 508c82faa45c..b1c31a935db9 100644
> --- a/kernel/rcu/tiny.c
> +++ b/kernel/rcu/tiny.c
> @@ -40,6 +40,29 @@ static struct rcu_ctrlblk rcu_ctrlblk = {
>  	.curtail = &rcu_ctrlblk.rcucblist,
>  };
>
> +/* Can be common with tree-RCU. */
> +#define KVFREE_DRAIN_JIFFIES (HZ / 50)
> +
> +/* Can be common with tree-RCU. */
> +struct kvfree_rcu_bulk_data {
> +	unsigned long nr_records;
> +	struct kvfree_rcu_bulk_data *next;
> +	void *records[];
> +};
> +
> +/* Can be common with tree-RCU. */
> +#define KVFREE_BULK_MAX_ENTR \
> +	((PAGE_SIZE - sizeof(struct kvfree_rcu_bulk_data)) / sizeof(void *))
> +
> +static struct kvfree_rcu_bulk_data *kvhead;
> +static struct kvfree_rcu_bulk_data *kvhead_free;
> +static struct kvfree_rcu_bulk_data *kvcache;
> +
> +static DEFINE_STATIC_KEY_FALSE(rcu_init_done);
> +static struct delayed_work monitor_work;
> +static struct rcu_work rcu_work;
> +static bool monitor_todo;
> +
>  void rcu_barrier(void)
>  {
>  	wait_rcu_gp(call_rcu);
> @@ -177,9 +200,137 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
>  }
>  EXPORT_SYMBOL_GPL(call_rcu);
>
> +static inline bool
> +kvfree_call_rcu_add_ptr_to_bulk(void *ptr)
> +{
> +	struct kvfree_rcu_bulk_data *bnode;
> +
> +	if (!kvhead || kvhead->nr_records == KVFREE_BULK_MAX_ENTR) {
> +		bnode = xchg(&kvcache, NULL);
> +		if (!bnode)
> +			bnode = (struct kvfree_rcu_bulk_data *)
> +				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> +
> +		if (unlikely(!bnode))
> +			return false;
> +
> +		/* Initialize the new block. */
> +		bnode->nr_records = 0;
> +		bnode->next = kvhead;
> +
> +		/* Attach it to the bvhead. */
> +		kvhead = bnode;
> +	}
> +
> +	/* Done. */
> +	kvhead->records[kvhead->nr_records++] = ptr;
> +	return true;
> +}
> +
> +static void
> +kvfree_rcu_work(struct work_struct *work)
> +{
> +	struct kvfree_rcu_bulk_data *kvhead_tofree, *next;
> +	unsigned long flags;
> +	int i;
> +
> +	local_irq_save(flags);
> +	kvhead_tofree = kvhead_free;
> +	kvhead_free = NULL;
> +	local_irq_restore(flags);
> +
> +	/* Reclaim process. */
> +	for (; kvhead_tofree; kvhead_tofree = next) {
> +		next = kvhead_tofree->next;
> +
> +		for (i = 0; i < kvhead_tofree->nr_records; i++) {
> +			debug_rcu_head_unqueue((struct rcu_head *)
> +				kvhead_tofree->records[i]);
> +			kvfree(kvhead_tofree->records[i]);
> +		}
> +
> +		if (cmpxchg(&kvcache, NULL, kvhead_tofree))
> +			free_page((unsigned long) kvhead_tofree);
> +	}
> +}
> +
> +static inline bool
> +queue_kvfree_rcu_work(void)
> +{
> +	/* Check if the free channel is available. */
> +	if (kvhead_free)
> +		return false;
> +
> +	kvhead_free = kvhead;
> +	kvhead = NULL;
> +
> +	/*
> +	 * Queue the job for memory reclaim after GP.
> +	 */
> +	queue_rcu_work(system_wq, &rcu_work);
> +	return true;
> +}
> +
> +static void kvfree_rcu_monitor(struct work_struct *work)
> +{
> +	unsigned long flags;
> +	bool queued;
> +
> +	local_irq_save(flags);
> +	queued = queue_kvfree_rcu_work();
> +	if (queued)
> +		/* Success. */
> +		monitor_todo = false;
> +	local_irq_restore(flags);
> +
> +	/*
> +	 * If previous RCU reclaim process is still in progress,
> +	 * schedule the work one more time to try again later.
> +	 */
> +	if (monitor_todo)
> +		schedule_delayed_work(&monitor_work,
> +			KVFREE_DRAIN_JIFFIES);
> +}
> +
>  void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
>  {
> -	call_rcu(head, func);
> +	unsigned long flags;
> +	bool success;
> +	void *ptr;
> +
> +	if (head) {
> +		ptr = (void *) head - (unsigned long) func;
> +	} else {
> +		might_sleep();
> +		ptr = (void *) func;
> +	}
> +
> +	if (debug_rcu_head_queue(ptr)) {
> +		/* Probable double free, just leak. */
> +		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
> +			__func__, head);
> +		return;
> +	}
> +
> +	local_irq_save(flags);
> +	success = kvfree_call_rcu_add_ptr_to_bulk(ptr);
> +	if (static_branch_likely(&rcu_init_done)) {
> +		if (success && !monitor_todo) {
> +			monitor_todo = true;
> +			schedule_delayed_work(&monitor_work,
> +				KVFREE_DRAIN_JIFFIES);
> +		}
> +	}
> +	local_irq_restore(flags);
> +
> +	if (!success) {
> +		if (!head) {
> +			synchronize_rcu();
> +			kvfree(ptr);
> +		} else {
> +			call_rcu(head, func);
> +		}
> +	}
>  }
>  EXPORT_SYMBOL_GPL(kvfree_call_rcu);
>
> @@ -188,4 +339,8 @@ void __init rcu_init(void)
>  	open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
>  	rcu_early_boot_tests();
>  	srcu_init();
> +
> +	INIT_DELAYED_WORK(&monitor_work, kvfree_rcu_monitor);
> +	INIT_RCU_WORK(&rcu_work, kvfree_rcu_work);
> +	static_branch_enable(&rcu_init_done);
>  }
> --
> 2.20.1
>
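
For anyone following the allocation-failure discussion above, the block
chaining and the head-less fallback can be modelled in user space. This is
only a rough sketch, not the patch's code: it assumes malloc()/free() in
place of __get_free_page()/kvfree(), a fixed 4096-byte MODEL_PAGE_SIZE, and
it leaves out the one-page kvcache, the IRQ masking, the delayed monitor work
and the grace-period wait itself. Every name in it (bulk_block,
add_ptr_to_bulk, model_kvfree_headless, drain_all, fail_alloc, nr_fallbacks)
is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: stand-in for PAGE_SIZE, which is per-architecture in the kernel. */
#define MODEL_PAGE_SIZE 4096

/* Same layout as struct kvfree_rcu_bulk_data in the quoted patch. */
struct bulk_block {
	unsigned long nr_records;
	struct bulk_block *next;
	void *records[];
};

/* Number of pointers that fit in one page-sized block after the header. */
#define BULK_MAX_ENTR \
	((MODEL_PAGE_SIZE - sizeof(struct bulk_block)) / sizeof(void *))

static struct bulk_block *head_block;	/* models kvhead */
static bool fail_alloc;			/* knob: simulate memory pressure */
static unsigned long nr_fallbacks;	/* how often the fallback path ran */

/* Models __get_free_page(GFP_NOWAIT | __GFP_NOWARN): it is allowed to fail. */
static struct bulk_block *alloc_block(void)
{
	return fail_alloc ? NULL : malloc(MODEL_PAGE_SIZE);
}

/* Follows kvfree_call_rcu_add_ptr_to_bulk(), minus the kvcache lookup. */
static bool add_ptr_to_bulk(void *ptr)
{
	if (!head_block || head_block->nr_records == BULK_MAX_ENTR) {
		struct bulk_block *bnode = alloc_block();

		if (!bnode)
			return false;

		/* Initialize the new block and make it the list head. */
		bnode->nr_records = 0;
		bnode->next = head_block;
		head_block = bnode;
	}

	head_block->records[head_block->nr_records++] = ptr;
	return true;
}

/* Models the tail of kvfree_call_rcu() for a head-less pointer. */
static void model_kvfree_headless(void *ptr)
{
	if (add_ptr_to_bulk(ptr))
		return;

	/* Kernel side: synchronize_rcu(); kvfree(ptr). The wait is elided here. */
	nr_fallbacks++;
	free(ptr);
}

/* Models kvfree_rcu_work(): free every recorded pointer, then each block. */
static unsigned long drain_all(void)
{
	struct bulk_block *blk = head_block, *next;
	unsigned long freed = 0;

	head_block = NULL;
	for (; blk; blk = next) {
		next = blk->next;
		for (unsigned long i = 0; i < blk->nr_records; i++) {
			free(blk->records[i]);
			freed++;
		}
		free(blk);
	}
	return freed;
}
```

With fail_alloc set and no partially filled block available, a call to
model_kvfree_headless() cannot record the pointer and frees it directly,
which corresponds to the synchronize_rcu() + kvfree() branch in the patch.
An object with an rcu_head would go to call_rcu() at that point instead.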