From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
To: linux-kernel@vger.kernel.org
Cc: "Joel Fernandes (Google)", Rao Shoaib, max.byungchul.park@gmail.com,
    byungchul.park@lge.com, kernel-team@android.com, kernel-team@lge.com,
    Davidlohr Bueso, Josh Triplett, Lai Jiangshan, Mathieu Desnoyers,
    "Paul E. McKenney", rcu@vger.kernel.org, Steven Rostedt
Subject: [PATCH RFC v1 1/2] rcu/tree: Add basic support for kfree_rcu batching
Date: Tue, 6 Aug 2019 17:20:40 -0400
Message-Id: <20190806212041.118146-1-joel@joelfernandes.org>

Recently, a discussion about the performance of a system involving a
high rate of kfree_rcu() calls surfaced on the list [1], which led to
another discussion about how to prepare for this situation.

This patch adds basic batching support for kfree_rcu(). It is "basic"
because it does none of the slab management, dynamic allocation, or
code moving that some previous attempts did [2]. Those fancier
improvements can come as follow-up patches, and several ideas in that
area are being experimented with. This is an effort to start simple
and build up from there.

Torture tests follow in the next patch and show an improvement of
around 13% with continuous flooding of kfree_rcu() calls on a 16-CPU
system.

[1] http://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org
[2] https://lkml.org/lkml/2017/12/19/824
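For reference, the caller pattern whose flood rate this batching
targets is sketched below (the struct and function names here are
hypothetical, purely for illustration; kfree_rcu() expands into a call
to kfree_call_rcu(), which this patch reroutes into a per-CPU batch):

	struct foo {
		int data;
		struct rcu_head rh;	/* needed by kfree_rcu() */
	};

	static void release_foo(struct foo *fp)
	{
		/* Defer the kfree() of fp until a grace period elapses. */
		kfree_rcu(fp, rh);
	}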
Cc: Rao Shoaib
Cc: max.byungchul.park@gmail.com
Cc: byungchul.park@lge.com
Cc: kernel-team@android.com
Cc: kernel-team@lge.com
Co-developed-by: Byungchul Park
Signed-off-by: Byungchul Park
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/rcu/tree.c | 198 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 193 insertions(+), 5 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a14e5fbbea46..bdbd483606ce 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2593,19 +2593,194 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
+
+/* Maximum number of jiffies to wait before draining the batch */
+#define KFREE_DRAIN_JIFFIES 50
+
+/*
+ * Maximum number of kfree(s) to batch; if this limit is hit,
+ * the RCU work is queued right away.
+ */
+#define KFREE_MAX_BATCH 200000ULL
+
+struct kfree_rcu_cpu {
+	/* The work done to free objects after a GP */
+	struct rcu_work rcu_work;
+
+	/* The list of objects being queued */
+	struct rcu_head *head;
+	int kfree_batch_len;
+
+	/* The list of objects pending free */
+	struct rcu_head *head_free;
+
+	/* Protect concurrent access to this structure */
+	spinlock_t lock;
+
+	/* The work done to monitor whether objects need freeing */
+	struct delayed_work monitor_work;
+	bool monitor_todo;
+};
+
+static DEFINE_PER_CPU(struct kfree_rcu_cpu, krc);
+
+/* Free all heads after a grace period (worker function) */
+static void kfree_rcu_work(struct work_struct *work)
+{
+	unsigned long flags;
+	struct rcu_head *head, *next;
+	struct kfree_rcu_cpu *krc = container_of(to_rcu_work(work),
+					struct kfree_rcu_cpu, rcu_work);
+
+	spin_lock_irqsave(&krc->lock, flags);
+	head = krc->head_free;
+	krc->head_free = NULL;
+	spin_unlock_irqrestore(&krc->lock, flags);
+
+	/* The head must be detached and not referenced from anywhere */
+	for (; head; head = next) {
+		next = head->next;
+		head->next = NULL;
+		/* It may be possible to optimize with kfree_bulk() in the future */
+		__rcu_reclaim(rcu_state.name, head);
+	}
+}
+
+/*
+ * Schedule the kfree batch RCU work to run after a GP.
+ *
+ * Either the batch reached its maximum size or the monitor's
+ * timer expired; either way, schedule the batch work.
+ */
+static bool queue_kfree_rcu_work(struct kfree_rcu_cpu *krc)
+{
+	lockdep_assert_held(&krc->lock);
+
+	/*
+	 * Someone already drained, probably before the monitor's worker
+	 * thread ran. Just return to avoid useless work.
+	 */
+	if (!krc->head)
+		return true;
+
+	/*
+	 * If the RCU batch work is already in progress, we cannot queue
+	 * another one; just refuse the optimization.
+	 */
+	if (krc->head_free)
+		return false;
+
+	krc->head_free = krc->head;
+	krc->head = NULL;
+	krc->kfree_batch_len = 0;
+	INIT_RCU_WORK(&krc->rcu_work, kfree_rcu_work);
+	queue_rcu_work(system_wq, &krc->rcu_work);
+
+	return true;
+}
+
+static void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krc,
+				   unsigned long flags)
+{
+	struct rcu_head *head, *next;
+
+	/* It is time to do bulk reclaim after the grace period */
+	krc->monitor_todo = false;
+	if (queue_kfree_rcu_work(krc)) {
+		spin_unlock_irqrestore(&krc->lock, flags);
+		return;
+	}
+
+	/*
+	 * Fall back to the non-batched, regular call_rcu() in case things
+	 * are too busy and batching of kfree_rcu() could not be used.
+	 */
+	head = krc->head;
+	krc->head = NULL;
+	krc->kfree_batch_len = 0;
+	spin_unlock_irqrestore(&krc->lock, flags);
+
+	for (; head; head = next) {
+		next = head->next;
+		head->next = NULL;
+		__call_rcu(head, head->func, -1, 1);
+	}
+}
+
+/*
+ * If enough time has passed, the kfree batch has to be drained
+ * and the monitor takes care of that.
+ */
+static void kfree_rcu_monitor(struct work_struct *work)
+{
+	bool todo;
+	unsigned long flags;
+	struct kfree_rcu_cpu *krc = container_of(work, struct kfree_rcu_cpu,
+						 monitor_work.work);
+
+	/* It is time to do bulk reclaim after the grace period */
+	spin_lock_irqsave(&krc->lock, flags);
+	todo = krc->monitor_todo;
+	krc->monitor_todo = false;
+	if (todo)
+		kfree_rcu_drain_unlock(krc, flags);
+	else
+		spin_unlock_irqrestore(&krc->lock, flags);
+}
+
+static void kfree_rcu_batch(struct rcu_head *head, rcu_callback_t func)
+{
+	unsigned long flags;
+	struct kfree_rcu_cpu *this_krc;
+	bool monitor_todo;
+
+	local_irq_save(flags);
+	this_krc = this_cpu_ptr(&krc);
+
+	spin_lock(&this_krc->lock);
+
+	/* Queue the kfree but don't yet schedule the batch */
+	head->func = func;
+	head->next = this_krc->head;
+	this_krc->head = head;
+	this_krc->kfree_batch_len++;
+
+	if (this_krc->kfree_batch_len == KFREE_MAX_BATCH) {
+		kfree_rcu_drain_unlock(this_krc, flags);
+		return;
+	}
+
+	/* Maximum has not been reached, schedule monitor for timely drain */
+	monitor_todo = this_krc->monitor_todo;
+	this_krc->monitor_todo = true;
+	spin_unlock(&this_krc->lock);
+
+	if (!monitor_todo) {
+		schedule_delayed_work_on(smp_processor_id(),
+				&this_krc->monitor_work, KFREE_DRAIN_JIFFIES);
+	}
+	local_irq_restore(flags);
+}
+
 /*
  * Queue an RCU callback for lazy invocation after a grace period.
- * This will likely be later named something like "call_rcu_lazy()",
- * but this change will require some way of tagging the lazy RCU
- * callbacks in the list of pending callbacks. Until then, this
- * function may only be called from __kfree_rcu().
  */
 void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-	__call_rcu(head, func, -1, 1);
+	kfree_rcu_batch(head, func);
 }
 EXPORT_SYMBOL_GPL(kfree_call_rcu);
 
+/*
+ * The version of kfree_call_rcu() that does not do batching of kfree_rcu()
+ * requests. To be used only for performance-testing comparisons with
+ * kfree_rcu_batch().
+ */
+void kfree_call_rcu_nobatch(struct rcu_head *head, rcu_callback_t func)
+{
+	__call_rcu(head, func, -1, 1);
+}
+
 /*
  * During early boot, any blocking grace-period wait automatically
  * implies a grace period. Later on, this is never the case for PREEMPT.
@@ -3452,6 +3627,17 @@ static void __init rcu_dump_rcu_node_tree(void)
 	pr_cont("\n");
 }
 
+void kfree_rcu_batch_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
+
+		INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
+	}
+}
+
 struct workqueue_struct *rcu_gp_wq;
 struct workqueue_struct *rcu_par_gp_wq;
 
@@ -3459,6 +3645,8 @@ void __init rcu_init(void)
 {
 	int cpu;
 
+	kfree_rcu_batch_init();
+
 	rcu_early_boot_tests();
 
 	rcu_bootup_announce();
-- 
2.22.0.770.g0f2c4a37fd-goog
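To summarize for reviewers, the path a single rcu_head takes with this
patch applied (assuming one CPU, no contention, and a batch shorter
than KFREE_MAX_BATCH):

	kfree_rcu(fp, rh)
	  -> kfree_call_rcu() -> kfree_rcu_batch():
	     link the head onto krc->head; if no monitor is pending,
	     schedule kfree_rcu_monitor() to run KFREE_DRAIN_JIFFIES later
	KFREE_DRAIN_JIFFIES later:
	  kfree_rcu_monitor() -> kfree_rcu_drain_unlock()
	    -> queue_kfree_rcu_work():
	       move krc->head to krc->head_free and call queue_rcu_work(),
	       which waits for a grace period
	after the grace period:
	  kfree_rcu_work(): free each detached head via __rcu_reclaim()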