From: Kalesh Singh <kaleshsingh@google.com>
Date: Fri, 1 Apr 2022 15:57:40 -0700
Subject: [PATCH] RCU: Move expedited grace period (GP) work to RT kthread_worker
Message-Id: <20220401225740.1984689-1-kaleshsingh@google.com>
Cc: surenb@google.com, kernel-team@android.com, Kalesh Singh,
    "Paul E. McKenney", Tejun Heo, Tim Murray, Wei Wang, Kyle Lin,
    Chunwei Lu, Lulu Wang, Frederic Weisbecker, Neeraj Upadhyay,
    Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan,
    Joel Fernandes, rcu@vger.kernel.org, linux-kernel@vger.kernel.org

It was noticed that enabling CONFIG_RCU_BOOST did not help RCU
performance because the workqueues that run expedited GP work are
subject to scheduling delays. This patch moves the expedited GP work
items to an RT kthread_worker.

The results were evaluated on arm64 Android devices (6GB RAM) running
a 5.10 kernel, capturing trace data during critical user journeys. The
table below compares the time synchronize_rcu_expedited() is blocked:

 -------------------------------------------------------------------
 |                   |  Using WQ  | Using kthread_worker |  Diff   |
 -------------------------------------------------------------------
 | Max duration (ns) | 372766967  | 2329671              | -99.38% |
 -------------------------------------------------------------------
 | Avg duration (ns) | 2746353.16 | 151242.311           | -94.49% |
 -------------------------------------------------------------------

Cc: "Paul E. McKenney"
Cc: Tejun Heo
Reported-by: Tim Murray
Reported-by: Wei Wang
Tested-by: Kyle Lin
Tested-by: Chunwei Lu
Tested-by: Lulu Wang
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
 kernel/rcu/rcu.h      |  3 ++-
 kernel/rcu/tree.c     | 41 +++++++++++++++++++++++++++++++++++++----
 kernel/rcu/tree.h     |  3 ++-
 kernel/rcu/tree_exp.h | 35 +++++++++++++++--------------------
 4 files changed, 56 insertions(+), 26 deletions(-)
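
A note for reviewers (not part of the patch; git-am ignores text in
this region): the change relies on the kthread_worker API from
<linux/kthread.h>. Below is a minimal, self-contained sketch of the
same create / raise-to-RT / queue / flush pattern. The names
my_worker, my_work, my_work_fn, and the priority value are
hypothetical; the kthread_*() and scheduler calls are the real APIs.
The sketch assumes built-in (non-modular) code, since
sched_setscheduler_nocheck() is not exported to modules.

#include <linux/err.h>
#include <linux/init.h>
#include <linux/kthread.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <uapi/linux/sched/types.h>

static struct kthread_worker *my_worker;	/* hypothetical */
static struct kthread_work my_work;		/* hypothetical */

static void my_work_fn(struct kthread_work *work)
{
	/*
	 * Runs on the worker's dedicated kthread at that thread's
	 * priority (SCHED_FIFO below), unlike a workqueue item, which
	 * runs on a shared CFS kworker and can be delayed behind it.
	 */
	pr_info("work ran on an RT kthread_worker\n");
}

static int __init my_worker_example_init(void)
{
	struct sched_param param = { .sched_priority = 1 };

	/* Create the worker; this spawns one kthread to back it. */
	my_worker = kthread_create_worker(0, "my_rt_kworker");
	if (IS_ERR(my_worker))
		return PTR_ERR(my_worker);

	/* Promote the backing kthread to RT, as the patch does. */
	sched_setscheduler_nocheck(my_worker->task, SCHED_FIFO, &param);

	/* Same init/queue/flush sequence used for rcu_exp_work. */
	kthread_init_work(&my_work, my_work_fn);
	kthread_queue_work(my_worker, &my_work);
	kthread_flush_work(&my_work);	/* wait for my_work_fn() */

	kthread_destroy_worker(my_worker);
	return 0;
}
early_initcall(my_worker_example_init);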
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 24b5f2c2de87..13d2b74bf19f 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -534,7 +534,8 @@ int rcu_get_gp_kthreads_prio(void);
 void rcu_fwd_progress_check(unsigned long j);
 void rcu_force_quiescent_state(void);
 extern struct workqueue_struct *rcu_gp_wq;
-extern struct workqueue_struct *rcu_par_gp_wq;
+extern struct kthread_worker *rcu_exp_gp_kworker;
+extern struct kthread_worker *rcu_exp_par_gp_kworker;
 #endif /* #else #ifdef CONFIG_TINY_RCU */
 
 #ifdef CONFIG_RCU_NOCB_CPU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4b8189455d5..bd5e672ffa5a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4471,6 +4471,33 @@ static int rcu_pm_notify(struct notifier_block *self,
 	return NOTIFY_OK;
 }
 
+struct kthread_worker *rcu_exp_gp_kworker;
+struct kthread_worker *rcu_exp_par_gp_kworker;
+
+static void __init rcu_start_exp_gp_kworkers(void)
+{
+	const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker";
+	const char *gp_kworker_name = "rcu_exp_gp_kthread_worker";
+	struct sched_param param = { .sched_priority = kthread_prio };
+
+	rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name);
+	if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
+		pr_err("Failed to create %s!\n", gp_kworker_name);
+		return;
+	}
+
+	rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name);
+	if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) {
+		pr_err("Failed to create %s!\n", par_gp_kworker_name);
+		kthread_destroy_worker(rcu_exp_gp_kworker);
+		return;
+	}
+
+	sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, &param);
+	sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO,
+				   &param);
+}
+
 /*
  * Spawn the kthreads that handle RCU's grace periods.
  */
@@ -4500,6 +4527,10 @@ static int __init rcu_spawn_gp_kthread(void)
 	rcu_spawn_nocb_kthreads();
 	rcu_spawn_boost_kthreads();
 	rcu_spawn_core_kthreads();
+
+	/* Create kthread worker for expedited GPs */
+	rcu_start_exp_gp_kworkers();
+
 	return 0;
 }
 early_initcall(rcu_spawn_gp_kthread);
@@ -4745,7 +4776,6 @@ static void __init rcu_dump_rcu_node_tree(void)
 }
 
 struct workqueue_struct *rcu_gp_wq;
-struct workqueue_struct *rcu_par_gp_wq;
 
 static void __init kfree_rcu_batch_init(void)
 {
@@ -4808,11 +4838,14 @@ void __init rcu_init(void)
 		rcutree_online_cpu(cpu);
 	}
 
-	/* Create workqueue for Tree SRCU and for expedited GPs. */
+	/*
+	 * Create workqueue for Tree SRCU.
+	 *
+	 * Expedited GPs use RT kthread_worker.
+	 * See: rcu_start_exp_gp_kworkers()
+	 */
 	rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
 	WARN_ON(!rcu_gp_wq);
-	rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
-	WARN_ON(!rcu_par_gp_wq);
 
 	/* Fill in default value for rcutree.qovld boot parameter. */
 	/* -After- the rcu_node ->lock fields are initialized! */
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 926673ebe355..0193d67a706a 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -10,6 +10,7 @@
  */
 
 #include <linux/cache.h>
+#include <linux/kthread.h>
 #include <linux/rtmutex.h>
 #include <linux/threads.h>
 #include <linux/cpumask.h>
@@ -23,7 +24,7 @@
 /* Communicate arguments to a workqueue handler. */
 struct rcu_exp_work {
 	unsigned long rew_s;
-	struct work_struct rew_work;
+	struct kthread_work rew_work;
 };
 
 /* RCU's kthread states for tracing. */
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb..f5f3722c0a74 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -334,7 +334,7 @@ static bool exp_funnel_lock(unsigned long s)
  * Select the CPUs within the specified rcu_node that the upcoming
  * expedited grace period needs to wait for.
  */
-static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
+static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
 {
 	int cpu;
 	unsigned long flags;
@@ -423,7 +423,6 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
  */
 static void sync_rcu_exp_select_cpus(void)
 {
-	int cpu;
 	struct rcu_node *rnp;
 
 	trace_rcu_exp_grace_period(rcu_state.name, rcu_exp_gp_seq_endval(), TPS("reset"));
@@ -435,28 +434,27 @@ static void sync_rcu_exp_select_cpus(void)
 		rnp->exp_need_flush = false;
 		if (!READ_ONCE(rnp->expmask))
 			continue; /* Avoid early boot non-existent wq. */
-		if (!READ_ONCE(rcu_par_gp_wq) ||
+		if (!READ_ONCE(rcu_exp_par_gp_kworker) ||
 		    rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
 		    rcu_is_last_leaf_node(rnp)) {
-			/* No workqueues yet or last leaf, do direct call. */
+			/* kthread worker not started yet or last leaf, do direct call. */
 			sync_rcu_exp_select_node_cpus(&rnp->rew.rew_work);
 			continue;
 		}
-		INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
-		cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
-		/* If all offline, queue the work on an unbound CPU. */
-		if (unlikely(cpu > rnp->grphi - rnp->grplo))
-			cpu = WORK_CPU_UNBOUND;
-		else
-			cpu += rnp->grplo;
-		queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
+		kthread_init_work(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
+		/*
+		 * Use rcu_exp_par_gp_kworker, because flushing a work item from
+		 * another work item on the same kthread worker can result in
+		 * deadlock.
+		 */
+		kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work);
 		rnp->exp_need_flush = true;
 	}
 
 	/* Wait for workqueue jobs (if any) to complete. */
 	rcu_for_each_leaf_node(rnp)
 		if (rnp->exp_need_flush)
-			flush_work(&rnp->rew.rew_work);
+			kthread_flush_work(&rnp->rew.rew_work);
 }
 
 /*
@@ -625,7 +623,7 @@ static void rcu_exp_sel_wait_wake(unsigned long s)
 /*
  * Work-queue handler to drive an expedited grace period forward.
  */
-static void wait_rcu_exp_gp(struct work_struct *wp)
+static void wait_rcu_exp_gp(struct kthread_work *wp)
 {
 	struct rcu_exp_work *rewp;
 
@@ -848,20 +846,17 @@ void synchronize_rcu_expedited(void)
 	} else {
 		/* Marshall arguments & schedule the expedited grace period. */
 		rew.rew_s = s;
-		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
-		queue_work(rcu_gp_wq, &rew.rew_work);
+		kthread_init_work(&rew.rew_work, wait_rcu_exp_gp);
+		kthread_queue_work(rcu_exp_gp_kworker, &rew.rew_work);
 	}
 
 	/* Wait for expedited grace period to complete. */
 	rnp = rcu_get_root();
 	wait_event(rnp->exp_wq[rcu_seq_ctr(s) & 0x3],
 		   sync_exp_work_done(s));
-	smp_mb(); /* Workqueue actions happen before return. */
+	smp_mb(); /* kthread actions happen before return. */
 
 	/* Let the next expedited grace period start. */
 	mutex_unlock(&rcu_state.exp_mutex);
-
-	if (likely(!boottime))
-		destroy_work_on_stack(&rew.rew_work);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);

base-commit: 7a3ecddc571cc3294e5d6bb5948ff2b0cfa12735
-- 
2.35.1.1094.g7c7d902a7c-goog
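
A further illustrative note (hypothetical, not how the numbers in the
table above were gathered; those came from trace data during user
journeys): a quick way to spot-check the blocking time of a single
expedited GP from kernel code is to bracket the call with ktime_get().
The helper name measure_exp_gp_block_time() is made up; the ktime and
RCU APIs are real.

#include <linux/ktime.h>
#include <linux/printk.h>
#include <linux/rcupdate.h>

/* Hypothetical helper: time how long one expedited GP blocks us. */
static void measure_exp_gp_block_time(void)
{
	ktime_t t0 = ktime_get();

	synchronize_rcu_expedited();
	pr_info("synchronize_rcu_expedited() blocked %lld ns\n",
		ktime_to_ns(ktime_sub(ktime_get(), t0)));
}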