Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp2630002rdb; Fri, 8 Dec 2023 14:06:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IF61fjq48OOa9CQFU/Jr4GbER82bAycaQ+5RZsOEJzhH23CkXiJ8nGs42cN6EVfsgo61EVj X-Received: by 2002:a05:6a21:9810:b0:18f:ef9d:d41 with SMTP id ue16-20020a056a21981000b0018fef9d0d41mr1567713pzb.45.1702073218449; Fri, 08 Dec 2023 14:06:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702073218; cv=none; d=google.com; s=arc-20160816; b=b3dmt3H8CfDGXEFM5T+2S3VjSMFvAxlFyMo5Z3Ccgv0UGNmCWY4ZVMgCqbirXc6xU3 VKvU+JVHOyeUZiH2sLLbmp/kIFxOZTs+go5zLW08n1URucpxpMhfuSLxo7k4ByE0ISyZ 7eh7EVLbuT4C4RL4Cbdo1dtZKi/oPTeVgbkrE0nDS0RmT4hUrxhKa4LVyU5Wt85GowzV eTOpB1aKpnbFEVZeajzW7/3Baux+V0RE11P9NM312BTd02fALhii/sAoWa7TKd+6W6xG lg17MtNu8oUbAvsD51PkfPOlQ9URGWMJ6+4fDEhcIk1P4sLTsEpqr5dYhqmG1vT/Xkn9 oxYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZT00O8Vo096jz01k+sOGLtlY1MvLTUlQ5zHl4Q04JXU=; fh=nQGE9Ev5fZ1wOEfZyjbmCcB4dQE4zTJGbgPJrgjfu7k=; b=L6TIpR0HXrZsCM7Pk2dG+026woXf8W6tnPOzBYvefMgDwPs6/4PVrjhs4oaDIebw/l 6dA3hwwV5FQ+WM1b4rU2MoE9Pd0Vxxj/NWmitg8lrtcLXoMEtx8tDqWkQodVvc50WMgz aJ2+T6T6+reFVUCKGuAybUrzZguDXEy4flj0W42YC5WS6V5cP5K+z6QnlhMWYEAkbNWM ueV/b8c49eUWO7r7iapG27hDwGgo3wcDtZn882jykQOGQ0Vuthq+TDcpNPGbpZfyCNSt PzUd1qGEE9PKnJJSEFtayeJcPs/9vwICQIyI+3iNs3VzxeHbmZzUdz6yapGbjWWWhQnp jbVA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=N7ajCbOy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id v7-20020a655687000000b005be14925624si2035981pgs.714.2023.12.08.14.06.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 14:06:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=N7ajCbOy; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 61D8C81CA3E0; Fri, 8 Dec 2023 14:06:55 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1574938AbjLHWGh (ORCPT + 99 others); Fri, 8 Dec 2023 17:06:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40572 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1574868AbjLHWGN (ORCPT ); Fri, 8 Dec 2023 17:06:13 -0500 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BED04172A for ; Fri, 8 Dec 2023 14:06:16 -0800 (PST) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A6F3DC433C7; Fri, 8 Dec 2023 22:06:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1702073176; bh=JLGAnxgy6nMvCAyCTe/HdN56Qsu1Gnt8OmKPAxvGxvo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=N7ajCbOy9LlP59Yg0Xo3AzLcwsyZc0S0S2qpauxE4tjdhJgOZWjBfBSnmxCsRX40B 39C1NreJGUtnKIw5QIjWyMIH/c2/auPtH8rmtB9WgobxtqUd7hU4jZVQhhVh6S2KCp Bo0Vs1Kv3ZAtjG2/6SzfBAkVmV7Be4cnc1Tb9FkNpKscIEgab7Cfu/rSoF29CyTqgr 5OtUd2/ThgK7newWTCY06JwbRc5Rdwyumd5xH0xi7Kn9LkiHwh0STVkYeB7c00JuuS DIj/JSf2kXsEKAB6DdmQ/ajkBNQpJQ7gYryshlUO8m8DzaeZ/EnY587PfblOytxp0r IxkoHLVG8girA== From: Frederic Weisbecker To: LKML Cc: Frederic Weisbecker , Boqun Feng , Joel Fernandes , Neeraj Upadhyay , "Paul E . McKenney" , Uladzislau Rezki , Zqiang , rcu , Anna-Maria Behnsen , Thomas Gleixner , Neeraj upadhyay Subject: [PATCH 8/8] rcu/exp: Remove rcu_par_gp_wq Date: Fri, 8 Dec 2023 23:05:45 +0100 Message-ID: <20231208220545.7452-9-frederic@kernel.org> X-Mailer: git-send-email 2.42.1 In-Reply-To: <20231208220545.7452-1-frederic@kernel.org> References: <20231208220545.7452-1-frederic@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Fri, 08 Dec 2023 14:06:55 -0800 (PST) TREE04 running on short iterations can produce writer stalls of the following kind: ??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0 task:rcu_torture_wri state:D stack:14568 pid:83 ppid:2 flags:0x00004000 Call Trace: __schedule+0x2de/0x850 ? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0 schedule+0x4f/0x90 synchronize_rcu_expedited+0x430/0x670 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_synchronize_rcu_expedited+0x10/0x10 do_rtws_sync.constprop.0+0xde/0x230 rcu_torture_writer+0x4b4/0xcd0 ? __pfx_rcu_torture_writer+0x10/0x10 kthread+0xc7/0xf0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 Waiting for an expedited grace period and polling for an expedited grace period both are operations that internally rely on the same workqueue performing necessary asynchronous work. However, a dependency chain is involved between those two operations, as depicted below: ====== CPU 0 ======= ====== CPU 1 ======= synchronize_rcu_expedited() exp_funnel_lock() mutex_lock(&rcu_state.exp_mutex); start_poll_synchronize_rcu_expedited queue_work(rcu_gp_wq, &rnp->exp_poll_wq); synchronize_rcu_expedited_queue_work() queue_work(rcu_gp_wq, &rew->rew_work); wait_event() // A, wait for &rew->rew_work completion mutex_unlock() // B //======> switch to kworker sync_rcu_do_polled_gp() { synchronize_rcu_expedited() exp_funnel_lock() mutex_lock(&rcu_state.exp_mutex); // C, wait B .... } // D Since workqueues are usually implemented on top of several kworkers handling the queue concurrently, the above situation wouldn't deadlock most of the time because A then doesn't depend on D. But in case of memory stress, a single kworker may end up handling alone all the works in a serialized way. In that case the above layout becomes a problem because A then waits for D, closing a circular dependency: A -> D -> C -> B -> A This however only happens when CONFIG_RCU_EXP_KTHREAD=n. Indeed synchronize_rcu_expedited() is otherwise implemented on top of a kthread worker while polling still relies on rcu_gp_wq workqueue, breaking the above circular dependency chain. Fix this with making expedited grace period to always rely on kthread worker. The workqueue based implementation is essentially a duplicate anyway now that the per-node initialization is performed by per-node kthread workers. Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to manage the scheduler policy of these kthread workers. Reported-by: Anna-Maria Behnsen Reported-by: Thomas Gleixner Suggested-by: Joel Fernandes Suggested-by: Paul E. McKenney Suggested-by: Neeraj upadhyay Signed-off-by: Frederic Weisbecker --- kernel/rcu/rcu.h | 4 --- kernel/rcu/tree.c | 40 ++++--------------------- kernel/rcu/tree.h | 4 --- kernel/rcu/tree_exp.h | 70 +------------------------------------------ 4 files changed, 7 insertions(+), 111 deletions(-) diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index 6beaf70d629f..99032b9cb667 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -623,11 +623,7 @@ int rcu_get_gp_kthreads_prio(void); void rcu_fwd_progress_check(unsigned long j); void rcu_force_quiescent_state(void); extern struct workqueue_struct *rcu_gp_wq; -#ifdef CONFIG_RCU_EXP_KTHREAD extern struct kthread_worker *rcu_exp_gp_kworker; -#else /* !CONFIG_RCU_EXP_KTHREAD */ -extern struct workqueue_struct *rcu_par_gp_wq; -#endif /* CONFIG_RCU_EXP_KTHREAD */ void rcu_gp_slow_register(atomic_t *rgssp); void rcu_gp_slow_unregister(atomic_t *rgssp); #endif /* #else #ifdef CONFIG_TINY_RCU */ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index e75ddf42e9b1..0c28adb56ad4 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4367,7 +4367,6 @@ rcu_boot_init_percpu_data(int cpu) rcu_boot_init_nocb_percpu_data(rdp); } -#ifdef CONFIG_RCU_EXP_KTHREAD struct kthread_worker *rcu_exp_gp_kworker; static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) @@ -4387,7 +4386,9 @@ static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) return; } WRITE_ONCE(rnp->exp_kworker, kworker); - sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); + + if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD)) + sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, ¶m); } static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) @@ -4411,39 +4412,14 @@ static void __init rcu_start_exp_gp_kworker(void) rcu_exp_gp_kworker = NULL; return; } - sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); -} - -static inline void rcu_alloc_par_gp_wq(void) -{ -} -#else /* !CONFIG_RCU_EXP_KTHREAD */ -struct workqueue_struct *rcu_par_gp_wq; - -static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp) -{ -} - -static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp) -{ - return NULL; -} - -static void __init rcu_start_exp_gp_kworker(void) -{ -} -static inline void rcu_alloc_par_gp_wq(void) -{ - rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0); - WARN_ON(!rcu_par_gp_wq); + if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD)) + sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, ¶m); } -#endif /* CONFIG_RCU_EXP_KTHREAD */ static void rcu_spawn_rnp_kthreads(struct rcu_node *rnp) { - if ((IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) || - IS_ENABLED(CONFIG_RCU_BOOST)) && rcu_scheduler_fully_active) { + if (rcu_scheduler_fully_active) { mutex_lock(&rnp->kthread_mutex); rcu_spawn_one_boost_kthread(rnp); rcu_spawn_exp_par_gp_kworker(rnp); @@ -4527,9 +4503,6 @@ static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu) struct rcu_node *rnp; struct task_struct *task_boost, *task_exp; - if (!IS_ENABLED(CONFIG_RCU_EXP_KTHREAD) && !IS_ENABLED(CONFIG_RCU_BOOST)) - return; - rdp = per_cpu_ptr(&rcu_data, cpu); rnp = rdp->mynode; @@ -5209,7 +5182,6 @@ void __init rcu_init(void) /* Create workqueue for Tree SRCU and for expedited GPs. */ rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0); WARN_ON(!rcu_gp_wq); - rcu_alloc_par_gp_wq(); /* Fill in default value for rcutree.qovld boot parameter. */ /* -After- the rcu_node ->lock fields are initialized! */ diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h index ef3d3385063f..35f7af331e6c 100644 --- a/kernel/rcu/tree.h +++ b/kernel/rcu/tree.h @@ -24,11 +24,7 @@ /* Communicate arguments to a workqueue handler. */ struct rcu_exp_work { unsigned long rew_s; -#ifdef CONFIG_RCU_EXP_KTHREAD struct kthread_work rew_work; -#else - struct work_struct rew_work; -#endif /* CONFIG_RCU_EXP_KTHREAD */ }; /* RCU's kthread states for tracing. */ diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 744d6acf5553..dd33948ab80f 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -420,7 +420,6 @@ static void __sync_rcu_exp_select_node_cpus(struct rcu_exp_work *rewp) static void rcu_exp_sel_wait_wake(unsigned long s); -#ifdef CONFIG_RCU_EXP_KTHREAD static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp) { struct rcu_exp_work *rewp = @@ -472,69 +471,6 @@ static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work); } -static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) -{ -} -#else /* !CONFIG_RCU_EXP_KTHREAD */ -static void sync_rcu_exp_select_node_cpus(struct work_struct *wp) -{ - struct rcu_exp_work *rewp = - container_of(wp, struct rcu_exp_work, rew_work); - - __sync_rcu_exp_select_node_cpus(rewp); -} - -static inline bool rcu_exp_worker_started(void) -{ - return !!READ_ONCE(rcu_gp_wq); -} - -static inline bool rcu_exp_par_worker_started(struct rcu_node *rnp) -{ - return !!READ_ONCE(rcu_par_gp_wq); -} - -static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp) -{ - int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1); - - INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus); - /* If all offline, queue the work on an unbound CPU. */ - if (unlikely(cpu > rnp->grphi - rnp->grplo)) - cpu = WORK_CPU_UNBOUND; - else - cpu += rnp->grplo; - queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work); -} - -static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) -{ - flush_work(&rnp->rew.rew_work); -} - -/* - * Work-queue handler to drive an expedited grace period forward. - */ -static void wait_rcu_exp_gp(struct work_struct *wp) -{ - struct rcu_exp_work *rewp; - - rewp = container_of(wp, struct rcu_exp_work, rew_work); - rcu_exp_sel_wait_wake(rewp->rew_s); -} - -static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew) -{ - INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp); - queue_work(rcu_gp_wq, &rew->rew_work); -} - -static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew) -{ - destroy_work_on_stack(&rew->rew_work); -} -#endif /* CONFIG_RCU_EXP_KTHREAD */ - /* * Select the nodes that the upcoming expedited grace period needs * to wait for. @@ -978,8 +914,7 @@ void synchronize_rcu_expedited(void) lock_is_held(&rcu_sched_lock_map), "Illegal synchronize_rcu_expedited() in RCU read-side critical section"); - can_queue = (rcu_scheduler_active != RCU_SCHEDULER_INIT) && - rcu_exp_worker_started(); + can_queue = (rcu_scheduler_active != RCU_SCHEDULER_INIT) && rcu_exp_worker_started(); /* Is the state is such that the call is a grace period? */ if (rcu_blocking_is_gp()) { @@ -1027,9 +962,6 @@ void synchronize_rcu_expedited(void) /* Let the next expedited grace period start. */ mutex_unlock(&rcu_state.exp_mutex); - - if (likely(can_queue)) - synchronize_rcu_expedited_destroy_work(&rew); } EXPORT_SYMBOL_GPL(synchronize_rcu_expedited); -- 2.42.1