From: Z qiang
Date: Wed, 20 Dec 2023 16:24:35 +0800
Subject: Re: [PATCH 2/3] rcu: Defer RCU kthreads wakeup when CPU is dying
To: Frederic Weisbecker
Cc: LKML, Boqun Feng, Joel Fernandes, Neeraj Upadhyay, Uladzislau Rezki,
 rcu, "Paul E. McKenney", Thomas Gleixner, Peter Zijlstra
In-Reply-To: <20231218231916.11719-3-frederic@kernel.org>
References: <20231218231916.11719-1-frederic@kernel.org>
 <20231218231916.11719-3-frederic@kernel.org>

>
> When the CPU goes idle for the last time during the CPU down hotplug
> process, RCU reports a final quiescent state for the current CPU. If
> this quiescent state propagates up to the top, some tasks may then be
> woken up to complete the grace period: the main grace period kthread
> and/or the expedited main workqueue (or kworker).
>
> If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
> arm the RT bandwidth timer to the local offline CPU. Since this happens
> after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the
> timer gets ignored. Therefore if the RCU kthreads are waiting for RT
> bandwidth to be available, they may never be actually scheduled.
>

In rcutree_report_cpu_dead(), the rcuog kthread may also be woken up via
do_nocb_deferred_wakeup(). If the rcuog kthread is SCHED_FIFO and that
wakeup happens on the dying CPU, do_start_rt_bandwidth() sets
rt_period_active to 1 and enqueues the RT bandwidth hrtimer on the
offline CPU. When we later invoke swake_up_one_online() and send the IPI
to an online CPU, the RT bandwidth hrtimer is not enqueued on that online
CPU, because rt_period_active is already 1. Any thoughts?

Thanks
Zqiang
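For reference, this is the guard I am referring to. It is a simplified,
hand-written sketch of do_start_rt_bandwidth() in kernel/sched/rt.c (from
memory, not the exact upstream code), just to illustrate why the later
wakeup on the online CPU does not re-arm the timer there:

static void do_start_rt_bandwidth(struct rt_bandwidth *rt_b)
{
	if (!rt_b->rt_period_active) {
		rt_b->rt_period_active = 1;
		/*
		 * The period timer is armed on the local CPU. If this runs
		 * on the dying CPU after CPUHP_AP_HRTIMERS_DYING, the timer
		 * is queued on the offline CPU and never fires.
		 */
		hrtimer_forward_now(&rt_b->rt_period_timer, ns_to_ktime(0));
		hrtimer_start_expires(&rt_b->rt_period_timer,
				      HRTIMER_MODE_ABS_PINNED_HARD);
	}
	/*
	 * Because rt_period_active is now 1, a later wakeup handled on an
	 * online CPU (e.g. via the swake_up_one_online() IPI) skips the
	 * block above and never re-arms the timer on that online CPU.
	 */
}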
>
> This triggers TREE03 rcutorture hangs:
>
> 	rcu: INFO: rcu_preempt self-detected stall on CPU
> 	rcu: 	4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
> 	rcu: 	(t=21035 jiffies g=938281 q=40787 ncpus=6)
> 	rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> 	rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> 	rcu: RCU grace-period kthread stack dump:
> 	task:rcu_preempt     state:R  running task     stack:14896 pid:14    tgid:14    ppid:2      flags:0x00004000
> 	Call Trace:
> 	 __schedule+0x2eb/0xa80
> 	 schedule+0x1f/0x90
> 	 schedule_timeout+0x163/0x270
> 	 ? __pfx_process_timeout+0x10/0x10
> 	 rcu_gp_fqs_loop+0x37c/0x5b0
> 	 ? __pfx_rcu_gp_kthread+0x10/0x10
> 	 rcu_gp_kthread+0x17c/0x200
> 	 kthread+0xde/0x110
> 	 ? __pfx_kthread+0x10/0x10
> 	 ret_from_fork+0x2b/0x40
> 	 ? __pfx_kthread+0x10/0x10
> 	 ret_from_fork_asm+0x1b/0x30
>
> The situation can't be solved with just unpinning the timer. The hrtimer
> infrastructure and the nohz heuristics involved in finding the best
> remote target for an unpinned timer would then also need to handle
> enqueues from an offline CPU in the most horrendous way.
>
> So fix this on the RCU side instead and defer the wake up to an online
> CPU if it's too late for the local one.
>
> Reported-by: Paul E. McKenney
> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> Signed-off-by: Frederic Weisbecker
> ---
>  kernel/rcu/tree.c     | 34 +++++++++++++++++++++++++++++++++-
>  kernel/rcu/tree_exp.h |  3 +--
>  2 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 3ac3c846105f..157f3ca2a9b5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1013,6 +1013,38 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
>  	return needmore;
>  }
>
> +static void swake_up_one_online_ipi(void *arg)
> +{
> +	struct swait_queue_head *wqh = arg;
> +
> +	swake_up_one(wqh);
> +}
> +
> +static void swake_up_one_online(struct swait_queue_head *wqh)
> +{
> +	int cpu = get_cpu();
> +
> +	/*
> +	 * If called from rcutree_report_cpu_starting(), wake up
> +	 * is dangerous that late in the CPU-down hotplug process. The
> +	 * scheduler might queue an ignored hrtimer. Defer the wake up
> +	 * to an online CPU instead.
> +	 */
> +	if (unlikely(cpu_is_offline(cpu))) {
> +		int target;
> +
> +		target = cpumask_any_and(housekeeping_cpumask(HK_TYPE_RCU),
> +					 cpu_online_mask);
> +
> +		smp_call_function_single(target, swake_up_one_online_ipi,
> +					 wqh, 0);
> +		put_cpu();
> +	} else {
> +		put_cpu();
> +		swake_up_one(wqh);
> +	}
> +}
> +
>  /*
>   * Awaken the grace-period kthread. Don't do a self-awaken (unless in an
>   * interrupt or softirq handler, in which case we just might immediately
> @@ -1037,7 +1069,7 @@ static void rcu_gp_kthread_wake(void)
>  		return;
>  	WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
>  	WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
> -	swake_up_one(&rcu_state.gp_wq);
> +	swake_up_one_online(&rcu_state.gp_wq);
>  }
>
>  /*
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 6d7cea5d591f..2ac440bc7e10 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -173,7 +173,6 @@ static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp)
>  	return ret;
>  }
>
> -
>  /*
>   * Report the exit from RCU read-side critical section for the last task
>   * that queued itself during or before the current expedited preemptible-RCU
> @@ -201,7 +200,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
>  			raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  			if (wake) {
>  				smp_mb(); /* EGP done before wake_up(). */
> -				swake_up_one(&rcu_state.expedited_wq);
> +				swake_up_one_online(&rcu_state.expedited_wq);
>  			}
>  			break;
>  		}
> --
> 2.42.1
>