Date: Wed, 20 Dec 2023 00:42:24 +0100
From: Frederic Weisbecker
To: Hillf Danton
Cc: LKML, Boqun Feng, Joel Fernandes, Neeraj Upadhyay, Uladzislau Rezki,
 "Paul E. McKenney", Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH 2/3] rcu: Defer RCU kthreads wakeup when CPU is dying
References: <20231218231916.11719-1-frederic@kernel.org>
 <20231219044247.1708-1-hdanton@sina.com>
In-Reply-To: <20231219044247.1708-1-hdanton@sina.com>

On Tue, Dec 19, 2023 at 12:42:47PM +0800, Hillf Danton wrote:
> On Tue, 19 Dec 2023 00:19:15 +0100 Frederic Weisbecker
> > +static void swake_up_one_online(struct swait_queue_head *wqh)
> > +{
> > +	int cpu = get_cpu();
> > +
> > +	/*
> > +	 * If called from rcutree_report_cpu_starting(), wake up
> > +	 * is dangerous that late in the CPU-down hotplug process. The
> > +	 * scheduler might queue an ignored hrtimer. Defer the wake up
> > +	 * to an online CPU instead.
> > +	 */
>
> But why is the scheduler interested in selecting a dying CPU to add
> a hrtimer on in the first place?

So indeed that timer could be unpinned. But we tried that and it's not
enough. If we want to make the hrtimer and nohz infrastructure aware
that the current CPU is offline when it queues an hrtimer, we must face
the ugliness below.
And still it's hacky, because we must also find an online target whose
earliest deadline is below or equal to that of the scheduler hrtimer we
are trying to enqueue. And that requires even more ugliness that isn't
handled below.

So for now I assume that queuing a timer after hrtimers_cpu_dying() is
unreasonable and that RCU is the only candidate trying that. If there
are more to be reported, we shall see...

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index f2044d5a652b..9eac39fad31c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -234,6 +234,7 @@ struct hrtimer_cpu_base {
 	struct hrtimer			*next_timer;
 	ktime_t				softirq_expires_next;
 	struct hrtimer			*softirq_next_timer;
+	int				online;
 	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
 } ____cacheline_aligned;

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e..83c75768f290 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1088,7 +1088,7 @@ int get_nohz_timer_target(void)
 	struct sched_domain *sd;
 	const struct cpumask *hk_mask;

-	if (housekeeping_cpu(cpu, HK_TYPE_TIMER)) {
+	if (housekeeping_cpu(cpu, HK_TYPE_TIMER) && cpu_online(cpu)) {
 		if (!idle_cpu(cpu))
 			return cpu;
 		default_cpu = cpu;
@@ -1109,7 +1109,8 @@ int get_nohz_timer_target(void)
 	}

 	if (default_cpu == -1)
-		default_cpu = housekeeping_any_cpu(HK_TYPE_TIMER);
+		default_cpu = cpumask_any_and(housekeeping_cpumask(HK_TYPE_TIMER),
+					      cpu_online_mask);

 	return default_cpu;
 }

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..26cb9455272a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -118,7 +118,7 @@ static inline void do_start_rt_bandwidth(struct rt_bandwidth *rt_b)
 		 */
 		hrtimer_forward_now(&rt_b->rt_period_timer, ns_to_ktime(0));
 		hrtimer_start_expires(&rt_b->rt_period_timer,
-				      HRTIMER_MODE_ABS_PINNED_HARD);
+				      HRTIMER_MODE_ABS_HARD);
 	}
 	raw_spin_unlock(&rt_b->rt_runtime_lock);
 }

diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 760793998cdd..82f9ace2e4fd 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include <linux/sched/isolation.h>
 #include
 #include
 #include
@@ -206,6 +207,12 @@ struct hrtimer_cpu_base *get_target_base(struct hrtimer_cpu_base *base,
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 	if (static_branch_likely(&timers_migration_enabled) && !pinned)
 		return &per_cpu(hrtimer_bases, get_nohz_timer_target());
+#else
+	if (!base->online) {
+		int cpu = cpumask_any_and(housekeeping_cpumask(HK_TYPE_TIMER),
+					  cpu_online_mask);
+		base = &per_cpu(hrtimer_bases, cpu);
+	}
 #endif
 	return base;
 }
@@ -254,7 +261,13 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
 		raw_spin_lock(&new_base->cpu_base->lock);

 		if (new_cpu_base != this_cpu_base &&
-		    hrtimer_check_target(timer, new_base)) {
+		    hrtimer_check_target(timer, new_base) &&
+		    /*
+		     * Crude hack and buggy: if this CPU is offline and
+		     * the timer is the earliest on the remote target, the timer
+		     * will expire late...
+		     */
+		    this_cpu_base->online) {
 			raw_spin_unlock(&new_base->cpu_base->lock);
 			raw_spin_lock(&base->cpu_base->lock);
 			new_cpu_base = this_cpu_base;
@@ -2183,6 +2196,7 @@ int hrtimers_prepare_cpu(unsigned int cpu)
 	cpu_base->softirq_next_timer = NULL;
 	cpu_base->expires_next = KTIME_MAX;
 	cpu_base->softirq_expires_next = KTIME_MAX;
+	cpu_base->online = 1;
 	hrtimer_cpu_base_init_expiry_lock(cpu_base);
 	return 0;
 }
@@ -2248,7 +2262,7 @@ int hrtimers_cpu_dying(unsigned int dying_cpu)
 	__hrtimer_get_next_event(new_base, HRTIMER_ACTIVE_SOFT);

 	/* Tell the other CPU to retrigger the next event */
 	smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
-
+	old_base->online = 0;
 	raw_spin_unlock(&new_base->lock);
 	raw_spin_unlock(&old_base->lock);