Received: by 2002:ab2:6816:0:b0:1f9:5764:f03e with SMTP id t22csp1237469lqo; Fri, 17 May 2024 15:52:37 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUZNwjR2wVVS1vtLjc2kTKrJebN6mxdefD1eR+i/e/27E37yGe2uAdMvW3ubd4F+sseld0mf012APe2k3im72G9t7kD4fkNypaBIQ7jnA== X-Google-Smtp-Source: AGHT+IGozQRmMBI9Ws+J65myVyRTNVZ/IVZ1SL8eYhKDPnG9NO2tasA4pA1DDy30lSYaxeSTsNde X-Received: by 2002:a05:6358:98a5:b0:194:81b4:e86 with SMTP id e5c5f4694b2df-19481b41132mr1262601555d.20.1715986357464; Fri, 17 May 2024 15:52:37 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1715986357; cv=pass; d=google.com; s=arc-20160816; b=Q2SLHmcoyfad1Rcggsxbpp65qg3NvnBnN7OqBmX23GF4+DfFGA+RcOn64QLNxV00vN q8Z27dQtMS83NwzUYcvpZr+dJIjJAcODM6/gPuG8zlxwSKDiCB/XJUQSNsdbfm7zQO5z fGtHvtrvpRN3KVSPwg0uHjDTMdsG34VoSyZLnTl6cThBfNGw+ny1tHsVYDgRmvKQ9mbV DhiQCq1J2Tm2nEQ+VBT7dUSVm3e0F+pWxuqcN9DN5OMMq02ASq6TMT7NtaUytV0aqygA HjmuvoLTYn6OTU/wuq3b+OSw21iugOkO02Q+6R8I6BWWOBAPd8h+Jy04zXhS7GgyGBf0 ADHg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:list-unsubscribe:list-subscribe:list-id:precedence :message-id:date:references:in-reply-to:subject:to:dkim-signature :dkim-signature:from; bh=b00C1wf8zb2j7XW4X0xaBdXYhORLdwCo3gVdxTJiRDM=; fh=qjDezo9vaQ5m1N/atvey9CNr7tz3j4GqssRLHeg3PrE=; b=mfl/o4hzUpm9sDA0HrffSmXTCNiL2Zdc75v/8muOdtPHIz4crX/ZIeOEgrpuVsfcOb cjBOvGQfXj+CudW7bc7+LJAo7rpI3wBUlhhcjpzMl1Bgvv5IHmrlYA5Hhmw1XkoCVDl/ 7Dgwe4bwQHEY+lYkIH6k7q96rKyRu3AOrpcfEVvWI9a7qejmgphTaEcM9N/BSSCihjQ/ 3ilsHvgSTw9FHXg9rfDcNOaLNmnDJrnF1wBwoICtP08CqQC9yrT0StO2SbOu6ZICpEo7 qHgYJHWPT1LpSC9IpBXXd1lCmh10ynAJWNUmShH+sWzqjt+pEPneaZ5lU8BswH6FkaMR YR7g==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=VDj2yeXg; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; arc=pass (i=1 spf=pass spfdomain=linutronix.de dkim=pass dkdomain=linutronix.de dmarc=pass fromdomain=linutronix.de); spf=pass (google.com: domain of linux-kernel+bounces-182677-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182677-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 41be03b00d2f7-658acf2d66asi4419487a12.664.2024.05.17.15.52.37 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 May 2024 15:52:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-182677-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=VDj2yeXg; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; arc=pass (i=1 spf=pass spfdomain=linutronix.de dkim=pass dkdomain=linutronix.de dmarc=pass fromdomain=linutronix.de); spf=pass (google.com: domain of linux-kernel+bounces-182677-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-182677-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 104A92839BC for ; Fri, 17 May 2024 22:52:37 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 49B943A1BA; Fri, 17 May 2024 22:52:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="VDj2yeXg"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="n9Ntx4CU" Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4C11F2032A; Fri, 17 May 2024 22:52:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715986349; cv=none; b=N4FarY+bQJmTEQHVd1J2pfM6PuVQW37OKFisMLR3ASRW2bUUQWU01kzVPS4q+lR9lTLfgLi0eixERTIUqvPgM5WIhowF6BOW3p9RP0HTNsFwyOFtc9OYmxYKZ1f3KRbqiC5Sj45cIVm4tHPlTl44yEY/3CWQwtu/psqcPA1n8RE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1715986349; c=relaxed/simple; bh=5wpZ9GWwh36hq4YMFyY8PjjQXZcKxqj+xqPbJl097Gk=; h=From:To:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=YVkBUx0+rHt1mCV4YD91ERmti7mk/swJWpNO8K1vqdML8HRxOSPcxAlacppcEpVZI5lRqNgrN9tZGkN3sOxNkFEeGdtMV/Ew+YyyEDixeK8ma8WPp4sGWTxG+whLM//izp6+iSSXee+qiR1OcNkjSnFRVrvOiE6A15gZhLeG9V0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=VDj2yeXg; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=n9Ntx4CU; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1715986345; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=b00C1wf8zb2j7XW4X0xaBdXYhORLdwCo3gVdxTJiRDM=; b=VDj2yeXgnwmnoNn2Uml6S4+rOCeu/QM9dfwhYL36JSdEr8tmCD94wC+NbD87pkXYaNraiM Ci6U8Bf5SG+h3KqwmfJUeUA3RMVp0ymb4LNO5NTWKTcafDR+4EBvEtNx/Y7pnVkJ+7+PM0 +32CMuVUWCtDbRZTqd0UtKyrsXdRQLzcqMTgoqBqokypNoEJM2WBbaBA4a0SvyzQ/rcvpL zmRQ8PV2Yz+8XswuSF6NaXIgxs/30RTXAn33leEYZ71PRQnDmDmvBWTUu5ok1wwlCT6p3q JBk3HsHuRGvpuAPXerf77Vgph/+ydZsbDStAu2PF8jvXLwWUSifVOYXSzFkk/w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1715986345; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=b00C1wf8zb2j7XW4X0xaBdXYhORLdwCo3gVdxTJiRDM=; b=n9Ntx4CUdND/sbXRnAshGaaXK8I+2s50KB+eUVda+QLdq5kIhkNZBISCYHIyhuqBX9ydSI 8UgKz4EaZIIx8xDA== To: Costa Shulyupin , longman@redhat.com, pauld@redhat.com, juri.lelli@redhat.com, prarit@redhat.com, vschneid@redhat.com, Anna-Maria Behnsen , Frederic Weisbecker , Zefan Li , Tejun Heo , Johannes Weiner , Ingo Molnar , Peter Zijlstra , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Petr Mladek , Andrew Morton , Masahiro Yamada , Randy Dunlap , Yoann Congal , "Gustavo A. R. Silva" , Nhat Pham , Costa Shulyupin , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org Subject: Re: [PATCH v1 2/7] sched/isolation: Adjust affinity of timers according to change of housekeeping cpumask In-Reply-To: <875xvcjc69.ffs@tglx> References: <20240516190437.3545310-1-costa.shul@redhat.com> <20240516190437.3545310-3-costa.shul@redhat.com> <875xvcjc69.ffs@tglx> Date: Sat, 18 May 2024 00:52:24 +0200 Message-ID: <871q60jbkn.ffs@tglx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain On Sat, May 18 2024 at 00:39, Thomas Gleixner wrote: > On Thu, May 16 2024 at 22:04, Costa Shulyupin wrote: >> Adjust affinity timers and watchdog_cpumask according to >> change of housekeeping.cpumasks[HK_TYPE_TIMER] during runtime. > > Timers and watchdog are two different things. It's clearly documented > that a patch should change one thing and not combine random stuff. > >> watchdog_cpumask is initialized during boot in lockup_detector_init() >> from housekeeping_cpumask(HK_TYPE_TIMER). >> >> lockup_detector_reconfigure() utilizes updated watchdog_cpumask >> via __lockup_detector_reconfigure(). > > You describe WHAT the patch is doing, but there is no WHY and zero > rationale why this is correct. > >> timers_resettle_from_cpu() is blindly prototyped from timers_dead_cpu(). >> local_irq_disable is used because cpuhp_thread_fun uses it before >> cpuhp_invoke_callback() call. > > I have no idea what this word salad is trying to tell me. > > The local_irq_disable() in cpuhp_thread_fun() has absolutely nothing to > do with timers_dead_cpu(). > >> Core test snippets without infrastructure: >> >> 1. Create timer on specific cpu with: >> >> timer_setup(&test_timer, test_timer_cb, TIMER_PINNED); >> test_timer.expires = KTIME_MAX; >> add_timer_on(&test_timer, test_cpu); >> >> 2. Call housekeeping_update() >> >> 3. Assure that there is no timers on specified cpu at the end >> of timers_resettle_from_cpu() with: >> >> static int count_timers(int cpu) >> { >> struct timer_base *base; >> int b, v, count = 0; >> >> for (b = 0; b < NR_BASES; b++) { >> base = per_cpu_ptr(&timer_bases[b], cpu); >> raw_spin_lock_irq(&base->lock); >> >> for (v = 0; v < WHEEL_SIZE; v++) { >> struct hlist_node *c; >> >> hlist_for_each(c, base->vectors + v) >> count++; >> } >> raw_spin_unlock_irq(&base->lock); >> } >> >> return count; >> } > > And that snippet in the change log is magically providing a unit test or what? > >> diff --git a/init/Kconfig b/init/Kconfig >> index 72404c1f21577..fac49c6bb965a 100644 >> --- a/init/Kconfig >> +++ b/init/Kconfig >> @@ -682,6 +682,7 @@ config CPU_ISOLATION >> bool "CPU isolation" >> depends on SMP || COMPILE_TEST >> default y >> + select HOTPLUG_CPU > > Why? > >> +#ifdef CONFIG_LOCKUP_DETECTOR > > That ifdef is required because? > >> +#include >> +#endif >> + >> enum hk_flags { >> HK_FLAG_TIMER = BIT(HK_TYPE_TIMER), >> HK_FLAG_RCU = BIT(HK_TYPE_RCU), >> @@ -116,6 +120,19 @@ static void __init housekeeping_setup_type(enum hk_type type, >> housekeeping_staging); >> } >> >> +static void resettle_all_timers(cpumask_var_t enable_mask, cpumask_var_t disable_mask) >> +{ >> + unsigned int cpu; >> + >> + for_each_cpu(cpu, enable_mask) { > > Pointless bracket > >> + timers_prepare_cpu(cpu); > > Seriously? You reset the timer base of an online CPU? > > What's the point of this excercise? The timer base is initialized and in > consistent state. The timer base of an isolated CPU can have active > pinned timers on it. > > So going there and resetting state without locking is definitely a > brilliant idea. > >> + for_each_cpu(cpu, disable_mask) { >> + timers_resettle_from_cpu(cpu); >> + } >> +} >> + >> /* >> * housekeeping_update - change housekeeping.cpumasks[type] and propagate the >> * change. >> @@ -144,6 +161,16 @@ static int housekeeping_update(enum hk_type type, cpumask_var_t update) >> if (!static_branch_unlikely(&housekeeping_overridden)) >> static_key_enable_cpuslocked(&housekeeping_overridden.key); >> >> + switch (type) { >> + case HK_TYPE_TIMER: >> + resettle_all_timers(&masks->enable, &masks->disable); >> +#ifdef CONFIG_LOCKUP_DETECTOR >> + cpumask_copy(&watchdog_cpumask, housekeeping_cpumask(HK_TYPE_TIMER)); >> + lockup_detector_reconfigure(); >> +#endif > > What's wrong with adding a function > > void lockup_detector_update_mask(const struct cpumask *msk); > > and having an empty stub for it when CONFIG_LOCKUP_DETECTOR=n? > > That spares all the ugly ifdeffery and the nmi.h include in one go. > >> + break; >> + default: >> + } >> kfree(masks); >> >> return 0; >> diff --git a/kernel/time/timer.c b/kernel/time/timer.c >> index 48288dd4a102f..2d15c0e7b0550 100644 >> --- a/kernel/time/timer.c >> +++ b/kernel/time/timer.c >> @@ -51,6 +51,7 @@ >> #include >> #include >> #include >> +#include > > What needs this include? > >> #include "tick-internal.h" >> #include "timer_migration.h" >> @@ -2657,6 +2658,47 @@ int timers_prepare_cpu(unsigned int cpu) >> return 0; >> } >> >> +/** >> + * timers_resettle_from_cpu - resettles timers from >> + * specified cpu to housekeeping cpus. >> + */ >> +void timers_resettle_from_cpu(unsigned int cpu) >> +{ >> + struct timer_base *old_base; >> + struct timer_base *new_base; >> + int b, i; >> + >> + local_irq_disable(); > > What for? > >> + for (b = 0; b < NR_BASES; b++) { > > You cannot blindly move pinned timers away from a CPU. That's the last > resort which is used in the CPU hotplug case, but the CPU is not going > out in the dynamic change case and the pinned timer might be there for a > reason. > >> + old_base = per_cpu_ptr(&timer_bases[b], cpu); >> + new_base = per_cpu_ptr(&timer_bases[b], >> + cpumask_any_and(cpu_active_mask, >> + housekeeping_cpumask(HK_TYPE_TIMER))); > > The cpumask needs to be reevaluted for every base because you blindly > copied the code from timers_dead_cpu(), right? > >> + /* >> + * The caller is globally serialized and nobody else >> + * takes two locks at once, deadlock is not possible. >> + */ >> + raw_spin_lock_irq(&new_base->lock); > > Here you disable interrupts again and further down you enable them again > when dropping the lock. So on loop exit this does an imbalanced > local_irq_enable(), no? What's the point of this local_irq_dis/enable() > pair around the loop? > > > >> + raw_spin_lock_nested(&old_base->lock, SINGLE_DEPTH_NESTING); >> + >> + /* >> + * The current CPUs base clock might be stale. Update it > > What guarantees that this is the current CPUs timer base? Nothing... > >> + * before moving the timers over. >> + */ >> + forward_timer_base(new_base); >> + >> + WARN_ON_ONCE(old_base->running_timer); >> + old_base->running_timer = NULL; >> + >> + for (i = 0; i < WHEEL_SIZE; i++) >> + migrate_timer_list(new_base, old_base->vectors + i); >> + >> + raw_spin_unlock(&old_base->lock); >> + raw_spin_unlock_irq(&new_base->lock); >> + } >> + local_irq_enable(); >> +} > > It's absolutely not rocket science to reuse timers_dead_cpu() for this. > > The only requirement timers_dead_cpu() has is that the CPU to which the > timers are migrated is not going away. That's already given due to the > hotplug context. > > The reason why it uses get_cpu_ptr(), which disables preemption and > therefore migration, is that it want's to avoid moving the timers to a > remote CPU as that has implications vs. NOHZ because it might end up > kicking the remote CPU out of idle. > > timers_dead_cpu() could be simply modified to: > > void timers_dead_cpu(unsigned int cpu) > { > migrate_disable(); > timers_migrate_from_cpu(cpu, BASE_LOCAL); > migrate_enable(); > } > > and have > > #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_ISOLATION) > static void timers_migrate_from_cpu(unsigned int cpu, unsigned int base) > { > lockdep_assert(migration_disabled()); > > for (; base < NR_BASES; base++) { > old_base = per_cpu_ptr(&timer_bases[b], cpu); > new_base = this_cpu_ptr(&timer_bases[b]); > .... > } > } > #endif > > Now that isolation muck just has to do: > > 1) Ensure that the CPU it runs on is a housekeeping CPU > > 2) Invoke timers_migrate_to_hkcpu(cpu) which is in timer.c > > #ifdef CONFIG_ISOLATION > void timers_migrate_to_hkcpu(unsigned int cpu) > { > timers_migrate_from_cpu(cpu, BASE_GLOBAL); > } > #endif > > No? So if you don't care about the remote CPU queueing aspect, then this can simply do: void timers_dead_cpu(unsigned int cpu) { migrate_disable(); timers_migrate_from_cpu(cpu, smp_processor_id(), BASE_LOCAL); migrate_enable(); } and have #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_ISOLATION) static void timers_migrate_from_cpu(unsigned int from_cpu, unsigned int to_cpu, unsigned int base) { for (; base < NR_BASES; base++) { old_base = per_cpu_ptr(&timer_bases[b], from_cpu); new_base = per_cpu_ptr(&timer_bases[b], to_cpu); .... } } #endif Now that isolation muck just has to do: Invoke timers_migrate_to_hkcpu(cpu) which is in timer.c #ifdef CONFIG_ISOLATION void timers_migrate_to_hkcpu(unsigned int from) { unsigned int to = cpumask_any_and(cpu_active_mask, housekeeping_cpumask(HK_TYPE_TIMER))); timers_migrate_from_cpu(from, to, BASE_GLOBAL); } #endif See? Thanks, tglx