Received: by 2002:ab2:3350:0:b0:1f4:6588:b3a7 with SMTP id o16csp1984447lqe; Tue, 9 Apr 2024 06:37:51 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCUlcq3jp5wjaUxXtengMoQnASonI1gCgsPw1o5LHSkJbK5TQfGDEZXtW1Rb8upegpyZ3yQecp59sExuWW9ssx6cvZkLEmhCp4bj7QlSSA== X-Google-Smtp-Source: AGHT+IF9rVYIWHrqwlEz+152hSKKdSdtU7tgOoOP/8wXRUYuv+FWABQ8E2sU/ZfC6uPMB8/W3WzC X-Received: by 2002:a05:6808:1209:b0:3c5:f38f:84ed with SMTP id a9-20020a056808120900b003c5f38f84edmr7059677oil.28.1712669871524; Tue, 09 Apr 2024 06:37:51 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1712669871; cv=pass; d=google.com; s=arc-20160816; b=tMYzdo3ZDC/r5o7AzB1aNdWShx0BlDDYEYs2vl/PjMO3zy3oyAArWAunGJol/D+pJo jCMIdZeVZPI3/0foVOxpxSO8/ZKY+vQTDiYPr3r8R/bmhjp5gUEIZ9XXhpAa032OBI1Y TgQ+UY7mmaePvoNyhn44awFBzwRSS9l1vpQs2NiVHL7nAWEpdNm/gQ9LvdI+DrlNL2bc zSnKicff4AgfKKwHQPbvVtnG+XFm+bGQAanTvrGF53t0MlMhN8wnVC5jHxCaWmsbesTV GlutEVtp5MIgIIpVcWfPP5wW9z3EwvrWWiopIKelMu+jfK0XL7HIUCyEYbfGgPKluQpl iqtg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:message-id; bh=O0rPYVCo7k1SgTwiL4/AKCuc3OrUl7BNnPr8LqdHnuM=; fh=UiOr/GOSCFvhxHzstiKjFjpFOaMZqV/Q1x4+FjFROVo=; b=VQiU4Jr6+mfmBBhmMZiR/0d8cGC2AlgbTG9iNzxMhlvpgVIEOSjKUNNC/+QlriIVHO 0ZyFbmp0HrXTfC3M5z47FJejjLruxolXRYbhrkCeuK8uqsrrmUAdQjCPEeZDLKlJ5puJ EZqzWoonpy+cANqKqQpPRrnYnRKMpmw9TLQeLKe/GrNh9/Imh0YIGj+9pVaE3Wc5QKnK HyktFTzn+AboMoknjNCjjp8zYKRuJ7lzDIV8HbWjW47QtKi0I3rcsQGGBOVw9u4Mp6GV 8sO3iKL3wlGItkjpjtD4Y0qACT+c5d24QNyHKN5j8u/nyi0bF0ytDY/w8b05XALqHoCS jRTA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-136967-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-136967-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id q20-20020a05622a031400b00434e8a074e7si784859qtw.403.2024.04.09.06.37.51 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Apr 2024 06:37:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-136967-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-136967-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-136967-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 223591C20CD9 for ; Tue, 9 Apr 2024 13:37:41 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7CB56131195; Tue, 9 Apr 2024 13:33:39 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0A60512F5BB for ; Tue, 9 Apr 2024 13:33:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712669618; cv=none; b=jI5yWQlL+Y09L6DNXHD/lUc+fnYH6qYKiEdxIG1rGdFfsBr2lhONhqYmSISQqDFfHF2jnGDwaflqRkDPlg6A74QjWZHncllW081NCWUE/vWe5JwiIH4+Tx4yiDKAxgBpsfBZR4yMpM5Cq3Saqb5XXEaevTvocbMWcL235psWXMM= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712669618; c=relaxed/simple; bh=fwkzdKE+tW3BMjFP6EmSFT5s1nmcvcB5JeKygQXVYrY=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=sn7kGqH4j6BDB6U0cI7QldddEtbove3IdOE4zN4YXQqUhK8M/brKiF6CfZiJjzLi7hUOFZoVIGrnQ4kmm3O+nOVbezN0GVY7CC2aPf/upi1jdx+i6P7zUbD7xdbXt8YteP+uXSvE5U4qN3y1QvfhwB2Kiq1ARiB9Qj9UFbkBelY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B74DE139F; Tue, 9 Apr 2024 06:34:06 -0700 (PDT) Received: from [10.34.100.110] (e126645.nice.arm.com [10.34.100.110]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1A5763F766; Tue, 9 Apr 2024 06:33:33 -0700 (PDT) Message-ID: <4c48e76b-80cb-4811-8be8-f8cfd5221a80@arm.com> Date: Tue, 9 Apr 2024 15:33:32 +0200 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 0/7] sched/fair|isolation: Correctly clear nohz.[nr_cpus|idle_cpus_mask] for isolated CPUs To: Waiman Long , linux-kernel@vger.kernel.org Cc: Aaron Lu , Rui Zhang , Anna-Maria Behnsen , Frederic Weisbecker , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Valentin Schneider , Tejun Heo , Michal Hocko References: <20240403150543.2793354-1-pierre.gondois@arm.com> <98443f19-c653-493e-a2a9-e1d07b9d8468@redhat.com> <40c388eb-12c5-4136-ba21-6173e61c0e25@redhat.com> Content-Language: en-US From: Pierre Gondois In-Reply-To: <40c388eb-12c5-4136-ba21-6173e61c0e25@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 4/5/24 02:23, Waiman Long wrote: > On 4/4/24 08:55, Pierre Gondois wrote: >> Hello Waiman, >> Thanks for the link, I didn't see the patchset previously. >> >> On 4/4/24 05:01, Waiman Long wrote: >>> On 4/3/24 11:05, Pierre Gondois wrote: >>>> Zhang Rui reported that find_new_ilb() was iterating over CPUs in >>>> isolated cgroup partitions. This triggered spurious wakeups for >>>> theses CPUs. [1] >>>> The initial approach was to ignore CPUs on NULL sched domains, as >>>> isolated CPUs have a NULL sched domain. However a CPU: >>>> - with its tick disabled, so taken into account in >>>>     nohz.[idle_cpus_mask|nr_cpus] >>>> - which is placed in an isolated cgroup partition >>>> will never update nohz.[idle_cpus_mask|nr_cpus] again. >>>> >>>> To avoid that, the following variables should be cleared >>>> when a CPU is placed in an isolated cgroup partition: >>>> - nohz.idle_cpus_mask >>>> - nohz.nr_cpus >>>> - rq->nohz_tick_stopped >>>> This would allow to avoid considering wrong nohz.* values during >>>> idle load balance. >>>> >>>> As suggested in [2] and to avoid calling >>>> nohz_balance_[enter|exit]_idle() >>>> from a remote CPU and create concurrency issues, leverage the existing >>>> housekeeping HK_TYPE_SCHED mask to reflect isolated CPUs (i.e. on NULL >>>> sched domains). >>>> Indeed the HK_TYPE_SCHED mask is currently never set by the >>>> isolcpus/nohz_full kernel parameters, so it defaults to >>>> cpu_online_mask. >>>> Plus it's current usage fits CPUs that are isolated and should >>>> not take part in load balancing. >>>> >>>> Making use of HK_TYPE_SCHED for this purpose implies creating a >>>> housekeeping mask which can be modified at runtime. >>>> >>>> [1] >>>> https://lore.kernel.org/all/20230804090858.7605-1-rui.zhang@intel.com/ >>>> [2] >>>> https://lore.kernel.org/all/CAKfTPtAMd_KNKhXXGk5MEibzzQUX3BFkWgxtEW2o8FFTX99DKw@mail.gmail.com/ >>>> >>>> Pierre Gondois (7): >>>>     sched/isolation: Introduce housekeeping_runtime isolation >>>>     sched/isolation: Move HK_TYPE_SCHED to housekeeping runtime >>>>     sched/isolation: Use HKR_TYPE_SCHED in find_new_ilb() >>>>     sched/fair: Move/add on_null_domain()/housekeeping_cpu() checks >>>>     sched/topology: Remove CPUs with NULL sd from HKR_TYPE_SCHED mask >>>>     sched/fair: Remove on_null_domain() and redundant checks >>>>     sched/fair: Clear idle_cpus_mask for CPUs with NULL sd >>>> >>>>    include/linux/sched/isolation.h | 30 ++++++++++++++++++++- >>>>    include/linux/sched/nohz.h      |  2 ++ >>>>    kernel/sched/fair.c             | 44 +++++++++++++++++------------- >>>>    kernel/sched/isolation.c        | 48 >>>> ++++++++++++++++++++++++++++++++- >>>>    kernel/sched/topology.c         |  7 +++++ >>>>    5 files changed, 110 insertions(+), 21 deletions(-) >>>> >>> I had also posted a patch series on excluding isolated CPUs in isolated >>> partitions from housekeeping cpumasks earlier this year. See >>> >>> https://lore.kernel.org/lkml/20240229021414.508972-1-longman@redhat.com/ >>> >>> It took a different approach from this series. It looks like I should >>> include HK_TYPE_MISC as well. Yes right, but as noted, if CONFIG_CPUSET is set without CPU_ISOLATION, adding HK_TYPE_MISC to HOUSEKEEPING_FLAGS in your patchset would have no effect right ? The only check that would be always true is on_null_domain() from [1]. >> >> The common point between the 2 patchset is that find_new_ilb() won't >> take into account isolated CPUs. >> The present patchset also: >> - clears nohz.[idle_cpus_mask|nr_cpus] variable when a CPU becomes >> isolated, >>   cf. [PATCH 7/7] sched/fair: Clear idle_cpus_mask for CPUs with NULL sd >> - tries to clean up/gather on_null_domain()/HK_TYPE_SCHED/HK_TYPE_MISC >>   mask checks, as HK_TYPE_SCHED/HK_TYPE_MISC masks are currently never >>   set. >> but it also: >> - updates the housekeeping mask from sched/topology.c. It might be better >>   to do it from cpuset.c as you did as the update originally comes from >>   here and it is unlikely another place would require updating >> housekeeping >>   CPUs. >>   A new housekeeping_runtime type is also created, but I think the way >> you >>   handle updating housekeeping mask at runtime is better. >> - adds a dependency of sched/fair.c over CPU_ISOLATION (cf. >> housekeeping_* >>   calls), as Peter noted (IIUC) [1]. > > That is true. Without CONFIG_CPU_ISOLATION, all the housekeeping* calls > are essentially no-ops. > > OTOH, without CONFIG_CPU_ISOLATION, a number of isolation capabilities > won't be there. Most distros will have this config option set anyway. > > BTW, a number of the HK_TYPE_* are also used at runtime, like > HK_TYPE_TIMER and HK_TYPE_RCU. So it is hard to artificially distinguish > between runtime or boot time. > > I don't believe you need to add direct dependency on > CONFIG_CPU_ISOLATION, but you do have to add any housekeeping check as > an additional check, not as a replacement of the existing check. (on another topic) Isolated CPUs currently keep the state they were in when the isolation was done, i.e. if the tick was stopped when adding the CPU was added to the isolated cpumask, then the CPU stays in nohz.idle_cpus_mask forever. Similarly if the tick was not stopped, the CPU is cleared forever in nohz.idle_cpus_mask. This patchset also intended to clear isolated CPUs in nohz.idle_cpus_mask to let them in a known state. This might not be a good approach. nohz.idle_cpus_mask is also used in: nohz_run_idle_balance() \-_nohz_idle_balance() \-for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) \-update_nohz_stats() This is apparently done to update 'the blocked load of already idle CPUs'. However isolated CPUs might not have their blocked load updated as they are: - currently randomly part of nohz.idle_cpus_mask. - after this patch, never part of nohz.idle_cpus_mask. I am also wondering whether this makes sense for isolated CPUs to update the blocked load of other CPUs, i.e. if the following would not be needed: diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1dd37168da50..9b92700564b1 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -12291,6 +12291,9 @@ void nohz_run_idle_balance(int cpu) { unsigned int flags; + if (on_null_domain(cpu_rq(cpu))) + return; + flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu)); /* Regards, Pierre