Received: by 2002:a05:7412:2a8c:b0:e2:908c:2ebd with SMTP id u12csp3120112rdh; Thu, 28 Sep 2023 03:19:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH6dWW5RtwAfDPk8gabNXHz3U+z/APUc2r/3+/LKBDVVxIhioMl/Dk+vmGydYOXY1g0Q8IH X-Received: by 2002:a05:6870:b290:b0:1d5:da78:5c7d with SMTP id c16-20020a056870b29000b001d5da785c7dmr812228oao.50.1695896394484; Thu, 28 Sep 2023 03:19:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695896394; cv=none; d=google.com; s=arc-20160816; b=n6oXCUW9FAtYYcS12ENVPAxYRyQoUSptuzuI5Zmskxz2zQNmMwtuzvgtqhnaLlQCQT lILCApq7VEB5Ox/SAIDoRmHWphKDhcw2BKqvn1pPsASoRlfEicaCibDAt+uCrtdzCJ6l O/lB6Ok2RSmcvQz7tRBhirwGH0x0MZ9FRO0ce2GwNenFMLADvnO1wTDPMiGeoQsWX/EM mIrepXnCiP6/qxDJKQceC3pCsABdnpQAIReM5KOmHd4h1QGy42OVobUT9CsNwWnfMIja yW0gWwCHvrqIl2ifMlJbJc+0vVV9Nmh8wbGbXNvtVSvOWnCZagIvIn8AIq7r55k8oeXW a0Gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=yzn2QZLp1dYhkiDtbgdviHulF3X44J9BarvjhVhengg=; fh=93Ltr4TV/KJR7b3vvj4sTBTWa6cwJwf6gtfjOQi4n9s=; b=Qp2ag92euJ24zcwIyTbxcEk6uPNUmmpey20Ni6LjpSKRSqzgWmNjbsHpkr3rUxJPkx xNFzelPcJEPrMKfr7eU4n3JGFluozoobVhWb9bVRg3RObJ4cAKdy4Besg7SMGSEVKd9c SjBYJYSaw/Gr0u3axQ8d1WrAY+1SWdtCvLyXW1mlqf+xZL+HTK/6UZmS/O6dAovXyonf 3ahHCEvNKjdyCjxmuv2nxMZh8WF1FVaE4hpRFqGAU9gqGTU1S989UG4VY4GDLWFLR/2d 1a3cd9LZauLj6bLb5n96VoWZd+Yvexg48VMExcMyctFcl/xd0uvFI92b8DkthH2YsLrr vMJw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Return-Path: Received: from groat.vger.email (groat.vger.email. [23.128.96.35]) by mx.google.com with ESMTPS id bg26-20020a056a02011a00b0057771e49c25si17973963pgb.693.2023.09.28.03.19.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Sep 2023 03:19:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id D6AB28068873; Thu, 28 Sep 2023 02:27:05 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231663AbjI1J0m (ORCPT + 99 others); Thu, 28 Sep 2023 05:26:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40648 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231844AbjI1J0K (ORCPT ); Thu, 28 Sep 2023 05:26:10 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7CF8010CA for ; Thu, 28 Sep 2023 02:25:54 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4FA831FB; Thu, 28 Sep 2023 02:26:32 -0700 (PDT) Received: from [10.34.100.121] (e126645.nice.arm.com [10.34.100.121]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CFED53F59C; Thu, 28 Sep 2023 02:25:51 -0700 (PDT) Message-ID: Date: Thu, 28 Sep 2023 11:25:45 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: [PATCH v4] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform Content-Language: en-US To: Shrikanth Hegde , Dietmar Eggemann Cc: linux-kernel@vger.kernel.org, ionela.voinescu@arm.com, srikar@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, yu.c.chen@intel.com, tim.c.chen@linux.intel.com, pauld@redhat.com, mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, vschneid@redhat.com, qperret@google.com References: <20230926100046.405188-1-sshegde@linux.vnet.ibm.com> <8ac1576c-909b-ec6b-930d-0683ca288bf9@arm.com> <46353945-fced-9b69-b334-3b7ae50c957c@linux.vnet.ibm.com> From: Pierre Gondois In-Reply-To: <46353945-fced-9b69-b334-3b7ae50c957c@linux.vnet.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-2.2 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Thu, 28 Sep 2023 02:27:09 -0700 (PDT) Hello Shrikanth, Dietmar, On 9/27/23 19:08, Shrikanth Hegde wrote: > > > On 9/27/23 6:44 PM, Dietmar Eggemann wrote: >> Ah, BTW s/quentin.perret@arm.com/qperret@google.com >> >> On 27/09/2023 10:14, Shrikanth Hegde wrote: >>> >>> >>> On 9/27/23 2:59 AM, Dietmar Eggemann wrote: >>>> On 26/09/2023 12:00, Shrikanth Hegde wrote: >> >> [...] >> >>>>> At present, though platform doesn't support EAS, this sysctl returns 1 >>>>> and it ends up calling rebuild of sched domain on write to 1 and >>>> >>>> sched domains are not rebuild in this case, i.e. >>>> partition_sched_domains_locked() does not call detach_destroy_domains() >>>> or build_sched_domains(). Only build_perf_domains() is called which >>>> bails out if !sysctl_sched_energy_aware or one of the EAS conditions is >>>> not true. >>>> >>> >>> ok. that's because it goes to match1 and match2 right? >> >> yes. >> >> [...] >> >>>>> @@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write, >>>>> return -EPERM; >>>>> >>>>> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); >>>>> + if (!sched_is_eas_possible(cpu_active_mask)) { >>>> >>>> This is using `cpu_active_mask` so EAS can only be disabled system-wide. >>>> >>>> So I experimented with an exclusive cpuset setup creating a symmetric >>>> (cs1) and an asymmetric (cs2) island on my Juno with its cpumask = [l B >>>> B l l l] (l - little CPU, B - Big CPU). >>>> >>>> root@juno:~# cd /sys/fs/cgroup/cpuset >>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs1 >>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive >>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems >>>> root@juno:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus >>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs2 >>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive >>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems >>>> root@juno:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus >>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance >>>> >>>> [ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{ >>>> cpus=0,3-5 nr_pstate=5 } >>>> >>>> root@juno:~# echo 0 > /proc/sys/kernel/sched_energy_aware >>>> >>>> log messages: >>>> ... >>>> [ 3143.538583] rd 4-5: Disabling EAS >>>> [ 3143.549569] rd 0-3: Disabling EAS >>>> [ 3143.560681] sched_energy_set: stopping EAS >>>> ... >>>> >>>> root@juno:~# echo 1 > /proc/sys/kernel/sched_energy_aware >>>> >>>> log messages: >>>> ... >>>> [ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{ >>>> cpus=0,3-5 nr_pstate=5 } >>>> [ 3223.293409] sched_energy_set: starting EAS >>>> >>>> Seems still to work correctly. >>> >>> I see that can be a issue. using first cpu here check to asymmetric cpu capacity. >>> It would have worked here, since the first group had asymmetry. ( l B B l ). >>> It wouldn't have worked if the first group had like ( l l ) and second group has ( l B B l ) >> >> Yeah, that's true. >> >> sched_is_eas_possible(const struct cpumask *cpu_mask) >> >> !per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask)); >> >> cpusets cs1=[0,5] and cs2=[1-4] as such an example. >> > > right. > >> >>> Instead of cpu_active_mask, I can make use of ndoms_cur and doms_cur[i] logic to >>> traverse through possible doms, and if none of the domains have asymmetric cpu capacity >>> return false. Does that look right? >> >> rebuild_sched_domains() >> >> rebuild_sched_domains_locked() >> >> ndoms = generate_sched_domains(&doms, &attr) >> >> You would need generate_sched_domains() in sched_energy_aware_handler()? >> > > clearly I didnt think through this. ndoms_cur and doms_cur are updated at the end. > So If EAS is possible at boot, this would fail. > > > What sched_is_eas_possible needs is if there is asymmetric cpu topology. > Simpler loop of all CPU's and checking if there any of them have sd_asym_cpucapacity might do, > though it goes through all CPU's, Since these functions are not in hot path > So it should not affect any performance. Something like below would work? > > bool any_asym_capacity = false; > > /* EAS is enabled for asymmetric CPU capacity topologies. */ > for_each_cpu(i, cpu_mask) { > if (per_cpu(sd_asym_cpucapacity, i)) { > any_asym_capacity = true; > break; > } > } > if (!any_asym_capacity) { > if (sched_debug()) { > pr_info("rd %*pbl: Checking EAS, CPUs do not have asymmetric capacities\n", > cpumask_pr_args(cpu_mask)); > } > return false; > } > FYIW I could reproduce the issue mentioned above, and the suggested bit seems to solve it. > > >>>> [...] >>>> >>>>> @@ -458,6 +487,8 @@ static bool build_perf_domains(const struct cpumask *cpu_map) >>>>> return !!pd; >>>>> >>>>> free: >>>>> + if (sched_debug()) >>>>> + pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map)); >>>> >>>> Shouldn't this be used in condition `if (!sysctl_sched_energy_aware)`? >>>> Otherwise we have 2 warnings when the other conditions which leads to >>>> `goto free` aren't met. >>> Since sched_energy_set has the info messages about start and stop of EAS, maybe >>> this debug is not needed. Will remove it. Doing it only `if (!sysctl_sched_energy_aware)` >> >> OK. >> >>> also doesn't seem correct, as calling free in this function would free the perf_domains. >> >> But !sched_is_eas_possible() also does `goto free` and in there we we >> emit pr_info's indicating why EAS isn't possible right now. >> >> When issuing a: >> >> # echo 0 > /proc/sys/kernel/sched_energy_aware >> >> we would see in the logs: >> >> ... >> [ 416.325324] rd 0-5: sysctl_sched_energy_aware is N <-- (*) >> [ 416.337844] sched_energy_set: stopping EAS >> ... >> >> but maybe (*) is not necessary ... I m not sure I understand 100% the point, but it seems to me that when changing sysctl_sched_energy_aware's value, either: - EAS is not possible, and sched_is_eas_possible() will output the reason why (i.e. "Checking EAS, [...]") - EAS was deactivated by the user, and it is possible to check the value of sysctl_sched_energy_aware. So there would be no need to have an additional message "Checking EAS, sysclt sched_energy_aware set to N" When build_perf_domains() is called while rebuilding the sched domains, it is also possible to check sched_energy_aware's value and understand why EAS is not enabled. > > ok. I think we can add similar info message in if(!sysctl_sched_energy_aware) like below? > Would that be enough? > > if (!sysctl_sched_energy_aware) { > if (sched_debug()) { > pr_info("rd %*pbl: Checking EAS, sysclt sched_energy_aware set to N\n", > cpumask_pr_args(cpu_map)); > } (No need for brackets here, just in case) > goto free; > } > > and remove the below one. > if (sched_debug()) > pr_warn("rd %*pbl: Disabling EAS", cpumask_pr_args(cpu_map)); > > > So there will be these "Checking EAS" and if possible, "starting EAS" or "stopping EAS" message. I will rebase the patch removing the EM complexity and check it/resend it, like this the 2 patches could go together: https://lore.kernel.org/lkml/20221121094336.3250917-1-pierre.gondois@arm.com/ Regards, Pierre