Received: by 2002:a05:7412:3784:b0:e2:908c:2ebd with SMTP id jk4csp203374rdb; Sat, 30 Sep 2023 00:46:42 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEjcGVg7xOuu/9sidAfzOPbM66QW8Hrn7XSaU1rLreGzI52Y6mg6J+pXbP9ey/z2x1BuxBV X-Received: by 2002:a05:6808:1c7:b0:3ad:9540:dc0 with SMTP id x7-20020a05680801c700b003ad95400dc0mr6366063oic.4.1696060001923; Sat, 30 Sep 2023 00:46:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696060001; cv=none; d=google.com; s=arc-20160816; b=exxkXB2wmPZJt81xnM6xuYHqc2Fe9ocC/MFA9Uo5jmFMp2vqlq8SFcusGL25XjWcfQ 0G2IUhab3RgaXH4qhh6qcwqHuDMgSRGVEDQpn7HGuUkt3RKVrEgE9w/9nT9mKdY4SYQg vbnLEXQeqLEG6ws39mEIo8B/RGdIZYIHJokk1L9VVfK3Aqe775ENT0TmNJXTpwW4OEzp h7/JczOiiFyxxDZWk0pR61Jz+dzuDhCsQPDS7ddG5mRW+p6ztHUfAgOJCLfuNIRKFSw1 t7qEIaxdQUL6WPZaYLI4dyfoIwicdel78x46IFuOEYgW09ZGWgft9SsHFDUj7/Agwdu3 wORA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :in-reply-to:from:references:cc:to:content-language:subject :user-agent:date:message-id:dkim-signature; bh=hax/gXeddJZ69gYN/iUwd43/4stSZKTf2sElBWkn7J0=; fh=AN2kKpgVDP33GrGbrqDa2ismwjTuOwk9CMWoVPCT6Tg=; b=NV+WUgcDXIhStI5xYecV9qiAIozlAay6icBFYaS9M3P14RiQbtR5UpUg0rKp44lGWc 34aTe8dtGPz21KoVorZXftyAnqtX80i6wVyPiDrUbThNvg17bIwNRIy9QIAopb4Y/uk0 Rd0KzaFFbWEAlFyw8TwWdOcPaEF66xUeZqXEMCEoKN+d6XcKeeo9+YXv7wzGIFpsjfZd XGXEH1FtFp+osjY0BmM4z6JTKTFblSfgydHtbA2dJTccqjC83BR6KNuGvAxRNlQaPDUc ocUBsWqUVUvljg7WUPm6WYFhBAfumHrl0+spbsmehUWiqm5sjcVWcW6DbUIcngnmsM+b 0Hqw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=nOp5wae5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id j70-20020a638049000000b005777535a67bsi22903671pgd.746.2023.09.30.00.46.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 30 Sep 2023 00:46:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=nOp5wae5; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 40A878311BD9; Fri, 29 Sep 2023 01:49:19 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232519AbjI2ItI (ORCPT + 99 others); Fri, 29 Sep 2023 04:49:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231774AbjI2ItH (ORCPT ); Fri, 29 Sep 2023 04:49:07 -0400 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCF7ABF for ; Fri, 29 Sep 2023 01:49:04 -0700 (PDT) Received: from pps.filterd (m0353725.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38T8K5Hk021418; Fri, 29 Sep 2023 08:48:45 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding : mime-version; s=pp1; bh=hax/gXeddJZ69gYN/iUwd43/4stSZKTf2sElBWkn7J0=; b=nOp5wae5FpjD/HUi4SQs167GyByI7i347tawIkZ4xvVtR6dmPc07Ouyffvr4I9QWZN+S 828cqwohn6J4qxoPKLu0b8vYVVh3+GlSI9ag6dNfPdKapM2F8B/+xDbe2A3GJNuYD73K XsuiH7Yxpn/BaDC/0DqlQFSnOK8lZx+RQ8qcjgvaDjDL7lGKWAEMYV/8BrBU7oj5H5uD cKH8LjjtvV8qCcZ36/JB76bqHpTxd+QTl3J6z8cysVne3hRUChd/RC7YUAoIxqQ4eyFw 6GY2cp34XqpsfI0HWlGVZDiH0CZGvmLzEhBIisHeT6zZD2nPsZq79pOtTUgC2/Rx0B3q AA== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3tdsj0trd3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 08:48:44 +0000 Received: from m0353725.ppops.net (m0353725.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 38T8gI9O031866; Fri, 29 Sep 2023 08:48:44 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3tdsj0trcq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 08:48:44 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 38T6YGmq011019; Fri, 29 Sep 2023 08:48:43 GMT Received: from smtprelay06.dal12v.mail.ibm.com ([172.16.1.8]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3tabum37v1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 29 Sep 2023 08:48:43 +0000 Received: from smtpav04.wdc07v.mail.ibm.com (smtpav04.wdc07v.mail.ibm.com [10.39.53.231]) by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 38T8mg6V918172 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 29 Sep 2023 08:48:42 GMT Received: from smtpav04.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 488D258050; Fri, 29 Sep 2023 08:48:42 +0000 (GMT) Received: from smtpav04.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EA9D958045; Fri, 29 Sep 2023 08:48:21 +0000 (GMT) Received: from [9.171.6.196] (unknown [9.171.6.196]) by smtpav04.wdc07v.mail.ibm.com (Postfix) with ESMTP; Fri, 29 Sep 2023 08:48:21 +0000 (GMT) Message-ID: <3d62f900-7c2a-43ae-9451-254fb2517c10@linux.vnet.ibm.com> Date: Fri, 29 Sep 2023 14:18:20 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: [PATCH v4] sched/topology: change behaviour of sysctl sched_energy_aware based on the platform Content-Language: en-US To: Pierre Gondois Cc: linux-kernel@vger.kernel.org, ionela.voinescu@arm.com, srikar@linux.vnet.ibm.com, mgorman@techsingularity.net, mingo@kernel.org, yu.c.chen@intel.com, tim.c.chen@linux.intel.com, pauld@redhat.com, mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, vschneid@redhat.com, qperret@google.com, Dietmar Eggemann References: <20230926100046.405188-1-sshegde@linux.vnet.ibm.com> <8ac1576c-909b-ec6b-930d-0683ca288bf9@arm.com> <46353945-fced-9b69-b334-3b7ae50c957c@linux.vnet.ibm.com> <4534b7b5-7127-27d9-8264-6703497dcb81@arm.com> From: Shrikanth Hegde In-Reply-To: <4534b7b5-7127-27d9-8264-6703497dcb81@arm.com> Content-Type: text/plain; charset=UTF-8 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: H1QMIsU8XAvOOipx3AuC1X7VeAKEXjz4 X-Proofpoint-GUID: O7MJH2u9Y3Kl_c6R3vEHnW-LvWKbXaXi Content-Transfer-Encoding: 8bit X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-09-29_07,2023-09-28_03,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 phishscore=0 suspectscore=0 adultscore=0 clxscore=1015 priorityscore=1501 bulkscore=0 mlxlogscore=999 lowpriorityscore=0 spamscore=0 mlxscore=0 malwarescore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2309180000 definitions=main-2309290073 X-Spam-Status: No, score=-5.3 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,NICE_REPLY_A,RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 29 Sep 2023 01:49:19 -0700 (PDT) On 9/28/23 3:24 PM, Pierre Gondois wrote: > > > On 9/28/23 11:25, Pierre Gondois wrote: >> Hello Shrikanth, Dietmar, >> >> On 9/27/23 19:08, Shrikanth Hegde wrote: >>> >>> >>> On 9/27/23 6:44 PM, Dietmar Eggemann wrote: >>>> Ah, BTW s/quentin.perret@arm.com/qperret@google.com >>>> >>>> On 27/09/2023 10:14, Shrikanth Hegde wrote: >>>>> >>>>> >>>>> On 9/27/23 2:59 AM, Dietmar Eggemann wrote: >>>>>> On 26/09/2023 12:00, Shrikanth Hegde wrote: >>>> >>>> [...] >>>> >>>>>>> At present, though platform doesn't support EAS, this sysctl >>>>>>> returns 1 >>>>>>> and it ends up calling rebuild of sched domain on write to 1 and >>>>>> >>>>>> sched domains are not rebuild in this case, i.e. >>>>>> partition_sched_domains_locked() does not call >>>>>> detach_destroy_domains() >>>>>> or build_sched_domains(). Only build_perf_domains() is called which >>>>>> bails out if !sysctl_sched_energy_aware or one of the EAS >>>>>> conditions is >>>>>> not true. >>>>>> >>>>> >>>>> ok. that's because it goes to match1 and match2 right? >>>> >>>> yes. >>>> >>>> [...] >>>> >>>>>>> @@ -231,6 +289,14 @@ static int sched_energy_aware_handler(struct >>>>>>> ctl_table *table, int write, >>>>>>>            return -EPERM; >>>>>>> >>>>>>>        ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); >>>>>>> +    if (!sched_is_eas_possible(cpu_active_mask)) { >>>>>> >>>>>> This is using `cpu_active_mask` so EAS can only be disabled >>>>>> system-wide. >>>>>> >>>>>> So I experimented with an exclusive cpuset setup creating a symmetric >>>>>> (cs1) and an asymmetric (cs2) island on my Juno with its cpumask = >>>>>> [l B >>>>>> B l l l] (l - little CPU, B - Big CPU). >>>>>> >>>>>> root@juno:~# cd /sys/fs/cgroup/cpuset >>>>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs1 >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs1/cpuset.cpu_exclusive >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs1/cpuset.mems >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 4,5 > cs1/cpuset.cpus >>>>>> root@juno:/sys/fs/cgroup/cpuset# mkdir cs2 >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 1 > cs2/cpuset.cpu_exclusive >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cs2/cpuset.mems >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 0-3 > cs2/cpuset.cpus >>>>>> root@juno:/sys/fs/cgroup/cpuset# echo 0 > cpuset.sched_load_balance >>>>>> >>>>>> [ 3021.761278] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{ >>>>>> cpus=0,3-5 nr_pstate=5 } >>>>>> >>>>>> root@juno:~# echo 0 > /proc/sys/kernel/sched_energy_aware >>>>>> >>>>>> log messages: >>>>>> ... >>>>>> [ 3143.538583] rd 4-5: Disabling EAS >>>>>> [ 3143.549569] rd 0-3: Disabling EAS >>>>>> [ 3143.560681] sched_energy_set: stopping EAS >>>>>> ... >>>>>> >>>>>> root@juno:~# echo 1 > /proc/sys/kernel/sched_energy_aware >>>>>> >>>>>> log messages: >>>>>> ... >>>>>> [ 3223.277521] root_domain 0-3: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{ >>>>>> cpus=0,3-5 nr_pstate=5 } >>>>>> [ 3223.293409] sched_energy_set: starting EAS >>>>>> >>>>>> Seems still to work correctly. >>>>> >>>>> I see that can be a issue. using first cpu here check to asymmetric >>>>> cpu capacity. >>>>> It would have worked here, since the first group had asymmetry. ( l >>>>> B B l ). >>>>> It wouldn't have worked if the first group had like ( l l ) and >>>>> second group has ( l B B l ) >>>> >>>> Yeah, that's true. >>>> >>>>     sched_is_eas_possible(const struct cpumask *cpu_mask) >>>> >>>>       !per_cpu(sd_asym_cpucapacity, cpumask_first(cpu_mask)); >>>> >>>> cpusets cs1=[0,5] and cs2=[1-4] as such an example. >>>> >>> >>> right. >>> >>>> >>>>> Instead of cpu_active_mask, I can make use of ndoms_cur and >>>>> doms_cur[i] logic to >>>>> traverse through possible doms, and if none of the domains have >>>>> asymmetric cpu capacity >>>>> return false.  Does that look right? >>>> >>>>     rebuild_sched_domains() >>>> >>>>       rebuild_sched_domains_locked() >>>> >>>>         ndoms = generate_sched_domains(&doms, &attr) >>>> >>>> You would need generate_sched_domains() in >>>> sched_energy_aware_handler()? >>>> >>> >>> clearly I didnt think through this. ndoms_cur and doms_cur are >>> updated at the end. >>> So If EAS is possible at boot, this would fail. >>> >>> >>> What  sched_is_eas_possible needs is if there is asymmetric cpu >>> topology. >>> Simpler loop of all CPU's and checking if there any of them have >>> sd_asym_cpucapacity might do, >>> though it goes through all CPU's, Since these functions are not in >>> hot path >>> So it should not affect any performance. Something like below would >>> work? >>> >>>     bool any_asym_capacity = false; >>> >>>           /* EAS is enabled for asymmetric CPU capacity topologies. */ >>>           for_each_cpu(i, cpu_mask) { >>>                   if (per_cpu(sd_asym_cpucapacity, i)) { >>>                           any_asym_capacity = true; >>>                           break; >>>                   } >>>           } >>>           if (!any_asym_capacity) { >>>                   if (sched_debug()) { >>>                           pr_info("rd %*pbl: Checking EAS, CPUs do >>> not have asymmetric capacities\n", >>>                                   cpumask_pr_args(cpu_mask)); >>>                   } >>>                   return false; >>>           } >>> >> >> FYIW I could reproduce the issue mentioned above, and the suggested bit >> seems to solve it. >> Thanks for trying this out. >>> >>> >>>>>> [...] >>>>>> >>>>>>> @@ -458,6 +487,8 @@ static bool build_perf_domains(const struct >>>>>>> cpumask *cpu_map) >>>>>>>        return !!pd; >>>>>>> >>>>>>>    free: >>>>>>> +    if (sched_debug()) >>>>>>> +        pr_warn("rd %*pbl: Disabling EAS", >>>>>>> cpumask_pr_args(cpu_map)); >>>>>> >>>>>> Shouldn't this be used in condition `if >>>>>> (!sysctl_sched_energy_aware)`? >>>>>> Otherwise we have 2 warnings when the other conditions which leads to >>>>>> `goto free` aren't met. >>>>> Since sched_energy_set has the info messages about start and stop >>>>> of EAS, maybe >>>>> this debug is not needed. Will remove it. Doing it only `if >>>>> (!sysctl_sched_energy_aware)` >>>> >>>> OK. >>>> >>>>> also doesn't seem correct, as calling free in this function would >>>>> free the perf_domains. >>>> >>>> But !sched_is_eas_possible() also does `goto free` and in there we we >>>> emit pr_info's indicating why EAS isn't possible right now. >>>> >>>> When issuing a: >>>> >>>> # echo 0 > /proc/sys/kernel/sched_energy_aware >>>> >>>> we would see in the logs: >>>> >>>> ... >>>> [  416.325324] rd 0-5: sysctl_sched_energy_aware is N   <-- (*) >>>> [  416.337844] sched_energy_set: stopping EAS >>>> ... >>>> >>>> but maybe (*) is not necessary ... >> >> I m not sure I understand 100% the point, but it seems to me that when >> changing sysctl_sched_energy_aware's value, either: >> - EAS is not possible, and sched_is_eas_possible() will output the >> reason why >>     (i.e. "Checking EAS, [...]") >> - EAS was deactivated by the user, and it is possible to check the >> value of >>     sysctl_sched_energy_aware. So there would be no need to have an >> additional >>     message "Checking EAS, sysclt sched_energy_aware set to N" >> >> When build_perf_domains() is called while rebuilding the sched >> domains, it is also >> possible to check sched_energy_aware's value and understand why EAS is >> not enabled. >> >>> >>> ok. I think we can add similar info message in >>> if(!sysctl_sched_energy_aware) like below? >>> Would that be enough? >>> >>>           if (!sysctl_sched_energy_aware) { >>>                   if (sched_debug()) { >>>                           pr_info("rd %*pbl: Checking EAS, sysclt >>> sched_energy_aware set to N\n", >>>                                   cpumask_pr_args(cpu_map)); >>>                   } >> >> (No need for brackets here, just in case) >> >>>                   goto free; >>>           } >>> >>> and remove the below one. >>>           if (sched_debug()) >>>                   pr_warn("rd %*pbl: Disabling EAS", >>> cpumask_pr_args(cpu_map)); >>> >>> >>> So there will be these "Checking EAS" and if possible, "starting EAS" >>> or "stopping EAS" message. >> >> >> I will rebase the patch removing the EM complexity and check it/resend >> it, >> like this the 2 patches could go together: >> https://lore.kernel.org/lkml/20221121094336.3250917-1-pierre.gondois@arm.com/ > > The patch actually seems to apply cleanly on v6.6-rc3, and the > complexity of > find_energy_efficient_cpu() seems to be the same as what it was when the > patch > was sent, so I believe you can: > - Rebase your patch on top of it > - Provide a pointer to it in the changelog to mention the dependency > - Remove this bit in the commit message of your patch: > "It takes most of the cases into account except one where EM complexity is > too high as the code was bit tricky to separate that." > > Regards, > Pierre Ok. I should be able to do this and send out v5 soon.