Date: Fri, 8 Jul 2022 10:05:32 +0100
From: Ionela Voinescu
To: Darren Hart
Cc: Sudeep Holla, linux-kernel@vger.kernel.org, Greg Kroah-Hartman, conor.dooley@microchip.com, valentina.fernandezalanis@microchip.com, Vincent Guittot, Dietmar Eggemann, Qing Wang, Rob Herring, "Rafael J. Wysocki", Pierre Gondois, linux-arm-kernel@lists.infradead.org, linux-riscv@lists.infradead.org
Subject: Re: [PATCH v6 17/21] arch_topology: Limit span of cpu_clustergroup_mask()
References: <20220704101605.1318280-1-sudeep.holla@arm.com> <20220704101605.1318280-18-sudeep.holla@arm.com>

Hi Darren,

On Thursday 07 Jul 2022 at 17:10:19 (-0700), Darren Hart wrote:
> On Mon, Jul 04, 2022 at 11:16:01AM +0100, Sudeep Holla wrote:
> > From: Ionela Voinescu
>
> Hi Sudeep and Ionela,
>
> > Currently the cluster identifier is not set on DT based platforms.
> > The reset or default value is -1 for all the CPUs. Once we assign the
> > cluster identifier values correctly, the cluster_sibling mask will be
> > populated and returned by cpu_clustergroup_mask() to contribute to the
> > creation of the CLS scheduling domain level, if SCHED_CLUSTER is
> > enabled.
> >
> > To avoid topologies that will result in questionable or incorrect
> > scheduling domains, impose restrictions regarding the span of clusters,
>
> Can you provide a specific example of a valid topology that results in
> the wrong thing currently?
>

When CONFIG_SCHED_CLUSTER=y, all typical big.LITTLE platforms will end up
having a CLS level instead of MC, with an extra flag for the CLS level:
SD_PREFER_SIBLING. In addition to this, potentially broken cluster
descriptions in DT (say, clusters spanning more CPUs than the LLC domain)
will result in broken scheduler topologies.
This drew our attention to the fact that the span of clusters should be
restricted so that they always span fewer CPUs than the LLC, if LLC
information exists and the LLC spans more than one core. But the Ampere
Altra functionality you introduced is maintained. I'll detail this below.

> > as presented to scheduling domains building code: cluster_sibling should
> > not span more or the same CPUs as cpu_coregroup_mask().
> >
> > This is needed in order to obtain a strict separation between the MC and
> > CLS levels, and maintain the same domains for existing platforms in
> > the presence of CONFIG_SCHED_CLUSTER, where the new cluster information
> > is redundant and irrelevant for the scheduler.
>
> Unfortunately, I believe this changes the behavior for the existing
> Ampere Altra systems, resulting in degraded performance, particularly for
> latency sensitive workloads, by effectively reverting:
>
>   db1e59483d topology: make core_mask include at least cluster_siblings
>
> and ensuring the clustergroup_mask will return with just one CPU for the
> condition the above commit addresses.
>

It does not change the functionality on Ampere Altra. cpu_coregroup_mask()
will still return 2 CPUs (the cluster span). The difference is that
cpu_clustergroup_mask() will see that cpu_coregroup_mask() spans the same
CPUs, and it will return a single CPU. This results in the CLS level being
invalidated and the MC level maintained. But MC will span 2 CPUs, instead
of 1, which was the case before your fix. This is alright, as MC and CLS
have the same flags, so the existing functionality is fully maintained.
I've tested on Ampere Altra:

without patch:

cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
cpu0/domain0/min_interval:2
cpu0/domain0/name:CLS

cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_PREFER_SIBLING
cpu0/domain1/min_interval:80
cpu0/domain1/name:DIE

cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SERIALIZE SD_OVERLAP SD_NUMA
cpu0/domain2/min_interval:160
cpu0/domain2/name:NUMA

with this patch:

cpu0/domain0/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES SD_PREFER_SIBLING
cpu0/domain0/min_interval:2
cpu0/domain0/name:MC

cpu0/domain1/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_PREFER_SIBLING
cpu0/domain1/min_interval:80
cpu0/domain1/name:DIE

cpu0/domain2/flags:SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SERIALIZE SD_OVERLAP SD_NUMA
cpu0/domain2/min_interval:160
cpu0/domain2/name:NUMA

> > While previously the scheduling domain builder code would have removed MC
> > as redundant and kept CLS if SCHED_CLUSTER was enabled and the
> > cpu_coregroup_mask() and cpu_clustergroup_mask() spanned the same CPUs,
> > now CLS will be removed and MC kept.
>
> This is not desirable for all systems, particularly those which don't
> have an L3 but do share other resources - such as the snoop filter in
> the case of the Ampere Altra.
>
> While not universally supported, we agreed in the discussion on the
> above patch to allow systems to define clusters independently from the
> L3 as an LLC, since this is also independently defined in PPTT.
>
> Going back to my first comment - does this fix an existing system with a
> valid topology? It's not clear to me what that would look like. The
> Ampere Altra presents a cluster level in PPTT because that is the
> desirable topology for the system.
> If it's not desirable for another
> system to have the cluster topology - shouldn't it not present that
> layer to the kernel in the first place?
>

Hopefully my comments above have clarified these.

Thanks,
Ionela.

> Thanks,
>
> > Cc: Darren Hart
> > Acked-by: Vincent Guittot
> > Tested-by: Ionela Voinescu
> > Signed-off-by: Ionela Voinescu
> > Signed-off-by: Sudeep Holla
> > ---
> >  drivers/base/arch_topology.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index e384afb6cac7..591c1f8e15e2 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -686,6 +686,14 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
> >  
> >  const struct cpumask *cpu_clustergroup_mask(int cpu)
> >  {
> > +	/*
> > +	 * Forbid cpu_clustergroup_mask() to span more or the same CPUs as
> > +	 * cpu_coregroup_mask().
> > +	 */
> > +	if (cpumask_subset(cpu_coregroup_mask(cpu),
> > +			   &cpu_topology[cpu].cluster_sibling))
> > +		return get_cpu_mask(cpu);
> > +
> >  	return &cpu_topology[cpu].cluster_sibling;
> >  }
> >
> > --
> > 2.37.0
> >

> --
> Darren Hart
> Ampere Computing / OS and Kernel