2022-03-16 12:18:32

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched: dynamic config sd_flags if described in DT

On 15/03/2022 08:58, Qing Wang wrote:
> From: Wang Qing <[email protected]>

(1) Can you share more information about your CPU topology?

I guess it is a single DSU (DynamIQ Shared Unit) ARMv9 system with 8
CPUs? So L3 spans over [CPU0..CPU7].

You also mentioned complexes. Am I right in assuming that [CPU0..CPU3]
are Cortex-A510 cores where each 2 CPUs share a complex?

What kind of uarch are the CPUs in [CPU4..CPU7]? Are they Cortex-A510's
as well? I'm not sure after reading your email:

https://lkml.kernel.org/r/SL2PR06MB30828CF9FF2879AFC9DC53D2BD0C9@SL2PR06MB3082.apcprd06.prod.outlook.com

You might run into the issue that individual CPUs of your system see a
different SD hierarchy in case that [CPU4..CPU7] aren't Cortex-A510's,
i.e. CPUs not sharing complexes.

(2) Related to your MC Sched Domain (SD) layer:

If you have a single DSU ARMv9 system, then in Linux kernel mainline you
shouldn't have sub-clustering of [CPU0..CPU3] and [CPU4...CPU7].

I.e. the cpu-map entry in your dts file should only list cores, not
clusters.

I know that in Android the cluster entries are used to sub-group
different uarch CPUs in an asymmetric CPU capacity system (a.k.a. Arm
DynamIQ and Phantom domains) but this is eclipsing the true L3 (LLC)
information and is not "supported" (in the sense of "used") in mainline.

But I have a hard time to see what [CPU0..CPU3] or [CPU4..CPU7] are
shareing in your system.

(3) Why do you want this different SD hierarchy?

I assume in mainline your system will have a single SD which is MC (w/o
the Phantom domain approach from Android).

You mentioned cpus_share_cache(). Or is it the extra SD level which
changes the behaviour of CFS load-balancing? I'm just wondering since
EAS wouldn't be affected here. I'm sure I can understand this better
once we know more about your CPU topology.

[...]


2022-03-17 05:29:25

by 王擎

[permalink] [raw]
Subject: RE: [PATCH] sched: dynamic config sd_flags if described in DT


>(1) Can you share more information about your CPU topology?
>
>I guess it is a single DSU (DynamIQ Shared Unit) ARMv9 system with 8
>CPUs? So L3 spans over [CPU0..CPU7].
>
>You also mentioned complexes. Am I right in assuming that [CPU0..CPU3]
>are Cortex-A510 cores where each 2 CPUs share a complex?
>
>What kind of uarch are the CPUs in [CPU4..CPU7]? Are they Cortex-A510's
>as well? I'm not sure after reading your email:

Yes, Android systems are currently used default_domain with wrong sd_flags,
take Qualcomm SM8450 as an example, the CPU and cache topology(1+3+4):
| DSU |
| cluster0 | cluster1 |cluster2|
| core0 core1 core2 core3 | core4 core5 core6 | core7 |
| complex0 | complex1 | ------------------------ |
| L2 cache | L2 cache | L2 | L2 | L2 | L2 |
| L3 cache |

The sched domain now:
DIE[0-7] (no SD_SHARE_PKG_RESOURCES)
MC[0-3][4-6][7] (SD_SHARE_PKG_RESOURCES)

The sched domain should be:
DIE[0-7] (SD_SHARE_PKG_RESOURCES)
MC[0-3][4-6][7] (no SD_SHARE_PKG_RESOURCES)
*CLS[0-1][2-3](SD_SHARE_PKG_RESOURCES)

>https://lkml.kernel.org/r/SL2PR06MB30828CF9FF2879AFC9DC53D2BD0C9@SL2PR06MB3082.apcprd06.prod.outlook.com
>
>You might run into the issue that individual CPUs of your system see a
>different SD hierarchy in case that [CPU4..CPU7] aren't Cortex-A510's,
>i.e. CPUs not sharing complexes.
>
>(2) Related to your MC Sched Domain (SD) layer:
>
>If you have a single DSU ARMv9 system, then in Linux kernel mainline you
>shouldn't have sub-clustering of [CPU0..CPU3] and [CPU4...CPU7].
>
>I.e. the cpu-map entry in your dts file should only list cores, not
>clusters.

But in fact we will, as mentioned above.
>
>I know that in Android the cluster entries are used to sub-group
>different uarch CPUs in an asymmetric CPU capacity system (a.k.a. Arm
>DynamIQ and Phantom domains) but this is eclipsing the true L3 (LLC)
>information and is not "supported" (in the sense of "used") in mainline.
>
>But I have a hard time to see what [CPU0..CPU3] or [CPU4..CPU7] are
>shareing in your system.

They share L3 cache, but no share L2 which only shared within complex.
>
>(3) Why do you want this different SD hierarchy?
>
>I assume in mainline your system will have a single SD which is MC (w/o
>the Phantom domain approach from Android).
>
>You mentioned cpus_share_cache(). Or is it the extra SD level which
>changes the behaviour of CFS load-balancing? I'm just wondering since
>EAS wouldn't be affected here. I'm sure I can understand this better
>once we know more about your CPU topology.

What I want to do is :
1.Config the right sd_llc to sd, better to get it dynamically from DT
2.Benefit from the shared cache(L2) of the complex
i.e. when we look for sibling idle CPU, prior select the L2 shared CPU(inner complex) if L3 is all shared.

Thanks,
Wang

2022-03-17 12:51:17

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched: dynamic config sd_flags if described in DT

On 16/03/2022 03:46, 王擎 wrote:
>
>> (1) Can you share more information about your CPU topology?
>>
>> I guess it is a single DSU (DynamIQ Shared Unit) ARMv9 system with 8
>> CPUs? So L3 spans over [CPU0..CPU7].
>>
>> You also mentioned complexes. Am I right in assuming that [CPU0..CPU3]
>> are Cortex-A510 cores where each 2 CPUs share a complex?
>>
>> What kind of uarch are the CPUs in [CPU4..CPU7]? Are they Cortex-A510's
>> as well? I'm not sure after reading your email:
>
> Yes, Android systems are currently used default_domain with wrong sd_flags,
> take Qualcomm SM8450 as an example, the CPU and cache topology(1+3+4):

Ah, your system looks like this:

.---------------.
CPU |0 1 2 3 4 5 6 7|
+---------------+
uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
+---------------+
L2 | | | | | | |
+---------------+
L3 |<-- -->|
+---------------+
|<-- cluster -->|
+---------------+
|<-- DSU -->|
'---------------'

> | DSU |
> | cluster0 | cluster1 |cluster2|

^^^ Those aren't real clusters, hence the name <Phantom> SD. The cluster
is [CPU0...CPU7]. Android uses Phantom SD to subgroup CPUs with the same
uarch. That's why you get your MC->DIE SD's on your system and
SHARE_PKG_RESOURCES (ShPR) on MC, rather DIE.

Note, you should already have an asymmetric SD hierarchy. CPU7 should
only have DIE not MC! Each CPU has its own SD hierarchy!

> | core0 core1 core2 core3 | core4 core5 core6 | core7 |
> | complex0 | complex1 | ------------------------ |
> | L2 cache | L2 cache | L2 | L2 | L2 | L2 |
> | L3 cache |
>
> The sched domain now:
> DIE[0-7] (no SD_SHARE_PKG_RESOURCES)
> MC[0-3][4-6][7] (SD_SHARE_PKG_RESOURCES)
>
> The sched domain should be:
> DIE[0-7] (SD_SHARE_PKG_RESOURCES)
> MC[0-3][4-6][7] (no SD_SHARE_PKG_RESOURCES)

First remember, using Phantom SD in Android is already a hack. Normally
your system should only have an MC SD for each CPU (with ShPR).

Now, if you want to move ShPR from MC to DIE then a custom topology
table should do it, i.e. you don't have to change any generic task
scheduler code.

static inline int cpu_cpu_flags(void)
{
return SD_SHARE_PKG_RESOURCES;
}

static struct sched_domain_topology_level custom_topology[] = {
#ifdef CONFIG_SCHED_SMT
{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif

#ifdef CONFIG_SCHED_CLUSTER
{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
#endif

#ifdef CONFIG_SCHED_MC
{ cpu_coregroup_mask, SD_INIT_NAME(MC) },
^^^^
#endif
{ cpu_cpu_mask, cpu_cpu_flags, SD_INIT_NAME(DIE) },
^^^^^^^^^^^^^
{ NULL, },
};

set_sched_topology(custom_topology);

> *CLS[0-1][2-3](SD_SHARE_PKG_RESOURCES)

But why do you want to have yet another SD underneath MC for CPU0-CPU3?
sd_llc is assigned to the highest ShPR SD, which would be DIE.

>> https://lkml.kernel.org/r/SL2PR06MB30828CF9FF2879AFC9DC53D2BD0C9@SL2PR06MB3082.apcprd06.prod.outlook.com
>>
>> You might run into the issue that individual CPUs of your system see a
>> different SD hierarchy in case that [CPU4..CPU7] aren't Cortex-A510's,
>> i.e. CPUs not sharing complexes.
>>
>> (2) Related to your MC Sched Domain (SD) layer:
>>
>> If you have a single DSU ARMv9 system, then in Linux kernel mainline you
>> shouldn't have sub-clustering of [CPU0..CPU3] and [CPU4...CPU7].
>>
>> I.e. the cpu-map entry in your dts file should only list cores, not
>> clusters.
>
> But in fact we will, as mentioned above.

OK, so your system needs this `fake` sub-grouping on uarch boundaries.
Probably because of (out-of-tree) Android/platform code? Have you tried
to run on a clean mainline SD hierarchy (only MC)? Is the Phantom SD
still required?

>> I know that in Android the cluster entries are used to sub-group
>> different uarch CPUs in an asymmetric CPU capacity system (a.k.a. Arm
>> DynamIQ and Phantom domains) but this is eclipsing the true L3 (LLC)
>> information and is not "supported" (in the sense of "used") in mainline.
>>
>> But I have a hard time to see what [CPU0..CPU3] or [CPU4..CPU7] are
>> shareing in your system.
>
> They share L3 cache, but no share L2 which only shared within complex.

I should have written: What do they share exclusively? I can only see
that they ([CPU0..CPU3], [CPU4..CPU6], [CPU7]) share uarch exclusively.
Which relates to this fake (Phantom) SD. L3 is shared between all CPUs.

>> (3) Why do you want this different SD hierarchy?
>>
>> I assume in mainline your system will have a single SD which is MC (w/o
>> the Phantom domain approach from Android).
>>
>> You mentioned cpus_share_cache(). Or is it the extra SD level which
>> changes the behaviour of CFS load-balancing? I'm just wondering since
>> EAS wouldn't be affected here. I'm sure I can understand this better
>> once we know more about your CPU topology.
>
> What I want to do is :
> 1.Config the right sd_llc to sd, better to get it dynamically from DT

So this should be ShPR on DIE. You have 2 choices here. Use flat MC SD
(mainline) or Phantom SD's + custom_topology.

> 2.Benefit from the shared cache(L2) of the complex
> i.e. when we look for sibling idle CPU, prior select the L2 shared CPU(inner complex) if L3 is all shared.

But cpus_share_cache() operates on sd_llc_id which relates to DIE SD
even for [CPU0..CPU3]?

2022-03-24 07:32:58

by 王擎

[permalink] [raw]
Subject: RE: [PATCH] sched: dynamic config sd_flags if described in DT


>>
>>> (1) Can you share more information about your CPU topology?
>>>
>>> I guess it is a single DSU (DynamIQ Shared Unit) ARMv9 system with 8
>>> CPUs? So L3 spans over [CPU0..CPU7].
>>>
>>> You also mentioned complexes. Am I right in assuming that [CPU0..CPU3]
>>> are Cortex-A510 cores where each 2 CPUs share a complex?
>>>
>>> What kind of uarch are the CPUs in [CPU4..CPU7]? Are they Cortex-A510's
>>> as well? I'm not sure after reading your email:
>>
>> Yes, Android systems are currently used default_domain with wrong sd_flags,
>> take Qualcomm SM8450 as an example, the CPU and cache topology(1+3+4):
>
>Ah, your system looks like this:
>
>      .---------------.
>CPU   |0 1 2 3 4 5 6 7|
>      +---------------+
>uarch |l l l l m m m b| (so called tri-gear: little, medium, big)
>      +---------------+
>  L2  |   |   | | | | |
>      +---------------+
>  L3  |<--         -->|
>      +---------------+
>      |<-- cluster -->|
>      +---------------+
>      |<--   DSU   -->|
>      '---------------'
>
>> |                           DSU                            |
>> |           cluster0         |       cluster1     |cluster2|
>
>^^^ Those aren't real clusters, hence the name <Phantom> SD. The cluster
>is [CPU0...CPU7]. Android uses Phantom SD to subgroup CPUs with the same
>uarch. That's why you get your MC->DIE SD's on your system and
>SHARE_PKG_RESOURCES (ShPR) on MC, rather DIE.
>
>Note, you should already have an asymmetric SD hierarchy. CPU7 should
>only have DIE not MC! Each CPU has its own SD hierarchy!
>
>> | core0  core1  core2  core3 |  core4 core5 core6 | core7  |
>> |   complex0  |   complex1   |  ------------------------   |
>> |   L2 cache  |   L2 cache   |   L2  |  L2 |  L2  |   L2   |
>> |                         L3 cache                         |
>>
>> The sched domain now:
>> DIE[0-7]  (no SD_SHARE_PKG_RESOURCES)
>> MC[0-3][4-6][7] (SD_SHARE_PKG_RESOURCES)
>>
>> The sched domain should be:
>> DIE[0-7]  (SD_SHARE_PKG_RESOURCES)
>> MC[0-3][4-6][7] (no SD_SHARE_PKG_RESOURCES)
>
>First remember, using Phantom SD in Android is already a hack. Normally
>your system should only have an MC SD for each CPU (with ShPR).
>
>Now, if you want to move ShPR from MC to DIE then a custom topology
>table should do it, i.e. you don't have to change any generic task
>scheduler code.
>
>static inline int cpu_cpu_flags(void)
>{
>       return SD_SHARE_PKG_RESOURCES;
>}
>
>static struct sched_domain_topology_level custom_topology[] = {
>#ifdef CONFIG_SCHED_SMT
>        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
>#endif
>
>#ifdef CONFIG_SCHED_CLUSTER
>        { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
>#endif
>
>#ifdef CONFIG_SCHED_MC
>        { cpu_coregroup_mask, SD_INIT_NAME(MC) },
>                            ^^^^
>#endif
>        { cpu_cpu_mask, cpu_cpu_flags, SD_INIT_NAME(DIE) },
>                        ^^^^^^^^^^^^^
>        { NULL, },
>};
>
>set_sched_topology(custom_topology);

However, due to the limitation of GKI, we cannot change the sd topology
by ourselves. But we can configure CPU and cache topology through DT.

So why not get the ShPR from DT first? If not configured, use the default.
>
>> *CLS[0-1][2-3](SD_SHARE_PKG_RESOURCES)
>
>But why do you want to have yet another SD underneath MC for CPU0-CPU3?
>sd_llc is assigned to the highest ShPR SD, which would be DIE.

We want do something from the shared L2 cache(for complex, like walt),
you can ignore it here and talk about it when we done.

Thanks,
Wang

2022-03-25 17:16:50

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched: dynamic config sd_flags if described in DT

On 23/03/2022 07:45, 王擎 wrote:

[...]

>> Now, if you want to move ShPR from MC to DIE then a custom topology
>> table should do it, i.e. you don't have to change any generic task
>> scheduler code.
>>
>> static inline int cpu_cpu_flags(void)
>> {
>>        return SD_SHARE_PKG_RESOURCES;
>> }
>>
>> static struct sched_domain_topology_level custom_topology[] = {
>> #ifdef CONFIG_SCHED_SMT
>>         { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
>> #endif
>>
>> #ifdef CONFIG_SCHED_CLUSTER
>>         { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
>> #endif
>>
>> #ifdef CONFIG_SCHED_MC
>>         { cpu_coregroup_mask, SD_INIT_NAME(MC) },
>>                             ^^^^
>> #endif
>>         { cpu_cpu_mask, cpu_cpu_flags, SD_INIT_NAME(DIE) },
>>                         ^^^^^^^^^^^^^
>>         { NULL, },
>> };
>>
>> set_sched_topology(custom_topology);
>
> However, due to the limitation of GKI, we cannot change the sd topology
> by ourselves. But we can configure CPU and cache topology through DT.

IMHO, mainline can't do anything here. You should talk to your Android
platform provider in this case. Android concepts like Generic Kernel
Image (GKI) are normally not discussed here.

From mainline perspective we're OK with scheduling such a system flat,
e.g. only with a single MC SD [CPU0..CPU7] for each CPU.
It could be that the Phantom SD is still needed for additional
proprietary or Android add-ons though?

In case you would remove `clusterX` from your DT cpu-map (Phantom SD
information, i.e. the reason for why you have e.g. for CPU0: `MC (ShPR)
[CPU0..CPU3] and DIE [CPU0..CPU7]`) , you should see the natural
topology: only `MC (ShPR) [CPU0..CPU7]`.

> So why not get the ShPR from DT first? If not configured, use the default.

I'm not convinced that mainline will accept a change which is necessary
for a out-of-tree tweak (Phantom SD).

>>> *CLS[0-1][2-3](SD_SHARE_PKG_RESOURCES)
>>
>> But why do you want to have yet another SD underneath MC for CPU0-CPU3?
>> sd_llc is assigned to the highest ShPR SD, which would be DIE.
>
> We want do something from the shared L2 cache(for complex, like walt),
> you can ignore it here and talk about it when we done.

I assume you refer to the proprietary load-tracking mechanism `Window
Assisted Load Tracking` (WALT) here? It's also not in mainline.