2022-05-24 11:54:32

by Sudeep Holla

Subject: Re: [PATCH] arch_topology: Limit threads to one specific cluster

On Tue, May 24, 2022 at 04:12:12PM +0800, Gavin Shan wrote:
> The sibling information for one particular CPU is updated after the ACPI
> PPTT table is parsed. struct cpu_topology::thread_sibling tracks the
> CPUs in the same core. However, the cluster isn't considered when it's
> populated. In this case, multiple threads belonging to different
> clusters can be put together through the sibling information. It
> eventually leads to an unexpected warning from the sched subsystem.
>
> For example, the following warning is observed in a VM where we have
> 2 sockets, 4 clusters, 8 cores and 16 threads and the CPU topology
> is populated as below.
>
> CPU  Socket-ID  Cluster-ID  Core-ID  Thread-ID
> ----------------------------------------------
>   0  0          0           0        0
>   1  0          0           0        1
>   2  0          0           1        0
>   3  0          0           1        1
>   4  0          1           0        0
>   5  0          1           0        1
>   6  0          1           1        0
>   7  0          1           1        1
>   8  1          0           0        0
>   9  1          0           0        1
>  10  1          0           1        0
>  11  1          0           1        1
>  12  1          1           0        0
>  13  1          1           0        1
>  14  1          1           1        0
>  15  1          1           1        1
>
> [ 0.592181] CPU: All CPU(s) started at EL1
> [ 0.593766] alternatives: patching kernel code
> [ 0.595890] BUG: arch topology borken
> [ 0.597210] the SMT domain not a subset of the CLS domain
> [ 0.599286] child=0-1,4-5 sd=0-3
>
> # cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
> 0-3
> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0-1,4-5
>
> This fixes the issue by limiting threads to one specific cluster.
> With this applied, the unexpected warning disappears in the VM.
>

I have a similar fix, but as part of a bigger series [1] that brings DT
support in line with ACPI.

--
Regards,
Sudeep

[1] https://lore.kernel.org/lkml/[email protected]


2022-05-25 06:07:52

by Gavin Shan

Subject: Re: [PATCH] arch_topology: Limit threads to one specific cluster

Hi Sudeep,

On 5/24/22 4:51 PM, Sudeep Holla wrote:
> On Tue, May 24, 2022 at 04:12:12PM +0800, Gavin Shan wrote:
>> The sibling information for one particular CPU is updated after the ACPI
>> PPTT table is parsed. struct cpu_topology::thread_sibling tracks the
>> CPUs in the same core. However, the cluster isn't considered when it's
>> populated. In this case, multiple threads belonging to different
>> clusters can be put together through the sibling information. It
>> eventually leads to an unexpected warning from the sched subsystem.
>>
>> For example, the following warning is observed in a VM where we have
>> 2 sockets, 4 clusters, 8 cores and 16 threads and the CPU topology
>> is populated as below.
>>
>> CPU  Socket-ID  Cluster-ID  Core-ID  Thread-ID
>> ----------------------------------------------
>>   0  0          0           0        0
>>   1  0          0           0        1
>>   2  0          0           1        0
>>   3  0          0           1        1
>>   4  0          1           0        0
>>   5  0          1           0        1
>>   6  0          1           1        0
>>   7  0          1           1        1
>>   8  1          0           0        0
>>   9  1          0           0        1
>>  10  1          0           1        0
>>  11  1          0           1        1
>>  12  1          1           0        0
>>  13  1          1           0        1
>>  14  1          1           1        0
>>  15  1          1           1        1
>>
>> [ 0.592181] CPU: All CPU(s) started at EL1
>> [ 0.593766] alternatives: patching kernel code
>> [ 0.595890] BUG: arch topology borken
>> [ 0.597210] the SMT domain not a subset of the CLS domain
>> [ 0.599286] child=0-1,4-5 sd=0-3
>>
>> # cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
>> 0-3
>> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
>> 0-1,4-5
>>
>> This fixes the issue by limiting threads to one specific cluster.
>> With this applied, the unexpected warning disappears in the VM.
>>
>
> I have a similar fix, but as part of a bigger series [1] that brings DT
> support in line with ACPI.
>

Your patch resolves the issue I have. So please ignore mine. Sorry
for the noise.

https://lore.kernel.org/lkml/[email protected]

# cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3
# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0-1

Thanks,
Gavin