2016-11-29 07:03:29

by Brice Goglin

[permalink] [raw]
Subject: AMD Bulldozer topology regression since 4.6

Hello

Since Linux 4.6 (and still in 4.9-rc5 at least), both AMD Bulldozer
cores of a single dual-core compute unit report the same core_id:

$ cat /sys/devices/system/cpu/cpu{?,??}/topology/core_id
0
0
1
1
2
2
3
0
3
[...]


Before 4.5 (and for a very long time), the kernel reported different
core_ids:

0
1
2
3
4
5
6
7
0
[...]


This causes user-space tools (at least all that rely on hwloc for
topology discovery) to think the processor has dual-threaded cores
instead of dual-core compute-unit modules.

The cause is likely this patch, which seems to assume both cores of a
same compute-unit have the same ID, which is wrong?

commit 8196dab4fc159943df6baaac04973bb1accb7100
Author: Borislav Petkov <[email protected]>
Date: Fri Mar 25 15:52:36 2016 +0100

x86/cpu: Get rid of compute_unit_id

It is cpu_core_id anyway.

Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>



Thanks
Brice



2016-11-29 19:39:34

by Borislav Petkov

[permalink] [raw]
Subject: Re: AMD Bulldozer topology regression since 4.6

On Tue, Nov 29, 2016 at 08:03:15AM +0100, Brice Goglin wrote:
> This causes user-space tools (at least all that rely on hwloc for
> topology discovery) to think the processor has dual-threaded cores
> instead of dual-core compute-unit modules.
>
> The cause is likely this patch, which seems to assume both cores of a
> same compute-unit have the same ID, which is wrong?

Does that fix it?

Patch is against latest tip/master because we have some more changes in
that area.

---
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 4daad1e39352..a070d7b0a133 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -305,15 +305,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)

/* get information required for multi-node processors */
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- u32 eax, ebx, ecx, edx;
-
- cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
- node_id = ecx & 7;
-
- /* get compute unit information */
- smp_num_siblings = ((ebx >> 8) & 3) + 1;
- c->x86_max_cores /= smp_num_siblings;
- c->cpu_core_id = ebx & 0xff;
+ node_id = cpuid_ecx(0x8000001e) & 7;

/*
* We may have multiple LLCs if L3 caches exist, so check if we
---

Just in case, I have one against Linus' master too:

---
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1e81a37c034e..19f50a9d6b42 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -305,15 +305,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)

/* get information required for multi-node processors */
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- u32 eax, ebx, ecx, edx;
-
- cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
- node_id = ecx & 7;
-
- /* get compute unit information */
- smp_num_siblings = ((ebx >> 8) & 3) + 1;
- c->x86_max_cores /= smp_num_siblings;
- c->cpu_core_id = ebx & 0xff;
+ node_id = cpuid_ecx(0x8000001e) & 7;
} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
u64 value;
---

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--

2016-11-29 21:02:32

by Brice Goglin

[permalink] [raw]
Subject: Re: AMD Bulldozer topology regression since 4.6

Le 29/11/2016 20:39, Borislav Petkov a écrit :
> Does that fix it?
>
> Patch is against latest tip/master because we have some more changes in
> that area.

I tested the second patch on top of 4.8.11, it brings core_id back to
where it was before 4.6, thanks.

Reported-and-tested-by: Brice Goglin <[email protected]>

However thread_siblings isn't back to where it was in 4.5. Now we have a
single bit in each thread_siblings mask. That's correct with respect to
the sysfs topology documentation. In 4.5, there were two bits (one for
each core of the compute unit), which was wrong (cores with different
core_ids shouldn't appear in each other thread_siblings). I assumed that
these processors had to break the sysfs topology documentation to expose
the concept of "dual-core compute-unit" which somehow sits between
hyperthreading and dual-core.

I personally do not care much about this regression, not sure about
other user-space tools?

Another minor related change: /proc/cpuinfo shows "cpu cores : 16"
instead of "8".

Brice


>
> ---
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 4daad1e39352..a070d7b0a133 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -305,15 +305,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
>
> /* get information required for multi-node processors */
> if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
> - u32 eax, ebx, ecx, edx;
> -
> - cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> - node_id = ecx & 7;
> -
> - /* get compute unit information */
> - smp_num_siblings = ((ebx >> 8) & 3) + 1;
> - c->x86_max_cores /= smp_num_siblings;
> - c->cpu_core_id = ebx & 0xff;
> + node_id = cpuid_ecx(0x8000001e) & 7;
>
> /*
> * We may have multiple LLCs if L3 caches exist, so check if we
> ---
>
> Just in case, I have one against Linus' master too:
>
> ---
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 1e81a37c034e..19f50a9d6b42 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -305,15 +305,7 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
>
> /* get information required for multi-node processors */
> if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
> - u32 eax, ebx, ecx, edx;
> -
> - cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
> - node_id = ecx & 7;
> -
> - /* get compute unit information */
> - smp_num_siblings = ((ebx >> 8) & 3) + 1;
> - c->x86_max_cores /= smp_num_siblings;
> - c->cpu_core_id = ebx & 0xff;
> + node_id = cpuid_ecx(0x8000001e) & 7;
> } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
> u64 value;
> ---
>

2016-11-29 21:15:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: AMD Bulldozer topology regression since 4.6

On Tue, Nov 29, 2016 at 10:02:00PM +0100, Brice Goglin wrote:
> However thread_siblings isn't back to where it was in 4.5. Now we have a
> single bit in each thread_siblings mask. That's correct with respect to
> the sysfs topology documentation. In 4.5, there were two bits (one for
> each core of the compute unit), which was wrong (cores with different
> core_ids shouldn't appear in each other thread_siblings). I assumed that
> these processors had to break the sysfs topology documentation to expose
> the concept of "dual-core compute-unit" which somehow sits between
> hyperthreading and dual-core.

This is exactly the problem - there's never proper fitting of the
compute unit "ideology" in the whole topology view. So what we're doing
now is revert to the old strategy of keeping Bulldozer have core_id ==
thread_id. Basically, all CU threads are cores, as they're more powerful
than, say, SMT threads, but they still share an FPU.

So it is a different kind of sharing and it's kinda, well, different.
There's no other way to put it.

> I personally do not care much about this regression, not sure about
> other user-space tools?
>
> Another minor related change: /proc/cpuinfo shows "cpu cores : 16"
> instead of "8".

Can you send me the whole thing? Offlist is fine too.

Thanks.

--
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
--