Date: 2023-05-08 08:48:29
From: K Prateek Nayak

Subject: [PATCH 1/2] drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug

While building the shared_cpu_map, check if the cache level and cache
type match. On certain systems that build the cache topology based on
the instance ID, the same ID can repeat across multiple cache levels,
leading to an inaccurate topology.

In the event of CPU offlining, cache_shared_cpu_map_remove() does not
check whether the IDs being compared belong to the same cache level. As
a result, when the same ID repeats across different cache levels, the
CPU going offline is not removed from all the shared_cpu_maps it is
part of.

Below is the cache topology output for CPU8 and its SMT sibling after
CPU8 is offlined on a dual socket 3rd Generation AMD EPYC processor
(2 x 64C/128T) running kernel release v6.3:

# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143

# echo 0 > /sys/devices/system/cpu/cpu8/online

# for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143

CPU8 is removed from index0 (L1i) but remains in the shared_cpu_list of
index1 (L1d) and index2 (L2). Since L1i, L1d, and L2 are shared by the
SMT siblings and carry the same cache instance ID, CPU8 is only removed
from the first index with a matching ID, which is index0 (L1i) in this
case (a simplified sketch of this matching logic follows the corrected
output below). With this fix, the results are as expected when
performing the same experiment on the same system:

# for i in /sys/devices/system/cpu/cpu8/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list: 8,136
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list: 8-15,136-143

# echo 0 > /sys/devices/system/cpu/cpu8/online

# for i in /sys/devices/system/cpu/cpu136/cache/index*/shared_cpu_list; do echo -n "$i: "; cat $i; done
/sys/devices/system/cpu/cpu136/cache/index0/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index1/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index2/shared_cpu_list: 136
/sys/devices/system/cpu/cpu136/cache/index3/shared_cpu_list: 9-15,136-143
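
For reference, the failure mode can be reproduced with a small,
self-contained sketch (simplified structures and made-up IDs, not the
kernel's actual cacheinfo code): when leaves are matched by instance ID
alone, every pass over the sibling's leaves stops at the first ID
match, so the offlined CPU is cleared from only one of the leaves that
carry a repeated ID.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct cacheinfo; all values below are made up. */
struct leaf {
	int id;			/* cache instance ID */
	int level;		/* 1, 2, 3, ... */
	int type;		/* 0 = data, 1 = instruction, 2 = unified */
	unsigned long shared;	/* bit n set => toy CPU n is in shared_cpu_map */
};

/* ID-only comparison, roughly what the matching amounted to before the fix. */
static bool shares_id_only(const struct leaf *a, const struct leaf *b)
{
	return a->id == b->id;
}

/* Comparison with the guard this patch adds: level and type must match too. */
static bool shares_guarded(const struct leaf *a, const struct leaf *b)
{
	if (a->level != b->level || a->type != b->type)
		return false;
	return a->id == b->id;
}

/*
 * Mimic the structure of cache_shared_cpu_map_remove(): scan the sibling's
 * leaves and clear @cpu from the first leaf considered shared with @mine.
 */
static void remove_cpu(struct leaf *sib, int nr, const struct leaf *mine,
		       int cpu,
		       bool (*shares)(const struct leaf *, const struct leaf *))
{
	for (int i = 0; i < nr; i++) {
		if (shares(mine, &sib[i])) {
			sib[i].shared &= ~(1UL << cpu);
			break;
		}
	}
}

int main(void)
{
	/*
	 * The sibling's leaves: L1d, L1i and L2 all happen to carry instance
	 * ID 4, L3 carries ID 1. Toy CPUs 0 and 1 are the SMT pair; the L3
	 * is shared by toy CPUs 0-7.
	 */
	struct leaf sib[] = {
		{ 4, 1, 0, 0x3 },	/* index0: L1d */
		{ 4, 1, 1, 0x3 },	/* index1: L1i */
		{ 4, 2, 2, 0x3 },	/* index2: L2  */
		{ 1, 3, 2, 0xff },	/* index3: L3  */
	};
	/* The offlining CPU's (toy CPU 0) own leaves carry the same IDs. */
	const struct leaf mine[] = {
		{ 4, 1, 0, 0x3 },
		{ 4, 1, 1, 0x3 },
		{ 4, 2, 2, 0x3 },
		{ 1, 3, 2, 0xff },
	};
	const unsigned long init[] = { 0x3, 0x3, 0x3, 0xff };

	/* ID-only match: every L1/L2 pass stops at index0, so index1 and
	 * index2 never lose CPU 0. */
	for (int i = 0; i < 4; i++)
		remove_cpu(sib, 4, &mine[i], 0, shares_id_only);
	for (int i = 0; i < 4; i++)
		printf("id-only index%d: %#lx\n", i, sib[i].shared);

	/* Reset and repeat with the level/type guard: every level is cleared. */
	for (int i = 0; i < 4; i++)
		sib[i].shared = init[i];
	for (int i = 0; i < 4; i++)
		remove_cpu(sib, 4, &mine[i], 0, shares_guarded);
	for (int i = 0; i < 4; i++)
		printf("guarded index%d: %#lx\n", i, sib[i].shared);

	return 0;
}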

When rebuilding the topology, the same problem appears since
cache_shared_cpu_map_setup() implements similar logic. Consider the
same 3rd Generation EPYC processor: CPUs in Core 1, which share the L1
and L2 caches, have an L1 and L2 instance ID of 1. For all the CPUs on
the second chiplet, the L3 ID is also 1, leading to CPUs from Core 1
(1, 17) and the entire second chiplet (8-15, 24-31) being grouped as
CPUs sharing one cache domain. This went undetected because, until
kernel release v6.3-rc5, x86 processors depended on the arch-specific
populate_cache_leaves() method to repopulate the shared_cpu_map when a
CPU came back online.
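
The setup direction can be sketched the same way (again with
simplified structures and made-up IDs, not the kernel code): with
ID-only matching, a Core 1 L1/L2 leaf whose instance ID collides with
the second chiplet's L3 ID gets linked to that remote L3 leaf, which is
how the mixed grouping described above comes about; the level/type
guard finds no cross-level match.

#include <stdbool.h>
#include <stdio.h>

/* Same simplified leaf layout as the earlier sketch; cache types are
 * flattened to "unified" (2) here purely for brevity. */
struct leaf { int id, level, type; };

static bool match_id_only(const struct leaf *a, const struct leaf *b)
{
	return a->id == b->id;
}

static bool match_guarded(const struct leaf *a, const struct leaf *b)
{
	return a->level == b->level && a->type == b->type && a->id == b->id;
}

int main(void)
{
	/* A CPU in Core 1: L1/L2 instance ID 1, L3 ID 0 (made-up values). */
	const struct leaf core1[] = {
		{ 1, 1, 2 }, { 1, 2, 2 }, { 0, 3, 2 },
	};
	/* A CPU on the second chiplet: L1/L2 ID 8, L3 ID 1 (made-up values). */
	const struct leaf chiplet2[] = {
		{ 8, 1, 2 }, { 8, 2, 2 }, { 1, 3, 2 },
	};
	bool (*match[])(const struct leaf *, const struct leaf *) = {
		match_id_only, match_guarded,
	};

	/* Mimic cache_shared_cpu_map_setup(): link on the first match found.
	 * ID-only matching links Core 1's L1 and L2 to the remote L3; the
	 * guarded variant finds no cross-level match and prints nothing. */
	for (int m = 0; m < 2; m++)
		for (int i = 0; i < 3; i++)
			for (int j = 0; j < 3; j++)
				if (match[m](&core1[i], &chiplet2[j])) {
					printf("%s: Core 1 L%d linked to chiplet 2 L%d\n",
					       m ? "guarded" : "id-only",
					       core1[i].level, chiplet2[j].level);
					break;
				}
	return 0;
}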

Fixes: 198102c9103f ("cacheinfo: Fix shared_cpu_map to handle shared caches at different levels")
Signed-off-by: K Prateek Nayak <[email protected]>
---
drivers/base/cacheinfo.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index bba3482ddeb8..d1ae443fd7a0 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -388,6 +388,16 @@ static int cache_shared_cpu_map_setup(unsigned int cpu)
continue;/* skip if itself or no cacheinfo */
for (sib_index = 0; sib_index < cache_leaves(i); sib_index++) {
sib_leaf = per_cpu_cacheinfo_idx(i, sib_index);
+
+ /*
+ * Comparing cache IDs only makes sense if the leaves
+ * belong to the same cache level of same type. Skip
+ * the check if level and type do not match.
+ */
+ if (sib_leaf->level != this_leaf->level ||
+ sib_leaf->type != this_leaf->type)
+ continue;
+
if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
cpumask_set_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_set_cpu(i, &this_leaf->shared_cpu_map);
@@ -419,6 +429,16 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)

for (sib_index = 0; sib_index < cache_leaves(sibling); sib_index++) {
sib_leaf = per_cpu_cacheinfo_idx(sibling, sib_index);
+
+ /*
+ * Comparing cache IDs only makes sense if the leaves
+ * belong to the same cache level of same type. Skip
+ * the check if level and type do not match.
+ */
+ if (sib_leaf->level != this_leaf->level ||
+ sib_leaf->type != this_leaf->type)
+ continue;
+
if (cache_leaves_are_shared(this_leaf, sib_leaf)) {
cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
--
2.34.1