2019-02-19 03:42:37

by Len Brown

Subject: [PATCH 0/11] multi-die/package support

This patch series does 3 things.

1. Parses the new CPUID.1F leaf to discover multi-die/package topology

2. Exports knowledge of multi-die topology inside the kernel,
and also to user space via the topology sysfs.

3. Updates parts of the kernel that need to know the difference between
a die-scope MSR and a package-scope MSR.

Still to come: Kan will send some additional perf-uncore patches for #3.

Also not included here are some updates to the user-space utilities
cpuid, lscpu, turbostat, and x86_energy_perf_policy.

Rafael suggests that the most practical way to handle this series
is via the x86 tree.

Thanks,
-Len




2019-02-19 03:41:10

by Len Brown

Subject: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

From: Len Brown <[email protected]>

Some new systems have multiple software-visible dies within each package.
The new CPUID.1F leaf can enumerate this multi-die/package topology.

CPUID.1F is a super-set of the CPUID.B "Extended Topology Leaf",
and a common updated routine can parse either leaf.

Legacy systems without CPUID.1F, and systems without multi-die/package
hardware, will see no functional change from this patch series.

Multi-die/package systems will use CPUID.B before this patch,
and CPUID.1F after this patch. In the CPUID.B case, all die appear
(incorrectly) to software as individual packages. In the CPUID.1F case,
the package ids reflect reality, and die ids become meaningful.

Subsequent patches in this series update the kernel to be multi-die aware.
In particular, some software needs to know the difference between
a die-scope MSR and a package-scope MSR.

Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
Documentation/x86/topology.txt | 4 ++
arch/x86/include/asm/processor.h | 4 +-
arch/x86/kernel/cpu/topology.c | 82 ++++++++++++++++++++++++--------
arch/x86/kernel/smpboot.c | 4 +-
4 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 06b3cdbc4048..8107b6cfc9ea 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -46,6 +46,10 @@ The topology of a system is described in the units of:

The number of cores in a package. This information is retrieved via CPUID.

+ - cpuinfo_x86.x86_max_dies:
+
+ The number of dies in a package. This information is retrieved via CPUID.
+
- cpuinfo_x86.phys_proc_id:

The physical ID of the package. This information is retrieved via CPUID
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 33051436c864..f2856fe03715 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -105,7 +105,8 @@ struct cpuinfo_x86 {
int x86_power;
unsigned long loops_per_jiffy;
/* cpuid returned max cores value: */
- u16 x86_max_cores;
+ u16 x86_max_cores;
+ u16 x86_max_dies;
u16 apicid;
u16 initial_apicid;
u16 x86_clflush_size;
@@ -117,6 +118,7 @@ struct cpuinfo_x86 {
u16 logical_proc_id;
/* Core id: */
u16 cpu_core_id;
+ u16 cpu_die_id;
/* Index into per_cpu list: */
u16 cpu_index;
u32 microcode;
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 8f6c784141d1..6dce6ee77849 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -15,33 +15,61 @@
/* leaf 0xb SMT level */
#define SMT_LEVEL 0

-/* leaf 0xb sub-leaf types */
+/* extended topology sub-leaf types */
#define INVALID_TYPE 0
#define SMT_TYPE 1
#define CORE_TYPE 2
+#define DIE_TYPE 5

#define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
#define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
#define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)

-int detect_extended_topology_early(struct cpuinfo_x86 *c)
-{
#ifdef CONFIG_SMP
+/*
+ * Check if given CPUID extended topology "leaf" is implemented
+ */
+static int check_extended_topology_leaf(int leaf)
+{
unsigned int eax, ebx, ecx, edx;

- if (c->cpuid_level < 0xb)
+ cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+
+ if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
return -1;

- cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+ return 0;
+}
+/*
+ * Return best CPUID Extended Topology Leaf supported
+ */
+static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
+{
+ if (c->cpuid_level >= 0x1f)
+ if (check_extended_topology_leaf(0x1f) == 0)
+ return 0x1f;

- /*
- * check if the cpuid leaf 0xb is actually implemented.
- */
- if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
+ if (c->cpuid_level >= 0xb)
+ if (check_extended_topology_leaf(0xb) == 0)
+ return 0xb;
+
+ return -1;
+}
+#endif
+
+int detect_extended_topology_early(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+ unsigned int eax, ebx, ecx, edx;
+ int leaf;
+
+ leaf = detect_extended_topology_leaf(c);
+ if (leaf < 0)
return -1;

set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);

+ cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
/*
* initial apic id, which also represents 32-bit extended x2apic id.
*/
@@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
}

/*
- * Check for extended topology enumeration cpuid leaf 0xb and if it
+ * Check for extended topology enumeration cpuid leaf, and if it
* exists, use it for populating initial_apicid and cpu topology
* detection.
*/
@@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
unsigned int eax, ebx, ecx, edx, sub_index;
- unsigned int ht_mask_width, core_plus_mask_width;
+ unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
+ unsigned int die_select_mask, die_level_siblings;
+ int leaf;

- if (detect_extended_topology_early(c) < 0)
+ leaf = detect_extended_topology_leaf(c);
+ if (leaf < 0)
return -1;

/*
* Populate HT related information from sub-leaf level 0.
*/
- cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+ cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+ c->initial_apicid = edx;
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

sub_index = 1;
do {
- cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
+ cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);

/*
* Check for the Core type in the implemented sub leaves.
*/
if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = core_level_siblings;
core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
- break;
+ }
+ if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
+ die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}

sub_index++;
} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);

core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
-
- c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
- & core_select_mask;
- c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
+ die_select_mask = (~(-1 << die_plus_mask_width)) >>
+ core_plus_mask_width;
+
+ c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
+ ht_mask_width) & core_select_mask;
+ c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
+ core_plus_mask_width) & die_select_mask;
+ c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
+ die_plus_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);

c->x86_max_cores = (core_level_siblings / smp_num_siblings);
+ c->x86_max_dies = (die_level_siblings / core_level_siblings);
#endif
return 0;
}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ccd1f2a8e557..4250a87f57db 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
int cpu1 = c->cpu_index, cpu2 = o->cpu_index;

if (c->phys_proc_id == o->phys_proc_id &&
+ c->cpu_die_id == o->cpu_die_id &&
per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
if (c->cpu_core_id == o->cpu_core_id)
return topology_sane(c, o, "smt");
@@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
}

} else if (c->phys_proc_id == o->phys_proc_id &&
+ c->cpu_die_id == o->cpu_die_id &&
c->cpu_core_id == o->cpu_core_id) {
return topology_sane(c, o, "smt");
}
@@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
*/
static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
- if (c->phys_proc_id == o->phys_proc_id)
+ if (c->cpu_die_id == o->cpu_die_id)
return true;
return false;
}
--
2.18.0-rc0


2019-02-19 03:41:27

by Len Brown

Subject: [PATCH 10/11] thermal/x86_pkg_temp_thermal: Support multi-die/package

From: Zhang Rui <[email protected]>

On the new dual-die/package systems, the package temperature MSR becomes
die-scope. Thus, instead of one thermal zone device per physical package,
there should now be one thermal zone for each die on these systems.

This patch introduces x86_pkg_temp_thermal support for new
dual-die/package systems.

On hardware that does not have multiple dies per package,
topology_unique_die_id() equals topology_logical_package_id(), thus there
is no functional change.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>
---
drivers/thermal/intel/x86_pkg_temp_thermal.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/thermal/intel/x86_pkg_temp_thermal.c b/drivers/thermal/intel/x86_pkg_temp_thermal.c
index 1ef937d799e4..a2eb9136105e 100644
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -122,7 +122,7 @@ static int pkg_temp_debugfs_init(void)
*/
static struct pkg_device *pkg_temp_thermal_get_dev(unsigned int cpu)
{
- int pkgid = topology_logical_package_id(cpu);
+ int pkgid = topology_unique_die_id(cpu);

if (pkgid >= 0 && pkgid < max_packages)
return packages[pkgid];
@@ -353,7 +353,7 @@ static int pkg_thermal_notify(u64 msr_val)

static int pkg_temp_thermal_device_add(unsigned int cpu)
{
- int pkgid = topology_logical_package_id(cpu);
+ int pkgid = topology_unique_die_id(cpu);
u32 tj_max, eax, ebx, ecx, edx;
struct pkg_device *pkgdev;
int thres_count, err;
@@ -449,7 +449,7 @@ static int pkg_thermal_cpu_offline(unsigned int cpu)
* worker will see the package anymore.
*/
if (lastcpu) {
- packages[topology_logical_package_id(cpu)] = NULL;
+ packages[topology_unique_die_id(cpu)] = NULL;
/* After this point nothing touches the MSR anymore. */
wrmsr(MSR_IA32_PACKAGE_THERM_INTERRUPT,
pkgdev->msr_pkg_therm_low, pkgdev->msr_pkg_therm_high);
@@ -511,11 +511,12 @@ MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids);
static int __init pkg_temp_thermal_init(void)
{
int ret;
+ struct cpuinfo_x86 *c = &cpu_data(0);

if (!x86_match_cpu(pkg_temp_thermal_ids))
return -ENODEV;

- max_packages = topology_max_packages();
+ max_packages = topology_max_packages() * c->x86_max_dies;
packages = kcalloc(max_packages, sizeof(struct pkg_device *),
GFP_KERNEL);
if (!packages)
--
2.18.0-rc0


2019-02-19 03:41:34

by Len Brown

Subject: [PATCH 11/11] hwmon/coretemp: Support multi-die/package

From: Zhang Rui <[email protected]>

This patch introduces coretemp driver support
for new dual-die/package systems.

On the new dual-die/package systems, the package temperature MSRs become
die-scope. Thus, instead of one hwmon device per physical package,
there should now be one hwmon device for each die on these systems.

On hardware that does not have multi-die support,
topology_unique_die_id() equals topology_physical_package_id(); thus the
only difference on these systems is that the physical package id is used
as the coretemp platform device id, instead of the logical package id.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
drivers/hwmon/coretemp.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 5d34f7271e67..a0b6b2247c3a 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -435,7 +435,7 @@ static int chk_ucode_version(unsigned int cpu)

static struct platform_device *coretemp_get_pdev(unsigned int cpu)
{
- int pkgid = topology_logical_package_id(cpu);
+ int pkgid = topology_unique_die_id(cpu);

if (pkgid >= 0 && pkgid < max_packages)
return pkg_devices[pkgid];
@@ -579,7 +579,7 @@ static struct platform_driver coretemp_driver = {

static struct platform_device *coretemp_device_add(unsigned int cpu)
{
- int err, pkgid = topology_logical_package_id(cpu);
+ int err, pkgid = topology_unique_die_id(cpu);
struct platform_device *pdev;

if (pkgid < 0)
@@ -703,7 +703,7 @@ static int coretemp_cpu_offline(unsigned int cpu)
* the rest.
*/
if (cpumask_empty(&pd->cpumask)) {
- pkg_devices[topology_logical_package_id(cpu)] = NULL;
+ pkg_devices[topology_unique_die_id(cpu)] = NULL;
platform_device_unregister(pdev);
return 0;
}
@@ -732,6 +732,7 @@ static enum cpuhp_state coretemp_hp_online;
static int __init coretemp_init(void)
{
int err;
+ struct cpuinfo_x86 *c = &cpu_data(0);

/*
* CPUID.06H.EAX[0] indicates whether the CPU has thermal
@@ -741,7 +742,7 @@ static int __init coretemp_init(void)
if (!x86_match_cpu(coretemp_ids))
return -ENODEV;

- max_packages = topology_max_packages();
+ max_packages = topology_max_packages() * c->x86_max_dies;
pkg_devices = kcalloc(max_packages, sizeof(struct platform_device *),
GFP_KERNEL);
if (!pkg_devices)
--
2.18.0-rc0


2019-02-19 03:41:53

by Len Brown

Subject: [PATCH 07/11] powercap/intel_rapl: simplify rapl_find_package()

From: Zhang Rui <[email protected]>

Simplify how the package-discovery code is invoked:
rename find_package_by_id() to rapl_find_package(), and have it take
a CPU number rather than a pre-computed package id.

No functional change.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
drivers/powercap/intel_rapl.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 6cdb2c14eee4..6057d9695fed 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -264,8 +264,9 @@ static struct powercap_control_type *control_type; /* PowerCap Controller */
static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */

/* caller to ensure CPU hotplug lock is held */
-static struct rapl_package *find_package_by_id(int id)
+static struct rapl_package *rapl_find_package(int cpu)
{
+ int id = topology_physical_package_id(cpu);
struct rapl_package *rp;

list_for_each_entry(rp, &rapl_packages, plist) {
@@ -1298,7 +1299,7 @@ static int __init rapl_register_psys(void)
rd->rpl[0].name = pl1_name;
rd->rpl[1].prim_id = PL2_ENABLE;
rd->rpl[1].name = pl2_name;
- rd->rp = find_package_by_id(0);
+ rd->rp = rapl_find_package(0);

power_zone = powercap_register_zone(&rd->power_zone, control_type,
"psys", NULL,
@@ -1454,8 +1455,9 @@ static void rapl_remove_package(struct rapl_package *rp)
}

/* called from CPU hotplug notifier, hotplug lock held */
-static struct rapl_package *rapl_add_package(int cpu, int pkgid)
+static struct rapl_package *rapl_add_package(int cpu)
{
+ int id = topology_physical_package_id(cpu);
struct rapl_package *rp;
int ret;

@@ -1464,7 +1466,7 @@ static struct rapl_package *rapl_add_package(int cpu, int pkgid)
return ERR_PTR(-ENOMEM);

/* add the new package to the list */
- rp->id = pkgid;
+ rp->id = id;
rp->lead_cpu = cpu;

/* check if the package contains valid domains */
@@ -1495,12 +1497,11 @@ static struct rapl_package *rapl_add_package(int cpu, int pkgid)
*/
static int rapl_cpu_online(unsigned int cpu)
{
- int pkgid = topology_physical_package_id(cpu);
struct rapl_package *rp;

- rp = find_package_by_id(pkgid);
+ rp = rapl_find_package(cpu);
if (!rp) {
- rp = rapl_add_package(cpu, pkgid);
+ rp = rapl_add_package(cpu);
if (IS_ERR(rp))
return PTR_ERR(rp);
}
@@ -1510,11 +1511,10 @@ static int rapl_cpu_online(unsigned int cpu)

static int rapl_cpu_down_prep(unsigned int cpu)
{
- int pkgid = topology_physical_package_id(cpu);
struct rapl_package *rp;
int lead_cpu;

- rp = find_package_by_id(pkgid);
+ rp = rapl_find_package(cpu);
if (!rp)
return 0;

--
2.18.0-rc0


2019-02-19 03:42:15

by Len Brown

Subject: [PATCH 09/11] powercap/intel_rapl: update rapl domain name and debug messages

From: Zhang Rui <[email protected]>

The RAPL domain "name" attribute contains "Package-N",
which is ambiguous on multi-die per-package systems.

Update the name to "package-X-die-Y" on those systems.

No change on systems without multi-die.

Driver debug messages are also updated.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
drivers/powercap/intel_rapl.c | 57 ++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index e004707283da..c990eee45239 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -178,12 +178,15 @@ struct rapl_domain {
#define power_zone_to_rapl_domain(_zone) \
container_of(_zone, struct rapl_domain, power_zone)

+/* maximum rapl package domain name: package-%d-die-%d */
+#define PACKAGE_DOMAIN_NAME_LENGTH 30

-/* Each physical package contains multiple domains, these are the common
+
+/* Each rapl package contains multiple domains, these are the common
* data across RAPL domains within a package.
*/
struct rapl_package {
- unsigned int id; /* physical package/socket id */
+ unsigned int id; /* logical die id, equals physical package id on 1-die systems */
unsigned int nr_domains;
unsigned long domain_map; /* bit map of active domains */
unsigned int power_unit;
@@ -198,6 +201,7 @@ struct rapl_package {
int lead_cpu; /* one active cpu per package for access */
/* Track active cpus */
struct cpumask cpumask;
+ char name[PACKAGE_DOMAIN_NAME_LENGTH];
};

struct rapl_defaults {
@@ -926,8 +930,8 @@ static int rapl_check_unit_core(struct rapl_package *rp, int cpu)
value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
rp->time_unit = 1000000 / (1 << value);

- pr_debug("Core CPU package %d energy=%dpJ, time=%dus, power=%duW\n",
- rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
+ pr_debug("Core CPU %s energy=%dpJ, time=%dus, power=%duW\n",
+ rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);

return 0;
}
@@ -951,8 +955,8 @@ static int rapl_check_unit_atom(struct rapl_package *rp, int cpu)
value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
rp->time_unit = 1000000 / (1 << value);

- pr_debug("Atom package %d energy=%dpJ, time=%dus, power=%duW\n",
- rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
+ pr_debug("Atom %s energy=%dpJ, time=%dus, power=%duW\n",
+ rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);

return 0;
}
@@ -1179,7 +1183,7 @@ static void rapl_update_domain_data(struct rapl_package *rp)
u64 val;

for (dmn = 0; dmn < rp->nr_domains; dmn++) {
- pr_debug("update package %d domain %s data\n", rp->id,
+ pr_debug("update %s domain %s data\n", rp->name,
rp->domains[dmn].name);
/* exclude non-raw primitives */
for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) {
@@ -1204,7 +1208,6 @@ static void rapl_unregister_powercap(void)
static int rapl_package_register_powercap(struct rapl_package *rp)
{
struct rapl_domain *rd;
- char dev_name[17]; /* max domain name = 7 + 1 + 8 for int + 1 for null*/
struct powercap_zone *power_zone = NULL;
int nr_pl, ret;

@@ -1215,20 +1218,16 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
if (rd->id == RAPL_DOMAIN_PACKAGE) {
nr_pl = find_nr_power_limit(rd);
- pr_debug("register socket %d package domain %s\n",
- rp->id, rd->name);
- memset(dev_name, 0, sizeof(dev_name));
- snprintf(dev_name, sizeof(dev_name), "%s-%d",
- rd->name, rp->id);
+ pr_debug("register package domain %s\n", rp->name);
power_zone = powercap_register_zone(&rd->power_zone,
control_type,
- dev_name, NULL,
+ rp->name, NULL,
&zone_ops[rd->id],
nr_pl,
&constraint_ops);
if (IS_ERR(power_zone)) {
- pr_debug("failed to register package, %d\n",
- rp->id);
+ pr_debug("failed to register power zone %s\n",
+ rp->name);
return PTR_ERR(power_zone);
}
/* track parent zone in per package/socket data */
@@ -1254,8 +1253,8 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
&constraint_ops);

if (IS_ERR(power_zone)) {
- pr_debug("failed to register power_zone, %d:%s:%s\n",
- rp->id, rd->name, dev_name);
+ pr_debug("failed to register power_zone, %s:%s\n",
+ rp->name, rd->name);
ret = PTR_ERR(power_zone);
goto err_cleanup;
}
@@ -1268,7 +1267,7 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
* failed after the first domain setup.
*/
while (--rd >= rp->domains) {
- pr_debug("unregister package %d domain %s\n", rp->id, rd->name);
+ pr_debug("unregister %s domain %s\n", rp->name, rd->name);
powercap_unregister_zone(control_type, &rd->power_zone);
}

@@ -1378,8 +1377,8 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
/* check if the domain is locked by BIOS, ignore if MSR doesn't exist */
if (!rapl_read_data_raw(rd, FW_LOCK, false, &val64)) {
if (val64) {
- pr_info("RAPL package %d domain %s locked by BIOS\n",
- rd->rp->id, rd->name);
+ pr_info("RAPL %s domain %s locked by BIOS\n",
+ rd->rp->name, rd->name);
rd->state |= DOMAIN_STATE_BIOS_LOCKED;
}
}
@@ -1408,10 +1407,10 @@ static int rapl_detect_domains(struct rapl_package *rp, int cpu)
}
rp->nr_domains = bitmap_weight(&rp->domain_map, RAPL_DOMAIN_MAX);
if (!rp->nr_domains) {
- pr_debug("no valid rapl domains found in package %d\n", rp->id);
+ pr_debug("no valid rapl domains found in %s\n", rp->name);
return -ENODEV;
}
- pr_debug("found %d domains on package %d\n", rp->nr_domains, rp->id);
+ pr_debug("found %d domains on %s\n", rp->nr_domains, rp->name);

rp->domains = kcalloc(rp->nr_domains + 1, sizeof(struct rapl_domain),
GFP_KERNEL);
@@ -1444,8 +1443,8 @@ static void rapl_remove_package(struct rapl_package *rp)
rd_package = rd;
continue;
}
- pr_debug("remove package, undo power limit on %d: %s\n",
- rp->id, rd->name);
+ pr_debug("remove package, undo power limit on %s: %s\n",
+ rp->name, rd->name);
powercap_unregister_zone(control_type, &rd->power_zone);
}
/* do parent zone last */
@@ -1459,6 +1458,7 @@ static struct rapl_package *rapl_add_package(int cpu)
{
int id = topology_unique_die_id(cpu);
struct rapl_package *rp;
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
int ret;

rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
@@ -1469,6 +1469,13 @@ static struct rapl_package *rapl_add_package(int cpu)
rp->id = id;
rp->lead_cpu = cpu;

+ if (c->x86_max_dies > 1)
+ snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
+ "package-%d-die-%d", c->phys_proc_id, c->cpu_die_id);
+ else
+ snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
+ c->phys_proc_id);
+
/* check if the package contains valid domains */
if (rapl_detect_domains(rp, cpu) ||
rapl_defaults->check_unit(rp, cpu)) {
--
2.18.0-rc0


2019-02-19 03:42:21

by Len Brown

[permalink] [raw]
Subject: [PATCH 04/11] cpu topology: export die_id

From: Len Brown <[email protected]>

Export die_id in cpu topology, for the benefit of hardware that
has multiple dies per package. die_id is quite similar to core_id,
in that it identifies a group of CPUs inside a package.

This is needed by topology-aware user-space. In particular,
an application such as turbostat(8) needs to know
the actual scope of "package-scope" MSRs.

Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
Documentation/cputopology.txt | 10 ++++++++--
arch/x86/include/asm/topology.h | 1 +
drivers/base/topology.c | 4 ++++
include/linux/topology.h | 3 +++
4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 2698da7e4f49..287213b4517b 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -12,6 +12,12 @@ physical_package_id:
socket number, but the actual value is architecture and platform
dependent.

+die_id:
+
+ the CPU die ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
core_id:

the CPU core ID of cpuX. Typically it is the hardware platform's
@@ -43,12 +49,12 @@ thread_siblings_list:
core_siblings:

internal kernel map of cpuX's hardware threads within the same
- physical_package_id.
+ die_id.

core_siblings_list:

human-readable list of cpuX's hardware threads within the same
- physical_package_id.
+ die_id.

book_siblings:

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 453cf38a1c33..281be6bbc80d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -106,6 +106,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);

#define topology_logical_package_id(cpu) (cpu_data(cpu).logical_proc_id)
#define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id)
+#define topology_die_id(cpu) (cpu_data(cpu).cpu_die_id)
#define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)

#ifdef CONFIG_SMP
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 5fd9f167ecc1..50352cf96f85 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -43,6 +43,9 @@ static ssize_t name##_list_show(struct device *dev, \
define_id_show_func(physical_package_id);
static DEVICE_ATTR_RO(physical_package_id);

+define_id_show_func(die_id);
+static DEVICE_ATTR_RO(die_id);
+
define_id_show_func(core_id);
static DEVICE_ATTR_RO(core_id);

@@ -72,6 +75,7 @@ static DEVICE_ATTR_RO(drawer_siblings_list);

static struct attribute *default_attrs[] = {
&dev_attr_physical_package_id.attr,
+ &dev_attr_die_id.attr,
&dev_attr_core_id.attr,
&dev_attr_thread_siblings.attr,
&dev_attr_thread_siblings_list.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..5cc8595dd0e4 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu) ((void)(cpu), -1)
#endif
+#ifndef topology_die_id
+#define topology_die_id(cpu) ((void)(cpu), -1)
+#endif
#ifndef topology_core_id
#define topology_core_id(cpu) ((void)(cpu), 0)
#endif
--
2.18.0-rc0


2019-02-19 03:42:50

by Len Brown

Subject: [PATCH 06/11] x86 topology: define topology_unique_die_id()

From: Len Brown <[email protected]>

The topology_unique_die_id() helper calculates a unique system-wide die-id,
for the benefit of multi-die per-package systems.

A simple system-wide unique number within known bounds is sufficient;
there is no need for the overhead of generating a globally consecutive
"logical" id.

Signed-off-by: Len Brown <[email protected]>
---
arch/x86/include/asm/topology.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index a52a572147ba..12444a1a30f3 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -106,6 +106,9 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);

#define topology_logical_package_id(cpu) (cpu_data(cpu).logical_proc_id)
#define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id)
+#define topology_unique_die_id(cpu) ((cpu_data(cpu).logical_proc_id * \
+ cpu_data(cpu).x86_max_dies) + \
+ cpu_data(cpu).cpu_die_id)
#define topology_die_id(cpu) (cpu_data(cpu).cpu_die_id)
#define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)

--
2.18.0-rc0


2019-02-19 03:42:56

by Len Brown

Subject: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

From: Zhang Rui <[email protected]>

On the new dual-die/package systems, the RAPL MSR becomes die-scope.
Thus instead of one powercap device per physical package, now there
should be one powercap device for each unique die on these systems.

This patch introduces intel_rapl driver support for new
dual-die/package systems.

On hardware that does not have multiple dies per package,
topology_unique_die_id() equals topology_physical_package_id(), thus
there is no functional change.

Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
drivers/powercap/intel_rapl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
index 6057d9695fed..e004707283da 100644
--- a/drivers/powercap/intel_rapl.c
+++ b/drivers/powercap/intel_rapl.c
@@ -266,7 +266,7 @@ static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
/* caller to ensure CPU hotplug lock is held */
static struct rapl_package *rapl_find_package(int cpu)
{
- int id = topology_physical_package_id(cpu);
+ int id = topology_unique_die_id(cpu);
struct rapl_package *rp;

list_for_each_entry(rp, &rapl_packages, plist) {
@@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
/* called from CPU hotplug notifier, hotplug lock held */
static struct rapl_package *rapl_add_package(int cpu)
{
- int id = topology_physical_package_id(cpu);
+ int id = topology_unique_die_id(cpu);
struct rapl_package *rp;
int ret;

--
2.18.0-rc0


2019-02-19 03:43:00

by Len Brown

Subject: [PATCH 05/11] x86 topology: export die_siblings

From: Len Brown <[email protected]>

Like core_siblings, except it shows which dies are in the same package.

This is needed for lscpu(1) to correctly display die topology.

Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
Documentation/cputopology.txt | 10 ++++++++++
arch/x86/include/asm/smp.h | 1 +
arch/x86/include/asm/topology.h | 1 +
arch/x86/kernel/smpboot.c | 20 ++++++++++++++++++++
arch/x86/xen/smp_pv.c | 1 +
drivers/base/topology.c | 6 ++++++
include/linux/topology.h | 3 +++
7 files changed, 42 insertions(+)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 287213b4517b..7dd2ae3df233 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -56,6 +56,16 @@ core_siblings_list:
human-readable list of cpuX's hardware threads within the same
die_id.

+die_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ physical_package_id.
+
+die_siblings_list:
+
+ human-readable list of cpuX's hardware threads within the same
+ physical_package_id.
+
book_siblings:

internal kernel map of cpuX's hardware threads within the same
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 2e95b6c1bca3..39266d193597 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -23,6 +23,7 @@ extern unsigned int num_processors;

DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
+DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
/* cpus sharing the last level cache: */
DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 281be6bbc80d..a52a572147ba 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -110,6 +110,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
#define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)

#ifdef CONFIG_SMP
+#define topology_die_cpumask(cpu) (per_cpu(cpu_die_map, cpu))
#define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
#define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4250a87f57db..42d37e4a1918 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
EXPORT_PER_CPU_SYMBOL(cpu_core_map);

+/* representing HT and core and die siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
+EXPORT_PER_CPU_SYMBOL(cpu_die_map);
+
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);

/* Per CPU bogomips and other parameters */
@@ -461,6 +465,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
* multicore group inside a NUMA node. If this happens, we will
* discard the MC level of the topology later.
*/
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+ if (c->phys_proc_id == o->phys_proc_id)
+ return true;
+ return false;
+}
static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
if (c->cpu_die_id == o->cpu_die_id)
@@ -530,6 +540,7 @@ void set_cpu_sibling_map(int cpu)
cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
+ cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
c->booted_cores = 1;
return;
}
@@ -576,8 +587,12 @@ void set_cpu_sibling_map(int cpu)
} else if (i != cpu && !c->booted_cores)
c->booted_cores = cpu_data(i).booted_cores;
}
+
if (match_die(c, o) && !topology_same_node(c, o))
x86_has_numa_in_package = true;
+
+ if ((i == cpu) || (has_mp && match_pkg(c, o)))
+ link_mask(topology_die_cpumask, cpu, i);
}

threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1173,6 +1188,7 @@ static __init void disable_smp(void)
physid_set_mask_of_physid(0, &phys_cpu_present_map);
cpumask_set_cpu(0, topology_sibling_cpumask(0));
cpumask_set_cpu(0, topology_core_cpumask(0));
+ cpumask_set_cpu(0, topology_die_cpumask(0));
}

/*
@@ -1268,6 +1284,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
for_each_possible_cpu(i) {
zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+ zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
}

@@ -1488,6 +1505,8 @@ static void remove_siblinginfo(int cpu)
cpu_data(sibling).booted_cores--;
}

+ for_each_cpu(sibling, topology_die_cpumask(cpu))
+ cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
for_each_cpu(sibling, topology_sibling_cpumask(cpu))
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
@@ -1495,6 +1514,7 @@ static void remove_siblinginfo(int cpu)
cpumask_clear(cpu_llc_shared_mask(cpu));
cpumask_clear(topology_sibling_cpumask(cpu));
cpumask_clear(topology_core_cpumask(cpu));
+ cpumask_clear(topology_die_cpumask(cpu));
c->cpu_core_id = 0;
c->booted_cores = 0;
cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 145506f9fdbe..ac13b0be8448 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -251,6 +251,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
for_each_possible_cpu(i) {
zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+ zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
}
set_cpu_sibling_map(0);
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 50352cf96f85..5b1317ae3262 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -57,6 +57,10 @@ define_siblings_show_func(core_siblings, core_cpumask);
static DEVICE_ATTR_RO(core_siblings);
static DEVICE_ATTR_RO(core_siblings_list);

+define_siblings_show_func(die_siblings, die_cpumask);
+static DEVICE_ATTR_RO(die_siblings);
+static DEVICE_ATTR_RO(die_siblings_list);
+
#ifdef CONFIG_SCHED_BOOK
define_id_show_func(book_id);
static DEVICE_ATTR_RO(book_id);
@@ -81,6 +85,8 @@ static struct attribute *default_attrs[] = {
&dev_attr_thread_siblings_list.attr,
&dev_attr_core_siblings.attr,
&dev_attr_core_siblings_list.attr,
+ &dev_attr_die_siblings.attr,
+ &dev_attr_die_siblings_list.attr,
#ifdef CONFIG_SCHED_BOOK
&dev_attr_book_id.attr,
&dev_attr_book_siblings.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 5cc8595dd0e4..47a3e3c08036 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -196,6 +196,9 @@ static inline int cpu_to_mem(int cpu)
#ifndef topology_core_cpumask
#define topology_core_cpumask(cpu) cpumask_of(cpu)
#endif
+#ifndef topology_die_cpumask
+#define topology_die_cpumask(cpu) cpumask_of(cpu)
+#endif

#ifdef CONFIG_SCHED_SMT
static inline const struct cpumask *cpu_smt_mask(int cpu)
--
2.18.0-rc0
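
[Editor's note: as a side note on the attributes added above, die_siblings uses the same
encoding as core_siblings: a comma-separated hex cpumask, while die_siblings_list is a
human-readable range list. A minimal sketch of how the two encodings relate follows;
this is illustrative user-space Python, not kernel code.]

```python
# Illustrative sketch (not kernel code): convert between the sysfs hex
# cpumask format ("00000000,0000000f") and the range-list format ("0-3").

def mask_to_cpus(mask):
    """Parse a sysfs hex cpumask like '00000000,0000000f' into a CPU list."""
    value = int(mask.replace(",", ""), 16)
    cpus = []
    bit = 0
    while value:
        if value & 1:
            cpus.append(bit)
        value >>= 1
        bit += 1
    return cpus

def cpus_to_list(cpus):
    """Render a sorted CPU list as a sysfs-style range string like '0-3,8-9'."""
    ranges = []
    for cpu in cpus:
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(f"{a}-{b}" if a != b else f"{a}" for a, b in ranges)
```

So a die_siblings value of "ff" corresponds to a die_siblings_list of "0-7".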


2019-02-19 03:43:05

by Len Brown

Subject: [PATCH 01/11] x86 topology: fix doc typo

From: Len Brown <[email protected]>

reflect actual cpuinfo_x86 field name:

s/logical_id/logical_proc_id/

Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
Documentation/x86/topology.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 2953e3ec9a02..06b3cdbc4048 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -51,7 +51,7 @@ The topology of a system is described in the units of:
The physical ID of the package. This information is retrieved via CPUID
and deduced from the APIC IDs of the cores in the package.

- - cpuinfo_x86.logical_id:
+ - cpuinfo_x86.logical_proc_id:

The logical ID of the package. As we do not trust BIOSes to enumerate the
packages in a consistent way, we introduced the concept of logical package
--
2.18.0-rc0


2019-02-19 03:43:44

by Len Brown

Subject: [PATCH 02/11] topology: simplify cputopology.txt formatting and wording

From: Len Brown <[email protected]>

No semantic changes.

Signed-off-by: Len Brown <[email protected]>
Cc: [email protected]
Signed-off-by: Len Brown <[email protected]>
---
Documentation/cputopology.txt | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index c6e7e9196a8b..2698da7e4f49 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -3,76 +3,76 @@ How CPU topology info is exported via sysfs
===========================================

Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures:
+to /proc/cpuinfo output of some architectures. They reside in
+/sys/devices/system/cpu/cpuX/topology/:

-1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
+physical_package_id:

physical package id of cpuX. Typically corresponds to a physical
socket number, but the actual value is architecture and platform
dependent.

-2) /sys/devices/system/cpu/cpuX/topology/core_id:
+core_id:

the CPU core ID of cpuX. Typically it is the hardware platform's
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.

-3) /sys/devices/system/cpu/cpuX/topology/book_id:
+book_id:

the book ID of cpuX. Typically it is the hardware platform's
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.

-4) /sys/devices/system/cpu/cpuX/topology/drawer_id:
+drawer_id:

the drawer ID of cpuX. Typically it is the hardware platform's
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.

-5) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+thread_siblings:

internal kernel map of cpuX's hardware threads within the same
core as cpuX.

-6) /sys/devices/system/cpu/cpuX/topology/thread_siblings_list:
+thread_siblings_list:

human-readable list of cpuX's hardware threads within the same
core as cpuX.

-7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+core_siblings:

internal kernel map of cpuX's hardware threads within the same
physical_package_id.

-8) /sys/devices/system/cpu/cpuX/topology/core_siblings_list:
+core_siblings_list:

human-readable list of cpuX's hardware threads within the same
physical_package_id.

-9) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+book_siblings:

internal kernel map of cpuX's hardware threads within the same
book_id.

-10) /sys/devices/system/cpu/cpuX/topology/book_siblings_list:
+book_siblings_list:

human-readable list of cpuX's hardware threads within the same
book_id.

-11) /sys/devices/system/cpu/cpuX/topology/drawer_siblings:
+drawer_siblings:

internal kernel map of cpuX's hardware threads within the same
drawer_id.

-12) /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list:
+drawer_siblings_list:

human-readable list of cpuX's hardware threads within the same
drawer_id.

-To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 6 to 12 attributes. The book
-and drawer related sysfs files will only be created if CONFIG_SCHED_BOOK
-and CONFIG_SCHED_DRAWER are selected.
+Architecture-neutral, drivers/base/topology.c, exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.

CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where
they reflect the cpu and cache hierarchy.
--
2.18.0-rc0


2019-02-19 09:11:10

by Rafael J. Wysocki

Subject: Re: [PATCH 09/11] powercap/intel_rapl: update rapl domain name and debug messages

On Tue, Feb 19, 2019 at 4:40 AM Len Brown <[email protected]> wrote:
>
> From: Zhang Rui <[email protected]>
>
> The RAPL domain "name" attribute contains "Package-N",
> which is ambiguous on multi-die per-package systems.
>
> Update the name to "package-X-die-Y" on those systems.
>
> No change on systems without multi-die.
>
> Driver debug messages are also updated.
>
> Signed-off-by: Zhang Rui <[email protected]>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/powercap/intel_rapl.c | 57 ++++++++++++++++++++---------------
> 1 file changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index e004707283da..c990eee45239 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -178,12 +178,15 @@ struct rapl_domain {
> #define power_zone_to_rapl_domain(_zone) \
> container_of(_zone, struct rapl_domain, power_zone)
>
> +/* maximum rapl package domain name: package-%d-die-%d */
> +#define PACKAGE_DOMAIN_NAME_LENGTH 30
>
> -/* Each physical package contains multiple domains, these are the common
> +
> +/* Each rapl package contains multiple domains, these are the common
> * data across RAPL domains within a package.
> */
> struct rapl_package {
> - unsigned int id; /* physical package/socket id */
> + unsigned int id; /* logical die id; equals physical package id on 1-die systems */
> unsigned int nr_domains;
> unsigned long domain_map; /* bit map of active domains */
> unsigned int power_unit;
> @@ -198,6 +201,7 @@ struct rapl_package {
> int lead_cpu; /* one active cpu per package for access */
> /* Track active cpus */
> struct cpumask cpumask;
> + char name[PACKAGE_DOMAIN_NAME_LENGTH];
> };
>
> struct rapl_defaults {
> @@ -926,8 +930,8 @@ static int rapl_check_unit_core(struct rapl_package *rp, int cpu)
> value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
> rp->time_unit = 1000000 / (1 << value);
>
> - pr_debug("Core CPU package %d energy=%dpJ, time=%dus, power=%duW\n",
> - rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
> + pr_debug("Core CPU %s energy=%dpJ, time=%dus, power=%duW\n",
> + rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
>
> return 0;
> }
> @@ -951,8 +955,8 @@ static int rapl_check_unit_atom(struct rapl_package *rp, int cpu)
> value = (msr_val & TIME_UNIT_MASK) >> TIME_UNIT_OFFSET;
> rp->time_unit = 1000000 / (1 << value);
>
> - pr_debug("Atom package %d energy=%dpJ, time=%dus, power=%duW\n",
> - rp->id, rp->energy_unit, rp->time_unit, rp->power_unit);
> + pr_debug("Atom %s energy=%dpJ, time=%dus, power=%duW\n",
> + rp->name, rp->energy_unit, rp->time_unit, rp->power_unit);
>
> return 0;
> }
> @@ -1179,7 +1183,7 @@ static void rapl_update_domain_data(struct rapl_package *rp)
> u64 val;
>
> for (dmn = 0; dmn < rp->nr_domains; dmn++) {
> - pr_debug("update package %d domain %s data\n", rp->id,
> + pr_debug("update %s domain %s data\n", rp->name,
> rp->domains[dmn].name);
> /* exclude non-raw primitives */
> for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) {
> @@ -1204,7 +1208,6 @@ static void rapl_unregister_powercap(void)
> static int rapl_package_register_powercap(struct rapl_package *rp)
> {
> struct rapl_domain *rd;
> - char dev_name[17]; /* max domain name = 7 + 1 + 8 for int + 1 for null*/
> struct powercap_zone *power_zone = NULL;
> int nr_pl, ret;
>
> @@ -1215,20 +1218,16 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
> for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
> if (rd->id == RAPL_DOMAIN_PACKAGE) {
> nr_pl = find_nr_power_limit(rd);
> - pr_debug("register socket %d package domain %s\n",
> - rp->id, rd->name);
> - memset(dev_name, 0, sizeof(dev_name));
> - snprintf(dev_name, sizeof(dev_name), "%s-%d",
> - rd->name, rp->id);
> + pr_debug("register package domain %s\n", rp->name);
> power_zone = powercap_register_zone(&rd->power_zone,
> control_type,
> - dev_name, NULL,
> + rp->name, NULL,
> &zone_ops[rd->id],
> nr_pl,
> &constraint_ops);
> if (IS_ERR(power_zone)) {
> - pr_debug("failed to register package, %d\n",
> - rp->id);
> + pr_debug("failed to register power zone %s\n",
> + rp->name);
> return PTR_ERR(power_zone);
> }
> /* track parent zone in per package/socket data */
> @@ -1254,8 +1253,8 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
> &constraint_ops);
>
> if (IS_ERR(power_zone)) {
> - pr_debug("failed to register power_zone, %d:%s:%s\n",
> - rp->id, rd->name, dev_name);
> + pr_debug("failed to register power_zone, %s:%s\n",
> + rp->name, rd->name);
> ret = PTR_ERR(power_zone);
> goto err_cleanup;
> }
> @@ -1268,7 +1267,7 @@ static int rapl_package_register_powercap(struct rapl_package *rp)
> * failed after the first domain setup.
> */
> while (--rd >= rp->domains) {
> - pr_debug("unregister package %d domain %s\n", rp->id, rd->name);
> + pr_debug("unregister %s domain %s\n", rp->name, rd->name);
> powercap_unregister_zone(control_type, &rd->power_zone);
> }
>
> @@ -1378,8 +1377,8 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd)
> /* check if the domain is locked by BIOS, ignore if MSR doesn't exist */
> if (!rapl_read_data_raw(rd, FW_LOCK, false, &val64)) {
> if (val64) {
> - pr_info("RAPL package %d domain %s locked by BIOS\n",
> - rd->rp->id, rd->name);
> + pr_info("RAPL %s domain %s locked by BIOS\n",
> + rd->rp->name, rd->name);
> rd->state |= DOMAIN_STATE_BIOS_LOCKED;
> }
> }
> @@ -1408,10 +1407,10 @@ static int rapl_detect_domains(struct rapl_package *rp, int cpu)
> }
> rp->nr_domains = bitmap_weight(&rp->domain_map, RAPL_DOMAIN_MAX);
> if (!rp->nr_domains) {
> - pr_debug("no valid rapl domains found in package %d\n", rp->id);
> + pr_debug("no valid rapl domains found in %s\n", rp->name);
> return -ENODEV;
> }
> - pr_debug("found %d domains on package %d\n", rp->nr_domains, rp->id);
> + pr_debug("found %d domains on %s\n", rp->nr_domains, rp->name);
>
> rp->domains = kcalloc(rp->nr_domains + 1, sizeof(struct rapl_domain),
> GFP_KERNEL);
> @@ -1444,8 +1443,8 @@ static void rapl_remove_package(struct rapl_package *rp)
> rd_package = rd;
> continue;
> }
> - pr_debug("remove package, undo power limit on %d: %s\n",
> - rp->id, rd->name);
> + pr_debug("remove package, undo power limit on %s: %s\n",
> + rp->name, rd->name);
> powercap_unregister_zone(control_type, &rd->power_zone);
> }
> /* do parent zone last */
> @@ -1459,6 +1458,7 @@ static struct rapl_package *rapl_add_package(int cpu)
> {
> int id = topology_unique_die_id(cpu);
> struct rapl_package *rp;
> + struct cpuinfo_x86 *c = &cpu_data(cpu);
> int ret;
>
> rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL);
> @@ -1469,6 +1469,13 @@ static struct rapl_package *rapl_add_package(int cpu)
> rp->id = id;
> rp->lead_cpu = cpu;
>
> + if (c->x86_max_dies > 1)
> + snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH,
> + "package-%d-die-%d", c->phys_proc_id, c->cpu_die_id);
> + else
> + snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d",
> + c->phys_proc_id);
> +
> /* check if the package contains valid domains */
> if (rapl_detect_domains(rp, cpu) ||
> rapl_defaults->check_unit(rp, cpu)) {
> --
> 2.18.0-rc0
>
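
[Editor's note: for reference, the naming scheme introduced above can be sketched in a
few lines. This is illustrative Python mirroring the snprintf() logic in the patch, not
driver code; PACKAGE_DOMAIN_NAME_LENGTH is the constant the patch defines.]

```python
# Sketch of the rapl_add_package() naming logic from the patch above.
# snprintf() writes at most size-1 characters plus a NUL terminator,
# which the slice below mimics.
PACKAGE_DOMAIN_NAME_LENGTH = 30

def rapl_package_name(phys_proc_id, cpu_die_id, x86_max_dies):
    if x86_max_dies > 1:
        name = "package-%d-die-%d" % (phys_proc_id, cpu_die_id)
    else:
        name = "package-%d" % phys_proc_id
    return name[:PACKAGE_DOMAIN_NAME_LENGTH - 1]
```

On a single-die part this yields the familiar "package-0"; only multi-die
parts see the longer "package-0-die-1" form, matching the "no change on
systems without multi-die" claim in the commit message.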

2019-02-19 09:12:45

by Rafael J. Wysocki

Subject: Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

On Tue, Feb 19, 2019 at 4:40 AM Len Brown <[email protected]> wrote:
>
> From: Zhang Rui <[email protected]>
>
> On the new dual-die/package systems, the RAPL MSR becomes die-scope.
> Thus instead of one powercap device per physical package, now there
> should be one powercap device for each unique die on these systems.
>
> This patch introduces intel_rapl driver support for new
> dual-die/package systems.
>
> On hardware that does not have multi-die support, topology_unique_die_id()
> equals topology_physical_package_id(), so there is no functional change.
>
> Signed-off-by: Zhang Rui <[email protected]>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/powercap/intel_rapl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index 6057d9695fed..e004707283da 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -266,7 +266,7 @@ static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
> /* caller to ensure CPU hotplug lock is held */
> static struct rapl_package *rapl_find_package(int cpu)
> {
> - int id = topology_physical_package_id(cpu);
> + int id = topology_unique_die_id(cpu);
> struct rapl_package *rp;
>
> list_for_each_entry(rp, &rapl_packages, plist) {
> @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
> /* called from CPU hotplug notifier, hotplug lock held */
> static struct rapl_package *rapl_add_package(int cpu)
> {
> - int id = topology_physical_package_id(cpu);
> + int id = topology_unique_die_id(cpu);
> struct rapl_package *rp;
> int ret;
>
> --
> 2.18.0-rc0
>
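
[Editor's note: the change hinges on topology_unique_die_id() collapsing to the package
id on single-die parts. One plausible construction of such an id is sketched below; this
is an assumption for illustration, not a formula quoted from the series.]

```python
# Hypothetical construction of a system-wide unique die id: fold the
# package id and the per-package die id into one index.

def unique_die_id(package_id, die_id, max_dies_per_package):
    # On 1-die systems (max_dies_per_package == 1, die_id == 0) this
    # degenerates to package_id, which is why the commit message can
    # claim "no functional change" there.
    return package_id * max_dies_per_package + die_id
```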

2019-02-19 09:13:43

by Rafael J. Wysocki

Subject: Re: [PATCH 07/11] powercap/intel_rapl: simplify rapl_find_package()

On Tue, Feb 19, 2019 at 4:41 AM Len Brown <[email protected]> wrote:
>
> From: Zhang Rui <[email protected]>
>
> Simplify how the code to discover a package is called.
> Rename find_package_by_id() to rapl_find_package().
>
> No functional change.
>
> Signed-off-by: Zhang Rui <[email protected]>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>

Acked-by: Rafael J. Wysocki <[email protected]>

> ---
> drivers/powercap/intel_rapl.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index 6cdb2c14eee4..6057d9695fed 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -264,8 +264,9 @@ static struct powercap_control_type *control_type; /* PowerCap Controller */
> static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
>
> /* caller to ensure CPU hotplug lock is held */
> -static struct rapl_package *find_package_by_id(int id)
> +static struct rapl_package *rapl_find_package(int cpu)
> {
> + int id = topology_physical_package_id(cpu);
> struct rapl_package *rp;
>
> list_for_each_entry(rp, &rapl_packages, plist) {
> @@ -1298,7 +1299,7 @@ static int __init rapl_register_psys(void)
> rd->rpl[0].name = pl1_name;
> rd->rpl[1].prim_id = PL2_ENABLE;
> rd->rpl[1].name = pl2_name;
> - rd->rp = find_package_by_id(0);
> + rd->rp = rapl_find_package(0);
>
> power_zone = powercap_register_zone(&rd->power_zone, control_type,
> "psys", NULL,
> @@ -1454,8 +1455,9 @@ static void rapl_remove_package(struct rapl_package *rp)
> }
>
> /* called from CPU hotplug notifier, hotplug lock held */
> -static struct rapl_package *rapl_add_package(int cpu, int pkgid)
> +static struct rapl_package *rapl_add_package(int cpu)
> {
> + int id = topology_physical_package_id(cpu);
> struct rapl_package *rp;
> int ret;
>
> @@ -1464,7 +1466,7 @@ static struct rapl_package *rapl_add_package(int cpu, int pkgid)
> return ERR_PTR(-ENOMEM);
>
> /* add the new package to the list */
> - rp->id = pkgid;
> + rp->id = id;
> rp->lead_cpu = cpu;
>
> /* check if the package contains valid domains */
> @@ -1495,12 +1497,11 @@ static struct rapl_package *rapl_add_package(int cpu, int pkgid)
> */
> static int rapl_cpu_online(unsigned int cpu)
> {
> - int pkgid = topology_physical_package_id(cpu);
> struct rapl_package *rp;
>
> - rp = find_package_by_id(pkgid);
> + rp = rapl_find_package(cpu);
> if (!rp) {
> - rp = rapl_add_package(cpu, pkgid);
> + rp = rapl_add_package(cpu);
> if (IS_ERR(rp))
> return PTR_ERR(rp);
> }
> @@ -1510,11 +1511,10 @@ static int rapl_cpu_online(unsigned int cpu)
>
> static int rapl_cpu_down_prep(unsigned int cpu)
> {
> - int pkgid = topology_physical_package_id(cpu);
> struct rapl_package *rp;
> int lead_cpu;
>
> - rp = find_package_by_id(pkgid);
> + rp = rapl_find_package(cpu);
> if (!rp)
> return 0;
>
> --
> 2.18.0-rc0
>

2019-02-19 16:48:05

by Guenter Roeck

Subject: Re: [PATCH 11/11] hwmon/coretemp: Support multi-die/package

On 2/18/19 7:40 PM, Len Brown wrote:
> From: Zhang Rui <[email protected]>
>
> This patch introduces coretemp driver support
> for new dual-die/package systems.
>
> On the new dual-die/package systems, the package temperature MSRs become
> die-scope. Thus instead of one hwmon device per physical package, now
> there should be one hwmon device for each die on these systems.
>
> On hardware that does not have multi-die support,
> topology_unique_die_id() equals topology_physical_package_id(), so the
> only difference on these systems is that the physical package id, rather
> than the logical package id, is used as the coretemp platform device id.
>
> Signed-off-by: Zhang Rui <[email protected]>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>

Acked-by: Guenter Roeck <[email protected]>

> ---
> drivers/hwmon/coretemp.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index 5d34f7271e67..a0b6b2247c3a 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -435,7 +435,7 @@ static int chk_ucode_version(unsigned int cpu)
>
> static struct platform_device *coretemp_get_pdev(unsigned int cpu)
> {
> - int pkgid = topology_logical_package_id(cpu);
> + int pkgid = topology_unique_die_id(cpu);
>
> if (pkgid >= 0 && pkgid < max_packages)
> return pkg_devices[pkgid];
> @@ -579,7 +579,7 @@ static struct platform_driver coretemp_driver = {
>
> static struct platform_device *coretemp_device_add(unsigned int cpu)
> {
> - int err, pkgid = topology_logical_package_id(cpu);
> + int err, pkgid = topology_unique_die_id(cpu);
> struct platform_device *pdev;
>
> if (pkgid < 0)
> @@ -703,7 +703,7 @@ static int coretemp_cpu_offline(unsigned int cpu)
> * the rest.
> */
> if (cpumask_empty(&pd->cpumask)) {
> - pkg_devices[topology_logical_package_id(cpu)] = NULL;
> + pkg_devices[topology_unique_die_id(cpu)] = NULL;
> platform_device_unregister(pdev);
> return 0;
> }
> @@ -732,6 +732,7 @@ static enum cpuhp_state coretemp_hp_online;
> static int __init coretemp_init(void)
> {
> int err;
> + struct cpuinfo_x86 *c = &cpu_data(0);
>
> /*
> * CPUID.06H.EAX[0] indicates whether the CPU has thermal
> @@ -741,7 +742,7 @@ static int __init coretemp_init(void)
> if (!x86_match_cpu(coretemp_ids))
> return -ENODEV;
>
> - max_packages = topology_max_packages();
> + max_packages = topology_max_packages() * c->x86_max_dies;
> pkg_devices = kcalloc(max_packages, sizeof(struct platform_device *),
> GFP_KERNEL);
> if (!pkg_devices)
>


2019-02-19 16:49:52

by Liang, Kan

Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support



On 2/18/2019 10:40 PM, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
>
> CPUID.1F is a super-set of the CPUID.B "Extended Topology Leaf",
> and a common updated routine can parse either leaf.
>
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
>
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch. In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages. In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
>
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.
>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>
> ---
> Documentation/x86/topology.txt | 4 ++
> arch/x86/include/asm/processor.h | 4 +-
> arch/x86/kernel/cpu/topology.c | 82 ++++++++++++++++++++++++--------
> arch/x86/kernel/smpboot.c | 4 +-
> 4 files changed, 73 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
> index 06b3cdbc4048..8107b6cfc9ea 100644
> --- a/Documentation/x86/topology.txt
> +++ b/Documentation/x86/topology.txt
> @@ -46,6 +46,10 @@ The topology of a system is described in the units of:
>
> The number of cores in a package. This information is retrieved via CPUID.
>
> + - cpuinfo_x86.x86_max_dies:
> +
> + The number of dies in a package. This information is retrieved via CPUID.
> +
> - cpuinfo_x86.phys_proc_id:
>
> The physical ID of the package. This information is retrieved via CPUID
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 33051436c864..f2856fe03715 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -105,7 +105,8 @@ struct cpuinfo_x86 {
> int x86_power;
> unsigned long loops_per_jiffy;
> /* cpuid returned max cores value: */
> - u16 x86_max_cores;
> + u16 x86_max_cores;
> + u16 x86_max_dies;
> u16 apicid;
> u16 initial_apicid;
> u16 x86_clflush_size;
> @@ -117,6 +118,7 @@ struct cpuinfo_x86 {
> u16 logical_proc_id;
> /* Core id: */
> u16 cpu_core_id;
> + u16 cpu_die_id;
> /* Index into per_cpu list: */
> u16 cpu_index;
> u32 microcode;
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 8f6c784141d1..6dce6ee77849 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -15,33 +15,61 @@
> /* leaf 0xb SMT level */
> #define SMT_LEVEL 0
>
> -/* leaf 0xb sub-leaf types */
> +/* extended topology sub-leaf types */
> #define INVALID_TYPE 0
> #define SMT_TYPE 1
> #define CORE_TYPE 2
> +#define DIE_TYPE 5
>
> #define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
> #define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
> #define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)
>
> -int detect_extended_topology_early(struct cpuinfo_x86 *c)
> -{
> #ifdef CONFIG_SMP
> +/*
> + * Check if given CPUID extended topology "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
> unsigned int eax, ebx, ecx, edx;
>
> - if (c->cpuid_level < 0xb)
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> + if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> return -1;
>
> - cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + return 0;
> +}
> +/*
> + * Return best CPUID Extended Topology Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> + if (c->cpuid_level >= 0x1f)
> + if (check_extended_topology_leaf(0x1f) == 0)
> + return 0x1f;
>
> - /*
> - * check if the cpuid leaf 0xb is actually implemented.
> - */
> - if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> + if (c->cpuid_level >= 0xb)
> + if (check_extended_topology_leaf(0xb) == 0)
> + return 0xb;
> +
> + return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> + unsigned int eax, ebx, ecx, edx;
> + int leaf;
> +
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)
> return -1;
>
> set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> /*
> * initial apic id, which also represents 32-bit extended x2apic id.
> */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
> }
>
> /*
> - * Check for extended topology enumeration cpuid leaf 0xb and if it
> + * Check for extended topology enumeration cpuid leaf, and if it
> * exists, use it for populating initial_apicid and cpu topology
> * detection.
> */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_SMP
> unsigned int eax, ebx, ecx, edx, sub_index;
> - unsigned int ht_mask_width, core_plus_mask_width;
> + unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
> unsigned int core_select_mask, core_level_siblings;
> + unsigned int die_select_mask, die_level_siblings;
> + int leaf;
>
> - if (detect_extended_topology_early(c) < 0)
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)
> return -1;
>
> /*
> * Populate HT related information from sub-leaf level 0.
> */
> - cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + c->initial_apicid = edx;
> core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>
> sub_index = 1;
> do {
> - cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
> + cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>
> /*
> * Check for the Core type in the implemented sub leaves.
> */
> if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
> core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_level_siblings = core_level_siblings;
> core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> - break;
> + }
> + if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> }
>
> sub_index++;
> } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
>
> core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> -
> - c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
> - & core_select_mask;
> - c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
> + die_select_mask = (~(-1 << die_plus_mask_width)) >>
> + core_plus_mask_width;
> +
> + c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> + ht_mask_width) & core_select_mask;
> + c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> + core_plus_mask_width) & die_select_mask;
> + c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> + die_plus_mask_width);
> /*
> * Reinit the apicid, now that we have extended initial_apicid.
> */
> c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>
> c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> + c->x86_max_dies = (die_level_siblings / core_level_siblings);
> #endif
> return 0;
> }
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>
> if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
> if (c->cpu_core_id == o->cpu_core_id)
> return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> }
>
> } else if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> c->cpu_core_id == o->cpu_core_id) {
> return topology_sane(c, o, "smt");
> }
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> */
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> - if (c->phys_proc_id == o->phys_proc_id)
> + if (c->cpu_die_id == o->cpu_die_id)
> return true;
> return false;
> }

Shouldn't we check the unique_die_id here?
Dies from different packages can have the same cpu_die_id.

Thanks,
Kan


2019-02-19 16:58:12

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings



On 2/18/2019 10:40 PM, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> like core_siblings, except it shows which die are in the same package.
>
> This is needed for lscpu(1) to correctly display die topology.
>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>
> ---
> Documentation/cputopology.txt | 10 ++++++++++
> arch/x86/include/asm/smp.h | 1 +
> arch/x86/include/asm/topology.h | 1 +
> arch/x86/kernel/smpboot.c | 20 ++++++++++++++++++++
> arch/x86/xen/smp_pv.c | 1 +
> drivers/base/topology.c | 6 ++++++
> include/linux/topology.h | 3 +++
> 7 files changed, 42 insertions(+)
>
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 287213b4517b..7dd2ae3df233 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -56,6 +56,16 @@ core_siblings_list:
> human-readable list of cpuX's hardware threads within the same
> die_id.
>
> +die_siblings:
> +
> + internal kernel map of cpuX's hardware threads within the same
> + physical_package_id.
> +
> +die_siblings_list:
> +
> + human-readable list of cpuX's hardware threads within the same
> + physical_package_id.
> +
> book_siblings:
>
> internal kernel map of cpuX's hardware threads within the same
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 2e95b6c1bca3..39266d193597 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -23,6 +23,7 @@ extern unsigned int num_processors;
>
> DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
> +DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
> /* cpus sharing the last level cache: */
> DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
> DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 281be6bbc80d..a52a572147ba 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -110,6 +110,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
> #define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)
>
> #ifdef CONFIG_SMP
> +#define topology_die_cpumask(cpu) (per_cpu(cpu_die_map, cpu))
> #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))

Could you please update the documentation for topology_die_cpumask
and topology_core_cpumask in Documentation/x86/topology.txt?

Thanks,
Kan

> #define topology_sibling_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 4250a87f57db..42d37e4a1918 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -90,6 +90,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
> EXPORT_PER_CPU_SYMBOL(cpu_core_map);
>
> +/* representing HT and core and die siblings of each logical CPU */
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
> +EXPORT_PER_CPU_SYMBOL(cpu_die_map);
> +
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
>
> /* Per CPU bogomips and other parameters */
> @@ -461,6 +465,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> * multicore group inside a NUMA node. If this happens, we will
> * discard the MC level of the topology later.
> */
> +static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> + if (c->phys_proc_id == o->phys_proc_id)
> + return true;
> + return false;
> +}
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> if (c->cpu_die_id == o->cpu_die_id)
> @@ -530,6 +540,7 @@ void set_cpu_sibling_map(int cpu)
> cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
> cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
> cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
> + cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
> c->booted_cores = 1;
> return;
> }
> @@ -576,8 +587,12 @@ void set_cpu_sibling_map(int cpu)
> } else if (i != cpu && !c->booted_cores)
> c->booted_cores = cpu_data(i).booted_cores;
> }
> +
> if (match_die(c, o) && !topology_same_node(c, o))
> x86_has_numa_in_package = true;
> +
> + if ((i == cpu) || (has_mp && match_pkg(c, o)))
> + link_mask(topology_die_cpumask, cpu, i);
> }
>
> threads = cpumask_weight(topology_sibling_cpumask(cpu));
> @@ -1173,6 +1188,7 @@ static __init void disable_smp(void)
> physid_set_mask_of_physid(0, &phys_cpu_present_map);
> cpumask_set_cpu(0, topology_sibling_cpumask(0));
> cpumask_set_cpu(0, topology_core_cpumask(0));
> + cpumask_set_cpu(0, topology_die_cpumask(0));
> }
>
> /*
> @@ -1268,6 +1284,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
> for_each_possible_cpu(i) {
> zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
> zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
> + zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
> zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
> }
>
> @@ -1488,6 +1505,8 @@ static void remove_siblinginfo(int cpu)
> cpu_data(sibling).booted_cores--;
> }
>
> + for_each_cpu(sibling, topology_die_cpumask(cpu))
> + cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
> for_each_cpu(sibling, topology_sibling_cpumask(cpu))
> cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
> for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
> @@ -1495,6 +1514,7 @@ static void remove_siblinginfo(int cpu)
> cpumask_clear(cpu_llc_shared_mask(cpu));
> cpumask_clear(topology_sibling_cpumask(cpu));
> cpumask_clear(topology_core_cpumask(cpu));
> + cpumask_clear(topology_die_cpumask(cpu));
> c->cpu_core_id = 0;
> c->booted_cores = 0;
> cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 145506f9fdbe..ac13b0be8448 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -251,6 +251,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
> for_each_possible_cpu(i) {
> zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
> zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
> + zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
> zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
> }
> set_cpu_sibling_map(0);
> diff --git a/drivers/base/topology.c b/drivers/base/topology.c
> index 50352cf96f85..5b1317ae3262 100644
> --- a/drivers/base/topology.c
> +++ b/drivers/base/topology.c
> @@ -57,6 +57,10 @@ define_siblings_show_func(core_siblings, core_cpumask);
> static DEVICE_ATTR_RO(core_siblings);
> static DEVICE_ATTR_RO(core_siblings_list);
>
> +define_siblings_show_func(die_siblings, die_cpumask);
> +static DEVICE_ATTR_RO(die_siblings);
> +static DEVICE_ATTR_RO(die_siblings_list);
> +
> #ifdef CONFIG_SCHED_BOOK
> define_id_show_func(book_id);
> static DEVICE_ATTR_RO(book_id);
> @@ -81,6 +85,8 @@ static struct attribute *default_attrs[] = {
> &dev_attr_thread_siblings_list.attr,
> &dev_attr_core_siblings.attr,
> &dev_attr_core_siblings_list.attr,
> + &dev_attr_die_siblings.attr,
> + &dev_attr_die_siblings_list.attr,
> #ifdef CONFIG_SCHED_BOOK
> &dev_attr_book_id.attr,
> &dev_attr_book_siblings.attr,
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 5cc8595dd0e4..47a3e3c08036 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -196,6 +196,9 @@ static inline int cpu_to_mem(int cpu)
> #ifndef topology_core_cpumask
> #define topology_core_cpumask(cpu) cpumask_of(cpu)
> #endif
> +#ifndef topology_die_cpumask
> +#define topology_die_cpumask(cpu) cpumask_of(cpu)
> +#endif
>
> #ifdef CONFIG_SCHED_SMT
> static inline const struct cpumask *cpu_smt_mask(int cpu)
>

2019-02-19 18:46:36

by Brown, Len

[permalink] [raw]
Subject: RE: [PATCH 05/11] x86 topology: export die_siblings

Thanks for the comments, Kan,

>> diff --git a/Documentation/cputopology.txt
>> b/Documentation/cputopology.txt index 287213b4517b..7dd2ae3df233
>> 100644
>> --- a/Documentation/cputopology.txt
>> +++ b/Documentation/cputopology.txt
>> @@ -56,6 +56,16 @@ core_siblings_list:
>> human-readable list of cpuX's hardware threads within the same
>> die_id.
>>
>> +die_siblings:
>> +
>> + internal kernel map of cpuX's hardware threads within the same
>> + physical_package_id.
>> +
>> +die_siblings_list:
>> +
>> + human-readable list of cpuX's hardware threads within the same
>> + physical_package_id.
>> +
>> book_siblings:
>>
>> internal kernel map of cpuX's hardware threads within the same diff

> Could you please update the document regarding to topology_die_cpumask and topology_core_cpumask in Documentation/x86/topology.txt

I agree that the top part of this file, as updated above, should document the external sysfs interface...

I'm less excited about the middle of the file trying to document the internal implementation, as the source code
is clearer than the document, but here is an update, consistent with the existing file.
Let me know if it does not fully address your comment.

Thanks,
-Len
---

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 7dd2ae3..f39c759 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -102,6 +102,7 @@ these macros in include/asm-XXX/topology.h::
#define topology_drawer_id(cpu)
#define topology_sibling_cpumask(cpu)
#define topology_core_cpumask(cpu)
+ #define topology_die_cpumask(cpu)
#define topology_book_cpumask(cpu)
#define topology_drawer_cpumask(cpu)

@@ -114,10 +115,11 @@ To be consistent on all architectures, include/linux/topology.h
provides default definitions for any of the above macros that are
not defined by include/asm-XXX/topology.h:

-1) physical_package_id: -1
-2) core_id: 0
-3) sibling_cpumask: just the given CPU
-4) core_cpumask: just the given CPU
+1) topology_physical_package_id: -1
+2) topology_core_id: 0
+3) topology_sibling_cpumask: just the given CPU
+4) topology_core_cpumask: just the given CPU
+5) topology_die_cpumask: just the given CPU

For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
default definitions for topology_book_id() and topology_book_cpumask().


2019-02-19 19:28:33

by Brown, Len

[permalink] [raw]
Subject: RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

>> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> */
>> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> {
>> - if (c->phys_proc_id == o->phys_proc_id)
>> + if (c->cpu_die_id == o->cpu_die_id)
>> return true;
>> return false;
>> }

> Shouldn't we check the unique_die_id here?
> Dies from different packages can have the same cpu_die_id.

Good catch.
Updated this hunk as below.

Thanks!
-Len

@@ -461,7 +463,8 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
*/
static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
- if (c->phys_proc_id == o->phys_proc_id)
+ if ((c->phys_proc_id == o->phys_proc_id) &&
+ (c->cpu_die_id == o->cpu_die_id))
return true;
return false;
}

2019-02-19 19:36:00

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings



On 2/19/2019 1:43 PM, Brown, Len wrote:
> Thanks for the comments, Kan,
>
>>> diff --git a/Documentation/cputopology.txt
>>> b/Documentation/cputopology.txt index 287213b4517b..7dd2ae3df233
>>> 100644
>>> --- a/Documentation/cputopology.txt
>>> +++ b/Documentation/cputopology.txt
>>> @@ -56,6 +56,16 @@ core_siblings_list:
>>> human-readable list of cpuX's hardware threads within the same
>>> die_id.
>>>
>>> +die_siblings:
>>> +
>>> + internal kernel map of cpuX's hardware threads within the same
>>> + physical_package_id.
>>> +
>>> +die_siblings_list:
>>> +
>>> + human-readable list of cpuX's hardware threads within the same
>>> + physical_package_id.
>>> +
>>> book_siblings:
>>>
>>> internal kernel map of cpuX's hardware threads within the same diff
>
>> Could you please update the document regarding to topology_die_cpumask and topology_core_cpumask in Documentation/x86/topology.txt
>
> I agree that the top part of this file, as updated above, should document the external sysfs interface...
>
> I'm less excited about the center of the file trying to document the internal implementation -- as the source code
> is actually more clear than the document, but here is an update, consistent with the existing file.
> Let me know if it does not fully address your comment.
>

Besides the generic document, I think we should update the x86 document
as well, Documentation/x86/topology.txt.

The definition of topology_core_cpumask has to be changed to per-die, right?

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 8107b6c..8698a41 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -105,11 +105,16 @@ The topology of a system is described in the units of:

Thread-related topology information in the kernel:

- - topology_core_cpumask():
+ - topology_die_cpumask():

The cpumask contains all online threads in the package to which a thread
belongs.

+ - topology_core_cpumask():
+
+ The cpumask contains all online threads in the die to which a thread
+ belongs.
+
The number of online threads is also printed in /proc/cpuinfo "siblings."

- topology_sibling_cpumask():


Thanks,
Kan

> Thanks,
> -Len
> ---
>
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 7dd2ae3..f39c759 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -102,6 +102,7 @@ these macros in include/asm-XXX/topology.h::
> #define topology_drawer_id(cpu)
> #define topology_sibling_cpumask(cpu)
> #define topology_core_cpumask(cpu)
> + #define topology_die_cpumask(cpu)
> #define topology_book_cpumask(cpu)
> #define topology_drawer_cpumask(cpu)
>
> @@ -114,10 +115,11 @@ To be consistent on all architectures, include/linux/topology.h
> provides default definitions for any of the above macros that are
> not defined by include/asm-XXX/topology.h:
>
> -1) physical_package_id: -1
> -2) core_id: 0
> -3) sibling_cpumask: just the given CPU
> -4) core_cpumask: just the given CPU
> +1) topology_physical_package_id: -1
> +2) topology_core_id: 0
> +3) topology_sibling_cpumask: just the given CPU
> +4) topology_core_cpumask: just the given CPU
> +5) topology_die_cpumask: just the given CPU
>
> For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
> default definitions for topology_book_id() and topology_book_cpumask().
>
>

2019-02-20 03:00:11

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On 2019/2/19 11:40, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
>
> CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
> and a common updated routine can parse either leaf.
>
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
>
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch. In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages. In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
>
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.
>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>
> ---
> Documentation/x86/topology.txt | 4 ++
> arch/x86/include/asm/processor.h | 4 +-
> arch/x86/kernel/cpu/topology.c | 82 ++++++++++++++++++++++++--------
> arch/x86/kernel/smpboot.c | 4 +-
> 4 files changed, 73 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
> index 06b3cdbc4048..8107b6cfc9ea 100644
> --- a/Documentation/x86/topology.txt
> +++ b/Documentation/x86/topology.txt
> @@ -46,6 +46,10 @@ The topology of a system is described in the units of:
>
> The number of cores in a package. This information is retrieved via CPUID.
>
> + - cpuinfo_x86.x86_max_dies:
> +
> + The number of dies in a package. This information is retrieved via CPUID.
> +
> - cpuinfo_x86.phys_proc_id:
>
> The physical ID of the package. This information is retrieved via CPUID
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 33051436c864..f2856fe03715 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -105,7 +105,8 @@ struct cpuinfo_x86 {
> int x86_power;
> unsigned long loops_per_jiffy;
> /* cpuid returned max cores value: */
> - u16 x86_max_cores;
> + u16 x86_max_cores;
> + u16 x86_max_dies;
> u16 apicid;
> u16 initial_apicid;
> u16 x86_clflush_size;
> @@ -117,6 +118,7 @@ struct cpuinfo_x86 {
> u16 logical_proc_id;
> /* Core id: */
> u16 cpu_core_id;
> + u16 cpu_die_id;
> /* Index into per_cpu list: */
> u16 cpu_index;
> u32 microcode;
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 8f6c784141d1..6dce6ee77849 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -15,33 +15,61 @@
> /* leaf 0xb SMT level */
> #define SMT_LEVEL 0
>
> -/* leaf 0xb sub-leaf types */
> +/* extended topology sub-leaf types */
> #define INVALID_TYPE 0
> #define SMT_TYPE 1
> #define CORE_TYPE 2
> +#define DIE_TYPE 5

This patch set exports die topology via sysfs, but according to the
CPUID.1F description in the SDM, the extended topology sub-leaf type can
be any of SMT/Core/Module/Tile/Die.

Why not export module/tile topology as well?

>
> #define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
> #define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
> #define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)
>
> -int detect_extended_topology_early(struct cpuinfo_x86 *c)
> -{
> #ifdef CONFIG_SMP
> +/*
> + * Check if given CPUID extended toplogy "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
> unsigned int eax, ebx, ecx, edx;
>
> - if (c->cpuid_level < 0xb)
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> + if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> return -1;
>
> - cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + return 0;
> +}
> +/*
> + * Return best CPUID Extended Toplogy Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> + if (c->cpuid_level >= 0x1f)
> + if (check_extended_topology_leaf(0x1f) == 0)
> + return 0x1f;
>
> - /*
> - * check if the cpuid leaf 0xb is actually implemented.
> - */
> - if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> + if (c->cpuid_level >= 0xb)
> + if (check_extended_topology_leaf(0xb) == 0)
> + return 0xb;
> +
> + return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> + unsigned int eax, ebx, ecx, edx;
> + int leaf;
> +
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)
> return -1;
>
> set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> /*
> * initial apic id, which also represents 32-bit extended x2apic id.
> */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
> }
>
> /*
> - * Check for extended topology enumeration cpuid leaf 0xb and if it
> + * Check for extended topology enumeration cpuid leaf, and if it
> * exists, use it for populating initial_apicid and cpu topology
> * detection.
> */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_SMP
> unsigned int eax, ebx, ecx, edx, sub_index;
> - unsigned int ht_mask_width, core_plus_mask_width;
> + unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
> unsigned int core_select_mask, core_level_siblings;
> + unsigned int die_select_mask, die_level_siblings;
> + int leaf;
>
> - if (detect_extended_topology_early(c) < 0)
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)
> return -1;
>
> /*
> * Populate HT related information from sub-leaf level 0.
> */
> - cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + c->initial_apicid = edx;
> core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>
> sub_index = 1;
> do {
> - cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
> + cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>
> /*
> * Check for the Core type in the implemented sub leaves.
> */
> if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
> core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_level_siblings = core_level_siblings;
> core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> - break;
> + }
> + if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> }
>
> sub_index++;
> } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
>
> core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> -
> - c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
> - & core_select_mask;
> - c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
> + die_select_mask = (~(-1 << die_plus_mask_width)) >>
> + core_plus_mask_width;
> +
> + c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> + ht_mask_width) & core_select_mask;
> + c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> + core_plus_mask_width) & die_select_mask;
> + c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> + die_plus_mask_width);
> /*
> * Reinit the apicid, now that we have extended initial_apicid.
> */
> c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>
> c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> + c->x86_max_dies = (die_level_siblings / core_level_siblings);
> #endif
> return 0;
> }
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>
> if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
> if (c->cpu_core_id == o->cpu_core_id)
> return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> }
>
> } else if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> c->cpu_core_id == o->cpu_core_id) {
> return topology_sane(c, o, "smt");
> }
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> */
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> - if (c->phys_proc_id == o->phys_proc_id)
> + if (c->cpu_die_id == o->cpu_die_id)
> return true;
> return false;
> }
>


2019-02-20 06:10:54

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On Tue, Feb 19, 2019 at 9:59 PM Like Xu <[email protected]> wrote:

> > -/* leaf 0xb sub-leaf types */
> > +/* extended topology sub-leaf types */
> > #define INVALID_TYPE 0
> > #define SMT_TYPE 1
> > #define CORE_TYPE 2
> > +#define DIE_TYPE 5
>
> This patch set is going to export die topology via sysfs
> while the extended topology sub-leaf type could be one of followings:
> SMT/Core/Module/Tile/Die according to CPUID.1F from SDM.
>
> Why not choose to export module/tile topology as well?

Excellent question. (and thank you for reading the SDM:-)

While it is true that there are shipping products with software-visible
CPU modules and tiles, they shipped before this mechanism was available.
As a result, there are currently zero products that use this mechanism to
enumerate modules and tiles. If future products have software-visible
modules and tiles, and they choose to use this mechanism, we can add
support for them. But I do not advocate adding code to the kernel
"just in case".

In contrast, the need to support multi-die/package products as soon as
possible is quite clear.

thanks,
Len Brown, Intel Open Source Technology Center

2019-02-20 10:56:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On Mon, Feb 18, 2019 at 10:40:05PM -0500, Len Brown wrote:
> From: Len Brown <[email protected]>
>
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
>
> CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
> and a common updated routine can parse either leaf.
>
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
>
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch. In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages. In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
>
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.

It would've been nice to have the CPUID instruction 1F leaf reference
3B-3.9 in the SDM, and maybe mention this here too.

Also, figure 8-6 uses Q,R ID, without prior mention.

> +/*
> + * Check if given CPUID extended toplogy "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
> unsigned int eax, ebx, ecx, edx;
>
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> + if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> return -1;
>
> + return 0;
> +}
> +/*
> + * Return best CPUID Extended Toplogy Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> + if (c->cpuid_level >= 0x1f)
> + if (check_extended_topology_leaf(0x1f) == 0)
> + return 0x1f;

Coding-style requires { } on the outer if-stmt.

>
> + if (c->cpuid_level >= 0xb)
> + if (check_extended_topology_leaf(0xb) == 0)
> + return 0xb;

idem.

> + return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> + unsigned int eax, ebx, ecx, edx;
> + int leaf;
> +
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)
> return -1;
>
> set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> /*
> * initial apic id, which also represents 32-bit extended x2apic id.
> */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
> }
>
> /*
> + * Check for extended topology enumeration cpuid leaf, and if it
> * exists, use it for populating initial_apicid and cpu topology
> * detection.
> */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
> {
> #ifdef CONFIG_SMP
> unsigned int eax, ebx, ecx, edx, sub_index;
> + unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
> unsigned int core_select_mask, core_level_siblings;
> + unsigned int die_select_mask, die_level_siblings;
> + int leaf;
>
> + leaf = detect_extended_topology_leaf(c);
> + if (leaf < 0)

s/ / /

> return -1;
>
> /*
> * Populate HT related information from sub-leaf level 0.
> */
> + cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> + c->initial_apicid = edx;
> core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
> core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>
> sub_index = 1;
> do {
> + cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>
> /*
> * Check for the Core type in the implemented sub leaves.
> */
> if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
> core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_level_siblings = core_level_siblings;
> core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> - break;
> + }
> + if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> + die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> + die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> }
>
> sub_index++;
> } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);

Personally I much prefer union-based decoding of cpuid leaves over this
macro magic (git grep cpuid10_):

union cpuid1f_eax {
struct {
unsigned int x2apic_shift : 5;
} split;
unsigned int full;
};

union cpuid1f_ebx {
struct {
unsigned int nr_cpus : 16;
} split;
unsigned int full;
};

enum level_type_enum {
invalid_type = 0,
smt_type,
core_type,
module_type,
tile_type,
die_type,
};

union cpuid1f_ecx {
struct {
unsigned int subleaf : 8;
unsigned int level_type : 8;
} split;
unsigned int full;
};

> core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> + die_select_mask = (~(-1 << die_plus_mask_width)) >>
> + core_plus_mask_width;
> +
> + c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> + ht_mask_width) & core_select_mask;
> + c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> + core_plus_mask_width) & die_select_mask;
> + c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> + die_plus_mask_width);
> /*
> * Reinit the apicid, now that we have extended initial_apicid.
> */
> c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>
> c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> + c->x86_max_dies = (die_level_siblings / core_level_siblings);
> #endif
> return 0;
> }
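For reference, the mask arithmetic in the hunk above decomposes the x2APIC ID into contiguous bit fields, one per topology level. A stand-alone sketch with made-up widths (1 SMT bit, then 3 core bits, then 1 die bit; struct and function names are illustrative, not kernel code) shows the intent:

```c
#include <assert.h>

/* Shift widths as parsed from CPUID.1F sub-leaves (made-up values here). */
struct topo_widths {
	unsigned int ht_mask_width;        /* bits covering the SMT level        */
	unsigned int core_plus_mask_width; /* bits covering SMT + core levels    */
	unsigned int die_plus_mask_width;  /* bits covering SMT + core + die     */
};

/* Core ID: shift out the SMT bits, mask off everything above the core field. */
static unsigned int core_id(unsigned int apicid, const struct topo_widths *w)
{
	unsigned int mask = (~(-1U << w->core_plus_mask_width)) >> w->ht_mask_width;

	return (apicid >> w->ht_mask_width) & mask;
}

/* Die ID: shift out SMT + core bits, mask off everything above the die field. */
static unsigned int die_id(unsigned int apicid, const struct topo_widths *w)
{
	unsigned int mask = (~(-1U << w->die_plus_mask_width)) >> w->core_plus_mask_width;

	return (apicid >> w->core_plus_mask_width) & mask;
}

/* Package ID: everything above the die field. */
static unsigned int pkg_id(unsigned int apicid, const struct topo_widths *w)
{
	return apicid >> w->die_plus_mask_width;
}
```

For example, APIC ID 0x35 (binary 110101) with widths {1, 4, 5} decodes as SMT 1, core 2, die 1, package 1 -- each field is just a slice of the APIC ID.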


> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> */
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> - if (c->phys_proc_id == o->phys_proc_id)
> + if (c->cpu_die_id == o->cpu_die_id)
> return true;
> return false;
> }

You haven't explained, and I can't readily find it in the SDM either,
how these topology bits relate to caches and interconnects.

Will these die thingies share LLC, or will LLC be per die. Will NUMA
span dies or not.



2019-02-20 10:59:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings

On Tue, Feb 19, 2019 at 02:33:59PM -0500, Liang, Kan wrote:
> Besides the generic document, I think we should update x86 document as well,
> which is in Documentation/x86/topology.txt.
>
> The definition of topology_core_cpumask has to be changed to per die, right?

That's what the change to match_die() did. It limits the core mask.

2019-02-20 11:02:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

On Mon, Feb 18, 2019 at 10:40:10PM -0500, Len Brown wrote:
> From: Zhang Rui <[email protected]>
>
> On the new dual-die/package systems, the RAPL MSR becomes die-scope.
> Thus instead of one powercap device per physical package, now there
> should be one powercap device for each unique die on these systems.
>
> This patch introduces intel_rapl driver support for new
> dual-die/package systems.
>
> On hardware that does not have multiple dies, topology_unique_die_id()
> equals topology_physical_package_id(), thus there is no functional change.
>
> Signed-off-by: Zhang Rui <[email protected]>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>
> ---
> drivers/powercap/intel_rapl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c
> index 6057d9695fed..e004707283da 100644
> --- a/drivers/powercap/intel_rapl.c
> +++ b/drivers/powercap/intel_rapl.c
> @@ -266,7 +266,7 @@ static struct rapl_domain *platform_rapl_domain; /* Platform (PSys) domain */
> /* caller to ensure CPU hotplug lock is held */
> static struct rapl_package *rapl_find_package(int cpu)
> {
> - int id = topology_physical_package_id(cpu);
> + int id = topology_unique_die_id(cpu);
> struct rapl_package *rp;
>
> list_for_each_entry(rp, &rapl_packages, plist) {
> @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
> /* called from CPU hotplug notifier, hotplug lock held */
> static struct rapl_package *rapl_add_package(int cpu)
> {
> - int id = topology_physical_package_id(cpu);
> + int id = topology_unique_die_id(cpu);
> struct rapl_package *rp;
> int ret;

And now your new function names are misnomers.

2019-02-20 15:11:22

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

Thanks for the comments, Peter. I'll update the patch to address the
syntax points. (Maybe checkpatch.pl should be updated to reflect your
preferences?).

About macros vs C. I agree with your preference.
I used macros to be consistent with the existing code, and to be as
backport friendly as possible.
(a number of distros need to pull these patches into their supported kernels)
Sure, I'm willing to write in a cosmetic-only patch, after the
functional changes are upstream.

> It would've been nice to have the CPUID instruction 1F leaf reference
> 3B-3.9 in the SDM, and maybe mention this here too.

I didn't mention SDM sections because they change -- leaving stale
pointers in our commit messages. The SDM is re-published 4 times per
year.

> Also, figure 8-6 uses Q,R ID, without prior mention.

Yeah, the tech-writer took my real example and turned it into a
generic example. Probably a good idea, even if not gracefully
executed. The point of the undefined "Q" and "R" is that a new level can
be invented at any time in the future, and existing code that
doesn't know about future levels should still function properly.

The back-story is that Leaf-B was supposed to work this way, but the
original SDM code example was hard-coded to assume 3-levels. Plenty
of software was hard-coded and would have broken if we had actually
added new levels to Leaf-B. And so Leaf-B had to be "forked" into
Leaf-1F, with the implicit contract that only new code that can
function properly in the face of unknown levels parses leaf-1F.

Yes, the parsing routine in the Linux Kernel will work fine in the
face of future unknown levels. Some utilities (such as cpuid), would
have actually crashed if we added levels to Leaf-B.

> You haven't explained, and I can't readily find it in the SDM either,
> how these topology bits relate to caches and interconnects.
>
> Will these die thingies share LLC, or will LLC be per die. Will NUMA
> span dies or not.

Excellent question.
Cache enumeration in Leaf-4 is totally unchanged.
ACPI NUMA tables are totally unchanged.

From a scheduler point of view, imagine that an SKX system with 4 die
in 4 packages was mechanically re-designed so that those 4 die resided
in 2 double-sized packages.

They may have tweaked the links between the die, but logically it is
identical and compatible, and the legacy kernel will function properly.

So the effect of Leaf B/1F is that it defines the scope of MSRs,
e.g. which processors a die-scope MSR covers.
That is why the rest of the patch is about sysfs topology, and about
package MSR scope.

Yes, there will be more exotic MSR situations in future products --
the first ones are pretty simple: something called a package-scope MSR
in the SDM today becomes a die-scope MSR in this generation
on a multi-die/package system.

It also reflects how many packages appear in sysfs, and this can
affect licensing of some kinds of software.

thanks,
Len Brown, Intel Open Source Technology Center

2019-02-20 21:53:43

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings

Le 19/02/2019 à 04:40, Len Brown a écrit :
> From: Len Brown <[email protected]>
>
> like core_siblings, except it shows which die are in the same package.
>
> This is needed for lscpu(1) to correctly display die topology.
>
> Signed-off-by: Len Brown <[email protected]>
> Cc: [email protected]
> Signed-off-by: Len Brown <[email protected]>
> ---
> Documentation/cputopology.txt | 10 ++++++++++
> arch/x86/include/asm/smp.h | 1 +
> arch/x86/include/asm/topology.h | 1 +
> arch/x86/kernel/smpboot.c | 20 ++++++++++++++++++++
> arch/x86/xen/smp_pv.c | 1 +
> drivers/base/topology.c | 6 ++++++
> include/linux/topology.h | 3 +++
> 7 files changed, 42 insertions(+)
>
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 287213b4517b..7dd2ae3df233 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -56,6 +56,16 @@ core_siblings_list:
> human-readable list of cpuX's hardware threads within the same
> die_id.
>
> +die_siblings:
> +
> + internal kernel map of cpuX's hardware threads within the same
> + physical_package_id.
> +
> +die_siblings_list:
> +
> + human-readable list of cpuX's hardware threads within the same
> + physical_package_id.
> +


Hello Len,

Patches #4 and #5 change the meaning of core_siblings (in the
past, it always returned all threads in the entire package). All
existing user-space tools will see each die as a separate package until
they are updated to read die_siblings too. It only matters for multi-die
CPUs when running a recent kernel with an old userspace tool, but it may
still be considered a sysfs ABI change.

Worse, things will break again if you ever add tile_siblings for
CPUID.1F "Tiles". User-space will suddenly see 2 dies of 2 cores instead
of 1 die of 2 tiles of 2 cores.

I understand that this isn't easy to fix. But I want to make sure people
are aware of the meaning of this change.

The proper way to avoid this is to stop having a file named foo_siblings
refer to "the container of foo" rather than to "foo itself" (because that
container changes when you add intermediate levels). Rename the sysfs
files as below, and you don't get any breakage when adding
intermediate levels:

thread_siblings -> core_threads (can we do sysfs alias or symlink to
keep the old name?)

core_siblings -> die_threads

die_siblings -> package_threads (needs an alias too)

The documentation would also be much easier to read since "die_threads"
is obviously "human-readable list of cpuX's hardware threads within the
same die_id". And no need to modify the doc anymore when adding levels :)

Brice



2019-02-21 05:45:02

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

On Wed, Feb 20, 2019 at 6:02 AM Peter Zijlstra <[email protected]> wrote:

> > list_for_each_entry(rp, &rapl_packages, plist) {
> > @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
> > /* called from CPU hotplug notifier, hotplug lock held */
> > static struct rapl_package *rapl_add_package(int cpu)
> > {
> > - int id = topology_physical_package_id(cpu);
> > + int id = topology_unique_die_id(cpu);
> > struct rapl_package *rp;
> > int ret;
>
> And now your new function names are misnomers.

That is fair.

Seems that a subsequent re-name-only patch is appropriate.

Len Brown, Intel Open Source Technology Center

2019-02-21 07:42:19

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings

Hi Brice,
Thank you for your suggestions!

> Patches #4 and #5 change the meaning of core_siblings (in the
> past, it always returned all threads in the entire package). All
> existing user-space tools will see each die as a separate package until
> they are updated to read die_siblings too. It only matters for multi-die
> CPUs when running a recent kernel with an old userspace tool, but it may
> still be considered a sysfs ABI change.

I agree.

Exhibit 1 is the "lscpu" program.

> Worse, things will break again if you ever add tile_siblings for
> CPUID.1F "Tiles". User-space will suddenly see 2 dies of 2 cores instead
> of 1 die of 2 tiles of 2 cores.

Agreed, the existing naming scheme is not resilient to future additions.

> I understand that this isn't easy to fix. But I want to make sure people
> are aware of the meaning of this change.

Here is my list of applications that care about the new CPUID leaf
and the concepts of packages and die:

cpuid
lscpu
x86_energy_perf_policy
turbostat

> The proper way to avoid this is to stop having file foo_siblings refer
> to "the container of foo" instead of "foo itself" (because that
> container changes when you add intermediate levels). Rename sysfs files
> like below, and you don't get any breakage anymore when adding
> intermediate levels:
>
> thread_siblings -> core_threads (can we do sysfs alias or symlink to
> keep the old name?)
>
> core_siblings -> die_threads
>
> die_siblings -> package_threads (needs an alias too)
>
> The documentation would also be much easier to read since "die_threads"
> is obviously "human-readable list of cpuX's hardware threads within the
> same die_id". And no need to modify the doc anymore when adding levels :)

I like your idea!

Hm, I think I'd skip creating "die_siblings", as it adds to the
fragile legacy naming scheme that we want to deprecate.

And although it is ill-defined and has a misleading name, I now think
it would be better to leave "core_siblings" as defined -- a legacy
synonym for "package_threads". Deprecate it, but keep its original
definition until it is removed.

Updated applications would use:

core_threads
die_threads
package_threads

and they'll be future proof if/when we add any new levels.

the legacy thread_siblings and core_siblings will stick around as aliases:

core_threads (thread_siblings)
die_threads
package_threads (core_siblings)

thanks!
Len Brown, Intel Open Source Technology Center

2019-02-21 08:41:19

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH 05/11] x86 topology: export die_siblings


Le 21/02/2019 à 08:41, Len Brown a écrit :
>
> Here is my list of applications that care about the new CPUID leaf
> and the concepts of packages and die:
>
> cpuid
> lscpu
> x86_energy_perf_policy
> turbostat


You may add hwloc/lstopo which is used by most HPC runtimes (including
your employers' OpenMP and MPI commercial products :))

Brice



2019-02-24 10:06:08

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

Le 19/02/2019 à 04:40, Len Brown a écrit :
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>
> if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
> if (c->cpu_core_id == o->cpu_core_id)
> return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> }
>
> } else if (c->phys_proc_id == o->phys_proc_id &&
> + c->cpu_die_id == o->cpu_die_id &&
> c->cpu_core_id == o->cpu_core_id) {
> return topology_sane(c, o, "smt");
> }
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> */
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> - if (c->phys_proc_id == o->phys_proc_id)
> + if (c->cpu_die_id == o->cpu_die_id)
> return true;
> return false;
> }

Hello Len,

I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is CC'ed),
booted to get 2 packages with 1 NUMA node each and 2 dies each:

-smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 -numa node,cpus=8-15,nodeid=1

sysfs files expose wrong information:
core_siblings contains threads of the local die AND of the die with the
same number in other packages, e.g. cpus 0-3 and 8-11 instead of 0-3 only.

The issue is that you seem to assume that die ids will always be unique across multiple packages.
Is this a valid assumption? If so, QEMU patches should be fixed.
If not, I fixed the issue by changing match_die() to check both phys_proc_id and cpu_die_id:

static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == o->cpu_die_id)
return true;
return false;
}

Thanks
Brice
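The ambiguity Brice describes can be sketched with a toy model of that QEMU topology (2 sockets x 2 dies x 2 cores x 2 threads). The layout and names below are illustrative only, not kernel code; the point is that die IDs restart at 0 in each package, so cpu_die_id alone does not identify a die system-wide:

```c
#include <assert.h>

/* Minimal stand-in for the two topology fields match_die() compares. */
struct cpu_ids {
	unsigned int phys_proc_id; /* package ID               */
	unsigned int cpu_die_id;   /* die ID, local to package */
};

/* Map a CPU index to its IDs for 2 sockets x 2 dies x 2 cores x 2 threads
 * (8 hardware threads per package, 4 per die). */
static struct cpu_ids topo(unsigned int cpu_index)
{
	struct cpu_ids c;

	c.phys_proc_id = cpu_index / 8;       /* which package      */
	c.cpu_die_id   = (cpu_index / 4) % 2; /* which die, per-pkg */
	return c;
}

/* match_die() as fixed by Brice: compare package AND die ID. */
static int match_die(struct cpu_ids a, struct cpu_ids b)
{
	return a.phys_proc_id == b.phys_proc_id &&
	       a.cpu_die_id == b.cpu_die_id;
}
```

Here cpu 0 (package 0, die 0) and cpu 8 (package 1, die 0) share a cpu_die_id, so the original single-field comparison would wrongly group them; the two-field check keeps them apart while still matching cpus within the same die.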



2019-02-25 05:32:24

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On 2019/2/24 18:04, Brice Goglin wrote:
> Le 19/02/2019 à 04:40, Len Brown a écrit :
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index ccd1f2a8e557..4250a87f57db 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>>
>> if (c->phys_proc_id == o->phys_proc_id &&
>> + c->cpu_die_id == o->cpu_die_id &&
>> per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>> if (c->cpu_core_id == o->cpu_core_id)
>> return topology_sane(c, o, "smt");
>> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> }
>>
>> } else if (c->phys_proc_id == o->phys_proc_id &&
>> + c->cpu_die_id == o->cpu_die_id &&
>> c->cpu_core_id == o->cpu_core_id) {
>> return topology_sane(c, o, "smt");
>> }
>> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> */
>> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>> {
>> - if (c->phys_proc_id == o->phys_proc_id)
>> + if (c->cpu_die_id == o->cpu_die_id)
>> return true;
>> return false;
>> }
>
> Hello Len,
>
> I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is CC'ed),
> booted to get 2 packages with 1 NUMA node each and 2 dies each:
>
> -smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 -numa node,cpus=8-15,nodeid=1
>
> sysfs files expose wrong information:
> core_siblings contains threads of the local die AND of die with same number in other processors,
> eg cpu 0-3 and 8-11 instead of 0-3 only.
>
> The issue is that you seem to assume that die ids will always be unique across multiple packages.

I don't think the Intel MCP topology has this global uniqueness
assumption for die_id, according to the CPUID.1F section of the SDM.

> Is this a valid assumption? If so, QEMU patches should be fixed.
> If not, I fixed the issue by changing match_die() to check both phys_proc_id and cpu_die_id:
>
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == o->cpu_die_id)
> return true;
> return false;
> }
>
> Thanks
> Brice
>
>
>


2019-02-25 08:09:03

by Brown, Len

[permalink] [raw]
Subject: RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

Hi Brice,

Yes, you re-discovered the bug that Kan Liang pointed out last week.

I have updated this patch set, and the latest for testing is in my git tree on kernel.org or

https://github.com/lenb/linux.git x86

Note that I took your advice and left core_siblings with its original definition,
and created package_threads as a synonym. I will e-mail out the patch set again
when I do 2 more things:

1. add core_threads map to sysfs
2. replace unique_die_id with logical_die_id -- turns out it is useful for same reason as logical_package_id.

Thanks,
-Len





2019-02-26 04:43:10

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

On Thu, Feb 21, 2019 at 12:44 AM Len Brown <[email protected]> wrote:
>
> On Wed, Feb 20, 2019 at 6:02 AM Peter Zijlstra <[email protected]> wrote:
>
> > > list_for_each_entry(rp, &rapl_packages, plist) {
> > > @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
> > > /* called from CPU hotplug notifier, hotplug lock held */
> > > static struct rapl_package *rapl_add_package(int cpu)
> > > {
> > > - int id = topology_physical_package_id(cpu);
> > > + int id = topology_unique_die_id(cpu);
> > > struct rapl_package *rp;
> > > int ret;
> >
> > And now your new function names are misnomers.
>
> That is fair.
>
> Seems that a subsequent re-name-only patch is appropriate.

I'm not sure that re-naming these functions is a good idea.

Fundamentally, the reason stems from the SDM being inconsistent.
And the reason that the SDM is inconsistent is compatibility.

ie. the PACKAGE MSRs in the SDM are still called PACKAGE MSRs,
even though on a multi-die system, they are DIE scoped.
There is no plan to re-name all of those MSRs.

And so what do you call a routine that parses a PACKAGE_RAPL domain?
Well, it is still called PACKAGE MSR, even though the code is smart enough
to know that on a multi-die system, its scope is die-scoped, not package-scoped.

And yes, just to confuse things, there WILL be PACKAGE scope MSRs
in the future that span multiple die on multi-die systems. No, it will not
be a surprise when they appear -- by definition, they will be different
and incompatible with previous PACKAGE MSRs. We will need to update
some software to be smart about handling them -- no blind assumptions
on using the word "package" in this context.

So unless Rui disagrees, I'm inclined to leave these routine names alone.

thanks,
Len Brown, Intel Open Source Technology Center

2019-02-26 06:56:17

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

On Mon, 2019-02-25 at 23:41 -0500, Len Brown wrote:
> On Thu, Feb 21, 2019 at 12:44 AM Len Brown <[email protected]> wrote:
> >
> >
> > On Wed, Feb 20, 2019 at 6:02 AM Peter Zijlstra <[email protected]> wrote:
> >
> > >
> > > >
> > > >       list_for_each_entry(rp, &rapl_packages, plist) {
> > > > @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct
> > > > rapl_package *rp)
> > > >  /* called from CPU hotplug notifier, hotplug lock held */
> > > >  static struct rapl_package *rapl_add_package(int cpu)
> > > >  {
> > > > -     int id = topology_physical_package_id(cpu);
> > > > +     int id = topology_unique_die_id(cpu);
> > > >       struct rapl_package *rp;
> > > >       int ret;
> > > And now your new function names are misnomers.
> > That is fair.
> >
> > Seems that a subsequent re-name-only patch is appropriate.
> I'm not sure that re-naming these functions is a good idea.
>
> Fundamentally, the reason stems from the SDM being in-consistent.
> And the reason that the SDM is inconsistent is for compatibility.
>
> ie. the PACKAGE MSRs in the SDM are still called PACKAGE MSRs,
> even though on a multi-die system, they are DIE scoped.
> There is no plan to re-name all of those MSRs.
>
> And so what do you call a routine that parses a PACKAGE_RAPL domain?
> Well, it is still called PACKAGE MSR, even though the code is smart
> enough
> to know that on a multi-die system, its scope is die-scoped, not
> package-scoped.
>
Agreed.

rapl_add_package() actually adds a package RAPL domain, and "package
RAPL domain" comes from the SDM, where it is used to describe the RAPL
domain that uses the package MSRs.

IMO, we can keep using "package RAPL domain" as the name for this
kind of RAPL domain, but just stop equating it with the CPU
physical package.
Actually, my next patch fixes the places that had this assumption.
In short, "package domain foo" is okay, but "domain for package X"
should be avoided.

thanks,
rui

> And yes, just to confuse things, there WILL be PACKAGE scope MSRs
> in the future that span multiple die on multi-die systems.  No, it
> will not
> be a surprise when they appear -- by definition, they will be
> different
> and incompatible with previous PACKAGE MSRs.  We will need to update
> some software to be smart about handling them -- no blind assumptions
> on using the word "package" in this context.
>
> So unless Rui disagrees, I'm inclined to leave these routine names
> alone.
>
> thanks,
> Len Brown, Intel Open Source Technology Center

2019-02-26 13:56:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On Wed, Feb 20, 2019 at 10:08:48AM -0500, Len Brown wrote:
> Thanks for the comments, Peter. I'll update the patch to address the
> syntax points. (Maybe checkpatch.pl should be updated to reflect your
> preferences?).

Don't know about checkpatch; I ignore plenty of its output. I think tglx
started a document somewhere for what tip prefers, but I'm not sure
where that went.

> About macros vs C. I agree with your preference.
> I used macros to be consistent with the existing code, and to be as
> backport friendly as possible.
> (a number of distros need to pull these patches into their supported kernels)
> Sure, I'm willing to write in a cosmetic-only patch, after the
> functional changes are upstream.

Fair enough.

> > It would've been nice to have the CPUID instruction 1F leaf reference
> > 3B-3.9 in the SDM, and maybe mention this here too.
>
> I didn't mention SDM sections because they change -- leaving stale
> pointers in our commit messages. The SDM is re-published 4 times per
> year.

Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
8 of Jul'17, I can find it :-)

> > You haven't explained, and I can't readily find it in the SDM either,
> > how these topology bits relate to caches and interconnects.
> >
> > Will these die thingies share LLC, or will LLC be per die. Will NUMA
> > span dies or not.
>
> Excellent question.
> Cache enumeration in Leaf-4 is totally unchanged.
> ACPI NUMA tables are totally unchanged.

Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
trying to get a feel for how these levels will interact with all that.

Before that SNC stuff, caches had never spanned NODEs (and I still
think that is 'creative' at best).

> From a scheduler point of view, imagine that a SKX system with 4 die
> in 4 packages was mechanically re-designed so that those 4 die resided
> in 2 double-sized packages.
>
> They may have tweaked the links between the die, but logically it is
> identical and compatible, and the legacy kernel will function
> properly.

This example has LLC in die and yes that works.

But I can imagine things like L2 in tile and L3 across tiles but within
DIE and then it _might_ make sense to still consider the tile for
scheduling.

Another option is having the LLC off die; also not unheard of.

And then there's many creative and slightly crazy ways this can all be
combined :/

> So the effect of Leaf B,1F is that it defines the scope of MSRs. eg.
> what processors does a die-scope MSR cover. That is why the rest of
> the patch is about sysfs topology, and about package MSR scope.
>
> Yes, there will be more exotic MSR situations in future products --
> the first ones are pretty simple -- something called a
> package-scope-MSR in the SDM today becomes a die-scope-MSR in this
> generation on a multi-die/package system.

Yes :-(

> It also reflects how many packages appear in sysfs, and this can
> affect licensing of some kinds of software.

That's just plain insanity and we should not let that affect our sysfs
interfaces.

2019-02-28 16:00:20

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On Tue, Feb 26, 2019 at 8:54 AM Peter Zijlstra <[email protected]> wrote:

> > > It would've been nice to have the CPUID instruction 1F leaf reference
> > > 3B-3.9 in the SDM, and maybe mention this here too.
> >
> > I didn't mention SDM sections because they change -- leaving stale
> > pointers in our commit messages. The SDM is re-published 4 times per
> > year.
>
> Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
> 8 of Jul'17, I can find it :-)

The SDM is like software -- usually (but not always) you are better
off with the latest version:-)

> > Cache enumeration in Leaf-4 is totally unchanged.
> > ACPI NUMA tables are totally unchanged.
>
> Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
> trying to get a feel for how these levels will interact with all that.
>
> Before that SNC stuff, caches had never spanned NODEs (and I still
> think that is 'creative' at best).

Yeah, SNC is sort of a curve ball. I guess it made enough stuff run better that
it is available as an option. But it doesn't help everything, so it is disabled
by default...

I think from a scheduler point of view, sticking with the output of
CPUID.4 for the cache topology, and the ACPI tables for the
node topology/distances, is the right strategy.

> > From a scheduler point of view, imagine that a SKX system with 4 die
> > in 4 packages was mechanically re-designed so that those 4 die resided
> > in 2 double-sized packages.
> >
> > They may have tweaked the links between the die, but logically it is
> > identical and compatible, and the legacy kernel will function
> > properly.
>
> This example has LLC in die and yes that works.
>
> But I can imagine things like L2 in tile and L3 across tiles but within
> DIE and then it _might_ make sense to still consider the tile for
> scheduling.
>
> Another option is having the LLC off die; also not unheard of.
>
> And then there's many creative and slightly crazy ways this can all be
> combined :/

If any of those crazy things happen, CPUID.B/CPUID.1F are not
going to help software understand it -- CPUID.4 and the NUMA tables
are the tool of choice.

> > So the effect of Leaf B,1F is that it defines the scope of MSRs. eg.
> > what processors does a die-scope MSR cover. That is why the rest of
> > the patch is about sysfs topology, and about package MSR scope.
> >
> > Yes, there will be more exotic MSR situations in future products --
> > the first ones are pretty simple -- something called a
> > package-scope-MSR in the SDM today becomes a die-scope-MSR in this
> > generation on a multi-die/package system.
>
> Yes :-(
>
> > It also reflects how many packages appear in sysfs, and this can
> > affect licensing of some kinds of software.
>
> That's just plain insanity and we should not let that affect our sysfs
> interfaces.

This change isn't made for compatibility with per-package licensing.
Indeed, vendors who license based on package count need to
be made aware that on a system with multi-die packages, they'll
see their package count go _down_ as a result of this change.
Thankfully, I'm told that per-package licensing is quite rare --
most stuff that cares has moved to per-CPU.

I think a good semantic side effect of this series is that it
maintains the invariant that a physical package and a socket are synonymous.
While we don't use the word "socket" in Linux anymore, the industry
broadly assumes that the two are synonyms. And people expect that a
physical package really is a physical package -- you can see it,
buy it in a box, and hold it in your hand.

Functionally, the bottom line is that it allows software to discover
topology levels that previously had to be inferred from family/model,
which was sort of annoying. The things that care are the things that
care about MSR scope, and thankfully that list is quite finite.

thanks,
-Len




--
Len Brown, Intel Open Source Technology Center

2019-02-28 18:39:28

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

On Thu, Feb 28, 2019 at 10:59:19AM -0500, Len Brown wrote:

> The SDM is like software -- usually (but not always) you are better
> off with the latest version:-)

If only it existed as a .tex file in a git repo. It has been very useful
to have older versions around a number of times.