2021-06-24 01:38:18

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 0/7] perf: Add Sapphire Rapids server uncore support

From: Kan Liang <[email protected]>

Intel Sapphire Rapids supports a discovery mechanism, that allows an
uncore driver to discover the different components ("boxes") of the
chip.

All the generic information of the uncore boxes should be retrieved from
the discovery tables. This has been enabled with the commit edae1f06c2cd
("perf/x86/intel/uncore: Parse uncore discovery tables"). The uncore
driver doesn't need to hard code the generic information for each uncore
box. But we still need to enable various functionality that cannot be
directly discovered. This is done in the patchset.

Without this platform-specific enabling patch set, perf uses a type ID
plus a box ID, e.g., uncore_type_0_0 to name an uncore PMU. With the
patch set, perf has the mapping information from a type ID to a specific
uncore unit. Just like the previous platforms, the uncore PMU can be
named by the real PMU name, e.g., uncore_cha_0. To make the name
backward-compatible, a symlink is created from the new to the old name.
So the user scripts which use the old name still work.

The uncore spec of Sapphire Rapids server can be found at
https://cdrdv2.intel.com/v1/dl/getContent/642245

Kan Liang (7):
driver core: Add a way to get to bus devices kset
perf: Create a symlink for a PMU
perf/x86/intel/uncore: Create a symlink for an uncore PMU
perf/x86/intel/uncore: Add Sapphire Rapids server support
perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
perf/x86/intel/uncore: Support free-running counters on Sapphire
Rapids server
perf/x86/intel/uncore: Fix invalid unit check

arch/x86/events/intel/uncore.c | 44 ++-
arch/x86/events/intel/uncore.h | 5 +
arch/x86/events/intel/uncore_discovery.c | 42 +--
arch/x86/events/intel/uncore_discovery.h | 23 +-
arch/x86/events/intel/uncore_snbep.c | 508 ++++++++++++++++++++++++++++++-
drivers/base/bus.c | 6 +
include/linux/device/bus.h | 1 +
include/linux/perf_event.h | 1 +
kernel/events/core.c | 19 ++
9 files changed, 607 insertions(+), 42 deletions(-)

--
2.7.4


2021-06-24 01:38:20

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 1/7] driver core: Add a way to get to bus devices kset

From: Kan Liang <[email protected]>

Add an accessor function to get to the bus devices kset associated with
a struct bus_type.

The user of this is the following perf changes, which will need to get
to the kobj of the 'devices' directory of a certain bus.

Reviewed-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
drivers/base/bus.c | 6 ++++++
include/linux/device/bus.h | 1 +
2 files changed, 7 insertions(+)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 36d0c65..3d621a8 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -899,6 +899,12 @@ struct kset *bus_get_kset(struct bus_type *bus)
}
EXPORT_SYMBOL_GPL(bus_get_kset);

+struct kset *bus_get_devices_kset(struct bus_type *bus)
+{
+ return bus->p->devices_kset;
+}
+EXPORT_SYMBOL_GPL(bus_get_devices_kset);
+
struct klist *bus_get_device_klist(struct bus_type *bus)
{
return &bus->p->klist_devices;
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index 1ea5e1d..0ab9273 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -283,6 +283,7 @@ extern int bus_unregister_notifier(struct bus_type *bus,
#define BUS_NOTIFY_DRIVER_NOT_BOUND 0x00000008 /* driver fails to be bound */

extern struct kset *bus_get_kset(struct bus_type *bus);
+extern struct kset *bus_get_devices_kset(struct bus_type *bus);
extern struct klist *bus_get_device_klist(struct bus_type *bus);

#endif
--
2.7.4

2021-06-24 01:38:26

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 2/7] perf: Create a symlink for a PMU

From: Kan Liang <[email protected]>

A perf PMU may have two PMU names. For example, Intel Sapphire Rapids
server supports the discovery mechanism. Without the platform-specific
support, an uncore PMU is named by a type ID plus a box ID, e.g.,
uncore_type_0_0, because the real name of the uncore PMU cannot be
retrieved from the discovery table. With the platform-specific support
later, perf has the mapping information from a type ID to a specific
uncore unit. Just like the previous platforms, the uncore PMU will be
named by the real PMU name, e.g., uncore_cha_0. The user scripts which
work well with the old name may not work anymore. To avoid the issue, a
symlink should be created from the new to the old name.

The perf PMU devices are created under /sys/bus/event_source/devices/.
The symlink should be created in the same directory to facilitate the
perf tool.

Add a new variable, link_name, to store the new name of a PMU. Link it
to the old name if applies.

Signed-off-by: Kan Liang <[email protected]>
---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f..c8a3388 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -284,6 +284,7 @@ struct pmu {
const struct attribute_group **attr_groups;
const struct attribute_group **attr_update;
const char *name;
+ const char *link_name;
int type;

/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0cc8e5..b6024f6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10952,6 +10952,16 @@ static struct bus_type pmu_bus = {

static void pmu_dev_release(struct device *dev)
{
+ struct pmu *pmu = dev_get_drvdata(dev);
+
+ if (pmu && pmu->link_name) {
+ struct kset *devices_kset;
+
+ devices_kset = bus_get_devices_kset(pmu->dev->bus);
+ if (devices_kset)
+ sysfs_remove_link(&devices_kset->kobj, pmu->link_name);
+ }
+
kfree(dev);
}

@@ -10989,6 +10999,15 @@ static int pmu_dev_alloc(struct pmu *pmu)
if (ret)
goto del_dev;

+ if (pmu->link_name) {
+ struct kset *devices_kset;
+
+ devices_kset = bus_get_devices_kset(pmu->dev->bus);
+ if (devices_kset)
+ ret = sysfs_create_link(&devices_kset->kobj,
+ &pmu->dev->kobj,
+ pmu->link_name);
+ }
out:
return ret;

--
2.7.4

2021-06-24 01:39:05

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU

From: Kan Liang <[email protected]>

The platform specific support for Sapphire Rapids will apply a
meaningful name for each uncore PMU. The script which works well with
the old name may not work anymore because of the name change. To avoid
the issue, a symlink should be created from the new name to the old
name.

Add an variable link_name to store the new name.

The rule to name a new meaningful uncore name is the same as the
previous platforms. Factor out __uncore_get_pmu_name().

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/uncore.c | 28 ++++++++++++++++++++--------
arch/x86/events/intel/uncore.h | 2 ++
2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 9bf4dbb..04e5d37 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -842,6 +842,18 @@ static const struct attribute_group uncore_pmu_attr_group = {
.attrs = uncore_pmu_attrs,
};

+static void __uncore_get_pmu_name(char *pmu_name, const char *type_name,
+ int num_boxes, int idx)
+{
+ if (num_boxes == 1) {
+ if (strlen(type_name) > 0)
+ sprintf(pmu_name, "uncore_%s", type_name);
+ else
+ sprintf(pmu_name, "uncore");
+ } else
+ sprintf(pmu_name, "uncore_%s_%d", type_name, idx);
+}
+
static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
{
struct intel_uncore_type *type = pmu->type;
@@ -857,17 +869,17 @@ static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
sprintf(pmu->name, "uncore_type_%u_%d",
type->type_id, type->box_ids[pmu->pmu_idx]);
}
+
+ if (type->link_name) {
+ __uncore_get_pmu_name(pmu->link_name, type->link_name,
+ type->num_boxes, type->box_ids[pmu->pmu_idx]);
+ pmu->pmu.link_name = pmu->link_name;
+ }
return;
}

- if (type->num_boxes == 1) {
- if (strlen(type->name) > 0)
- sprintf(pmu->name, "uncore_%s", type->name);
- else
- sprintf(pmu->name, "uncore");
- } else
- sprintf(pmu->name, "uncore_%s_%d", type->name, pmu->pmu_idx);
-
+ __uncore_get_pmu_name(pmu->name, type->name,
+ type->num_boxes, pmu->pmu_idx);
}

static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 187d728..2fc8565 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -46,6 +46,7 @@ struct intel_uncore_topology;

struct intel_uncore_type {
const char *name;
+ const char *link_name;
int num_counters;
int num_boxes;
int perf_ctr_bits;
@@ -118,6 +119,7 @@ struct intel_uncore_ops {
struct intel_uncore_pmu {
struct pmu pmu;
char name[UNCORE_PMU_NAME_LEN];
+ char link_name[UNCORE_PMU_NAME_LEN];
int pmu_idx;
int func_id;
bool registered;
--
2.7.4

2021-06-24 01:40:03

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 6/7] perf/x86/intel/uncore: Support free-running counters on Sapphire Rapids server

From: Kan Liang <[email protected]>

Several free-running counters for IMC and IIO uncore blocks are
supported on Sapphire Rapids server.

They are not enumerated in the discovery tables. Extend
generic_init_uncores() to support extra uncore types. The uncore types
for the free-running counters is inserted right after the uncore types
retrieved from the discovery table.

The number of the free-running counter boxes is calculated from the
number of corresponding standard boxes.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/uncore_discovery.c | 10 +-
arch/x86/events/intel/uncore_discovery.h | 2 +-
arch/x86/events/intel/uncore_snbep.c | 188 ++++++++++++++++++++++++++++++-
3 files changed, 189 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 2de5563..8e9f1df 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -569,7 +569,7 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
}

struct intel_uncore_type **
-intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id, int num_extra)
{
struct intel_uncore_discovery_type *type;
struct intel_uncore_type **uncores;
@@ -577,7 +577,7 @@ intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
struct rb_node *node;
int i = 0;

- uncores = kcalloc(num_discovered_types[type_id] + 1,
+ uncores = kcalloc(num_discovered_types[type_id] + num_extra + 1,
sizeof(struct intel_uncore_type *), GFP_KERNEL);
if (!uncores)
return empty_uncore;
@@ -606,17 +606,17 @@ intel_uncore_generic_init_uncores(enum uncore_access_type type_id)

void intel_uncore_generic_uncore_cpu_init(void)
{
- uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
+ uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR, 0);
}

int intel_uncore_generic_uncore_pci_init(void)
{
- uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI);
+ uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI, 0);

return 0;
}

void intel_uncore_generic_uncore_mmio_init(void)
{
- uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO);
+ uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO, 0);
}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index b85655b..7280c8a 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -149,4 +149,4 @@ u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
struct perf_event *event);

struct intel_uncore_type **
-intel_uncore_generic_init_uncores(enum uncore_access_type type_id);
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id, int num_extra);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 2696657..f530fc5 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5728,6 +5728,8 @@ static struct intel_uncore_type spr_uncore_mdf = {
};

#define UNCORE_SPR_NUM_UNCORE_TYPES 12
+#define UNCORE_SPR_IIO 1
+#define UNCORE_SPR_IMC 6

static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
&spr_uncore_chabox,
@@ -5744,6 +5746,145 @@ static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
&spr_uncore_mdf,
};

+enum perf_uncore_spr_iio_freerunning_type_id {
+ SPR_IIO_MSR_IOCLK,
+ SPR_IIO_MSR_BW_IN,
+ SPR_IIO_MSR_BW_OUT,
+
+ SPR_IIO_FREERUNNING_TYPE_MAX,
+};
+
+static struct freerunning_counters spr_iio_freerunning[] = {
+ [SPR_IIO_MSR_IOCLK] = { 0x340e, 0x1, 0x10, 1, 48 },
+ [SPR_IIO_MSR_BW_IN] = { 0x3800, 0x1, 0x10, 8, 48 },
+ [SPR_IIO_MSR_BW_OUT] = { 0x3808, 0x1, 0x10, 8, 48 },
+};
+
+static struct uncore_event_desc spr_uncore_iio_freerunning_events[] = {
+ /* Free-Running IIO CLOCKS Counter */
+ INTEL_UNCORE_EVENT_DESC(ioclk, "event=0xff,umask=0x10"),
+ /* Free-Running IIO BANDWIDTH IN Counters */
+ INTEL_UNCORE_EVENT_DESC(bw_in_port0, "event=0xff,umask=0x20"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port0.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port0.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port1, "event=0xff,umask=0x21"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port1.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port1.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port2, "event=0xff,umask=0x22"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port2.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port2.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port3, "event=0xff,umask=0x23"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port3.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port3.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port4, "event=0xff,umask=0x24"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port4.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port4.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port5, "event=0xff,umask=0x25"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port5.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port5.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port6, "event=0xff,umask=0x26"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port6.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port6.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port7, "event=0xff,umask=0x27"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port7.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_in_port7.unit, "MiB"),
+ /* Free-Running IIO BANDWIDTH OUT Counters */
+ INTEL_UNCORE_EVENT_DESC(bw_out_port0, "event=0xff,umask=0x30"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port0.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port0.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port1, "event=0xff,umask=0x31"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port1.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port1.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port2, "event=0xff,umask=0x32"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port2.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port2.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port3, "event=0xff,umask=0x33"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port3.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port3.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port4, "event=0xff,umask=0x34"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port4.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port4.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port5, "event=0xff,umask=0x35"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port5.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port5.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port6, "event=0xff,umask=0x36"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port6.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port6.unit, "MiB"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port7, "event=0xff,umask=0x37"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port7.scale, "3.814697266e-6"),
+ INTEL_UNCORE_EVENT_DESC(bw_out_port7.unit, "MiB"),
+ { /* end: all zeroes */ },
+};
+
+static struct intel_uncore_type spr_uncore_iio_free_running = {
+ .name = "iio_free_running",
+ .num_counters = 17,
+ .num_freerunning_types = SPR_IIO_FREERUNNING_TYPE_MAX,
+ .freerunning = spr_iio_freerunning,
+ .ops = &skx_uncore_iio_freerunning_ops,
+ .event_descs = spr_uncore_iio_freerunning_events,
+ .format_group = &skx_uncore_iio_freerunning_format_group,
+};
+
+enum perf_uncore_spr_imc_freerunning_type_id {
+ SPR_IMC_DCLK,
+ SPR_IMC_PQ_CYCLES,
+
+ SPR_IMC_FREERUNNING_TYPE_MAX,
+};
+
+static struct freerunning_counters spr_imc_freerunning[] = {
+ [SPR_IMC_DCLK] = { 0x22b0, 0x0, 0, 1, 48 },
+ [SPR_IMC_PQ_CYCLES] = { 0x2318, 0x8, 0, 2, 48 },
+};
+
+static struct uncore_event_desc spr_uncore_imc_freerunning_events[] = {
+ INTEL_UNCORE_EVENT_DESC(dclk, "event=0xff,umask=0x10"),
+
+ INTEL_UNCORE_EVENT_DESC(rpq_cycles, "event=0xff,umask=0x20"),
+ INTEL_UNCORE_EVENT_DESC(wpq_cycles, "event=0xff,umask=0x21"),
+ { /* end: all zeroes */ },
+};
+
+#define SPR_MC_DEVICE_ID 0x3251
+
+static void spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+{
+ int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE + SNR_IMC_MMIO_MEM0_OFFSET;
+
+ snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+ mem_offset, SPR_MC_DEVICE_ID);
+}
+
+static struct intel_uncore_ops spr_uncore_imc_freerunning_ops = {
+ .init_box = spr_uncore_imc_freerunning_init_box,
+ .exit_box = uncore_mmio_exit_box,
+ .read_counter = uncore_mmio_read_counter,
+ .hw_config = uncore_freerunning_hw_config,
+};
+
+static struct intel_uncore_type spr_uncore_imc_free_running = {
+ .name = "imc_free_running",
+ .num_counters = 3,
+ .mmio_map_size = SNR_IMC_MMIO_SIZE,
+ .num_freerunning_types = SPR_IMC_FREERUNNING_TYPE_MAX,
+ .freerunning = spr_imc_freerunning,
+ .ops = &spr_uncore_imc_freerunning_ops,
+ .event_descs = spr_uncore_imc_freerunning_events,
+ .format_group = &skx_uncore_iio_freerunning_format_group,
+};
+
+#define UNCORE_SPR_MSR_EXTRA_UNCORES 1
+#define UNCORE_SPR_MMIO_EXTRA_UNCORES 1
+
+static struct intel_uncore_type *spr_msr_uncores[UNCORE_SPR_MSR_EXTRA_UNCORES] = {
+ &spr_uncore_iio_free_running,
+};
+
+static struct intel_uncore_type *spr_mmio_uncores[UNCORE_SPR_MMIO_EXTRA_UNCORES] = {
+ &spr_uncore_imc_free_running,
+};
+
static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
struct intel_uncore_type *from_type)
{
@@ -5779,11 +5920,13 @@ static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
}

static struct intel_uncore_type **
-uncore_get_uncores(enum uncore_access_type type_id)
+uncore_get_uncores(enum uncore_access_type type_id, int num_extra,
+ struct intel_uncore_type **extra)
{
struct intel_uncore_type **types, **start_types;
+ int i;

- start_types = types = intel_uncore_generic_init_uncores(type_id);
+ start_types = types = intel_uncore_generic_init_uncores(type_id, num_extra);

/* Only copy the customized features */
for (; *types; types++) {
@@ -5792,23 +5935,58 @@ uncore_get_uncores(enum uncore_access_type type_id)
uncore_type_customized_copy(*types, spr_uncores[(*types)->type_id]);
}

+ for (i = 0; i < num_extra; i++, types++)
+ *types = extra[i];
+
return start_types;
}

+static struct intel_uncore_type *
+uncore_find_type_by_id(struct intel_uncore_type **types, int type_id)
+{
+ for (; *types; types++) {
+ if (type_id == (*types)->type_id)
+ return *types;
+ }
+
+ return NULL;
+}
+
void spr_uncore_cpu_init(void)
{
- uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR);
+ struct intel_uncore_type *iio;
+
+ uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR,
+ UNCORE_SPR_MSR_EXTRA_UNCORES,
+ spr_msr_uncores);
+
+ iio = uncore_find_type_by_id(uncore_msr_uncores, UNCORE_SPR_IIO);
+ if (iio)
+ spr_uncore_iio_free_running.num_boxes = iio->num_boxes;
}

int spr_uncore_pci_init(void)
{
- uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI);
+ uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI, 0, NULL);
return 0;
}

void spr_uncore_mmio_init(void)
{
- uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO);
+ struct intel_uncore_type *imc;
+
+ int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
+
+ if (ret)
+ uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO, 0, NULL);
+ else {
+ uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO,
+ UNCORE_SPR_MMIO_EXTRA_UNCORES,
+ spr_mmio_uncores);
+ imc = uncore_find_type_by_id(uncore_mmio_uncores, UNCORE_SPR_IMC);
+ if (imc)
+ spr_uncore_imc_free_running.num_boxes = imc->num_boxes / 2;
+ }
}

/* end of SPR uncore support */
--
2.7.4

2021-06-24 01:40:12

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 7/7] perf/x86/intel/uncore: Fix invalid unit check

From: Kan Liang <[email protected]>

The uncore unit with the type ID 0 and the unit ID 0 is missed.

The table3 of the uncore unit maybe 0. The
uncore_discovery_invalid_unit() mistakenly treated it as an invalid
value.

Remove the !unit.table3 check.

Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Cc: [email protected]
---
arch/x86/events/intel/uncore_discovery.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 7280c8a..6d735611 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -30,7 +30,7 @@


#define uncore_discovery_invalid_unit(unit) \
- (!unit.table1 || !unit.ctl || !unit.table3 || \
+ (!unit.table1 || !unit.ctl || \
unit.table1 == -1ULL || unit.ctl == -1ULL || \
unit.table3 == -1ULL)

--
2.7.4

2021-06-24 01:41:36

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 4/7] perf/x86/intel/uncore: Add Sapphire Rapids server support

From: Kan Liang <[email protected]>

Intel Sapphire Rapids supports a discovery mechanism, that allows an
uncore driver to discover the different components ("boxes") of the
chip.

All the generic information of the uncore boxes should be retrieved from
the discovery tables. This has been enabled with the commit edae1f06c2cd
("perf/x86/intel/uncore: Parse uncore discovery tables"). Add
use_discovery to indicate the case. The uncore driver doesn't need to
hard code the generic information for each uncore box.

But we still need to enable various functionality that cannot be
directly discovered. This is done here.
- Add a meaningful name for each uncore block.
- Add CHA filter support.
- The layout of the control registers for each uncore block is a little
bit different from the generic one. Set the platform specific format
and ops. Expose the common ops which can be reused.
- Add a fixed counter for IMC

All the undiscovered platform-specific features are hard code in the
spr_uncores[]. Add uncore_type_customized_copy(), instead of the memcpy,
to only overwrite these features.

Only the uncore blocks which are inculded in the discovery tables are
enabled here. Other uncore blocks, e.g., free-running counters, will be
supported in the following patch.

Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/uncore.c | 16 +-
arch/x86/events/intel/uncore.h | 3 +
arch/x86/events/intel/uncore_discovery.c | 32 ++--
arch/x86/events/intel/uncore_discovery.h | 21 +++
arch/x86/events/intel/uncore_snbep.c | 294 +++++++++++++++++++++++++++++++
5 files changed, 349 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 04e5d37..d2837b6 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1675,6 +1675,7 @@ struct intel_uncore_init_fun {
void (*cpu_init)(void);
int (*pci_init)(void);
void (*mmio_init)(void);
+ bool use_discovery;
};

static const struct intel_uncore_init_fun nhm_uncore_init __initconst = {
@@ -1777,6 +1778,13 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
.mmio_init = snr_uncore_mmio_init,
};

+static const struct intel_uncore_init_fun spr_uncore_init __initconst = {
+ .cpu_init = spr_uncore_cpu_init,
+ .pci_init = spr_uncore_pci_init,
+ .mmio_init = spr_uncore_mmio_init,
+ .use_discovery = true,
+};
+
static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
.cpu_init = intel_uncore_generic_uncore_cpu_init,
.pci_init = intel_uncore_generic_uncore_pci_init,
@@ -1821,6 +1829,7 @@ static const struct x86_cpu_id intel_uncore_match[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE, &rkl_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, &adl_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, &adl_uncore_init),
+ X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, &spr_uncore_init),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &snr_uncore_init),
{},
};
@@ -1844,8 +1853,13 @@ static int __init intel_uncore_init(void)
uncore_init = (struct intel_uncore_init_fun *)&generic_uncore_init;
else
return -ENODEV;
- } else
+ } else {
uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+ if (uncore_no_discover && uncore_init->use_discovery)
+ return -ENODEV;
+ if (uncore_init->use_discovery && !intel_uncore_has_discovery_tables())
+ return -ENODEV;
+ }

if (uncore_init->pci_init) {
pret = uncore_init->pci_init();
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 2fc8565..050d7ff 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -610,6 +610,9 @@ void snr_uncore_mmio_init(void);
int icx_uncore_pci_init(void);
void icx_uncore_cpu_init(void);
void icx_uncore_mmio_init(void);
+int spr_uncore_pci_init(void);
+void spr_uncore_cpu_init(void);
+void spr_uncore_mmio_init(void);

/* uncore_nhmex.c */
void nhmex_uncore_cpu_init(void);
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index aba9bff..2de5563 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -337,17 +337,17 @@ static const struct attribute_group generic_uncore_format_group = {
.attrs = generic_uncore_formats_attr,
};

-static void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
{
wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
}

-static void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
{
wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_FRZ);
}

-static void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
{
wrmsrl(uncore_msr_box_ctl(box), 0);
}
@@ -377,7 +377,7 @@ static struct intel_uncore_ops generic_uncore_msr_ops = {
.read_counter = uncore_msr_read_counter,
};

-static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = uncore_pci_box_ctl(box);
@@ -386,7 +386,7 @@ static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
}

-static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = uncore_pci_box_ctl(box);
@@ -394,7 +394,7 @@ static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_FRZ);
}

-static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
{
struct pci_dev *pdev = box->pci_dev;
int box_ctl = uncore_pci_box_ctl(box);
@@ -403,7 +403,7 @@ static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
}

static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
- struct perf_event *event)
+ struct perf_event *event)
{
struct pci_dev *pdev = box->pci_dev;
struct hw_perf_event *hwc = &event->hw;
@@ -411,8 +411,8 @@ static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
pci_write_config_dword(pdev, hwc->config_base, hwc->config);
}

-static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
- struct perf_event *event)
+void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
{
struct pci_dev *pdev = box->pci_dev;
struct hw_perf_event *hwc = &event->hw;
@@ -420,8 +420,8 @@ static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
pci_write_config_dword(pdev, hwc->config_base, 0);
}

-static u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
- struct perf_event *event)
+u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+ struct perf_event *event)
{
struct pci_dev *pdev = box->pci_dev;
struct hw_perf_event *hwc = &event->hw;
@@ -454,7 +454,7 @@ static unsigned int generic_uncore_mmio_box_ctl(struct intel_uncore_box *box)
return type->box_ctls[box->dieid] + type->mmio_offsets[box->pmu->pmu_idx];
}

-static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
{
unsigned int box_ctl = generic_uncore_mmio_box_ctl(box);
struct intel_uncore_type *type = box->pmu->type;
@@ -478,7 +478,7 @@ static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
}

-static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
{
if (!box->io_addr)
return;
@@ -486,7 +486,7 @@ static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
writel(GENERIC_PMON_BOX_CTL_FRZ, box->io_addr);
}

-static void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
{
if (!box->io_addr)
return;
@@ -505,7 +505,7 @@ static void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
writel(hwc->config, box->io_addr + hwc->config_base);
}

-static void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
struct perf_event *event)
{
struct hw_perf_event *hwc = &event->hw;
@@ -568,7 +568,7 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
return true;
}

-static struct intel_uncore_type **
+struct intel_uncore_type **
intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
{
struct intel_uncore_discovery_type *type;
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 1d65293..b85655b 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -129,3 +129,24 @@ void intel_uncore_clear_discovery_tables(void);
void intel_uncore_generic_uncore_cpu_init(void);
int intel_uncore_generic_uncore_pci_init(void);
void intel_uncore_generic_uncore_mmio_init(void);
+
+void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box);
+
+void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+ struct perf_event *event);
+
+void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+ struct perf_event *event);
+u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+ struct perf_event *event);
+
+struct intel_uncore_type **
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 7622762..a4f436a 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* SandyBridge-EP/IvyTown uncore support */
#include "uncore.h"
+#include "uncore_discovery.h"

/* SNB-EP pci bus to socket mapping */
#define SNBEP_CPUNODEID 0x40
@@ -454,6 +455,17 @@
#define ICX_NUMBER_IMC_CHN 2
#define ICX_IMC_MEM_STRIDE 0x4

+/* SPR */
+#define SPR_RAW_EVENT_MASK_EXT 0xffffff
+
+/* SPR CHA */
+#define SPR_CHA_PMON_CTL_TID_EN (1 << 16)
+#define SPR_CHA_PMON_EVENT_MASK (SNBEP_PMON_RAW_EVENT_MASK | \
+ SPR_CHA_PMON_CTL_TID_EN)
+#define SPR_CHA_PMON_BOX_FILTER_TID 0x3ff
+
+#define SPR_C0_MSR_PMON_BOX_FILTER0 0x200e
+
DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
DEFINE_UNCORE_FORMAT_ATTR(event2, event, "config:0-6");
DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21");
@@ -466,6 +478,7 @@ DEFINE_UNCORE_FORMAT_ATTR(umask_ext4, umask, "config:8-15,32-55");
DEFINE_UNCORE_FORMAT_ATTR(qor, qor, "config:16");
DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
DEFINE_UNCORE_FORMAT_ATTR(tid_en, tid_en, "config:19");
+DEFINE_UNCORE_FORMAT_ATTR(tid_en2, tid_en, "config:16");
DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
DEFINE_UNCORE_FORMAT_ATTR(thresh9, thresh, "config:24-35");
DEFINE_UNCORE_FORMAT_ATTR(thresh8, thresh, "config:24-31");
@@ -5504,3 +5517,284 @@ void icx_uncore_mmio_init(void)
}

/* end of ICX uncore support */
+
+/* SPR uncore support */
+
+static void spr_uncore_msr_enable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct hw_perf_event_extra *reg1 = &hwc->extra_reg;
+
+ if (reg1->idx != EXTRA_REG_NONE)
+ wrmsrl(reg1->reg, reg1->config);
+
+ wrmsrl(hwc->config_base, hwc->config);
+}
+
+static void spr_uncore_msr_disable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct hw_perf_event_extra *reg1 = &hwc->extra_reg;
+
+ if (reg1->idx != EXTRA_REG_NONE)
+ wrmsrl(reg1->reg, 0);
+
+ wrmsrl(hwc->config_base, 0);
+}
+
+static int spr_cha_hw_config(struct intel_uncore_box *box, struct perf_event *event)
+{
+ struct hw_perf_event_extra *reg1 = &event->hw.extra_reg;
+ bool tie_en = !!(event->hw.config & SPR_CHA_PMON_CTL_TID_EN);
+ struct intel_uncore_type *type = box->pmu->type;
+
+ if (tie_en) {
+ reg1->reg = SPR_C0_MSR_PMON_BOX_FILTER0 +
+ HSWEP_CBO_MSR_OFFSET * type->box_ids[box->pmu->pmu_idx];
+ reg1->config = event->attr.config1 & SPR_CHA_PMON_BOX_FILTER_TID;
+ reg1->idx = 0;
+ }
+
+ return 0;
+}
+
+static struct intel_uncore_ops spr_uncore_chabox_ops = {
+ .init_box = intel_generic_uncore_msr_init_box,
+ .disable_box = intel_generic_uncore_msr_disable_box,
+ .enable_box = intel_generic_uncore_msr_enable_box,
+ .disable_event = spr_uncore_msr_disable_event,
+ .enable_event = spr_uncore_msr_enable_event,
+ .read_counter = uncore_msr_read_counter,
+ .hw_config = spr_cha_hw_config,
+ .get_constraint = uncore_get_constraint,
+ .put_constraint = uncore_put_constraint,
+};
+
+static struct attribute *spr_uncore_cha_formats_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_umask_ext4.attr,
+ &format_attr_tid_en2.attr,
+ &format_attr_edge.attr,
+ &format_attr_inv.attr,
+ &format_attr_thresh8.attr,
+ &format_attr_filter_tid5.attr,
+ NULL,
+};
+static const struct attribute_group spr_uncore_chabox_format_group = {
+ .name = "format",
+ .attrs = spr_uncore_cha_formats_attr,
+};
+
+static struct intel_uncore_type spr_uncore_chabox = {
+ .link_name = "cha",
+ .event_mask = SPR_CHA_PMON_EVENT_MASK,
+ .event_mask_ext = SPR_RAW_EVENT_MASK_EXT,
+ .num_shared_regs = 1,
+ .ops = &spr_uncore_chabox_ops,
+ .format_group = &spr_uncore_chabox_format_group,
+};
+
+static struct intel_uncore_type spr_uncore_iio = {
+ .link_name = "iio",
+ .event_mask = SNBEP_PMON_RAW_EVENT_MASK,
+ .event_mask_ext = SNR_IIO_PMON_RAW_EVENT_MASK_EXT,
+ .format_group = &snr_uncore_iio_format_group,
+};
+
+static struct attribute *spr_uncore_raw_formats_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_umask_ext4.attr,
+ &format_attr_edge.attr,
+ &format_attr_inv.attr,
+ &format_attr_thresh8.attr,
+ NULL,
+};
+
+static const struct attribute_group spr_uncore_raw_format_group = {
+ .name = "format",
+ .attrs = spr_uncore_raw_formats_attr,
+};
+
+#define SPR_UNCORE_COMMON_FORMAT() \
+ .event_mask = SNBEP_PMON_RAW_EVENT_MASK, \
+ .event_mask_ext = SPR_RAW_EVENT_MASK_EXT, \
+ .format_group = &spr_uncore_raw_format_group
+
+static struct intel_uncore_type spr_uncore_irp = {
+ SPR_UNCORE_COMMON_FORMAT(),
+ .link_name = "irp",
+
+};
+
+static struct intel_uncore_type spr_uncore_m2pcie = {
+ SPR_UNCORE_COMMON_FORMAT(),
+ .link_name = "m2pcie",
+};
+
+static struct intel_uncore_type spr_uncore_pcu = {
+ .link_name = "pcu",
+};
+
+static void spr_uncore_mmio_enable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (!box->io_addr)
+ return;
+
+ if (uncore_pmc_fixed(hwc->idx))
+ writel(SNBEP_PMON_CTL_EN, box->io_addr + hwc->config_base);
+ else
+ writel(hwc->config, box->io_addr + hwc->config_base);
+}
+
+static struct intel_uncore_ops spr_uncore_mmio_ops = {
+ .init_box = intel_generic_uncore_mmio_init_box,
+ .exit_box = uncore_mmio_exit_box,
+ .disable_box = intel_generic_uncore_mmio_disable_box,
+ .enable_box = intel_generic_uncore_mmio_enable_box,
+ .disable_event = intel_generic_uncore_mmio_disable_event,
+ .enable_event = spr_uncore_mmio_enable_event,
+ .read_counter = uncore_mmio_read_counter,
+};
+
+static struct intel_uncore_type spr_uncore_imc = {
+ SPR_UNCORE_COMMON_FORMAT(),
+ .link_name = "imc",
+ .fixed_ctr_bits = 48,
+ .fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR,
+ .fixed_ctl = SNR_IMC_MMIO_PMON_FIXED_CTL,
+ .ops = &spr_uncore_mmio_ops,
+};
+
+static void spr_uncore_pci_enable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ struct pci_dev *pdev = box->pci_dev;
+ struct hw_perf_event *hwc = &event->hw;
+
+ pci_write_config_dword(pdev, hwc->config_base + 4, (u32)(hwc->config >> 32));
+ pci_write_config_dword(pdev, hwc->config_base, (u32)hwc->config);
+}
+
+static struct intel_uncore_ops spr_uncore_pci_ops = {
+ .init_box = intel_generic_uncore_pci_init_box,
+ .disable_box = intel_generic_uncore_pci_disable_box,
+ .enable_box = intel_generic_uncore_pci_enable_box,
+ .disable_event = intel_generic_uncore_pci_disable_event,
+ .enable_event = spr_uncore_pci_enable_event,
+ .read_counter = intel_generic_uncore_pci_read_counter,
+};
+
+#define SPR_UNCORE_PCI_COMMON_FORMAT() \
+ SPR_UNCORE_COMMON_FORMAT(), \
+ .ops = &spr_uncore_pci_ops
+
+static struct intel_uncore_type spr_uncore_m2m = {
+ SPR_UNCORE_PCI_COMMON_FORMAT(),
+ .link_name = "m2m",
+};
+
+static struct intel_uncore_type spr_uncore_upi = {
+ SPR_UNCORE_PCI_COMMON_FORMAT(),
+ .link_name = "upi",
+};
+
+static struct intel_uncore_type spr_uncore_m3upi = {
+ SPR_UNCORE_PCI_COMMON_FORMAT(),
+ .link_name = "m3upi",
+};
+
+static struct intel_uncore_type spr_uncore_mdf = {
+ SPR_UNCORE_COMMON_FORMAT(),
+ .link_name = "mdf",
+};
+
+#define UNCORE_SPR_NUM_UNCORE_TYPES 12
+
+static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
+ &spr_uncore_chabox,
+ &spr_uncore_iio,
+ &spr_uncore_irp,
+ &spr_uncore_m2pcie,
+ &spr_uncore_pcu,
+ NULL,
+ &spr_uncore_imc,
+ &spr_uncore_m2m,
+ &spr_uncore_upi,
+ &spr_uncore_m3upi,
+ NULL,
+ &spr_uncore_mdf,
+};
+
+static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
+ struct intel_uncore_type *from_type)
+{
+ if (!to_type || !from_type)
+ return;
+
+ if (from_type->name)
+ to_type->name = from_type->name;
+ if (from_type->link_name)
+ to_type->link_name = from_type->link_name;
+ if (from_type->fixed_ctr_bits)
+ to_type->fixed_ctr_bits = from_type->fixed_ctr_bits;
+ if (from_type->event_mask)
+ to_type->event_mask = from_type->event_mask;
+ if (from_type->event_mask_ext)
+ to_type->event_mask_ext = from_type->event_mask_ext;
+ if (from_type->fixed_ctr)
+ to_type->fixed_ctr = from_type->fixed_ctr;
+ if (from_type->fixed_ctl)
+ to_type->fixed_ctl = from_type->fixed_ctl;
+ if (from_type->fixed_ctr_bits)
+ to_type->fixed_ctr_bits = from_type->fixed_ctr_bits;
+ if (from_type->num_shared_regs)
+ to_type->num_shared_regs = from_type->num_shared_regs;
+ if (from_type->constraints)
+ to_type->constraints = from_type->constraints;
+ if (from_type->ops)
+ to_type->ops = from_type->ops;
+ if (from_type->event_descs)
+ to_type->event_descs = from_type->event_descs;
+ if (from_type->format_group)
+ to_type->format_group = from_type->format_group;
+}
+
+static struct intel_uncore_type **
+uncore_get_uncores(enum uncore_access_type type_id)
+{
+ struct intel_uncore_type **types, **start_types;
+
+ start_types = types = intel_uncore_generic_init_uncores(type_id);
+
+ /* Only copy the customized features */
+ for (; *types; types++) {
+ if ((*types)->type_id >= UNCORE_SPR_NUM_UNCORE_TYPES)
+ continue;
+ uncore_type_customized_copy(*types, spr_uncores[(*types)->type_id]);
+ }
+
+ return start_types;
+}
+
+void spr_uncore_cpu_init(void)
+{
+ uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR);
+}
+
+int spr_uncore_pci_init(void)
+{
+ uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI);
+ return 0;
+}
+
+void spr_uncore_mmio_init(void)
+{
+ uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO);
+}
+
+/* end of SPR uncore support */
--
2.7.4

2021-06-24 01:42:02

by Liang, Kan

[permalink] [raw]
Subject: [PATCH 5/7] perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()

From: Kan Liang <[email protected]>

The IMC free-running counters on Sapphire Rapids server are also
accessed by MMIO, which is similar to the previous platforms, SNR and
ICX. The only difference is the device ID of the device which contains
BAR address.

Factor out snr_uncore_mmio_map() which ioremap the MMIO space. It can be
reused in the following patch for SPR.

Use the snr_uncore_mmio_map() in the icx_uncore_imc_freerunning_init_box().
There is no box_ctl for the free-running counters.

Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/events/intel/uncore_snbep.c | 36 +++++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index a4f436a..2696657 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -4792,13 +4792,15 @@ int snr_uncore_pci_init(void)
return 0;
}

-static struct pci_dev *snr_uncore_get_mc_dev(int id)
+#define SNR_MC_DEVICE_ID 0x3451
+
+static struct pci_dev *snr_uncore_get_mc_dev(unsigned int device, int id)
{
struct pci_dev *mc_dev = NULL;
int pkg;

while (1) {
- mc_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3451, mc_dev);
+ mc_dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, mc_dev);
if (!mc_dev)
break;
pkg = uncore_pcibus_to_dieid(mc_dev->bus);
@@ -4808,16 +4810,17 @@ static struct pci_dev *snr_uncore_get_mc_dev(int id)
return mc_dev;
}

-static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
- unsigned int box_ctl, int mem_offset)
+static int snr_uncore_mmio_map(struct intel_uncore_box *box,
+ unsigned int box_ctl, int mem_offset,
+ unsigned int device)
{
- struct pci_dev *pdev = snr_uncore_get_mc_dev(box->dieid);
+ struct pci_dev *pdev = snr_uncore_get_mc_dev(device, box->dieid);
struct intel_uncore_type *type = box->pmu->type;
resource_size_t addr;
u32 pci_dword;

if (!pdev)
- return;
+ return -ENODEV;

pci_read_config_dword(pdev, SNR_IMC_MMIO_BASE_OFFSET, &pci_dword);
addr = (pci_dword & SNR_IMC_MMIO_BASE_MASK) << 23;
@@ -4830,16 +4833,25 @@ static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
box->io_addr = ioremap(addr, type->mmio_map_size);
if (!box->io_addr) {
pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
- return;
+ return -EINVAL;
}

- writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
+ return 0;
+}
+
+static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
+ unsigned int box_ctl, int mem_offset,
+ unsigned int device)
+{
+ if (!snr_uncore_mmio_map(box, box_ctl, mem_offset, device))
+ writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
}

static void snr_uncore_mmio_init_box(struct intel_uncore_box *box)
{
__snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
- SNR_IMC_MMIO_MEM0_OFFSET);
+ SNR_IMC_MMIO_MEM0_OFFSET,
+ SNR_MC_DEVICE_ID);
}

static void snr_uncore_mmio_disable_box(struct intel_uncore_box *box)
@@ -5413,7 +5425,8 @@ static void icx_uncore_imc_init_box(struct intel_uncore_box *box)
int mem_offset = (box->pmu->pmu_idx / ICX_NUMBER_IMC_CHN) * ICX_IMC_MEM_STRIDE +
SNR_IMC_MMIO_MEM0_OFFSET;

- __snr_uncore_mmio_init_box(box, box_ctl, mem_offset);
+ __snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
+ SNR_MC_DEVICE_ID);
}

static struct intel_uncore_ops icx_uncore_mmio_ops = {
@@ -5483,7 +5496,8 @@ static void icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE +
SNR_IMC_MMIO_MEM0_OFFSET;

- __snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box), mem_offset);
+ snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+ mem_offset, SNR_MC_DEVICE_ID);
}

static struct intel_uncore_ops icx_uncore_imc_freerunning_ops = {
--
2.7.4

2021-06-24 05:44:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 1/7] driver core: Add a way to get to bus devices kset

On Wed, Jun 23, 2021 at 06:22:03PM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> Add an accessor function to get to the bus devices kset associated with
> a struct bus_type.
>
> The user of this is the following perf changes, which will need to get
> to the kobj of the 'devices' directory of a certain bus.

What "following perf changes"?

Nothing should be messing with the kobject of a bus, where are those
patches?

> Reviewed-by: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Kan Liang <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
> drivers/base/bus.c | 6 ++++++
> include/linux/device/bus.h | 1 +
> 2 files changed, 7 insertions(+)
>
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 36d0c65..3d621a8 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -899,6 +899,12 @@ struct kset *bus_get_kset(struct bus_type *bus)
> }
> EXPORT_SYMBOL_GPL(bus_get_kset);
>
> +struct kset *bus_get_devices_kset(struct bus_type *bus)
> +{
> + return bus->p->devices_kset;
> +}
> +EXPORT_SYMBOL_GPL(bus_get_devices_kset);

No, sorry, this feels really wrong, why does anyone care about the bus
kset?

thanks,

greg k-h

2021-06-24 05:48:54

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU

On Wed, Jun 23, 2021 at 06:22:05PM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> The platform specific support for Sapphire Rapids will apply a
> meaningful name for each uncore PMU. The script which works well with
> the old name may not work anymore because of the name change. To avoid
> the issue, a symlink should be created from the new name to the old
> name.

What script needs a specific name?

And where in Documentation/ABI/ are you documenting this change and new
symlink?

greg k-h

2021-06-24 05:52:49

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Wed, Jun 23, 2021 at 06:22:04PM -0700, [email protected] wrote:
> From: Kan Liang <[email protected]>
>
> A perf PMU may have two PMU names. For example, Intel Sapphire Rapids
> server supports the discovery mechanism. Without the platform-specific
> support, an uncore PMU is named by a type ID plus a box ID, e.g.,
> uncore_type_0_0, because the real name of the uncore PMU cannot be
> retrieved from the discovery table. With the platform-specific support
> later, perf has the mapping information from a type ID to a specific
> uncore unit. Just like the previous platforms, the uncore PMU will be
> named by the real PMU name, e.g., uncore_cha_0. The user scripts which
> work well with the old name may not work anymore. To avoid the issue, a
> symlink should be created from the new to the old name.
>
> The perf PMU devices are created under /sys/bus/event_source/devices/.
> The symlink should be created in the same directory to facilitate the
> perf tool.
>
> Add a new variable, link_name, to store the new name of a PMU. Link it
> to the old name if applies.

Ah, here is the "new name". This needs to be documented.

But first off, why is this symlink suddenly needed? What is so special
about this new hardware that it breaks the existing model?

thanks,

greg k-h

2021-06-24 14:26:04

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> But first off, why is this symlink suddenly needed? What is so special
> about this new hardware that it breaks the existing model?

The driver can be in two modes:

- Driver fully knows the hardware and puts in the correct Linux names

- Driver doesn't know the hardware but is in a fallback mode where it
only looks at a discovery table. There we don't have the correct names,
just an numeric identifier for the different hardware sub components.

In the later mode the numeric identifier is used in sysfs, in the former
case the full Linux name. But we want to keep some degree of Linux user
space compatibility between the two, that is why the full mode creates a
symlink from the "numeric" name. This way the (ugly) identifiers needed
for the fallback mode work everywhere.

-Andi

2021-06-24 14:31:44

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
>
> > But first off, why is this symlink suddenly needed? What is so special
> > about this new hardware that it breaks the existing model?
>
> The driver can be in two modes:
>
> - Driver fully knows the hardware and puts in the correct Linux names
>
> - Driver doesn't know the hardware but is in a fallback mode where it only
> looks at a discovery table. There we don't have the correct names, just an
> numeric identifier for the different hardware sub components.

Why does this matter? Why would the driver not "know" the hardware? If
it doesn't know it, why would it bind to it?

> In the later mode the numeric identifier is used in sysfs, in the former
> case the full Linux name. But we want to keep some degree of Linux user
> space compatibility between the two, that is why the full mode creates a
> symlink from the "numeric" name. This way the (ugly) identifiers needed for
> the fallback mode work everywhere.

So what _exactly_ does the symlink do here? What is it from->to?

And where is it being documented? What userspace tool needs to be fixed
up so that the symlink can be removed?

thanks,

greg k-h

2021-06-24 15:26:26

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


On 6/24/2021 7:29 AM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
>>> But first off, why is this symlink suddenly needed? What is so special
>>> about this new hardware that it breaks the existing model?
>> The driver can be in two modes:
>>
>> - Driver fully knows the hardware and puts in the correct Linux names
>>
>> - Driver doesn't know the hardware but is in a fallback mode where it only
>> looks at a discovery table. There we don't have the correct names, just an
>> numeric identifier for the different hardware sub components.
> Why does this matter? Why would the driver not "know" the hardware? If
> it doesn't know it, why would it bind to it?

It's a similar concept as a PCI class. How to have a driver that can
handle future hardware, but with some restrictions

The perf CPU PMU has had a similar concept for a long time. The driver
can be either in architectural mode (with a subset of features), or be
fully enabled. This allows users who are on an older kernel to still use
at least a subset of the functionality.

It will bind as long as the discovery table is there.

>
>> In the later mode the numeric identifier is used in sysfs, in the former
>> case the full Linux name. But we want to keep some degree of Linux user
>> space compatibility between the two, that is why the full mode creates a
>> symlink from the "numeric" name. This way the (ugly) identifiers needed for
>> the fallback mode work everywhere.
> So what _exactly_ does the symlink do here? What is it from->to?

It's from numeric identifier to full perf name

In fallback mode there is no symlink, only the numeric identifier.


>
> And where is it being documented? What userspace tool needs to be fixed
> up so that the symlink can be removed?

The names are visible in the perf command lines. Perf supports either
name without changes. So it's not about fixing a specific tool, but
about using the drivers in both modes, with limited compatibility
between the two.

Yes probably it needs better documentation.

-Andi

2021-06-24 15:33:01

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
>
> On 6/24/2021 7:29 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > But first off, why is this symlink suddenly needed? What is so special
> > > > about this new hardware that it breaks the existing model?
> > > The driver can be in two modes:
> > >
> > > - Driver fully knows the hardware and puts in the correct Linux names
> > >
> > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > looks at a discovery table. There we don't have the correct names, just an
> > > numeric identifier for the different hardware sub components.
> > Why does this matter? Why would the driver not "know" the hardware? If
> > it doesn't know it, why would it bind to it?
>
> It's a similar concept as a PCI class. How to have a driver that can handle
> future hardware, but with some restrictions

But this is NOT how busses work in the driver model.

PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
a driver goes from being handled by "generic_pci_type_foo" to
"vendor_foo". Userspace can handle the change and life goes on.

> The perf CPU PMU has had a similar concept for a long time. The driver can
> be either in architectural mode (with a subset of features), or be fully
> enabled. This allows users who are on an older kernel to still use at least
> a subset of the functionality.

So a device name will move from "generic" to "specific", right?

Why does a bus have to do with any of this?

> > > In the later mode the numeric identifier is used in sysfs, in the former
> > > case the full Linux name. But we want to keep some degree of Linux user
> > > space compatibility between the two, that is why the full mode creates a
> > > symlink from the "numeric" name. This way the (ugly) identifiers needed for
> > > the fallback mode work everywhere.
> > So what _exactly_ does the symlink do here? What is it from->to?
>
> It's from numeric identifier to full perf name
>
> In fallback mode there is no symlink, only the numeric identifier.

Those two sentences do not describe a sysfs path to me.

Where are the Documentation/ABI/ entries for all of this?

> > And where is it being documented? What userspace tool needs to be fixed
> > up so that the symlink can be removed?
>
> The names are visible in the perf command lines. Perf supports either name
> without changes. So it's not about fixing a specific tool, but about using
> the drivers in both modes, with limited compatibility between the two.

But a driver does not caer. And if perf does not care, who cares?

still totally confused,

greg k-h

2021-06-24 17:09:42

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU



On 6/24/2021 11:31 AM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
>>
>> On 6/24/2021 7:29 AM, Greg KH wrote:
>>> On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
>>>>> But first off, why is this symlink suddenly needed? What is so special
>>>>> about this new hardware that it breaks the existing model?
>>>> The driver can be in two modes:
>>>>
>>>> - Driver fully knows the hardware and puts in the correct Linux names
>>>>
>>>> - Driver doesn't know the hardware but is in a fallback mode where it only
>>>> looks at a discovery table. There we don't have the correct names, just an
>>>> numeric identifier for the different hardware sub components.
>>> Why does this matter? Why would the driver not "know" the hardware? If
>>> it doesn't know it, why would it bind to it?
>>
>> It's a similar concept as a PCI class. How to have a driver that can handle
>> future hardware, but with some restrictions
>
> But this is NOT how busses work in the driver model.
>
> PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> a driver goes from being handled by "generic_pci_type_foo" to
> "vendor_foo". Userspace can handle the change and life goes on.
>
>> The perf CPU PMU has had a similar concept for a long time. The driver can
>> be either in architectural mode (with a subset of features), or be fully
>> enabled. This allows users who are on an older kernel to still use at least
>> a subset of the functionality.
>
> So a device name will move from "generic" to "specific", right?

We'd like to keep both names.

Here is the whole story.

This patch set provides platform-specific support for Sapphire Rapids
server on top of the discovery mechanism which was recently introduced
in the uncore driver.

https://lore.kernel.org/lkml/[email protected]/

The discovery mechanism is a mechanism of self-describing HW. By reading
through an MMIO page worth of information, the uncore driver can
‘discover’ all the standard uncore PMON registers. The driver "know" the
hardware.
However, the discovery mechanism only provides basic uncore information.
It doesn't provide a meaningful name for each uncore unit, but a type
ID. So we can only name an uncore PMU by it's type ID (numeric name),
e.g., uncore_type_0_0. End uses can use it to access the uncore counters.

With this platform-specific patch, the driver can "know" more about the
hardware, e.g., the mapping between a type ID and an meaningful uncore
name. It can creates an uncore PMU with a meaningful name just like the
previous platforms.

It usually takes some time to implement a platform-specific patch. Some
early user or users still have old drivers may have already wrote some
scripts (which composed of many perf commands for core and uncore
events) with the numeric name, and done some basic test. We don't want
to bother them to update their scripts for the new driver. So we'd like
to keep both names and add a symlink from a meaningful name to a type ID
name. For example,

/sys/bus/event_source/devices/uncore_iio_0 ->
../../../devices/uncore_type_1_0

>
> Why does a bus have to do with any of this?

Both names should be created under /sys/bus/event_source/devices/

Exposing the bus_get_devices_kset() is to get the bus devices kset,
which is used to create the symlink.

>
>>>> In the later mode the numeric identifier is used in sysfs, in the former
>>>> case the full Linux name. But we want to keep some degree of Linux user
>>>> space compatibility between the two, that is why the full mode creates a
>>>> symlink from the "numeric" name. This way the (ugly) identifiers needed for
>>>> the fallback mode work everywhere.
>>> So what _exactly_ does the symlink do here? What is it from->to?
>>
>> It's from numeric identifier to full perf name
>>
>> In fallback mode there is no symlink, only the numeric identifier.
>
> Those two sentences do not describe a sysfs path to me.
>
> Where are the Documentation/ABI/ entries for all of this?

The below patch created a file for uncore to explain the naming rules.
I also found that the current sysfs-devices-mapping is for uncore driver
as well. So I move it into the uncore document.


>
>>> And where is it being documented? What userspace tool needs to be fixed
>>> up so that the symlink can be removed?
>>
>> The names are visible in the perf command lines. Perf supports either name
>> without changes. So it's not about fixing a specific tool, but about using
>> the drivers in both modes, with limited compatibility between the two.
>
> But a driver does not caer. And if perf does not care, who cares?
>

That's to facilitate the user's script which composed of many perf
commands for core and uncore events.

They may run the script on both the old driver (without
platform-specific support) and the new driver (with platform-specific
support). The numeric name with the type ID is not deprecated. So they
don't need to update the uncore PMU names in their scripts when updating
the driver.



From f6989d87988866994ed4e198be16aa44e3f682eb Mon Sep 17 00:00:00 2001
From: Kan Liang <[email protected]>
Date: Thu, 24 Jun 2021 09:16:21 -0700
Subject: [PATCH] docs: ABI: testing: Add perf uncore PMU support

Document the naming rules for the uncore related PMUs.

Move the sysfs-devices-mapping into this uncore document. The attribute
is only available for uncore PMUs. Update the location of the attribute
to /sys/bus/event_source/devices/. Update the Contact which is not
available anymore.

Signed-off-by: Kan Liang <[email protected]>
---
.../testing/sysfs-bus-event_source-devices-uncore | 126
+++++++++++++++++++++
Documentation/ABI/testing/sysfs-devices-mapping | 34 ------
2 files changed, 126 insertions(+), 34 deletions(-)
create mode 100644
Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
delete mode 100644 Documentation/ABI/testing/sysfs-devices-mapping

diff --git
a/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
new file mode 100644
index 0000000..612fa19
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
@@ -0,0 +1,126 @@
+What: /sys/bus/event_source/devices/uncore_<unit_name>[_<unit_idx>]
+Date: June 2021
+KernelVersion: 3.5
+Contact: Linux kernel mailing list <[email protected]>
+Description: Generic Uncore performance monitoring unit (PMU)
+
+ A collection of uncore performance monitoring units that
+ may be supported. These units can be monitored using the
+ 'perf(1)' tool.
+
+ The name of each uncore PMU would look like:
+
+ uncore_<unit_name>[_<unit_idx>]
+
+ Where <unit_name> is the name of the uncore PMU.
+ If the unit number is more than one, <unit_idx>
+ indicates the index.
+
+ Examples:
+
+ The second uncore PMU for the Caching/Home Agent.
+
+ uncore_cha_2
+
+ There is only one uncore PMU for the Power
+ Control Unit
+
+ uncore_pcu
+
+
+ A mechanism of self-describing HW for the uncore PMU has
+ been introduced since the Intel Sapphire Rapids server.
+ Perf can provide basic uncore support based on the
+ generic uncore unit information provided by the
+ discovery mechanism. But the discovery mechanism only
+ provides a unit type ID for an uncore PMU, not a unit
+ name. The unit name can only be parsed with the later
+ platform-specific support.
+
+ On a platform that supports the discovery mechanism,
+ but lacks platform-specific support, the type ID is used
+ to name a uncore PMU.
+
+ uncore_type_<type_ID>[_<unit_idx>]
+
+ Examples:
+
+ The second uncore PMU for the type 0 unit
+
+ uncore_type_0_2
+
+ When the platform-specific support is ready on the
+ platform that supports the discovery mechanism. The unit
+ name should be used to name the uncore PMU. But the
+ numeric name with the type ID is not deprecated. Both
+ names coexist. A symlink is created to link the two
+ names.
+
+ Examples:
+
+ /sys/bus/event_source/devices/uncore_iio_0 ->
+ ../../../devices/uncore_type_1_0
+
+ For users, who may use both the old driver (without
+ platform-specific support) and the new driver (with
+ platform-specific support), the numeric name with the
+ type ID is recommended, since they don't need to update
+ the uncore PMU names in their scripts when updating the
+ driver.
+
+ The attribute groups created under the PMU are similar to
+ other core PMUs.
+ (See ABI/testing/sysfs-bus-event_source-devices-format and
+ ABI/testing/sysfs-bus-event_source-devices-events).
+
+
+What:
/sys/bus/event_source/devices/uncore_<unit_name>_free_running_<unit_idx>
+Date: June 2021
+KernelVersion: 4.17
+Contact: Linux kernel mailing list <[email protected]>
+Description: Uncore Free-running performance monitoring unit (PMU)
+
+ A collection of uncore Free-running performance
+ monitoring units that may be supported. These units can
+ be monitored using the 'perf(1)' tool.
+
+ The attribute groups created under the PMU are similar to
+ other core PMUs.
+ (See ABI/testing/sysfs-bus-event_source-devices-format and
+ ABI/testing/sysfs-bus-event_source-devices-events).
+
+What: /sys/bus/event_source/devices/uncore_iio_x/dieX
+Date: February 2020
+Contact: Linux kernel mailing list <[email protected]>
+Description:
+ Each IIO stack (PCIe root port) has its own IIO PMON
block, so
+ each dieX file (where X is die number) holds
"Segment:Root Bus"
+ for PCIe root port, which can be monitored by that IIO PMON
+ block.
+ For example, on 4-die Xeon platform with up to 6 IIO
stacks per
+ die and, therefore, 6 IIO PMON blocks per die, the
mapping of
+ IIO PMON block 0 exposes as the following::
+
+ $ ls /sys/bus/event_source/devices/uncore_iio_0/die*
+ -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die0
+ -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die1
+ -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die2
+ -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die3
+
+ $ tail /sys/bus/event_source/devices/uncore_iio_0/die*
+ ==> /sys/bus/event_source/devices/uncore_iio_0/die0 <==
+ 0000:00
+ ==> /sys/bus/event_source/devices/uncore_iio_0/die1 <==
+ 0000:40
+ ==> /sys/bus/event_source/devices/uncore_iio_0/die2 <==
+ 0000:80
+ ==> /sys/bus/event_source/devices/uncore_iio_0/die3 <==
+ 0000:c0
+
+ Which means::
+
+ IIO PMU 0 on die 0 belongs to PCI RP on bus 0x00, domain 0x0000
+ IIO PMU 0 on die 1 belongs to PCI RP on bus 0x40, domain 0x0000
+ IIO PMU 0 on die 2 belongs to PCI RP on bus 0x80, domain 0x0000
+ IIO PMU 0 on die 3 belongs to PCI RP on bus 0xc0, domain 0x0000
+
diff --git a/Documentation/ABI/testing/sysfs-devices-mapping
b/Documentation/ABI/testing/sysfs-devices-mapping
deleted file mode 100644
index 8d202ba..0000000
--- a/Documentation/ABI/testing/sysfs-devices-mapping
+++ /dev/null
@@ -1,34 +0,0 @@
-What: /sys/devices/uncore_iio_x/dieX
-Date: February 2020
-Contact: Roman Sudarikov <[email protected]>
-Description:
- Each IIO stack (PCIe root port) has its own IIO PMON
block, so
- each dieX file (where X is die number) holds
"Segment:Root Bus"
- for PCIe root port, which can be monitored by that IIO PMON
- block.
- For example, on 4-die Xeon platform with up to 6 IIO
stacks per
- die and, therefore, 6 IIO PMON blocks per die, the
mapping of
- IIO PMON block 0 exposes as the following::
-
- $ ls /sys/devices/uncore_iio_0/die*
- -r--r--r-- /sys/devices/uncore_iio_0/die0
- -r--r--r-- /sys/devices/uncore_iio_0/die1
- -r--r--r-- /sys/devices/uncore_iio_0/die2
- -r--r--r-- /sys/devices/uncore_iio_0/die3
-
- $ tail /sys/devices/uncore_iio_0/die*
- ==> /sys/devices/uncore_iio_0/die0 <==
- 0000:00
- ==> /sys/devices/uncore_iio_0/die1 <==
- 0000:40
- ==> /sys/devices/uncore_iio_0/die2 <==
- 0000:80
- ==> /sys/devices/uncore_iio_0/die3 <==
- 0000:c0
-
- Which means::
-
- IIO PMU 0 on die 0 belongs to PCI RP on bus 0x00, domain 0x0000
- IIO PMU 0 on die 1 belongs to PCI RP on bus 0x40, domain 0x0000
- IIO PMU 0 on die 2 belongs to PCI RP on bus 0x80, domain 0x0000
- IIO PMU 0 on die 3 belongs to PCI RP on bus 0xc0, domain 0x0000
--
2.7.4

Thanks,
Kan


2021-06-24 17:30:40

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> But this is NOT how busses work in the driver model.
>
> PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> a driver goes from being handled by "generic_pci_type_foo" to
> "vendor_foo". Userspace can handle the change and life goes on.

In perf this is exposed to the user command line, and lots of people
configure their own events, so it's very common in scripts. To use the
pmu you have to use something like perf stat -a -e uncore_foo/.../. So
it's not a single thing that could be patched up.

The perf tools supports PMUs abstractly and doesn't have special
handling for every PMU. Also the perf design was always that the kernel
should provide these abstractions with the user tool being (mostly)
generic and abstract.


> So a device name will move from "generic" to "specific", right?
>
> Why does a bus have to do with any of this?

The perf pmus are in /sys/devices, so the symlinks for compatibility
have to be created there.


> But a driver does not caer. And if perf does not care, who cares?


The users who write scripts that specify the perf events on the perf
command line.


-Andi


2021-06-24 17:38:01

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 01:07:14PM -0400, Liang, Kan wrote:
>
>
> On 6/24/2021 11:31 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
> > >
> > > On 6/24/2021 7:29 AM, Greg KH wrote:
> > > > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > > > But first off, why is this symlink suddenly needed? What is so special
> > > > > > about this new hardware that it breaks the existing model?
> > > > > The driver can be in two modes:
> > > > >
> > > > > - Driver fully knows the hardware and puts in the correct Linux names
> > > > >
> > > > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > > > looks at a discovery table. There we don't have the correct names, just an
> > > > > numeric identifier for the different hardware sub components.
> > > > Why does this matter? Why would the driver not "know" the hardware? If
> > > > it doesn't know it, why would it bind to it?
> > >
> > > It's a similar concept as a PCI class. How to have a driver that can handle
> > > future hardware, but with some restrictions
> >
> > But this is NOT how busses work in the driver model.
> >
> > PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> > a driver goes from being handled by "generic_pci_type_foo" to
> > "vendor_foo". Userspace can handle the change and life goes on.
> >
> > > The perf CPU PMU has had a similar concept for a long time. The driver can
> > > be either in architectural mode (with a subset of features), or be fully
> > > enabled. This allows users who are on an older kernel to still use at least
> > > a subset of the functionality.
> >
> > So a device name will move from "generic" to "specific", right?
>
> We'd like to keep both names.

That is not how sysfs and the driver model works, sorry. You don't get
to keep both names, otherwise sysfs would be even more of a mess than it
currently is. What happens if you need "another" name in the future?
When do you stop?

this isn't ok, please do it right.

greg k-h

2021-06-24 17:51:33

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> That is not how sysfs and the driver model works, sorry. You don't get
> to keep both names, otherwise sysfs would be even more of a mess than it
> currently is. What happens if you need "another" name in the future?
> When do you stop

I don't see any scenario where we would ever need more than a single
symlink.

I believe there is already precedent for this elsewhere.

>
> this isn't ok, please do it right.

I don't see what exactly are you proposing.

Are you proposing to break every perf script on a kernel update? Doesn't
seem acceptable to me.

Or move the compatibility into the perf tool? That would require the
users to both update the perf tool and the kernel. I suppose it would be
possible, but it would be totally against the standard perf abstraction
design. Besides there are other consumers of this than just the perf
tool, so it could possibly have a wide impact.

-Andi

2021-06-25 05:18:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 01:07:14PM -0400, Liang, Kan wrote:
>
>
> On 6/24/2021 11:31 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
> > >
> > > On 6/24/2021 7:29 AM, Greg KH wrote:
> > > > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > > > But first off, why is this symlink suddenly needed? What is so special
> > > > > > about this new hardware that it breaks the existing model?
> > > > > The driver can be in two modes:
> > > > >
> > > > > - Driver fully knows the hardware and puts in the correct Linux names
> > > > >
> > > > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > > > looks at a discovery table. There we don't have the correct names, just an
> > > > > numeric identifier for the different hardware sub components.
> > > > Why does this matter? Why would the driver not "know" the hardware? If
> > > > it doesn't know it, why would it bind to it?
> > >
> > > It's a similar concept as a PCI class. How to have a driver that can handle
> > > future hardware, but with some restrictions
> >
> > But this is NOT how busses work in the driver model.
> >
> > PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> > a driver goes from being handled by "generic_pci_type_foo" to
> > "vendor_foo". Userspace can handle the change and life goes on.
> >
> > > The perf CPU PMU has had a similar concept for a long time. The driver can
> > > be either in architectural mode (with a subset of features), or be fully
> > > enabled. This allows users who are on an older kernel to still use at least
> > > a subset of the functionality.
> >
> > So a device name will move from "generic" to "specific", right?
>
> We'd like to keep both names.

That is not how the driver model works, sorry.

Stick with the first name you got (i.e. the one in the kernel now). Do
not try to use symlinks for things that should not be symlinked.

If a device changes names, wonderful, deal with that in userspace, we do
so all the time in tools for busses because device names are never
guaranteed to be the same each boot. But bus names do not change for
the obvious reason that bus names are not dynamic, you pick them at
build time.

If you have userspace code that relies on device names to be static and
never change, then you need to stick with that as that was your choice
when you created that api.

thanks,

greg k-h

2021-06-25 05:20:28

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 10:49:42AM -0700, Andi Kleen wrote:
> Are you proposing to break every perf script on a kernel update? Doesn't
> seem acceptable to me.

I'm suggesting that you not submit kernel code that breaks the device
names of things that you previously stated would never change.

thanks,

greg k-h

2021-06-25 05:21:03

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
> > But a driver does not caer. And if perf does not care, who cares?
>
> The users who write scripts that specify the perf events on the perf command
> line.

Great, then as you have deemed the device name part of your documented
api, then keep it stable as that is what you decided to do a long time
ago when it was created.

thanks,

greg k-h

2021-06-25 14:24:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


On 6/24/2021 10:19 PM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
>>> But a driver does not caer. And if perf does not care, who cares?
>> The users who write scripts that specify the perf events on the perf command
>> line.
> Great, then as you have deemed the device name part of your documented
> api, then keep it stable as that is what you decided to do a long time
> ago when it was created.

The fully supported driver keeps it stable, but the driver in fallback
mode (as in running on a yet unknown system) can't because it doesn't
have enough information. It has to fall back to the numeric identifiers.

Supporting yet unknown systems is a big advantage, missing full kernel
support is the number one reason people can't use uncore monitoring today.

The symlink keeps some degree of compatibility between the two.

Anyways thinking about it if Greg doesn't want symlinks (even though
sysfs already has symlinks elsewhere), maybe we could just create two
devices without symlinks. Kan, do you think that would work?

-Andi

2021-06-25 14:39:58

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Fri, Jun 25, 2021 at 07:22:08AM -0700, Andi Kleen wrote:
>
> On 6/24/2021 10:19 PM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
> > > > But a driver does not caer. And if perf does not care, who cares?
> > > The users who write scripts that specify the perf events on the perf command
> > > line.
> > Great, then as you have deemed the device name part of your documented
> > api, then keep it stable as that is what you decided to do a long time
> > ago when it was created.
>
> The fully supported driver keeps it stable, but the driver in fallback mode
> (as in running on a yet unknown system) can't because it doesn't have enough
> information. It has to fall back to the numeric identifiers.
>
> Supporting yet unknown systems is a big advantage, missing full kernel
> support is the number one reason people can't use uncore monitoring today.
>
> The symlink keeps some degree of compatibility between the two.

But it creates something that is not needed at the moment, and then
userspace will rely on it. Don't make userspace rely on it today and
all should be fine.

Device names will change, that's always a given, as the kernel can never
always make them the same. That's why userspace needs to scan the bus
for all devices and then pick out the one that it wants to look at.
Don't hard-encode device names into userspace tools, that way lies
madness.

> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
> already has symlinks elsewhere), maybe we could just create two devices
> without symlinks. Kan, do you think that would work?

Do not have 2 different structures represent the same hardware device,
that too is a shortcut to madness.

What prevents userspace from handling device names changing today? Why
are you forcing userspace to pick a specific device name at all?

thanks,

greg k-h

2021-06-25 14:50:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> Device names will change, that's always a given, as the kernel can never
> always make them the same. That's why userspace needs to scan the bus
> for all devices and then pick out the one that it wants to look at.

In perf the tool doesn't normally know what devices (= pmu) the users
want to look at. It's all specified on the command line depending on
what events you want to measure. There's no way for the tool to figure
that out on its own.


> Don't hard-encode device names into userspace tools, that way lies
> madness.

There's no hard coding in the tools (or at least not for the non json
event list case). It all comes from the command line. But that is where
the problem comes from.

>
>> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
>> already has symlinks elsewhere), maybe we could just create two devices
>> without symlinks. Kan, do you think that would work?
> Do not have 2 different structures represent the same hardware device,
> that too is a shortcut to madness.
>
> What prevents userspace from handling device names changing today? Why
> are you forcing userspace to pick a specific device name at all?

The way the perf tool works is that you have to specify the names on the
command line:

perf stat -a -e uncore_cha/event=1/ ...

With the numeric identifiers it would be

perf stat -a -e uncore_type_X_Y/event=1/

The tool handles it all abstractly.

So yes the user tools itself can handle it. But the problem is that it
is directly exposed to the users, so the users would need to change all
their scripts when switching between the two cases. That is what we're
trying to avoid -- provide them a way that works on both.

-Andi


2021-06-25 15:07:35

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU



On 6/25/2021 10:49 AM, Andi Kleen wrote:
>
>> Device names will change, that's always a given, as the kernel can never
>> always make them the same.  That's why userspace needs to scan the bus
>> for all devices and then pick out the one that it wants to look at.
>
> In perf the tool doesn't normally know what devices (= pmu) the users
> want to look at. It's all specified on the command line depending on
> what events you want to measure. There's no way for the tool to figure
> that out on its own.
>
>
>> Don't hard-encode device names into userspace tools, that way lies
>> madness.
>
> There's no hard coding in the tools (or at least not for the non json
> event list case). It all comes from the command line. But that is where
> the problem comes from.
>
>>
>>> Anyways thinking about it if Greg doesn't want symlinks (even though
>>> sysfs
>>> already has symlinks elsewhere), maybe we could just create two devices
>>> without symlinks. Kan, do you think that would work?
>> Do not have 2 different structures represent the same hardware device,
>> that too is a shortcut to madness.
>>
>> What prevents userspace from handling device names changing today?  Why
>> are you forcing userspace to pick a specific device name at all?
>
> The way the perf tool works is that you have to specify the names on the
> command line:
>
> perf stat -a -e uncore_cha/event=1/ ...
>
> With the numeric identifiers it would be
>
> perf stat -a -e uncore_type_X_Y/event=1/
>
> The tool handles it all abstractly.
>
> So yes the user tools itself can handle it. But the problem is that it
> is directly exposed to the users, so the users would need to change all
> their scripts when switching between the two cases. That is what we're
> trying to avoid -- provide them a way that works on both.
>

We have an attribute "caps/pmu_name" for the core PMU. Maybe we should
add it for uncore PMU as well. For example,

$ cat /sys/devices/uncore_type_0_0/caps/pmu_name
cha_0

Userspace tool can get clues about what type_0_0 is.

Thanks,
Kan

2021-06-25 15:46:04

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> We have an attribute "caps/pmu_name" for the core PMU. Maybe we should
> add it for uncore PMU as well. For example,
>
> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
> cha_0
>
> Userspace tool can get clues about what type_0_0 is.

It would break all the old tools, but I suppose it could work for
updated tools.

It isn't only perf that is parsing this, but some other libraries too.
They all would need to be updated too.

I think I still prefer the symlink. It would just work and keep full
compatibility.

But yes maybe there is no choice.

-Andi


2021-06-25 15:59:33

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU



On 6/25/2021 11:44 AM, Andi Kleen wrote:
>
>> We have an attribute "caps/pmu_name" for the core PMU. Maybe we should
>> add it for uncore PMU as well. For example,
>>
>> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
>> cha_0
>>
>> Userspace tool can get clues about what type_0_0 is.
>
> It would break all the old tools, but I suppose it could work for
> updated tools.
>

Right, users have to update their perf tool to use the new name,
uncore_cha_0.

> It isn't only perf that is parsing this, but some other libraries too.
> They all would need to be updated too.
>

Yes, I will update the Documentation/ABI/ in V2 to indicate the changes.
If any other libraries are broken, they may get clues from the Document.

Thanks,
Kan

> I think I still prefer the symlink. It would just work and keep full
> compatibility.
>
> But yes maybe there is no choice.
>
> -Andi
>
>

2021-06-25 16:22:29

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU



On 6/25/2021 11:57 AM, Liang, Kan wrote:
>
>
> On 6/25/2021 11:44 AM, Andi Kleen wrote:
>>
>>> We have an attribute "caps/pmu_name" for the core PMU. Maybe we
>>> should add it for uncore PMU as well. For example,
>>>
>>> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
>>> cha_0
>>>
>>> Userspace tool can get clues about what type_0_0 is.
>>
>> It would break all the old tools, but I suppose it could work for
>> updated tools.
>>
>
> Right, users have to update their perf tool to use the new name,
> uncore_cha_0.

I think the above example is misleading. Let me rephrase.
Here is what I'm planing to do in V2.

With the V2 platform-specific patch, uncore driver will only create a
meaningful uncore name, e.g., uncore_cha_0.

An attribute "caps/pmu_name" is also created to indicate the previous
name. For example,

$ cat /sys/devices/uncore_cha_0/caps/pmu_name
type_0_0

If any users use the old numeric name, they have to update either their
script or a perf tool which supports "caps/pmu_name".

In the future, if the users already use a perf tool which supports
"caps/pmu_name", nothing needs to be updated. The old numeric name
should just work.

Thanks,
Kan

2021-06-27 11:03:50

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Fri, Jun 25, 2021 at 07:49:36AM -0700, Andi Kleen wrote:
>
> > Device names will change, that's always a given, as the kernel can never
> > always make them the same. That's why userspace needs to scan the bus
> > for all devices and then pick out the one that it wants to look at.
>
> In perf the tool doesn't normally know what devices (= pmu) the users want
> to look at. It's all specified on the command line depending on what events
> you want to measure. There's no way for the tool to figure that out on its
> own.
>
>
> > Don't hard-encode device names into userspace tools, that way lies
> > madness.
>
> There's no hard coding in the tools (or at least not for the non json event
> list case). It all comes from the command line. But that is where the
> problem comes from.

Then do not break things by renaming the device name, as you all have
now stated that this name is part of the user/kernel api.

But really, I do not see why this is an issue, why isn't userspace just
properly walking the list of devices and picking the one on this
specific system that they want to look at?

> > > Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
> > > already has symlinks elsewhere), maybe we could just create two devices
> > > without symlinks. Kan, do you think that would work?
> > Do not have 2 different structures represent the same hardware device,
> > that too is a shortcut to madness.
> >
> > What prevents userspace from handling device names changing today? Why
> > are you forcing userspace to pick a specific device name at all?
>
> The way the perf tool works is that you have to specify the names on the
> command line:
>
> perf stat -a -e uncore_cha/event=1/ ...
>
> With the numeric identifiers it would be
>
> perf stat -a -e uncore_type_X_Y/event=1/
>
> The tool handles it all abstractly.

Great, and that device name is something that is unique per machine.
And per boot. So why are you suddenly thinking that this name has to be
"stable"?

If you think it does have to be stable, that was your choice, so now you
must keep it stable. Don't try to mess with symlinks and the like
please, as again, that way lies madness and unmaintainability for the
next 20+ years.

> So yes the user tools itself can handle it. But the problem is that it is
> directly exposed to the users, so the users would need to change all their
> scripts when switching between the two cases. That is what we're trying to
> avoid -- provide them a way that works on both.

But these are different systems! Why would anyone expect that the
device name is the same on different systems? If you insist on keeping
the name identical for newer kernel versions, then again, that was your
choice and now you have to do that. Do not try to work around your own
requirement by using a symlink.

greg k-h

2021-06-27 16:32:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


> Then do not break things by renaming the device name, as you all have
> now stated that this name is part of the user/kernel api.

The renaming comes from the fallback mode on future systems. In the
fallback mode the driver doesn't know the true name, so it has to use 
the numeric name. If you don't use the fallback mode and have the full
driver then yes you'll get the same names as always (or at least as they
make sense for the hardware).

But we would like to have the fallback mode too to allow more people use
uncore monitoring, and that's where the need to for the second name
comes in.

>
> But really, I do not see why this is an issue, why isn't userspace just
> properly walking the list of devices and picking the one on this
> specific system that they want to look at?

perf is not an fully automated tool that knows what it wants to look at.
It's not like udev etc.

It's a tool to let the user specify what they want to measure on the
command line. And that specification is through the pmu name.

Of course it walks the list to find the name, but it can only chose
based on the name the user specified.

It's like the ftrace tool doesn't know what trace points to measure on
its own, it just knows what is specified on the command line. Or the ip
tool doesn't know on its own what network device names to use for some
network configuration, it has to get the information from the command line.


>
>>>> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
>>>> already has symlinks elsewhere), maybe we could just create two devices
>>>> without symlinks. Kan, do you think that would work?
>>> Do not have 2 different structures represent the same hardware device,
>>> that too is a shortcut to madness.
>>>
>>> What prevents userspace from handling device names changing today? Why
>>> are you forcing userspace to pick a specific device name at all?
>> The way the perf tool works is that you have to specify the names on the
>> command line:
>>
>> perf stat -a -e uncore_cha/event=1/ ...
>>
>> With the numeric identifiers it would be
>>
>> perf stat -a -e uncore_type_X_Y/event=1/
>>
>> The tool handles it all abstractly.
> Great, and that device name is something that is unique per machine.
> And per boot.

No it's not unique and per boot. It's always the same on a given
platform, it's specified by firmware. I would expect the names to be
stable over all systems of a given chip.


> So why are you suddenly thinking that this name has to be
> "stable"?
It's about as stable as the existing names. The existing names change
sometimes too when the hardware changes (for example before Skylake we
had "uncore_cbo", since Skylake there is "uncore_cha"). The numeric
identifier should have similar stability (doesn't change as long as the
hardware doesn't change significantly)


>
> If you think it does have to be stable, that was your choice, so now you
> must keep it stable. Don't try to mess with symlinks and the like
> please, as again, that way lies madness and unmaintainability for the
> next 20+ years.

We keep it as stable as possible, but in the fallback mode only the
numeric IDs are possible. In the "driver knows full hardware" mode it
keeps the existing names, as possible.


>
>> So yes the user tools itself can handle it. But the problem is that it is
>> directly exposed to the users, so the users would need to change all their
>> scripts when switching between the two cases. That is what we're trying to
>> avoid -- provide them a way that works on both.
> But these are different systems! Why would anyone expect that the
> device name is the same on different systems?

The scenario is that you run the same system but with two different kernels.

Kernel 1 doesn't know the model number and can only operate in fallback
mode:

It only shows numeric IDs and that's what you have to use

Kernel 2 knows the model number and has a full driver which supports the
full Linux standard naming.

You can use the standard names (like uncore_cha)

But the problem is that it would be impossible to write a script that
works on both Kernel 1 and Kernel 2 on the same system without the
symlink or equivalent.


> If you insist on keeping
> the name identical for newer kernel versions, then again, that was your
> choice and now you have to do that. Do not try to work around your own
> requirement by using a symlink.

Are you suggesting to only use numerical names everywhere?

That would be a big change for existing users. The idea was that people
who use the fully enabled driver can use the much nicer symbolic names.
But people who care about writing scripts that work everywhere can use
the more difficult to use numeric names.

Anyways it looks like we're setting on using the "name" method suggested
by Kan. I must say I'm not a big fan of it though, it's just another
incompatible break in the perf ABI that would be totally avoidable.

-Andi

2021-06-28 06:59:26

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU

On Sun, Jun 27, 2021 at 09:30:53AM -0700, Andi Kleen wrote:
>
> > Then do not break things by renaming the device name, as you all have
> > now stated that this name is part of the user/kernel api.
>
> The renaming comes from the fallback mode on future systems. In the fallback
> mode the driver doesn't know the true name, so it has to use? the numeric
> name. If you don't use the fallback mode and have the full driver then yes
> you'll get the same names as always (or at least as they make sense for the
> hardware).
>
> But we would like to have the fallback mode too to allow more people use
> uncore monitoring, and that's where the need to for the second name comes
> in.

So then just always use the "fallback" name if that is going to be the
name you have for this hardware device. Why would you want it to be
renamed later on to a "fancier" name if there is only going to be
one-per-chipset-type anyway?

Naming is hard, make it simple and do not change it if your userspace
tools are not going to be able to handle the issue. Do NOT paper over
your naming scheme with symlinks from the very beginning, as that shows
the naming scheme is flawed.

thanks,

greg k-h

2021-06-28 15:36:42

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 2/7] perf: Create a symlink for a PMU


On 6/27/2021 11:55 PM, Greg KH wrote:
> On Sun, Jun 27, 2021 at 09:30:53AM -0700, Andi Kleen wrote:
>>> Then do not break things by renaming the device name, as you all have
>>> now stated that this name is part of the user/kernel api.
>> The renaming comes from the fallback mode on future systems. In the fallback
>> mode the driver doesn't know the true name, so it has to use  the numeric
>> name. If you don't use the fallback mode and have the full driver then yes
>> you'll get the same names as always (or at least as they make sense for the
>> hardware).
>>
>> But we would like to have the fallback mode too to allow more people use
>> uncore monitoring, and that's where the need to for the second name comes
>> in.
> So then just always use the "fallback" name if that is going to be the
> name you have for this hardware device. Why would you want it to be
> renamed later on to a "fancier" name if there is only going to be
> one-per-chipset-type anyway?

It's an ugly numeric name, difficult to use

perf stat -e uncore_0_2//

instead of

perf stat -e uncore_cha//

It wouldn't exactly be an improvement for the full driver.

-Andi