This patch series provides access to various counters on the ThunderX SoC.
For details of the uncore implementation see patch #1.
Patches #2-5 add the various ThunderX specific PMUs.
As suggested, I've put the files under drivers/perf/uncore. I would
prefer this location over drivers/bus because not all of the uncore
drivers are bus related.
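To give an idea of the intended usage, here is a counting sketch (PMU
and event names as introduced by the later patches; selecting a node
requires NUMA support):
  # sum L2 TAD misses over all TAD units of node 0
  perf stat -a -e thunder_l2c_tad/l2t_miss,node=0/ -- sleep 1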
Changes since v1:
- Added NUMA support
- Fixed CPU hotplug by pmu migration
- Moved files to drivers/perf/uncore
- Removed OCX FRC and LNE drivers; these will fit better into an EDAC driver
- Improved comments about overflow interrupts
- Removed the maximum device limit
- Trimmed include files
Feedback welcome!
Jan
-------------------------------------------------
Jan Glauber (5):
arm64/perf: Basic uncore counter support for Cavium ThunderX
arm64/perf: Cavium ThunderX L2C TAD uncore support
arm64/perf: Cavium ThunderX L2C CBC uncore support
arm64/perf: Cavium ThunderX LMC uncore support
arm64/perf: Cavium ThunderX OCX TLK uncore support
drivers/perf/Makefile | 1 +
drivers/perf/uncore/Makefile | 5 +
drivers/perf/uncore/uncore_cavium.c | 314 +++++++++++++++
drivers/perf/uncore/uncore_cavium.h | 95 +++++
drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
drivers/perf/uncore/uncore_cavium_lmc.c | 196 +++++++++
drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
8 files changed, 1828 insertions(+)
create mode 100644 drivers/perf/uncore/Makefile
create mode 100644 drivers/perf/uncore/uncore_cavium.c
create mode 100644 drivers/perf/uncore/uncore_cavium.h
create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
--
1.9.1
Support the counters of the L2 cache tag-and-data (TAD) units.
Also support the counters added or modified in pass2 by checking the MIDR.
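For reference, the pass2 detection happens once in the common code
(patch #1) and boils down to reading the MIDR variant field:
	variant = MIDR_VARIANT(read_cpuid_id());
	if (variant == 1)
		thunder_uncore_version = 1;	/* pass2 */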
Signed-off-by: Jan Glauber <[email protected]>
---
drivers/perf/uncore/Makefile | 3 +-
drivers/perf/uncore/uncore_cavium.c | 6 +-
drivers/perf/uncore/uncore_cavium.h | 7 +-
drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
4 files changed, 613 insertions(+), 3 deletions(-)
create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index b9c72c2..6a16caf 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
+obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o \
+ uncore_cavium_l2c_tad.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index 4fd5e45..b92b2ae 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -15,7 +15,10 @@ int thunder_uncore_version;
struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
{
- return NULL;
+ if (event->pmu->type == thunder_l2c_tad_pmu.type)
+ return thunder_uncore_l2c_tad;
+ else
+ return NULL;
}
void thunder_uncore_read(struct perf_event *event)
@@ -296,6 +299,7 @@ static int __init thunder_uncore_init(void)
thunder_uncore_version = 1;
pr_info("PMU version: %d\n", thunder_uncore_version);
+ thunder_uncore_l2c_tad_setup();
return 0;
}
late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index c799709..7a9c367 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -7,7 +7,7 @@
#define pr_fmt(fmt) "thunderx_uncore: " fmt
enum uncore_type {
- NOP_TYPE,
+ L2C_TAD_TYPE,
};
extern int thunder_uncore_version;
@@ -65,6 +65,9 @@ static inline struct thunder_uncore_node *get_node(u64 config,
extern struct attribute_group thunder_uncore_attr_group;
extern struct device_attribute format_attr_node;
+extern struct thunder_uncore *thunder_uncore_l2c_tad;
+extern struct pmu thunder_l2c_tad_pmu;
+
/* Prototypes */
struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
void thunder_uncore_del(struct perf_event *event, int flags);
@@ -76,3 +79,5 @@ int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
ssize_t thunder_events_sysfs_show(struct device *dev,
struct device_attribute *attr,
char *page);
+
+int thunder_uncore_l2c_tad_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_l2c_tad.c b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
new file mode 100644
index 0000000..c8dc305
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
@@ -0,0 +1,600 @@
+/*
+ * Cavium Thunder uncore PMU support, L2C TAD counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_L2C_TAD
+#define PCI_DEVICE_ID_THUNDER_L2C_TAD 0xa02e
+#endif
+
+#define L2C_TAD_NR_COUNTERS 4
+#define L2C_TAD_CONTROL_OFFSET 0x10000
+#define L2C_TAD_COUNTER_OFFSET 0x100
+
+/* L2C TAD event list */
+#define L2C_TAD_EVENTS_DISABLED 0x00
+
+#define L2C_TAD_EVENT_L2T_HIT 0x01
+#define L2C_TAD_EVENT_L2T_MISS 0x02
+#define L2C_TAD_EVENT_L2T_NOALLOC 0x03
+#define L2C_TAD_EVENT_L2_VIC 0x04
+#define L2C_TAD_EVENT_SC_FAIL 0x05
+#define L2C_TAD_EVENT_SC_PASS 0x06
+#define L2C_TAD_EVENT_LFB_OCC 0x07
+#define L2C_TAD_EVENT_WAIT_LFB 0x08
+#define L2C_TAD_EVENT_WAIT_VAB 0x09
+
+#define L2C_TAD_EVENT_RTG_HIT 0x41
+#define L2C_TAD_EVENT_RTG_MISS 0x42
+#define L2C_TAD_EVENT_L2_RTG_VIC 0x44
+#define L2C_TAD_EVENT_L2_OPEN_OCI 0x48
+
+#define L2C_TAD_EVENT_QD0_IDX 0x80
+#define L2C_TAD_EVENT_QD0_RDAT 0x81
+#define L2C_TAD_EVENT_QD0_BNKS 0x82
+#define L2C_TAD_EVENT_QD0_WDAT 0x83
+
+#define L2C_TAD_EVENT_QD1_IDX 0x90
+#define L2C_TAD_EVENT_QD1_RDAT 0x91
+#define L2C_TAD_EVENT_QD1_BNKS 0x92
+#define L2C_TAD_EVENT_QD1_WDAT 0x93
+
+#define L2C_TAD_EVENT_QD2_IDX 0xa0
+#define L2C_TAD_EVENT_QD2_RDAT 0xa1
+#define L2C_TAD_EVENT_QD2_BNKS 0xa2
+#define L2C_TAD_EVENT_QD2_WDAT 0xa3
+
+#define L2C_TAD_EVENT_QD3_IDX 0xb0
+#define L2C_TAD_EVENT_QD3_RDAT 0xb1
+#define L2C_TAD_EVENT_QD3_BNKS 0xb2
+#define L2C_TAD_EVENT_QD3_WDAT 0xb3
+
+#define L2C_TAD_EVENT_QD4_IDX 0xc0
+#define L2C_TAD_EVENT_QD4_RDAT 0xc1
+#define L2C_TAD_EVENT_QD4_BNKS 0xc2
+#define L2C_TAD_EVENT_QD4_WDAT 0xc3
+
+#define L2C_TAD_EVENT_QD5_IDX 0xd0
+#define L2C_TAD_EVENT_QD5_RDAT 0xd1
+#define L2C_TAD_EVENT_QD5_BNKS 0xd2
+#define L2C_TAD_EVENT_QD5_WDAT 0xd3
+
+#define L2C_TAD_EVENT_QD6_IDX 0xe0
+#define L2C_TAD_EVENT_QD6_RDAT 0xe1
+#define L2C_TAD_EVENT_QD6_BNKS 0xe2
+#define L2C_TAD_EVENT_QD6_WDAT 0xe3
+
+#define L2C_TAD_EVENT_QD7_IDX 0xf0
+#define L2C_TAD_EVENT_QD7_RDAT 0xf1
+#define L2C_TAD_EVENT_QD7_BNKS 0xf2
+#define L2C_TAD_EVENT_QD7_WDAT 0xf3
+
+/* pass2 added/changed event list */
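+/*
+ * Note: event codes 0x41, 0x42 and 0x44 are reused; they encode the
+ * RTG_* events on pass1 but the LOOKUP_* events on pass2.
+ */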
+#define L2C_TAD_EVENT_OPEN_CCPI 0x0a
+#define L2C_TAD_EVENT_LOOKUP 0x40
+#define L2C_TAD_EVENT_LOOKUP_XMC_LCL 0x41
+#define L2C_TAD_EVENT_LOOKUP_XMC_RMT 0x42
+#define L2C_TAD_EVENT_LOOKUP_MIB 0x43
+#define L2C_TAD_EVENT_LOOKUP_ALL 0x44
+#define L2C_TAD_EVENT_TAG_ALC_HIT 0x48
+#define L2C_TAD_EVENT_TAG_ALC_MISS 0x49
+#define L2C_TAD_EVENT_TAG_ALC_NALC 0x4a
+#define L2C_TAD_EVENT_TAG_NALC_HIT 0x4b
+#define L2C_TAD_EVENT_TAG_NALC_MISS 0x4c
+#define L2C_TAD_EVENT_LMC_WR 0x4e
+#define L2C_TAD_EVENT_LMC_SBLKDTY 0x4f
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HIT 0x50
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HITE 0x51
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HITS 0x52
+#define L2C_TAD_EVENT_TAG_ALC_RTG_MISS 0x53
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HIT 0x54
+#define L2C_TAD_EVENT_TAG_NALC_RTG_MISS 0x55
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HITE 0x56
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HITS 0x57
+#define L2C_TAD_EVENT_TAG_ALC_LCL_EVICT 0x58
+#define L2C_TAD_EVENT_TAG_ALC_LCL_CLNVIC 0x59
+#define L2C_TAD_EVENT_TAG_ALC_LCL_DTYVIC 0x5a
+#define L2C_TAD_EVENT_TAG_ALC_RMT_EVICT 0x5b
+#define L2C_TAD_EVENT_TAG_ALC_RMT_VIC 0x5c
+#define L2C_TAD_EVENT_RTG_ALC 0x5d
+#define L2C_TAD_EVENT_RTG_ALC_HIT 0x5e
+#define L2C_TAD_EVENT_RTG_ALC_HITWB 0x5f
+#define L2C_TAD_EVENT_STC_TOTAL 0x60
+#define L2C_TAD_EVENT_STC_TOTAL_FAIL 0x61
+#define L2C_TAD_EVENT_STC_RMT 0x62
+#define L2C_TAD_EVENT_STC_RMT_FAIL 0x63
+#define L2C_TAD_EVENT_STC_LCL 0x64
+#define L2C_TAD_EVENT_STC_LCL_FAIL 0x65
+#define L2C_TAD_EVENT_OCI_RTG_WAIT 0x68
+#define L2C_TAD_EVENT_OCI_FWD_CYC_HIT 0x69
+#define L2C_TAD_EVENT_OCI_FWD_RACE 0x6a
+#define L2C_TAD_EVENT_OCI_HAKS 0x6b
+#define L2C_TAD_EVENT_OCI_FLDX_TAG_E_NODAT 0x6c
+#define L2C_TAD_EVENT_OCI_FLDX_TAG_E_DAT 0x6d
+#define L2C_TAD_EVENT_OCI_RLDD 0x6e
+#define L2C_TAD_EVENT_OCI_RLDD_PEMD 0x6f
+#define L2C_TAD_EVENT_OCI_RRQ_DAT_CNT 0x70
+#define L2C_TAD_EVENT_OCI_RRQ_DAT_DMASK 0x71
+#define L2C_TAD_EVENT_OCI_RSP_DAT_CNT 0x72
+#define L2C_TAD_EVENT_OCI_RSP_DAT_DMASK 0x73
+#define L2C_TAD_EVENT_OCI_RSP_DAT_VICD_CNT 0x74
+#define L2C_TAD_EVENT_OCI_RSP_DAT_VICD_DMASK 0x75
+#define L2C_TAD_EVENT_OCI_RTG_ALC_EVICT 0x76
+#define L2C_TAD_EVENT_OCI_RTG_ALC_VIC 0x77
+
+struct thunder_uncore *thunder_uncore_l2c_tad;
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ struct thunder_uncore_unit *unit;
+ u64 prev;
+ int id;
+
+ node = get_node(hwc->config, uncore);
+ id = get_id(hwc->config);
+
+ /* restore counter value divided by units into all counters */
+ if (flags & PERF_EF_RELOAD) {
+ prev = local64_read(&hwc->prev_count);
+ prev = prev / node->nr_units;
+
+ list_for_each_entry(unit, &node->unit_list, entry)
+ writeq(prev, hwc->event_base + unit->map);
+ }
+
+ hwc->state = 0;
+
+ /* write byte in control registers for all units on the node */
+ list_for_each_entry(unit, &node->unit_list, entry)
+ writeb(id, hwc->config_base + unit->map);
+
+ perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ struct thunder_uncore_unit *unit;
+
+ /* reset selection value for all units on the node */
+ node = get_node(hwc->config, uncore);
+
+ list_for_each_entry(unit, &node->unit_list, entry)
+ writeb(L2C_TAD_EVENTS_DISABLED, hwc->config_base + unit->map);
+ hwc->state |= PERF_HES_STOPPED;
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ thunder_uncore_read(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int i;
+
+ WARN_ON_ONCE(!uncore);
+ node = get_node(hwc->config, uncore);
+
+ /* are we already assigned? */
+ if (hwc->idx != -1 && node->events[hwc->idx] == event)
+ goto out;
+
+ for (i = 0; i < node->num_counters; i++) {
+ if (node->events[i] == event) {
+ hwc->idx = i;
+ goto out;
+ }
+ }
+
+ /* if not take the first available counter */
+ hwc->idx = -1;
+ for (i = 0; i < node->num_counters; i++) {
+ if (cmpxchg(&node->events[i], NULL, event) == NULL) {
+ hwc->idx = i;
+ break;
+ }
+ }
+out:
+ if (hwc->idx == -1)
+ return -EBUSY;
+
+ hwc->config_base = hwc->idx;
+ hwc->event_base = L2C_TAD_COUNTER_OFFSET +
+ hwc->idx * sizeof(unsigned long long);
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+ if (flags & PERF_EF_START)
+ thunder_uncore_start(event, PERF_EF_RELOAD);
+ return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+
+static struct attribute *thunder_l2c_tad_format_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_node.attr,
+ NULL,
+};
+
+static struct attribute_group thunder_l2c_tad_format_group = {
+ .name = "format",
+ .attrs = thunder_l2c_tad_format_attr,
+};
+
+EVENT_ATTR(l2t_hit, L2C_TAD_EVENT_L2T_HIT);
+EVENT_ATTR(l2t_miss, L2C_TAD_EVENT_L2T_MISS);
+EVENT_ATTR(l2t_noalloc, L2C_TAD_EVENT_L2T_NOALLOC);
+EVENT_ATTR(l2_vic, L2C_TAD_EVENT_L2_VIC);
+EVENT_ATTR(sc_fail, L2C_TAD_EVENT_SC_FAIL);
+EVENT_ATTR(sc_pass, L2C_TAD_EVENT_SC_PASS);
+EVENT_ATTR(lfb_occ, L2C_TAD_EVENT_LFB_OCC);
+EVENT_ATTR(wait_lfb, L2C_TAD_EVENT_WAIT_LFB);
+EVENT_ATTR(wait_vab, L2C_TAD_EVENT_WAIT_VAB);
+EVENT_ATTR(rtg_hit, L2C_TAD_EVENT_RTG_HIT);
+EVENT_ATTR(rtg_miss, L2C_TAD_EVENT_RTG_MISS);
+EVENT_ATTR(l2_rtg_vic, L2C_TAD_EVENT_L2_RTG_VIC);
+EVENT_ATTR(l2_open_oci, L2C_TAD_EVENT_L2_OPEN_OCI);
+
+EVENT_ATTR(qd0_idx, L2C_TAD_EVENT_QD0_IDX);
+EVENT_ATTR(qd0_rdat, L2C_TAD_EVENT_QD0_RDAT);
+EVENT_ATTR(qd0_bnks, L2C_TAD_EVENT_QD0_BNKS);
+EVENT_ATTR(qd0_wdat, L2C_TAD_EVENT_QD0_WDAT);
+
+EVENT_ATTR(qd1_idx, L2C_TAD_EVENT_QD1_IDX);
+EVENT_ATTR(qd1_rdat, L2C_TAD_EVENT_QD1_RDAT);
+EVENT_ATTR(qd1_bnks, L2C_TAD_EVENT_QD1_BNKS);
+EVENT_ATTR(qd1_wdat, L2C_TAD_EVENT_QD1_WDAT);
+
+EVENT_ATTR(qd2_idx, L2C_TAD_EVENT_QD2_IDX);
+EVENT_ATTR(qd2_rdat, L2C_TAD_EVENT_QD2_RDAT);
+EVENT_ATTR(qd2_bnks, L2C_TAD_EVENT_QD2_BNKS);
+EVENT_ATTR(qd2_wdat, L2C_TAD_EVENT_QD2_WDAT);
+
+EVENT_ATTR(qd3_idx, L2C_TAD_EVENT_QD3_IDX);
+EVENT_ATTR(qd3_rdat, L2C_TAD_EVENT_QD3_RDAT);
+EVENT_ATTR(qd3_bnks, L2C_TAD_EVENT_QD3_BNKS);
+EVENT_ATTR(qd3_wdat, L2C_TAD_EVENT_QD3_WDAT);
+
+EVENT_ATTR(qd4_idx, L2C_TAD_EVENT_QD4_IDX);
+EVENT_ATTR(qd4_rdat, L2C_TAD_EVENT_QD4_RDAT);
+EVENT_ATTR(qd4_bnks, L2C_TAD_EVENT_QD4_BNKS);
+EVENT_ATTR(qd4_wdat, L2C_TAD_EVENT_QD4_WDAT);
+
+EVENT_ATTR(qd5_idx, L2C_TAD_EVENT_QD5_IDX);
+EVENT_ATTR(qd5_rdat, L2C_TAD_EVENT_QD5_RDAT);
+EVENT_ATTR(qd5_bnks, L2C_TAD_EVENT_QD5_BNKS);
+EVENT_ATTR(qd5_wdat, L2C_TAD_EVENT_QD5_WDAT);
+
+EVENT_ATTR(qd6_idx, L2C_TAD_EVENT_QD6_IDX);
+EVENT_ATTR(qd6_rdat, L2C_TAD_EVENT_QD6_RDAT);
+EVENT_ATTR(qd6_bnks, L2C_TAD_EVENT_QD6_BNKS);
+EVENT_ATTR(qd6_wdat, L2C_TAD_EVENT_QD6_WDAT);
+
+EVENT_ATTR(qd7_idx, L2C_TAD_EVENT_QD7_IDX);
+EVENT_ATTR(qd7_rdat, L2C_TAD_EVENT_QD7_RDAT);
+EVENT_ATTR(qd7_bnks, L2C_TAD_EVENT_QD7_BNKS);
+EVENT_ATTR(qd7_wdat, L2C_TAD_EVENT_QD7_WDAT);
+
+static struct attribute *thunder_l2c_tad_events_attr[] = {
+ EVENT_PTR(l2t_hit),
+ EVENT_PTR(l2t_miss),
+ EVENT_PTR(l2t_noalloc),
+ EVENT_PTR(l2_vic),
+ EVENT_PTR(sc_fail),
+ EVENT_PTR(sc_pass),
+ EVENT_PTR(lfb_occ),
+ EVENT_PTR(wait_lfb),
+ EVENT_PTR(wait_vab),
+ EVENT_PTR(rtg_hit),
+ EVENT_PTR(rtg_miss),
+ EVENT_PTR(l2_rtg_vic),
+ EVENT_PTR(l2_open_oci),
+
+ EVENT_PTR(qd0_idx),
+ EVENT_PTR(qd0_rdat),
+ EVENT_PTR(qd0_bnks),
+ EVENT_PTR(qd0_wdat),
+
+ EVENT_PTR(qd1_idx),
+ EVENT_PTR(qd1_rdat),
+ EVENT_PTR(qd1_bnks),
+ EVENT_PTR(qd1_wdat),
+
+ EVENT_PTR(qd2_idx),
+ EVENT_PTR(qd2_rdat),
+ EVENT_PTR(qd2_bnks),
+ EVENT_PTR(qd2_wdat),
+
+ EVENT_PTR(qd3_idx),
+ EVENT_PTR(qd3_rdat),
+ EVENT_PTR(qd3_bnks),
+ EVENT_PTR(qd3_wdat),
+
+ EVENT_PTR(qd4_idx),
+ EVENT_PTR(qd4_rdat),
+ EVENT_PTR(qd4_bnks),
+ EVENT_PTR(qd4_wdat),
+
+ EVENT_PTR(qd5_idx),
+ EVENT_PTR(qd5_rdat),
+ EVENT_PTR(qd5_bnks),
+ EVENT_PTR(qd5_wdat),
+
+ EVENT_PTR(qd6_idx),
+ EVENT_PTR(qd6_rdat),
+ EVENT_PTR(qd6_bnks),
+ EVENT_PTR(qd6_wdat),
+
+ EVENT_PTR(qd7_idx),
+ EVENT_PTR(qd7_rdat),
+ EVENT_PTR(qd7_bnks),
+ EVENT_PTR(qd7_wdat),
+ NULL,
+};
+
+/* pass2 added/changed events */
+EVENT_ATTR(open_ccpi, L2C_TAD_EVENT_OPEN_CCPI);
+EVENT_ATTR(lookup, L2C_TAD_EVENT_LOOKUP);
+EVENT_ATTR(lookup_xmc_lcl, L2C_TAD_EVENT_LOOKUP_XMC_LCL);
+EVENT_ATTR(lookup_xmc_rmt, L2C_TAD_EVENT_LOOKUP_XMC_RMT);
+EVENT_ATTR(lookup_mib, L2C_TAD_EVENT_LOOKUP_MIB);
+EVENT_ATTR(lookup_all, L2C_TAD_EVENT_LOOKUP_ALL);
+
+EVENT_ATTR(tag_alc_hit, L2C_TAD_EVENT_TAG_ALC_HIT);
+EVENT_ATTR(tag_alc_miss, L2C_TAD_EVENT_TAG_ALC_MISS);
+EVENT_ATTR(tag_alc_nalc, L2C_TAD_EVENT_TAG_ALC_NALC);
+EVENT_ATTR(tag_nalc_hit, L2C_TAD_EVENT_TAG_NALC_HIT);
+EVENT_ATTR(tag_nalc_miss, L2C_TAD_EVENT_TAG_NALC_MISS);
+
+EVENT_ATTR(lmc_wr, L2C_TAD_EVENT_LMC_WR);
+EVENT_ATTR(lmc_sblkdty, L2C_TAD_EVENT_LMC_SBLKDTY);
+
+EVENT_ATTR(tag_alc_rtg_hit, L2C_TAD_EVENT_TAG_ALC_RTG_HIT);
+EVENT_ATTR(tag_alc_rtg_hite, L2C_TAD_EVENT_TAG_ALC_RTG_HITE);
+EVENT_ATTR(tag_alc_rtg_hits, L2C_TAD_EVENT_TAG_ALC_RTG_HITS);
+EVENT_ATTR(tag_alc_rtg_miss, L2C_TAD_EVENT_TAG_ALC_RTG_MISS);
+EVENT_ATTR(tag_nalc_rtg_hit, L2C_TAD_EVENT_TAG_NALC_RTG_HIT);
+EVENT_ATTR(tag_nalc_rtg_miss, L2C_TAD_EVENT_TAG_NALC_RTG_MISS);
+EVENT_ATTR(tag_nalc_rtg_hite, L2C_TAD_EVENT_TAG_NALC_RTG_HITE);
+EVENT_ATTR(tag_nalc_rtg_hits, L2C_TAD_EVENT_TAG_NALC_RTG_HITS);
+EVENT_ATTR(tag_alc_lcl_evict, L2C_TAD_EVENT_TAG_ALC_LCL_EVICT);
+EVENT_ATTR(tag_alc_lcl_clnvic, L2C_TAD_EVENT_TAG_ALC_LCL_CLNVIC);
+EVENT_ATTR(tag_alc_lcl_dtyvic, L2C_TAD_EVENT_TAG_ALC_LCL_DTYVIC);
+EVENT_ATTR(tag_alc_rmt_evict, L2C_TAD_EVENT_TAG_ALC_RMT_EVICT);
+EVENT_ATTR(tag_alc_rmt_vic, L2C_TAD_EVENT_TAG_ALC_RMT_VIC);
+
+EVENT_ATTR(rtg_alc, L2C_TAD_EVENT_RTG_ALC);
+EVENT_ATTR(rtg_alc_hit, L2C_TAD_EVENT_RTG_ALC_HIT);
+EVENT_ATTR(rtg_alc_hitwb, L2C_TAD_EVENT_RTG_ALC_HITWB);
+
+EVENT_ATTR(stc_total, L2C_TAD_EVENT_STC_TOTAL);
+EVENT_ATTR(stc_total_fail, L2C_TAD_EVENT_STC_TOTAL_FAIL);
+EVENT_ATTR(stc_rmt, L2C_TAD_EVENT_STC_RMT);
+EVENT_ATTR(stc_rmt_fail, L2C_TAD_EVENT_STC_RMT_FAIL);
+EVENT_ATTR(stc_lcl, L2C_TAD_EVENT_STC_LCL);
+EVENT_ATTR(stc_lcl_fail, L2C_TAD_EVENT_STC_LCL_FAIL);
+
+EVENT_ATTR(oci_rtg_wait, L2C_TAD_EVENT_OCI_RTG_WAIT);
+EVENT_ATTR(oci_fwd_cyc_hit, L2C_TAD_EVENT_OCI_FWD_CYC_HIT);
+EVENT_ATTR(oci_fwd_race, L2C_TAD_EVENT_OCI_FWD_RACE);
+EVENT_ATTR(oci_haks, L2C_TAD_EVENT_OCI_HAKS);
+EVENT_ATTR(oci_fldx_tag_e_nodat, L2C_TAD_EVENT_OCI_FLDX_TAG_E_NODAT);
+EVENT_ATTR(oci_fldx_tag_e_dat, L2C_TAD_EVENT_OCI_FLDX_TAG_E_DAT);
+EVENT_ATTR(oci_rldd, L2C_TAD_EVENT_OCI_RLDD);
+EVENT_ATTR(oci_rldd_pemd, L2C_TAD_EVENT_OCI_RLDD_PEMD);
+EVENT_ATTR(oci_rrq_dat_cnt, L2C_TAD_EVENT_OCI_RRQ_DAT_CNT);
+EVENT_ATTR(oci_rrq_dat_dmask, L2C_TAD_EVENT_OCI_RRQ_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_cnt, L2C_TAD_EVENT_OCI_RSP_DAT_CNT);
+EVENT_ATTR(oci_rsp_dat_dmask, L2C_TAD_EVENT_OCI_RSP_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_vicd_cnt, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_CNT);
+EVENT_ATTR(oci_rsp_dat_vicd_dmask, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_DMASK);
+EVENT_ATTR(oci_rtg_alc_evict, L2C_TAD_EVENT_OCI_RTG_ALC_EVICT);
+EVENT_ATTR(oci_rtg_alc_vic, L2C_TAD_EVENT_OCI_RTG_ALC_VIC);
+
+static struct attribute *thunder_l2c_tad_pass2_events_attr[] = {
+ EVENT_PTR(l2t_hit),
+ EVENT_PTR(l2t_miss),
+ EVENT_PTR(l2t_noalloc),
+ EVENT_PTR(l2_vic),
+ EVENT_PTR(sc_fail),
+ EVENT_PTR(sc_pass),
+ EVENT_PTR(lfb_occ),
+ EVENT_PTR(wait_lfb),
+ EVENT_PTR(wait_vab),
+ EVENT_PTR(open_ccpi),
+
+ EVENT_PTR(lookup),
+ EVENT_PTR(lookup_xmc_lcl),
+ EVENT_PTR(lookup_xmc_rmt),
+ EVENT_PTR(lookup_mib),
+ EVENT_PTR(lookup_all),
+
+ EVENT_PTR(tag_alc_hit),
+ EVENT_PTR(tag_alc_miss),
+ EVENT_PTR(tag_alc_nalc),
+ EVENT_PTR(tag_nalc_hit),
+ EVENT_PTR(tag_nalc_miss),
+
+ EVENT_PTR(lmc_wr),
+ EVENT_PTR(lmc_sblkdty),
+
+ EVENT_PTR(tag_alc_rtg_hit),
+ EVENT_PTR(tag_alc_rtg_hite),
+ EVENT_PTR(tag_alc_rtg_hits),
+ EVENT_PTR(tag_alc_rtg_miss),
+ EVENT_PTR(tag_nalc_rtg_hit),
+ EVENT_PTR(tag_nalc_rtg_miss),
+ EVENT_PTR(tag_nalc_rtg_hite),
+ EVENT_PTR(tag_nalc_rtg_hits),
+ EVENT_PTR(tag_alc_lcl_evict),
+ EVENT_PTR(tag_alc_lcl_clnvic),
+ EVENT_PTR(tag_alc_lcl_dtyvic),
+ EVENT_PTR(tag_alc_rmt_evict),
+ EVENT_PTR(tag_alc_rmt_vic),
+
+ EVENT_PTR(rtg_alc),
+ EVENT_PTR(rtg_alc_hit),
+ EVENT_PTR(rtg_alc_hitwb),
+
+ EVENT_PTR(stc_total),
+ EVENT_PTR(stc_total_fail),
+ EVENT_PTR(stc_rmt),
+ EVENT_PTR(stc_rmt_fail),
+ EVENT_PTR(stc_lcl),
+ EVENT_PTR(stc_lcl_fail),
+
+ EVENT_PTR(oci_rtg_wait),
+ EVENT_PTR(oci_fwd_cyc_hit),
+ EVENT_PTR(oci_fwd_race),
+ EVENT_PTR(oci_haks),
+ EVENT_PTR(oci_fldx_tag_e_nodat),
+ EVENT_PTR(oci_fldx_tag_e_dat),
+ EVENT_PTR(oci_rldd),
+ EVENT_PTR(oci_rldd_pemd),
+ EVENT_PTR(oci_rrq_dat_cnt),
+ EVENT_PTR(oci_rrq_dat_dmask),
+ EVENT_PTR(oci_rsp_dat_cnt),
+ EVENT_PTR(oci_rsp_dat_dmask),
+ EVENT_PTR(oci_rsp_dat_vicd_cnt),
+ EVENT_PTR(oci_rsp_dat_vicd_dmask),
+ EVENT_PTR(oci_rtg_alc_evict),
+ EVENT_PTR(oci_rtg_alc_vic),
+
+ EVENT_PTR(qd0_idx),
+ EVENT_PTR(qd0_rdat),
+ EVENT_PTR(qd0_bnks),
+ EVENT_PTR(qd0_wdat),
+
+ EVENT_PTR(qd1_idx),
+ EVENT_PTR(qd1_rdat),
+ EVENT_PTR(qd1_bnks),
+ EVENT_PTR(qd1_wdat),
+
+ EVENT_PTR(qd2_idx),
+ EVENT_PTR(qd2_rdat),
+ EVENT_PTR(qd2_bnks),
+ EVENT_PTR(qd2_wdat),
+
+ EVENT_PTR(qd3_idx),
+ EVENT_PTR(qd3_rdat),
+ EVENT_PTR(qd3_bnks),
+ EVENT_PTR(qd3_wdat),
+
+ EVENT_PTR(qd4_idx),
+ EVENT_PTR(qd4_rdat),
+ EVENT_PTR(qd4_bnks),
+ EVENT_PTR(qd4_wdat),
+
+ EVENT_PTR(qd5_idx),
+ EVENT_PTR(qd5_rdat),
+ EVENT_PTR(qd5_bnks),
+ EVENT_PTR(qd5_wdat),
+
+ EVENT_PTR(qd6_idx),
+ EVENT_PTR(qd6_rdat),
+ EVENT_PTR(qd6_bnks),
+ EVENT_PTR(qd6_wdat),
+
+ EVENT_PTR(qd7_idx),
+ EVENT_PTR(qd7_rdat),
+ EVENT_PTR(qd7_bnks),
+ EVENT_PTR(qd7_wdat),
+ NULL,
+};
+
+static struct attribute_group thunder_l2c_tad_events_group = {
+ .name = "events",
+ .attrs = NULL,
+};
+
+static const struct attribute_group *thunder_l2c_tad_attr_groups[] = {
+ &thunder_uncore_attr_group,
+ &thunder_l2c_tad_format_group,
+ &thunder_l2c_tad_events_group,
+ NULL,
+};
+
+struct pmu thunder_l2c_tad_pmu = {
+ .attr_groups = thunder_l2c_tad_attr_groups,
+ .name = "thunder_l2c_tad",
+ .event_init = thunder_uncore_event_init,
+ .add = thunder_uncore_add,
+ .del = thunder_uncore_del,
+ .start = thunder_uncore_start,
+ .stop = thunder_uncore_stop,
+ .read = thunder_uncore_read,
+};
+
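+/*
+ * Codes 0x80 and above encode the QD0-QD7 bank events; the low nibble
+ * selects between IDX/RDAT/BNKS/WDAT, so only values 0-3 are valid.
+ */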
+static int event_valid(u64 config)
+{
+ if ((config > 0 && config <= L2C_TAD_EVENT_WAIT_VAB) ||
+ config == L2C_TAD_EVENT_RTG_HIT ||
+ config == L2C_TAD_EVENT_RTG_MISS ||
+ config == L2C_TAD_EVENT_L2_RTG_VIC ||
+ config == L2C_TAD_EVENT_L2_OPEN_OCI ||
+ ((config & 0x80) && ((config & 0xf) <= 3)))
+ return 1;
+
+ if (thunder_uncore_version == 1)
+ if (config == L2C_TAD_EVENT_OPEN_CCPI ||
+ (config >= L2C_TAD_EVENT_LOOKUP &&
+ config <= L2C_TAD_EVENT_LOOKUP_ALL) ||
+ (config >= L2C_TAD_EVENT_TAG_ALC_HIT &&
+ config <= L2C_TAD_EVENT_OCI_RTG_ALC_VIC &&
+ config != 0x4d &&
+ config != 0x66 &&
+ config != 0x67))
+ return 1;
+
+ return 0;
+}
+
+int __init thunder_uncore_l2c_tad_setup(void)
+{
+ int ret = -ENOMEM;
+
+ thunder_uncore_l2c_tad = kzalloc(sizeof(struct thunder_uncore),
+ GFP_KERNEL);
+ if (!thunder_uncore_l2c_tad)
+ goto fail_nomem;
+
+ if (thunder_uncore_version == 0)
+ thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_events_attr;
+ else /* default */
+ thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_pass2_events_attr;
+
+ ret = thunder_uncore_setup(thunder_uncore_l2c_tad,
+ PCI_DEVICE_ID_THUNDER_L2C_TAD,
+ L2C_TAD_CONTROL_OFFSET,
+ L2C_TAD_COUNTER_OFFSET + L2C_TAD_NR_COUNTERS
+ * sizeof(unsigned long long),
+ &thunder_l2c_tad_pmu,
+ L2C_TAD_NR_COUNTERS);
+ if (ret)
+ goto fail;
+
+ thunder_uncore_l2c_tad->type = L2C_TAD_TYPE;
+ thunder_uncore_l2c_tad->event_valid = event_valid;
+ return 0;
+
+fail:
+ kfree(thunder_uncore_l2c_tad);
+fail_nomem:
+ return ret;
+}
--
1.9.1
Support the counters of the DRAM controllers (LMC). These counters
are free-running and read-only.
Also support the counters added in pass2 by checking the MIDR.
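A usage sketch (event names as defined below):
	perf stat -a -e thunder_lmc/dclk_cnt,node=0/ -- sleep 1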
Signed-off-by: Jan Glauber <[email protected]>
---
drivers/perf/uncore/Makefile | 3 +-
drivers/perf/uncore/uncore_cavium.c | 3 +
drivers/perf/uncore/uncore_cavium.h | 4 +
drivers/perf/uncore/uncore_cavium_lmc.c | 196 ++++++++++++++++++++++++++++++++
4 files changed, 205 insertions(+), 1 deletion(-)
create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index d52ecc9..81479e8 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,3 +1,4 @@
obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o \
uncore_cavium_l2c_tad.o \
- uncore_cavium_l2c_cbc.o
+ uncore_cavium_l2c_cbc.o \
+ uncore_cavium_lmc.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index a230450..45c81d0 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -19,6 +19,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
return thunder_uncore_l2c_tad;
else if (event->pmu->type == thunder_l2c_cbc_pmu.type)
return thunder_uncore_l2c_cbc;
+ else if (event->pmu->type == thunder_lmc_pmu.type)
+ return thunder_uncore_lmc;
else
return NULL;
}
@@ -303,6 +305,7 @@ static int __init thunder_uncore_init(void)
thunder_uncore_l2c_tad_setup();
thunder_uncore_l2c_cbc_setup();
+ thunder_uncore_lmc_setup();
return 0;
}
late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index 94bd02c..f14f6be 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -9,6 +9,7 @@
enum uncore_type {
L2C_TAD_TYPE,
L2C_CBC_TYPE,
+ LMC_TYPE,
};
extern int thunder_uncore_version;
@@ -68,8 +69,10 @@ extern struct device_attribute format_attr_node;
extern struct thunder_uncore *thunder_uncore_l2c_tad;
extern struct thunder_uncore *thunder_uncore_l2c_cbc;
+extern struct thunder_uncore *thunder_uncore_lmc;
extern struct pmu thunder_l2c_tad_pmu;
extern struct pmu thunder_l2c_cbc_pmu;
+extern struct pmu thunder_lmc_pmu;
/* Prototypes */
struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -85,3 +88,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
int thunder_uncore_l2c_tad_setup(void);
int thunder_uncore_l2c_cbc_setup(void);
+int thunder_uncore_lmc_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_lmc.c b/drivers/perf/uncore/uncore_cavium_lmc.c
new file mode 100644
index 0000000..b8d21b4
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_lmc.c
@@ -0,0 +1,196 @@
+/*
+ * Cavium Thunder uncore PMU support, LMC counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_LMC
+#define PCI_DEVICE_ID_THUNDER_LMC 0xa022
+#endif
+
+#define LMC_NR_COUNTERS 3
+#define LMC_PASS2_NR_COUNTERS 5
+#define LMC_MAX_NR_COUNTERS LMC_PASS2_NR_COUNTERS
+
+/* LMC event list */
+#define LMC_EVENT_IFB_CNT 0
+#define LMC_EVENT_OPS_CNT 1
+#define LMC_EVENT_DCLK_CNT 2
+
+/* pass 2 added counters */
+#define LMC_EVENT_BANK_CONFLICT1 3
+#define LMC_EVENT_BANK_CONFLICT2 4
+
+/* counter registers are at 0x1d0-0x368 (see lmc_events[]), map them all */
+#define LMC_COUNTER_START 0
+#define LMC_COUNTER_END (0x368 + 8)
+
+struct thunder_uncore *thunder_uncore_lmc;
+
+int lmc_events[LMC_MAX_NR_COUNTERS] = { 0x1d0, 0x1d8, 0x1e0, 0x360, 0x368 };
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ hwc->state = 0;
+ perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ thunder_uncore_read(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int id, i;
+
+ WARN_ON_ONCE(!uncore);
+ node = get_node(hwc->config, uncore);
+ id = get_id(hwc->config);
+
+ /* are we already assigned? */
+ if (hwc->idx != -1 && node->events[hwc->idx] == event)
+ goto out;
+
+ for (i = 0; i < node->num_counters; i++) {
+ if (node->events[i] == event) {
+ hwc->idx = i;
+ goto out;
+ }
+ }
+
+ /* these counters are self-sustained so idx must match the counter! */
+ hwc->idx = -1;
+ if (cmpxchg(&node->events[id], NULL, event) == NULL)
+ hwc->idx = id;
+
+out:
+ if (hwc->idx == -1)
+ return -EBUSY;
+
+ hwc->event_base = lmc_events[id];
+ hwc->state = PERF_HES_UPTODATE;
+
+ /* counters are read-only, so avoid PERF_EF_RELOAD */
+ if (flags & PERF_EF_START)
+ thunder_uncore_start(event, 0);
+
+ return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-2");
+
+static struct attribute *thunder_lmc_format_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_node.attr,
+ NULL,
+};
+
+static struct attribute_group thunder_lmc_format_group = {
+ .name = "format",
+ .attrs = thunder_lmc_format_attr,
+};
+
+EVENT_ATTR(ifb_cnt, LMC_EVENT_IFB_CNT);
+EVENT_ATTR(ops_cnt, LMC_EVENT_OPS_CNT);
+EVENT_ATTR(dclk_cnt, LMC_EVENT_DCLK_CNT);
+EVENT_ATTR(bank_conflict1, LMC_EVENT_BANK_CONFLICT1);
+EVENT_ATTR(bank_conflict2, LMC_EVENT_BANK_CONFLICT2);
+
+static struct attribute *thunder_lmc_events_attr[] = {
+ EVENT_PTR(ifb_cnt),
+ EVENT_PTR(ops_cnt),
+ EVENT_PTR(dclk_cnt),
+ NULL,
+};
+
+static struct attribute *thunder_lmc_pass2_events_attr[] = {
+ EVENT_PTR(ifb_cnt),
+ EVENT_PTR(ops_cnt),
+ EVENT_PTR(dclk_cnt),
+ EVENT_PTR(bank_conflict1),
+ EVENT_PTR(bank_conflict2),
+ NULL,
+};
+
+static struct attribute_group thunder_lmc_events_group = {
+ .name = "events",
+ .attrs = NULL,
+};
+
+static const struct attribute_group *thunder_lmc_attr_groups[] = {
+ &thunder_uncore_attr_group,
+ &thunder_lmc_format_group,
+ &thunder_lmc_events_group,
+ NULL,
+};
+
+struct pmu thunder_lmc_pmu = {
+ .attr_groups = thunder_lmc_attr_groups,
+ .name = "thunder_lmc",
+ .event_init = thunder_uncore_event_init,
+ .add = thunder_uncore_add,
+ .del = thunder_uncore_del,
+ .start = thunder_uncore_start,
+ .stop = thunder_uncore_stop,
+ .read = thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+ if (config <= LMC_EVENT_DCLK_CNT)
+ return 1;
+
+ if (thunder_uncore_version == 1)
+ if (config == LMC_EVENT_BANK_CONFLICT1 ||
+ config == LMC_EVENT_BANK_CONFLICT2)
+ return 1;
+ return 0;
+}
+
+int __init thunder_uncore_lmc_setup(void)
+{
+ int ret = -ENOMEM;
+
+ thunder_uncore_lmc = kzalloc(sizeof(struct thunder_uncore), GFP_KERNEL);
+ if (!thunder_uncore_lmc)
+ goto fail_nomem;
+
+ /* pass2 is default */
+ thunder_lmc_events_group.attrs = (thunder_uncore_version == 0) ?
+ thunder_lmc_events_attr : thunder_lmc_pass2_events_attr;
+
+ ret = thunder_uncore_setup(thunder_uncore_lmc,
+ PCI_DEVICE_ID_THUNDER_LMC,
+ LMC_COUNTER_START,
+ LMC_COUNTER_END - LMC_COUNTER_START,
+ &thunder_lmc_pmu,
+ (thunder_uncore_version == 1) ?
+ LMC_PASS2_NR_COUNTERS : LMC_NR_COUNTERS);
+ if (ret)
+ goto fail;
+
+ thunder_uncore_lmc->type = LMC_TYPE;
+ thunder_uncore_lmc->event_valid = event_valid;
+ return 0;
+
+fail:
+ kfree(thunder_uncore_lmc);
+fail_nomem:
+ return ret;
+}
--
1.9.1
Provide "uncore" facilities for different non-CPU performance
counter units. Based on the Intel/AMD uncore PMU support.
The uncore drivers cover quite different functionality, including the
L2 cache, memory controllers and interconnects.
The uncore PMUs can be found under /sys/bus/event_source/devices.
All counters are exported via sysfs in the corresponding events
files under the PMU directory, so the perf tool can list the event names.
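For example, with the LMC PMU added later in this series the layout
looks roughly like this:
	/sys/bus/event_source/devices/thunder_lmc/cpumask
	/sys/bus/event_source/devices/thunder_lmc/format/{event,node}
	/sys/bus/event_source/devices/thunder_lmc/events/{ifb_cnt,ops_cnt,dclk_cnt}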
There are some points that are special in this implementation:
1) The PMU detection relies on PCI device detection. If a
matching PCI device is found, the PMU is created. The code can deal
with multiple units of the same type, e.g. more than one memory
controller.
Note: There is also a MIDR check to determine the CPU variant;
this is needed to support different hardware versions that use
the same PCI IDs.
2) Counters are summarized across different units of the same type
on one NUMA node.
For instance, L2C TAD 0..7 are presented as a single counter
(adding the values from TAD 0 to 7). Although this loses the ability
to read a single TAD's value, the merged values are easier to use.
3) NUMA support. The device node id is used to group devices by node
so counters on one node can be merged. The NUMA node can be selected
via a new sysfs node attribute (see the encoding sketch below).
Without NUMA support all devices will be on node 0.
4) All counters are 64 bit wide without overflow interrupts.
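To illustrate, a sketch of how an event id and the NUMA node are
combined in attr.config (based on the UNCORE_EVENT_ID_MASK/SHIFT
definitions from this patch):
	/* event id lives in bits 0-15, the NUMA node starts at bit 16 */
	config = ((u64)node_id << UNCORE_EVENT_ID_SHIFT) |
		 (event_id & UNCORE_EVENT_ID_MASK);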
Signed-off-by: Jan Glauber <[email protected]>
---
drivers/perf/Makefile | 1 +
drivers/perf/uncore/Makefile | 1 +
drivers/perf/uncore/uncore_cavium.c | 301 ++++++++++++++++++++++++++++++++++++
drivers/perf/uncore/uncore_cavium.h | 78 ++++++++++
4 files changed, 381 insertions(+)
create mode 100644 drivers/perf/uncore/Makefile
create mode 100644 drivers/perf/uncore/uncore_cavium.c
create mode 100644 drivers/perf/uncore/uncore_cavium.h
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index acd2397..61b6084 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1 +1,2 @@
obj-$(CONFIG_ARM_PMU) += arm_pmu.o
+obj-$(CONFIG_ARCH_THUNDER) += uncore/
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
new file mode 100644
index 0000000..b9c72c2
--- /dev/null
+++ b/drivers/perf/uncore/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
new file mode 100644
index 0000000..4fd5e45
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -0,0 +1,301 @@
+/*
+ * Cavium Thunder uncore PMU support. Derived from Intel and AMD uncore code.
+ *
+ * Copyright (C) 2015,2016 Cavium Inc.
+ * Author: Jan Glauber <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/numa.h>
+#include <linux/cpufeature.h>
+
+#include "uncore_cavium.h"
+
+int thunder_uncore_version;
+
+struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
+{
+ return NULL;
+}
+
+void thunder_uncore_read(struct perf_event *event)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ struct thunder_uncore_unit *unit;
+ u64 prev, new = 0;
+ s64 delta;
+
+ node = get_node(hwc->config, uncore);
+
+ /*
+ * No counter overflow interrupts so we do not
+ * have to worry about prev_count changing on us.
+ */
+ prev = local64_read(&hwc->prev_count);
+
+ /* read counter values from all units on the node */
+ list_for_each_entry(unit, &node->unit_list, entry)
+ new += readq(hwc->event_base + unit->map);
+
+ local64_set(&hwc->prev_count, new);
+ delta = new - prev;
+ local64_add(delta, &event->count);
+}
+
+void thunder_uncore_del(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int i;
+
+ event->pmu->stop(event, PERF_EF_UPDATE);
+
+ /*
+ * For programmable counters we need to check where we installed it.
+ * To keep this function generic always test the more complicated
+ * case (free running counters won't need the loop).
+ */
+ node = get_node(hwc->config, uncore);
+ for (i = 0; i < node->num_counters; i++) {
+ if (cmpxchg(&node->events[i], event, NULL) == event)
+ break;
+ }
+ hwc->idx = -1;
+}
+
+int thunder_uncore_event_init(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ struct thunder_uncore *uncore;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* we do not support sampling */
+ if (is_sampling_event(event))
+ return -EINVAL;
+
+ /* counters do not have these bits */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle)
+ return -EINVAL;
+
+ /* counters are 64 bit wide and without overflow interrupts */
+
+ uncore = event_to_thunder_uncore(event);
+ if (!uncore)
+ return -ENODEV;
+ if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
+ return -EINVAL;
+
+ /* check NUMA node */
+ node = get_node(event->attr.config, uncore);
+ if (!node) {
+ pr_debug("Invalid numa node selected\n");
+ return -EINVAL;
+ }
+
+ hwc->config = event->attr.config;
+ hwc->idx = -1;
+ return 0;
+}
+
+/*
+ * Thunder uncore events are independent of the CPUs. Nevertheless,
+ * provide a cpumask to prevent perf from adding the event per-CPU and
+ * just set the mask to one online CPU. Use the same cpumask for all
+ * uncore devices.
+ */
+static cpumask_t thunder_active_mask;
+
+static ssize_t thunder_uncore_attr_show_cpumask(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &thunder_active_mask);
+}
+static DEVICE_ATTR(cpumask, S_IRUGO, thunder_uncore_attr_show_cpumask, NULL);
+
+static struct attribute *thunder_uncore_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+struct attribute_group thunder_uncore_attr_group = {
+ .attrs = thunder_uncore_attrs,
+};
+
+ssize_t thunder_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page)
+{
+ struct perf_pmu_events_attr *pmu_attr =
+ container_of(attr, struct perf_pmu_events_attr, attr);
+
+ if (pmu_attr->event_str)
+ return sprintf(page, "%s", pmu_attr->event_str);
+
+ return 0;
+}
+
+/* node attribute depending on number of numa nodes */
+static ssize_t node_show(struct device *dev, struct device_attribute *attr, char *page)
+{
+ if (NODES_SHIFT)
+ return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
+ else
+ return sprintf(page, "config:16\n");
+}
+
+struct device_attribute format_attr_node = __ATTR_RO(node);
+
+static int thunder_uncore_pmu_cpu_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct thunder_uncore *uncore = container_of(nb, struct thunder_uncore, cpu_nb);
+ int new_cpu, old_cpu = (long) data;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_DOWN_PREPARE:
+ if (!cpumask_test_and_clear_cpu(old_cpu, &thunder_active_mask))
+ break;
+ new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);
+ if (new_cpu >= nr_cpu_ids)
+ break;
+ perf_pmu_migrate_context(uncore->pmu, old_cpu, new_cpu);
+ cpumask_set_cpu(new_cpu, &thunder_active_mask);
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct thunder_uncore_node *alloc_node(struct thunder_uncore *uncore, int node_id, int counters)
+{
+ struct thunder_uncore_node *node;
+
+ node = kzalloc(sizeof(struct thunder_uncore_node), GFP_KERNEL);
+ if (!node)
+ return NULL;
+ node->num_counters = counters;
+ INIT_LIST_HEAD(&node->unit_list);
+ return node;
+}
+
+int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
+ unsigned long offset, unsigned long size,
+ struct pmu *pmu, int counters)
+{
+ struct thunder_uncore_unit *unit, *tmp;
+ struct thunder_uncore_node *node;
+ struct pci_dev *pdev = NULL;
+ int ret, node_id, found = 0;
+
+ /* detect PCI devices */
+ do {
+ pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, device_id, pdev);
+ if (!pdev)
+ break;
+
+ node_id = dev_to_node(&pdev->dev);
+ /*
+ * -1 without NUMA, set to 0 because we always have at
+ * least node 0.
+ */
+ if (node_id < 0)
+ node_id = 0;
+
+ /* allocate node if necessary */
+ if (!uncore->nodes[node_id])
+ uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
+
+ node = uncore->nodes[node_id];
+ if (!node) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ unit = kzalloc(sizeof(struct thunder_uncore_unit), GFP_KERNEL);
+ if (!unit) {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ unit->pdev = pdev;
+ unit->map = ioremap(pci_resource_start(pdev, 0) + offset, size);
+ list_add(&unit->entry, &node->unit_list);
+ node->nr_units++;
+ found++;
+ } while (1);
+
+ if (!found)
+ return -ENODEV;
+
+ /*
+ * A perf PMU is CPU-dependent, in contrast to our uncore devices.
+ * Just pick a CPU and migrate away if it goes offline.
+ */
+ cpumask_set_cpu(smp_processor_id(), &thunder_active_mask);
+
+ uncore->cpu_nb.notifier_call = thunder_uncore_pmu_cpu_notifier;
+ uncore->cpu_nb.priority = CPU_PRI_PERF + 1;
+ ret = register_cpu_notifier(&uncore->cpu_nb);
+ if (ret)
+ goto fail;
+
+ ret = perf_pmu_register(pmu, pmu->name, -1);
+ if (ret)
+ goto fail_pmu;
+
+ uncore->pmu = pmu;
+ return 0;
+
+fail_pmu:
+ unregister_cpu_notifier(&uncore->cpu_nb);
+fail:
+ node_id = 0;
+ while (uncore->nodes[node_id]) {
+ node = uncore->nodes[node_id];
+
+ list_for_each_entry_safe(unit, tmp, &node->unit_list, entry) {
+ if (unit->pdev) {
+ if (unit->map)
+ iounmap(unit->map);
+ pci_dev_put(unit->pdev);
+ }
+ kfree(unit);
+ }
+ kfree(uncore->nodes[node_id]);
+ node_id++;
+ }
+ return ret;
+}
+
+static int __init thunder_uncore_init(void)
+{
+ unsigned long implementor = read_cpuid_implementor();
+ unsigned long part_number = read_cpuid_part_number();
+ u32 variant;
+
+ if (implementor != ARM_CPU_IMP_CAVIUM ||
+ part_number != CAVIUM_CPU_PART_THUNDERX)
+ return -ENODEV;
+
+ /* detect pass2 which contains different counters */
+ variant = MIDR_VARIANT(read_cpuid_id());
+ if (variant == 1)
+ thunder_uncore_version = 1;
+ pr_info("PMU version: %d\n", thunder_uncore_version);
+
+ return 0;
+}
+late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
new file mode 100644
index 0000000..c799709
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -0,0 +1,78 @@
+#include <linux/perf_event.h>
+#include <linux/pci.h>
+#include <linux/list.h>
+#include <linux/io.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt) "thunderx_uncore: " fmt
+
+enum uncore_type {
+ NOP_TYPE,
+};
+
+extern int thunder_uncore_version;
+
+#define UNCORE_EVENT_ID_MASK 0xffff
+#define UNCORE_EVENT_ID_SHIFT 16
+
+/* maximum number of parallel hardware counters for all uncore parts */
+#define MAX_COUNTERS 64
+
+struct thunder_uncore_unit {
+ struct list_head entry;
+ void __iomem *map;
+ struct pci_dev *pdev;
+};
+
+struct thunder_uncore_node {
+ int nr_units;
+ int num_counters;
+ struct list_head unit_list;
+ struct perf_event *events[MAX_COUNTERS];
+};
+
+/* generic uncore struct for different pmu types */
+struct thunder_uncore {
+ int type;
+ struct pmu *pmu;
+ int (*event_valid)(u64);
+ struct notifier_block cpu_nb;
+ struct thunder_uncore_node *nodes[MAX_NUMNODES];
+};
+
+#define EVENT_PTR(_id) (&event_attr_##_id.attr.attr)
+
+#define EVENT_ATTR(_name, _val) \
+static struct perf_pmu_events_attr event_attr_##_name = { \
+ .attr = __ATTR(_name, 0444, thunder_events_sysfs_show, NULL), \
+ .event_str = "event=" __stringify(_val), \
+}
+
+#define EVENT_ATTR_STR(_name, _str) \
+static struct perf_pmu_events_attr event_attr_##_name = { \
+ .attr = __ATTR(_name, 0444, thunder_events_sysfs_show, NULL), \
+ .event_str = _str, \
+}
+
+static inline struct thunder_uncore_node *get_node(u64 config,
+ struct thunder_uncore *uncore)
+{
+ return uncore->nodes[config >> UNCORE_EVENT_ID_SHIFT];
+}
+
+#define get_id(config) (config & UNCORE_EVENT_ID_MASK)
+
+extern struct attribute_group thunder_uncore_attr_group;
+extern struct device_attribute format_attr_node;
+
+/* Prototypes */
+struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
+void thunder_uncore_del(struct perf_event *event, int flags);
+int thunder_uncore_event_init(struct perf_event *event);
+void thunder_uncore_read(struct perf_event *event);
+int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
+ unsigned long offset, unsigned long size,
+ struct pmu *pmu, int counters);
+ssize_t thunder_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page);
--
1.9.1
Support the counters of the L2 cache crossbar connect (CBC). These
counters are free-running and mapped 1:1 to their events.
Signed-off-by: Jan Glauber <[email protected]>
---
drivers/perf/uncore/Makefile | 3 +-
drivers/perf/uncore/uncore_cavium.c | 3 +
drivers/perf/uncore/uncore_cavium.h | 4 +
drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 ++++++++++++++++++++++++++++
4 files changed, 246 insertions(+), 1 deletion(-)
create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index 6a16caf..d52ecc9 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,2 +1,3 @@
obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o \
- uncore_cavium_l2c_tad.o
+ uncore_cavium_l2c_tad.o \
+ uncore_cavium_l2c_cbc.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index b92b2ae..a230450 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -17,6 +17,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
{
if (event->pmu->type == thunder_l2c_tad_pmu.type)
return thunder_uncore_l2c_tad;
+ else if (event->pmu->type == thunder_l2c_cbc_pmu.type)
+ return thunder_uncore_l2c_cbc;
else
return NULL;
}
@@ -300,6 +302,7 @@ static int __init thunder_uncore_init(void)
pr_info("PMU version: %d\n", thunder_uncore_version);
thunder_uncore_l2c_tad_setup();
+ thunder_uncore_l2c_cbc_setup();
return 0;
}
late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index 7a9c367..94bd02c 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -8,6 +8,7 @@
enum uncore_type {
L2C_TAD_TYPE,
+ L2C_CBC_TYPE,
};
extern int thunder_uncore_version;
@@ -66,7 +67,9 @@ extern struct attribute_group thunder_uncore_attr_group;
extern struct device_attribute format_attr_node;
extern struct thunder_uncore *thunder_uncore_l2c_tad;
+extern struct thunder_uncore *thunder_uncore_l2c_cbc;
extern struct pmu thunder_l2c_tad_pmu;
+extern struct pmu thunder_l2c_cbc_pmu;
/* Prototypes */
struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -81,3 +84,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
char *page);
int thunder_uncore_l2c_tad_setup(void);
+int thunder_uncore_l2c_cbc_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_l2c_cbc.c b/drivers/perf/uncore/uncore_cavium_l2c_cbc.c
new file mode 100644
index 0000000..bde7a51
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_l2c_cbc.c
@@ -0,0 +1,237 @@
+/*
+ * Cavium Thunder uncore PMU support, L2C CBC counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_L2C_CBC
+#define PCI_DEVICE_ID_THUNDER_L2C_CBC 0xa02f
+#endif
+
+#define L2C_CBC_NR_COUNTERS 16
+
+/* L2C CBC event list */
+#define L2C_CBC_EVENT_XMC0 0x00
+#define L2C_CBC_EVENT_XMD0 0x01
+#define L2C_CBC_EVENT_RSC0 0x02
+#define L2C_CBC_EVENT_RSD0 0x03
+#define L2C_CBC_EVENT_INV0 0x04
+#define L2C_CBC_EVENT_IOC0 0x05
+#define L2C_CBC_EVENT_IOR0 0x06
+
+#define L2C_CBC_EVENT_XMC1 0x08 /* 0x40 */
+#define L2C_CBC_EVENT_XMD1 0x09
+#define L2C_CBC_EVENT_RSC1 0x0a
+#define L2C_CBC_EVENT_RSD1 0x0b
+#define L2C_CBC_EVENT_INV1 0x0c
+
+#define L2C_CBC_EVENT_XMC2 0x10 /* 0x80 */
+#define L2C_CBC_EVENT_XMD2 0x11
+#define L2C_CBC_EVENT_RSC2 0x12
+#define L2C_CBC_EVENT_RSD2 0x13
+
+struct thunder_uncore *thunder_uncore_l2c_cbc;
+
+int l2c_cbc_events[L2C_CBC_NR_COUNTERS] = {
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
+ 0x08, 0x09, 0x0a, 0x0b, 0x0c,
+ 0x10, 0x11, 0x12, 0x13
+};
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ struct thunder_uncore_unit *unit;
+ u64 prev;
+
+ node = get_node(hwc->config, uncore);
+
+ /* restore counter value divided by units into all counters */
+ if (flags & PERF_EF_RELOAD) {
+ prev = local64_read(&hwc->prev_count);
+ prev = prev / node->nr_units;
+
+ list_for_each_entry(unit, &node->unit_list, entry)
+ writeq(prev, hwc->event_base + unit->map);
+ }
+
+ hwc->state = 0;
+ perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ thunder_uncore_read(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int id, i;
+
+ WARN_ON_ONCE(!uncore);
+ node = get_node(hwc->config, uncore);
+ id = get_id(hwc->config);
+
+ /* are we already assigned? */
+ if (hwc->idx != -1 && node->events[hwc->idx] == event)
+ goto out;
+
+ for (i = 0; i < node->num_counters; i++) {
+ if (node->events[i] == event) {
+ hwc->idx = i;
+ goto out;
+ }
+ }
+
+ /* these counters are self-sustained so idx must match the counter! */
+ hwc->idx = -1;
+ for (i = 0; i < node->num_counters; i++) {
+ if (l2c_cbc_events[i] == id) {
+ if (cmpxchg(&node->events[i], NULL, event) == NULL) {
+ hwc->idx = i;
+ break;
+ }
+ }
+ }
+
+out:
+ if (hwc->idx == -1)
+ return -EBUSY;
+
+ hwc->event_base = id * sizeof(unsigned long long);
+
+ /* counters are not stoppable, so avoid PERF_HES_STOPPED */
+ hwc->state = PERF_HES_UPTODATE;
+
+ if (flags & PERF_EF_START)
+ thunder_uncore_start(event, 0);
+
+ return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *thunder_l2c_cbc_format_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_node.attr,
+ NULL,
+};
+
+static struct attribute_group thunder_l2c_cbc_format_group = {
+ .name = "format",
+ .attrs = thunder_l2c_cbc_format_attr,
+};
+
+EVENT_ATTR(xmc0, L2C_CBC_EVENT_XMC0);
+EVENT_ATTR(xmd0, L2C_CBC_EVENT_XMD0);
+EVENT_ATTR(rsc0, L2C_CBC_EVENT_RSC0);
+EVENT_ATTR(rsd0, L2C_CBC_EVENT_RSD0);
+EVENT_ATTR(inv0, L2C_CBC_EVENT_INV0);
+EVENT_ATTR(ioc0, L2C_CBC_EVENT_IOC0);
+EVENT_ATTR(ior0, L2C_CBC_EVENT_IOR0);
+EVENT_ATTR(xmc1, L2C_CBC_EVENT_XMC1);
+EVENT_ATTR(xmd1, L2C_CBC_EVENT_XMD1);
+EVENT_ATTR(rsc1, L2C_CBC_EVENT_RSC1);
+EVENT_ATTR(rsd1, L2C_CBC_EVENT_RSD1);
+EVENT_ATTR(inv1, L2C_CBC_EVENT_INV1);
+EVENT_ATTR(xmc2, L2C_CBC_EVENT_XMC2);
+EVENT_ATTR(xmd2, L2C_CBC_EVENT_XMD2);
+EVENT_ATTR(rsc2, L2C_CBC_EVENT_RSC2);
+EVENT_ATTR(rsd2, L2C_CBC_EVENT_RSD2);
+
+static struct attribute *thunder_l2c_cbc_events_attr[] = {
+ EVENT_PTR(xmc0),
+ EVENT_PTR(xmd0),
+ EVENT_PTR(rsc0),
+ EVENT_PTR(rsd0),
+ EVENT_PTR(inv0),
+ EVENT_PTR(ioc0),
+ EVENT_PTR(ior0),
+ EVENT_PTR(xmc1),
+ EVENT_PTR(xmd1),
+ EVENT_PTR(rsc1),
+ EVENT_PTR(rsd1),
+ EVENT_PTR(inv1),
+ EVENT_PTR(xmc2),
+ EVENT_PTR(xmd2),
+ EVENT_PTR(rsc2),
+ EVENT_PTR(rsd2),
+ NULL,
+};
+
+static struct attribute_group thunder_l2c_cbc_events_group = {
+ .name = "events",
+ .attrs = thunder_l2c_cbc_events_attr,
+};
+
+static const struct attribute_group *thunder_l2c_cbc_attr_groups[] = {
+ &thunder_uncore_attr_group,
+ &thunder_l2c_cbc_format_group,
+ &thunder_l2c_cbc_events_group,
+ NULL,
+};
+
+struct pmu thunder_l2c_cbc_pmu = {
+ .attr_groups = thunder_l2c_cbc_attr_groups,
+ .name = "thunder_l2c_cbc",
+ .event_init = thunder_uncore_event_init,
+ .add = thunder_uncore_add,
+ .del = thunder_uncore_del,
+ .start = thunder_uncore_start,
+ .stop = thunder_uncore_stop,
+ .read = thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+ if (config <= L2C_CBC_EVENT_IOR0 ||
+ (config >= L2C_CBC_EVENT_XMC1 && config <= L2C_CBC_EVENT_INV1) ||
+ (config >= L2C_CBC_EVENT_XMC2 && config <= L2C_CBC_EVENT_RSD2))
+ return 1;
+ else
+ return 0;
+}
+
+int __init thunder_uncore_l2c_cbc_setup(void)
+{
+ int ret = -ENOMEM;
+
+ thunder_uncore_l2c_cbc = kzalloc(sizeof(struct thunder_uncore),
+ GFP_KERNEL);
+ if (!thunder_uncore_l2c_cbc)
+ goto fail_nomem;
+
+ ret = thunder_uncore_setup(thunder_uncore_l2c_cbc,
+ PCI_DEVICE_ID_THUNDER_L2C_CBC,
+ 0,
+ 0x100,
+ &thunder_l2c_cbc_pmu,
+ L2C_CBC_NR_COUNTERS);
+ if (ret)
+ goto fail;
+
+ thunder_uncore_l2c_cbc->type = L2C_CBC_TYPE;
+ thunder_uncore_l2c_cbc->event_valid = event_valid;
+ return 0;
+
+fail:
+ kfree(thunder_uncore_l2c_cbc);
+fail_nomem:
+ return ret;
+}
--
1.9.1
Support the counters of the OCX transmit links (TLK). All TLK units
of a node share a single PCI device, so their counters are summed by
a dedicated read function.
Signed-off-by: Jan Glauber <[email protected]>
---
drivers/perf/uncore/Makefile | 3 +-
drivers/perf/uncore/uncore_cavium.c | 3 +
drivers/perf/uncore/uncore_cavium.h | 4 +
drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++++++++++++
4 files changed, 389 insertions(+), 1 deletion(-)
create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index 81479e8..88d1f57 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o \
uncore_cavium_l2c_tad.o \
uncore_cavium_l2c_cbc.o \
- uncore_cavium_lmc.o
+ uncore_cavium_lmc.o \
+ uncore_cavium_ocx_tlk.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index 45c81d0..e210457 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -21,6 +21,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
return thunder_uncore_l2c_cbc;
else if (event->pmu->type == thunder_lmc_pmu.type)
return thunder_uncore_lmc;
+ else if (event->pmu->type == thunder_ocx_tlk_pmu.type)
+ return thunder_uncore_ocx_tlk;
else
return NULL;
}
@@ -306,6 +308,7 @@ static int __init thunder_uncore_init(void)
thunder_uncore_l2c_tad_setup();
thunder_uncore_l2c_cbc_setup();
thunder_uncore_lmc_setup();
+ thunder_uncore_ocx_tlk_setup();
return 0;
}
late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index f14f6be..78e95c7 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -10,6 +10,7 @@ enum uncore_type {
L2C_TAD_TYPE,
L2C_CBC_TYPE,
LMC_TYPE,
+ OCX_TLK_TYPE,
};
extern int thunder_uncore_version;
@@ -70,9 +71,11 @@ extern struct device_attribute format_attr_node;
extern struct thunder_uncore *thunder_uncore_l2c_tad;
extern struct thunder_uncore *thunder_uncore_l2c_cbc;
extern struct thunder_uncore *thunder_uncore_lmc;
+extern struct thunder_uncore *thunder_uncore_ocx_tlk;
extern struct pmu thunder_l2c_tad_pmu;
extern struct pmu thunder_l2c_cbc_pmu;
extern struct pmu thunder_lmc_pmu;
+extern struct pmu thunder_ocx_tlk_pmu;
/* Prototypes */
struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -89,3 +92,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
int thunder_uncore_l2c_tad_setup(void);
int thunder_uncore_l2c_cbc_setup(void);
int thunder_uncore_lmc_setup(void);
+int thunder_uncore_ocx_tlk_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_ocx_tlk.c b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
new file mode 100644
index 0000000..02f1bc1
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
@@ -0,0 +1,380 @@
+/*
+ * Cavium Thunder uncore PMU support, OCX TLK counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <[email protected]>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_OCX
+#define PCI_DEVICE_ID_THUNDER_OCX 0xa013
+#endif
+
+#define OCX_TLK_NR_UNITS 3
+#define OCX_TLK_UNIT_OFFSET 0x2000
+#define OCX_TLK_CONTROL_OFFSET 0x10040
+#define OCX_TLK_COUNTER_OFFSET 0x10400
+
+#define OCX_TLK_STAT_DISABLE 0
+#define OCX_TLK_STAT_ENABLE 1
+
+/* OCX TLK event list */
+#define OCX_TLK_EVENT_STAT_IDLE_CNT 0x00
+#define OCX_TLK_EVENT_STAT_DATA_CNT 0x01
+#define OCX_TLK_EVENT_STAT_SYNC_CNT 0x02
+#define OCX_TLK_EVENT_STAT_RETRY_CNT 0x03
+#define OCX_TLK_EVENT_STAT_ERR_CNT 0x04
+
+#define OCX_TLK_EVENT_STAT_MAT0_CNT 0x08
+#define OCX_TLK_EVENT_STAT_MAT1_CNT 0x09
+#define OCX_TLK_EVENT_STAT_MAT2_CNT 0x0a
+#define OCX_TLK_EVENT_STAT_MAT3_CNT 0x0b
+
+#define OCX_TLK_EVENT_STAT_VC0_CMD 0x10
+#define OCX_TLK_EVENT_STAT_VC1_CMD 0x11
+#define OCX_TLK_EVENT_STAT_VC2_CMD 0x12
+#define OCX_TLK_EVENT_STAT_VC3_CMD 0x13
+#define OCX_TLK_EVENT_STAT_VC4_CMD 0x14
+#define OCX_TLK_EVENT_STAT_VC5_CMD 0x15
+
+#define OCX_TLK_EVENT_STAT_VC0_PKT 0x20
+#define OCX_TLK_EVENT_STAT_VC1_PKT 0x21
+#define OCX_TLK_EVENT_STAT_VC2_PKT 0x22
+#define OCX_TLK_EVENT_STAT_VC3_PKT 0x23
+#define OCX_TLK_EVENT_STAT_VC4_PKT 0x24
+#define OCX_TLK_EVENT_STAT_VC5_PKT 0x25
+#define OCX_TLK_EVENT_STAT_VC6_PKT 0x26
+#define OCX_TLK_EVENT_STAT_VC7_PKT 0x27
+#define OCX_TLK_EVENT_STAT_VC8_PKT 0x28
+#define OCX_TLK_EVENT_STAT_VC9_PKT 0x29
+#define OCX_TLK_EVENT_STAT_VC10_PKT 0x2a
+#define OCX_TLK_EVENT_STAT_VC11_PKT 0x2b
+#define OCX_TLK_EVENT_STAT_VC12_PKT 0x2c
+#define OCX_TLK_EVENT_STAT_VC13_PKT 0x2d
+
+#define OCX_TLK_EVENT_STAT_VC0_CON 0x30
+#define OCX_TLK_EVENT_STAT_VC1_CON 0x31
+#define OCX_TLK_EVENT_STAT_VC2_CON 0x32
+#define OCX_TLK_EVENT_STAT_VC3_CON 0x33
+#define OCX_TLK_EVENT_STAT_VC4_CON 0x34
+#define OCX_TLK_EVENT_STAT_VC5_CON 0x35
+#define OCX_TLK_EVENT_STAT_VC6_CON 0x36
+#define OCX_TLK_EVENT_STAT_VC7_CON 0x37
+#define OCX_TLK_EVENT_STAT_VC8_CON 0x38
+#define OCX_TLK_EVENT_STAT_VC9_CON 0x39
+#define OCX_TLK_EVENT_STAT_VC10_CON 0x3a
+#define OCX_TLK_EVENT_STAT_VC11_CON 0x3b
+#define OCX_TLK_EVENT_STAT_VC12_CON 0x3c
+#define OCX_TLK_EVENT_STAT_VC13_CON 0x3d
+
+#define OCX_TLK_MAX_COUNTER OCX_TLK_EVENT_STAT_VC13_CON
+#define OCX_TLK_NR_COUNTERS (OCX_TLK_MAX_COUNTER + 1)
+
+struct thunder_uncore *thunder_uncore_ocx_tlk;
+
+/*
+ * The OCX devices have a single device per node, therefore picking the
+ * first device from the list is correct.
+ */
+static inline void __iomem *map_offset(struct thunder_uncore_node *node,
+ unsigned long addr, int offset, int nr)
+{
+ struct thunder_uncore_unit *unit;
+
+ unit = list_first_entry(&node->unit_list, struct thunder_uncore_unit,
+ entry);
+ return (void __iomem *) (addr + unit->map + nr * offset);
+}
+
+static void __iomem *map_offset_ocx_tlk(struct thunder_uncore_node *node,
+					unsigned long addr, int nr)
+{
+	/* pass arguments in (offset, nr) order to match map_offset() */
+	return map_offset(node, addr, OCX_TLK_UNIT_OFFSET, nr);
+}
+
+/*
+ * Summarize counters across all TLKs. Different from the other uncore
+ * PMUs because all TLKs are on one PCI device.
+ */
+static void thunder_uncore_read_ocx_tlk(struct perf_event *event)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ u64 prev, new = 0;
+ s64 delta;
+ int i;
+
+ /*
+ * No counter overflow interrupts so we do not
+ * have to worry about prev_count changing on us.
+ */
+
+ prev = local64_read(&hwc->prev_count);
+
+ /* read counter values from all units */
+ node = get_node(hwc->config, uncore);
+ for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+ new += readq(map_offset_ocx_tlk(node, hwc->event_base, i));
+
+ local64_set(&hwc->prev_count, new);
+ delta = new - prev;
+ local64_add(delta, &event->count);
+}
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int i;
+
+ hwc->state = 0;
+
+ /* enable counters on all units */
+ node = get_node(hwc->config, uncore);
+ for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+ writeb(OCX_TLK_STAT_ENABLE,
+ map_offset_ocx_tlk(node, hwc->config_base, i));
+
+ perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int i;
+
+ /* disable counters on all units */
+ node = get_node(hwc->config, uncore);
+ for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+ writeb(OCX_TLK_STAT_DISABLE,
+ map_offset_ocx_tlk(node, hwc->config_base, i));
+ hwc->state |= PERF_HES_STOPPED;
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ thunder_uncore_read_ocx_tlk(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+ struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunder_uncore_node *node;
+ int id, i;
+
+ WARN_ON_ONCE(!uncore);
+ node = get_node(hwc->config, uncore);
+ id = get_id(hwc->config);
+
+ /* are we already assigned? */
+ if (hwc->idx != -1 && node->events[hwc->idx] == event)
+ goto out;
+
+ for (i = 0; i < node->num_counters; i++) {
+ if (node->events[i] == event) {
+ hwc->idx = i;
+ goto out;
+ }
+ }
+
+ /* counters are 1:1 */
+ hwc->idx = -1;
+ if (cmpxchg(&node->events[id], NULL, event) == NULL)
+ hwc->idx = id;
+
+out:
+ if (hwc->idx == -1)
+ return -EBUSY;
+
+ hwc->config_base = 0;
+ hwc->event_base = OCX_TLK_COUNTER_OFFSET - OCX_TLK_CONTROL_OFFSET +
+ hwc->idx * sizeof(unsigned long long);
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+ if (flags & PERF_EF_START)
+ thunder_uncore_start(event, PERF_EF_RELOAD);
+ return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-5");
+
+static struct attribute *thunder_ocx_tlk_format_attr[] = {
+ &format_attr_event.attr,
+ &format_attr_node.attr,
+ NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_format_group = {
+ .name = "format",
+ .attrs = thunder_ocx_tlk_format_attr,
+};
+
+EVENT_ATTR(idle_cnt, OCX_TLK_EVENT_STAT_IDLE_CNT);
+EVENT_ATTR(data_cnt, OCX_TLK_EVENT_STAT_DATA_CNT);
+EVENT_ATTR(sync_cnt, OCX_TLK_EVENT_STAT_SYNC_CNT);
+EVENT_ATTR(retry_cnt, OCX_TLK_EVENT_STAT_RETRY_CNT);
+EVENT_ATTR(err_cnt, OCX_TLK_EVENT_STAT_ERR_CNT);
+EVENT_ATTR(mat0_cnt, OCX_TLK_EVENT_STAT_MAT0_CNT);
+EVENT_ATTR(mat1_cnt, OCX_TLK_EVENT_STAT_MAT1_CNT);
+EVENT_ATTR(mat2_cnt, OCX_TLK_EVENT_STAT_MAT2_CNT);
+EVENT_ATTR(mat3_cnt, OCX_TLK_EVENT_STAT_MAT3_CNT);
+EVENT_ATTR(vc0_cmd, OCX_TLK_EVENT_STAT_VC0_CMD);
+EVENT_ATTR(vc1_cmd, OCX_TLK_EVENT_STAT_VC1_CMD);
+EVENT_ATTR(vc2_cmd, OCX_TLK_EVENT_STAT_VC2_CMD);
+EVENT_ATTR(vc3_cmd, OCX_TLK_EVENT_STAT_VC3_CMD);
+EVENT_ATTR(vc4_cmd, OCX_TLK_EVENT_STAT_VC4_CMD);
+EVENT_ATTR(vc5_cmd, OCX_TLK_EVENT_STAT_VC5_CMD);
+EVENT_ATTR(vc0_pkt, OCX_TLK_EVENT_STAT_VC0_PKT);
+EVENT_ATTR(vc1_pkt, OCX_TLK_EVENT_STAT_VC1_PKT);
+EVENT_ATTR(vc2_pkt, OCX_TLK_EVENT_STAT_VC2_PKT);
+EVENT_ATTR(vc3_pkt, OCX_TLK_EVENT_STAT_VC3_PKT);
+EVENT_ATTR(vc4_pkt, OCX_TLK_EVENT_STAT_VC4_PKT);
+EVENT_ATTR(vc5_pkt, OCX_TLK_EVENT_STAT_VC5_PKT);
+EVENT_ATTR(vc6_pkt, OCX_TLK_EVENT_STAT_VC6_PKT);
+EVENT_ATTR(vc7_pkt, OCX_TLK_EVENT_STAT_VC7_PKT);
+EVENT_ATTR(vc8_pkt, OCX_TLK_EVENT_STAT_VC8_PKT);
+EVENT_ATTR(vc9_pkt, OCX_TLK_EVENT_STAT_VC9_PKT);
+EVENT_ATTR(vc10_pkt, OCX_TLK_EVENT_STAT_VC10_PKT);
+EVENT_ATTR(vc11_pkt, OCX_TLK_EVENT_STAT_VC11_PKT);
+EVENT_ATTR(vc12_pkt, OCX_TLK_EVENT_STAT_VC12_PKT);
+EVENT_ATTR(vc13_pkt, OCX_TLK_EVENT_STAT_VC13_PKT);
+EVENT_ATTR(vc0_con, OCX_TLK_EVENT_STAT_VC0_CON);
+EVENT_ATTR(vc1_con, OCX_TLK_EVENT_STAT_VC1_CON);
+EVENT_ATTR(vc2_con, OCX_TLK_EVENT_STAT_VC2_CON);
+EVENT_ATTR(vc3_con, OCX_TLK_EVENT_STAT_VC3_CON);
+EVENT_ATTR(vc4_con, OCX_TLK_EVENT_STAT_VC4_CON);
+EVENT_ATTR(vc5_con, OCX_TLK_EVENT_STAT_VC5_CON);
+EVENT_ATTR(vc6_con, OCX_TLK_EVENT_STAT_VC6_CON);
+EVENT_ATTR(vc7_con, OCX_TLK_EVENT_STAT_VC7_CON);
+EVENT_ATTR(vc8_con, OCX_TLK_EVENT_STAT_VC8_CON);
+EVENT_ATTR(vc9_con, OCX_TLK_EVENT_STAT_VC9_CON);
+EVENT_ATTR(vc10_con, OCX_TLK_EVENT_STAT_VC10_CON);
+EVENT_ATTR(vc11_con, OCX_TLK_EVENT_STAT_VC11_CON);
+EVENT_ATTR(vc12_con, OCX_TLK_EVENT_STAT_VC12_CON);
+EVENT_ATTR(vc13_con, OCX_TLK_EVENT_STAT_VC13_CON);
+
+static struct attribute *thunder_ocx_tlk_events_attr[] = {
+ EVENT_PTR(idle_cnt),
+ EVENT_PTR(data_cnt),
+ EVENT_PTR(sync_cnt),
+ EVENT_PTR(retry_cnt),
+ EVENT_PTR(err_cnt),
+ EVENT_PTR(mat0_cnt),
+ EVENT_PTR(mat1_cnt),
+ EVENT_PTR(mat2_cnt),
+ EVENT_PTR(mat3_cnt),
+ EVENT_PTR(vc0_cmd),
+ EVENT_PTR(vc1_cmd),
+ EVENT_PTR(vc2_cmd),
+ EVENT_PTR(vc3_cmd),
+ EVENT_PTR(vc4_cmd),
+ EVENT_PTR(vc5_cmd),
+ EVENT_PTR(vc0_pkt),
+ EVENT_PTR(vc1_pkt),
+ EVENT_PTR(vc2_pkt),
+ EVENT_PTR(vc3_pkt),
+ EVENT_PTR(vc4_pkt),
+ EVENT_PTR(vc5_pkt),
+ EVENT_PTR(vc6_pkt),
+ EVENT_PTR(vc7_pkt),
+ EVENT_PTR(vc8_pkt),
+ EVENT_PTR(vc9_pkt),
+ EVENT_PTR(vc10_pkt),
+ EVENT_PTR(vc11_pkt),
+ EVENT_PTR(vc12_pkt),
+ EVENT_PTR(vc13_pkt),
+ EVENT_PTR(vc0_con),
+ EVENT_PTR(vc1_con),
+ EVENT_PTR(vc2_con),
+ EVENT_PTR(vc3_con),
+ EVENT_PTR(vc4_con),
+ EVENT_PTR(vc5_con),
+ EVENT_PTR(vc6_con),
+ EVENT_PTR(vc7_con),
+ EVENT_PTR(vc8_con),
+ EVENT_PTR(vc9_con),
+ EVENT_PTR(vc10_con),
+ EVENT_PTR(vc11_con),
+ EVENT_PTR(vc12_con),
+ EVENT_PTR(vc13_con),
+ NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_events_group = {
+ .name = "events",
+ .attrs = thunder_ocx_tlk_events_attr,
+};
+
+static const struct attribute_group *thunder_ocx_tlk_attr_groups[] = {
+ &thunder_uncore_attr_group,
+ &thunder_ocx_tlk_format_group,
+ &thunder_ocx_tlk_events_group,
+ NULL,
+};
+
+struct pmu thunder_ocx_tlk_pmu = {
+ .attr_groups = thunder_ocx_tlk_attr_groups,
+ .name = "thunder_ocx_tlk",
+ .event_init = thunder_uncore_event_init,
+ .add = thunder_uncore_add,
+ .del = thunder_uncore_del,
+ .start = thunder_uncore_start,
+ .stop = thunder_uncore_stop,
+ .read = thunder_uncore_read_ocx_tlk,
+};
+
+static int event_valid(u64 config)
+{
+ if (config <= OCX_TLK_EVENT_STAT_ERR_CNT ||
+ (config >= OCX_TLK_EVENT_STAT_MAT0_CNT &&
+ config <= OCX_TLK_EVENT_STAT_MAT3_CNT) ||
+ (config >= OCX_TLK_EVENT_STAT_VC0_CMD &&
+ config <= OCX_TLK_EVENT_STAT_VC5_CMD) ||
+ (config >= OCX_TLK_EVENT_STAT_VC0_PKT &&
+ config <= OCX_TLK_EVENT_STAT_VC13_PKT) ||
+ (config >= OCX_TLK_EVENT_STAT_VC0_CON &&
+ config <= OCX_TLK_EVENT_STAT_VC13_CON))
+ return 1;
+ else
+ return 0;
+}
+
+int __init thunder_uncore_ocx_tlk_setup(void)
+{
+ int ret;
+
+ thunder_uncore_ocx_tlk = kzalloc(sizeof(struct thunder_uncore),
+ GFP_KERNEL);
+ if (!thunder_uncore_ocx_tlk) {
+ ret = -ENOMEM;
+ goto fail_nomem;
+ }
+
+ ret = thunder_uncore_setup(thunder_uncore_ocx_tlk,
+ PCI_DEVICE_ID_THUNDER_OCX,
+ OCX_TLK_CONTROL_OFFSET,
+ OCX_TLK_UNIT_OFFSET * OCX_TLK_NR_UNITS,
+ &thunder_ocx_tlk_pmu,
+ OCX_TLK_NR_COUNTERS);
+ if (ret)
+ goto fail;
+
+ thunder_uncore_ocx_tlk->type = OCX_TLK_TYPE;
+ thunder_uncore_ocx_tlk->event_valid = event_valid;
+ return 0;
+
+fail:
+ kfree(thunder_uncore_ocx_tlk);
+fail_nomem:
+ return ret;
+}
--
1.9.1
Hi Mark,
can you have a look at these patches?
Thanks,
Jan
On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> This patch series provides access to various counters on the ThunderX SOC.
>
> For details of the uncore implementation see patch #1.
>
> Patches #2-5 add the various ThunderX specific PMUs.
>
> As suggested I've put the files under drivers/perf/uncore. I would
> prefer this location over drivers/bus because not all of the uncore
> drivers are bus related.
>
> Changes to v1:
> - Added NUMA support
> - Fixed CPU hotplug by pmu migration
> - Moved files to drivers/perf/uncore
> - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
> - improved comments about overflow interrupts
> - removed max device limit
> - trimmed include files
>
> Feedback welcome!
> Jan
>
> -------------------------------------------------
>
> Jan Glauber (5):
> arm64/perf: Basic uncore counter support for Cavium ThunderX
> arm64/perf: Cavium ThunderX L2C TAD uncore support
> arm64/perf: Cavium ThunderX L2C CBC uncore support
> arm64/perf: Cavium ThunderX LMC uncore support
> arm64/perf: Cavium ThunderX OCX TLK uncore support
>
> drivers/perf/Makefile | 1 +
> drivers/perf/uncore/Makefile | 5 +
> drivers/perf/uncore/uncore_cavium.c | 314 +++++++++++++++
> drivers/perf/uncore/uncore_cavium.h | 95 +++++
> drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
> drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
> drivers/perf/uncore/uncore_cavium_lmc.c | 196 +++++++++
> drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
> 8 files changed, 1828 insertions(+)
> create mode 100644 drivers/perf/uncore/Makefile
> create mode 100644 drivers/perf/uncore/uncore_cavium.c
> create mode 100644 drivers/perf/uncore/uncore_cavium.h
> create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
> create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
> create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
> create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
>
> --
> 1.9.1
Mark,
are these patches still queued or should I repost them?
--Jan
On Mon, Apr 04, 2016 at 01:03:13PM +0200, Jan Glauber wrote:
> Hi Mark,
>
> can you have a look at these patches?
>
> Thanks,
> Jan
>
> 2016-03-09 17:21 GMT+01:00 Jan Glauber <[email protected]>:
>
> This patch series provides access to various counters on the ThunderX SOC.
>
> For details of the uncore implementation see patch #1.
>
> Patches #2-5 add the various ThunderX specific PMUs.
>
> As suggested I've put the files under drivers/perf/uncore. I would
> prefer this location over drivers/bus because not all of the uncore
> drivers are bus related.
>
> Changes to v1:
> - Added NUMA support
> - Fixed CPU hotplug by pmu migration
> - Moved files to drivers/perf/uncore
> - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
> - improved comments about overflow interrupts
> - removed max device limit
> - trimmed include files
>
> Feedback welcome!
> Jan
>
> -------------------------------------------------
>
> Jan Glauber (5):
>   arm64/perf: Basic uncore counter support for Cavium ThunderX
>   arm64/perf: Cavium ThunderX L2C TAD uncore support
>   arm64/perf: Cavium ThunderX L2C CBC uncore support
>   arm64/perf: Cavium ThunderX LMC uncore support
>   arm64/perf: Cavium ThunderX OCX TLK uncore support
>
>  drivers/perf/Makefile                       |   1 +
>  drivers/perf/uncore/Makefile                |   5 +
>  drivers/perf/uncore/uncore_cavium.c         | 314 +++++++++++++++
>  drivers/perf/uncore/uncore_cavium.h         |  95 +++++
>  drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
>  drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
>  drivers/perf/uncore/uncore_cavium_lmc.c     | 196 +++++++++
>  drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
>  8 files changed, 1828 insertions(+)
>  create mode 100644 drivers/perf/uncore/Makefile
>  create mode 100644 drivers/perf/uncore/uncore_cavium.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium.h
>  create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
>
> --
> 1.9.1
>
>
>
On Wed, Mar 09, 2016 at 05:21:03PM +0100, Jan Glauber wrote:
> Provide "uncore" facilities for different non-CPU performance
> counter units. Based on Intel/AMD uncore pmu support.
>
> The uncore drivers cover quite different functionality including
> L2 Cache, memory controllers and interconnects.
>
> The uncore PMUs can be found under /sys/bus/event_source/devices.
> All counters are exported via sysfs in the corresponding events
> files under the PMU directory so the perf tool can list the event names.
>
> There are some points that are special in this implementation:
>
> 1) The PMU detection relies on PCI device detection. If a
> matching PCI device is found the PMU is created. The code can deal
> with multiple units of the same type, e.g. more than one memory
> controller.
> Note: There is also a CPUID check to determine the CPU variant,
> this is needed to support different hardware versions that use
> the same PCI IDs.
>
> 2) Counters are summarized across different units of the same type
> on one NUMA node.
> For instance L2C TAD 0..7 are presented as a single counter
> (adding the values from TAD 0 to 7). Although losing the ability
> to read a single value the merged values are easier to use.
Merging within a NUMA node, but no further, seems a little arbitrary.
> 3) NUMA support. The device node id is used to group devices by node
> so counters on one node can be merged. The NUMA node can be selected
> via a new sysfs node attribute.
> Without NUMA support all devices will be on node 0.
It doesn't seem great that this depends on kernel configuration (which
is independent of HW configuration). It seems confusing for the user,
and fragile.
Do we not have access to another way of grouping cores (e.g. a socket
ID), that's independent of kernel configuration? That seems to be how
the x86 uncore PMUs are handled.
If we don't have that information, it really feels like we need
additional info from FW (which would also solve the CPUID issue with
point 1), or this is likely to be very fragile.
> +void thunder_uncore_read(struct perf_event *event)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + struct thunder_uncore_unit *unit;
> + u64 prev, new = 0;
> + s64 delta;
> +
> + node = get_node(hwc->config, uncore);
> +
> + /*
> + * No counter overflow interrupts so we do not
> + * have to worry about prev_count changing on us.
> + */
> + prev = local64_read(&hwc->prev_count);
> +
> + /* read counter values from all units on the node */
> + list_for_each_entry(unit, &node->unit_list, entry)
> + new += readq(hwc->event_base + unit->map);
> +
> + local64_set(&hwc->prev_count, new);
> + delta = new - prev;
> + local64_add(delta, &event->count);
> +}
> +
> +void thunder_uncore_del(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + int i;
> +
> + event->pmu->stop(event, PERF_EF_UPDATE);
> +
> + /*
> + * For programmable counters we need to check where we installed it.
> + * To keep this function generic always test the more complicated
> + * case (free running counters won't need the loop).
> + */
> + node = get_node(hwc->config, uncore);
> + for (i = 0; i < node->num_counters; i++) {
> + if (cmpxchg(&node->events[i], event, NULL) == event)
> + break;
> + }
> + hwc->idx = -1;
> +}
It's very difficult to know what's going on here with the lack of a
corresponding *_add function. Is there any reason there is not a common
implementation, at least for the shared logic?
Similarly, it's difficult to know what state the read function is
expecting (e.g. are counters always initialised to 0 to start with)?
> +int thunder_uncore_event_init(struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + struct thunder_uncore *uncore;
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + /* we do not support sampling */
> + if (is_sampling_event(event))
> + return -EINVAL;
> +
> + /* counters do not have these bits */
> + if (event->attr.exclude_user ||
> + event->attr.exclude_kernel ||
> + event->attr.exclude_host ||
> + event->attr.exclude_guest ||
> + event->attr.exclude_hv ||
> + event->attr.exclude_idle)
> + return -EINVAL;
> +
> + /* counters are 64 bit wide and without overflow interrupts */
> +
It would be good to describe the implications of this; otherwise it
seems like a floating comment.
> + uncore = event_to_thunder_uncore(event);
> + if (!uncore)
> + return -ENODEV;
> + if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
> + return -EINVAL;
> +
> + /* check NUMA node */
> + node = get_node(event->attr.config, uncore);
As above, I don't think using Linux NUMA node IDs is a good idea,
especially for a user-facing ABI.
> + if (!node) {
> + pr_debug("Invalid numa node selected\n");
> + return -EINVAL;
> + }
> +
> + hwc->config = event->attr.config;
> + hwc->idx = -1;
> + return 0;
> +}
What about the CPU handling?
Where do you verify that cpu != -1, and assign the event to a particular
CPU prior to the pmu::add callback? That should be common to all of your
uncore PMUs, and should live here.
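For example, an (untested) sketch along the lines of other system PMU
drivers, reusing the existing thunder_active_mask:

	/* uncore counters are not CPU-bound; reject per-task events */
	if (event->cpu < 0)
		return -EINVAL;

	/* force the event onto the CPU designated for this PMU */
	event->cpu = cpumask_first(&thunder_active_mask);
	if (event->cpu >= nr_cpu_ids)
		return -ENODEV;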
> +static ssize_t node_show(struct device *dev, struct device_attribute *attr, char *page)
> +{
> + if (NODES_SHIFT)
> + return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
> + else
> + return sprintf(page, "config:16\n");
> +}
I'm not keen on this depending on the kernel configuration.
> +static int thunder_uncore_pmu_cpu_notifier(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct thunder_uncore *uncore = container_of(nb, struct thunder_uncore, cpu_nb);
> + int new_cpu, old_cpu = (long) data;
> +
> + switch (action & ~CPU_TASKS_FROZEN) {
> + case CPU_DOWN_PREPARE:
> + if (!cpumask_test_and_clear_cpu(old_cpu, &thunder_active_mask))
> + break;
> + new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);
Above it was mentioned that events are grouped per node/socket. So surely
it doesn't make sense to migrate this to any arbitrary CPU, but only to
CPUs in the same node?
If I have active events for node 0, what happens when I hotplug out the
last CPU for that node (but have CPUs online in node 1)?
Is it guaranteed that power is retained for the device?
> + if (new_cpu >= nr_cpu_ids)
> + break;
> + perf_pmu_migrate_context(uncore->pmu, old_cpu, new_cpu);
> + cpumask_set_cpu(new_cpu, &thunder_active_mask);
> + break;
> + default:
> + break;
> + }
> + return NOTIFY_OK;
> +}
> +
> +static struct thunder_uncore_node *alloc_node(struct thunder_uncore *uncore, int node_id, int counters)
> +{
> + struct thunder_uncore_node *node;
> +
> + node = kzalloc(sizeof(struct thunder_uncore_node), GFP_KERNEL);
Use:
node = kzalloc(sizeof(*node), GFP_KERNEL);
> +int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
> + unsigned long offset, unsigned long size,
> + struct pmu *pmu, int counters)
> +{
> + struct thunder_uncore_unit *unit, *tmp;
> + struct thunder_uncore_node *node;
> + struct pci_dev *pdev = NULL;
> + int ret, node_id, found = 0;
> +
> + /* detect PCI devices */
> + do {
> + pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, device_id, pdev);
> + if (!pdev)
> + break;
the loop would look cleaner like:
unsigned int vendor_id = PCI_VENDOR_ID_CAVIUM;
while ((pdev = pci_get_device(vendor_id, device_id, pdev))) {
...
}
> +
> + node_id = dev_to_node(&pdev->dev);
> + /*
> + * -1 without NUMA, set to 0 because we always have at
> + * least node 0.
> + */
> + if (node_id < 0)
> + node_id = 0;
Again, this seems fragile to me. I am very much not keen on this
behaviour varying based on a logically unrelated kernel configuration
option.
> +
> + /* allocate node if necessary */
> + if (!uncore->nodes[node_id])
> + uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
> +
> + node = uncore->nodes[node_id];
> + if (!node) {
> + ret = -ENOMEM;
> + goto fail;
> + }
> +
> + unit = kzalloc(sizeof(struct thunder_uncore_unit), GFP_KERNEL);
Use:
unit = kzalloc(sizeof(*unit), GFP_KERNEL)
> + /*
> + * A perf PMU is CPU dependent, in contrast to our uncore devices.
> + * Just pick a CPU and migrate away if it goes offline.
> + */
> + cpumask_set_cpu(smp_processor_id(), &thunder_active_mask);
The current CPU is not guaranteed to be in the same node, no?
My comments earlier w.r.t. migration apply here too.
> +
> + uncore->cpu_nb.notifier_call = thunder_uncore_pmu_cpu_notifier;
> + uncore->cpu_nb.priority = CPU_PRI_PERF + 1;
> + ret = register_cpu_notifier(&uncore->cpu_nb);
> + if (ret)
> + goto fail;
> +
> + ret = perf_pmu_register(pmu, pmu->name, -1);
> + if (ret)
> + goto fail_pmu;
> +
> + uncore->pmu = pmu;
Typically, the data related to the PMU is put in a struct which wraps
the struct pmu. That allows you to map either way using container_of.
Is there a particular reason for thunder_uncore to not contain the
struct PMU, rather than a pointer to it?
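That is, roughly:

	struct thunder_uncore {
		struct pmu pmu;
		/* ... */
	};

	static struct thunder_uncore *to_thunder_uncore(struct pmu *pmu)
	{
		return container_of(pmu, struct thunder_uncore, pmu);
	}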
Thanks,
Mark.
On Wed, Mar 09, 2016 at 05:21:04PM +0100, Jan Glauber wrote:
> Support counters of the L2 Cache tag and data units.
>
> Also support pass2 added/modified counters by checking MIDR.
>
> Signed-off-by: Jan Glauber <[email protected]>
> ---
> drivers/perf/uncore/Makefile | 3 +-
> drivers/perf/uncore/uncore_cavium.c | 6 +-
> drivers/perf/uncore/uncore_cavium.h | 7 +-
> drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
> 4 files changed, 613 insertions(+), 3 deletions(-)
> create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
>
> diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
> index b9c72c2..6a16caf 100644
> --- a/drivers/perf/uncore/Makefile
> +++ b/drivers/perf/uncore/Makefile
> @@ -1 +1,2 @@
> -obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
> +obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o \
> + uncore_cavium_l2c_tad.o
> diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
> index 4fd5e45..b92b2ae 100644
> --- a/drivers/perf/uncore/uncore_cavium.c
> +++ b/drivers/perf/uncore/uncore_cavium.c
> @@ -15,7 +15,10 @@ int thunder_uncore_version;
>
> struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
> {
> - return NULL;
> + if (event->pmu->type == thunder_l2c_tad_pmu.type)
> + return thunder_uncore_l2c_tad;
> + else
> + return NULL;
> }
If thunder_uncore contained the relevant struct pmu, you wouldn't need
this function.
You could take event->pmu, and use container_of to get the relevant
thunder_uncore.
So please do that and get rid of this function.
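With the struct pmu embedded, each callback can simply do:

	struct thunder_uncore *uncore =
		container_of(event->pmu, struct thunder_uncore, pmu);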
>
> void thunder_uncore_read(struct perf_event *event)
> @@ -296,6 +299,7 @@ static int __init thunder_uncore_init(void)
> thunder_uncore_version = 1;
> pr_info("PMU version: %d\n", thunder_uncore_version);
>
> + thunder_uncore_l2c_tad_setup();
> return 0;
> }
> late_initcall(thunder_uncore_init);
> diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
> index c799709..7a9c367 100644
> --- a/drivers/perf/uncore/uncore_cavium.h
> +++ b/drivers/perf/uncore/uncore_cavium.h
> @@ -7,7 +7,7 @@
> #define pr_fmt(fmt) "thunderx_uncore: " fmt
>
> enum uncore_type {
> - NOP_TYPE,
> + L2C_TAD_TYPE,
> };
>
> extern int thunder_uncore_version;
> @@ -65,6 +65,9 @@ static inline struct thunder_uncore_node *get_node(u64 config,
> extern struct attribute_group thunder_uncore_attr_group;
> extern struct device_attribute format_attr_node;
>
> +extern struct thunder_uncore *thunder_uncore_l2c_tad;
> +extern struct pmu thunder_l2c_tad_pmu;
The above hopefully means you can get rid of these.
> /* Prototypes */
> struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
> void thunder_uncore_del(struct perf_event *event, int flags);
> @@ -76,3 +79,5 @@ int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
> ssize_t thunder_events_sysfs_show(struct device *dev,
> struct device_attribute *attr,
> char *page);
> +
> +int thunder_uncore_l2c_tad_setup(void);
> diff --git a/drivers/perf/uncore/uncore_cavium_l2c_tad.c b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
> new file mode 100644
> index 0000000..c8dc305
> --- /dev/null
> +++ b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
> @@ -0,0 +1,600 @@
> +/*
> + * Cavium Thunder uncore PMU support, L2C TAD counters.
It would be good to put an explanation of the TAD unit here, even if
just expanding that to Tag And Data.
> + *
> + * Copyright 2016 Cavium Inc.
> + * Author: Jan Glauber <[email protected]>
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/perf_event.h>
Minor nit, but as a general note I'd recommend alphabetically sorting
your includes now.
That way any subsequent additions/removals are less likely to cause
painful conflicts (so long as they retain that order).
> +static void thunder_uncore_start(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + struct thunder_uncore_unit *unit;
> + u64 prev;
> + int id;
> +
> + node = get_node(hwc->config, uncore);
> + id = get_id(hwc->config);
> +
> + /* restore counter value divided by units into all counters */
> + if (flags & PERF_EF_RELOAD) {
> + prev = local64_read(&hwc->prev_count);
> + prev = prev / node->nr_units;
> +
> + list_for_each_entry(unit, &node->unit_list, entry)
> + writeq(prev, hwc->event_base + unit->map);
> + }
It would be vastly simpler to always restore zero into all counters, and
to update prev_count to account for this.
That will also save you any rounding loss from the division.
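Untested, but something like:

	if (flags & PERF_EF_RELOAD) {
		/* event->count already holds the total; restart from zero */
		list_for_each_entry(unit, &node->unit_list, entry)
			writeq(0, hwc->event_base + unit->map);
		local64_set(&hwc->prev_count, 0);
	}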
> +
> + hwc->state = 0;
> +
> + /* write byte in control registers for all units on the node */
> + list_for_each_entry(unit, &node->unit_list, entry)
> + writeb(id, hwc->config_base + unit->map);
That comment isn't very helpful. What is the intent and effect of this
write?
> +
> + perf_event_update_userpage(event);
> +}
> +
> +static void thunder_uncore_stop(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + struct thunder_uncore_unit *unit;
> +
> + /* reset selection value for all units on the node */
> + node = get_node(hwc->config, uncore);
> +
> + list_for_each_entry(unit, &node->unit_list, entry)
> + writeb(L2C_TAD_EVENTS_DISABLED, hwc->config_base + unit->map);
> + hwc->state |= PERF_HES_STOPPED;
> +
> + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> + thunder_uncore_read(event);
> + hwc->state |= PERF_HES_UPTODATE;
> + }
> +}
> +
> +static int thunder_uncore_add(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + int i;
> +
> + WARN_ON_ONCE(!uncore);
This is trivially never possible if uncore contains the pmu (or we
couldn't have initialised the event in the first place).
> + node = get_node(hwc->config, uncore);
> +
> + /* are we already assigned? */
> + if (hwc->idx != -1 && node->events[hwc->idx] == event)
> + goto out;
Why would the event already be assigned a particular counter?
Which other piece of code might do that?
As far as I can see, nothing else can.
> +
> + for (i = 0; i < node->num_counters; i++) {
> + if (node->events[i] == event) {
> + hwc->idx = i;
> + goto out;
> + }
> + }
This should never happen, in the absence of a programming error. An
event should not be added multiple times, and adds and dels should be
balanced.
> +
> + /* if not take the first available counter */
> + hwc->idx = -1;
> + for (i = 0; i < node->num_counters; i++) {
> + if (cmpxchg(&node->events[i], NULL, event) == NULL) {
> + hwc->idx = i;
> + break;
> + }
> + }
> +out:
> + if (hwc->idx == -1)
> + return -EBUSY;
> +
> + hwc->config_base = hwc->idx;
> + hwc->event_base = L2C_TAD_COUNTER_OFFSET +
> + hwc->idx * sizeof(unsigned long long);
What's going on here?
I see that we use hwc->event_base as an offset into registers in the HW,
so sizeof(unsigned long long) is unusual.
I'm guessing that you're figuring out the address of a 64 bit register.
A comment, and sizeof(u64) would be better.
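e.g.:

	/* each counter is a 64-bit register */
	hwc->event_base = L2C_TAD_COUNTER_OFFSET + hwc->idx * sizeof(u64);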
> +EVENT_ATTR(l2t_hit, L2C_TAD_EVENT_L2T_HIT);
> +EVENT_ATTR(l2t_miss, L2C_TAD_EVENT_L2T_MISS);
> +EVENT_ATTR(l2t_noalloc, L2C_TAD_EVENT_L2T_NOALLOC);
> +EVENT_ATTR(l2_vic, L2C_TAD_EVENT_L2_VIC);
> +EVENT_ATTR(sc_fail, L2C_TAD_EVENT_SC_FAIL);
> +EVENT_ATTR(sc_pass, L2C_TAD_EVENT_SC_PASS);
> +EVENT_ATTR(lfb_occ, L2C_TAD_EVENT_LFB_OCC);
> +EVENT_ATTR(wait_lfb, L2C_TAD_EVENT_WAIT_LFB);
> +EVENT_ATTR(wait_vab, L2C_TAD_EVENT_WAIT_VAB);
> +EVENT_ATTR(rtg_hit, L2C_TAD_EVENT_RTG_HIT);
> +EVENT_ATTR(rtg_miss, L2C_TAD_EVENT_RTG_MISS);
> +EVENT_ATTR(l2_rtg_vic, L2C_TAD_EVENT_L2_RTG_VIC);
> +EVENT_ATTR(l2_open_oci, L2C_TAD_EVENT_L2_OPEN_OCI);
> +static struct attribute *thunder_l2c_tad_events_attr[] = {
> + EVENT_PTR(l2t_hit),
> + EVENT_PTR(l2t_miss),
> + EVENT_PTR(l2t_noalloc),
> + EVENT_PTR(l2_vic),
> + EVENT_PTR(sc_fail),
> + EVENT_PTR(sc_pass),
> + EVENT_PTR(lfb_occ),
> + EVENT_PTR(wait_lfb),
> + EVENT_PTR(wait_vab),
> + EVENT_PTR(rtg_hit),
> + EVENT_PTR(rtg_miss),
> + EVENT_PTR(l2_rtg_vic),
> + EVENT_PTR(l2_open_oci),
This duplication is tedious.
Please do something like we did for CCI in commit 5e442eba342e567e
("arm-cci: simplify sysfs attr handling") so you only need to define
each attribute once to create it and place it in the relevant attribute
pointer list.
Likewise for the other PMUs.
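An untested sketch of that pattern, assuming thunder_events_sysfs_show
is changed to take the event id from dev_ext_attribute::var:

	#define THUNDER_EVENT_ATTR(_name, _id)				\
		(&((struct dev_ext_attribute[]) {			\
			{ __ATTR(_name, S_IRUGO,			\
				 thunder_events_sysfs_show, NULL),	\
			  (void *)(unsigned long)(_id) }		\
		})[0].attr.attr)

	static struct attribute *thunder_l2c_tad_events_attr[] = {
		THUNDER_EVENT_ATTR(l2t_hit, L2C_TAD_EVENT_L2T_HIT),
		/* ... */
		NULL,
	};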
> +static struct attribute_group thunder_l2c_tad_events_group = {
> + .name = "events",
> + .attrs = NULL,
> +};
> +
> +static const struct attribute_group *thunder_l2c_tad_attr_groups[] = {
> + &thunder_uncore_attr_group,
> + &thunder_l2c_tad_format_group,
> + &thunder_l2c_tad_events_group,
> + NULL,
> +};
> +
> +struct pmu thunder_l2c_tad_pmu = {
> + .attr_groups = thunder_l2c_tad_attr_groups,
> + .name = "thunder_l2c_tad",
> + .event_init = thunder_uncore_event_init,
> + .add = thunder_uncore_add,
> + .del = thunder_uncore_del,
> + .start = thunder_uncore_start,
> + .stop = thunder_uncore_stop,
> + .read = thunder_uncore_read,
> +};
> +
> +static int event_valid(u64 config)
A bool would be clearer.
> +{
> + if ((config > 0 && config <= L2C_TAD_EVENT_WAIT_VAB) ||
> + config == L2C_TAD_EVENT_RTG_HIT ||
> + config == L2C_TAD_EVENT_RTG_MISS ||
> + config == L2C_TAD_EVENT_L2_RTG_VIC ||
> + config == L2C_TAD_EVENT_L2_OPEN_OCI ||
> + ((config & 0x80) && ((config & 0xf) <= 3)))
What are these last cases?
> + return 1;
> +
> + if (thunder_uncore_version == 1)
> + if (config == L2C_TAD_EVENT_OPEN_CCPI ||
> + (config >= L2C_TAD_EVENT_LOOKUP &&
> + config <= L2C_TAD_EVENT_LOOKUP_ALL) ||
> + (config >= L2C_TAD_EVENT_TAG_ALC_HIT &&
> + config <= L2C_TAD_EVENT_OCI_RTG_ALC_VIC &&
> + config != 0x4d &&
> + config != 0x66 &&
> + config != 0x67))
Likewise, what are these last cases?
Why not rule these out explicitly first?
> + return 1;
> +
> + return 0;
> +}
> +
> +int __init thunder_uncore_l2c_tad_setup(void)
> +{
> + int ret = -ENOMEM;
> +
> + thunder_uncore_l2c_tad = kzalloc(sizeof(struct thunder_uncore),
> + GFP_KERNEL);
As previously, sizeof(*ptr) is preferred to sizeof(type), though it
doesn't save you anything here.
> + if (!thunder_uncore_l2c_tad)
> + goto fail_nomem;
> +
> + if (thunder_uncore_version == 0)
> + thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_events_attr;
> + else /* default */
> + thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_pass2_events_attr;
> +
> + ret = thunder_uncore_setup(thunder_uncore_l2c_tad,
> + PCI_DEVICE_ID_THUNDER_L2C_TAD,
> + L2C_TAD_CONTROL_OFFSET,
> + L2C_TAD_COUNTER_OFFSET + L2C_TAD_NR_COUNTERS
> + * sizeof(unsigned long long),
It would be nicer to calculate the size earlier (with sizeof(u64) as
previously mentioned).
> + &thunder_l2c_tad_pmu,
> + L2C_TAD_NR_COUNTERS);
> + if (ret)
> + goto fail;
> +
> + thunder_uncore_l2c_tad->type = L2C_TAD_TYPE;
I believe this can go, with thunder_uncore containing a pmu.
Thanks,
Mark.
On Wed, Mar 09, 2016 at 05:21:05PM +0100, Jan Glauber wrote:
> @@ -300,6 +302,7 @@ static int __init thunder_uncore_init(void)
> pr_info("PMU version: %d\n", thunder_uncore_version);
>
> thunder_uncore_l2c_tad_setup();
> + thunder_uncore_l2c_cbc_setup();
> return 0;
> }
> late_initcall(thunder_uncore_init);
Why aren't these just probed independently, as separate PCI devices,
rather than using a shared initcall?
You'd have to read the MIDR a few times, but that's a tiny fraction of
the rest of the cost of probing, and you can keep the common portion as
a stateless library.
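That is, each unit could register its own PCI driver. An untested
sketch, with thunder_l2c_cbc_probe and the device id define standing in
for the series' existing per-device setup:

	static const struct pci_device_id thunder_l2c_cbc_ids[] = {
		{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM,
			     PCI_DEVICE_ID_THUNDER_L2C_CBC) },
		{ 0, },
	};

	static struct pci_driver thunder_l2c_cbc_driver = {
		.name		= "thunder_l2c_cbc",
		.id_table	= thunder_l2c_cbc_ids,
		.probe		= thunder_l2c_cbc_probe,
	};
	module_pci_driver(thunder_l2c_cbc_driver);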
> +int l2c_cbc_events[L2C_CBC_NR_COUNTERS] = {
> + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
> + 0x08, 0x09, 0x0a, 0x0b, 0x0c,
> + 0x10, 0x11, 0x12, 0x13
> +};
What are these magic numbers?
A comment would be helpful here.
> +
> +static void thunder_uncore_start(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + struct thunder_uncore_unit *unit;
> + u64 prev;
> +
> + node = get_node(hwc->config, uncore);
> +
> + /* restore counter value divided by units into all counters */
> + if (flags & PERF_EF_RELOAD) {
> + prev = local64_read(&hwc->prev_count);
> + prev = prev / node->nr_units;
> +
> + list_for_each_entry(unit, &node->unit_list, entry)
> + writeq(prev, hwc->event_base + unit->map);
> + }
> +
> + hwc->state = 0;
> + perf_event_update_userpage(event);
> +}
This looks practically identical to the code in patch 2. Please factor
the common portion into the library code from patch 1 (zeroing the
registers), and share it.
> +
> +static void thunder_uncore_stop(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> + thunder_uncore_read(event);
> + hwc->state |= PERF_HES_UPTODATE;
> + }
> +}
There's no stop control for this PMU?
I was under the impression the core perf code could read the counter
while it was stopped, and would then see unexpectedly increasing values.
Does PERF_HES_UPTODATE stop the core from reading the counter, or is
it the responsibility of the backend to check that? I see that
thunder_uncore_read does not.
Do you need PERF_HES_STOPPED, or does that not matter due to the lack of
interrupts?
> +
> +static int thunder_uncore_add(struct perf_event *event, int flags)
> +{
> + struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunder_uncore_node *node;
> + int id, i;
> +
> + WARN_ON_ONCE(!uncore);
> + node = get_node(hwc->config, uncore);
> + id = get_id(hwc->config);
> +
> + /* are we already assigned? */
> + if (hwc->idx != -1 && node->events[hwc->idx] == event)
> + goto out;
> +
> + for (i = 0; i < node->num_counters; i++) {
> + if (node->events[i] == event) {
> + hwc->idx = i;
> + goto out;
> + }
> + }
> +
> + /* these counters are self-sustained so idx must match the counter! */
> + hwc->idx = -1;
> + for (i = 0; i < node->num_counters; i++) {
> + if (l2c_cbc_events[i] == id) {
> + if (cmpxchg(&node->events[i], NULL, event) == NULL) {
> + hwc->idx = i;
> + break;
> + }
> + }
> + }
> +
> +out:
> + if (hwc->idx == -1)
> + return -EBUSY;
> +
> + hwc->event_base = id * sizeof(unsigned long long);
> +
> + /* counter is not stoppable so avoiding PERF_HES_STOPPED */
> + hwc->state = PERF_HES_UPTODATE;
> +
> + if (flags & PERF_EF_START)
> + thunder_uncore_start(event, 0);
> +
> + return 0;
> +}
This looks practically identical to code from patch 2, and all my
comments there apply.
Please factor this out into the library code in patch 1, taking into
account my comments on patch 2.
Likewise, the remainder of the file is mostly a copy+paste of patch 2.
All those comments apply equally to this patch.
Thanks,
Mark.
On Tue, Apr 19, 2016 at 12:35:10PM +0200, Jan Glauber wrote:
> Mark,
>
> are these patches still queued or should I repost them?
Apologies for the delay. I've just given these a review.
I note an awful lot of duplication over patches 2-5. The pmu::add
implementations are practically identical, and I suspect can be shared
without much difficulty. Please try to address that unnecessary
duplication.
My comments on patches 2 and 3 largely apply to 4 and 5, so I haven't
reviewed those individually.
Thanks,
Mark.
>
> --Jan
>
> On Mon, Apr 04, 2016 at 01:03:13PM +0200, Jan Glauber wrote:
> > Hi Mark,
> >
> > can you have a look at these patches?
> >
> > Thanks,
> > Jan
> >
> > 2016-03-09 17:21 GMT+01:00 Jan Glauber <[email protected]>:
> >
> > This patch series provides access to various counters on the ThunderX SOC.
> >
> > For details of the uncore implementation see patch #1.
> >
> > Patches #2-5 add the various ThunderX specific PMUs.
> >
> > As suggested I've put the files under drivers/perf/uncore. I would
> > prefer this location over drivers/bus because not all of the uncore
> > drivers are bus related.
> >
> > Changes to v1:
> > - Added NUMA support
> > - Fixed CPU hotplug by pmu migration
> > - Moved files to drivers/perf/uncore
> > - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
> > - improved comments about overflow interrupts
> > - removed max device limit
> > - trimmed include files
> >
> > Feedback welcome!
> > Jan
> >
> > -------------------------------------------------
> >
> > Jan Glauber (5):
> > arm64/perf: Basic uncore counter support for Cavium ThunderX
> > arm64/perf: Cavium ThunderX L2C TAD uncore support
> > arm64/perf: Cavium ThunderX L2C CBC uncore support
> > arm64/perf: Cavium ThunderX LMC uncore support
> > arm64/perf: Cavium ThunderX OCX TLK uncore support
> >
> > drivers/perf/Makefile | 1 +
> > drivers/perf/uncore/Makefile | 5 +
> > drivers/perf/uncore/uncore_cavium.c | 314 +++++++++++++++
> > drivers/perf/uncore/uncore_cavium.h | 95 +++++
> > drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
> > drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
> > drivers/perf/uncore/uncore_cavium_lmc.c | 196 +++++++++
> > drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
> > 8 files changed, 1828 insertions(+)
> > create mode 100644 drivers/perf/uncore/Makefile
> > create mode 100644 drivers/perf/uncore/uncore_cavium.c
> > create mode 100644 drivers/perf/uncore/uncore_cavium.h
> > create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
> > create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
> > create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
> > create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
> >
> > --
> > 1.9.1
> >
> >
> >
>
On Tue, Apr 19, 2016 at 04:06:08PM +0100, Mark Rutland wrote:
> On Wed, Mar 09, 2016 at 05:21:03PM +0100, Jan Glauber wrote:
> > Provide "uncore" facilities for different non-CPU performance
> > counter units. Based on Intel/AMD uncore pmu support.
> >
> > The uncore drivers cover quite different functionality including
> > L2 Cache, memory controllers and interconnects.
> >
> > The uncore PMUs can be found under /sys/bus/event_source/devices.
> > All counters are exported via sysfs in the corresponding events
> > files under the PMU directory so the perf tool can list the event names.
> >
> > There are some points that are special in this implementation:
> >
> > 1) The PMU detection relies on PCI device detection. If a
> > matching PCI device is found the PMU is created. The code can deal
> > with multiple units of the same type, e.g. more than one memory
> > controller.
> > Note: There is also a CPUID check to determine the CPU variant,
> > this is needed to support different hardware versions that use
> > the same PCI IDs.
> >
> > 2) Counters are summarized across different units of the same type
> > on one NUMA node.
> > For instance L2C TAD 0..7 are presented as a single counter
> > (adding the values from TAD 0 to 7). Although losing the ability
> > to read a single value the merged values are easier to use.
>
> Merging within a NUMA node, but no further seems a little arbitrary.
>
> > 3) NUMA support. The device node id is used to group devices by node
> > so counters on one node can be merged. The NUMA node can be selected
> > via a new sysfs node attribute.
> > Without NUMA support all devices will be on node 0.
>
> It doesn't seem great that this depends on kernel configuration (which
> is independent of HW configuration). It seems confusing for the user,
> and fragile.
>
> Do we not have access to another way of grouping cores (e.g. a socket
> ID), that's independent of kernel configuration? That seems to be how
> the x86 uncore PMUs are handled.
I'm not sure how relevant the use case of a multi-node system without
CONFIG_NUMA is, but maybe we can get the socket ID from the
multiprocessor affinity register (MPIDR_EL1)? The AFF2 part (bits 23:16)
should contain the socket number on ThunderX.
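Something along these lines on each CPU (untested):

	u64 mpidr = read_cpuid_mpidr();
	int socket = MPIDR_AFFINITY_LEVEL(mpidr, 2);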
Would that be better?
thanks,
Jan
> If we don't have that information, it really feels like we need
> additional info from FW (which would also solve the CPUID issue with
> point 1), or this is likely to be very fragile.
Hi Jan,
On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> Hi Mark,
>
> can you have a look at these patches?
Looks like Mark reviewed this last week -- are you planning to respin?
Will
Hi Will,
On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> Hi Jan,
>
> On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > Hi Mark,
> >
> > can you have a look at these patches?
>
> Looks like Mark reviewed this last week -- are you planning to respin?
>
> Will
Yes, of course. I just haven't had time yet, and I'm a bit lost on how to
proceed without using the NUMA node information, which Mark objected to.
The only way to know which device is on which node would be to look
at the PCI topology (which is also the source of the NUMA node_id).
We could do this manually in order to not depend on CONFIG_NUMA,
but I would like to know if that is acceptable before respinning the
patches.
Thanks!
Jan
On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > can you have a look at these patches?
> >
> > Looks like Mark reviewed this last week -- are you planning to respin?
>
> Yes, of course. I just haven't had time yet, and I'm a bit lost on how to
> proceed without using the NUMA node information, which Mark objected to.
>
> The only way to know which device is on which node would be to look
> at the PCI topology (which is also the source of the NUMA node_id).
> We could do this manually in order to not depend on CONFIG_NUMA,
> but I would like to know if that is acceptable before respinning the
> patches.
That doesn't feel like it really addresses Mark's concerns -- it's just
another way to get the information that isn't a first-class PMU topology
description from firmware.
Now, I don't actually mind using the NUMA topology so much in the cases
where it genuinely correlates with the PMU topology. My objection is more
that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
result in working with an incorrect PMU topology and passing all of that
through to userspace.
So I'd prefer either making the driver depend on NUMA, or at the very least
failing to probe the PMU if we discover a socketed system and NUMA is not
selected. Do either of those work as a compromise?
Will
On Mon, Apr 25, 2016 at 02:19:07PM +0100, Will Deacon wrote:
> On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> > On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > > can you have a look at these patches?
> > >
> > > Looks like Mark reviewed this last week -- are you planning to respin?
> >
> > Yes, of course. I just had no time yet and I'm a bit lost on how to
> > proceed without using the NUMA node information which Mark did not like
> > to be used.
> >
> > The only way to know which device is on which node would be to look
> > at the PCI topology (which is also the source of the NUMA node_id).
> > We could do this manually in order to not depend on CONFIG_NUMA,
> > but I would like to know if that is acceptable before respinning the
> > patches.
>
> That doesn't feel like it really addresses Mark's concerns -- it's just
> another way to get the information that isn't a first-class PMU topology
> description from firmware.
>
> Now, I don't actually mind using the NUMA topology so much in the cases
> where it genuinely correlates with the PMU topology. My objection is more
> that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
> result in working with an incorrect PMU topology and passing all of that
> through to userspace.
>
> So I'd prefer either making the driver depend on NUMA, or at the very least
> failing to probe the PMU if we discover a socketed system and NUMA is not
> selected. Do either of those work as a compromise?
>
> Will
That sounds like a good compromise.
So I could do the following:
1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
information to determine the device node
2) If CONFIG_NUMA is not set we check if we run on a socketed system
a) In that case we return an error and give a message that CONFIG_NUMA needs
to be enabled
b) Otherwise we have a single node system and use node_id = 0
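In code that could look roughly like this, with
thunder_on_socketed_system() as a placeholder for whatever detection we
agree on:

	static int thunder_uncore_node_id(struct pci_dev *pdev)
	{
		if (IS_ENABLED(CONFIG_NUMA))
			return dev_to_node(&pdev->dev);
		if (thunder_on_socketed_system())
			return -ENODEV;	/* need CONFIG_NUMA here */
		return 0;
	}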
David noted that it would also be possible to extract the node id from
the physical address of the device, but I'm not sure that qualifies as a
'first-class' topology description...
--Jan
On Tue, Apr 26, 2016 at 02:08:09PM +0200, Jan Glauber wrote:
> On Mon, Apr 25, 2016 at 02:19:07PM +0100, Will Deacon wrote:
> > On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> > > On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > > > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > > > can you have a look at these patches?
> > > >
> > > > Looks like Mark reviewed this last week -- are you planning to respin?
> > >
> > > Yes, of course. I just haven't had time yet, and I'm a bit lost on how to
> > > proceed without using the NUMA node information, which Mark objected to.
> > >
> > > The only way to know which device is on which node would be to look
> > > at the PCI topology (which is also the source of the NUMA node_id).
> > > We could do this manually in order to not depend on CONFIG_NUMA,
> > > but I would like to know if that is acceptable before respinning the
> > > patches.
> >
> > That doesn't feel like it really addresses Mark's concerns -- it's just
> > another way to get the information that isn't a first-class PMU topology
> > description from firmware.
> >
> > Now, I don't actually mind using the NUMA topology so much in the cases
> > where it genuinely correlates with the PMU topology. My objection is more
> > that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
> > result in working with an incorrect PMU topology and passing all of that
> > through to userspace.
> >
> > So I'd prefer either making the driver depend on NUMA, or at the very least
> > failing to probe the PMU if we discover a socketed system and NUMA is not
> > selected. Do either of those work as a compromise?
> >
> > Will
>
> That sounds like a good compromise.
>
> So I could do the following:
>
> 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
> information to determine the device node
>
> 2) If CONFIG_NUMA is not set we check if we run on a socketed system
>
> a) In that case we return an error and give a message that CONFIG_NUMA needs
> to be enabled
> b) Otherwise we have a single node system and use node_id = 0
That sounds sensible to me. How do you "check if we run on a socketed
system"? My assumption would be that you could figure this out from the
firmware tables?
> David noted that it would also be possible to extract the node id from
> the physical address of the device, but I'm not sure that qualifies as a
> 'first-class' topology description...
I'd rather avoid this sort of probing, as it inevitably breaks when it
sees new hardware that doesn't follow the unwritten assumptions of the
old hardware.
Will
On Tue, Apr 26, 2016 at 02:53:54PM +0100, Will Deacon wrote:
[...]
> >
> > That sounds like a good compromise.
> >
> > So I could do the following:
> >
> > 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
> > information to determine the device node
> >
> > 2) If CONFIG_NUMA is not set we check if we run on a socketed system
> >
> > a) In that case we return an error and give a message that CONFIG_NUMA needs
> > to be enabled
> > b) Otherwise we have a single node system and use node_id = 0
>
> That sounds sensible to me. How do you "check if we run on a socketed
> system"? My assumption would be that you could figure this out from the
> firmware tables?
There are probably multiple ways to detect a socketed system, with some quite
hardware specific. I would like to avoid parsing DT (and ACPI) though,
if possible.
A generic approach would be to do a query of the multiprocessor affinity
register (MPIDR_EL1) on all CPUs. The AFF2 part (bits 23:16) contains the
socket number on ThunderX. If this is non-zero on any CPU I would assume a
socketed system.
Would that be feasible?
thanks,
Jan
On Wed, Apr 27, 2016 at 12:51:56PM +0200, Jan Glauber wrote:
> On Tue, Apr 26, 2016 at 02:53:54PM +0100, Will Deacon wrote:
>
> [...]
>
> > >
> > > That sounds like a good compromise.
> > >
> > > So I could do the following:
> > >
> > > 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
> > > information to determine the device node
> > >
> > > 2) If CONFIG_NUMA is not set we check if we run on a socketed system
> > >
> > > a) In that case we return an error and give a message that CONFIG_NUMA needs
> > > to be enabled
> > > b) Otherwise we have a single node system and use node_id = 0
> >
> > That sounds sensible to me. How do you "check if we run on a socketed
> > system"? My assumption would be that you could figure this out from the
> > firmware tables?
>
> There are probably multiple ways to detect a socketed system, with some quite
> hardware specific. I would like to avoid parsing DT (and ACPI) though,
> if possible.
>
> A generic approach would be to do a query of the multiprocessor affinity
> register (MPIDR_EL1) on all CPUs. The AFF2 part (bits 23:16) contains the
> socket number on ThunderX. If this is non-zero on any CPU I would assume a
> socketed system.
>
> Would that be feasible?
As with checking the physical address of a peripheral, this is an
unwritten assumption, and I suspect that similarly, it will inevitably
break (e.g. if Aff3 becomes used).
If you expect kernels relevant to your platform to have NUMA support,
you can simply depend on NUMA to determine whether or not you have NUMA
nodes.
Regarding relying on NUMA nodes, I have two concerns:
In general a NUMA node is not necessarily a socket, as you can have NUMA
properties even within a socket. If you can guarantee that for your
platform NUMA nodes will always be sockets, then I guess using NUMA
nodes is ok, though I imagine that as with the physical address map and
organisation of CPU IDs, that's difficult to have set in stone.
Linux NUMA node IDs are arbitrary tokens, and may not necessarily idmap
to documented socket IDs for your platform (even if they happen to
today). If you're happy to have users figure out how those IDs map to
clusters, that's fine, but otherwise you need to expose additional
information such that users get what they expect (at which point, if you
have said information we probably don't need NUMA information).
Thanks,
Mark.