This patchset adds PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).
v6:
Rebased to 4.18-rc1
Updated with comments from John Garry[3]
[3] https://lkml.org/lkml/2018/5/17/408
v5:
-Incroporated review comments from Mark Rutland[2]
v4:
-Incroporated review comments from Mark Rutland[1]
[1] https://www.spinics.net/lists/arm-kernel/msg588563.html
[2] https://lkml.org/lkml/2018/4/26/376
v3:
- fixed warning reported by kbuild robot
v2:
- rebased to 4.12-rc1
- Removed Arch VULCAN dependency.
- update SMC call parameters as per latest firmware.
v1:
-Initial patch
Ganapatrao Kulkarni (2):
perf: uncore: Adding documentation for ThunderX2 pmu uncore driver
ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
Documentation/perf/thunderx2-pmu.txt | 66 +++
drivers/perf/Kconfig | 8 +
drivers/perf/Makefile | 1 +
drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1 +
5 files changed, 1025 insertions(+)
create mode 100644 Documentation/perf/thunderx2-pmu.txt
create mode 100644 drivers/perf/thunderx2_pmu.c
--
2.9.4
Documentation for the UNCORE PMUs on Cavium's ThunderX2 SoC.
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
---
Documentation/perf/thunderx2-pmu.txt | 66 ++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
create mode 100644 Documentation/perf/thunderx2-pmu.txt
diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index 0000000..7d89935
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,66 @@
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+==========================================================================
+
+ThunderX2 SoC PMU consists of independent system wide per Socket PMUs such
+as Level 3 Cache(L3C) and DDR4 Memory Controller(DMC).
+
+It has 8 independent DMC PMUs to capture performance events corresponding
+to 8 channels of DDR4 Memory Controller. There are 16 independent L3C PMUs
+to capture events corresponding to 16 tiles of L3 cache. Each PMU supports
+up to 4 counters.
+
+Counters are independently programmable and can be started and stopped
+individually. Each counter can be set to sample specific perf events.
+Counters are 32 bit and do not support overflow interrupt; they are
+sampled at every 2 seconds. The Counters register access are multiplexed
+across channels of DMC and L3C. The muxing(select channel) is done through
+write to a Secure register using smcc calls.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2-pmu driver registers several perf PMUs for DMC and L3C devices.
+Each of the PMUs provides description of its available events
+and configuration options in sysfs.
+ see /sys/devices/uncore_<l3c_S_X/dmc_S_X/>
+
+S is socket id and X represents channel number.
+Each PMU can be used to sample up to 4 events simultaneously.
+
+The "format" directory describes format of the config (event ID).
+The "events" directory provides configuration templates for all
+supported event types that can be used with perf tool.
+
+For example, "uncore_dmc_0_0/cnt_cycles/" is an
+equivalent of "uncore_dmc_0_0/config=0x1/".
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which is likely to be used to handle all the
+PMU events. It will be the first online CPU from the NUMA node of PMU device.
+
+Example for perf tool use:
+
+perf stat -a -e \
+uncore_dmc_0_0/cnt_cycles/,\
+uncore_dmc_0_1/cnt_cycles/,\
+uncore_dmc_0_2/cnt_cycles/,\
+uncore_dmc_0_3/cnt_cycles/,\
+uncore_dmc_0_4/cnt_cycles/,\
+uncore_dmc_0_5/cnt_cycles/,\
+uncore_dmc_0_6/cnt_cycles/,\
+uncore_dmc_0_7/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0_0/cancelled_read_txns/,\
+uncore_dmc_0_0/cnt_cycles/,\
+uncore_dmc_0_0/consumed_read_txns/,\
+uncore_dmc_0_0/data_transfers/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0_0/l3_retry/,\
+uncore_l3c_0_0/read_hit/,\
+uncore_l3c_0_0/read_request/,\
+uncore_l3c_0_0/inv_request/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
--
2.9.4
This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
Controller(DMC) and Level 3 Cache(L3C).
ThunderX2 has 8 independent DMC PMUs to capture performance events
corresponding to 8 channels of DDR4 Memory Controller and 16 independent
L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
Each PMU supports up to 4 counters. All counters lack overflow interrupt
and are sampled periodically.
Signed-off-by: Ganapatrao Kulkarni <[email protected]>
---
drivers/perf/Kconfig | 8 +
drivers/perf/Makefile | 1 +
drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1 +
4 files changed, 959 insertions(+)
create mode 100644 drivers/perf/thunderx2_pmu.c
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 08ebaf7..ecedb9e 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -87,6 +87,14 @@ config QCOM_L3_PMU
Adds the L3 cache PMU into the perf events subsystem for
monitoring L3 cache events.
+config THUNDERX2_PMU
+ bool "Cavium ThunderX2 SoC PMU UNCORE"
+ depends on ARCH_THUNDER2 && ARM64 && ACPI
+ help
+ Provides support for ThunderX2 UNCORE events.
+ The SoC has PMU support in its L3 cache controller (L3C) and
+ in the DDR4 Memory Controller (DMC).
+
config XGENE_PMU
depends on ARCH_XGENE
bool "APM X-Gene SoC PMU"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index b3902bd..909f27f 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
obj-$(CONFIG_HISI_PMU) += hisilicon/
obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
+obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
new file mode 100644
index 0000000..676e27e
--- /dev/null
+++ b/drivers/perf/thunderx2_pmu.c
@@ -0,0 +1,949 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CAVIUM THUNDERX2 SoC PMU UNCORE
+ * Copyright (C) 2018 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <[email protected]>
+ */
+
+#include <linux/acpi.h>
+#include <linux/arm-smccc.h>
+#include <linux/cpuhotplug.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+
+/* L3C and DMC has 16 and 8 channels per socket respectively.
+ * Each Channel supports UNCORE PMU device and consists of
+ * 4 independent programmable counters. Counters are 32 bit
+ * and do not support overflow interrupt, they need to be
+ * sampled before overflow(i.e, at every 2 seconds).
+ */
+
+#define UNCORE_MAX_COUNTERS 4
+#define UNCORE_DMC_MAX_CHANNELS 8
+#define UNCORE_L3_MAX_TILES 16
+
+#define UNCORE_HRTIMER_INTERVAL (2 * NSEC_PER_SEC)
+#define GET_EVENTID(ev) ((ev->hw.config) & 0x1ff)
+#define GET_COUNTERID(ev) ((ev->hw.idx) & 0xf)
+#define GET_CHANNELID(pmu_uncore) (pmu_uncore->channel)
+#define DMC_EVENT_CFG(idx, val) ((val) << (((idx) * 8) + 1))
+
+#define L3C_COUNTER_CTL 0xA8
+#define L3C_COUNTER_DATA 0xAC
+#define DMC_COUNTER_CTL 0x234
+#define DMC_COUNTER_DATA 0x240
+
+#define THUNDERX2_SMC_CALL_ID 0xC200FF00
+#define THUNDERX2_SMC_SET_CHANNEL 0xB010
+
+enum thunderx2_uncore_l3_events {
+ L3_EVENT_NONE,
+ L3_EVENT_NBU_CANCEL,
+ L3_EVENT_DIB_RETRY,
+ L3_EVENT_DOB_RETRY,
+ L3_EVENT_DIB_CREDIT_RETRY,
+ L3_EVENT_DOB_CREDIT_RETRY,
+ L3_EVENT_FORCE_RETRY,
+ L3_EVENT_IDX_CONFLICT_RETRY,
+ L3_EVENT_EVICT_CONFLICT_RETRY,
+ L3_EVENT_BANK_CONFLICT_RETRY,
+ L3_EVENT_FILL_ENTRY_RETRY,
+ L3_EVENT_EVICT_NOT_READY_RETRY,
+ L3_EVENT_L3_RETRY,
+ L3_EVENT_READ_REQ,
+ L3_EVENT_WRITE_BACK_REQ,
+ L3_EVENT_INVALIDATE_NWRITE_REQ,
+ L3_EVENT_INV_REQ,
+ L3_EVENT_SELF_REQ,
+ L3_EVENT_REQ,
+ L3_EVENT_EVICT_REQ,
+ L3_EVENT_INVALIDATE_NWRITE_HIT,
+ L3_EVENT_INVALIDATE_HIT,
+ L3_EVENT_SELF_HIT,
+ L3_EVENT_READ_HIT,
+ L3_EVENT_MAX,
+};
+
+enum thunderx2_uncore_dmc_events {
+ DMC_EVENT_NONE,
+ DMC_EVENT_COUNT_CYCLES,
+ DMC_EVENT_RES2,
+ DMC_EVENT_RES3,
+ DMC_EVENT_RES4,
+ DMC_EVENT_RES5,
+ DMC_EVENT_RES6,
+ DMC_EVENT_RES7,
+ DMC_EVENT_RES8,
+ DMC_EVENT_READ_64B_TXNS,
+ DMC_EVENT_READ_BELOW_64B_TXNS,
+ DMC_EVENT_WRITE_TXNS,
+ DMC_EVENT_TXN_CYCLES,
+ DMC_EVENT_DATA_TRANSFERS,
+ DMC_EVENT_CANCELLED_READ_TXNS,
+ DMC_EVENT_CONSUMED_READ_TXNS,
+ DMC_EVENT_MAX,
+};
+
+enum thunderx2_uncore_type {
+ PMU_TYPE_L3C,
+ PMU_TYPE_DMC,
+ PMU_TYPE_INVALID,
+};
+
+/*
+ * pmu on each socket has 2 uncore devices(dmc and l3),
+ * each uncore device has up to 16 channels, each channel can sample
+ * events independently with counters up to 4.
+ */
+struct thunderx2_pmu_uncore_channel {
+ struct pmu pmu;
+ struct hlist_node node;
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+ int channel;
+ int cpu;
+ DECLARE_BITMAP(active_counters, UNCORE_MAX_COUNTERS);
+ struct perf_event *events[UNCORE_MAX_COUNTERS];
+ struct hrtimer hrtimer;
+ /* to sync counter alloc/release */
+ raw_spinlock_t lock;
+};
+
+struct thunderx2_pmu_uncore_dev {
+ char *name;
+ struct device *dev;
+ enum thunderx2_uncore_type type;
+ void __iomem *base;
+ int node;
+ u32 max_counters;
+ u32 max_channels;
+ u32 max_events;
+ u64 hrtimer_interval;
+ /* this lock synchronizes across channels */
+ raw_spinlock_t lock;
+ const struct attribute_group **attr_groups;
+ void (*init_cntr_base)(struct perf_event *event,
+ struct thunderx2_pmu_uncore_dev *uncore_dev);
+ void (*select_channel)(struct perf_event *event);
+ void (*stop_event)(struct perf_event *event);
+ void (*start_event)(struct perf_event *event, int flags);
+};
+
+static inline struct thunderx2_pmu_uncore_channel *
+pmu_to_thunderx2_pmu_uncore(struct pmu *pmu)
+{
+ return container_of(pmu, struct thunderx2_pmu_uncore_channel, pmu);
+}
+
+/*
+ * sysfs format attributes
+ */
+static ssize_t thunderx2_pmu_format_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dev_ext_attribute *eattr;
+
+ eattr = container_of(attr, struct dev_ext_attribute, attr);
+ return sprintf(buf, "%s\n", (char *) eattr->var);
+}
+
+#define FORMAT_ATTR(_name, _config) \
+ (&((struct dev_ext_attribute[]) { \
+ { \
+ .attr = __ATTR(_name, 0444, thunderx2_pmu_format_show, NULL), \
+ .var = (void *) _config, \
+ } \
+ })[0].attr.attr)
+
+static struct attribute *l3c_pmu_format_attrs[] = {
+ FORMAT_ATTR(event, "config:0-4"),
+ NULL,
+};
+
+static struct attribute *dmc_pmu_format_attrs[] = {
+ FORMAT_ATTR(event, "config:0-4"),
+ NULL,
+};
+
+static const struct attribute_group l3c_pmu_format_attr_group = {
+ .name = "format",
+ .attrs = l3c_pmu_format_attrs,
+};
+
+static const struct attribute_group dmc_pmu_format_attr_group = {
+ .name = "format",
+ .attrs = dmc_pmu_format_attrs,
+};
+
+/*
+ * sysfs event attributes
+ */
+static ssize_t thunderx2_pmu_event_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dev_ext_attribute *eattr;
+
+ eattr = container_of(attr, struct dev_ext_attribute, attr);
+ return sprintf(buf, "config=0x%lx\n", (unsigned long) eattr->var);
+}
+
+#define EVENT_ATTR(_name, _config) \
+ (&((struct dev_ext_attribute[]) { \
+ { \
+ .attr = __ATTR(_name, 0444, thunderx2_pmu_event_show, NULL), \
+ .var = (void *) _config, \
+ } \
+ })[0].attr.attr)
+
+static struct attribute *l3c_pmu_events_attrs[] = {
+ EVENT_ATTR(nbu_cancel, L3_EVENT_NBU_CANCEL),
+ EVENT_ATTR(dib_retry, L3_EVENT_DIB_RETRY),
+ EVENT_ATTR(dob_retry, L3_EVENT_DOB_RETRY),
+ EVENT_ATTR(dib_credit_retry, L3_EVENT_DIB_CREDIT_RETRY),
+ EVENT_ATTR(dob_credit_retry, L3_EVENT_DOB_CREDIT_RETRY),
+ EVENT_ATTR(force_retry, L3_EVENT_FORCE_RETRY),
+ EVENT_ATTR(idx_conflict_retry, L3_EVENT_IDX_CONFLICT_RETRY),
+ EVENT_ATTR(evict_conflict_retry, L3_EVENT_EVICT_CONFLICT_RETRY),
+ EVENT_ATTR(bank_conflict_retry, L3_EVENT_BANK_CONFLICT_RETRY),
+ EVENT_ATTR(fill_entry_retry, L3_EVENT_FILL_ENTRY_RETRY),
+ EVENT_ATTR(evict_not_ready_retry, L3_EVENT_EVICT_NOT_READY_RETRY),
+ EVENT_ATTR(l3_retry, L3_EVENT_L3_RETRY),
+ EVENT_ATTR(read_request, L3_EVENT_READ_REQ),
+ EVENT_ATTR(write_back_request, L3_EVENT_WRITE_BACK_REQ),
+ EVENT_ATTR(inv_nwrite_request, L3_EVENT_INVALIDATE_NWRITE_REQ),
+ EVENT_ATTR(inv_request, L3_EVENT_INV_REQ),
+ EVENT_ATTR(self_request, L3_EVENT_SELF_REQ),
+ EVENT_ATTR(request, L3_EVENT_REQ),
+ EVENT_ATTR(evict_request, L3_EVENT_EVICT_REQ),
+ EVENT_ATTR(inv_nwrite_hit, L3_EVENT_INVALIDATE_NWRITE_HIT),
+ EVENT_ATTR(inv_hit, L3_EVENT_INVALIDATE_HIT),
+ EVENT_ATTR(self_hit, L3_EVENT_SELF_HIT),
+ EVENT_ATTR(read_hit, L3_EVENT_READ_HIT),
+ NULL,
+};
+
+static struct attribute *dmc_pmu_events_attrs[] = {
+ EVENT_ATTR(cnt_cycles, DMC_EVENT_COUNT_CYCLES),
+ EVENT_ATTR(read_64b_txns, DMC_EVENT_READ_64B_TXNS),
+ EVENT_ATTR(read_below_64b_txns, DMC_EVENT_READ_BELOW_64B_TXNS),
+ EVENT_ATTR(write_txns, DMC_EVENT_WRITE_TXNS),
+ EVENT_ATTR(txn_cycles, DMC_EVENT_TXN_CYCLES),
+ EVENT_ATTR(data_transfers, DMC_EVENT_DATA_TRANSFERS),
+ EVENT_ATTR(cancelled_read_txns, DMC_EVENT_CANCELLED_READ_TXNS),
+ EVENT_ATTR(consumed_read_txns, DMC_EVENT_CONSUMED_READ_TXNS),
+ NULL,
+};
+
+static const struct attribute_group l3c_pmu_events_attr_group = {
+ .name = "events",
+ .attrs = l3c_pmu_events_attrs,
+};
+
+static const struct attribute_group dmc_pmu_events_attr_group = {
+ .name = "events",
+ .attrs = dmc_pmu_events_attrs,
+};
+
+/*
+ * sysfs cpumask attributes
+ */
+static ssize_t cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct cpumask cpu_mask;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore =
+ pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
+
+ cpumask_clear(&cpu_mask);
+ cpumask_set_cpu(pmu_uncore->cpu, &cpu_mask);
+ return cpumap_print_to_pagebuf(true, buf, &cpu_mask);
+}
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *thunderx2_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static const struct attribute_group pmu_cpumask_attr_group = {
+ .attrs = thunderx2_pmu_cpumask_attrs,
+};
+
+/*
+ * Per PMU device attribute groups
+ */
+static const struct attribute_group *l3c_pmu_attr_groups[] = {
+ &l3c_pmu_format_attr_group,
+ &pmu_cpumask_attr_group,
+ &l3c_pmu_events_attr_group,
+ NULL
+};
+
+static const struct attribute_group *dmc_pmu_attr_groups[] = {
+ &dmc_pmu_format_attr_group,
+ &pmu_cpumask_attr_group,
+ &dmc_pmu_events_attr_group,
+ NULL
+};
+
+static inline u32 reg_readl(unsigned long addr)
+{
+ return readl((void __iomem *)addr);
+}
+
+static inline void reg_writel(u32 val, unsigned long addr)
+{
+ writel(val, (void __iomem *)addr);
+}
+
+static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
+{
+ int counter;
+
+ raw_spin_lock(&pmu_uncore->lock);
+ counter = find_first_zero_bit(pmu_uncore->active_counters,
+ pmu_uncore->uncore_dev->max_counters);
+ if (counter == pmu_uncore->uncore_dev->max_counters) {
+ raw_spin_unlock(&pmu_uncore->lock);
+ return -ENOSPC;
+ }
+ set_bit(counter, pmu_uncore->active_counters);
+ raw_spin_unlock(&pmu_uncore->lock);
+ return counter;
+}
+
+static void free_counter(
+ struct thunderx2_pmu_uncore_channel *pmu_uncore, int counter)
+{
+ raw_spin_lock(&pmu_uncore->lock);
+ clear_bit(counter, pmu_uncore->active_counters);
+ raw_spin_unlock(&pmu_uncore->lock);
+}
+
+/*
+ * DMC and L3 counter interface is muxed across all channels.
+ * hence we need to select the channel before accessing counter
+ * data/control registers.
+ *
+ * L3 Tile and DMC channel selection is through SMC call
+ * SMC call arguments,
+ * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
+ * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
+ * x2 = Node id
+ * x3 = DMC(1)/L3C(0)
+ * x4 = channel Id
+ */
+static void uncore_select_channel(struct perf_event *event)
+{
+ struct arm_smccc_res res;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore =
+ pmu_to_thunderx2_pmu_uncore(event->pmu);
+ struct thunderx2_pmu_uncore_dev *uncore_dev =
+ pmu_uncore->uncore_dev;
+
+ arm_smccc_smc(THUNDERX2_SMC_CALL_ID, THUNDERX2_SMC_SET_CHANNEL,
+ uncore_dev->node, uncore_dev->type,
+ pmu_uncore->channel, 0, 0, 0, &res);
+ if (res.a0) {
+ dev_err(uncore_dev->dev,
+ "SMC to Select channel failed for PMU UNCORE[%s]\n",
+ pmu_uncore->uncore_dev->name);
+ }
+}
+
+/* early probe for firmware support */
+static int __init test_uncore_select_channel_early(struct device *dev)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_smc(THUNDERX2_SMC_CALL_ID, THUNDERX2_SMC_SET_CHANNEL,
+ dev_to_node(dev), 0, 0, 0, 0, 0, &res);
+ if (res.a0) {
+ dev_err(dev, "No Firmware support for PMU UNCORE(%d)\n",
+ dev_to_node(dev));
+ return -ENODEV;
+ }
+ return 0;
+}
+
+static void uncore_start_event_l3c(struct perf_event *event, int flags)
+{
+ u32 val;
+ struct hw_perf_event *hwc = &event->hw;
+
+ /* event id encoded in bits [07:03] */
+ val = GET_EVENTID(event) << 3;
+ reg_writel(val, hwc->config_base);
+ local64_set(&hwc->prev_count, 0);
+ reg_writel(0, hwc->event_base);
+}
+
+static void uncore_stop_event_l3c(struct perf_event *event)
+{
+ reg_writel(0, event->hw.config_base);
+}
+
+static void uncore_start_event_dmc(struct perf_event *event, int flags)
+{
+ u32 val;
+ struct hw_perf_event *hwc = &event->hw;
+ int idx = GET_COUNTERID(event);
+ int event_type = GET_EVENTID(event);
+
+ /* enable and start counters.
+ * 8 bits for each counter, bits[05:01] of a counter to set event type.
+ */
+ val = reg_readl(hwc->config_base);
+ val &= ~DMC_EVENT_CFG(idx, 0x1f);
+ val |= DMC_EVENT_CFG(idx, event_type);
+ reg_writel(val, hwc->config_base);
+ local64_set(&hwc->prev_count, 0);
+ reg_writel(0, hwc->event_base);
+}
+
+static void uncore_stop_event_dmc(struct perf_event *event)
+{
+ u32 val;
+ struct hw_perf_event *hwc = &event->hw;
+ int idx = GET_COUNTERID(event);
+
+ /* clear event type(bits[05:01]) to stop counter */
+ val = reg_readl(hwc->config_base);
+ val &= ~DMC_EVENT_CFG(idx, 0x1f);
+ reg_writel(val, hwc->config_base);
+}
+
+static void init_cntr_base_l3c(struct perf_event *event,
+ struct thunderx2_pmu_uncore_dev *uncore_dev)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ /* counter ctrl/data reg offset at 8 */
+ hwc->config_base = (unsigned long)uncore_dev->base
+ + L3C_COUNTER_CTL + (8 * GET_COUNTERID(event));
+ hwc->event_base = (unsigned long)uncore_dev->base
+ + L3C_COUNTER_DATA + (8 * GET_COUNTERID(event));
+}
+
+static void init_cntr_base_dmc(struct perf_event *event,
+ struct thunderx2_pmu_uncore_dev *uncore_dev)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ hwc->config_base = (unsigned long)uncore_dev->base
+ + DMC_COUNTER_CTL;
+ /* counter data reg offset at 0xc */
+ hwc->event_base = (unsigned long)uncore_dev->base
+ + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
+}
+
+static void thunderx2_uncore_update(struct perf_event *event)
+{
+ s64 prev, new = 0;
+ u64 delta;
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ enum thunderx2_uncore_type type;
+
+ pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
+ type = pmu_uncore->uncore_dev->type;
+
+ pmu_uncore->uncore_dev->select_channel(event);
+
+ new = reg_readl(hwc->event_base);
+ prev = local64_xchg(&hwc->prev_count, new);
+
+ /* handles rollover of 32 bit counter */
+ delta = (u32)(((1UL << 32) - prev) + new);
+ local64_add(delta, &event->count);
+}
+
+enum thunderx2_uncore_type get_uncore_device_type(struct acpi_device *adev)
+{
+ int i = 0;
+ struct acpi_uncore_device {
+ __u8 id[ACPI_ID_LEN];
+ enum thunderx2_uncore_type type;
+ } devices[] = {
+ {"CAV901D", PMU_TYPE_L3C},
+ {"CAV901F", PMU_TYPE_DMC},
+ {"", PMU_TYPE_INVALID}
+ };
+
+ while (devices[i].type != PMU_TYPE_INVALID) {
+ if (!strcmp(acpi_device_hid(adev), devices[i].id))
+ return devices[i].type;
+ i++;
+ }
+ return PMU_TYPE_INVALID;
+}
+
+/*
+ * We must NOT create groups containing events from multiple hardware PMUs,
+ * although mixing different software and hardware PMUs is allowed.
+ */
+static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
+{
+ struct pmu *pmu = event->pmu;
+ struct perf_event *leader = event->group_leader;
+ struct perf_event *sibling;
+ int counters = 0;
+
+ if (leader->pmu != event->pmu && !is_software_event(leader))
+ return false;
+
+ for_each_sibling_event(sibling, event->group_leader) {
+ if (is_software_event(sibling))
+ continue;
+ if (sibling->pmu != pmu)
+ return false;
+ counters++;
+ }
+
+ /*
+ * If the group requires more counters than the HW has,
+ * it cannot ever be scheduled.
+ */
+ return counters < UNCORE_MAX_COUNTERS;
+}
+
+static int thunderx2_uncore_event_init(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+
+ /* Test the event attr type check for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /*
+ * SOC PMU counters are shared across all cores.
+ * Therefore, it does not support per-process mode.
+ * Also, it does not support event sampling mode.
+ */
+ if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+ return -EINVAL;
+
+ /* SOC counters do not have usr/os/guest/host bits */
+ if (event->attr.exclude_user || event->attr.exclude_kernel ||
+ event->attr.exclude_host || event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
+ event->cpu = pmu_uncore->cpu;
+
+ if (event->attr.config >= pmu_uncore->uncore_dev->max_events)
+ return -EINVAL;
+
+ /* store event id */
+ hwc->config = event->attr.config;
+
+ /* Validate the group */
+ if (!thunderx2_uncore_validate_event_group(event))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void thunderx2_uncore_start(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+ unsigned long irqflags;
+
+ hwc->state = 0;
+ pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
+ uncore_dev = pmu_uncore->uncore_dev;
+
+ raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
+ uncore_dev->select_channel(event);
+ uncore_dev->start_event(event, flags);
+ raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
+
+ perf_event_update_userpage(event);
+
+ if (!find_last_bit(pmu_uncore->active_counters,
+ pmu_uncore->uncore_dev->max_counters)) {
+ hrtimer_start(&pmu_uncore->hrtimer,
+ ns_to_ktime(uncore_dev->hrtimer_interval),
+ HRTIMER_MODE_REL_PINNED);
+ }
+}
+
+static void thunderx2_uncore_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+ unsigned long irqflags;
+
+ if (hwc->state & PERF_HES_UPTODATE)
+ return;
+
+ pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
+ uncore_dev = pmu_uncore->uncore_dev;
+
+ raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
+
+ uncore_dev->select_channel(event);
+ uncore_dev->stop_event(event);
+
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ thunderx2_uncore_update(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+ raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
+}
+
+static int thunderx2_uncore_add(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+
+ pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
+ uncore_dev = pmu_uncore->uncore_dev;
+
+ /* Allocate a free counter */
+ hwc->idx = alloc_counter(pmu_uncore);
+ if (hwc->idx < 0)
+ return -EAGAIN;
+
+ pmu_uncore->events[hwc->idx] = event;
+ /* set counter control and data registers base address */
+ uncore_dev->init_cntr_base(event, uncore_dev);
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (flags & PERF_EF_START)
+ thunderx2_uncore_start(event, flags);
+
+ return 0;
+}
+
+static void thunderx2_uncore_del(struct perf_event *event, int flags)
+{
+ struct thunderx2_pmu_uncore_channel *pmu_uncore =
+ pmu_to_thunderx2_pmu_uncore(event->pmu);
+ struct hw_perf_event *hwc = &event->hw;
+
+ thunderx2_uncore_stop(event, PERF_EF_UPDATE);
+
+ /* clear the assigned counter */
+ free_counter(pmu_uncore, GET_COUNTERID(event));
+
+ perf_event_update_userpage(event);
+ pmu_uncore->events[hwc->idx] = NULL;
+ hwc->idx = -1;
+}
+
+static void thunderx2_uncore_read(struct perf_event *event)
+{
+ unsigned long irqflags;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore =
+ pmu_to_thunderx2_pmu_uncore(event->pmu);
+
+ raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
+ thunderx2_uncore_update(event);
+ raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
+}
+
+static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
+ struct hrtimer *hrt)
+{
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ unsigned long irqflags;
+ int idx;
+ bool restart_timer = false;
+
+ pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
+ hrtimer);
+
+ raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
+ for_each_set_bit(idx, pmu_uncore->active_counters,
+ pmu_uncore->uncore_dev->max_counters) {
+ struct perf_event *event = pmu_uncore->events[idx];
+
+ thunderx2_uncore_update(event);
+ restart_timer = true;
+ }
+ raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
+
+ if (restart_timer)
+ hrtimer_forward_now(hrt,
+ ns_to_ktime(
+ pmu_uncore->uncore_dev->hrtimer_interval));
+
+ return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
+}
+
+static int thunderx2_pmu_uncore_register(
+ struct thunderx2_pmu_uncore_channel *pmu_uncore)
+{
+ struct device *dev = pmu_uncore->uncore_dev->dev;
+ char *name = pmu_uncore->uncore_dev->name;
+ int channel = pmu_uncore->channel;
+
+ /* Perf event registration */
+ pmu_uncore->pmu = (struct pmu) {
+ .attr_groups = pmu_uncore->uncore_dev->attr_groups,
+ .task_ctx_nr = perf_invalid_context,
+ .event_init = thunderx2_uncore_event_init,
+ .add = thunderx2_uncore_add,
+ .del = thunderx2_uncore_del,
+ .start = thunderx2_uncore_start,
+ .stop = thunderx2_uncore_stop,
+ .read = thunderx2_uncore_read,
+ };
+
+ pmu_uncore->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
+ "%s_%d", name, channel);
+
+ return perf_pmu_register(&pmu_uncore->pmu, pmu_uncore->pmu.name, -1);
+}
+
+static int thunderx2_pmu_uncore_add(struct thunderx2_pmu_uncore_dev *uncore_dev,
+ int channel)
+{
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+ int ret, cpu;
+
+ pmu_uncore = devm_kzalloc(uncore_dev->dev, sizeof(*pmu_uncore),
+ GFP_KERNEL);
+ if (!pmu_uncore)
+ return -ENOMEM;
+
+ cpu = cpumask_any_and(cpumask_of_node(uncore_dev->node),
+ cpu_online_mask);
+ if (cpu >= nr_cpu_ids)
+ return -EINVAL;
+
+ pmu_uncore->cpu = cpu;
+ pmu_uncore->channel = channel;
+ pmu_uncore->uncore_dev = uncore_dev;
+
+ hrtimer_init(&pmu_uncore->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ pmu_uncore->hrtimer.function = thunderx2_uncore_hrtimer_callback;
+
+ ret = thunderx2_pmu_uncore_register(pmu_uncore);
+ if (ret) {
+ dev_err(uncore_dev->dev, "%s PMU: Failed to init driver\n",
+ uncore_dev->name);
+ return -ENODEV;
+ }
+
+ /* register hotplug callback for the pmu */
+ ret = cpuhp_state_add_instance(
+ CPUHP_AP_PERF_ARM_THUNDERX2_UNCORE_ONLINE,
+ &pmu_uncore->node);
+ if (ret) {
+ dev_err(uncore_dev->dev, "Error %d registering hotplug", ret);
+ return ret;
+ }
+
+ dev_dbg(uncore_dev->dev, "%s PMU UNCORE registered\n",
+ pmu_uncore->pmu.name);
+ return ret;
+}
+
+static struct thunderx2_pmu_uncore_dev *init_pmu_uncore_dev(
+ struct device *dev, acpi_handle handle,
+ struct acpi_device *adev, u32 type)
+{
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+ void __iomem *base;
+ struct resource res;
+ struct resource_entry *rentry;
+ struct list_head list;
+ int ret;
+
+ INIT_LIST_HEAD(&list);
+ ret = acpi_dev_get_resources(adev, &list, NULL, NULL);
+ if (ret <= 0) {
+ dev_err(dev, "failed to parse _CRS method, error %d\n", ret);
+ return NULL;
+ }
+
+ list_for_each_entry(rentry, &list, node) {
+ if (resource_type(rentry->res) == IORESOURCE_MEM) {
+ res = *rentry->res;
+ break;
+ }
+ }
+
+ if (!rentry->res)
+ return NULL;
+
+ acpi_dev_free_resource_list(&list);
+ base = devm_ioremap_resource(dev, &res);
+ if (IS_ERR(base)) {
+ dev_err(dev, "PMU type %d: Fail to map resource\n", type);
+ return NULL;
+ }
+
+ uncore_dev = devm_kzalloc(dev, sizeof(*uncore_dev), GFP_KERNEL);
+ if (!uncore_dev)
+ return NULL;
+
+ uncore_dev->dev = dev;
+ uncore_dev->type = type;
+ uncore_dev->base = base;
+ uncore_dev->node = dev_to_node(dev);
+
+ raw_spin_lock_init(&uncore_dev->lock);
+
+ switch (uncore_dev->type) {
+ case PMU_TYPE_L3C:
+ uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
+ uncore_dev->max_channels = UNCORE_L3_MAX_TILES;
+ uncore_dev->max_events = L3_EVENT_MAX;
+ uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
+ uncore_dev->attr_groups = l3c_pmu_attr_groups;
+ uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
+ "uncore_l3c_%d", uncore_dev->node);
+ uncore_dev->init_cntr_base = init_cntr_base_l3c;
+ uncore_dev->start_event = uncore_start_event_l3c;
+ uncore_dev->stop_event = uncore_stop_event_l3c;
+ uncore_dev->select_channel = uncore_select_channel;
+ break;
+ case PMU_TYPE_DMC:
+ uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
+ uncore_dev->max_channels = UNCORE_DMC_MAX_CHANNELS;
+ uncore_dev->max_events = DMC_EVENT_MAX;
+ uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
+ uncore_dev->attr_groups = dmc_pmu_attr_groups;
+ uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
+ "uncore_dmc_%d", uncore_dev->node);
+ uncore_dev->init_cntr_base = init_cntr_base_dmc;
+ uncore_dev->start_event = uncore_start_event_dmc;
+ uncore_dev->stop_event = uncore_stop_event_dmc;
+ uncore_dev->select_channel = uncore_select_channel;
+ break;
+ case PMU_TYPE_INVALID:
+ devm_kfree(dev, uncore_dev);
+ uncore_dev = NULL;
+ break;
+ }
+
+ return uncore_dev;
+}
+
+static acpi_status thunderx2_pmu_uncore_dev_add(acpi_handle handle, u32 level,
+ void *data, void **return_value)
+{
+ struct thunderx2_pmu_uncore_dev *uncore_dev;
+ struct acpi_device *adev;
+ enum thunderx2_uncore_type type;
+ int channel;
+
+ if (acpi_bus_get_device(handle, &adev))
+ return AE_OK;
+ if (acpi_bus_get_status(adev) || !adev->status.present)
+ return AE_OK;
+
+ type = get_uncore_device_type(adev);
+ if (type == PMU_TYPE_INVALID)
+ return AE_OK;
+
+ uncore_dev = init_pmu_uncore_dev((struct device *)data, handle,
+ adev, type);
+
+ if (!uncore_dev)
+ return AE_ERROR;
+
+ for (channel = 0; channel < uncore_dev->max_channels; channel++) {
+ if (thunderx2_pmu_uncore_add(uncore_dev, channel)) {
+ /* Can't add the PMU device, abort */
+ return AE_ERROR;
+ }
+ }
+ return AE_OK;
+}
+
+static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
+ struct hlist_node *node)
+{
+ int new_cpu;
+ struct thunderx2_pmu_uncore_channel *pmu_uncore;
+
+ pmu_uncore = hlist_entry_safe(node,
+ struct thunderx2_pmu_uncore_channel, node);
+ if (cpu != pmu_uncore->cpu)
+ return 0;
+
+ new_cpu = cpumask_any_and(
+ cpumask_of_node(pmu_uncore->uncore_dev->node),
+ cpu_online_mask);
+ if (new_cpu >= nr_cpu_ids)
+ return 0;
+
+ pmu_uncore->cpu = new_cpu;
+ perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
+ return 0;
+}
+
+static const struct acpi_device_id thunderx2_uncore_acpi_match[] = {
+ {"CAV901C", 0},
+ {},
+};
+MODULE_DEVICE_TABLE(acpi, thunderx2_uncore_acpi_match);
+
+static int thunderx2_uncore_probe(struct platform_device *pdev)
+{
+ struct device *dev = &pdev->dev;
+ acpi_handle handle;
+ acpi_status status;
+
+ set_dev_node(dev, acpi_get_node(ACPI_HANDLE(dev)));
+
+ /* Make sure firmware supports DMC/L3C set channel smc call */
+ if (test_uncore_select_channel_early(dev))
+ return -ENODEV;
+
+ if (!has_acpi_companion(dev))
+ return -ENODEV;
+
+ handle = ACPI_HANDLE(dev);
+ if (!handle)
+ return -EINVAL;
+
+ /* Walk through the tree for all PMU UNCORE devices */
+ status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1,
+ thunderx2_pmu_uncore_dev_add,
+ NULL, dev, NULL);
+ if (ACPI_FAILURE(status)) {
+ dev_err(dev, "failed to probe PMU devices\n");
+ return_ACPI_STATUS(status);
+ }
+
+ dev_info(dev, "node%d: pmu uncore registered\n", dev_to_node(dev));
+ return 0;
+}
+
+static struct platform_driver thunderx2_uncore_driver = {
+ .probe = thunderx2_uncore_probe,
+ .driver = {
+ .name = "thunderx2-uncore-pmu",
+ .acpi_match_table = ACPI_PTR(thunderx2_uncore_acpi_match),
+ },
+};
+
+static int __init register_thunderx2_uncore_driver(void)
+{
+ int ret;
+
+ ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_ARM_THUNDERX2_UNCORE_ONLINE,
+ "perf/tx2/uncore:online",
+ NULL,
+ thunderx2_uncore_pmu_offline_cpu);
+ if (ret)
+ return ret;
+
+ return platform_driver_register(&thunderx2_uncore_driver);
+
+}
+device_initcall(register_thunderx2_uncore_driver);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 8796ba3..eb0c896 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -161,6 +161,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_ARM_L2X0_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE,
+ CPUHP_AP_PERF_ARM_THUNDERX2_UNCORE_ONLINE,
CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
--
2.9.4
Hi Mark,
can you please review this patch?
On Thu, Jun 21, 2018 at 12:03 PM, Ganapatrao Kulkarni
<[email protected]> wrote:
> This patchset adds PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
> The SoC has PMU support in L3 cache controller (L3C) and in the
> DDR4 Memory Controller (DMC).
>
> v6:
> Rebased to 4.18-rc1
> Updated with comments from John Garry[3]
>
> [3] https://lkml.org/lkml/2018/5/17/408
>
> v5:
> -Incroporated review comments from Mark Rutland[2]
> v4:
> -Incroporated review comments from Mark Rutland[1]
>
> [1] https://www.spinics.net/lists/arm-kernel/msg588563.html
> [2] https://lkml.org/lkml/2018/4/26/376
>
> v3:
> - fixed warning reported by kbuild robot
>
> v2:
> - rebased to 4.12-rc1
> - Removed Arch VULCAN dependency.
> - update SMC call parameters as per latest firmware.
>
> v1:
> -Initial patch
>
> Ganapatrao Kulkarni (2):
> perf: uncore: Adding documentation for ThunderX2 pmu uncore driver
> ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
>
> Documentation/perf/thunderx2-pmu.txt | 66 +++
> drivers/perf/Kconfig | 8 +
> drivers/perf/Makefile | 1 +
> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++
> include/linux/cpuhotplug.h | 1 +
> 5 files changed, 1025 insertions(+)
> create mode 100644 Documentation/perf/thunderx2-pmu.txt
> create mode 100644 drivers/perf/thunderx2_pmu.c
>
> --
> 2.9.4
>
thanks
Ganapat
Hi Ganapatrao,
On Wed, Jun 20, 2018 at 11:33 PM, Ganapatrao Kulkarni
<[email protected]> wrote:
> +
> +enum thunderx2_uncore_l3_events {
> + L3_EVENT_NONE,
> + L3_EVENT_NBU_CANCEL,
> + L3_EVENT_DIB_RETRY,
> + L3_EVENT_DOB_RETRY,
> + L3_EVENT_DIB_CREDIT_RETRY,
> + L3_EVENT_DOB_CREDIT_RETRY,
> + L3_EVENT_FORCE_RETRY,
> + L3_EVENT_IDX_CONFLICT_RETRY,
> + L3_EVENT_EVICT_CONFLICT_RETRY,
> + L3_EVENT_BANK_CONFLICT_RETRY,
> + L3_EVENT_FILL_ENTRY_RETRY,
> + L3_EVENT_EVICT_NOT_READY_RETRY,
> + L3_EVENT_L3_RETRY,
> + L3_EVENT_READ_REQ,
> + L3_EVENT_WRITE_BACK_REQ,
> + L3_EVENT_INVALIDATE_NWRITE_REQ,
> + L3_EVENT_INV_REQ,
> + L3_EVENT_SELF_REQ,
> + L3_EVENT_REQ,
> + L3_EVENT_EVICT_REQ,
> + L3_EVENT_INVALIDATE_NWRITE_HIT,
> + L3_EVENT_INVALIDATE_HIT,
> + L3_EVENT_SELF_HIT,
> + L3_EVENT_READ_HIT,
> + L3_EVENT_MAX,
> +};
> +
> +enum thunderx2_uncore_dmc_events {
> + DMC_EVENT_NONE,
> + DMC_EVENT_COUNT_CYCLES,
> + DMC_EVENT_RES2,
> + DMC_EVENT_RES3,
> + DMC_EVENT_RES4,
> + DMC_EVENT_RES5,
> + DMC_EVENT_RES6,
> + DMC_EVENT_RES7,
> + DMC_EVENT_RES8,
> + DMC_EVENT_READ_64B_TXNS,
> + DMC_EVENT_READ_BELOW_64B_TXNS,
> + DMC_EVENT_WRITE_TXNS,
> + DMC_EVENT_TXN_CYCLES,
> + DMC_EVENT_DATA_TRANSFERS,
> + DMC_EVENT_CANCELLED_READ_TXNS,
> + DMC_EVENT_CONSUMED_READ_TXNS,
> + DMC_EVENT_MAX,
> +};
Can you please provide a link to where these counters are documented?
It is not clear what each counter does especially for the L3C events.
Also, what counter do you need to use to get L3 hit/miss ratio? I
think this is the most useful counter related to L3.
Thanks,
--
Pranith
On Wed, Jul 4, 2018 at 3:44 PM, Ganapatrao Kulkarni <[email protected]> wrote:
> Hi Mark,
>
> can you please review this patch?
any comments on this patch series?
>
> On Thu, Jun 21, 2018 at 12:03 PM, Ganapatrao Kulkarni
> <[email protected]> wrote:
>> This patchset adds PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
>> The SoC has PMU support in L3 cache controller (L3C) and in the
>> DDR4 Memory Controller (DMC).
>>
>> v6:
>> Rebased to 4.18-rc1
>> Updated with comments from John Garry[3]
>>
>> [3] https://lkml.org/lkml/2018/5/17/408
>>
>> v5:
>> -Incroporated review comments from Mark Rutland[2]
>> v4:
>> -Incroporated review comments from Mark Rutland[1]
>>
>> [1] https://www.spinics.net/lists/arm-kernel/msg588563.html
>> [2] https://lkml.org/lkml/2018/4/26/376
>>
>> v3:
>> - fixed warning reported by kbuild robot
>>
>> v2:
>> - rebased to 4.12-rc1
>> - Removed Arch VULCAN dependency.
>> - update SMC call parameters as per latest firmware.
>>
>> v1:
>> -Initial patch
>>
>> Ganapatrao Kulkarni (2):
>> perf: uncore: Adding documentation for ThunderX2 pmu uncore driver
>> ThunderX2: Add Cavium ThunderX2 SoC UNCORE PMU driver
>>
>> Documentation/perf/thunderx2-pmu.txt | 66 +++
>> drivers/perf/Kconfig | 8 +
>> drivers/perf/Makefile | 1 +
>> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++
>> include/linux/cpuhotplug.h | 1 +
>> 5 files changed, 1025 insertions(+)
>> create mode 100644 Documentation/perf/thunderx2-pmu.txt
>> create mode 100644 drivers/perf/thunderx2_pmu.c
>>
>> --
>> 2.9.4
>>
>
> thanks
> Ganapat
On Tue, Jul 10, 2018 at 4:56 AM, Ganapatrao Kulkarni <[email protected]> wrote:
> On Wed, Jul 4, 2018 at 3:44 PM, Ganapatrao Kulkarni <[email protected]> wrote:
>> Hi Mark,
>>
>> can you please review this patch?
>
> any comments on this patch series?
>
I had some comments on the second patch here:
https://www.spinics.net/lists/arm-kernel/msg664025.html
Thanks,
--
Pranith
Hi Ganapat,
On Thu, Jun 21, 2018 at 12:03:38PM +0530, Ganapatrao Kulkarni wrote:
> This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
> Controller(DMC) and Level 3 Cache(L3C).
>
> ThunderX2 has 8 independent DMC PMUs to capture performance events
> corresponding to 8 channels of DDR4 Memory Controller and 16 independent
> L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
> Each PMU supports up to 4 counters. All counters lack overflow interrupt
> and are sampled periodically.
>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> ---
> drivers/perf/Kconfig | 8 +
> drivers/perf/Makefile | 1 +
> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/cpuhotplug.h | 1 +
> 4 files changed, 959 insertions(+)
> create mode 100644 drivers/perf/thunderx2_pmu.c
>
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 08ebaf7..ecedb9e 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -87,6 +87,14 @@ config QCOM_L3_PMU
> Adds the L3 cache PMU into the perf events subsystem for
> monitoring L3 cache events.
>
> +config THUNDERX2_PMU
> + bool "Cavium ThunderX2 SoC PMU UNCORE"
> + depends on ARCH_THUNDER2 && ARM64 && ACPI
Surely this depends on NUMA for the node id?
[...]
> +/*
> + * pmu on each socket has 2 uncore devices(dmc and l3),
> + * each uncore device has up to 16 channels, each channel can sample
> + * events independently with counters up to 4.
> + */
> +struct thunderx2_pmu_uncore_channel {
> + struct pmu pmu;
> + struct hlist_node node;
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + int channel;
> + int cpu;
> + DECLARE_BITMAP(active_counters, UNCORE_MAX_COUNTERS);
> + struct perf_event *events[UNCORE_MAX_COUNTERS];
> + struct hrtimer hrtimer;
> + /* to sync counter alloc/release */
> + raw_spinlock_t lock;
> +};
This lock should not be necessary. The pmu::{add,del} callbacks are
strictly serialised w.r.t. one another per CPU because the core perf
code holds ctx->lock whenever it calls either.
Previously, you mentioned you'd seen scheduling while atomic when this
lock was removed. Could you please provide a backtrace?
[...]
It would be worth a comment here explaining which resources the channel
PMUs share, and why we need this thunderx2_pmu_uncore_dev. IIUC, it's
just that they share a register windows switched by FW?
> +struct thunderx2_pmu_uncore_dev {
> + char *name;
> + struct device *dev;
> + enum thunderx2_uncore_type type;
> + void __iomem *base;
> + int node;
> + u32 max_counters;
> + u32 max_channels;
> + u32 max_events;
> + u64 hrtimer_interval;
> + /* this lock synchronizes across channels */
> + raw_spinlock_t lock;
> + const struct attribute_group **attr_groups;
> + void (*init_cntr_base)(struct perf_event *event,
> + struct thunderx2_pmu_uncore_dev *uncore_dev);
> + void (*select_channel)(struct perf_event *event);
> + void (*stop_event)(struct perf_event *event);
> + void (*start_event)(struct perf_event *event, int flags);
> +};
As a general thing, the really long structure/enum/macro names make
things really hard to read.
Could we s/THUNDERX2/TX2/ and s/thunderx2/tx2/ for identifiers please?
Similarly:
* s/thunderx2_pmu_uncore_channel/tx2_pmu/
* s/thunderx2_pmu_uncore_dev/tx2_pmu_group/
... and consistently name those "pmu" and "pmu_group" respectively --
that mekas the relationship between the two much clearer for someone who
is not intimately familiar with the hardware.
That makes things far more legible, e.g.
> +static inline struct thunderx2_pmu_uncore_channel *
> +pmu_to_thunderx2_pmu_uncore(struct pmu *pmu)
> +{
> + return container_of(pmu, struct thunderx2_pmu_uncore_channel, pmu);
> +}
... becomes:
static struct tx2_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
{
return container_of(pmu, struct tx2_pmu, pmu);
}
[...]
> +/*
> + * sysfs cpumask attributes
> + */
> +static ssize_t cpumask_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct cpumask cpu_mask;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
> +
> + cpumask_clear(&cpu_mask);
> + cpumask_set_cpu(pmu_uncore->cpu, &cpu_mask);
> + return cpumap_print_to_pagebuf(true, buf, &cpu_mask);
> +}
> +static DEVICE_ATTR_RO(cpumask);
This can be simplified to:
static ssize_t cpumask_show(struct device *dev, struct device_attribute
*attr, char *buf)
{
struct thunderx2_pmu_uncore_channel *pmu_uncore;
pmu_uncore = pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
return cpumap_print_to_pagebuf(true, buf, cpumask_of(pmu_uncore->cpu));
}
[...]
> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
> +{
> + int counter;
> +
> + raw_spin_lock(&pmu_uncore->lock);
> + counter = find_first_zero_bit(pmu_uncore->active_counters,
> + pmu_uncore->uncore_dev->max_counters);
> + if (counter == pmu_uncore->uncore_dev->max_counters) {
> + raw_spin_unlock(&pmu_uncore->lock);
> + return -ENOSPC;
> + }
> + set_bit(counter, pmu_uncore->active_counters);
> + raw_spin_unlock(&pmu_uncore->lock);
> + return counter;
> +}
> +
> +static void free_counter(
> + struct thunderx2_pmu_uncore_channel *pmu_uncore, int counter)
> +{
> + raw_spin_lock(&pmu_uncore->lock);
> + clear_bit(counter, pmu_uncore->active_counters);
> + raw_spin_unlock(&pmu_uncore->lock);
> +}
As above, I still don't believe that these locks are necessary, unless
we have a bug elsewhere.
[...]
> +/*
> + * DMC and L3 counter interface is muxed across all channels.
> + * hence we need to select the channel before accessing counter
> + * data/control registers.
> + *
> + * L3 Tile and DMC channel selection is through SMC call
> + * SMC call arguments,
> + * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
> + * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
> + * x2 = Node id
> + * x3 = DMC(1)/L3C(0)
> + * x4 = channel Id
> + */
Please document which ID space the node id is in, e.g.
x2 = FW Node ID (matching SRAT/SLIT IDs)
... so that it's clear that this isn't a Linux logical node id, even if
those happen to be the same today.
[...]
> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + unsigned long irqflags;
> +
> + hwc->state = 0;
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> + uncore_dev = pmu_uncore->uncore_dev;
> +
> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> + uncore_dev->select_channel(event);
> + uncore_dev->start_event(event, flags);
> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> + perf_event_update_userpage(event);
> +
> + if (!find_last_bit(pmu_uncore->active_counters,
> + pmu_uncore->uncore_dev->max_counters)) {
This would be clearer using !bitmap_empty().
> + hrtimer_start(&pmu_uncore->hrtimer,
> + ns_to_ktime(uncore_dev->hrtimer_interval),
> + HRTIMER_MODE_REL_PINNED);
> + }
> +}
[...]
> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
> + struct hrtimer *hrt)
s/hrt/timer/ please
> +{
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + unsigned long irqflags;
> + int idx;
> + bool restart_timer = false;
> +
> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
> + hrtimer);
> +
> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> + for_each_set_bit(idx, pmu_uncore->active_counters,
> + pmu_uncore->uncore_dev->max_counters) {
> + struct perf_event *event = pmu_uncore->events[idx];
> +
> + thunderx2_uncore_update(event);
> + restart_timer = true;
> + }
> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> +
> + if (restart_timer)
> + hrtimer_forward_now(hrt,
> + ns_to_ktime(
> + pmu_uncore->uncore_dev->hrtimer_interval));
> +
> + return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
> +}
You don't need to take the lock at all if there are no active events,
and we can avoid all the conditional logic that way. e.g. assuming the
renames I requested above:
static enum hrtimer_restart tx2_hrtimer_callback(struct hrtimer *timer)
{
struct tx2_pmu *pmu;
struct tx2_pmu_group *pmu_group;
unsigned long irqflags;
int max_counters;
int idx;
pmu = container_of(timer, struct tx2_pmu, hrtimer);
pmu_group = pmu->pmu_group;
max_counters = pmu_group->max_counters;
if (cpumask_empty(pmu->active_counters, max_counters))
return HRTIMER_NORESTART;
raw_spin_lock_irqsave(&pmu_group->lock, irqflags);
for_each_set_bit(idx, pmu->active_counters, max_counters) {
struct perf_event *event = pmu->events[idx];
tx2_uncore_update(event);
restart_timer = true;
}
raw_spin_unlock_irqrestore(&pmu_group->lock, irqflags);
hrtimer_forward_now(hrt, ns_to_ktime(pmu_group->hrtimer_interval));
return HRTIMER_RESTART;
}
[...]
> + switch (uncore_dev->type) {
> + case PMU_TYPE_L3C:
> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
> + uncore_dev->max_channels = UNCORE_L3_MAX_TILES;
> + uncore_dev->max_events = L3_EVENT_MAX;
> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
> + uncore_dev->attr_groups = l3c_pmu_attr_groups;
> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
> + "uncore_l3c_%d", uncore_dev->node);
> + uncore_dev->init_cntr_base = init_cntr_base_l3c;
> + uncore_dev->start_event = uncore_start_event_l3c;
> + uncore_dev->stop_event = uncore_stop_event_l3c;
> + uncore_dev->select_channel = uncore_select_channel;
> + break;
> + case PMU_TYPE_DMC:
> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
> + uncore_dev->max_channels = UNCORE_DMC_MAX_CHANNELS;
> + uncore_dev->max_events = DMC_EVENT_MAX;
> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
> + uncore_dev->attr_groups = dmc_pmu_attr_groups;
> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
> + "uncore_dmc_%d", uncore_dev->node);
> + uncore_dev->init_cntr_base = init_cntr_base_dmc;
> + uncore_dev->start_event = uncore_start_event_dmc;
> + uncore_dev->stop_event = uncore_stop_event_dmc;
> + uncore_dev->select_channel = uncore_select_channel;
> + break;
We should probably s/uncore/tx2/, or s/uncore/thunderx2/ to namespace
this.
> +static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
> + struct hlist_node *node)
> +{
> + int new_cpu;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> +
> + pmu_uncore = hlist_entry_safe(node,
> + struct thunderx2_pmu_uncore_channel, node);
> + if (cpu != pmu_uncore->cpu)
> + return 0;
> +
> + new_cpu = cpumask_any_and(
> + cpumask_of_node(pmu_uncore->uncore_dev->node),
> + cpu_online_mask);
> + if (new_cpu >= nr_cpu_ids)
> + return 0;
> +
> + pmu_uncore->cpu = new_cpu;
> + perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
> + return 0;
> +}
We'll also need a onlining callback. Consider if all CPUs in a NUMA node
were offlined, then we tried to online an arbitrary CPU from that node.
Thanks,
Mark.
thanks Mark for the comments.
On Thu, Jul 12, 2018 at 10:26 PM, Mark Rutland <[email protected]> wrote:
> Hi Ganapat,
>
> On Thu, Jun 21, 2018 at 12:03:38PM +0530, Ganapatrao Kulkarni wrote:
>> This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
>> Controller(DMC) and Level 3 Cache(L3C).
>>
>> ThunderX2 has 8 independent DMC PMUs to capture performance events
>> corresponding to 8 channels of DDR4 Memory Controller and 16 independent
>> L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
>> Each PMU supports up to 4 counters. All counters lack overflow interrupt
>> and are sampled periodically.
>>
>> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
>> ---
>> drivers/perf/Kconfig | 8 +
>> drivers/perf/Makefile | 1 +
>> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/cpuhotplug.h | 1 +
>> 4 files changed, 959 insertions(+)
>> create mode 100644 drivers/perf/thunderx2_pmu.c
>>
>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>> index 08ebaf7..ecedb9e 100644
>> --- a/drivers/perf/Kconfig
>> +++ b/drivers/perf/Kconfig
>> @@ -87,6 +87,14 @@ config QCOM_L3_PMU
>> Adds the L3 cache PMU into the perf events subsystem for
>> monitoring L3 cache events.
>>
>> +config THUNDERX2_PMU
>> + bool "Cavium ThunderX2 SoC PMU UNCORE"
>> + depends on ARCH_THUNDER2 && ARM64 && ACPI
>
> Surely this depends on NUMA for the node id?
it is, i will update.
>
> [...]
>
>> +/*
>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>> + * each uncore device has up to 16 channels, each channel can sample
>> + * events independently with counters up to 4.
>> + */
>> +struct thunderx2_pmu_uncore_channel {
>> + struct pmu pmu;
>> + struct hlist_node node;
>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>> + int channel;
>> + int cpu;
>> + DECLARE_BITMAP(active_counters, UNCORE_MAX_COUNTERS);
>> + struct perf_event *events[UNCORE_MAX_COUNTERS];
>> + struct hrtimer hrtimer;
>> + /* to sync counter alloc/release */
>> + raw_spinlock_t lock;
>> +};
>
> This lock should not be necessary. The pmu::{add,del} callbacks are
> strictly serialised w.r.t. one another per CPU because the core perf
> code holds ctx->lock whenever it calls either.
>
> Previously, you mentioned you'd seen scheduling while atomic when this
> lock was removed. Could you please provide a backtrace?
yes, i have seen, if i try to kick in simulatneously more events than
h/w counters available(4 here).
i will try with v6 and send the trace asap.
>
> [...]
>
> It would be worth a comment here explaining which resources the channel
> PMUs share, and why we need this thunderx2_pmu_uncore_dev. IIUC, it's
> just that they share a register windows switched by FW?
>
>> +struct thunderx2_pmu_uncore_dev {
>> + char *name;
>> + struct device *dev;
>> + enum thunderx2_uncore_type type;
>> + void __iomem *base;
>> + int node;
>> + u32 max_counters;
>> + u32 max_channels;
>> + u32 max_events;
>> + u64 hrtimer_interval;
>> + /* this lock synchronizes across channels */
>> + raw_spinlock_t lock;
>> + const struct attribute_group **attr_groups;
>> + void (*init_cntr_base)(struct perf_event *event,
>> + struct thunderx2_pmu_uncore_dev *uncore_dev);
>> + void (*select_channel)(struct perf_event *event);
>> + void (*stop_event)(struct perf_event *event);
>> + void (*start_event)(struct perf_event *event, int flags);
>> +};
>
> As a general thing, the really long structure/enum/macro names make
> things really hard to read.
>
> Could we s/THUNDERX2/TX2/ and s/thunderx2/tx2/ for identifiers please?
thanks, will do.
>
> Similarly:
>
> * s/thunderx2_pmu_uncore_channel/tx2_pmu/
>
> * s/thunderx2_pmu_uncore_dev/tx2_pmu_group/
>
> ... and consistently name those "pmu" and "pmu_group" respectively --
> that mekas the relationship between the two much clearer for someone who
> is not intimately familiar with the hardware.
These PMUs(UNCORE) counters makes only sense to someone who is
familiar with the hardware.
IMHO, the current names aligned to hardware design like channels ,
tiles, counter etc,
which is explained in PATCH1/2.
I will try to simplify long names.
>
> That makes things far more legible, e.g.
>
>> +static inline struct thunderx2_pmu_uncore_channel *
>> +pmu_to_thunderx2_pmu_uncore(struct pmu *pmu)
>> +{
>> + return container_of(pmu, struct thunderx2_pmu_uncore_channel, pmu);
>> +}
>
> ... becomes:
>
> static struct tx2_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
> {
> return container_of(pmu, struct tx2_pmu, pmu);
> }
>
> [...]
>
>> +/*
>> + * sysfs cpumask attributes
>> + */
>> +static ssize_t cpumask_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct cpumask cpu_mask;
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>> +
>> + cpumask_clear(&cpu_mask);
>> + cpumask_set_cpu(pmu_uncore->cpu, &cpu_mask);
>> + return cpumap_print_to_pagebuf(true, buf, &cpu_mask);
>> +}
>> +static DEVICE_ATTR_RO(cpumask);
>
> This can be simplified to:
>
> static ssize_t cpumask_show(struct device *dev, struct device_attribute
> *attr, char *buf)
> {
> struct thunderx2_pmu_uncore_channel *pmu_uncore;
> pmu_uncore = pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>
> return cpumap_print_to_pagebuf(true, buf, cpumask_of(pmu_uncore->cpu));
> }
thanks, will do
>
> [...]
>
>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>> +{
>> + int counter;
>> +
>> + raw_spin_lock(&pmu_uncore->lock);
>> + counter = find_first_zero_bit(pmu_uncore->active_counters,
>> + pmu_uncore->uncore_dev->max_counters);
>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>> + raw_spin_unlock(&pmu_uncore->lock);
>> + return -ENOSPC;
>> + }
>> + set_bit(counter, pmu_uncore->active_counters);
>> + raw_spin_unlock(&pmu_uncore->lock);
>> + return counter;
>> +}
>> +
>> +static void free_counter(
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore, int counter)
>> +{
>> + raw_spin_lock(&pmu_uncore->lock);
>> + clear_bit(counter, pmu_uncore->active_counters);
>> + raw_spin_unlock(&pmu_uncore->lock);
>> +}
>
> As above, I still don't believe that these locks are necessary, unless
> we have a bug elsewhere.
i will try to debug and provide you more info.
>
> [...]
>
>> +/*
>> + * DMC and L3 counter interface is muxed across all channels.
>> + * hence we need to select the channel before accessing counter
>> + * data/control registers.
>> + *
>> + * L3 Tile and DMC channel selection is through SMC call
>> + * SMC call arguments,
>> + * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
>> + * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
>> + * x2 = Node id
>> + * x3 = DMC(1)/L3C(0)
>> + * x4 = channel Id
>> + */
>
> Please document which ID space the node id is in, e.g.
>
> x2 = FW Node ID (matching SRAT/SLIT IDs)
>
> ... so that it's clear that this isn't a Linux logical node id, even if
> those happen to be the same today.
sure.
>
> [...]
>
>> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
>> +{
>> + struct hw_perf_event *hwc = &event->hw;
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>> + unsigned long irqflags;
>> +
>> + hwc->state = 0;
>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
>> + uncore_dev = pmu_uncore->uncore_dev;
>> +
>> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
>> + uncore_dev->select_channel(event);
>> + uncore_dev->start_event(event, flags);
>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
>> + perf_event_update_userpage(event);
>> +
>> + if (!find_last_bit(pmu_uncore->active_counters,
>> + pmu_uncore->uncore_dev->max_counters)) {
>
> This would be clearer using !bitmap_empty().
thanks.
>
>> + hrtimer_start(&pmu_uncore->hrtimer,
>> + ns_to_ktime(uncore_dev->hrtimer_interval),
>> + HRTIMER_MODE_REL_PINNED);
>> + }
>> +}
>
> [...]
>
>> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
>> + struct hrtimer *hrt)
>
> s/hrt/timer/ please
sure, thanks.
>
>> +{
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>> + unsigned long irqflags;
>> + int idx;
>> + bool restart_timer = false;
>> +
>> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
>> + hrtimer);
>> +
>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
>> + for_each_set_bit(idx, pmu_uncore->active_counters,
>> + pmu_uncore->uncore_dev->max_counters) {
>> + struct perf_event *event = pmu_uncore->events[idx];
>> +
>> + thunderx2_uncore_update(event);
>> + restart_timer = true;
>> + }
>> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
>> +
>> + if (restart_timer)
>> + hrtimer_forward_now(hrt,
>> + ns_to_ktime(
>> + pmu_uncore->uncore_dev->hrtimer_interval));
>> +
>> + return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
>> +}
>
> You don't need to take the lock at all if there are no active events,
> and we can avoid all the conditional logic that way. e.g. assuming the
> renames I requested above:
>
> static enum hrtimer_restart tx2_hrtimer_callback(struct hrtimer *timer)
>
> {
> struct tx2_pmu *pmu;
> struct tx2_pmu_group *pmu_group;
> unsigned long irqflags;
> int max_counters;
> int idx;
>
> pmu = container_of(timer, struct tx2_pmu, hrtimer);
> pmu_group = pmu->pmu_group;
> max_counters = pmu_group->max_counters;
>
> if (cpumask_empty(pmu->active_counters, max_counters))
> return HRTIMER_NORESTART;
>
> raw_spin_lock_irqsave(&pmu_group->lock, irqflags);
> for_each_set_bit(idx, pmu->active_counters, max_counters) {
> struct perf_event *event = pmu->events[idx];
>
> tx2_uncore_update(event);
> restart_timer = true;
> }
> raw_spin_unlock_irqrestore(&pmu_group->lock, irqflags);
>
> hrtimer_forward_now(hrt, ns_to_ktime(pmu_group->hrtimer_interval));
>
> return HRTIMER_RESTART;
> }
thanks, i will adopt this function.
>
> [...]
>
>> + switch (uncore_dev->type) {
>> + case PMU_TYPE_L3C:
>> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
>> + uncore_dev->max_channels = UNCORE_L3_MAX_TILES;
>> + uncore_dev->max_events = L3_EVENT_MAX;
>> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
>> + uncore_dev->attr_groups = l3c_pmu_attr_groups;
>> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
>> + "uncore_l3c_%d", uncore_dev->node);
>> + uncore_dev->init_cntr_base = init_cntr_base_l3c;
>> + uncore_dev->start_event = uncore_start_event_l3c;
>> + uncore_dev->stop_event = uncore_stop_event_l3c;
>> + uncore_dev->select_channel = uncore_select_channel;
>> + break;
>> + case PMU_TYPE_DMC:
>> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
>> + uncore_dev->max_channels = UNCORE_DMC_MAX_CHANNELS;
>> + uncore_dev->max_events = DMC_EVENT_MAX;
>> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
>> + uncore_dev->attr_groups = dmc_pmu_attr_groups;
>> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
>> + "uncore_dmc_%d", uncore_dev->node);
>> + uncore_dev->init_cntr_base = init_cntr_base_dmc;
>> + uncore_dev->start_event = uncore_start_event_dmc;
>> + uncore_dev->stop_event = uncore_stop_event_dmc;
>> + uncore_dev->select_channel = uncore_select_channel;
>> + break;
>
> We should probably s/uncore/tx2/, or s/uncore/thunderx2/ to namespace
> this.
ok, i would preffer tx2.
>
>> +static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
>> + struct hlist_node *node)
>> +{
>> + int new_cpu;
>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>> +
>> + pmu_uncore = hlist_entry_safe(node,
>> + struct thunderx2_pmu_uncore_channel, node);
>> + if (cpu != pmu_uncore->cpu)
>> + return 0;
>> +
>> + new_cpu = cpumask_any_and(
>> + cpumask_of_node(pmu_uncore->uncore_dev->node),
>> + cpu_online_mask);
>> + if (new_cpu >= nr_cpu_ids)
>> + return 0;
>> +
>> + pmu_uncore->cpu = new_cpu;
>> + perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
>> + return 0;
>> +}
>
> We'll also need a onlining callback. Consider if all CPUs in a NUMA node
> were offlined, then we tried to online an arbitrary CPU from that node.
sure, will add.
>
> Thanks,
> Mark.
thanks
Ganapat
Hi Mark,
On Fri, Jul 13, 2018 at 6:28 PM, Ganapatrao Kulkarni <[email protected]> wrote:
> thanks Mark for the comments.
>
> On Thu, Jul 12, 2018 at 10:26 PM, Mark Rutland <[email protected]> wrote:
>> Hi Ganapat,
>>
>> On Thu, Jun 21, 2018 at 12:03:38PM +0530, Ganapatrao Kulkarni wrote:
>>> This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
>>> Controller(DMC) and Level 3 Cache(L3C).
>>>
>>> ThunderX2 has 8 independent DMC PMUs to capture performance events
>>> corresponding to 8 channels of DDR4 Memory Controller and 16 independent
>>> L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
>>> Each PMU supports up to 4 counters. All counters lack overflow interrupt
>>> and are sampled periodically.
>>>
>>> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
>>> ---
>>> drivers/perf/Kconfig | 8 +
>>> drivers/perf/Makefile | 1 +
>>> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
>>> include/linux/cpuhotplug.h | 1 +
>>> 4 files changed, 959 insertions(+)
>>> create mode 100644 drivers/perf/thunderx2_pmu.c
>>>
>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>> index 08ebaf7..ecedb9e 100644
>>> --- a/drivers/perf/Kconfig
>>> +++ b/drivers/perf/Kconfig
>>> @@ -87,6 +87,14 @@ config QCOM_L3_PMU
>>> Adds the L3 cache PMU into the perf events subsystem for
>>> monitoring L3 cache events.
>>>
>>> +config THUNDERX2_PMU
>>> + bool "Cavium ThunderX2 SoC PMU UNCORE"
>>> + depends on ARCH_THUNDER2 && ARM64 && ACPI
>>
>> Surely this depends on NUMA for the node id?
> it is, i will update.
>
>>
>> [...]
>>
>>> +/*
>>> + * pmu on each socket has 2 uncore devices(dmc and l3),
>>> + * each uncore device has up to 16 channels, each channel can sample
>>> + * events independently with counters up to 4.
>>> + */
>>> +struct thunderx2_pmu_uncore_channel {
>>> + struct pmu pmu;
>>> + struct hlist_node node;
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> + int channel;
>>> + int cpu;
>>> + DECLARE_BITMAP(active_counters, UNCORE_MAX_COUNTERS);
>>> + struct perf_event *events[UNCORE_MAX_COUNTERS];
>>> + struct hrtimer hrtimer;
>>> + /* to sync counter alloc/release */
>>> + raw_spinlock_t lock;
>>> +};
>>
>> This lock should not be necessary. The pmu::{add,del} callbacks are
>> strictly serialised w.r.t. one another per CPU because the core perf
>> code holds ctx->lock whenever it calls either.
>>
>> Previously, you mentioned you'd seen scheduling while atomic when this
>> lock was removed. Could you please provide a backtrace?
>
> yes, i have seen, if i try to kick in simulatneously more events than
> h/w counters available(4 here).
> i will try with v6 and send the trace asap.
i dont have the setup on which i have seen this issue. I have tried to
recreate on my current setup and not able to see it as before. I will
remove these locks, if i dont see any issue while sending next
version. Sorry for the noise!
>
>>
>> [...]
>>
>> It would be worth a comment here explaining which resources the channel
>> PMUs share, and why we need this thunderx2_pmu_uncore_dev. IIUC, it's
>> just that they share a register windows switched by FW?
>>
>>> +struct thunderx2_pmu_uncore_dev {
>>> + char *name;
>>> + struct device *dev;
>>> + enum thunderx2_uncore_type type;
>>> + void __iomem *base;
>>> + int node;
>>> + u32 max_counters;
>>> + u32 max_channels;
>>> + u32 max_events;
>>> + u64 hrtimer_interval;
>>> + /* this lock synchronizes across channels */
>>> + raw_spinlock_t lock;
>>> + const struct attribute_group **attr_groups;
>>> + void (*init_cntr_base)(struct perf_event *event,
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev);
>>> + void (*select_channel)(struct perf_event *event);
>>> + void (*stop_event)(struct perf_event *event);
>>> + void (*start_event)(struct perf_event *event, int flags);
>>> +};
>>
>> As a general thing, the really long structure/enum/macro names make
>> things really hard to read.
>>
>> Could we s/THUNDERX2/TX2/ and s/thunderx2/tx2/ for identifiers please?
>
> thanks, will do.
>>
>> Similarly:
>>
>> * s/thunderx2_pmu_uncore_channel/tx2_pmu/
>>
>> * s/thunderx2_pmu_uncore_dev/tx2_pmu_group/
>>
>> ... and consistently name those "pmu" and "pmu_group" respectively --
>> that mekas the relationship between the two much clearer for someone who
>> is not intimately familiar with the hardware.
>
> These PMUs(UNCORE) counters makes only sense to someone who is
> familiar with the hardware.
> IMHO, the current names aligned to hardware design like channels ,
> tiles, counter etc,
> which is explained in PATCH1/2.
>
> I will try to simplify long names.
>
>>
>> That makes things far more legible, e.g.
>>
>>> +static inline struct thunderx2_pmu_uncore_channel *
>>> +pmu_to_thunderx2_pmu_uncore(struct pmu *pmu)
>>> +{
>>> + return container_of(pmu, struct thunderx2_pmu_uncore_channel, pmu);
>>> +}
>>
>> ... becomes:
>>
>> static struct tx2_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
>> {
>> return container_of(pmu, struct tx2_pmu, pmu);
>> }
>>
>> [...]
>>
>>> +/*
>>> + * sysfs cpumask attributes
>>> + */
>>> +static ssize_t cpumask_show(struct device *dev,
>>> + struct device_attribute *attr, char *buf)
>>> +{
>>> + struct cpumask cpu_mask;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>>> + pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>>> +
>>> + cpumask_clear(&cpu_mask);
>>> + cpumask_set_cpu(pmu_uncore->cpu, &cpu_mask);
>>> + return cpumap_print_to_pagebuf(true, buf, &cpu_mask);
>>> +}
>>> +static DEVICE_ATTR_RO(cpumask);
>>
>> This can be simplified to:
>>
>> static ssize_t cpumask_show(struct device *dev, struct device_attribute
>> *attr, char *buf)
>> {
>> struct thunderx2_pmu_uncore_channel *pmu_uncore;
>> pmu_uncore = pmu_to_thunderx2_pmu_uncore(dev_get_drvdata(dev));
>>
>> return cpumap_print_to_pagebuf(true, buf, cpumask_of(pmu_uncore->cpu));
>> }
>
> thanks, will do
>>
>> [...]
>>
>>> +static int alloc_counter(struct thunderx2_pmu_uncore_channel *pmu_uncore)
>>> +{
>>> + int counter;
>>> +
>>> + raw_spin_lock(&pmu_uncore->lock);
>>> + counter = find_first_zero_bit(pmu_uncore->active_counters,
>>> + pmu_uncore->uncore_dev->max_counters);
>>> + if (counter == pmu_uncore->uncore_dev->max_counters) {
>>> + raw_spin_unlock(&pmu_uncore->lock);
>>> + return -ENOSPC;
>>> + }
>>> + set_bit(counter, pmu_uncore->active_counters);
>>> + raw_spin_unlock(&pmu_uncore->lock);
>>> + return counter;
>>> +}
>>> +
>>> +static void free_counter(
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore, int counter)
>>> +{
>>> + raw_spin_lock(&pmu_uncore->lock);
>>> + clear_bit(counter, pmu_uncore->active_counters);
>>> + raw_spin_unlock(&pmu_uncore->lock);
>>> +}
>>
>> As above, I still don't believe that these locks are necessary, unless
>> we have a bug elsewhere.
>
> i will try to debug and provide you more info.
>
>>
>> [...]
>>
>>> +/*
>>> + * DMC and L3 counter interface is muxed across all channels.
>>> + * hence we need to select the channel before accessing counter
>>> + * data/control registers.
>>> + *
>>> + * L3 Tile and DMC channel selection is through SMC call
>>> + * SMC call arguments,
>>> + * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
>>> + * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
>>> + * x2 = Node id
>>> + * x3 = DMC(1)/L3C(0)
>>> + * x4 = channel Id
>>> + */
>>
>> Please document which ID space the node id is in, e.g.
>>
>> x2 = FW Node ID (matching SRAT/SLIT IDs)
>>
>> ... so that it's clear that this isn't a Linux logical node id, even if
>> those happen to be the same today.
>
> sure.
>>
>> [...]
>>
>>> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
>>> +{
>>> + struct hw_perf_event *hwc = &event->hw;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> + unsigned long irqflags;
>>> +
>>> + hwc->state = 0;
>>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
>>> + uncore_dev = pmu_uncore->uncore_dev;
>>> +
>>> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
>>> + uncore_dev->select_channel(event);
>>> + uncore_dev->start_event(event, flags);
>>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
>>> + perf_event_update_userpage(event);
>>> +
>>> + if (!find_last_bit(pmu_uncore->active_counters,
>>> + pmu_uncore->uncore_dev->max_counters)) {
>>
>> This would be clearer using !bitmap_empty().
>
> thanks.
>>
>>> + hrtimer_start(&pmu_uncore->hrtimer,
>>> + ns_to_ktime(uncore_dev->hrtimer_interval),
>>> + HRTIMER_MODE_REL_PINNED);
>>> + }
>>> +}
>>
>> [...]
>>
>>> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
>>> + struct hrtimer *hrt)
>>
>> s/hrt/timer/ please
>
> sure, thanks.
>>
>>> +{
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + unsigned long irqflags;
>>> + int idx;
>>> + bool restart_timer = false;
>>> +
>>> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
>>> + hrtimer);
>>> +
>>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
>>> + for_each_set_bit(idx, pmu_uncore->active_counters,
>>> + pmu_uncore->uncore_dev->max_counters) {
>>> + struct perf_event *event = pmu_uncore->events[idx];
>>> +
>>> + thunderx2_uncore_update(event);
>>> + restart_timer = true;
>>> + }
>>> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
>>> +
>>> + if (restart_timer)
>>> + hrtimer_forward_now(hrt,
>>> + ns_to_ktime(
>>> + pmu_uncore->uncore_dev->hrtimer_interval));
>>> +
>>> + return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
>>> +}
>>
>> You don't need to take the lock at all if there are no active events,
>> and we can avoid all the conditional logic that way. e.g. assuming the
>> renames I requested above:
>>
>> static enum hrtimer_restart tx2_hrtimer_callback(struct hrtimer *timer)
>>
>> {
>> struct tx2_pmu *pmu;
>> struct tx2_pmu_group *pmu_group;
>> unsigned long irqflags;
>> int max_counters;
>> int idx;
>>
>> pmu = container_of(timer, struct tx2_pmu, hrtimer);
>> pmu_group = pmu->pmu_group;
>> max_counters = pmu_group->max_counters;
>>
>> if (cpumask_empty(pmu->active_counters, max_counters))
>> return HRTIMER_NORESTART;
>>
>> raw_spin_lock_irqsave(&pmu_group->lock, irqflags);
>> for_each_set_bit(idx, pmu->active_counters, max_counters) {
>> struct perf_event *event = pmu->events[idx];
>>
>> tx2_uncore_update(event);
>> restart_timer = true;
>> }
>> raw_spin_unlock_irqrestore(&pmu_group->lock, irqflags);
>>
>> hrtimer_forward_now(hrt, ns_to_ktime(pmu_group->hrtimer_interval));
>>
>> return HRTIMER_RESTART;
>> }
>
> thanks, i will adopt this function.
>>
>> [...]
>>
>>> + switch (uncore_dev->type) {
>>> + case PMU_TYPE_L3C:
>>> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
>>> + uncore_dev->max_channels = UNCORE_L3_MAX_TILES;
>>> + uncore_dev->max_events = L3_EVENT_MAX;
>>> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
>>> + uncore_dev->attr_groups = l3c_pmu_attr_groups;
>>> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
>>> + "uncore_l3c_%d", uncore_dev->node);
>>> + uncore_dev->init_cntr_base = init_cntr_base_l3c;
>>> + uncore_dev->start_event = uncore_start_event_l3c;
>>> + uncore_dev->stop_event = uncore_stop_event_l3c;
>>> + uncore_dev->select_channel = uncore_select_channel;
>>> + break;
>>> + case PMU_TYPE_DMC:
>>> + uncore_dev->max_counters = UNCORE_MAX_COUNTERS;
>>> + uncore_dev->max_channels = UNCORE_DMC_MAX_CHANNELS;
>>> + uncore_dev->max_events = DMC_EVENT_MAX;
>>> + uncore_dev->hrtimer_interval = UNCORE_HRTIMER_INTERVAL;
>>> + uncore_dev->attr_groups = dmc_pmu_attr_groups;
>>> + uncore_dev->name = devm_kasprintf(dev, GFP_KERNEL,
>>> + "uncore_dmc_%d", uncore_dev->node);
>>> + uncore_dev->init_cntr_base = init_cntr_base_dmc;
>>> + uncore_dev->start_event = uncore_start_event_dmc;
>>> + uncore_dev->stop_event = uncore_stop_event_dmc;
>>> + uncore_dev->select_channel = uncore_select_channel;
>>> + break;
>>
>> We should probably s/uncore/tx2/, or s/uncore/thunderx2/ to namespace
>> this.
>
> ok, i would preffer tx2.
>>
>>> +static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
>>> + struct hlist_node *node)
>>> +{
>>> + int new_cpu;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> +
>>> + pmu_uncore = hlist_entry_safe(node,
>>> + struct thunderx2_pmu_uncore_channel, node);
>>> + if (cpu != pmu_uncore->cpu)
>>> + return 0;
>>> +
>>> + new_cpu = cpumask_any_and(
>>> + cpumask_of_node(pmu_uncore->uncore_dev->node),
>>> + cpu_online_mask);
>>> + if (new_cpu >= nr_cpu_ids)
>>> + return 0;
>>> +
>>> + pmu_uncore->cpu = new_cpu;
>>> + perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
>>> + return 0;
>>> +}
>>
>> We'll also need a onlining callback. Consider if all CPUs in a NUMA node
>> were offlined, then we tried to online an arbitrary CPU from that node.
>
> sure, will add.
>>
>> Thanks,
>> Mark.
>
> thanks
> Ganapat
thanks
Ganapat
Hi Pranith,
On Sat, Jul 7, 2018 at 11:22 AM Pranith Kumar <[email protected]> wrote:
>
> Hi Ganapatrao,
>
>
> On Wed, Jun 20, 2018 at 11:33 PM, Ganapatrao Kulkarni
> <[email protected]> wrote:
>
> > +
> > +enum thunderx2_uncore_l3_events {
> > + L3_EVENT_NONE,
> > + L3_EVENT_NBU_CANCEL,
> > + L3_EVENT_DIB_RETRY,
> > + L3_EVENT_DOB_RETRY,
> > + L3_EVENT_DIB_CREDIT_RETRY,
> > + L3_EVENT_DOB_CREDIT_RETRY,
> > + L3_EVENT_FORCE_RETRY,
> > + L3_EVENT_IDX_CONFLICT_RETRY,
> > + L3_EVENT_EVICT_CONFLICT_RETRY,
> > + L3_EVENT_BANK_CONFLICT_RETRY,
> > + L3_EVENT_FILL_ENTRY_RETRY,
> > + L3_EVENT_EVICT_NOT_READY_RETRY,
> > + L3_EVENT_L3_RETRY,
> > + L3_EVENT_READ_REQ,
> > + L3_EVENT_WRITE_BACK_REQ,
> > + L3_EVENT_INVALIDATE_NWRITE_REQ,
> > + L3_EVENT_INV_REQ,
> > + L3_EVENT_SELF_REQ,
> > + L3_EVENT_REQ,
> > + L3_EVENT_EVICT_REQ,
> > + L3_EVENT_INVALIDATE_NWRITE_HIT,
> > + L3_EVENT_INVALIDATE_HIT,
> > + L3_EVENT_SELF_HIT,
> > + L3_EVENT_READ_HIT,
> > + L3_EVENT_MAX,
> > +};
> > +
> > +enum thunderx2_uncore_dmc_events {
> > + DMC_EVENT_NONE,
> > + DMC_EVENT_COUNT_CYCLES,
> > + DMC_EVENT_RES2,
> > + DMC_EVENT_RES3,
> > + DMC_EVENT_RES4,
> > + DMC_EVENT_RES5,
> > + DMC_EVENT_RES6,
> > + DMC_EVENT_RES7,
> > + DMC_EVENT_RES8,
> > + DMC_EVENT_READ_64B_TXNS,
> > + DMC_EVENT_READ_BELOW_64B_TXNS,
> > + DMC_EVENT_WRITE_TXNS,
> > + DMC_EVENT_TXN_CYCLES,
> > + DMC_EVENT_DATA_TRANSFERS,
> > + DMC_EVENT_CANCELLED_READ_TXNS,
> > + DMC_EVENT_CONSUMED_READ_TXNS,
> > + DMC_EVENT_MAX,
> > +};
>
> Can you please provide a link to where these counters are documented?
> It is not clear what each counter does especially for the L3C events.
I will add brief description of each event in Documentation.
>
> Also, what counter do you need to use to get L3 hit/miss ratio? I
> think this is the most useful counter related to L3.
L3C cache Hit Ratio = (L3_READ_HIT + L3_INV_N_WRITE_HIT +
L3_INVALIDATE_HIT) / (L3_READ_REQ + L3_INV_N_WRITE_REQ +
L3_INVALIDATE_REQ)
>
> Thanks,
> --
> Pranith
thanks
Ganapat
Hi Ganapatrao,
On 21/06/18 07:33, Ganapatrao Kulkarni wrote:
> This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
> Controller(DMC) and Level 3 Cache(L3C).
>
> ThunderX2 has 8 independent DMC PMUs to capture performance events
> corresponding to 8 channels of DDR4 Memory Controller and 16 independent
> L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
> Each PMU supports up to 4 counters. All counters lack overflow interrupt
> and are sampled periodically.
>
> Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> ---
> drivers/perf/Kconfig | 8 +
> drivers/perf/Makefile | 1 +
> drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/cpuhotplug.h | 1 +
> 4 files changed, 959 insertions(+)
> create mode 100644 drivers/perf/thunderx2_pmu.c
>
> +
> +/*
> + * DMC and L3 counter interface is muxed across all channels.
> + * hence we need to select the channel before accessing counter
> + * data/control registers.
> + *
> + * L3 Tile and DMC channel selection is through SMC call
> + * SMC call arguments,
> + * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
> + * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
> + * x2 = Node id
> + * x3 = DMC(1)/L3C(0)
> + * x4 = channel Id
> + */
> +static void uncore_select_channel(struct perf_event *event)
> +{
> + struct arm_smccc_res res;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(event->pmu);
> + struct thunderx2_pmu_uncore_dev *uncore_dev =
> + pmu_uncore->uncore_dev;
> +
> + arm_smccc_smc(THUNDERX2_SMC_CALL_ID, THUNDERX2_SMC_SET_CHANNEL,
> + uncore_dev->node, uncore_dev->type,
> + pmu_uncore->channel, 0, 0, 0, &res);
> + if (res.a0) {
> + dev_err(uncore_dev->dev,
> + "SMC to Select channel failed for PMU UNCORE[%s]\n",
> + pmu_uncore->uncore_dev->name);
> + }
> +}
> +
> +static void uncore_start_event_l3c(struct perf_event *event, int flags)
> +{
> + u32 val;
> + struct hw_perf_event *hwc = &event->hw;
> +
> + /* event id encoded in bits [07:03] */
> + val = GET_EVENTID(event) << 3;
> + reg_writel(val, hwc->config_base);
> + local64_set(&hwc->prev_count, 0);
> + reg_writel(0, hwc->event_base);
> +}
> +
> +static void uncore_stop_event_l3c(struct perf_event *event)
> +{
> + reg_writel(0, event->hw.config_base);
> +}
> +
> +static void uncore_start_event_dmc(struct perf_event *event, int flags)
> +{
> + u32 val;
> + struct hw_perf_event *hwc = &event->hw;
> + int idx = GET_COUNTERID(event);
> + int event_type = GET_EVENTID(event);
> +
> + /* enable and start counters.
> + * 8 bits for each counter, bits[05:01] of a counter to set event type.
> + */
> + val = reg_readl(hwc->config_base);
> + val &= ~DMC_EVENT_CFG(idx, 0x1f);
> + val |= DMC_EVENT_CFG(idx, event_type);
> + reg_writel(val, hwc->config_base);
> + local64_set(&hwc->prev_count, 0);
> + reg_writel(0, hwc->event_base);
> +}
> +
> +static void uncore_stop_event_dmc(struct perf_event *event)
> +{
> + u32 val;
> + struct hw_perf_event *hwc = &event->hw;
> + int idx = GET_COUNTERID(event);
> +
> + /* clear event type(bits[05:01]) to stop counter */
> + val = reg_readl(hwc->config_base);
> + val &= ~DMC_EVENT_CFG(idx, 0x1f);
> + reg_writel(val, hwc->config_base);
> +}
> +
> +static void init_cntr_base_l3c(struct perf_event *event,
> + struct thunderx2_pmu_uncore_dev *uncore_dev)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + /* counter ctrl/data reg offset at 8 */
> + hwc->config_base = (unsigned long)uncore_dev->base
> + + L3C_COUNTER_CTL + (8 * GET_COUNTERID(event));
> + hwc->event_base = (unsigned long)uncore_dev->base
> + + L3C_COUNTER_DATA + (8 * GET_COUNTERID(event));
> +}
> +
> +static void init_cntr_base_dmc(struct perf_event *event,
> + struct thunderx2_pmu_uncore_dev *uncore_dev)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + hwc->config_base = (unsigned long)uncore_dev->base
> + + DMC_COUNTER_CTL;
> + /* counter data reg offset at 0xc */
> + hwc->event_base = (unsigned long)uncore_dev->base
> + + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
> +}
> +
> +static void thunderx2_uncore_update(struct perf_event *event)
> +{
> + s64 prev, new = 0;
> + u64 delta;
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + enum thunderx2_uncore_type type;
> +
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> + type = pmu_uncore->uncore_dev->type;
> +
> + pmu_uncore->uncore_dev->select_channel(event);
> +
> + new = reg_readl(hwc->event_base);
> + prev = local64_xchg(&hwc->prev_count, new);
> +
> + /* handles rollover of 32 bit counter */
> + delta = (u32)(((1UL << 32) - prev) + new);
> + local64_add(delta, &event->count);
> +}
> +
> +
> +/*
> + * We must NOT create groups containing events from multiple hardware PMUs,
> + * although mixing different software and hardware PMUs is allowed.
> + */
> +static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
> +{
> + struct pmu *pmu = event->pmu;
> + struct perf_event *leader = event->group_leader;
> + struct perf_event *sibling;
> + int counters = 0;
> +
> + if (leader->pmu != event->pmu && !is_software_event(leader))
> + return false;
> +
> + for_each_sibling_event(sibling, event->group_leader) {
> + if (is_software_event(sibling))
> + continue;
> + if (sibling->pmu != pmu)
> + return false;
> + counters++;
> + }
The above loop will not account for :
1) The leader event
2) The event that we are about to add.
So you need to allocate counters for both the above to make sure we can
accommodate the events.
> +
> + /*
> + * If the group requires more counters than the HW has,
> + * it cannot ever be scheduled.
> + */
> + return counters < UNCORE_MAX_COUNTERS;
> +}
> +
> +
> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + unsigned long irqflags;
> +
> + hwc->state = 0;
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> + uncore_dev = pmu_uncore->uncore_dev;
> +
> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> + uncore_dev->select_channel(event);
> + uncore_dev->start_event(event, flags);
> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> +
> + perf_event_update_userpage(event);
> +
> + if (!find_last_bit(pmu_uncore->active_counters,
> + pmu_uncore->uncore_dev->max_counters)) {
Are we guaranteed that the counter0 is always allocated ? What if the
event is deleted and there are other events active ? A better check
would be :
if (find_last_bit(...) != pmu_uncore->uncore_dev->max_counters)
> + hrtimer_start(&pmu_uncore->hrtimer,
> + ns_to_ktime(uncore_dev->hrtimer_interval),
> + HRTIMER_MODE_REL_PINNED);
> + }
> +}
> +
> +static void thunderx2_uncore_stop(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + unsigned long irqflags;
> +
> + if (hwc->state & PERF_HES_UPTODATE)
> + return;
> +
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> + uncore_dev = pmu_uncore->uncore_dev;
> +
> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> +
> + uncore_dev->select_channel(event);
> + uncore_dev->stop_event(event);
> +
> + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> + hwc->state |= PERF_HES_STOPPED;
> + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
We already check the UPTODATE flag above, we could skip that
flag check here ?
> + thunderx2_uncore_update(event);
> + hwc->state |= PERF_HES_UPTODATE;
> + }
> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> +}
> +
> +static int thunderx2_uncore_add(struct perf_event *event, int flags)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> +
> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> + uncore_dev = pmu_uncore->uncore_dev;
> +
> + /* Allocate a free counter */
> + hwc->idx = alloc_counter(pmu_uncore);
> + if (hwc->idx < 0)
> + return -EAGAIN;
> +
> + pmu_uncore->events[hwc->idx] = event;
> + /* set counter control and data registers base address */
> + uncore_dev->init_cntr_base(event, uncore_dev);
> +
> + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> + if (flags & PERF_EF_START)
> + thunderx2_uncore_start(event, flags);
> +
> + return 0;
> +}
> +
> +static void thunderx2_uncore_del(struct perf_event *event, int flags)
> +{
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(event->pmu);
> + struct hw_perf_event *hwc = &event->hw;
> +
> + thunderx2_uncore_stop(event, PERF_EF_UPDATE);
> +
> + /* clear the assigned counter */
> + free_counter(pmu_uncore, GET_COUNTERID(event));
> +
> + perf_event_update_userpage(event);
> + pmu_uncore->events[hwc->idx] = NULL;
> + hwc->idx = -1;
> +}
> +
> +static void thunderx2_uncore_read(struct perf_event *event)
> +{
> + unsigned long irqflags;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> + pmu_to_thunderx2_pmu_uncore(event->pmu);
> +
> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> + thunderx2_uncore_update(event);
> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> +}
> +
> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
> + struct hrtimer *hrt)
> +{
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + unsigned long irqflags;
> + int idx;
> + bool restart_timer = false;
> +
> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
> + hrtimer);
> +
> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> + for_each_set_bit(idx, pmu_uncore->active_counters,
> + pmu_uncore->uncore_dev->max_counters) {
> + struct perf_event *event = pmu_uncore->events[idx];
Do we need to check if the "event" is not NULL ? Couldn't this race with
an event_del() operation ?
> +
> + thunderx2_uncore_update(event);
> + restart_timer = true;
> + }
> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> +
> + if (restart_timer)
> + hrtimer_forward_now(hrt,
> + ns_to_ktime(
> + pmu_uncore->uncore_dev->hrtimer_interval));
> +
> + return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
> +}
> +
> +static int thunderx2_pmu_uncore_register(
> + struct thunderx2_pmu_uncore_channel *pmu_uncore)
> +{
> + struct device *dev = pmu_uncore->uncore_dev->dev;
> + char *name = pmu_uncore->uncore_dev->name;
> + int channel = pmu_uncore->channel;
> +
> + /* Perf event registration */
> + pmu_uncore->pmu = (struct pmu) {
> + .attr_groups = pmu_uncore->uncore_dev->attr_groups,
> + .task_ctx_nr = perf_invalid_context,
> + .event_init = thunderx2_uncore_event_init,
> + .add = thunderx2_uncore_add,
> + .del = thunderx2_uncore_del,
> + .start = thunderx2_uncore_start,
> + .stop = thunderx2_uncore_stop,
> + .read = thunderx2_uncore_read,
You must fill in the module field of the PMU to prevent this module from
being unloaded while it is in use.
> + };
> +
> + pmu_uncore->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
> + "%s_%d", name, channel);
> +
> + return perf_pmu_register(&pmu_uncore->pmu, pmu_uncore->pmu.name, -1);
> +}
> +
> +static int thunderx2_pmu_uncore_add(struct thunderx2_pmu_uncore_dev *uncore_dev,
> + int channel)
> +{
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> + int ret, cpu;
> +
> + pmu_uncore = devm_kzalloc(uncore_dev->dev, sizeof(*pmu_uncore),
> + GFP_KERNEL);
> + if (!pmu_uncore)
> + return -ENOMEM;
> +
> + cpu = cpumask_any_and(cpumask_of_node(uncore_dev->node),
> + cpu_online_mask);
> + if (cpu >= nr_cpu_ids)
> + return -EINVAL;
Do we need to fail this here ? We could always add this back when a CPU
comes online (e.g, maxcpus=n). We do have cpu-hotplug call backs anyway.
> +static acpi_status thunderx2_pmu_uncore_dev_add(acpi_handle handle, u32 level,
> + void *data, void **return_value)
> +{
> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> + struct acpi_device *adev;
> + enum thunderx2_uncore_type type;
> + int channel;
> +
> + if (acpi_bus_get_device(handle, &adev))
> + return AE_OK;
> + if (acpi_bus_get_status(adev) || !adev->status.present)
> + return AE_OK;
> +
> + type = get_uncore_device_type(adev);
> + if (type == PMU_TYPE_INVALID)
> + return AE_OK;
> +
> + uncore_dev = init_pmu_uncore_dev((struct device *)data, handle,
> + adev, type);
> +
> + if (!uncore_dev)
> + return AE_ERROR;
> +
> + for (channel = 0; channel < uncore_dev->max_channels; channel++) {
> + if (thunderx2_pmu_uncore_add(uncore_dev, channel)) {
> + /* Can't add the PMU device, abort */
> + return AE_ERROR;
> + }
> + }
> + return AE_OK;
> +}
> +
> +static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
> + struct hlist_node *node)
> +{
> + int new_cpu;
> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> +
> + pmu_uncore = hlist_entry_safe(node,
> + struct thunderx2_pmu_uncore_channel, node);
> + if (cpu != pmu_uncore->cpu)
> + return 0;
> +
> + new_cpu = cpumask_any_and(
> + cpumask_of_node(pmu_uncore->uncore_dev->node),
> + cpu_online_mask);
> + if (new_cpu >= nr_cpu_ids)
> + return 0;
> +
> + pmu_uncore->cpu = new_cpu;
> + perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
> + return 0;
> +}
> +
> +static const struct acpi_device_id thunderx2_uncore_acpi_match[] = {
> + {"CAV901C", 0},
> + {},
> +};
> +MODULE_DEVICE_TABLE(acpi, thunderx2_uncore_acpi_match);
> +
> +static int thunderx2_uncore_probe(struct platform_device *pdev)
> +{
> + struct device *dev = &pdev->dev;
> + acpi_handle handle;
> + acpi_status status;
> +
> + set_dev_node(dev, acpi_get_node(ACPI_HANDLE(dev)));
> +
> + /* Make sure firmware supports DMC/L3C set channel smc call */
> + if (test_uncore_select_channel_early(dev))
> + return -ENODEV;
> +
> + if (!has_acpi_companion(dev))
> + return -ENODEV;
> +
> + handle = ACPI_HANDLE(dev);
> + if (!handle)
> + return -EINVAL;
> +
> + /* Walk through the tree for all PMU UNCORE devices */
> + status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1,
> + thunderx2_pmu_uncore_dev_add,
> + NULL, dev, NULL);
> + if (ACPI_FAILURE(status)) {
> + dev_err(dev, "failed to probe PMU devices\n");
> + return_ACPI_STATUS(status);
> + }
> +
> + dev_info(dev, "node%d: pmu uncore registered\n", dev_to_node(dev));
> + return 0;
> +}
> +
> +static struct platform_driver thunderx2_uncore_driver = {
> + .probe = thunderx2_uncore_probe,
> + .driver = {
> + .name = "thunderx2-uncore-pmu",
> + .acpi_match_table = ACPI_PTR(thunderx2_uncore_acpi_match),
> + },
> +};
Don't we need to unregister the pmu's when we remove the module ?
Suzuki
Hi Suzuki,
On Wed, Oct 10, 2018 at 3:22 PM Suzuki K Poulose <[email protected]> wrote:
>
> Hi Ganapatrao,
>
> On 21/06/18 07:33, Ganapatrao Kulkarni wrote:
> > This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory
> > Controller(DMC) and Level 3 Cache(L3C).
> >
> > ThunderX2 has 8 independent DMC PMUs to capture performance events
> > corresponding to 8 channels of DDR4 Memory Controller and 16 independent
> > L3C PMUs to capture events corresponding to 16 tiles of L3 cache.
> > Each PMU supports up to 4 counters. All counters lack overflow interrupt
> > and are sampled periodically.
> >
> > Signed-off-by: Ganapatrao Kulkarni <[email protected]>
> > ---
> > drivers/perf/Kconfig | 8 +
> > drivers/perf/Makefile | 1 +
> > drivers/perf/thunderx2_pmu.c | 949 +++++++++++++++++++++++++++++++++++++++++++
> > include/linux/cpuhotplug.h | 1 +
> > 4 files changed, 959 insertions(+)
> > create mode 100644 drivers/perf/thunderx2_pmu.c
> >
>
>
> > +
> > +/*
> > + * DMC and L3 counter interface is muxed across all channels.
> > + * hence we need to select the channel before accessing counter
> > + * data/control registers.
> > + *
> > + * L3 Tile and DMC channel selection is through SMC call
> > + * SMC call arguments,
> > + * x0 = THUNDERX2_SMC_CALL_ID (Vendor SMC call Id)
> > + * x1 = THUNDERX2_SMC_SET_CHANNEL (Id to set DMC/L3C channel)
> > + * x2 = Node id
> > + * x3 = DMC(1)/L3C(0)
> > + * x4 = channel Id
> > + */
> > +static void uncore_select_channel(struct perf_event *event)
> > +{
> > + struct arm_smccc_res res;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> > + pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + struct thunderx2_pmu_uncore_dev *uncore_dev =
> > + pmu_uncore->uncore_dev;
> > +
> > + arm_smccc_smc(THUNDERX2_SMC_CALL_ID, THUNDERX2_SMC_SET_CHANNEL,
> > + uncore_dev->node, uncore_dev->type,
> > + pmu_uncore->channel, 0, 0, 0, &res);
> > + if (res.a0) {
> > + dev_err(uncore_dev->dev,
> > + "SMC to Select channel failed for PMU UNCORE[%s]\n",
> > + pmu_uncore->uncore_dev->name);
> > + }
> > +}
> > +
>
> > +static void uncore_start_event_l3c(struct perf_event *event, int flags)
> > +{
> > + u32 val;
> > + struct hw_perf_event *hwc = &event->hw;
> > +
> > + /* event id encoded in bits [07:03] */
> > + val = GET_EVENTID(event) << 3;
> > + reg_writel(val, hwc->config_base);
> > + local64_set(&hwc->prev_count, 0);
> > + reg_writel(0, hwc->event_base);
> > +}
> > +
> > +static void uncore_stop_event_l3c(struct perf_event *event)
> > +{
> > + reg_writel(0, event->hw.config_base);
> > +}
> > +
> > +static void uncore_start_event_dmc(struct perf_event *event, int flags)
> > +{
> > + u32 val;
> > + struct hw_perf_event *hwc = &event->hw;
> > + int idx = GET_COUNTERID(event);
> > + int event_type = GET_EVENTID(event);
> > +
> > + /* enable and start counters.
> > + * 8 bits for each counter, bits[05:01] of a counter to set event type.
> > + */
> > + val = reg_readl(hwc->config_base);
> > + val &= ~DMC_EVENT_CFG(idx, 0x1f);
> > + val |= DMC_EVENT_CFG(idx, event_type);
> > + reg_writel(val, hwc->config_base);
> > + local64_set(&hwc->prev_count, 0);
> > + reg_writel(0, hwc->event_base);
> > +}
> > +
> > +static void uncore_stop_event_dmc(struct perf_event *event)
> > +{
> > + u32 val;
> > + struct hw_perf_event *hwc = &event->hw;
> > + int idx = GET_COUNTERID(event);
> > +
> > + /* clear event type(bits[05:01]) to stop counter */
> > + val = reg_readl(hwc->config_base);
> > + val &= ~DMC_EVENT_CFG(idx, 0x1f);
> > + reg_writel(val, hwc->config_base);
> > +}
> > +
> > +static void init_cntr_base_l3c(struct perf_event *event,
> > + struct thunderx2_pmu_uncore_dev *uncore_dev)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > +
> > + /* counter ctrl/data reg offset at 8 */
> > + hwc->config_base = (unsigned long)uncore_dev->base
> > + + L3C_COUNTER_CTL + (8 * GET_COUNTERID(event));
> > + hwc->event_base = (unsigned long)uncore_dev->base
> > + + L3C_COUNTER_DATA + (8 * GET_COUNTERID(event));
> > +}
> > +
> > +static void init_cntr_base_dmc(struct perf_event *event,
> > + struct thunderx2_pmu_uncore_dev *uncore_dev)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > +
> > + hwc->config_base = (unsigned long)uncore_dev->base
> > + + DMC_COUNTER_CTL;
> > + /* counter data reg offset at 0xc */
> > + hwc->event_base = (unsigned long)uncore_dev->base
> > + + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
> > +}
> > +
> > +static void thunderx2_uncore_update(struct perf_event *event)
> > +{
> > + s64 prev, new = 0;
> > + u64 delta;
> > + struct hw_perf_event *hwc = &event->hw;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + enum thunderx2_uncore_type type;
> > +
> > + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + type = pmu_uncore->uncore_dev->type;
> > +
> > + pmu_uncore->uncore_dev->select_channel(event);
> > +
> > + new = reg_readl(hwc->event_base);
> > + prev = local64_xchg(&hwc->prev_count, new);
> > +
> > + /* handles rollover of 32 bit counter */
> > + delta = (u32)(((1UL << 32) - prev) + new);
> > + local64_add(delta, &event->count);
> > +}
> > +
>
>
> > +
> > +/*
> > + * We must NOT create groups containing events from multiple hardware PMUs,
> > + * although mixing different software and hardware PMUs is allowed.
> > + */
> > +static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
> > +{
> > + struct pmu *pmu = event->pmu;
> > + struct perf_event *leader = event->group_leader;
> > + struct perf_event *sibling;
> > + int counters = 0;
> > +
> > + if (leader->pmu != event->pmu && !is_software_event(leader))
> > + return false;
> > +
> > + for_each_sibling_event(sibling, event->group_leader) {
> > + if (is_software_event(sibling))
> > + continue;
> > + if (sibling->pmu != pmu)
> > + return false;
> > + counters++;
> > + }
>
> The above loop will not account for :
> 1) The leader event
> 2) The event that we are about to add.
>
> So you need to allocate counters for both the above to make sure we can
> accommodate the events.
Is leader also part of sibling list?
>
> > +
> > + /*
> > + * If the group requires more counters than the HW has,
> > + * it cannot ever be scheduled.
> > + */
> > + return counters < UNCORE_MAX_COUNTERS;
> > +}
> > +
>
>
> > +
> > +static void thunderx2_uncore_start(struct perf_event *event, int flags)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + struct thunderx2_pmu_uncore_dev *uncore_dev;
> > + unsigned long irqflags;
> > +
> > + hwc->state = 0;
> > + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + uncore_dev = pmu_uncore->uncore_dev;
> > +
> > + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> > + uncore_dev->select_channel(event);
> > + uncore_dev->start_event(event, flags);
> > + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> > +
> > + perf_event_update_userpage(event);
> > +
> > + if (!find_last_bit(pmu_uncore->active_counters,
> > + pmu_uncore->uncore_dev->max_counters)) {
>
> Are we guaranteed that the counter0 is always allocated ? What if the
> event is deleted and there are other events active ? A better check
> would be :
> if (find_last_bit(...) != pmu_uncore->uncore_dev->max_counters)
>
IIUC, find_last_bit will return zero if none of the counters are
active and we want to start timer for first active counter.
> > + hrtimer_start(&pmu_uncore->hrtimer,
> > + ns_to_ktime(uncore_dev->hrtimer_interval),
> > + HRTIMER_MODE_REL_PINNED);
> > + }
> > +}
>
>
> > +
> > +static void thunderx2_uncore_stop(struct perf_event *event, int flags)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + struct thunderx2_pmu_uncore_dev *uncore_dev;
> > + unsigned long irqflags;
> > +
> > + if (hwc->state & PERF_HES_UPTODATE)
> > + return;
> > +
> > + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + uncore_dev = pmu_uncore->uncore_dev;
> > +
> > + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> > +
> > + uncore_dev->select_channel(event);
> > + uncore_dev->stop_event(event);
> > +
> > + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> > + hwc->state |= PERF_HES_STOPPED;
> > + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
>
> We already check the UPTODATE flag above, we could skip that
> flag check here ?
ok, thanks.
>
> > + thunderx2_uncore_update(event);
> > + hwc->state |= PERF_HES_UPTODATE;
> > + }
> > + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> > +}
> > +
> > +static int thunderx2_uncore_add(struct perf_event *event, int flags)
> > +{
> > + struct hw_perf_event *hwc = &event->hw;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + struct thunderx2_pmu_uncore_dev *uncore_dev;
> > +
> > + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + uncore_dev = pmu_uncore->uncore_dev;
> > +
> > + /* Allocate a free counter */
> > + hwc->idx = alloc_counter(pmu_uncore);
> > + if (hwc->idx < 0)
> > + return -EAGAIN;
> > +
> > + pmu_uncore->events[hwc->idx] = event;
> > + /* set counter control and data registers base address */
> > + uncore_dev->init_cntr_base(event, uncore_dev);
> > +
> > + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> > + if (flags & PERF_EF_START)
> > + thunderx2_uncore_start(event, flags);
> > +
> > + return 0;
> > +}
> > +
> > +static void thunderx2_uncore_del(struct perf_event *event, int flags)
> > +{
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> > + pmu_to_thunderx2_pmu_uncore(event->pmu);
> > + struct hw_perf_event *hwc = &event->hw;
> > +
> > + thunderx2_uncore_stop(event, PERF_EF_UPDATE);
> > +
> > + /* clear the assigned counter */
> > + free_counter(pmu_uncore, GET_COUNTERID(event));
> > +
> > + perf_event_update_userpage(event);
> > + pmu_uncore->events[hwc->idx] = NULL;
> > + hwc->idx = -1;
> > +}
> > +
> > +static void thunderx2_uncore_read(struct perf_event *event)
> > +{
> > + unsigned long irqflags;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> > + pmu_to_thunderx2_pmu_uncore(event->pmu);
> > +
> > + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> > + thunderx2_uncore_update(event);
> > + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> > +}
> > +
> > +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
> > + struct hrtimer *hrt)
> > +{
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + unsigned long irqflags;
> > + int idx;
> > + bool restart_timer = false;
> > +
> > + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
> > + hrtimer);
> > +
> > + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> > + for_each_set_bit(idx, pmu_uncore->active_counters,
> > + pmu_uncore->uncore_dev->max_counters) {
> > + struct perf_event *event = pmu_uncore->events[idx];
>
> Do we need to check if the "event" is not NULL ? Couldn't this race with
> an event_del() operation ?
i dont think so, i have not come across any such race condition during
my testing.
>
> > +
> > + thunderx2_uncore_update(event);
> > + restart_timer = true;
> > + }
> > + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> > +
> > + if (restart_timer)
> > + hrtimer_forward_now(hrt,
> > + ns_to_ktime(
> > + pmu_uncore->uncore_dev->hrtimer_interval));
> > +
> > + return restart_timer ? HRTIMER_RESTART : HRTIMER_NORESTART;
> > +}
> > +
> > +static int thunderx2_pmu_uncore_register(
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore)
> > +{
> > + struct device *dev = pmu_uncore->uncore_dev->dev;
> > + char *name = pmu_uncore->uncore_dev->name;
> > + int channel = pmu_uncore->channel;
> > +
> > + /* Perf event registration */
> > + pmu_uncore->pmu = (struct pmu) {
> > + .attr_groups = pmu_uncore->uncore_dev->attr_groups,
> > + .task_ctx_nr = perf_invalid_context,
> > + .event_init = thunderx2_uncore_event_init,
> > + .add = thunderx2_uncore_add,
> > + .del = thunderx2_uncore_del,
> > + .start = thunderx2_uncore_start,
> > + .stop = thunderx2_uncore_stop,
> > + .read = thunderx2_uncore_read,
>
> You must fill in the module field of the PMU to prevent this module from
> being unloaded while it is in use.
thanks, i will add it.
>
> > + };
> > +
> > + pmu_uncore->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
> > + "%s_%d", name, channel);
> > +
> > + return perf_pmu_register(&pmu_uncore->pmu, pmu_uncore->pmu.name, -1);
> > +}
> > +
> > +static int thunderx2_pmu_uncore_add(struct thunderx2_pmu_uncore_dev *uncore_dev,
> > + int channel)
> > +{
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > + int ret, cpu;
> > +
> > + pmu_uncore = devm_kzalloc(uncore_dev->dev, sizeof(*pmu_uncore),
> > + GFP_KERNEL);
> > + if (!pmu_uncore)
> > + return -ENOMEM;
> > +
> > + cpu = cpumask_any_and(cpumask_of_node(uncore_dev->node),
> > + cpu_online_mask);
> > + if (cpu >= nr_cpu_ids)
> > + return -EINVAL;
>
> Do we need to fail this here ? We could always add this back when a CPU
> comes online (e.g, maxcpus=n). We do have cpu-hotplug call backs anyway.
we want to fail, since we dont have any cpu to serve.
>
> > +static acpi_status thunderx2_pmu_uncore_dev_add(acpi_handle handle, u32 level,
> > + void *data, void **return_value)
> > +{
> > + struct thunderx2_pmu_uncore_dev *uncore_dev;
> > + struct acpi_device *adev;
> > + enum thunderx2_uncore_type type;
> > + int channel;
> > +
> > + if (acpi_bus_get_device(handle, &adev))
> > + return AE_OK;
> > + if (acpi_bus_get_status(adev) || !adev->status.present)
> > + return AE_OK;
> > +
> > + type = get_uncore_device_type(adev);
> > + if (type == PMU_TYPE_INVALID)
> > + return AE_OK;
> > +
> > + uncore_dev = init_pmu_uncore_dev((struct device *)data, handle,
> > + adev, type);
> > +
> > + if (!uncore_dev)
> > + return AE_ERROR;
> > +
> > + for (channel = 0; channel < uncore_dev->max_channels; channel++) {
> > + if (thunderx2_pmu_uncore_add(uncore_dev, channel)) {
> > + /* Can't add the PMU device, abort */
> > + return AE_ERROR;
> > + }
> > + }
> > + return AE_OK;
> > +}
> > +
> > +static int thunderx2_uncore_pmu_offline_cpu(unsigned int cpu,
> > + struct hlist_node *node)
> > +{
> > + int new_cpu;
> > + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> > +
> > + pmu_uncore = hlist_entry_safe(node,
> > + struct thunderx2_pmu_uncore_channel, node);
> > + if (cpu != pmu_uncore->cpu)
> > + return 0;
> > +
> > + new_cpu = cpumask_any_and(
> > + cpumask_of_node(pmu_uncore->uncore_dev->node),
> > + cpu_online_mask);
> > + if (new_cpu >= nr_cpu_ids)
> > + return 0;
> > +
> > + pmu_uncore->cpu = new_cpu;
> > + perf_pmu_migrate_context(&pmu_uncore->pmu, cpu, new_cpu);
> > + return 0;
> > +}
> > +
> > +static const struct acpi_device_id thunderx2_uncore_acpi_match[] = {
> > + {"CAV901C", 0},
> > + {},
> > +};
> > +MODULE_DEVICE_TABLE(acpi, thunderx2_uncore_acpi_match);
> > +
> > +static int thunderx2_uncore_probe(struct platform_device *pdev)
> > +{
> > + struct device *dev = &pdev->dev;
> > + acpi_handle handle;
> > + acpi_status status;
> > +
> > + set_dev_node(dev, acpi_get_node(ACPI_HANDLE(dev)));
> > +
> > + /* Make sure firmware supports DMC/L3C set channel smc call */
> > + if (test_uncore_select_channel_early(dev))
> > + return -ENODEV;
> > +
> > + if (!has_acpi_companion(dev))
> > + return -ENODEV;
> > +
> > + handle = ACPI_HANDLE(dev);
> > + if (!handle)
> > + return -EINVAL;
> > +
> > + /* Walk through the tree for all PMU UNCORE devices */
> > + status = acpi_walk_namespace(ACPI_TYPE_DEVICE, handle, 1,
> > + thunderx2_pmu_uncore_dev_add,
> > + NULL, dev, NULL);
> > + if (ACPI_FAILURE(status)) {
> > + dev_err(dev, "failed to probe PMU devices\n");
> > + return_ACPI_STATUS(status);
> > + }
> > +
> > + dev_info(dev, "node%d: pmu uncore registered\n", dev_to_node(dev));
> > + return 0;
> > +}
> > +
> > +static struct platform_driver thunderx2_uncore_driver = {
> > + .probe = thunderx2_uncore_probe,
> > + .driver = {
> > + .name = "thunderx2-uncore-pmu",
> > + .acpi_match_table = ACPI_PTR(thunderx2_uncore_acpi_match),
> > + },
> > +};
>
> Don't we need to unregister the pmu's when we remove the module ?
it is built in module at present, no unload is supported.
Anyway i am adding in next version , since i am going for loadable module.
>
> Suzuki
thanks
Ganapat
Hi Ganapatrao,
On 11/10/18 07:39, Ganapatrao Kulkarni wrote:
>>> +
>>> +/*
>>> + * We must NOT create groups containing events from multiple hardware PMUs,
>>> + * although mixing different software and hardware PMUs is allowed.
>>> + */
>>> +static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
>>> +{
>>> + struct pmu *pmu = event->pmu;
>>> + struct perf_event *leader = event->group_leader;
>>> + struct perf_event *sibling;
>>> + int counters = 0;
>>> +
>>> + if (leader->pmu != event->pmu && !is_software_event(leader))
>>> + return false;
>>> +
>>> + for_each_sibling_event(sibling, event->group_leader) {
>>> + if (is_software_event(sibling))
>>> + continue;
>>> + if (sibling->pmu != pmu)
>>> + return false;
>>> + counters++;
>>> + }
>>
>> The above loop will not account for :
>> 1) The leader event
>> 2) The event that we are about to add.
>>
>> So you need to allocate counters for both the above to make sure we can
>> accommodate the events.
>
> Is leader also part of sibling list?
No.
>>> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
>>> +{
>>> + struct hw_perf_event *hwc = &event->hw;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> + unsigned long irqflags;
>>> +
>>> + hwc->state = 0;
>>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
>>> + uncore_dev = pmu_uncore->uncore_dev;
>>> +
>>> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
>>> + uncore_dev->select_channel(event);
>>> + uncore_dev->start_event(event, flags);
>>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
>>> +
>>> + perf_event_update_userpage(event);
>>> +
>>> + if (!find_last_bit(pmu_uncore->active_counters,
>>> + pmu_uncore->uncore_dev->max_counters)) {
>>
>> Are we guaranteed that the counter0 is always allocated ? What if the
>> event is deleted and there are other events active ? A better check
>> would be :
>> if (find_last_bit(...) != pmu_uncore->uncore_dev->max_counters)
>>
> IIUC, find_last_bit will return zero if none of the counters are
> active and we want to start timer for first active counter.
>
Well, if that was the case, how do you know if the bit zero is the only
bit set ?
AFAICS, lib/find_bit.c has :
unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
if (size) {
unsigned long val = BITMAP_LAST_WORD_MASK(size);
unsigned long idx = (size-1) / BITS_PER_LONG;
do {
val &= addr[idx];
if (val)
return idx * BITS_PER_LONG + __fls(val);
val = ~0ul;
} while (idx--);
}
return size;
}
It returns "size" when it can't find a set bit.
>>> + thunderx2_uncore_update(event);
>>> + hwc->state |= PERF_HES_UPTODATE;
>>> + }
>>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
>>> +}
>>> +
>>> +static int thunderx2_uncore_add(struct perf_event *event, int flags)
>>> +{
>>> + struct hw_perf_event *hwc = &event->hw;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
>>> +
>>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
>>> + uncore_dev = pmu_uncore->uncore_dev;
>>> +
>>> + /* Allocate a free counter */
>>> + hwc->idx = alloc_counter(pmu_uncore);
>>> + if (hwc->idx < 0)
>>> + return -EAGAIN;
>>> +
>>> + pmu_uncore->events[hwc->idx] = event;
>>> + /* set counter control and data registers base address */
>>> + uncore_dev->init_cntr_base(event, uncore_dev);
>>> +
>>> + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>>> + if (flags & PERF_EF_START)
>>> + thunderx2_uncore_start(event, flags);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void thunderx2_uncore_del(struct perf_event *event, int flags)
>>> +{
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>>> + pmu_to_thunderx2_pmu_uncore(event->pmu);
>>> + struct hw_perf_event *hwc = &event->hw;
>>> +
>>> + thunderx2_uncore_stop(event, PERF_EF_UPDATE);
>>> +
>>> + /* clear the assigned counter */
>>> + free_counter(pmu_uncore, GET_COUNTERID(event));
>>> +
>>> + perf_event_update_userpage(event);
>>> + pmu_uncore->events[hwc->idx] = NULL;
>>> + hwc->idx = -1;
>>> +}
>>> +
>>> +static void thunderx2_uncore_read(struct perf_event *event)
>>> +{
>>> + unsigned long irqflags;
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
>>> + pmu_to_thunderx2_pmu_uncore(event->pmu);
>>> +
>>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
>>> + thunderx2_uncore_update(event);
>>> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
>>> +}
>>> +
>>> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
>>> + struct hrtimer *hrt)
>>> +{
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + unsigned long irqflags;
>>> + int idx;
>>> + bool restart_timer = false;
>>> +
>>> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
>>> + hrtimer);
>>> +
>>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
>>> + for_each_set_bit(idx, pmu_uncore->active_counters,
>>> + pmu_uncore->uncore_dev->max_counters) {
>>> + struct perf_event *event = pmu_uncore->events[idx];
>>
>> Do we need to check if the "event" is not NULL ? Couldn't this race with
>> an event_del() operation ?
>
> i dont think so, i have not come across any such race condition during
> my testing.
Thinking about it further, we may not hit it, as the PMU is "disable"d,
before "add/del", which implies that the IRQ won't be triggered.
Btw, in general, not hitting a race condition doesn't prove there cannot
be a race ;)
>>> +static int thunderx2_pmu_uncore_register(
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore)
>>> +{
>>> + struct device *dev = pmu_uncore->uncore_dev->dev;
>>> + char *name = pmu_uncore->uncore_dev->name;
>>> + int channel = pmu_uncore->channel;
>>> +
>>> + /* Perf event registration */
>>> + pmu_uncore->pmu = (struct pmu) {
>>> + .attr_groups = pmu_uncore->uncore_dev->attr_groups,
>>> + .task_ctx_nr = perf_invalid_context,
>>> + .event_init = thunderx2_uncore_event_init,
>>> + .add = thunderx2_uncore_add,
>>> + .del = thunderx2_uncore_del,
>>> + .start = thunderx2_uncore_start,
>>> + .stop = thunderx2_uncore_stop,
>>> + .read = thunderx2_uncore_read,
>>
>> You must fill in the module field of the PMU to prevent this module from
>> being unloaded while it is in use.
>
> thanks, i will add it.
>>
>>> + };
>>> +
>>> + pmu_uncore->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
>>> + "%s_%d", name, channel);
>>> +
>>> + return perf_pmu_register(&pmu_uncore->pmu, pmu_uncore->pmu.name, -1);
>>> +}
>>> +
>>> +static int thunderx2_pmu_uncore_add(struct thunderx2_pmu_uncore_dev *uncore_dev,
>>> + int channel)
>>> +{
>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>> + int ret, cpu;
>>> +
>>> + pmu_uncore = devm_kzalloc(uncore_dev->dev, sizeof(*pmu_uncore),
>>> + GFP_KERNEL);
>>> + if (!pmu_uncore)
>>> + return -ENOMEM;
>>> +
>>> + cpu = cpumask_any_and(cpumask_of_node(uncore_dev->node),
>>> + cpu_online_mask);
>>> + if (cpu >= nr_cpu_ids)
>>> + return -EINVAL;
>>
>> Do we need to fail this here ? We could always add this back when a CPU
>> comes online (e.g, maxcpus=n). We do have cpu-hotplug call backs anyway.
>
> we want to fail, since we dont have any cpu to serve.
>
I understand that. We can bring up CPUs later from userspace. e.g, on a
system like ThunderX, to speed up the booting, one could add "maxcpus=8"
which forces the kernel to only bring up the first 8 CPUs and the rest
are brought up by the userspace. So, one of the DMC/L3 doesn't have a
supported CPU booted up during the driver probe, you will not bring
the PMU up and hence won't be able to use them even after the CPUs are
brought up later. You can take a look at the arm_dsu_pmu, where we
handle a similar situation via dsu_pmu_cpu_online().
>>> +static struct platform_driver thunderx2_uncore_driver = {
>>> + .probe = thunderx2_uncore_probe,
>>> + .driver = {
>>> + .name = "thunderx2-uncore-pmu",
>>> + .acpi_match_table = ACPI_PTR(thunderx2_uncore_acpi_match),
>>> + },
>>> +};
>>
>> Don't we need to unregister the pmu's when we remove the module ?
>
> it is built in module at present, no unload is supported.
> Anyway i am adding in next version , since i am going for loadable module.
Ok
Suzuki
On Thu, Oct 11, 2018 at 2:43 PM Suzuki K Poulose <[email protected]> wrote:
>
> Hi Ganapatrao,
>
> On 11/10/18 07:39, Ganapatrao Kulkarni wrote:
> >>> +
> >>> +/*
> >>> + * We must NOT create groups containing events from multiple hardware PMUs,
> >>> + * although mixing different software and hardware PMUs is allowed.
> >>> + */
> >>> +static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
> >>> +{
> >>> + struct pmu *pmu = event->pmu;
> >>> + struct perf_event *leader = event->group_leader;
> >>> + struct perf_event *sibling;
> >>> + int counters = 0;
> >>> +
> >>> + if (leader->pmu != event->pmu && !is_software_event(leader))
> >>> + return false;
> >>> +
> >>> + for_each_sibling_event(sibling, event->group_leader) {
> >>> + if (is_software_event(sibling))
> >>> + continue;
> >>> + if (sibling->pmu != pmu)
> >>> + return false;
> >>> + counters++;
> >>> + }
> >>
> >> The above loop will not account for :
> >> 1) The leader event
> >> 2) The event that we are about to add.
> >>
> >> So you need to allocate counters for both the above to make sure we can
> >> accommodate the events.
> >
> > Is leader also part of sibling list?
>
> No.
not sure, what you are expecting here, this function is inline with
other perf driver code added in driver/perf
>
> >>> +static void thunderx2_uncore_start(struct perf_event *event, int flags)
> >>> +{
> >>> + struct hw_perf_event *hwc = &event->hw;
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> >>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> >>> + unsigned long irqflags;
> >>> +
> >>> + hwc->state = 0;
> >>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> >>> + uncore_dev = pmu_uncore->uncore_dev;
> >>> +
> >>> + raw_spin_lock_irqsave(&uncore_dev->lock, irqflags);
> >>> + uncore_dev->select_channel(event);
> >>> + uncore_dev->start_event(event, flags);
> >>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> >>> +
> >>> + perf_event_update_userpage(event);
> >>> +
> >>> + if (!find_last_bit(pmu_uncore->active_counters,
> >>> + pmu_uncore->uncore_dev->max_counters)) {
> >>
> >> Are we guaranteed that the counter0 is always allocated ? What if the
> >> event is deleted and there are other events active ? A better check
timer will be active/restarted, till last bit is cleared( till last
counter is released).
> >> would be :
> >> if (find_last_bit(...) != pmu_uncore->uncore_dev->max_counters)
> >>
> > IIUC, find_last_bit will return zero if none of the counters are
> > active and we want to start timer for first active counter.
> >
>
> Well, if that was the case, how do you know if the bit zero is the only
> bit set ?
>
> AFAICS, lib/find_bit.c has :
>
> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> {
> if (size) {
> unsigned long val = BITMAP_LAST_WORD_MASK(size);
> unsigned long idx = (size-1) / BITS_PER_LONG;
>
> do {
> val &= addr[idx];
> if (val)
> return idx * BITS_PER_LONG + __fls(val);
>
> val = ~0ul;
> } while (idx--);
> }
> return size;
> }
>
>
> It returns "size" when it can't find a set bit.
before start, alloc_counter is called to set, it seems we never hit
case where at-least one bit is not set.
timer will be started for first bit set and will continue till all
bits are cleared.
>
>
> >>> + thunderx2_uncore_update(event);
> >>> + hwc->state |= PERF_HES_UPTODATE;
> >>> + }
> >>> + raw_spin_unlock_irqrestore(&uncore_dev->lock, irqflags);
> >>> +}
> >>> +
> >>> +static int thunderx2_uncore_add(struct perf_event *event, int flags)
> >>> +{
> >>> + struct hw_perf_event *hwc = &event->hw;
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> >>> + struct thunderx2_pmu_uncore_dev *uncore_dev;
> >>> +
> >>> + pmu_uncore = pmu_to_thunderx2_pmu_uncore(event->pmu);
> >>> + uncore_dev = pmu_uncore->uncore_dev;
> >>> +
> >>> + /* Allocate a free counter */
> >>> + hwc->idx = alloc_counter(pmu_uncore);
> >>> + if (hwc->idx < 0)
> >>> + return -EAGAIN;
> >>> +
> >>> + pmu_uncore->events[hwc->idx] = event;
> >>> + /* set counter control and data registers base address */
> >>> + uncore_dev->init_cntr_base(event, uncore_dev);
> >>> +
> >>> + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> >>> + if (flags & PERF_EF_START)
> >>> + thunderx2_uncore_start(event, flags);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void thunderx2_uncore_del(struct perf_event *event, int flags)
> >>> +{
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> >>> + pmu_to_thunderx2_pmu_uncore(event->pmu);
> >>> + struct hw_perf_event *hwc = &event->hw;
> >>> +
> >>> + thunderx2_uncore_stop(event, PERF_EF_UPDATE);
> >>> +
> >>> + /* clear the assigned counter */
> >>> + free_counter(pmu_uncore, GET_COUNTERID(event));
> >>> +
> >>> + perf_event_update_userpage(event);
> >>> + pmu_uncore->events[hwc->idx] = NULL;
> >>> + hwc->idx = -1;
> >>> +}
> >>> +
> >>> +static void thunderx2_uncore_read(struct perf_event *event)
> >>> +{
> >>> + unsigned long irqflags;
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore =
> >>> + pmu_to_thunderx2_pmu_uncore(event->pmu);
> >>> +
> >>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> >>> + thunderx2_uncore_update(event);
> >>> + raw_spin_unlock_irqrestore(&pmu_uncore->uncore_dev->lock, irqflags);
> >>> +}
> >>> +
> >>> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
> >>> + struct hrtimer *hrt)
> >>> +{
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> >>> + unsigned long irqflags;
> >>> + int idx;
> >>> + bool restart_timer = false;
> >>> +
> >>> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
> >>> + hrtimer);
> >>> +
> >>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
> >>> + for_each_set_bit(idx, pmu_uncore->active_counters,
> >>> + pmu_uncore->uncore_dev->max_counters) {
> >>> + struct perf_event *event = pmu_uncore->events[idx];
> >>
> >> Do we need to check if the "event" is not NULL ? Couldn't this race with
> >> an event_del() operation ?
> >
> > i dont think so, i have not come across any such race condition during
> > my testing.
>
> Thinking about it further, we may not hit it, as the PMU is "disable"d,
> before "add/del", which implies that the IRQ won't be triggered.
>
> Btw, in general, not hitting a race condition doesn't prove there cannot
> be a race ;)
want to avoid, unnecessary defensive programming.
>
> >>> +static int thunderx2_pmu_uncore_register(
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore)
> >>> +{
> >>> + struct device *dev = pmu_uncore->uncore_dev->dev;
> >>> + char *name = pmu_uncore->uncore_dev->name;
> >>> + int channel = pmu_uncore->channel;
> >>> +
> >>> + /* Perf event registration */
> >>> + pmu_uncore->pmu = (struct pmu) {
> >>> + .attr_groups = pmu_uncore->uncore_dev->attr_groups,
> >>> + .task_ctx_nr = perf_invalid_context,
> >>> + .event_init = thunderx2_uncore_event_init,
> >>> + .add = thunderx2_uncore_add,
> >>> + .del = thunderx2_uncore_del,
> >>> + .start = thunderx2_uncore_start,
> >>> + .stop = thunderx2_uncore_stop,
> >>> + .read = thunderx2_uncore_read,
> >>
> >> You must fill in the module field of the PMU to prevent this module from
> >> being unloaded while it is in use.
> >
> > thanks, i will add it.
> >>
> >>> + };
> >>> +
> >>> + pmu_uncore->pmu.name = devm_kasprintf(dev, GFP_KERNEL,
> >>> + "%s_%d", name, channel);
> >>> +
> >>> + return perf_pmu_register(&pmu_uncore->pmu, pmu_uncore->pmu.name, -1);
> >>> +}
> >>> +
> >>> +static int thunderx2_pmu_uncore_add(struct thunderx2_pmu_uncore_dev *uncore_dev,
> >>> + int channel)
> >>> +{
> >>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
> >>> + int ret, cpu;
> >>> +
> >>> + pmu_uncore = devm_kzalloc(uncore_dev->dev, sizeof(*pmu_uncore),
> >>> + GFP_KERNEL);
> >>> + if (!pmu_uncore)
> >>> + return -ENOMEM;
> >>> +
> >>> + cpu = cpumask_any_and(cpumask_of_node(uncore_dev->node),
> >>> + cpu_online_mask);
> >>> + if (cpu >= nr_cpu_ids)
> >>> + return -EINVAL;
> >>
> >> Do we need to fail this here ? We could always add this back when a CPU
> >> comes online (e.g, maxcpus=n). We do have cpu-hotplug call backs anyway.
> >
> > we want to fail, since we dont have any cpu to serve.
> >
>
> I understand that. We can bring up CPUs later from userspace. e.g, on a
> system like ThunderX, to speed up the booting, one could add "maxcpus=8"
> which forces the kernel to only bring up the first 8 CPUs and the rest
> are brought up by the userspace. So, one of the DMC/L3 doesn't have a
> supported CPU booted up during the driver probe, you will not bring
> the PMU up and hence won't be able to use them even after the CPUs are
> brought up later. You can take a look at the arm_dsu_pmu, where we
> handle a similar situation via dsu_pmu_cpu_online().
ok got it thanks, this check was added when hotplug was not implemented.
>
>
> >>> +static struct platform_driver thunderx2_uncore_driver = {
> >>> + .probe = thunderx2_uncore_probe,
> >>> + .driver = {
> >>> + .name = "thunderx2-uncore-pmu",
> >>> + .acpi_match_table = ACPI_PTR(thunderx2_uncore_acpi_match),
> >>> + },
> >>> +};
> >>
> >> Don't we need to unregister the pmu's when we remove the module ?
> >
> > it is built in module at present, no unload is supported.
> > Anyway i am adding in next version , since i am going for loadable module.
>
> Ok
>
>
> Suzuki
thanks
Ganapat
Hi Ganpatrao,
On 11/10/18 17:06, Ganapatrao Kulkarni wrote:
> On Thu, Oct 11, 2018 at 2:43 PM Suzuki K Poulose <[email protected]> wrote:
>>
>> Hi Ganapatrao,
>>
>> On 11/10/18 07:39, Ganapatrao Kulkarni wrote:
>>>>> +
>>>>> +/*
>>>>> + * We must NOT create groups containing events from multiple hardware PMUs,
>>>>> + * although mixing different software and hardware PMUs is allowed.
>>>>> + */
>>>>> +static bool thunderx2_uncore_validate_event_group(struct perf_event *event)
>>>>> +{
>>>>> + struct pmu *pmu = event->pmu;
>>>>> + struct perf_event *leader = event->group_leader;
>>>>> + struct perf_event *sibling;
>>>>> + int counters = 0;
>>>>> +
>>>>> + if (leader->pmu != event->pmu && !is_software_event(leader))
>>>>> + return false;
>>>>> +
>>>>> + for_each_sibling_event(sibling, event->group_leader) {
>>>>> + if (is_software_event(sibling))
>>>>> + continue;
>>>>> + if (sibling->pmu != pmu)
>>>>> + return false;
>>>>> + counters++;
>>>>> + }
>>>>
>>>> The above loop will not account for :
>>>> 1) The leader event
>>>> 2) The event that we are about to add.
>>>>
>>>> So you need to allocate counters for both the above to make sure we can
>>>> accommodate the events.
>>>
>>> Is leader also part of sibling list?
>>
>> No.
>
> not sure, what you are expecting here,
The purpose of the validate_group is to ensure if a group of events can
be scheduled together. That implies, if the new event is added to the
group, do we have enough counters to profile. So, you have :
1 (leader) + number-of-siblings-in-the-group + 1(this_new) events
But you only account for the number-of-siblings.
> this function is inline with other perf driver code added in driver/perf
No, it is not. arm-cci, arm-dsu-pmu, arm-pmu, all do separate accounting
for leader, siblings and the "new event".
>>>> Are we guaranteed that the counter0 is always allocated ? What if the
>>>> event is deleted and there are other events active ? A better check
>
> timer will be active/restarted, till last bit is cleared( till last
> counter is released).
Ok, so you kick off the timer when you start the first active event,
which is assumed to be on the counter0 and the timeout handler would
restart the counters when at least one event is active, as you don't
have a PMU pmu_{enable/disable} call back. But I am still not convinced
that assuming the "first counter" is always 0. I would rather use:
bitmap_weight() == 1 check.
>>>> would be :
>>>> if (find_last_bit(...) != pmu_uncore->uncore_dev->max_counters)
>>>>
>>> IIUC, find_last_bit will return zero if none of the counters are
>>> active and we want to start timer for first active counter.
>>>
>>
>> Well, if that was the case, how do you know if the bit zero is the only
>> bit set ?
>>
>> AFAICS, lib/find_bit.c has :
>>
>> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>> {
>> if (size) {
>> unsigned long val = BITMAP_LAST_WORD_MASK(size);
>> unsigned long idx = (size-1) / BITS_PER_LONG;
>>
>> do {
>> val &= addr[idx];
>> if (val)
>> return idx * BITS_PER_LONG + __fls(val);
>>
>> val = ~0ul;
>> } while (idx--);
>> }
>> return size;
>> }
>>
>>
>> It returns "size" when it can't find a set bit.
>
> before start, alloc_counter is called to set, it seems we never hit
> case where at-least one bit is not set.
> timer will be started for first bit set and will continue till all
> bits are cleared.
>>
>>
>>>>> +static enum hrtimer_restart thunderx2_uncore_hrtimer_callback(
>>>>> + struct hrtimer *hrt)
>>>>> +{
>>>>> + struct thunderx2_pmu_uncore_channel *pmu_uncore;
>>>>> + unsigned long irqflags;
>>>>> + int idx;
>>>>> + bool restart_timer = false;
>>>>> +
>>>>> + pmu_uncore = container_of(hrt, struct thunderx2_pmu_uncore_channel,
>>>>> + hrtimer);
>>>>> +
>>>>> + raw_spin_lock_irqsave(&pmu_uncore->uncore_dev->lock, irqflags);
>>>>> + for_each_set_bit(idx, pmu_uncore->active_counters,
>>>>> + pmu_uncore->uncore_dev->max_counters) {
>>>>> + struct perf_event *event = pmu_uncore->events[idx];
>>>>
>>>> Do we need to check if the "event" is not NULL ? Couldn't this race with
>>>> an event_del() operation ?
>>>
>>> i dont think so, i have not come across any such race condition during
>>> my testing.
>>
>> Thinking about it further, we may not hit it, as the PMU is "disable"d,
>> before "add/del", which implies that the IRQ won't be triggered.
Taking another look, since you don't have pmu_disable/enable callbacks,
the events could still be running. However, you are protected by the
free_counter() which holds the uncore_dev->lock to prevent the race.
>>
>> Btw, in general, not hitting a race condition doesn't prove there cannot
>> be a race ;)
>
> want to avoid, unnecessary defensive programming.
Of course.
Suzuki