LinuxLists.cc - [PATCH 0/1] dmaengine: idxd: IDXD pmu support

2021-04-02 16:50:51

Subject: [PATCH 0/1] dmaengine: idxd: IDXD pmu support

Hi,

This patchset implements initial pmu support for the Intel DSA (Data
Streaming Accelerator [1]), which I'm hoping can go into 5.13.

I'm also hoping to supply a couple follow-on patches in the near
future, but I'm not yet sure how much sense they make, so I thought
I'd throw a couple ideas out there and maybe get some opinions before
going forward with them:

- The perf userspace interface for this isn't exactly user-friedly,
in that you currently need to specify numeric values for field
values:

# perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/

It would be nicer to be able to specify those values symbolically
instead, and the way to do that seems to be via some JSON files in
perf userspace.

- Some of the DSA pmu support is patterned after existing uncore
code, and there seems to be at least some opportunity to
consolidate some of the things they both do into common code, such
as the cpumask device attributes and related cpu hotplug support.
At this point I'm not sure how much sense it makes to put any
effort into that, but would be willing to try if there would be
some interest in it, especially considering there will probably be
future pmu support added that could benefit from it.

Thanks,

Tom

[1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html

Tom Zanussi (1):
dmaengine: idxd: Add IDXD performance monitor support

drivers/dma/Kconfig | 13 +
drivers/dma/idxd/Makefile | 2 +
drivers/dma/idxd/idxd.h | 45 +++
drivers/dma/idxd/init.c | 9 +
drivers/dma/idxd/irq.c | 5 +-
drivers/dma/idxd/perfmon.c | 661 +++++++++++++++++++++++++++++++++++
drivers/dma/idxd/perfmon.h | 119 +++++++
drivers/dma/idxd/registers.h | 108 ++++++
include/linux/cpuhotplug.h | 1 +
9 files changed, 959 insertions(+), 4 deletions(-)
create mode 100644 drivers/dma/idxd/perfmon.c
create mode 100644 drivers/dma/idxd/perfmon.h

--
2.17.1

2021-04-02 16:50:55

by Tom Zanussi

[permalink] [raw]

Subject: [PATCH 1/1] dmaengine: idxd: Add IDXD performance monitor support

Enable the IDXD performance monitor capability (named 'perfmon' in the
DSA (Data Streaming Accelerator) spec [1]), which supports the
collection of information about key events occurring during DSA and
IAX (Intel Analytics Accelerator) device execution, to assist in
performance tuning and debugging.

The idxd perfmon support is implemented as part of the IDXD driver and
interfaces with the Linux perf framework. It has several features in
common with the existing uncore pmu support:

- it does not support sampling
- does not support per-thread counting

However it also has some unique features not present in the core and
uncore support:

- all general-purpose counters are identical, thus no event constraints
- operation is always system-wide

While the core perf subsystem assumes that all counters are by default
per-cpu, the uncore pmus are socket-scoped and use a cpu mask to
restrict counting to one cpu from each socket. IDXD counters use a
similar strategy but expand the scope even further; since IDXD
counters are system-wide and can be read from any cpu, the IDXD perf
driver picks a single cpu to do the work (with cpu hotplug notifiers
to choose a different cpu if the chosen one is taken off-line).

More specifically, the perf userspace tool by default opens a counter
for each cpu for an event. However, if it finds a cpumask file
associated with the pmu under sysfs, as is the case with the uncore
pmus, it will open counters only on the cpus specified by the cpumask.
Since perfmon only needs to open a single counter per event for a
given IDXD device, the perfmon driver will create a sysfs cpumask file
for the device and insert the first cpu of the system into it. When a
user uses perf to open an event, perf will open a single counter on
the cpu specified by the cpu mask. This amounts to the default
system-wide rather than per-cpu counting mentioned previously for
perfmon pmu events. In order to keep the cpu mask up-to-date, the
driver implements cpu hotplug support for multiple devices, as IDXD
usually enumerates and registers more than one idxd device.

The perfmon driver implements basic perfmon hardware capability
discovery and configuration, and is initialized by the IDXD driver's
probe function. During initialization, the driver retrieves the total
number of supported performance counters, the pmu ID, and the device
type from idxd device, and registers itself under the Linux perf
framework.

The perf userspace tool can be used to monitor single or multiple
events depending on the given configuration, as well as event groups,
which are also supported by the perfmon driver. The user configures
events using the perf tool command-line interface by specifying the
event and corresponding event category, along with an optional set of
filters that can be used to restrict counting to specific work queues,
traffic classes, page and transfer sizes, and engines (See [1] for
specifics).

With the configuration specified by the user, the perf tool issues a
system call passing that information to the kernel, which uses it to
initialize the specified event(s). The event(s) are opened and
started, and following termination of the perf command, they're
stopped. At that point, the perfmon driver will read the latest count
for the event(s), calculate the difference between the latest counter
values and previously tracked counter values, and display the final
incremental count as the event count for the cycle. An overflow
handler registered on the IDXD irq path is used to account for counter
overflows, which are signaled by an overflow interrupt.

Below are a couple of examples of perf usage for monitoring DSA events.

The following monitors all events in the 'engine' category. Becuuse
no filters are specified, this captures all engine events for the
workload, which in this case is 19 iterations of the work generated by
the kernel dmatest module.

Details describing the events can be found in Appendix D of [1],
Performance Monitoring Events, but briefly they are:

event 0x1: total input data processed, in 32-byte units
event 0x2: total data written, in 32-byte units
event 0x4: number of work descriptors that read the source
event 0x8: number of work descriptors that write the destination
event 0x10: number of work descriptors dispatched from batch descriptors
event 0x20: number of work descriptors dispatched from work queues

# perf stat -e dsa0/event=0x1,event_category=0x1/,
dsa0/event=0x2,event_category=0x1/,
dsa0/event=0x4,event_category=0x1/,
dsa0/event=0x8,event_category=0x1/,
dsa0/event=0x10,event_category=0x1/,
dsa0/event=0x20,event_category=0x1/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1

Performance counter stats for 'system wide':

5,332 dsa0/event=0x1,event_category=0x1/
5,327 dsa0/event=0x2,event_category=0x1/
19 dsa0/event=0x4,event_category=0x1/
19 dsa0/event=0x8,event_category=0x1/
0 dsa0/event=0x10,event_category=0x1/
19 dsa0/event=0x20,event_category=0x1/

21.977436186 seconds time elapsed

The command below illustrates filter usage with a simple example. It
specifies that MEM_MOVE operations should be counted for the DSA
device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number
of Memory Move Descriptors, which is part of event category 0x3 -
Operations. The detailed category and event IDs are available in
Appendix D, Performance Monitoring Events, of [1]). In addition to
the event and event category, a number of filters are also specified
(the detailed filter values are available in Chapter 6.4 (Filter
Support) of [1]), which will restrict counting to only those events
that meet all of the filter criteria. In this case, the filters
specify that only MEM_MOVE operations that are serviced by work queue
wq0 and specifically engine number engine0 and traffic class tc0
having sizes between 0 and 4k and page size of between 0 and 1G result
in a counter hit; anything else will be filtered out and not appear in
the final count. Note that filters are optional - any filter not
specified is assumed to be all ones and will pass anything.

# perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/
modprobe dmatest channel=dma0chan0 timeout=2000
iterations=19 run=1 wait=1

Performance counter stats for 'system wide':

19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7,
filter_eng=0x1,event=0x8,event_category=0x3/

21.865914091 seconds time elapsed

The output above reflects that the unspecified workload resulted in
the counting of 19 MEM_MOVE operation events that met the filter
criteria.

[1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html

[ Based on work originally by Jing Lin. ]

Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Kan Liang <[email protected]>
Signed-off-by: Tom Zanussi <[email protected]>
---
drivers/dma/Kconfig | 13 +
drivers/dma/idxd/Makefile | 2 +
drivers/dma/idxd/idxd.h | 45 +++
drivers/dma/idxd/init.c | 9 +
drivers/dma/idxd/irq.c | 5 +-
drivers/dma/idxd/perfmon.c | 661 +++++++++++++++++++++++++++++++++++
drivers/dma/idxd/perfmon.h | 119 +++++++
drivers/dma/idxd/registers.h | 108 ++++++
include/linux/cpuhotplug.h | 1 +
9 files changed, 959 insertions(+), 4 deletions(-)
create mode 100644 drivers/dma/idxd/perfmon.c
create mode 100644 drivers/dma/idxd/perfmon.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 0c2827fd8c19..fa945420e346 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -300,6 +300,19 @@ config INTEL_IDXD_SVM
depends on PCI_PASID
depends on PCI_IOV

+config INTEL_IDXD_PERFMON
+ bool "Intel Data Accelerators performance monitor support"
+ depends on INTEL_IDXD
+ default y
+ help
+ Enable performance monitor (pmu) support for the Intel(R)
+ data accelerators present in Intel Xeon CPU. With this
+ enabled, perf can be used to monitor the DSA (Intel Data
+ Streaming Accelerator) events described in the Intel DSA
+ spec.
+
+ If unsure, say N.
+
config INTEL_IOATDMA
tristate "Intel I/OAT DMA support"
depends on PCI && X86_64
diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile
index 8978b898d777..6d11558756f8 100644
--- a/drivers/dma/idxd/Makefile
+++ b/drivers/dma/idxd/Makefile
@@ -1,2 +1,4 @@
obj-$(CONFIG_INTEL_IDXD) += idxd.o
idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o
+
+idxd-$(CONFIG_INTEL_IDXD_PERFMON) += perfmon.o
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 81a0e65fd316..bde819f30916 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -8,6 +8,8 @@
#include <linux/percpu-rwsem.h>
#include <linux/wait.h>
#include <linux/cdev.h>
+#include <linux/pci.h>
+#include <linux/perf_event.h>
#include "registers.h"

#define IDXD_DRIVER_VERSION "1.00"
@@ -25,6 +27,7 @@ enum idxd_type {
};

#define IDXD_NAME_SIZE 128
+#define IDXD_PMU_EVENT_MAX 64

struct idxd_device_driver {
struct device_driver drv;
@@ -56,6 +59,31 @@ struct idxd_group {
int tc_b;
};

+struct idxd_pmu {
+ struct idxd_device *idxd;
+
+ struct perf_event *event_list[IDXD_PMU_EVENT_MAX];
+ int n_events;
+
+ DECLARE_BITMAP(used_mask, IDXD_PMU_EVENT_MAX);
+
+ struct pmu pmu;
+ char name[IDXD_NAME_SIZE];
+ int cpu;
+
+ int n_counters;
+ int counter_width;
+ int n_event_categories;
+
+ bool per_counter_caps_supported;
+ unsigned long supported_event_categories;
+
+ unsigned long supported_filters;
+ int n_filters;
+
+ struct hlist_node cpuhp_node;
+};
+
#define IDXD_MAX_PRIORITY 0xf

enum idxd_wq_state {
@@ -213,6 +241,8 @@ struct idxd_device {
struct dma_device dma_dev;
struct workqueue_struct *wq;
struct work_struct work;
+
+ struct idxd_pmu *idxd_pmu;
};

/* IDXD software descriptor */
@@ -369,4 +399,19 @@ int idxd_cdev_get_major(struct idxd_device *idxd);
int idxd_wq_add_cdev(struct idxd_wq *wq);
void idxd_wq_del_cdev(struct idxd_wq *wq);

+/* perfmon */
+#ifdef CONFIG_INTEL_IDXD_PERFMON
+int perfmon_pmu_init(struct idxd_device *idxd);
+void perfmon_pmu_remove(struct idxd_device *idxd);
+void perfmon_counter_overflow(struct idxd_device *idxd);
+void perfmon_init(void);
+void perfmon_exit(void);
+#else
+static inline int perfmon_pmu_init(struct idxd_device *idxd) { return 0; }
+static inline void perfmon_pmu_remove(struct idxd_device *idxd) {}
+static inline void perfmon_counter_overflow(struct idxd_device *idxd) {}
+static inline void perfmon_init(void) {}
+static inline void perfmon_exit(void) {}
+#endif
+
#endif
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 085a0c3b62c6..e117d30c97ce 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -21,6 +21,7 @@
#include "../dmaengine.h"
#include "registers.h"
#include "idxd.h"
+#include "perfmon.h"

MODULE_VERSION(IDXD_DRIVER_VERSION);
MODULE_LICENSE("GPL v2");
@@ -378,6 +379,10 @@ static int idxd_probe(struct idxd_device *idxd)

idxd->major = idxd_cdev_get_major(idxd);

+ rc = perfmon_pmu_init(idxd);
+ if (rc < 0)
+ dev_warn(dev, "Failed to initialize perfmon. No PMU support: %d\n", rc);
+
dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id);
return 0;

@@ -522,6 +527,7 @@ static void idxd_remove(struct pci_dev *pdev)
idxd_shutdown(pdev);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
+ perfmon_pmu_remove(idxd);
mutex_lock(&idxd_idr_lock);
idr_remove(&idxd_idrs[idxd->type], idxd->id);
mutex_unlock(&idxd_idr_lock);
@@ -556,6 +562,8 @@ static int __init idxd_init_module(void)
for (i = 0; i < IDXD_TYPE_MAX; i++)
idr_init(&idxd_idrs[i]);

+ perfmon_init();
+
err = idxd_register_bus_type();
if (err < 0)
return err;
@@ -589,5 +597,6 @@ static void __exit idxd_exit_module(void)
pci_unregister_driver(&idxd_pci_driver);
idxd_cdev_remove();
idxd_unregister_bus_type();
+ perfmon_exit();
}
module_exit(idxd_exit_module);
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index a60ca11a5784..bb2f4f5f66ec 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -163,11 +163,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
}

if (cause & IDXD_INTC_PERFMON_OVFL) {
- /*
- * Driver does not utilize perfmon counter overflow interrupt
- * yet.
- */
val |= IDXD_INTC_PERFMON_OVFL;
+ perfmon_counter_overflow(idxd);
}

val ^= cause;
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
new file mode 100644
index 000000000000..929a4a973757
--- /dev/null
+++ b/drivers/dma/idxd/perfmon.c
@@ -0,0 +1,661 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+
+#include <linux/sched/task.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include "idxd.h"
+#include "perfmon.h"
+
+static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
+ char *buf);
+
+static cpumask_t perfmon_dsa_cpu_mask;
+static bool cpuhp_set_up;
+static enum cpuhp_state cpuhp_slot;
+
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *perfmon_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group cpumask_attr_group = {
+ .attrs = perfmon_cpumask_attrs,
+};
+
+/* Attributes for idxd perfmon events/categories */
+DEFINE_PERFMON_FORMAT_ATTR(event_category, "config:0-3");
+DEFINE_PERFMON_FORMAT_ATTR(event, "config:4-31");
+
+/* Attributes for idxd event filter configuration */
+DEFINE_PERFMON_FORMAT_ATTR(filter_wq, "config1:0-31");
+DEFINE_PERFMON_FORMAT_ATTR(filter_tc, "config1:32-39");
+DEFINE_PERFMON_FORMAT_ATTR(filter_pgsz, "config1:40-43");
+DEFINE_PERFMON_FORMAT_ATTR(filter_sz, "config1:44-51");
+DEFINE_PERFMON_FORMAT_ATTR(filter_eng, "config1:52-59");
+
+#define PERFMON_FILTERS_START 2
+#define PERFMON_FILTERS_MAX 5
+
+static struct attribute *perfmon_format_attrs[] = {
+ &format_attr_idxd_event_category.attr,
+ &format_attr_idxd_event.attr,
+ &format_attr_idxd_filter_wq.attr,
+ &format_attr_idxd_filter_tc.attr,
+ &format_attr_idxd_filter_pgsz.attr,
+ &format_attr_idxd_filter_sz.attr,
+ &format_attr_idxd_filter_eng.attr,
+ NULL,
+};
+
+static struct attribute_group perfmon_format_attr_group = {
+ .name = "format",
+ .attrs = perfmon_format_attrs,
+};
+
+static const struct attribute_group *perfmon_attr_groups[] = {
+ &perfmon_format_attr_group,
+ &cpumask_attr_group,
+ NULL,
+};
+
+static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &perfmon_dsa_cpu_mask);
+}
+
+static bool is_idxd_event(struct idxd_pmu *idxd_pmu, struct perf_event *event)
+{
+ return &idxd_pmu->pmu == event->pmu;
+}
+
+static int perfmon_collect_events(struct idxd_pmu *idxd_pmu,
+ struct perf_event *leader,
+ bool dogrp)
+{
+ struct perf_event *event;
+ int n, max_count;
+
+ max_count = idxd_pmu->n_counters;
+ n = idxd_pmu->n_events;
+
+ if (n >= max_count)
+ return -EINVAL;
+
+ if (is_idxd_event(idxd_pmu, leader)) {
+ idxd_pmu->event_list[n] = leader;
+ idxd_pmu->event_list[n]->hw.idx = n;
+ n++;
+ }
+
+ if (!dogrp)
+ return n;
+
+ for_each_sibling_event(event, leader) {
+ if (!is_idxd_event(idxd_pmu, event) ||
+ event->state <= PERF_EVENT_STATE_OFF)
+ continue;
+
+ if (n >= max_count)
+ return -EINVAL;
+
+ idxd_pmu->event_list[n] = event;
+ idxd_pmu->event_list[n]->hw.idx = n;
+ n++;
+ }
+
+ return n;
+}
+
+static void perfmon_assign_hw_event(struct idxd_pmu *idxd_pmu,
+ struct perf_event *event, int idx)
+{
+ struct idxd_device *idxd = idxd_pmu->idxd;
+ struct hw_perf_event *hwc = &event->hw;
+
+ hwc->idx = idx;
+ hwc->config_base = ioread64(CNTRCFG_REG(idxd, idx));
+ hwc->event_base = ioread64(CNTRCFG_REG(idxd, idx));
+}
+
+static int perfmon_assign_event(struct idxd_pmu *idxd_pmu,
+ struct perf_event *event)
+{
+ int i;
+
+ for (i = 0; i < IDXD_PMU_EVENT_MAX; i++)
+ if (!test_and_set_bit(i, idxd_pmu->used_mask))
+ return i;
+
+ return -EINVAL;
+}
+
+/*
+ * Check whether there are enough counters to satisfy that all the
+ * events in the group can actually be scheduled at the same time.
+ *
+ * To do this, create a fake idxd_pmu object so the event collection
+ * and assignment functions can be used without affecting the internal
+ * state of the real idxd_pmu object.
+ */
+static int perfmon_validate_group(struct idxd_pmu *pmu,
+ struct perf_event *event)
+{
+ struct perf_event *leader = event->group_leader;
+ struct idxd_pmu *fake_pmu;
+ int i, ret = 0, n;
+
+ fake_pmu = kzalloc(sizeof(*fake_pmu), GFP_KERNEL);
+ if (!fake_pmu)
+ return -ENOMEM;
+
+ fake_pmu->pmu.name = pmu->pmu.name;
+ fake_pmu->n_counters = pmu->n_counters;
+
+ n = perfmon_collect_events(fake_pmu, leader, true);
+ if (n < 0) {
+ ret = n;
+ goto out;
+ }
+
+ fake_pmu->n_events = n;
+ n = perfmon_collect_events(fake_pmu, event, false);
+ if (n < 0) {
+ ret = n;
+ goto out;
+ }
+
+ fake_pmu->n_events = n;
+
+ for (i = 0; i < n; i++) {
+ int idx;
+
+ event = fake_pmu->event_list[i];
+
+ idx = perfmon_assign_event(fake_pmu, event);
+ if (idx < 0) {
+ ret = idx;
+ goto out;
+ }
+ }
+out:
+ kfree(fake_pmu);
+
+ return ret;
+}
+
+static int perfmon_pmu_event_init(struct perf_event *event)
+{
+ struct idxd_device *idxd;
+ struct device *dev;
+ int ret = 0;
+
+ idxd = event_to_idxd(event);
+ dev = &idxd->pdev->dev;
+ event->hw.idx = -1;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* sampling not supported */
+ if (event->attr.sample_period)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ if (event->pmu != &idxd->idxd_pmu->pmu)
+ return -EINVAL;
+
+ event->hw.event_base = ioread64(PERFMON_TABLE_OFFSET(idxd));
+ event->cpu = idxd->idxd_pmu->cpu;
+ event->hw.config = event->attr.config;
+
+ if (event->group_leader != event)
+ /* non-group events have themselves as leader */
+ ret = perfmon_validate_group(idxd->idxd_pmu, event);
+
+ return ret;
+}
+
+static inline u64 perfmon_pmu_read_counter(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct idxd_device *idxd;
+ int cntr = hwc->idx;
+ struct device *dev;
+ u64 cntrdata;
+
+ idxd = event_to_idxd(event);
+ dev = &idxd->pdev->dev;
+
+ cntrdata = ioread64(CNTRDATA_REG(idxd, cntr));
+
+ return cntrdata;
+}
+
+static void perfmon_pmu_event_update(struct perf_event *event)
+{
+ struct idxd_device *idxd = event_to_idxd(event);
+ u64 prev_raw_count, new_raw_count, delta, p, n;
+ int shift = 64 - idxd->idxd_pmu->counter_width;
+ struct hw_perf_event *hwc = &event->hw;
+
+ do {
+ prev_raw_count = local64_read(&hwc->prev_count);
+ new_raw_count = perfmon_pmu_read_counter(event);
+ } while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+ new_raw_count) != prev_raw_count);
+
+ n = (new_raw_count << shift);
+ p = (prev_raw_count << shift);
+
+ delta = ((n - p) >> shift);
+
+ local64_add(delta, &event->count);
+}
+
+void perfmon_counter_overflow(struct idxd_device *idxd)
+{
+ int i, n_counters, max_loop = OVERFLOW_SIZE;
+ struct perf_event *event;
+ unsigned long ovfstatus;
+
+ n_counters = min(idxd->idxd_pmu->n_counters, OVERFLOW_SIZE);
+
+ ovfstatus = ioread32(OVFSTATUS_REG(idxd));
+
+ /*
+ * While updating overflowed counters, other counters behind
+ * them could overflow and be missed in a given pass.
+ * Normally this could happen at most n_counters times, but in
+ * theory a tiny counter width could result in continual
+ * overflows and endless looping. max_loop provides a
+ * failsafe in that highly unlikely case.
+ */
+ while (ovfstatus && max_loop--) {
+ /* Figure out which counter(s) overflowed */
+ for_each_set_bit(i, &ovfstatus, n_counters) {
+ /* Update event->count for overflowed counter */
+ event = idxd->idxd_pmu->event_list[i];
+ perfmon_pmu_event_update(event);
+ clear_bit(i, &ovfstatus);
+ iowrite32(ovfstatus, OVFSTATUS_REG(idxd));
+ }
+
+ ovfstatus = ioread32(OVFSTATUS_REG(idxd));
+ }
+
+ /*
+ * Should never happen. If so, it means a counter(s) looped
+ * around twice while this handler was running.
+ */
+ WARN_ON_ONCE(ovfstatus);
+}
+
+static inline void perfmon_reset_config(struct idxd_device *idxd)
+{
+ iowrite32(CONFIG_RESET, PERFRST_REG(idxd));
+ iowrite32(0, OVFSTATUS_REG(idxd));
+ iowrite32(0, PERFFRZ_REG(idxd));
+}
+
+static inline void perfmon_reset_counters(struct idxd_device *idxd)
+{
+ iowrite32(CNTR_RESET, PERFRST_REG(idxd));
+}
+
+static inline void perfmon_reset(struct idxd_device *idxd)
+{
+ perfmon_reset_config(idxd);
+ perfmon_reset_counters(idxd);
+}
+
+static void perfmon_pmu_event_start(struct perf_event *event, int mode)
+{
+ u32 flt_wq, flt_tc, flt_pg_sz, flt_xfer_sz, flt_eng = 0;
+ u64 cntr_cfg, cntrdata, event_enc, event_cat = 0;
+ struct hw_perf_event *hwc = &event->hw;
+ union filter_cfg flt_cfg;
+ union event_cfg event_cfg;
+ struct idxd_pmu *idxd_pmu;
+ struct idxd_device *idxd;
+ struct device *dev;
+ int cntr;
+
+ idxd_pmu = event_to_pmu(event);
+ idxd = event_to_idxd(event);
+ dev = &idxd->pdev->dev;
+
+ event->hw.idx = hwc->idx;
+ cntr = hwc->idx;
+
+ /* Obtain event category and event value from user space */
+ event_cfg.val = event->attr.config;
+ flt_cfg.val = event->attr.config1;
+ event_cat = event_cfg.event_cat;
+ event_enc = event_cfg.event_enc;
+
+ /* Obtain filter configuration from user space */
+ flt_wq = flt_cfg.wq;
+ flt_tc = flt_cfg.tc;
+ flt_pg_sz = flt_cfg.pg_sz;
+ flt_xfer_sz = flt_cfg.xfer_sz;
+ flt_eng = flt_cfg.eng;
+
+ if (flt_wq && test_bit(FLT_WQ, &idxd->idxd_pmu->supported_filters))
+ iowrite32(flt_wq, FLTCFG_REG(idxd, cntr, FLT_WQ));
+ if (flt_tc && test_bit(FLT_TC, &idxd->idxd_pmu->supported_filters))
+ iowrite32(flt_tc, FLTCFG_REG(idxd, cntr, FLT_TC));
+ if (flt_pg_sz && test_bit(FLT_PG_SZ, &idxd->idxd_pmu->supported_filters))
+ iowrite32(flt_pg_sz, FLTCFG_REG(idxd, cntr, FLT_PG_SZ));
+ if (flt_xfer_sz && test_bit(FLT_XFER_SZ, &idxd->idxd_pmu->supported_filters))
+ iowrite32(flt_xfer_sz, FLTCFG_REG(idxd, cntr, FLT_XFER_SZ));
+ if (flt_eng && test_bit(FLT_ENG, &idxd->idxd_pmu->supported_filters))
+ iowrite32(flt_eng, FLTCFG_REG(idxd, cntr, FLT_ENG));
+
+ /* Read the start value */
+ cntrdata = ioread64(CNTRDATA_REG(idxd, cntr));
+ local64_set(&event->hw.prev_count, cntrdata);
+
+ /* Set counter to event/category */
+ cntr_cfg = event_cat << CNTRCFG_CATEGORY_SHIFT;
+ cntr_cfg |= event_enc << CNTRCFG_EVENT_SHIFT;
+ /* Set interrupt on overflow and counter enable bits */
+ cntr_cfg |= (CNTRCFG_IRQ_OVERFLOW | CNTRCFG_ENABLE);
+
+ iowrite64(cntr_cfg, CNTRCFG_REG(idxd, cntr));
+}
+
+static void perfmon_pmu_event_stop(struct perf_event *event, int mode)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct idxd_device *idxd;
+ int i, cntr = hwc->idx;
+ u64 cntr_cfg;
+
+ idxd = event_to_idxd(event);
+
+ /* remove this event from event list */
+ for (i = 0; i < idxd->idxd_pmu->n_events; i++) {
+ if (event != idxd->idxd_pmu->event_list[i])
+ continue;
+
+ for (++i; i < idxd->idxd_pmu->n_events; i++)
+ idxd->idxd_pmu->event_list[i - 1] = idxd->idxd_pmu->event_list[i];
+ --idxd->idxd_pmu->n_events;
+ break;
+ }
+
+ cntr_cfg = ioread64(CNTRCFG_REG(idxd, cntr));
+ cntr_cfg &= ~CNTRCFG_ENABLE;
+ iowrite64(cntr_cfg, CNTRCFG_REG(idxd, cntr));
+
+ if (mode == PERF_EF_UPDATE)
+ perfmon_pmu_event_update(event);
+
+ event->hw.idx = -1;
+ clear_bit(cntr, idxd->idxd_pmu->used_mask);
+}
+
+static void perfmon_pmu_event_del(struct perf_event *event, int mode)
+{
+ perfmon_pmu_event_stop(event, PERF_EF_UPDATE);
+}
+
+static int perfmon_pmu_event_add(struct perf_event *event, int flags)
+{
+ struct idxd_device *idxd = event_to_idxd(event);
+ struct idxd_pmu *idxd_pmu = idxd->idxd_pmu;
+ struct hw_perf_event *hwc = &event->hw;
+ int idx, n;
+
+ n = perfmon_collect_events(idxd_pmu, event, false);
+ if (n < 0)
+ return n;
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (!(flags & PERF_EF_START))
+ hwc->state |= PERF_HES_ARCH;
+
+ idx = perfmon_assign_event(idxd_pmu, event);
+ if (idx < 0)
+ return idx;
+
+ perfmon_assign_hw_event(idxd_pmu, event, idx);
+
+ if (flags & PERF_EF_START)
+ perfmon_pmu_event_start(event, 0);
+
+ idxd_pmu->n_events = n;
+
+ return 0;
+}
+
+static void enable_perfmon_pmu(struct idxd_device *idxd)
+{
+ iowrite32(COUNTER_UNFREEZE, PERFFRZ_REG(idxd));
+}
+
+static void disable_perfmon_pmu(struct idxd_device *idxd)
+{
+ iowrite32(COUNTER_FREEZE, PERFFRZ_REG(idxd));
+}
+
+static void perfmon_pmu_enable(struct pmu *pmu)
+{
+ struct idxd_device *idxd = pmu_to_idxd(pmu);
+
+ enable_perfmon_pmu(idxd);
+}
+
+static void perfmon_pmu_disable(struct pmu *pmu)
+{
+ struct idxd_device *idxd = pmu_to_idxd(pmu);
+
+ disable_perfmon_pmu(idxd);
+}
+
+static void skip_filter(int i)
+{
+ int j;
+
+ for (j = i; j < PERFMON_FILTERS_MAX; j++)
+ perfmon_format_attrs[PERFMON_FILTERS_START + j] =
+ perfmon_format_attrs[PERFMON_FILTERS_START + j + 1];
+}
+
+static void idxd_pmu_init(struct idxd_pmu *idxd_pmu)
+{
+ int i;
+
+ for (i = 0 ; i < PERFMON_FILTERS_MAX; i++) {
+ if (!test_bit(i, &idxd_pmu->supported_filters))
+ skip_filter(i);
+ }
+
+ idxd_pmu->pmu.name = idxd_pmu->name;
+ idxd_pmu->pmu.attr_groups = perfmon_attr_groups;
+ idxd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+ idxd_pmu->pmu.event_init = perfmon_pmu_event_init;
+ idxd_pmu->pmu.pmu_enable = perfmon_pmu_enable,
+ idxd_pmu->pmu.pmu_disable = perfmon_pmu_disable,
+ idxd_pmu->pmu.add = perfmon_pmu_event_add;
+ idxd_pmu->pmu.del = perfmon_pmu_event_del;
+ idxd_pmu->pmu.start = perfmon_pmu_event_start;
+ idxd_pmu->pmu.stop = perfmon_pmu_event_stop;
+ idxd_pmu->pmu.read = perfmon_pmu_event_update;
+ idxd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_EXCLUDE;
+ idxd_pmu->pmu.module = THIS_MODULE;
+}
+
+void perfmon_pmu_remove(struct idxd_device *idxd)
+{
+ if (!idxd->idxd_pmu)
+ return;
+
+ cpuhp_state_remove_instance(cpuhp_slot, &idxd->idxd_pmu->cpuhp_node);
+ perf_pmu_unregister(&idxd->idxd_pmu->pmu);
+ kfree(idxd->idxd_pmu);
+ idxd->idxd_pmu = NULL;
+}
+
+static int perf_event_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct idxd_pmu *idxd_pmu;
+
+ idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
+
+ /* select the first online CPU as the designated reader */
+ if (cpumask_empty(&perfmon_dsa_cpu_mask)) {
+ cpumask_set_cpu(cpu, &perfmon_dsa_cpu_mask);
+ idxd_pmu->cpu = cpu;
+ }
+
+ return 0;
+}
+
+static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct idxd_pmu *idxd_pmu;
+ unsigned int target;
+
+ idxd_pmu = hlist_entry_safe(node, typeof(*idxd_pmu), cpuhp_node);
+
+ if (!cpumask_test_and_clear_cpu(cpu, &perfmon_dsa_cpu_mask))
+ return 0;
+
+ target = cpumask_any_but(cpu_online_mask, cpu);
+
+ /* migrate events if there is a valid target */
+ if (target < nr_cpu_ids)
+ cpumask_set_cpu(target, &perfmon_dsa_cpu_mask);
+ else
+ target = -1;
+
+ perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
+
+ return 0;
+}
+
+int perfmon_pmu_init(struct idxd_device *idxd)
+{
+ union idxd_perfcap perfcap;
+ struct idxd_pmu *idxd_pmu;
+ int rc = -ENODEV;
+
+ /*
+ * perfmon module initialization failed, nothing to do
+ */
+ if (!cpuhp_set_up)
+ return -ENODEV;
+
+ /*
+ * If perfmon_offset or num_counters is 0, it means perfmon is
+ * not supported on this hardware.
+ */
+ if (idxd->perfmon_offset == 0)
+ return -ENODEV;
+
+ idxd_pmu = kzalloc(sizeof(*idxd_pmu), GFP_KERNEL);
+ if (!idxd_pmu)
+ return -ENOMEM;
+
+ idxd_pmu->idxd = idxd;
+ idxd->idxd_pmu = idxd_pmu;
+
+ if (idxd->type == IDXD_TYPE_DSA) {
+ rc = sprintf(idxd_pmu->name, "dsa%d", idxd->id);
+ if (rc < 0)
+ goto free;
+ } else if (idxd->type == IDXD_TYPE_IAX) {
+ rc = sprintf(idxd_pmu->name, "iax%d", idxd->id);
+ if (rc < 0)
+ goto free;
+ } else {
+ goto free;
+ }
+
+ perfmon_reset(idxd);
+
+ perfcap.bits = ioread64(PERFCAP_REG(idxd));
+
+ /*
+ * If total perf counter is 0, stop further registration.
+ * This is necessary in order to support driver running on
+ * guest which does not have pmon support.
+ */
+ if (perfcap.num_perf_counter == 0)
+ goto free;
+
+ /* A counter width of 0 means it can't count */
+ if (perfcap.counter_width == 0)
+ goto free;
+
+ /* Overflow interrupt and counter freeze support must be available */
+ if (!perfcap.overflow_interrupt || !perfcap.counter_freeze)
+ goto free;
+
+ /* Number of event categories cannot be 0 */
+ if (perfcap.num_event_category == 0)
+ goto free;
+
+ /*
+ * We don't support per-counter capabilities for now.
+ */
+ if (perfcap.cap_per_counter)
+ goto free;
+
+ idxd_pmu->n_event_categories = perfcap.num_event_category;
+ idxd_pmu->supported_event_categories = perfcap.global_event_category;
+ idxd_pmu->per_counter_caps_supported = perfcap.cap_per_counter;
+
+ /* check filter capability. If 0, then filters are not supported */
+ idxd_pmu->supported_filters = perfcap.filter;
+ if (perfcap.filter)
+ idxd_pmu->n_filters = hweight8(perfcap.filter);
+
+ /* Store the total number of counters categories, and counter width */
+ idxd_pmu->n_counters = perfcap.num_perf_counter;
+ idxd_pmu->counter_width = perfcap.counter_width;
+
+ idxd_pmu_init(idxd_pmu);
+
+ rc = perf_pmu_register(&idxd_pmu->pmu, idxd_pmu->name, -1);
+ if (rc)
+ goto free;
+
+ rc = cpuhp_state_add_instance(cpuhp_slot, &idxd_pmu->cpuhp_node);
+ if (rc) {
+ perf_pmu_unregister(&idxd->idxd_pmu->pmu);
+ goto free;
+ }
+out:
+ return rc;
+free:
+ kfree(idxd_pmu);
+ idxd->idxd_pmu = NULL;
+
+ goto out;
+}
+
+void __init perfmon_init(void)
+{
+ int rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+ "driver/dma/idxd/perf:online",
+ perf_event_cpu_online,
+ perf_event_cpu_offline);
+ if (WARN_ON(rc < 0))
+ return;
+
+ cpuhp_slot = rc;
+ cpuhp_set_up = true;
+}
+
+void __exit perfmon_exit(void)
+{
+ if (cpuhp_set_up)
+ cpuhp_remove_multi_state(cpuhp_slot);
+}
diff --git a/drivers/dma/idxd/perfmon.h b/drivers/dma/idxd/perfmon.h
new file mode 100644
index 000000000000..9a081a1bc605
--- /dev/null
+++ b/drivers/dma/idxd/perfmon.h
@@ -0,0 +1,119 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */
+
+#ifndef _PERFMON_H_
+#define _PERFMON_H_
+
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/sbitmap.h>
+#include <linux/dmaengine.h>
+#include <linux/percpu-rwsem.h>
+#include <linux/wait.h>
+#include <linux/cdev.h>
+#include <linux/uuid.h>
+#include <linux/idxd.h>
+#include <linux/perf_event.h>
+#include "registers.h"
+
+static inline struct idxd_pmu *event_to_pmu(struct perf_event *event)
+{
+ struct idxd_pmu *idxd_pmu;
+ struct pmu *pmu;
+
+ pmu = event->pmu;
+ idxd_pmu = container_of(pmu, struct idxd_pmu, pmu);
+
+ return idxd_pmu;
+}
+
+static inline struct idxd_device *event_to_idxd(struct perf_event *event)
+{
+ struct idxd_pmu *idxd_pmu;
+ struct pmu *pmu;
+
+ pmu = event->pmu;
+ idxd_pmu = container_of(pmu, struct idxd_pmu, pmu);
+
+ return idxd_pmu->idxd;
+}
+
+static inline struct idxd_device *pmu_to_idxd(struct pmu *pmu)
+{
+ struct idxd_pmu *idxd_pmu;
+
+ idxd_pmu = container_of(pmu, struct idxd_pmu, pmu);
+
+ return idxd_pmu->idxd;
+}
+
+enum dsa_perf_events {
+ DSA_PERF_EVENT_WQ = 0,
+ DSA_PERF_EVENT_ENGINE,
+ DSA_PERF_EVENT_ADDR_TRANS,
+ DSA_PERF_EVENT_OP,
+ DSA_PERF_EVENT_COMPL,
+ DSA_PERF_EVENT_MAX,
+};
+
+enum filter_enc {
+ FLT_WQ = 0,
+ FLT_TC,
+ FLT_PG_SZ,
+ FLT_XFER_SZ,
+ FLT_ENG,
+ FLT_MAX,
+};
+
+#define CONFIG_RESET 0x0000000000000001
+#define CNTR_RESET 0x0000000000000002
+#define CNTR_ENABLE 0x0000000000000001
+#define INTR_OVFL 0x0000000000000002
+
+#define COUNTER_FREEZE 0x00000000FFFFFFFF
+#define COUNTER_UNFREEZE 0x0000000000000000
+#define OVERFLOW_SIZE 32
+
+#define CNTRCFG_ENABLE BIT(0)
+#define CNTRCFG_IRQ_OVERFLOW BIT(1)
+#define CNTRCFG_CATEGORY_SHIFT 8
+#define CNTRCFG_EVENT_SHIFT 32
+
+#define PERFMON_TABLE_OFFSET(_idxd) \
+({ \
+ typeof(_idxd) __idxd = (_idxd); \
+ ((__idxd)->reg_base + (__idxd)->perfmon_offset); \
+})
+#define PERFMON_REG_OFFSET(idxd, offset) \
+ (PERFMON_TABLE_OFFSET(idxd) + (offset))
+
+#define PERFCAP_REG(idxd) (PERFMON_REG_OFFSET(idxd, IDXD_PERFCAP_OFFSET))
+#define PERFRST_REG(idxd) (PERFMON_REG_OFFSET(idxd, IDXD_PERFRST_OFFSET))
+#define OVFSTATUS_REG(idxd) (PERFMON_REG_OFFSET(idxd, IDXD_OVFSTATUS_OFFSET))
+#define PERFFRZ_REG(idxd) (PERFMON_REG_OFFSET(idxd, IDXD_PERFFRZ_OFFSET))
+
+#define FLTCFG_REG(idxd, cntr, flt) \
+ (PERFMON_REG_OFFSET(idxd, IDXD_FLTCFG_OFFSET) + ((cntr) * 32) + ((flt) * 4))
+
+#define CNTRCFG_REG(idxd, cntr) \
+ (PERFMON_REG_OFFSET(idxd, IDXD_CNTRCFG_OFFSET) + ((cntr) * 8))
+#define CNTRDATA_REG(idxd, cntr) \
+ (PERFMON_REG_OFFSET(idxd, IDXD_CNTRDATA_OFFSET) + ((cntr) * 8))
+#define CNTRCAP_REG(idxd, cntr) \
+ (PERFMON_REG_OFFSET(idxd, IDXD_CNTRCAP_OFFSET) + ((cntr) * 8))
+
+#define EVNTCAP_REG(idxd, category) \
+ (PERFMON_REG_OFFSET(idxd, IDXD_EVNTCAP_OFFSET) + ((category) * 8))
+
+#define DEFINE_PERFMON_FORMAT_ATTR(_name, _format) \
+static ssize_t __perfmon_idxd_##_name##_show(struct kobject *kobj, \
+ struct kobj_attribute *attr, \
+ char *page) \
+{ \
+ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \
+ return sprintf(page, _format "\n"); \
+} \
+static struct kobj_attribute format_attr_idxd_##_name = \
+ __ATTR(_name, 0444, __perfmon_idxd_##_name##_show, NULL)
+
+#endif
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h
index 751ecb4f9f81..ffc1f7c7b3b5 100644
--- a/drivers/dma/idxd/registers.h
+++ b/drivers/dma/idxd/registers.h
@@ -378,4 +378,112 @@ union wqcfg {
#define GRPENGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 32)
#define GRPFLGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 40)

+/* Following is performance monitor registers */
+#define IDXD_PERFCAP_OFFSET 0x0
+union idxd_perfcap {
+ struct {
+ u64 num_perf_counter:6;
+ u64 rsvd1:2;
+ u64 counter_width:8;
+ u64 num_event_category:4;
+ u64 global_event_category:16;
+ u64 filter:8;
+ u64 rsvd2:8;
+ u64 cap_per_counter:1;
+ u64 writeable_counter:1;
+ u64 counter_freeze:1;
+ u64 overflow_interrupt:1;
+ u64 rsvd3:8;
+ };
+ u64 bits;
+} __packed;
+
+#define IDXD_EVNTCAP_OFFSET 0x80
+union idxd_evntcap {
+ struct {
+ u64 events:28;
+ u64 rsvd:36;
+ };
+ u64 bits;
+} __packed;
+
+struct idxd_event {
+ union {
+ struct {
+ u32 event_category:4;
+ u32 events:28;
+ };
+ u32 val;
+ };
+} __packed;
+
+#define IDXD_CNTRCAP_OFFSET 0x800
+struct idxd_cntrcap {
+ union {
+ struct {
+ u32 counter_width:8;
+ u32 rsvd:20;
+ u32 num_events:4;
+ };
+ u32 val;
+ };
+ struct idxd_event events[];
+} __packed;
+
+#define IDXD_PERFRST_OFFSET 0x10
+union idxd_perfrst {
+ struct {
+ u32 perfrst_config:1;
+ u32 perfrst_counter:1;
+ u32 rsvd:30;
+ };
+ u32 val;
+} __packed;
+
+#define IDXD_OVFSTATUS_OFFSET 0x30
+#define IDXD_PERFFRZ_OFFSET 0x20
+#define IDXD_CNTRCFG_OFFSET 0x100
+union idxd_cntrcfg {
+ struct {
+ u64 enable:1;
+ u64 interrupt_ovf:1;
+ u64 global_freeze_ovf:1;
+ u64 rsvd1:5;
+ u64 event_category:4;
+ u64 rsvd2:20;
+ u64 events:28;
+ u64 rsvd3:4;
+ };
+ u64 val;
+} __packed;
+
+#define IDXD_FLTCFG_OFFSET 0x300
+
+#define IDXD_CNTRDATA_OFFSET 0x200
+union idxd_cntrdata {
+ struct {
+ u64 event_count_value;
+ };
+ u64 val;
+} __packed;
+
+union event_cfg {
+ struct {
+ u64 event_cat:4;
+ u64 event_enc:28;
+ };
+ u64 val;
+} __packed;
+
+union filter_cfg {
+ struct {
+ u64 wq:32;
+ u64 tc:8;
+ u64 pg_sz:4;
+ u64 xfer_sz:8;
+ u64 eng:8;
+ };
+ u64 val;
+} __packed;
+
#endif
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index f14adb882338..264d911424c0 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -167,6 +167,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_X86_RAPL_ONLINE,
CPUHP_AP_PERF_X86_CQM_ONLINE,
CPUHP_AP_PERF_X86_CSTATE_ONLINE,
+ CPUHP_AP_PERF_X86_IDXD_ONLINE,
CPUHP_AP_PERF_S390_CF_ONLINE,
CPUHP_AP_PERF_S390_CFD_ONLINE,
CPUHP_AP_PERF_S390_SF_ONLINE,
--
2.17.1

2021-04-02 18:25:34

by kernel test robot

[permalink] [raw]

Subject: Re: [PATCH 1/1] dmaengine: idxd: Add IDXD performance monitor support

Hi Tom,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on vkoul-dmaengine/next]
[also build test WARNING on linux/master linus/master v5.12-rc5 next-20210401]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Tom-Zanussi/dmaengine-idxd-IDXD-pmu-support/20210403-005240
base: https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git next
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/ef9587b8e4ebe37a46d89b14ed68fb321e33242f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Tom-Zanussi/dmaengine-idxd-IDXD-pmu-support/20210403-005240
git checkout ef9587b8e4ebe37a46d89b14ed68fb321e33242f
# save the attached .config to linux build tree
make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

drivers/dma/idxd/perfmon.c: In function 'perfmon_pmu_event_init':
>> drivers/dma/idxd/perfmon.c:192:17: warning: variable 'dev' set but not used [-Wunused-but-set-variable]
192 | struct device *dev;
| ^~~
drivers/dma/idxd/perfmon.c: In function 'perfmon_pmu_read_counter':
drivers/dma/idxd/perfmon.c:228:17: warning: variable 'dev' set but not used [-Wunused-but-set-variable]
228 | struct device *dev;
| ^~~
drivers/dma/idxd/perfmon.c: In function 'perfmon_pmu_event_start':
drivers/dma/idxd/perfmon.c:325:17: warning: variable 'dev' set but not used [-Wunused-but-set-variable]
325 | struct device *dev;
| ^~~
>> drivers/dma/idxd/perfmon.c:323:19: warning: variable 'idxd_pmu' set but not used [-Wunused-but-set-variable]
323 | struct idxd_pmu *idxd_pmu;
| ^~~~~~~~

vim +/dev +192 drivers/dma/idxd/perfmon.c

188
189 static int perfmon_pmu_event_init(struct perf_event *event)
190 {
191 struct idxd_device *idxd;
> 192 struct device *dev;
193 int ret = 0;
194
195 idxd = event_to_idxd(event);
196 dev = &idxd->pdev->dev;
197 event->hw.idx = -1;
198
199 if (event->attr.type != event->pmu->type)
200 return -ENOENT;
201
202 /* sampling not supported */
203 if (event->attr.sample_period)
204 return -EINVAL;
205
206 if (event->cpu < 0)
207 return -EINVAL;
208
209 if (event->pmu != &idxd->idxd_pmu->pmu)
210 return -EINVAL;
211
212 event->hw.event_base = ioread64(PERFMON_TABLE_OFFSET(idxd));
213 event->cpu = idxd->idxd_pmu->cpu;
214 event->hw.config = event->attr.config;
215
216 if (event->group_leader != event)
217 /* non-group events have themselves as leader */
218 ret = perfmon_validate_group(idxd->idxd_pmu, event);
219
220 return ret;
221 }
222
223 static inline u64 perfmon_pmu_read_counter(struct perf_event *event)
224 {
225 struct hw_perf_event *hwc = &event->hw;
226 struct idxd_device *idxd;
227 int cntr = hwc->idx;
228 struct device *dev;
229 u64 cntrdata;
230
231 idxd = event_to_idxd(event);
232 dev = &idxd->pdev->dev;
233
234 cntrdata = ioread64(CNTRDATA_REG(idxd, cntr));
235
236 return cntrdata;
237 }
238
239 static void perfmon_pmu_event_update(struct perf_event *event)
240 {
241 struct idxd_device *idxd = event_to_idxd(event);
242 u64 prev_raw_count, new_raw_count, delta, p, n;
243 int shift = 64 - idxd->idxd_pmu->counter_width;
244 struct hw_perf_event *hwc = &event->hw;
245
246 do {
247 prev_raw_count = local64_read(&hwc->prev_count);
248 new_raw_count = perfmon_pmu_read_counter(event);
249 } while (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
250 new_raw_count) != prev_raw_count);
251
252 n = (new_raw_count << shift);
253 p = (prev_raw_count << shift);
254
255 delta = ((n - p) >> shift);
256
257 local64_add(delta, &event->count);
258 }
259
260 void perfmon_counter_overflow(struct idxd_device *idxd)
261 {
262 int i, n_counters, max_loop = OVERFLOW_SIZE;
263 struct perf_event *event;
264 unsigned long ovfstatus;
265
266 n_counters = min(idxd->idxd_pmu->n_counters, OVERFLOW_SIZE);
267
268 ovfstatus = ioread32(OVFSTATUS_REG(idxd));
269
270 /*
271 * While updating overflowed counters, other counters behind
272 * them could overflow and be missed in a given pass.
273 * Normally this could happen at most n_counters times, but in
274 * theory a tiny counter width could result in continual
275 * overflows and endless looping. max_loop provides a
276 * failsafe in that highly unlikely case.
277 */
278 while (ovfstatus && max_loop--) {
279 /* Figure out which counter(s) overflowed */
280 for_each_set_bit(i, &ovfstatus, n_counters) {
281 /* Update event->count for overflowed counter */
282 event = idxd->idxd_pmu->event_list[i];
283 perfmon_pmu_event_update(event);
284 clear_bit(i, &ovfstatus);
285 iowrite32(ovfstatus, OVFSTATUS_REG(idxd));
286 }
287
288 ovfstatus = ioread32(OVFSTATUS_REG(idxd));
289 }
290
291 /*
292 * Should never happen. If so, it means a counter(s) looped
293 * around twice while this handler was running.
294 */
295 WARN_ON_ONCE(ovfstatus);
296 }
297
298 static inline void perfmon_reset_config(struct idxd_device *idxd)
299 {
300 iowrite32(CONFIG_RESET, PERFRST_REG(idxd));
301 iowrite32(0, OVFSTATUS_REG(idxd));
302 iowrite32(0, PERFFRZ_REG(idxd));
303 }
304
305 static inline void perfmon_reset_counters(struct idxd_device *idxd)
306 {
307 iowrite32(CNTR_RESET, PERFRST_REG(idxd));
308 }
309
310 static inline void perfmon_reset(struct idxd_device *idxd)
311 {
312 perfmon_reset_config(idxd);
313 perfmon_reset_counters(idxd);
314 }
315
316 static void perfmon_pmu_event_start(struct perf_event *event, int mode)
317 {
318 u32 flt_wq, flt_tc, flt_pg_sz, flt_xfer_sz, flt_eng = 0;
319 u64 cntr_cfg, cntrdata, event_enc, event_cat = 0;
320 struct hw_perf_event *hwc = &event->hw;
321 union filter_cfg flt_cfg;
322 union event_cfg event_cfg;
> 323 struct idxd_pmu *idxd_pmu;
324 struct idxd_device *idxd;
325 struct device *dev;
326 int cntr;
327
328 idxd_pmu = event_to_pmu(event);
329 idxd = event_to_idxd(event);
330 dev = &idxd->pdev->dev;
331
332 event->hw.idx = hwc->idx;
333 cntr = hwc->idx;
334
335 /* Obtain event category and event value from user space */
336 event_cfg.val = event->attr.config;
337 flt_cfg.val = event->attr.config1;
338 event_cat = event_cfg.event_cat;
339 event_enc = event_cfg.event_enc;
340
341 /* Obtain filter configuration from user space */
342 flt_wq = flt_cfg.wq;
343 flt_tc = flt_cfg.tc;
344 flt_pg_sz = flt_cfg.pg_sz;
345 flt_xfer_sz = flt_cfg.xfer_sz;
346 flt_eng = flt_cfg.eng;
347
348 if (flt_wq && test_bit(FLT_WQ, &idxd->idxd_pmu->supported_filters))
349 iowrite32(flt_wq, FLTCFG_REG(idxd, cntr, FLT_WQ));
350 if (flt_tc && test_bit(FLT_TC, &idxd->idxd_pmu->supported_filters))
351 iowrite32(flt_tc, FLTCFG_REG(idxd, cntr, FLT_TC));
352 if (flt_pg_sz && test_bit(FLT_PG_SZ, &idxd->idxd_pmu->supported_filters))
353 iowrite32(flt_pg_sz, FLTCFG_REG(idxd, cntr, FLT_PG_SZ));
354 if (flt_xfer_sz && test_bit(FLT_XFER_SZ, &idxd->idxd_pmu->supported_filters))
355 iowrite32(flt_xfer_sz, FLTCFG_REG(idxd, cntr, FLT_XFER_SZ));
356 if (flt_eng && test_bit(FLT_ENG, &idxd->idxd_pmu->supported_filters))
357 iowrite32(flt_eng, FLTCFG_REG(idxd, cntr, FLT_ENG));
358
359 /* Read the start value */
360 cntrdata = ioread64(CNTRDATA_REG(idxd, cntr));
361 local64_set(&event->hw.prev_count, cntrdata);
362
363 /* Set counter to event/category */
364 cntr_cfg = event_cat << CNTRCFG_CATEGORY_SHIFT;
365 cntr_cfg |= event_enc << CNTRCFG_EVENT_SHIFT;
366 /* Set interrupt on overflow and counter enable bits */
367 cntr_cfg |= (CNTRCFG_IRQ_OVERFLOW | CNTRCFG_ENABLE);
368
369 iowrite64(cntr_cfg, CNTRCFG_REG(idxd, cntr));
370 }
371

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

Attachments:

(No filename) (8.82 kB)
.config.gz (63.93 kB)
Download all attachments