2013-05-17 19:48:52

by Suthikulpanit, Suravee

Subject: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

From: Suravee Suthikulpanit <[email protected]>

These patches implement the AMD IOMMU Performance Counter functionality
via a custom perf PMU and provide static counting for various IOMMU
translations.

1) Extend the AMD IOMMU initialization to include performance
counter enablement.

2) Add a perf AMD IOMMU PMU to manage the performance counters; it
interfaces with the AMD IOMMU core driver.

Steven L Kinney (1):
perf/x86/amd: Adding IOMMU PC resource management

Suravee Suthikulpanit (1):
perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/perf_event_amd_iommu.c | 500 ++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_amd_iommu.h | 40 +++
drivers/iommu/amd_iommu_init.c | 121 ++++++-
drivers/iommu/amd_iommu_proto.h | 7 +
drivers/iommu/amd_iommu_types.h | 12 +-
6 files changed, 675 insertions(+), 7 deletions(-)
create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

--
1.7.10.4


2013-05-17 19:47:45

by Suthikulpanit, Suravee

Subject: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management

From: Steven L Kinney <[email protected]>

Add functionality to check the availability of the AMD IOMMU performance
counters and export it to other core drivers, such as, in this case, a
perf AMD IOMMU PMU. This feature is not bound to any specific AMD
family/model; it only requires the presence of an IOMMU with the
performance counters (PC) enabled.

The AMD IOMMU PC supports static counting only at this time.

Signed-off-by: Steven Kinney <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
V2 Changes:
- Modify the amd_iommu_pc_get_set_reg_val function
to support 64 bit values.
- Fix logic to properly clear amd_iommu_pc_present when
initialization failed.

drivers/iommu/amd_iommu_init.c | 121 ++++++++++++++++++++++++++++++++++++++-
drivers/iommu/amd_iommu_proto.h | 7 +++
drivers/iommu/amd_iommu_types.h | 12 +++-
3 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index bf51abb..395fa29 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
u32 amd_iommu_max_pasids __read_mostly = ~0;

bool amd_iommu_v2_present __read_mostly;
+bool amd_iommu_pc_present __read_mostly;

bool amd_iommu_force_isolation __read_mostly;

@@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
*/
static u8 __iomem * __init iommu_map_mmio_space(u64 address)
{
- if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
+ if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {
pr_err("AMD-Vi: Can not reserve memory region %llx for mmio\n",
address);
pr_err("AMD-Vi: This is a BIOS bug. Please contact your hardware vendor\n");
return NULL;
}

- return (u8 __iomem *)ioremap_nocache(address, MMIO_REGION_LENGTH);
+ return (u8 __iomem *)ioremap_nocache(address, MMIO_REG_END_OFFSET);
}

static void __init iommu_unmap_mmio_space(struct amd_iommu *iommu)
{
if (iommu->mmio_base)
iounmap(iommu->mmio_base);
- release_mem_region(iommu->mmio_phys, MMIO_REGION_LENGTH);
+ release_mem_region(iommu->mmio_phys, MMIO_REG_END_OFFSET);
}

/****************************************************************************
@@ -1160,6 +1161,33 @@ static int __init init_iommu_all(struct acpi_table_header *table)
return 0;
}

+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+{
+ u64 val = 0xabcd, val2 = 0;
+
+ if (!iommu_feature(iommu, FEATURE_PC))
+ return;
+
+ amd_iommu_pc_present = true;
+
+ /* Check if the performance counters can be written to */
+ if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
+ (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
+ (val != val2)) {
+ pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
+ amd_iommu_pc_present = false;
+ return;
+ }
+
+ pr_info("AMD-Vi: IOMMU performance counters supported\n");
+
+ val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
+ iommu->max_banks = (u8) ((val >> 12) & 0x3f);
+ iommu->max_counters = (u8) ((val >> 7) & 0xf);
+}
+
+
static int iommu_init_pci(struct amd_iommu *iommu)
{
int cap_ptr = iommu->cap_ptr;
@@ -1226,6 +1254,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
amd_iommu_np_cache = true;

+ init_iommu_perf_ctr(iommu);
+
if (is_rd890_iommu(iommu->dev)) {
int i, j;

@@ -2232,3 +2262,88 @@ bool amd_iommu_v2_supported(void)
return amd_iommu_v2_present;
}
EXPORT_SYMBOL(amd_iommu_v2_supported);
+
+/****************************************************************************
+ *
+ * IOMMU EFR Performance Counter support functionality. This code allows
+ * access to the IOMMU PC functionality.
+ *
+ ****************************************************************************/
+
+u8 amd_iommu_pc_get_max_banks(u16 devid)
+{
+ struct amd_iommu *iommu;
+
+ /* locate the iommu governing the devid */
+ iommu = amd_iommu_rlookup_table[devid];
+
+ if (iommu)
+ return iommu->max_banks;
+
+ return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);
+
+bool amd_iommu_pc_supported(void)
+{
+ return amd_iommu_pc_present;
+}
+EXPORT_SYMBOL(amd_iommu_pc_supported);
+
+u8 amd_iommu_pc_get_max_counters(u16 devid)
+{
+ struct amd_iommu *iommu;
+
+ /* locate the iommu governing the devid */
+ iommu = amd_iommu_rlookup_table[devid];
+
+ if (iommu)
+ return iommu->max_counters;
+
+ return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);
+
+int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+ u64 *value, bool is_write)
+{
+ struct amd_iommu *iommu;
+ u32 offset;
+ u32 max_offset_lim;
+
+ /* Make sure the IOMMU PC resource is available */
+ if (!amd_iommu_pc_present) {
+ pr_info("AMD IOMMU - PC Not supported.\n");
+ return -ENODEV;
+ }
+
+ /* locate the iommu associated with the device ID */
+ iommu = amd_iommu_rlookup_table[devid];
+ if (iommu == NULL)
+ return -ENODEV;
+
+ /* check for valid iommu pc register indexing */
+ if (fxn < 0 || fxn > 0x28 || (fxn & 7))
+ return -ENODEV;
+
+ offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
+
+ /* limit the offset to the hw defined mmio region aperture */
+ max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
+ (iommu->max_counters << 8) | 0x28);
+ if ((offset < MMIO_CNTR_REG_OFFSET) ||
+ (offset > max_offset_lim))
+ return -EINVAL;
+
+ if (is_write) {
+ writel((u32)*value, iommu->mmio_base + offset);
+ writel((*value >> 32), iommu->mmio_base + offset + 4);
+ } else {
+ *value = readl(iommu->mmio_base + offset + 4);
+ *value <<= 32;
+ *value |= readl(iommu->mmio_base + offset);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_set_reg_val);
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index c294961..95ed6de 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -56,6 +56,13 @@ extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);

+/* IOMMU Performance Counter functions */
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+ u64 *value, bool is_write);
+
#define PPR_SUCCESS 0x0
#define PPR_INVALID 0x1
#define PPR_FAILURE 0xf
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0285a21..a95f0ae 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -38,9 +38,6 @@
#define ALIAS_TABLE_ENTRY_SIZE 2
#define RLOOKUP_TABLE_ENTRY_SIZE (sizeof(void *))

-/* Length of the MMIO region for the AMD IOMMU */
-#define MMIO_REGION_LENGTH 0x4000
-
/* Capability offsets used by the driver */
#define MMIO_CAP_HDR_OFFSET 0x00
#define MMIO_RANGE_OFFSET 0x0c
@@ -78,6 +75,10 @@
#define MMIO_STATUS_OFFSET 0x2020
#define MMIO_PPR_HEAD_OFFSET 0x2030
#define MMIO_PPR_TAIL_OFFSET 0x2038
+#define MMIO_CNTR_CONF_OFFSET 0x4000
+#define MMIO_CNTR_REG_OFFSET 0x40000
+#define MMIO_REG_END_OFFSET 0x80000
+


/* Extended Feature Bits */
@@ -507,6 +508,7 @@ struct amd_iommu {

/* physical address of MMIO space */
u64 mmio_phys;
+
/* virtual address of MMIO space */
u8 __iomem *mmio_base;

@@ -584,6 +586,10 @@ struct amd_iommu {

/* The l2 indirect registers */
u32 stored_l2[0x83];
+
+ /* The maximum PC banks and counters/bank (PCSup=1) */
+ u8 max_banks;
+ u8 max_counters;
};

struct devid_map {
--
1.7.10.4

2013-05-17 19:47:52

by Suthikulpanit, Suravee

Subject: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

From: Suravee Suthikulpanit <[email protected]>

Implement a perf PMU to handle the IOMMU performance counters and events.
The PMU supports counting mode only (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented in "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

./perf stat -a -e amd_iommu/<events>/ <command>
or
./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count is the total number of peripheral memory operations
performed through the IOMMU during the command's execution window.

IOMMU performance counter support is available starting with AMD
family 15h, model 0x30. For information on IOMMU performance counter
configuration, please see the AMD IOMMU v2.5 specification.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
V3 Changes:
This version contains several changes:
- Properly implement system-wide mode.
- Add "amd_iommu/cpumask".
- Add "amd_iommu/events".
- Simplify "amd_iommu/format".
- Add support for devid, domid, and pasid filtering.

arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/perf_event_amd_iommu.c | 498 ++++++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_amd_iommu.h | 40 +++
3 files changed, 539 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index b0684e4..fcbd3b8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o

ifdef CONFIG_PERF_EVENTS
-obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o
+obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL) += perf_event_intel_uncore.o
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.c b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
new file mode 100644
index 0000000..fad04f2
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <[email protected]>
+ * Author: Suravee Suthikulpanit <[email protected]>
+ *
+ * Perf: amd_iommu - AMD IOMMU Performance Counter PMU implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/cpumask.h>
+#include <linux/slab.h>
+
+#include "perf_event.h"
+#include "perf_event_amd_iommu.h"
+
+#define _GET_BANK(ev) ((u8)(ev->hw.extra_reg.reg >> 8))
+#define _GET_CNTR(ev) ((u8)(ev->hw.extra_reg.reg))
+
+/* iommu pmu config masks */
+#define _GET_CSOURCE(ev) ((ev->hw.config & 0xFFULL))
+#define _GET_DEVID(ev) ((ev->hw.config >> 8) & 0xFFFFULL)
+#define _GET_PASID(ev) ((ev->hw.config >> 24) & 0xFFFFULL)
+#define _GET_DOMID(ev) ((ev->hw.config >> 40) & 0xFFFFULL)
+#define _GET_DEVID_MASK(ev) ((ev->hw.extra_reg.config) & 0xFFFFULL)
+#define _GET_PASID_MASK(ev) ((ev->hw.extra_reg.config >> 16) & 0xFFFFULL)
+#define _GET_DOMID_MASK(ev) ((ev->hw.extra_reg.config >> 32) & 0xFFFFULL)
+
+static struct perf_amd_iommu __perf_iommu;
+
+struct perf_amd_iommu {
+ struct pmu pmu;
+ u8 max_banks;
+ u8 max_counters;
+ u64 cntr_assign_mask;
+ raw_spinlock_t lock;
+ const struct attribute_group *attr_groups[4];
+};
+
+#define format_group attr_groups[0]
+#define cpumask_group attr_groups[1]
+#define events_group attr_groups[2]
+#define null_group attr_groups[3]
+
+/*---------------------------------------------
+ * sysfs format attributes
+ *---------------------------------------------*/
+PMU_FORMAT_ATTR(csource, "config:0-7");
+PMU_FORMAT_ATTR(devid, "config:8-23");
+PMU_FORMAT_ATTR(pasid, "config:24-39");
+PMU_FORMAT_ATTR(domid, "config:40-55");
+PMU_FORMAT_ATTR(devid_mask, "config1:0-15");
+PMU_FORMAT_ATTR(pasid_mask, "config1:16-31");
+PMU_FORMAT_ATTR(domid_mask, "config1:32-47");
+
+static struct attribute *iommu_format_attrs[] = {
+ &format_attr_csource.attr,
+ &format_attr_devid.attr,
+ &format_attr_pasid.attr,
+ &format_attr_domid.attr,
+ &format_attr_devid_mask.attr,
+ &format_attr_pasid_mask.attr,
+ &format_attr_domid_mask.attr,
+ NULL,
+};
+
+static struct attribute_group amd_iommu_format_group = {
+ .name = "format",
+ .attrs = iommu_format_attrs,
+};
+
+/*---------------------------------------------
+ * sysfs events attributes
+ *---------------------------------------------*/
+struct amd_iommu_event_desc {
+ struct kobj_attribute attr;
+ const char *event;
+};
+
+static ssize_t _iommu_event_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct amd_iommu_event_desc *event =
+ container_of(attr, struct amd_iommu_event_desc, attr);
+ return sprintf(buf, "%s\n", event->event);
+}
+
+#define AMD_IOMMU_EVENT_DESC(_name, _event) \
+{ \
+ .attr = __ATTR(_name, 0444, _iommu_event_show, NULL), \
+ .event = _event, \
+}
+
+static struct amd_iommu_event_desc amd_iommu_v2_event_descs[] = {
+ AMD_IOMMU_EVENT_DESC(mem_pass_untrans, "csource=0x01"),
+ AMD_IOMMU_EVENT_DESC(mem_pass_pretrans, "csource=0x02"),
+ AMD_IOMMU_EVENT_DESC(mem_pass_excl, "csource=0x03"),
+ AMD_IOMMU_EVENT_DESC(mem_target_abort, "csource=0x04"),
+ AMD_IOMMU_EVENT_DESC(mem_trans_total, "csource=0x05"),
+ AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_hit, "csource=0x06"),
+ AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_mis, "csource=0x07"),
+ AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_hit, "csource=0x08"),
+ AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_mis, "csource=0x09"),
+ AMD_IOMMU_EVENT_DESC(mem_dte_hit, "csource=0x0a"),
+ AMD_IOMMU_EVENT_DESC(mem_dte_mis, "csource=0x0b"),
+ AMD_IOMMU_EVENT_DESC(page_tbl_read_tot, "csource=0x0c"),
+ AMD_IOMMU_EVENT_DESC(page_tbl_read_nst, "csource=0x0d"),
+ AMD_IOMMU_EVENT_DESC(page_tbl_read_gst, "csource=0x0e"),
+ AMD_IOMMU_EVENT_DESC(int_dte_hit, "csource=0x0f"),
+ AMD_IOMMU_EVENT_DESC(int_dte_mis, "csource=0x10"),
+ AMD_IOMMU_EVENT_DESC(cmd_processed, "csource=0x11"),
+ AMD_IOMMU_EVENT_DESC(cmd_processed_inv, "csource=0x12"),
+ AMD_IOMMU_EVENT_DESC(tlb_inv, "csource=0x13"),
+ { /* end: all zeroes */ },
+};
+
+/*---------------------------------------------
+ * sysfs cpumask attributes
+ *---------------------------------------------*/
+static cpumask_t iommu_cpumask;
+
+static ssize_t _iommu_cpumask_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ int n = cpulist_scnprintf(buf, PAGE_SIZE - 2, &iommu_cpumask);
+ buf[n++] = '\n';
+ buf[n] = '\0';
+ return n;
+}
+static DEVICE_ATTR(cpumask, S_IRUGO, _iommu_cpumask_show, NULL);
+
+static struct attribute *iommu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group amd_iommu_cpumask_group = {
+ .attrs = iommu_cpumask_attrs,
+};
+
+/*---------------------------------------------*/
+
+static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
+{
+ unsigned long flags;
+ int shift, bank, cntr, retval;
+ int max_banks = perf_iommu->max_banks;
+ int max_cntrs = perf_iommu->max_counters;
+
+ raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+
+ for (bank = 0, shift = 0; bank < max_banks; bank++) {
+ for (cntr = 0; cntr < max_cntrs; cntr++) {
+ shift = bank + (bank*3) + cntr;
+ if (perf_iommu->cntr_assign_mask & (1ULL<<shift)) {
+ continue;
+ } else {
+ perf_iommu->cntr_assign_mask |= (1ULL<<shift);
+ retval = ((u16)((u16)bank<<8) | (u8)(cntr));
+ goto out;
+ }
+ }
+ }
+ retval = -ENOSPC;
+out:
+ raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+ return retval;
+}
+
+static int clear_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu,
+ u8 bank, u8 cntr)
+{
+ unsigned long flags;
+ int max_banks, max_cntrs;
+ int shift = 0;
+
+ max_banks = perf_iommu->max_banks;
+ max_cntrs = perf_iommu->max_counters;
+
+ if ((bank > max_banks) || (cntr > max_cntrs))
+ return -EINVAL;
+
+ shift = bank + cntr + (bank*3);
+
+ raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+ perf_iommu->cntr_assign_mask &= ~(1ULL<<shift);
+ raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+
+ return 0;
+}
+
+static int perf_iommu_event_init(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct perf_amd_iommu *perf_iommu;
+ u64 config, config1;
+
+ /* test the event attr type check for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /*
+ * IOMMU counters are shared across all cores.
+ * Therefore, it does not support per-process mode.
+ * Also, it does not support event sampling mode.
+ */
+ if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+ return -EINVAL;
+
+ /* IOMMU counters do not have usr/os/guest/host bits */
+ if (event->attr.exclude_user || event->attr.exclude_kernel ||
+ event->attr.exclude_host || event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ perf_iommu = &__perf_iommu;
+
+ if (event->pmu != &perf_iommu->pmu)
+ return -ENOENT;
+
+ if (perf_iommu) {
+ config = event->attr.config;
+ config1 = event->attr.config1;
+ } else {
+ return -EINVAL;
+ }
+
+ /* integrate with iommu base devid (0000), assume one iommu */
+ perf_iommu->max_banks =
+ amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
+ perf_iommu->max_counters =
+ amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
+
+ /* update the hw_perf_event struct with the iommu config data */
+ hwc->config = config;
+ hwc->extra_reg.config = config1;
+
+ return 0;
+}
+
+static void perf_iommu_enable_event(struct perf_event *ev)
+{
+ u8 csource = _GET_CSOURCE(ev);
+ u16 devid = _GET_DEVID(ev);
+ u64 reg = 0ULL;
+
+ reg = csource;
+ amd_iommu_pc_get_set_reg_val(devid,
+ _GET_BANK(ev), _GET_CNTR(ev) ,
+ IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+
+ reg = 0ULL | devid | (_GET_DEVID_MASK(ev) << 32);
+ if (reg)
+ reg |= (1UL << 31);
+ amd_iommu_pc_get_set_reg_val(devid,
+ _GET_BANK(ev), _GET_CNTR(ev) ,
+ IOMMU_PC_DEVID_MATCH_REG, &reg, true);
+
+ reg = 0ULL | _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
+ if (reg)
+ reg |= (1UL << 31);
+ amd_iommu_pc_get_set_reg_val(devid,
+ _GET_BANK(ev), _GET_CNTR(ev) ,
+ IOMMU_PC_PASID_MATCH_REG, &reg, true);
+
+ reg = 0ULL | _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
+ if (reg)
+ reg |= (1UL << 31);
+ amd_iommu_pc_get_set_reg_val(devid,
+ _GET_BANK(ev), _GET_CNTR(ev) ,
+ IOMMU_PC_DOMID_MATCH_REG, &reg, true);
+}
+
+static void perf_iommu_disable_event(struct perf_event *event)
+{
+ u64 reg = 0ULL;
+
+ amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+ _GET_BANK(event), _GET_CNTR(event),
+ IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+}
+
+static void perf_iommu_start(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ pr_debug("perf: amd_iommu:perf_iommu_start\n");
+ if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
+ return;
+
+ WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+ hwc->state = 0;
+
+ if (flags & PERF_EF_RELOAD) {
+ u64 prev_raw_count = local64_read(&hwc->prev_count);
+ amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+ _GET_BANK(event), _GET_CNTR(event),
+ IOMMU_PC_COUNTER_REG, &prev_raw_count, true);
+ }
+
+ perf_iommu_enable_event(event);
+ perf_event_update_userpage(event);
+
+}
+
+static void perf_iommu_read(struct perf_event *event)
+{
+ u64 count = 0ULL;
+ u64 prev_raw_count = 0ULL;
+ u64 delta = 0ULL;
+ struct hw_perf_event *hwc = &event->hw;
+ pr_debug("perf: amd_iommu:perf_iommu_read\n");
+
+ amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+ _GET_BANK(event), _GET_CNTR(event),
+ IOMMU_PC_COUNTER_REG, &count, false);
+
+ /* IOMMU pc counter register is only 48 bits */
+ count &= 0xFFFFFFFFFFFFULL;
+
+ prev_raw_count = local64_read(&hwc->prev_count);
+ if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+ count) != prev_raw_count)
+ return;
+
+ delta = count - prev_raw_count;
+ local64_add(delta, &event->count);
+
+}
+
+static void perf_iommu_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ u64 config;
+
+ pr_debug("perf: amd_iommu:perf_iommu_stop\n");
+
+ if (hwc->state & PERF_HES_UPTODATE)
+ return;
+
+ perf_iommu_disable_event(event);
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+
+ if (hwc->state & PERF_HES_UPTODATE)
+ return;
+
+ config = hwc->config;
+ perf_iommu_read(event);
+ hwc->state |= PERF_HES_UPTODATE;
+}
+
+static int perf_iommu_add(struct perf_event *event, int flags)
+{
+ int retval;
+ struct perf_amd_iommu *perf_iommu =
+ container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+ pr_debug("perf: amd_iommu:perf_iommu_add\n");
+ event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+ /* request an iommu bank/counter */
+ retval = get_next_avail_iommu_bnk_cntr(perf_iommu);
+ if (retval != -ENOSPC)
+ event->hw.extra_reg.reg = (u16)retval;
+ else
+ return retval;
+
+ if (flags & PERF_EF_START)
+ perf_iommu_start(event, PERF_EF_RELOAD);
+
+ return 0;
+}
+
+static void perf_iommu_del(struct perf_event *event, int flags)
+{
+ struct perf_amd_iommu *perf_iommu =
+ container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+ pr_debug("perf: amd_iommu:perf_iommu_del\n");
+ perf_iommu_stop(event, PERF_EF_UPDATE);
+
+ /* clear the assigned iommu bank/counter */
+ clear_avail_iommu_bnk_cntr(perf_iommu,
+ _GET_BANK(event),
+ _GET_CNTR(event));
+
+ perf_event_update_userpage(event);
+}
+
+static __init int _init_events_attrs(struct perf_amd_iommu *perf_iommu)
+{
+ struct attribute **attrs;
+ struct attribute_group *attr_group;
+ int i = 0, j;
+
+ while (amd_iommu_v2_event_descs[i].attr.attr.name)
+ i++;
+
+ attr_group = kzalloc(sizeof(struct attribute *)
+ * (i + 1) + sizeof(*attr_group), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ attrs = (struct attribute **)(attr_group + 1);
+ for (j = 0; j < i; j++)
+ attrs[j] = &amd_iommu_v2_event_descs[j].attr.attr;
+
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+ perf_iommu->events_group = attr_group;
+
+ return 0;
+}
+
+static __init void amd_iommu_pc_exit(void)
+{
+ if (__perf_iommu.events_group != NULL) {
+ kfree(__perf_iommu.events_group);
+ __perf_iommu.events_group = NULL;
+ }
+}
+
+static __init int _init_perf_amd_iommu(
+ struct perf_amd_iommu *perf_iommu, char *name)
+{
+ int ret;
+
+ raw_spin_lock_init(&perf_iommu->lock);
+
+ /* Init format attributes */
+ perf_iommu->format_group = &amd_iommu_format_group;
+
+ /* Init cpumask attributes to only core 0 */
+ cpumask_set_cpu(0, &iommu_cpumask);
+ perf_iommu->cpumask_group = &amd_iommu_cpumask_group;
+
+ /* Init events attributes */
+ if (_init_events_attrs(perf_iommu) != 0)
+ pr_err("perf: amd_iommu: Only support raw events.\n");
+
+ /* Init null attributes */
+ perf_iommu->null_group = NULL;
+ perf_iommu->pmu.attr_groups = perf_iommu->attr_groups;
+
+ ret = perf_pmu_register(&perf_iommu->pmu, name, -1);
+ if (ret) {
+ pr_err("perf: amd_iommu: Failed to initialize.\n");
+ amd_iommu_pc_exit();
+ } else {
+ pr_info("perf: amd_iommu: Detected. (%d banks, %d counters/bank)\n",
+ amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID),
+ amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID));
+ }
+
+ return ret;
+}
+
+static struct perf_amd_iommu __perf_iommu = {
+ .pmu = {
+ .event_init = perf_iommu_event_init,
+ .add = perf_iommu_add,
+ .del = perf_iommu_del,
+ .start = perf_iommu_start,
+ .stop = perf_iommu_stop,
+ .read = perf_iommu_read,
+ },
+ .max_banks = 0x00,
+ .max_counters = 0x00,
+ .cntr_assign_mask = 0ULL,
+ .format_group = NULL,
+ .cpumask_group = NULL,
+ .events_group = NULL,
+ .null_group = NULL,
+};
+
+static __init int amd_iommu_pc_init(void)
+{
+ /* Make sure the IOMMU PC resource is available */
+ if (!amd_iommu_pc_supported()) {
+ pr_err("perf: amd_iommu PMU not installed. No support!\n");
+ return -ENODEV;
+ }
+
+ _init_perf_amd_iommu(&__perf_iommu, "amd_iommu");
+
+ return 0;
+}
+
+device_initcall(amd_iommu_pc_init);
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.h b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
new file mode 100644
index 0000000..845d173
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <[email protected]>
+ * Author: Suravee Suthikulpanit <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _PERF_EVENT_AMD_IOMMU_H_
+#define _PERF_EVENT_AMD_IOMMU_H_
+
+/* iommu pc mmio region register indexes */
+#define IOMMU_PC_COUNTER_REG 0x00
+#define IOMMU_PC_COUNTER_SRC_REG 0x08
+#define IOMMU_PC_PASID_MATCH_REG 0x10
+#define IOMMU_PC_DOMID_MATCH_REG 0x18
+#define IOMMU_PC_DEVID_MATCH_REG 0x20
+#define IOMMU_PC_COUNTER_REPORT_REG 0x28
+
+/* maximum specified bank/counters */
+#define PC_MAX_SPEC_BNKS 64
+#define PC_MAX_SPEC_CNTRS 16
+
+/* iommu pc reg masks*/
+#define IOMMU_BASE_DEVID 0x0000
+
+/* amd_iommu_init.c external support functions */
+extern bool amd_iommu_pc_supported(void);
+
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr,
+ u8 fxn, u64 *value, bool is_write);
+
+#endif /*_PERF_EVENT_AMD_IOMMU_H_*/
--
1.7.10.4

2013-05-20 15:41:43

by Suthikulpanit, Suravee

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

Peter,

Please let me know if you have any questions/concerns regarding the PMU
implementation.

Joerg,

Please let me know if you have any questions/concerns regarding the
changes in the IOMMU driver.

Thank you,

Suravee

On 5/17/2013 2:43 PM, [email protected] wrote:
> From: Suravee Suthikulpanit <[email protected]>
>
> These patches implement the AMD IOMMU Performance Counter functionality
> via a custom perf PMU and provide static counting for various IOMMU
> translations.
>
> 1) Extend the AMD IOMMU initialization to include performance
> counter enablement.
>
> 2) Add a perf AMD IOMMU PMU to manage the performance counters; it
> interfaces with the AMD IOMMU core driver.
>
> Steven L Kinney (1):
> perf/x86/amd: Adding IOMMU PC resource management
>
> Suravee Suthikulpanit (1):
> perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
>
> arch/x86/kernel/cpu/Makefile | 2 +-
> arch/x86/kernel/cpu/perf_event_amd_iommu.c | 500 ++++++++++++++++++++++++++++
> arch/x86/kernel/cpu/perf_event_amd_iommu.h | 40 +++
> drivers/iommu/amd_iommu_init.c | 121 ++++++-
> drivers/iommu/amd_iommu_proto.h | 7 +
> drivers/iommu/amd_iommu_types.h | 12 +-
> 6 files changed, 675 insertions(+), 7 deletions(-)
> create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
> create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h
>

2013-05-21 12:48:50

by Peter Zijlstra

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulpanit wrote:
> Peter,
>
> Please let me know if you have any questions/concerns regarding the PMU
> implementation.

Looks good, how would you like to go about merging this? Should I push
it through Ingo's tree or do you prefer it goes through some IOMMU tree?

2013-05-21 13:29:52

by Suthikulpanit, Suravee

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulpanit wrote:
>> Peter,
>>
>> Please let me know if you have any questions/concerns regarding the PMU
>> implementation.
> Looks good, how would you like to go about merging this? Should I push
> it through Ingo's tree or do you prefer it goes through some IOMMU tree?
>
Thank you, Peter. Since this is mostly a perf-related feature, I think it
would be appropriate to push it through Ingo's tree.

Suravee

2013-05-21 13:52:41

by Peter Zijlstra

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

On Tue, May 21, 2013 at 08:29:47AM -0500, Suravee Suthikulpanit wrote:
> On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> >On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulpanit wrote:
> >>Peter,
> >>
> >>Please let me know if you have any questions/concerns regarding the PMU
> >>implementation.
> >Looks good, how would you like to go about merging this? Should I push
> >it through Ingo's tree or do you prefer it goes through some IOMMU tree?
> >
> Thank you, Peter. Since this is mostly a perf-related feature, I think it
> would be appropriate to push it through Ingo's tree.

OK, I'll take them and will push them to Ingo.

Thanks!

2013-05-21 14:25:30

by Joerg Roedel

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

Hi Peter,

On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> OK, I'll take them and will push them to Ingo.

Please wait with that until I have had a look at the IOMMU pieces.

Thanks,

Joerg

2013-05-21 15:37:06

by Peter Zijlstra

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

On Tue, May 21, 2013 at 04:25:23PM +0200, Joerg Roedel wrote:
> Hi Peter,
>
> On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> > OK, I'll take them and will push them to Ingo.
>
> Please wait with that until I have had a look at the IOMMU pieces.

Sure thing, I'll wait until I hear from you again.

2013-05-27 16:30:39

by Peter Zijlstra

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

On Tue, May 21, 2013 at 04:25:23PM +0200, Joerg Roedel wrote:
> Hi Peter,
>
> On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> > OK, I'll take them and will push them to Ingo.
>
> Please wait with that until I have had a look at the IOMMU pieces.

Have you had time to look at this yet? -- mostly sending this email to
make sure I don't forget about these patches.

2013-05-27 17:26:11

by Joerg Roedel

Subject: Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support

Hi Peter,

On Mon, May 27, 2013 at 06:30:23PM +0200, Peter Zijlstra wrote:
> Have you had time to look at this yet? -- mostly sending this email to
> make sure I don't forget about these patches.

Sorry about that; I didn't forget it. I was traveling last week and had
an outage of my root server (where all my email is) over the weekend. It
should all be back to normal now, and I will look at these patches
tomorrow.

Thanks,

Joerg

2013-05-28 11:07:57

by Joerg Roedel

Subject: Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management

On Fri, May 17, 2013 at 02:43:31PM -0500, Suthikulpanit, Suravee wrote:
> From: Steven L Kinney <[email protected]>
>
> Add functionality to check the availability of the AMD IOMMU Performance
> Counters and export this functionality to other core drivers, such as in this
> case, a perf AMD IOMMU PMU. This feature is not bound to any specific AMD
> family/model other than the presence of the IOMMU with PC enabled.
>
> The AMD IOMMU PC supports static counting only at this time.
>
> Signed-off-by: Steven Kinney <[email protected]>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> V2 Changes:
> - Modify the amd_iommu_pc_get_set_reg_val function
> to support 64 bit values.
> - Fix logic to properly clear amd_iommu_pc_present when
> initialization failed.
>
> drivers/iommu/amd_iommu_init.c | 121 ++++++++++++++++++++++++++++++++++++++-
> drivers/iommu/amd_iommu_proto.h | 7 +++
> drivers/iommu/amd_iommu_types.h | 12 +++-
> 3 files changed, 134 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index bf51abb..395fa29 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
> u32 amd_iommu_max_pasids __read_mostly = ~0;
>
> bool amd_iommu_v2_present __read_mostly;
> +bool amd_iommu_pc_present __read_mostly;
>
> bool amd_iommu_force_isolation __read_mostly;
>
> @@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
> */
> static u8 __iomem * __init iommu_map_mmio_space(u64 address)
> {
> - if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
> + if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {

This length needs to be different on IOMMUv1 systems, where the MMIO
region is only 16kB. Only use MMIO_REG_END_OFFSET on IOMMUv2
systems.

> +static void init_iommu_perf_ctr(struct amd_iommu *iommu)
> +{
> + u64 val = 0xabcd, val2 = 0;
> +
> + if (!iommu_feature(iommu, FEATURE_PC))
> + return;
> +
> + amd_iommu_pc_present = true;
> +
> + /* Check if the performance counters can be written to */
> + if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
> + (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
> + (val != val2)) {
> + pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
> + amd_iommu_pc_present = false;
> + return;
> + }
> +
> + pr_info("AMD-Vi: IOMMU performance counters " "supported\n");

Why is this split into two strings? That makes it hard to grep for the message.


> +u8 amd_iommu_pc_get_max_banks(u16 devid)
> +{
> + struct amd_iommu *iommu;
> +
> + /* locate the iommu governing the devid */
> + iommu = amd_iommu_rlookup_table[devid];
> +
> + if (iommu)
> + return iommu->max_banks;
> +
> + return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);

You can't return -ENODEV in a function returning u8. Return int instead.

> +
> +bool amd_iommu_pc_supported(void)
> +{
> + return amd_iommu_pc_present;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_supported);
> +
> +u8 amd_iommu_pc_get_max_counters(u16 devid)
> +{
> + struct amd_iommu *iommu;
> +
> + /* locate the iommu governing the devid */
> + iommu = amd_iommu_rlookup_table[devid];
> +
> + if (iommu)
> + return iommu->max_counters;
> +
> + return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);

Same here. Please don't return negative values as u8.

> +
> +int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
> + u64 *value, bool is_write)
> +{
> + struct amd_iommu *iommu;
> + u32 offset;
> + u32 max_offset_lim;
> +
> + /* Make sure the IOMMU PC resource is available */
> + if (!amd_iommu_pc_present) {
> + pr_info("AMD IOMMU - PC Not supported.\n");

This message is redundant. You already print about performance
counter availability in the init code. Please remove it.

> + return -ENODEV;
> + }
> +
> + /* locate the iommu associated with the device ID */
> + iommu = amd_iommu_rlookup_table[devid];
> + if (iommu == NULL)
> + return -ENODEV;
> +
> + /* check for valid iommu pc register indexing */
> + if (fxn < 0 || fxn > 0x28 || (fxn & 7))
> + return -ENODEV;

Does this check for calling errors? If so, you might consider turning
this into a WARN_ON.

> +
> + offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
> +
> + /* limit the offset to the hw defined mmio region aperture */
> + max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
> + (iommu->max_counters << 8) | 0x28);
> + if ((offset < MMIO_CNTR_REG_OFFSET) ||
> + (offset > max_offset_lim))
> + return -EINVAL;
> +
> + if (is_write) {
> + writel((u32)*value, iommu->mmio_base + offset);
> + writel((*value >> 32), iommu->mmio_base + offset + 4);
> + } else {
> + *value = readl(iommu->mmio_base + offset + 4);
> + *value <<= 32;
> + *value = readl(iommu->mmio_base + offset);
> + }
> +
> + return 0;
> +}


Joerg

2013-05-28 12:18:55

by Joerg Roedel

Subject: Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index b0684e4..fcbd3b8 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o
> obj-$(CONFIG_PERF_EVENTS) += perf_event.o
>
> ifdef CONFIG_PERF_EVENTS
> -obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o
> +obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o

This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
compiled in, it doesn't make sense to have that PMU.

> +static int perf_iommu_event_init(struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> + struct perf_amd_iommu *perf_iommu;
> + u64 config, config1;
> +
> + /* test the event attr type check for PMU enumeration */
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + /*
> + * IOMMU counters are shared across all cores.
> + * Therefore, it does not support per-process mode.
> + * Also, it does not support event sampling mode.
> + */
> + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> + return -EINVAL;
> +
> + /* IOMMU counters do not have usr/os/guest/host bits */
> + if (event->attr.exclude_user || event->attr.exclude_kernel ||
> + event->attr.exclude_host || event->attr.exclude_guest)
> + return -EINVAL;
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + perf_iommu = &__perf_iommu;
> +
> + if (event->pmu != &perf_iommu->pmu)
> + return -ENOENT;
> +
> + if (perf_iommu) {
> + config = event->attr.config;
> + config1 = event->attr.config1;
> + } else {
> + return -EINVAL;
> + }
> +
> + /* integrate with iommu base devid (0000), assume one iommu */
> + perf_iommu->max_banks =
> + amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> + perf_iommu->max_counters =
> + amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> + /* update the hw_perf_event struct with the iommu config data */
> + hwc->config = config;
> + hwc->extra_reg.config = config1;
> +
> + return 0;
> +}

That implementation is very basic. Any reason for not using the event
reporting mechanism of the IOMMU? You could implement a nice perf
iommutop or something to see which devices do the most transactions or
something like that.


Joerg

2013-05-28 16:29:28

by Suthikulpanit, Suravee

Subject: Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management

Joerg,

Thanks for the review. I'll post new patch set to address your comment
shortly.

Suravee

On 5/28/2013 6:07 AM, Joerg Roedel wrote:
> [full quote of the review trimmed]

2013-05-28 17:17:38

by Suthikulpanit, Suravee

Subject: Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

On 5/28/2013 7:18 AM, Joerg Roedel wrote:
> On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
>> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
>> index b0684e4..fcbd3b8 100644
>> --- a/arch/x86/kernel/cpu/Makefile
>> +++ b/arch/x86/kernel/cpu/Makefile
>> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o
>> obj-$(CONFIG_PERF_EVENTS) += perf_event.o
>>
>> ifdef CONFIG_PERF_EVENTS
>> -obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o
>> +obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
> This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
> compiled in it doesn't make sense to have that PMU.
I will take care of this.
> + } else {
> + return -EINVAL;
> + }
> +
> + /* integrate with iommu base devid (0000), assume one iommu */
> + perf_iommu->max_banks =
> + amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> + perf_iommu->max_counters =
> + amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> + /* update the hw_perf_event struct with the iommu config data */
> + hwc->config = config;
> + hwc->extra_reg.config = config1;
> +
> + return 0;
> +}
> That implementation is very basic. Any reason for not using the event
> reporting mechanism of the IOMMU? You could implement a nice perf
> iommutop or something to see which devices do the most transactions or
> something like that.
>
>
> Joerg
This patch adds perf system-wide counting mode support, which is
used by the "perf stat" tool. We are not implementing the sampling mode
since the MSI interrupt of the IOMMU cannot be used with the current perf
sampling tools (e.g. perf record or top), because the IOMMU counters are
not core-specific. The current "perf record" and "perf top" need to
attribute each sample to a particular core/pid, which allows the
tools to figure out the instruction pointer and map the sample to a
particular module.

If I understand correctly, when you mentioned "perf iommutop", you want
a new perf user-space tool which will show real-time IOMMU events per
IOMMU HW and/or device?

Suravee

2013-05-29 11:24:27

by Peter Zijlstra

Subject: Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

On Tue, May 28, 2013 at 12:17:28PM -0500, Suravee Suthikulanit wrote:
> On 5/28/2013 7:18 AM, Joerg Roedel wrote:

> >That implementation is very basic. Any reason for not using the event
> >reporting mechanism of the IOMMU? You could implement a nice perf
> >iommutop or something to see which devices do the most transactions or
> >something like that.

> This patch is adding perf system-wide counting mode support which is used by
> "perf stat" tool. We are not implementing the sampling mode since MSI
> interrupt of the IOMMU cannot be used for current perf sampling tools (e.g.
> perf record or top) since the IOMMU counters are not core-specific. The
> current "perf record" and "perf top" needs to attribute each sample to a
> particular core/pid which would allow the tools to figure out the
> instruction pointer and map the sample to a paticular module.
>
> If I understand correctly, when you mentioned "perf iommutop", you want a
> new perf user-space tool which will show real-time IOMMU events per IOMMU HW
> and/or device?

Right, unless there's more to the IOMMU event reporting than setting an
event threshold to get an interrupt of sorts, I don't see how an
interrupt would be useful except for making sure we don't lose a
counter overflow.

Note that for Intel uncore we poll with a software timer to ensure we
don't miss the overflow because its interrupt facility is either broken
or missing.

If otoh the event reporting thing includes more data than just 'hey
counter overflow!' it might be useful. Not exactly sure how yet because
it would be the first PMU to need this.