2014-02-03 12:56:01

by Stephane Eranian

Subject: [PATCH v1 00/10] perf/x86/uncore: add support for SNB/IVB/HSW integrated memory controller PMU

This patch series adds support for the PCI-based integrated memory controller (IMC)
PMU of SandyBridge, IvyBridge, and Haswell client (desktop/mobile) processors.
This PMU provides a few free running 32-bit counters which can be used to
determine memory bandwidth utilization.

The code is based on the documentation at:
http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel

The patches implement a new uncore PMU called uncore_imc.
It exports its format and events in sysfs as usual.

The following events are currently defined:
- name: uncore_imc/data_reads/
- code: 0x1
- unit: 64 bytes
- number of full cacheline (64 bytes) read requests to the IMC

- name: uncore_imc/data_writes/
- code: 0x2
- unit: 64 bytes
- number of full cacheline (64 bytes) write requests to the IMC

The unit and scale of each event are also exposed in sysfs and
are picked up by perf stat (needs v3.13).

The uncore_imc PMU is, by construction, system-wide and counting
mode only. There is no privilege level filtering; the kernel enforces
these restrictions. The counters are 32-bit wide and do not generate
overflow interrupts, so the kernel uses an hrtimer to poll the
counters often enough to avoid missing an overflow.
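
For reference, a minimal sketch of the wrap-safe accumulation this
implies (illustrative names, not the exact code from this series): as
long as the counter is polled at least once per wrap period, an
unsigned 32-bit subtraction yields the correct delta even across an
overflow.

	/* illustrative sketch: fold a 32-bit free running counter
	 * into a 64-bit software count */
	static void imc_fold_count(u64 *sw_count, u32 *prev, u32 now)
	{
		u32 delta = now - *prev;	/* correct modulo 2^32 */

		*prev = now;
		*sw_count += delta;
	}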

The series includes an optional patch to change the unit of the
events to mebibytes (MiB) in perf stat. The kernel still exports
counts in 64-byte increments (the raw value of the counter).

To use the PMU with perf stat:
# perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ -I 1000 sleep 100
#           time             counts unit events
     1.000169151             180.62 MiB  uncore_imc/data_reads/
     1.000169151               0.14 MiB  uncore_imc/data_writes/
     2.000506913             180.37 MiB  uncore_imc/data_reads/
     2.000506913               0.02 MiB  uncore_imc/data_writes/
     3.000748105             180.32 MiB  uncore_imc/data_reads/
     3.000748105               0.02 MiB  uncore_imc/data_writes/
     4.000991441             180.30 MiB  uncore_imc/data_reads/

Signed-off-by: Stephane Eranian <[email protected]>

Stephane Eranian (10):
perf/x86/uncore: fix initialization of cpumask
perf/x86/uncore: add ability to customize pmu callbacks
perf/x86/uncore: do not assume PCI fixed ctrs have more than 32 bits
perf/x86/uncore: add PCI ids for SNB/IVB/HSW IMC
perf/x86/uncore: make hrtimer timeout configurable per box
perf/x86/uncore: move uncore_event_to_box() and uncore_pmu_to_box()
perf/x86/uncore: allow more than one fixed counter per box
perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support
perf/x86/uncore: add hrtimer to SNB uncore IMC PMU
perf/x86/uncore: use MiB unit for events for SNB/IVB/HSW IMC

arch/x86/kernel/cpu/perf_event_intel_uncore.c | 550 +++++++++++++++++++++----
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 48 ++-
include/linux/pci_ids.h | 3 +
3 files changed, 513 insertions(+), 88 deletions(-)

--
1.7.9.5


2014-02-03 12:56:09

by Stephane Eranian

Subject: [PATCH v1 01/10] perf/x86/uncore: fix initialization of cpumask

On certain processors, the uncore PMU boxes may only be
MSR-based or only PCI-based. In both cases, the cpumask
indicating on which CPUs to monitor to get full coverage
of the particular PMU must be created.

However, with the current code base, the cpumask was only
created on processors which had at least one MSR-based
uncore PMU. This patch removes that restriction and
ensures the cpumask is created even when there is no
MSR-based PMU, for instance on SNB client processors where
only a PCI-based memory controller PMU is supported.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 61 +++++++++++++++----------
1 file changed, 37 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 29c2487..fe4255b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -3764,7 +3764,7 @@ static void __init uncore_cpu_setup(void *dummy)

static int __init uncore_cpu_init(void)
{
- int ret, cpu, max_cores;
+ int ret, max_cores;

max_cores = boot_cpu_data.x86_max_cores;
switch (boot_cpu_data.x86_model) {
@@ -3808,29 +3808,6 @@ static int __init uncore_cpu_init(void)
if (ret)
return ret;

- get_online_cpus();
-
- for_each_online_cpu(cpu) {
- int i, phys_id = topology_physical_package_id(cpu);
-
- for_each_cpu(i, &uncore_cpu_mask) {
- if (phys_id == topology_physical_package_id(i)) {
- phys_id = -1;
- break;
- }
- }
- if (phys_id < 0)
- continue;
-
- uncore_cpu_prepare(cpu, phys_id);
- uncore_event_init_cpu(cpu);
- }
- on_each_cpu(uncore_cpu_setup, NULL, 1);
-
- register_cpu_notifier(&uncore_cpu_nb);
-
- put_online_cpus();
-
return 0;
}

@@ -3859,6 +3836,41 @@ static int __init uncore_pmus_register(void)
return 0;
}

+static void uncore_cpumask_init(void)
+{
+ int cpu;
+
+ /*
+ * only invoke once from msr or pci init code
+ */
+ if (!cpumask_empty(&uncore_cpu_mask))
+ return;
+
+ get_online_cpus();
+
+ for_each_online_cpu(cpu) {
+ int i, phys_id = topology_physical_package_id(cpu);
+
+ for_each_cpu(i, &uncore_cpu_mask) {
+ if (phys_id == topology_physical_package_id(i)) {
+ phys_id = -1;
+ break;
+ }
+ }
+ if (phys_id < 0)
+ continue;
+
+ uncore_cpu_prepare(cpu, phys_id);
+ uncore_event_init_cpu(cpu);
+ }
+ on_each_cpu(uncore_cpu_setup, NULL, 1);
+
+ register_cpu_notifier(&uncore_cpu_nb);
+
+ put_online_cpus();
+}
+
+
static int __init intel_uncore_init(void)
{
int ret;
@@ -3877,6 +3889,7 @@ static int __init intel_uncore_init(void)
uncore_pci_exit();
goto fail;
}
+ uncore_cpumask_init();

uncore_pmus_register();
return 0;
--
1.7.9.5

2014-02-03 12:56:12

by Stephane Eranian

Subject: [PATCH v1 02/10] perf/x86/uncore: add ability to customize pmu callbacks

This patch enables custom struct pmu callbacks per uncore
PMU type. This feature may be used to simplify counter
setup for certain uncore PMUs, for instance those with free
running counters: it becomes possible to bypass the event
scheduling phase of the configuration.
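
For illustration, a type could then supply its own callbacks like
this (hypothetical names; patch 8 adds the real SNB IMC instance):

	static struct pmu my_uncore_pmu = {
		.task_ctx_nr = perf_invalid_context,
		.event_init  = my_uncore_event_init, /* no scheduling */
		.add         = my_uncore_event_add,
		.del         = my_uncore_event_del,
		.start       = my_uncore_event_start,
		.stop        = my_uncore_event_stop,
		.read        = uncore_pmu_event_read,
	};

	static struct intel_uncore_type my_uncore_type = {
		.name = "my_uncore",
		.pmu  = &my_uncore_pmu, /* new field from this patch */
	};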

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 25 +++++++++++++++----------
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 1 +
2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index fe4255b..e6f32b3 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -3271,16 +3271,21 @@ static int __init uncore_pmu_register(struct intel_uncore_pmu *pmu)
{
int ret;

- pmu->pmu = (struct pmu) {
- .attr_groups = pmu->type->attr_groups,
- .task_ctx_nr = perf_invalid_context,
- .event_init = uncore_pmu_event_init,
- .add = uncore_pmu_event_add,
- .del = uncore_pmu_event_del,
- .start = uncore_pmu_event_start,
- .stop = uncore_pmu_event_stop,
- .read = uncore_pmu_event_read,
- };
+ if (!pmu->type->pmu) {
+ pmu->pmu = (struct pmu) {
+ .attr_groups = pmu->type->attr_groups,
+ .task_ctx_nr = perf_invalid_context,
+ .event_init = uncore_pmu_event_init,
+ .add = uncore_pmu_event_add,
+ .del = uncore_pmu_event_del,
+ .start = uncore_pmu_event_start,
+ .stop = uncore_pmu_event_stop,
+ .read = uncore_pmu_event_read,
+ };
+ } else {
+ pmu->pmu = *pmu->type->pmu;
+ pmu->pmu.attr_groups = pmu->type->attr_groups;
+ }

if (pmu->type->num_boxes == 1) {
if (strlen(pmu->type->name) > 0)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index a80ab71..77dc9a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -440,6 +440,7 @@ struct intel_uncore_type {
struct intel_uncore_ops *ops;
struct uncore_event_desc *event_descs;
const struct attribute_group *attr_groups[4];
+ struct pmu *pmu; /* for custom pmu ops */
};

#define pmu_group attr_groups[0]
--
1.7.9.5

2014-02-03 12:56:23

by Stephane Eranian

Subject: [PATCH v1 04/10] perf/x86/uncore: add PCI ids for SNB/IVB/HSW IMC

This patch adds the PCI IDs for the Intel SandyBridge,
IvyBridge, and Haswell client integrated memory controllers (IMC).

Signed-off-by: Stephane Eranian <[email protected]>
---
include/linux/pci_ids.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 97fbecd..7399e6a 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2531,6 +2531,9 @@

#define PCI_VENDOR_ID_INTEL 0x8086
#define PCI_DEVICE_ID_INTEL_EESSC 0x0008
+#define PCI_DEVICE_ID_INTEL_SNB_IMC 0x0100
+#define PCI_DEVICE_ID_INTEL_IVB_IMC 0x0154
+#define PCI_DEVICE_ID_INTEL_HSW_IMC 0x0c00
#define PCI_DEVICE_ID_INTEL_PXHD_0 0x0320
#define PCI_DEVICE_ID_INTEL_PXHD_1 0x0321
#define PCI_DEVICE_ID_INTEL_PXH_0 0x0329
--
1.7.9.5

2014-02-03 12:56:25

by Stephane Eranian

Subject: [PATCH v1 03/10] perf/x86/uncore: do not assume PCI fixed ctrs have more than 32 bits

The current code assumes all PCI fixed counters implement more than
32 bits of hardware counter. The actual width is then rounded up to 64
to enable base + 8 * idx calculations.

Not all PMUs necessarily implement counters wider than 32 bits.
This patch makes the uncore_pci_perf_ctr() function dynamically
determine the actual bit width of a PCI perf counter.

This patch paves the way for handling more than one uncore fixed
counter per PMU box.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 31 ++++++++++++++++---------
1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 77dc9a5..f5549cf 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -548,10 +548,29 @@ unsigned uncore_pci_event_ctl(struct intel_uncore_box *box, int idx)
return idx * 4 + box->pmu->type->event_ctl;
}

+static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
+{
+ return box->pmu->type->perf_ctr_bits;
+}
+
+static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
+{
+ return box->pmu->type->fixed_ctr_bits;
+}
+
static inline
unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
{
- return idx * 8 + box->pmu->type->perf_ctr;
+ int bits, bytes;
+
+ if (idx == UNCORE_PMC_IDX_FIXED)
+ bits = uncore_fixed_ctr_bits(box);
+ else
+ bits = uncore_perf_ctr_bits(box);
+
+ bytes = round_up(bits, 8);
+
+ return idx * bytes + box->pmu->type->perf_ctr;
}

static inline unsigned uncore_msr_box_offset(struct intel_uncore_box *box)
@@ -633,16 +652,6 @@ unsigned uncore_perf_ctr(struct intel_uncore_box *box, int idx)
return uncore_msr_perf_ctr(box, idx);
}

-static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
-{
- return box->pmu->type->perf_ctr_bits;
-}
-
-static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
-{
- return box->pmu->type->fixed_ctr_bits;
-}
-
static inline int uncore_num_counters(struct intel_uncore_box *box)
{
return box->pmu->type->num_counters;
--
1.7.9.5

2014-02-03 12:56:32

by Stephane Eranian

Subject: [PATCH v1 07/10] perf/x86/uncore: allow more than one fixed counter per box

This patch modifies some of the helper functions to support
more than one fixed counter per uncore PCI PMU box.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 4 ++--
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 22 +++++++++++++---------
2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 2980994c..69a4ad0 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2777,8 +2777,8 @@ static void uncore_assign_hw_event(struct intel_uncore_box *box, struct perf_eve
hwc->idx = idx;
hwc->last_tag = ++box->tags[idx];

- if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
- hwc->event_base = uncore_fixed_ctr(box);
+ if (hwc->idx >= UNCORE_PMC_IDX_FIXED) {
+ hwc->event_base = uncore_fixed_ctr(box, idx);
hwc->config_base = uncore_fixed_ctl(box);
return;
}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 433c180..c63a3ff 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -538,9 +538,18 @@ static inline unsigned uncore_pci_fixed_ctl(struct intel_uncore_box *box)
return box->pmu->type->fixed_ctl;
}

-static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box)
+static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
+{
+ return box->pmu->type->fixed_ctr_bits;
+}
+
+static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box,
+ int idx)
{
- return box->pmu->type->fixed_ctr;
+ int bits = uncore_fixed_ctr_bits(box);
+ int bytes = round_up(bits, 8);
+
+ return idx * bytes + box->pmu->type->fixed_ctr;
}

static inline
@@ -554,11 +563,6 @@ static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
return box->pmu->type->perf_ctr_bits;
}

-static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
-{
- return box->pmu->type->fixed_ctr_bits;
-}
-
static inline
unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
{
@@ -627,10 +631,10 @@ unsigned uncore_fixed_ctl(struct intel_uncore_box *box)
}

static inline
-unsigned uncore_fixed_ctr(struct intel_uncore_box *box)
+unsigned uncore_fixed_ctr(struct intel_uncore_box *box, int idx)
{
if (box->pci_dev)
- return uncore_pci_fixed_ctr(box);
+ return uncore_pci_fixed_ctr(box, idx);
else
return uncore_msr_fixed_ctr(box);
}
--
1.7.9.5

2014-02-03 12:56:37

by Stephane Eranian

Subject: [PATCH v1 08/10] perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support

This patch adds a new uncore PMU for Intel SNB/IVB/HSW client
CPUs. It adds the Integrated Memory Controller (IMC) PMU. This
new PMU provides a set of events to measure memory bandwidth utilization.

The IMC on those processors is PCI-based. This patch
exposes a new uncore PMU on those processors: uncore_imc.

Two new events are defined:
- name: data_reads
- code: 0x1
- unit: 64 bytes
- number of full cacheline read requests to the IMC

- name: data_writes
- code: 0x2
- unit: 64 bytes
- number of full cacheline write requests to the IMC

Documentation available at:
http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 370 +++++++++++++++++++++++++
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 1 +
2 files changed, 371 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 69a4ad0..8b1f81f 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -66,6 +66,12 @@ DEFINE_UNCORE_FORMAT_ATTR(mask_vnw, mask_vnw, "config2:3-4");
DEFINE_UNCORE_FORMAT_ATTR(mask0, mask0, "config2:0-31");
DEFINE_UNCORE_FORMAT_ATTR(mask1, mask1, "config2:32-63");

+static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box);
+static void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box);
+static void uncore_perf_event_update(struct intel_uncore_box *box,
+ struct perf_event *event);
+static void uncore_pmu_event_read(struct perf_event *event);
+
static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
{
return container_of(event->pmu, struct intel_uncore_pmu, pmu);
@@ -1668,6 +1674,348 @@ static struct intel_uncore_type *snb_msr_uncores[] = {
&snb_uncore_cbox,
NULL,
};
+
+enum {
+ SNB_PCI_UNCORE_IMC,
+};
+
+static struct uncore_event_desc snb_uncore_imc_events[] = {
+ INTEL_UNCORE_EVENT_DESC(data_reads, "event=0x01"),
+ INTEL_UNCORE_EVENT_DESC(data_reads.scale, "64"),
+ INTEL_UNCORE_EVENT_DESC(data_reads.unit, "bytes"),
+
+ INTEL_UNCORE_EVENT_DESC(data_writes, "event=0x02"),
+ INTEL_UNCORE_EVENT_DESC(data_writes.scale, "64"),
+ INTEL_UNCORE_EVENT_DESC(data_writes.unit, "bytes"),
+
+ { /* end: all zeroes */ },
+};
+
+#define SNB_UNCORE_PCI_IMC_EVENT_MASK 0xff
+#define SNB_UNCORE_PCI_IMC_BAR_OFFSET 0x48
+
+/* page size multiple covering all config regs */
+#define SNB_UNCORE_PCI_IMC_MAP_SIZE 0x6000
+
+#define SNB_UNCORE_PCI_IMC_DATA_READS 0x1
+#define SNB_UNCORE_PCI_IMC_DATA_READS_BASE 0x5050
+#define SNB_UNCORE_PCI_IMC_DATA_WRITES 0x2
+#define SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE 0x5054
+#define SNB_UNCORE_PCI_IMC_CTR_BASE 0x5050
+
+static struct attribute *snb_uncore_imc_formats_attr[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group snb_uncore_imc_format_group = {
+ .name = "format",
+ .attrs = snb_uncore_imc_formats_attr,
+};
+
+static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
+{
+ struct pci_dev *pdev = box->pci_dev;
+ u32 addr_lo, addr_hi;
+ resource_size_t addr;
+
+ pci_read_config_dword(pdev, SNB_UNCORE_PCI_IMC_BAR_OFFSET, &addr_lo);
+ addr = addr_lo;
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+ pci_read_config_dword(pdev, SNB_UNCORE_PCI_IMC_BAR_OFFSET+4, &addr_hi);
+ addr = ((resource_size_t)addr_hi << 32) | addr_lo;
+#endif
+
+ addr &= ~(PAGE_SIZE - 1);
+
+ box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
+}
+
+static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
+{}
+
+static void snb_uncore_imc_disable_box(struct intel_uncore_box *box)
+{}
+
+static void snb_uncore_imc_enable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{}
+
+static void snb_uncore_imc_disable_event(struct intel_uncore_box *box,
+ struct perf_event *event)
+{}
+
+static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ return (u64)*(unsigned int *)(box->io_addr + hwc->event_base);
+}
+
+/*
+ * custom event_init() function because we define our own fixed, free
+ * running counters, so we do not want to conflict with generic uncore
+ * logic. Also simplifies processing
+ */
+static int snb_uncore_imc_event_init(struct perf_event *event)
+{
+ struct intel_uncore_pmu *pmu;
+ struct intel_uncore_box *box;
+ struct hw_perf_event *hwc = &event->hw;
+ u64 cfg = event->attr.config & SNB_UNCORE_PCI_IMC_EVENT_MASK;
+ int idx, base;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ pmu = uncore_event_to_pmu(event);
+ /* no device found for this pmu */
+ if (pmu->func_id < 0)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (hwc->sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ event->attr.sample_period) /* no sampling */
+ return -EINVAL;
+
+ /*
+ * Place all uncore events for a particular physical package
+ * onto a single cpu
+ */
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ /* check only supported bits are set */
+ if (event->attr.config & ~SNB_UNCORE_PCI_IMC_EVENT_MASK)
+ return -EINVAL;
+
+ box = uncore_pmu_to_box(pmu, event->cpu);
+ if (!box || box->cpu < 0)
+ return -EINVAL;
+
+ event->cpu = box->cpu;
+
+ event->hw.idx = -1;
+ event->hw.last_tag = ~0ULL;
+ event->hw.extra_reg.idx = EXTRA_REG_NONE;
+ event->hw.branch_reg.idx = EXTRA_REG_NONE;
+ /*
+ * check event is known (whitelist, determines counter)
+ */
+ switch (cfg) {
+ case SNB_UNCORE_PCI_IMC_DATA_READS:
+ base = SNB_UNCORE_PCI_IMC_DATA_READS_BASE;
+ idx = UNCORE_PMC_IDX_FIXED;
+ break;
+ case SNB_UNCORE_PCI_IMC_DATA_WRITES:
+ base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
+ idx = UNCORE_PMC_IDX_FIXED + 1;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* must be done before validate_group */
+ event->hw.event_base = base;
+ event->hw.config = cfg;
+ event->hw.idx = idx;
+
+ /* no group validation needed, we have free running counters */
+
+ return 0;
+}
+
+static int snb_uncore_imc_hw_config(struct intel_uncore_box *box,
+ struct perf_event *event)
+{
+ return 0;
+}
+
+static void snb_uncore_imc_event_start(struct perf_event *event, int flags)
+{
+ struct intel_uncore_box *box = uncore_event_to_box(event);
+ u64 count;
+
+ if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+ return;
+
+ event->hw.state = 0;
+ box->n_active++;
+
+ list_add_tail(&event->active_entry, &box->active_list);
+
+ count = snb_uncore_imc_read_counter(box, event);
+ local64_set(&event->hw.prev_count, count);
+
+ if (box->n_active == 1)
+ uncore_pmu_start_hrtimer(box);
+}
+
+static void snb_uncore_imc_event_stop(struct perf_event *event, int flags)
+{
+ struct intel_uncore_box *box = uncore_event_to_box(event);
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (!(hwc->state & PERF_HES_STOPPED)) {
+ box->n_active--;
+
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+
+ list_del(&event->active_entry);
+
+ if (box->n_active == 0)
+ uncore_pmu_cancel_hrtimer(box);
+ }
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ /*
+ * Drain the remaining delta count out of an event
+ * that we are disabling:
+ */
+ uncore_perf_event_update(box, event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+}
+
+static int snb_uncore_imc_event_add(struct perf_event *event, int flags)
+{
+ struct intel_uncore_box *box = uncore_event_to_box(event);
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (!box)
+ return -ENODEV;
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (!(flags & PERF_EF_START))
+ hwc->state |= PERF_HES_ARCH;
+
+ snb_uncore_imc_event_start(event, 0);
+
+ box->n_events++;
+
+ return 0;
+}
+
+static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
+{
+ struct intel_uncore_box *box = uncore_event_to_box(event);
+ int i;
+
+ snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
+
+ for (i = 0; i < box->n_events; i++) {
+ if (event == box->event_list[i]) {
+ --box->n_events;
+ break;
+ }
+ }
+}
+
+static int snb_pci2phy_map_init(int devid)
+{
+ struct pci_dev *dev = NULL;
+ int bus;
+
+ dev = pci_get_device(PCI_VENDOR_ID_INTEL, devid, dev);
+ if (!dev)
+ return -ENOTTY;
+
+ bus = dev->bus->number;
+
+ pcibus_to_physid[bus] = 0;
+
+ pci_dev_put(dev);
+
+ return 0;
+}
+
+static struct pmu snb_uncore_imc_pmu = {
+ .task_ctx_nr = perf_invalid_context,
+ .event_init = snb_uncore_imc_event_init,
+ .add = snb_uncore_imc_event_add,
+ .del = snb_uncore_imc_event_del,
+ .start = snb_uncore_imc_event_start,
+ .stop = snb_uncore_imc_event_stop,
+ .read = uncore_pmu_event_read,
+};
+
+static struct intel_uncore_ops snb_uncore_imc_ops = {
+ .init_box = snb_uncore_imc_init_box,
+ .enable_box = snb_uncore_imc_enable_box,
+ .disable_box = snb_uncore_imc_disable_box,
+ .disable_event = snb_uncore_imc_disable_event,
+ .enable_event = snb_uncore_imc_enable_event,
+ .hw_config = snb_uncore_imc_hw_config,
+ .read_counter = snb_uncore_imc_read_counter,
+};
+
+static struct intel_uncore_type snb_uncore_imc = {
+ .name = "imc",
+ .num_counters = 2,
+ .num_boxes = 1,
+ .fixed_ctr_bits = 32,
+ .fixed_ctr = SNB_UNCORE_PCI_IMC_CTR_BASE,
+ .event_descs = snb_uncore_imc_events,
+ .format_group = &snb_uncore_imc_format_group,
+ .perf_ctr = SNB_UNCORE_PCI_IMC_DATA_READS_BASE,
+ .event_mask = SNB_UNCORE_PCI_IMC_EVENT_MASK,
+ .ops = &snb_uncore_imc_ops,
+ .pmu = &snb_uncore_imc_pmu,
+};
+
+static struct intel_uncore_type *snb_pci_uncores[] = {
+ [SNB_PCI_UNCORE_IMC] = &snb_uncore_imc,
+ NULL,
+};
+
+static DEFINE_PCI_DEVICE_TABLE(snb_uncore_pci_ids) = {
+ { /* IMC */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SNB_IMC),
+ .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
+ },
+};
+
+static DEFINE_PCI_DEVICE_TABLE(ivb_uncore_pci_ids) = {
+ { /* IMC */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_IMC),
+ .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
+ },
+};
+
+static DEFINE_PCI_DEVICE_TABLE(hsw_uncore_pci_ids) = {
+ { /* IMC */
+ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_HSW_IMC),
+ .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
+ },
+};
+
+static struct pci_driver snb_uncore_pci_driver = {
+ .name = "snb_uncore",
+ .id_table = snb_uncore_pci_ids,
+};
+
+static struct pci_driver ivb_uncore_pci_driver = {
+ .name = "ivb_uncore",
+ .id_table = ivb_uncore_pci_ids,
+};
+
+static struct pci_driver hsw_uncore_pci_driver = {
+ .name = "hsw_uncore",
+ .id_table = hsw_uncore_pci_ids,
+};
+
/* end of Sandy Bridge uncore support */

/* Nehalem uncore support */
@@ -3502,6 +3850,28 @@ static int __init uncore_pci_init(void)
pci_uncores = ivt_pci_uncores;
uncore_pci_driver = &ivt_uncore_pci_driver;
break;
+ case 42: /* Sandy Bridge */
+ ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_SNB_IMC);
+ if (ret)
+ return ret;
+ pci_uncores = snb_pci_uncores;
+ uncore_pci_driver = &snb_uncore_pci_driver;
+ break;
+ case 60: /* Haswell */
+ case 69: /* Haswell Celeron */
+ ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_HSW_IMC);
+ if (ret)
+ return ret;
+ pci_uncores = snb_pci_uncores;
+ uncore_pci_driver = &hsw_uncore_pci_driver;
+ break;
+ case 58: /* Ivy Bridge */
+ ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_IVB_IMC);
+ if (ret)
+ return ret;
+ pci_uncores = snb_pci_uncores;
+ uncore_pci_driver = &ivb_uncore_pci_driver;
+ break;
default:
return 0;
}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index c63a3ff..0770da2 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -492,6 +492,7 @@ struct intel_uncore_box {
u64 hrtimer_duration; /* hrtimer timeout for this box */
struct hrtimer hrtimer;
struct list_head list;
+ void *io_addr;
struct intel_uncore_extra_reg shared_regs[0];
};

--
1.7.9.5

2014-02-03 12:56:41

by Stephane Eranian

Subject: [PATCH v1 10/10] perf/x86/uncore: use MiB unit for events for SNB/IVB/HSW IMC

This patch makes perf use mebibytes (MiB) to display the counts
of uncore_imc/data_reads/ and uncore_imc/data_writes/.

1MiB = 1024*1024 bytes, so with 64 bytes per count the scale
factor is 64 / (1024*1024) = 6.103515625e-5 MiB per count.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index f76937e..51dced8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1681,12 +1681,12 @@ enum {

static struct uncore_event_desc snb_uncore_imc_events[] = {
INTEL_UNCORE_EVENT_DESC(data_reads, "event=0x01"),
- INTEL_UNCORE_EVENT_DESC(data_reads.scale, "64"),
- INTEL_UNCORE_EVENT_DESC(data_reads.unit, "bytes"),
+ INTEL_UNCORE_EVENT_DESC(data_reads.scale, "6.103515625e-5"),
+ INTEL_UNCORE_EVENT_DESC(data_reads.unit, "MiB"),

INTEL_UNCORE_EVENT_DESC(data_writes, "event=0x02"),
- INTEL_UNCORE_EVENT_DESC(data_writes.scale, "64"),
- INTEL_UNCORE_EVENT_DESC(data_writes.unit, "bytes"),
+ INTEL_UNCORE_EVENT_DESC(data_writes.scale, "6.103515625e-5"),
+ INTEL_UNCORE_EVENT_DESC(data_writes.unit, "MiB"),

{ /* end: all zeroes */ },
};
--
1.7.9.5

2014-02-03 12:57:25

by Stephane Eranian

Subject: [PATCH v1 09/10] perf/x86/uncore: add hrtimer to SNB uncore IMC PMU

This patch is needed because the SNB IMC PMU uses 32-bit free
running counters with no interrupt capability.

On SNB/IVB/HSW, we used a 20GB/s theoretical peak to calculate
the hrtimer timeout necessary to avoid missing an overflow: at
64 bytes per count, a 32-bit counter wraps after
2^32 * 64 bytes / 20GB/s, i.e. roughly 13.7s. The timeout is set
to 5s to be on the cautious side.

The SNB IMC uses free running counters, which are handled
via pseudo fixed counters. The SNB IMC PMU implementation
supports an arbitrary number of events because the counters
are read-only, so it is not possible to track active
counters. Instead, active events are put on a linked list which
the hrtimer handler walks to update the SW counts.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 12 ++++++++++++
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 2 ++
2 files changed, 14 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 8b1f81f..f76937e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -1730,6 +1730,7 @@ static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
addr &= ~(PAGE_SIZE - 1);

box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
+ box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
}

static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
@@ -3166,6 +3167,7 @@ static void uncore_perf_event_update(struct intel_uncore_box *box, struct perf_e
static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
{
struct intel_uncore_box *box;
+ struct perf_event *event;
unsigned long flags;
int bit;

@@ -3178,6 +3180,14 @@ static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
*/
local_irq_save(flags);

+ /*
+ * handle boxes with an active event list as opposed to active
+ * counters
+ */
+ list_for_each_entry(event, &box->active_list, active_entry) {
+ uncore_perf_event_update(box, event);
+ }
+
for_each_set_bit(bit, box->active_mask, UNCORE_PMC_IDX_MAX)
uncore_perf_event_update(box, box->events[bit]);

@@ -3227,6 +3237,8 @@ static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
/* set default hrtimer timeout */
box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;

+ INIT_LIST_HEAD(&box->active_list);
+
return box;
}

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 0770da2..634de93 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -6,6 +6,7 @@

#define UNCORE_PMU_NAME_LEN 32
#define UNCORE_PMU_HRTIMER_INTERVAL (60LL * NSEC_PER_SEC)
+#define UNCORE_SNB_IMC_HRTIMER_INTERVAL (5ULL * NSEC_PER_SEC)

#define UNCORE_FIXED_EVENT 0xff
#define UNCORE_PMC_IDX_MAX_GENERIC 8
@@ -492,6 +493,7 @@ struct intel_uncore_box {
u64 hrtimer_duration; /* hrtimer timeout for this box */
struct hrtimer hrtimer;
struct list_head list;
+ struct list_head active_list;
void *io_addr;
struct intel_uncore_extra_reg shared_regs[0];
};
--
1.7.9.5

2014-02-03 12:58:11

by Stephane Eranian

Subject: [PATCH v1 06/10] perf/x86/uncore: move uncore_event_to_box() and uncore_pmu_to_box()

Move a couple of functions around to avoid forward declarations
when we add code later on.

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 73 +++++++++++++------------
1 file changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index ea823b8..2980994c 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -66,6 +66,43 @@ DEFINE_UNCORE_FORMAT_ATTR(mask_vnw, mask_vnw, "config2:3-4");
DEFINE_UNCORE_FORMAT_ATTR(mask0, mask0, "config2:0-31");
DEFINE_UNCORE_FORMAT_ATTR(mask1, mask1, "config2:32-63");

+static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
+{
+ return container_of(event->pmu, struct intel_uncore_pmu, pmu);
+}
+
+static struct intel_uncore_box *
+uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
+{
+ struct intel_uncore_box *box;
+
+ box = *per_cpu_ptr(pmu->box, cpu);
+ if (box)
+ return box;
+
+ raw_spin_lock(&uncore_box_lock);
+ list_for_each_entry(box, &pmu->box_list, list) {
+ if (box->phys_id == topology_physical_package_id(cpu)) {
+ atomic_inc(&box->refcnt);
+ *per_cpu_ptr(pmu->box, cpu) = box;
+ break;
+ }
+ }
+ raw_spin_unlock(&uncore_box_lock);
+
+ return *per_cpu_ptr(pmu->box, cpu);
+}
+
+static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
+{
+ int cpu = smp_processor_id();
+ /*
+ * perf core schedules event on the basis of cpu, uncore events are
+ * collected by one of the cpus inside a physical package.
+ */
+ return uncore_pmu_to_box(uncore_event_to_pmu(event), cpu);
+}
+
static u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event)
{
u64 count;
@@ -2845,42 +2882,6 @@ static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
return box;
}

-static struct intel_uncore_box *
-uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
-{
- struct intel_uncore_box *box;
-
- box = *per_cpu_ptr(pmu->box, cpu);
- if (box)
- return box;
-
- raw_spin_lock(&uncore_box_lock);
- list_for_each_entry(box, &pmu->box_list, list) {
- if (box->phys_id == topology_physical_package_id(cpu)) {
- atomic_inc(&box->refcnt);
- *per_cpu_ptr(pmu->box, cpu) = box;
- break;
- }
- }
- raw_spin_unlock(&uncore_box_lock);
-
- return *per_cpu_ptr(pmu->box, cpu);
-}
-
-static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
-{
- return container_of(event->pmu, struct intel_uncore_pmu, pmu);
-}
-
-static struct intel_uncore_box *uncore_event_to_box(struct perf_event *event)
-{
- /*
- * perf core schedules event on the basis of cpu, uncore events are
- * collected by one of the cpus inside a physical package.
- */
- return uncore_pmu_to_box(uncore_event_to_pmu(event), smp_processor_id());
-}
-
static int
uncore_collect_events(struct intel_uncore_box *box, struct perf_event *leader, bool dogrp)
{
--
1.7.9.5

2014-02-03 12:58:46

by Stephane Eranian

Subject: [PATCH v1 05/10] perf/x86/uncore: make hrtimer timeout configurable per box

This patch makes the hrtimer timeout configurable per PMU
box. Not all counters necessarily have the same width and
rate, so the default timeout of 60s may need to be adjusted.

This patch adds box->hrtimer_duration. It is set to the default
when the box is allocated and can be overridden when the box
is initialized.
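
For example, a box type with narrow counters could override the
default from its init_box() callback (sketch; patch 9 does exactly
this for the SNB IMC):

	static void my_uncore_init_box(struct intel_uncore_box *box)
	{
		/* poll every 5s instead of the 60s default */
		box->hrtimer_duration = 5ULL * NSEC_PER_SEC;
	}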

Signed-off-by: Stephane Eranian <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 7 +++++--
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 1 +
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index e6f32b3..ea823b8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -2798,14 +2798,14 @@ static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)

local_irq_restore(flags);

- hrtimer_forward_now(hrtimer, ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL));
+ hrtimer_forward_now(hrtimer, ns_to_ktime(box->hrtimer_duration));
return HRTIMER_RESTART;
}

static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box)
{
__hrtimer_start_range_ns(&box->hrtimer,
- ns_to_ktime(UNCORE_PMU_HRTIMER_INTERVAL), 0,
+ ns_to_ktime(box->hrtimer_duration), 0,
HRTIMER_MODE_REL_PINNED, 0);
}

@@ -2839,6 +2839,9 @@ static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
box->cpu = -1;
box->phys_id = -1;

+ /* set default hrtimer timeout */
+ box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
+
return box;
}

diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index f5549cf..433c180 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -489,6 +489,7 @@ struct intel_uncore_box {
u64 tags[UNCORE_PMC_IDX_MAX];
struct pci_dev *pci_dev;
struct intel_uncore_pmu *pmu;
+ u64 hrtimer_duration; /* hrtimer timeout for this box */
struct hrtimer hrtimer;
struct list_head list;
struct intel_uncore_extra_reg shared_regs[0];
--
1.7.9.5

2014-02-10 02:49:46

by Yan, Zheng

Subject: Re: [PATCH v1 01/10] perf/x86/uncore: fix initialization of cpumask

Why not something like below? I think it's simpler.

---
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index 29c2487..169ef4a 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -3799,9 +3799,6 @@ static int __init uncore_cpu_init(void)
ivt_uncore_cbox.num_boxes = max_cores;
msr_uncores = ivt_msr_uncores;
break;
-
- default:
- return 0;
}

ret = uncore_types_init(msr_uncores);


Regards
Yan, Zheng


On 02/03/2014 08:55 PM, Stephane Eranian wrote:
> On certain processors, the uncore PMU boxes may only be
> MSR-based or only PCI-based. In both cases, the cpumask
> indicating on which CPUs to monitor to get full coverage
> of the particular PMU must be created.
>
> However, with the current code base, the cpumask was only
> created on processors which had at least one MSR-based
> uncore PMU. This patch removes that restriction and
> ensures the cpumask is created even when there is no
> MSR-based PMU, for instance on SNB client processors where
> only a PCI-based memory controller PMU is supported.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 61 +++++++++++++++----------
> 1 file changed, 37 insertions(+), 24 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index 29c2487..fe4255b 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -3764,7 +3764,7 @@ static void __init uncore_cpu_setup(void *dummy)
>
> static int __init uncore_cpu_init(void)
> {
> - int ret, cpu, max_cores;
> + int ret, max_cores;
>
> max_cores = boot_cpu_data.x86_max_cores;
> switch (boot_cpu_data.x86_model) {
> @@ -3808,29 +3808,6 @@ static int __init uncore_cpu_init(void)
> if (ret)
> return ret;
>
> - get_online_cpus();
> -
> - for_each_online_cpu(cpu) {
> - int i, phys_id = topology_physical_package_id(cpu);
> -
> - for_each_cpu(i, &uncore_cpu_mask) {
> - if (phys_id == topology_physical_package_id(i)) {
> - phys_id = -1;
> - break;
> - }
> - }
> - if (phys_id < 0)
> - continue;
> -
> - uncore_cpu_prepare(cpu, phys_id);
> - uncore_event_init_cpu(cpu);
> - }
> - on_each_cpu(uncore_cpu_setup, NULL, 1);
> -
> - register_cpu_notifier(&uncore_cpu_nb);
> -
> - put_online_cpus();
> -
> return 0;
> }
>
> @@ -3859,6 +3836,41 @@ static int __init uncore_pmus_register(void)
> return 0;
> }
>
> +static void uncore_cpumask_init(void)
> +{
> + int cpu;
> +
> + /*
> + * only invoke once from msr or pci init code
> + */
> + if (!cpumask_empty(&uncore_cpu_mask))
> + return;
> +
> + get_online_cpus();
> +
> + for_each_online_cpu(cpu) {
> + int i, phys_id = topology_physical_package_id(cpu);
> +
> + for_each_cpu(i, &uncore_cpu_mask) {
> + if (phys_id == topology_physical_package_id(i)) {
> + phys_id = -1;
> + break;
> + }
> + }
> + if (phys_id < 0)
> + continue;
> +
> + uncore_cpu_prepare(cpu, phys_id);
> + uncore_event_init_cpu(cpu);
> + }
> + on_each_cpu(uncore_cpu_setup, NULL, 1);
> +
> + register_cpu_notifier(&uncore_cpu_nb);
> +
> + put_online_cpus();
> +}
> +
> +
> static int __init intel_uncore_init(void)
> {
> int ret;
> @@ -3877,6 +3889,7 @@ static int __init intel_uncore_init(void)
> uncore_pci_exit();
> goto fail;
> }
> + uncore_cpumask_init();
>
> uncore_pmus_register();
> return 0;
>

2014-02-10 03:11:46

by Yan, Zheng

Subject: Re: [PATCH v1 03/10] perf/x86/uncore: do not assume PCI fixed ctrs have more than 32 bits

On 02/03/2014 08:55 PM, Stephane Eranian wrote:
> The current code assumes all PCI fixed counters implement more than
> 32 bits of hardware counter. The actual width is then rounded up to 64
> to enable base + 8 * idx calculations.
>
> Not all PMUs necessarily implement counters wider than 32 bits.
> This patch makes the uncore_pci_perf_ctr() function dynamically
> determine the actual bit width of a PCI perf counter.
>
> This patch paves the way for handling more than one uncore fixed
> counter per PMU box.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 31 ++++++++++++++++---------
> 1 file changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> index 77dc9a5..f5549cf 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -548,10 +548,29 @@ unsigned uncore_pci_event_ctl(struct intel_uncore_box *box, int idx)
> return idx * 4 + box->pmu->type->event_ctl;
> }
>
> +static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
> +{
> + return box->pmu->type->perf_ctr_bits;
> +}
> +
> +static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
> +{
> + return box->pmu->type->fixed_ctr_bits;
> +}
> +
> static inline
> unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
> {
> - return idx * 8 + box->pmu->type->perf_ctr;
> + int bits, bytes;
> +
> + if (idx == UNCORE_PMC_IDX_FIXED)
> + bits = uncore_fixed_ctr_bits(box);
> + else
> + bits = uncore_perf_ctr_bits(box);
> +
> + bytes = round_up(bits, 8);

should this be "round_up(bits, 32) / 8" ?

Regards
Yan, Zheng

> +
> + return idx * bytes + box->pmu->type->perf_ctr;
> }
>
> static inline unsigned uncore_msr_box_offset(struct intel_uncore_box *box)
> @@ -633,16 +652,6 @@ unsigned uncore_perf_ctr(struct intel_uncore_box *box, int idx)
> return uncore_msr_perf_ctr(box, idx);
> }
>
> -static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
> -{
> - return box->pmu->type->perf_ctr_bits;
> -}
> -
> -static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
> -{
> - return box->pmu->type->fixed_ctr_bits;
> -}
> -
> static inline int uncore_num_counters(struct intel_uncore_box *box)
> {
> return box->pmu->type->num_counters;
>

2014-02-10 03:17:16

by Yan, Zheng

Subject: Re: [PATCH v1 07/10] perf/x86/uncore: allow more than one fixed counter per box

On 02/03/2014 08:55 PM, Stephane Eranian wrote:
> This patch modifies some of the helper functions to support
> more than one fixed counter per uncore PCI PMU box.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 4 ++--
> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 22 +++++++++++++---------
> 2 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index 2980994c..69a4ad0 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -2777,8 +2777,8 @@ static void uncore_assign_hw_event(struct intel_uncore_box *box, struct perf_eve
> hwc->idx = idx;
> hwc->last_tag = ++box->tags[idx];
>
> - if (hwc->idx == UNCORE_PMC_IDX_FIXED) {
> - hwc->event_base = uncore_fixed_ctr(box);
> + if (hwc->idx >= UNCORE_PMC_IDX_FIXED) {
> + hwc->event_base = uncore_fixed_ctr(box, idx);
> hwc->config_base = uncore_fixed_ctl(box);
> return;
> }
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> index 433c180..c63a3ff 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -538,9 +538,18 @@ static inline unsigned uncore_pci_fixed_ctl(struct intel_uncore_box *box)
> return box->pmu->type->fixed_ctl;
> }
>
> -static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box)
> +static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
> +{
> + return box->pmu->type->fixed_ctr_bits;
> +}
> +
> +static inline unsigned uncore_pci_fixed_ctr(struct intel_uncore_box *box,
> + int idx)
> {
> - return box->pmu->type->fixed_ctr;
> + int bits = uncore_fixed_ctr_bits(box);
> + int bytes = round_up(bits, 8);
> +
> + return idx * bytes + box->pmu->type->fixed_ctr;
> }

should this be:
int bytes = round_up(bits, 32) / 8;
return (idx - UNCORE_PMC_IDX_FIXED) * bytes + box->pmu->type->fixed_ctr;
?

Regards
Yan, Zheng

>
> static inline
> @@ -554,11 +563,6 @@ static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
> return box->pmu->type->perf_ctr_bits;
> }
>
> -static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
> -{
> - return box->pmu->type->fixed_ctr_bits;
> -}
> -
> static inline
> unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
> {
> @@ -627,10 +631,10 @@ unsigned uncore_fixed_ctl(struct intel_uncore_box *box)
> }
>
> static inline
> -unsigned uncore_fixed_ctr(struct intel_uncore_box *box)
> +unsigned uncore_fixed_ctr(struct intel_uncore_box *box, int idx)
> {
> if (box->pci_dev)
> - return uncore_pci_fixed_ctr(box);
> + return uncore_pci_fixed_ctr(box, idx);
> else
> return uncore_msr_fixed_ctr(box);
> }
>

2014-02-10 06:04:51

by Yan, Zheng

Subject: Re: [PATCH v1 09/10] perf/x86/uncore: add hrtimer to SNB uncore IMC PMU

On 02/03/2014 08:55 PM, Stephane Eranian wrote:
> This patch is needed because the SNB IMC PMU uses 32-bit free
> running counters with no interrupt capability.
>
> On SNB/IVB/HSW, we used a 20GB/s theoretical peak to calculate
> the hrtimer timeout necessary to avoid missing an overflow: at
> 64 bytes per count, a 32-bit counter wraps after
> 2^32 * 64 bytes / 20GB/s, i.e. roughly 13.7s. The timeout is set
> to 5s to be on the cautious side.
>
> The SNB IMC uses free running counters, which are handled
> via pseudo fixed counters. The SNB IMC PMU implementation
> supports an arbitrary number of events because the counters
> are read-only, so it is not possible to track active
> counters. Instead, active events are put on a linked list which
> the hrtimer handler walks to update the SW counts.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 12 ++++++++++++
> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 2 ++
> 2 files changed, 14 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index 8b1f81f..f76937e 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -1730,6 +1730,7 @@ static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
> addr &= ~(PAGE_SIZE - 1);
>
> box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
> + box->hrtimer_duration = UNCORE_SNB_IMC_HRTIMER_INTERVAL;
> }
>
> static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
> @@ -3166,6 +3167,7 @@ static void uncore_perf_event_update(struct intel_uncore_box *box, struct perf_e
> static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
> {
> struct intel_uncore_box *box;
> + struct perf_event *event;
> unsigned long flags;
> int bit;
>
> @@ -3178,6 +3180,14 @@ static enum hrtimer_restart uncore_pmu_hrtimer(struct hrtimer *hrtimer)
> */
> local_irq_save(flags);
>
> + /*
> + * handle boxes with an active event list as opposed to active
> + * counters
> + */
> + list_for_each_entry(event, &box->active_list, active_entry) {
> + uncore_perf_event_update(box, event);
> + }
> +
> for_each_set_bit(bit, box->active_mask, UNCORE_PMC_IDX_MAX)
> uncore_perf_event_update(box, box->events[bit]);
>
> @@ -3227,6 +3237,8 @@ static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
> /* set default hrtimer timeout */
> box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
>
> + INIT_LIST_HEAD(&box->active_list);
> +
> return box;
> }
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> index 0770da2..634de93 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -6,6 +6,7 @@
>
> #define UNCORE_PMU_NAME_LEN 32
> #define UNCORE_PMU_HRTIMER_INTERVAL (60LL * NSEC_PER_SEC)
> +#define UNCORE_SNB_IMC_HRTIMER_INTERVAL (5ULL * NSEC_PER_SEC)
>
> #define UNCORE_FIXED_EVENT 0xff
> #define UNCORE_PMC_IDX_MAX_GENERIC 8
> @@ -492,6 +493,7 @@ struct intel_uncore_box {
> u64 hrtimer_duration; /* hrtimer timeout for this box */
> struct hrtimer hrtimer;
> struct list_head list;
> + struct list_head active_list;

I think patches 8 and 9 are out of order

Regards
Yan, Zheng

> void *io_addr;
> struct intel_uncore_extra_reg shared_regs[0];
> };
>

2014-02-10 06:27:51

by Yan, Zheng

Subject: Re: [PATCH v1 08/10] perf/x86/uncore: add SNB/IVB/HSW client uncore memory controller support

On 02/03/2014 08:55 PM, Stephane Eranian wrote:
> This patch adds a new uncore PMU for Intel SNB/IVB/HSW client
> CPUs. It adds the Integrated Memory Controller (IMC) PMU. This
> new PMU provides a set of events to measure memory bandwidth utilization.
>
> The IMC on those processors is PCI-based. This patch
> exposes a new uncore PMU on those processors: uncore_imc.
>
> Two new events are defined:
> - name: data_reads
> - code: 0x1
> - unit: 64 bytes
> - number of full cacheline read requests to the IMC
>
> - name: data_writes
> - code: 0x2
> - unit: 64 bytes
> - number of full cacheline write requests to the IMC
>
> Documentation available at:
> http://software.intel.com/en-us/articles/monitoring-integrated-memory-controller-requests-in-the-2nd-3rd-and-4th-generation-intel
>
> Signed-off-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 370 +++++++++++++++++++++++++
> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 1 +
> 2 files changed, 371 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index 69a4ad0..8b1f81f 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -66,6 +66,12 @@ DEFINE_UNCORE_FORMAT_ATTR(mask_vnw, mask_vnw, "config2:3-4");
> DEFINE_UNCORE_FORMAT_ATTR(mask0, mask0, "config2:0-31");
> DEFINE_UNCORE_FORMAT_ATTR(mask1, mask1, "config2:32-63");
>
> +static void uncore_pmu_start_hrtimer(struct intel_uncore_box *box);
> +static void uncore_pmu_cancel_hrtimer(struct intel_uncore_box *box);
> +static void uncore_perf_event_update(struct intel_uncore_box *box,
> + struct perf_event *event);
> +static void uncore_pmu_event_read(struct perf_event *event);
> +
> static struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
> {
> return container_of(event->pmu, struct intel_uncore_pmu, pmu);
> @@ -1668,6 +1674,348 @@ static struct intel_uncore_type *snb_msr_uncores[] = {
> &snb_uncore_cbox,
> NULL,
> };
> +
> +enum {
> + SNB_PCI_UNCORE_IMC,
> +};
> +
> +static struct uncore_event_desc snb_uncore_imc_events[] = {
> + INTEL_UNCORE_EVENT_DESC(data_reads, "event=0x01"),
> + INTEL_UNCORE_EVENT_DESC(data_reads.scale, "64"),
> + INTEL_UNCORE_EVENT_DESC(data_reads.unit, "bytes"),
> +
> + INTEL_UNCORE_EVENT_DESC(data_writes, "event=0x02"),
> + INTEL_UNCORE_EVENT_DESC(data_writes.scale, "64"),
> + INTEL_UNCORE_EVENT_DESC(data_writes.unit, "bytes"),
> +
> + { /* end: all zeroes */ },
> +};
> +
> +#define SNB_UNCORE_PCI_IMC_EVENT_MASK 0xff
> +#define SNB_UNCORE_PCI_IMC_BAR_OFFSET 0x48
> +
> +/* page size multiple covering all config regs */
> +#define SNB_UNCORE_PCI_IMC_MAP_SIZE 0x6000
> +
> +#define SNB_UNCORE_PCI_IMC_DATA_READS 0x1
> +#define SNB_UNCORE_PCI_IMC_DATA_READS_BASE 0x5050
> +#define SNB_UNCORE_PCI_IMC_DATA_WRITES 0x2
> +#define SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE 0x5054
> +#define SNB_UNCORE_PCI_IMC_CTR_BASE 0x5050
> +
> +static struct attribute *snb_uncore_imc_formats_attr[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group snb_uncore_imc_format_group = {
> + .name = "format",
> + .attrs = snb_uncore_imc_formats_attr,
> +};
> +
> +static void snb_uncore_imc_init_box(struct intel_uncore_box *box)
> +{
> + struct pci_dev *pdev = box->pci_dev;
> + u32 addr_lo, addr_hi;
> + resource_size_t addr;
> +
> + pci_read_config_dword(pdev, SNB_UNCORE_PCI_IMC_BAR_OFFSET, &addr_lo);
> + addr = addr_lo;
> +
> +#ifdef CONFIG_PHYS_ADDR_T_64BIT
> + pci_read_config_dword(pdev, SNB_UNCORE_PCI_IMC_BAR_OFFSET+4, &addr_hi);
> + addr = ((resource_size_t)addr_hi << 32) | addr_lo;
> +#endif
> +
> + addr &= ~(PAGE_SIZE - 1);
> +
> + box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
> +}
> +
> +static void snb_uncore_imc_enable_box(struct intel_uncore_box *box)
> +{}
> +
> +static void snb_uncore_imc_disable_box(struct intel_uncore_box *box)
> +{}
> +
> +static void snb_uncore_imc_enable_event(struct intel_uncore_box *box,
> + struct perf_event *event)
> +{}
> +
> +static void snb_uncore_imc_disable_event(struct intel_uncore_box *box,
> + struct perf_event *event)
> +{}
> +
> +static u64 snb_uncore_imc_read_counter(struct intel_uncore_box *box,
> + struct perf_event *event)
> +{
> + struct hw_perf_event *hwc = &event->hw;
> +
> + return (u64)*(unsigned int *)(box->io_addr + hwc->event_base);
> +}
> +
> +/*
> + * custom event_init() function because we define our own fixed, free
> + * running counters, so we do not want to conflict with generic uncore
> + * logic. Also simplifies processing
> + */
> +static int snb_uncore_imc_event_init(struct perf_event *event)
> +{
> + struct intel_uncore_pmu *pmu;
> + struct intel_uncore_box *box;
> + struct hw_perf_event *hwc = &event->hw;
> + u64 cfg = event->attr.config & SNB_UNCORE_PCI_IMC_EVENT_MASK;
> + int idx, base;
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + pmu = uncore_event_to_pmu(event);
> + /* no device found for this pmu */
> + if (pmu->func_id < 0)
> + return -ENOENT;
> +
> + /* Sampling not supported yet */
> + if (hwc->sample_period)
> + return -EINVAL;
> +
> + /* unsupported modes and filters */
> + if (event->attr.exclude_user ||
> + event->attr.exclude_kernel ||
> + event->attr.exclude_hv ||
> + event->attr.exclude_idle ||
> + event->attr.exclude_host ||
> + event->attr.exclude_guest ||
> + event->attr.sample_period) /* no sampling */
> + return -EINVAL;
> +
> + /*
> + * Place all uncore events for a particular physical package
> + * onto a single cpu
> + */
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + /* check only supported bits are set */
> + if (event->attr.config & ~SNB_UNCORE_PCI_IMC_EVENT_MASK)
> + return -EINVAL;
> +
> + box = uncore_pmu_to_box(pmu, event->cpu);
> + if (!box || box->cpu < 0)
> + return -EINVAL;
> +
> + event->cpu = box->cpu;
> +
> + event->hw.idx = -1;
> + event->hw.last_tag = ~0ULL;
> + event->hw.extra_reg.idx = EXTRA_REG_NONE;
> + event->hw.branch_reg.idx = EXTRA_REG_NONE;
> + /*
> + * check event is known (whitelist, determines counter)
> + */
> + switch (cfg) {
> + case SNB_UNCORE_PCI_IMC_DATA_READS:
> + base = SNB_UNCORE_PCI_IMC_DATA_READS_BASE;
> + idx = UNCORE_PMC_IDX_FIXED;
> + break;
> + case SNB_UNCORE_PCI_IMC_DATA_WRITES:
> + base = SNB_UNCORE_PCI_IMC_DATA_WRITES_BASE;
> + idx = UNCORE_PMC_IDX_FIXED + 1;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + /* must be done before validate_group */
> + event->hw.event_base = base;
> + event->hw.config = cfg;
> + event->hw.idx = idx;
> +
> + /* no group validation needed, we have free running counters */
> +
> + return 0;
> +}
> +
> +static int snb_uncore_imc_hw_config(struct intel_uncore_box *box,
> + struct perf_event *event)
> +{
> + return 0;
> +}
> +
> +static void snb_uncore_imc_event_start(struct perf_event *event, int flags)
> +{
> + struct intel_uncore_box *box = uncore_event_to_box(event);
> + u64 count;
> +
> + if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> + return;
> +
> + event->hw.state = 0;
> + box->n_active++;
> +
> + list_add_tail(&event->active_entry, &box->active_list);
> +
> + count = snb_uncore_imc_read_counter(box, event);
> + local64_set(&event->hw.prev_count, count);
> +
> + if (box->n_active == 1)
> + uncore_pmu_start_hrtimer(box);
> +}
> +
> +static void snb_uncore_imc_event_stop(struct perf_event *event, int flags)
> +{
> + struct intel_uncore_box *box = uncore_event_to_box(event);
> + struct hw_perf_event *hwc = &event->hw;
> +
> + if (!(hwc->state & PERF_HES_STOPPED)) {
> + box->n_active--;
> +
> + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> + hwc->state |= PERF_HES_STOPPED;
> +
> + list_del(&event->active_entry);
> +
> + if (box->n_active == 0)
> + uncore_pmu_cancel_hrtimer(box);
> + }
> +
> + if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> + /*
> + * Drain the remaining delta count out of an event
> + * that we are disabling:
> + */
> + uncore_perf_event_update(box, event);
> + hwc->state |= PERF_HES_UPTODATE;
> + }
> +}
> +
> +static int snb_uncore_imc_event_add(struct perf_event *event, int flags)
> +{
> + struct intel_uncore_box *box = uncore_event_to_box(event);
> + struct hw_perf_event *hwc = &event->hw;
> +
> + if (!box)
> + return -ENODEV;
> +
> + hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> + if (!(flags & PERF_EF_START))
> + hwc->state |= PERF_HES_ARCH;
> +
> + snb_uncore_imc_event_start(event, 0);
> +
> + box->n_events++;
> +
> + return 0;
> +}
> +
> +static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
> +{
> + struct intel_uncore_box *box = uncore_event_to_box(event);
> + int i;
> +
> + snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
> +
> + for (i = 0; i < box->n_events; i++) {
> + if (event == box->event_list[i]) {
> + --box->n_events;
> + break;
> + }
> + }
> +}

No need to update n_events and event_list; they are not used. The rest of the patch looks good.
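
Something like the below (untested sketch) should be enough, along with
dropping the box->n_events++ increment from snb_uncore_imc_event_add():

static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
{
	/* stop the event and drain its final count; no list bookkeeping */
	snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
}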

Regards
Yan, Zheng
> +
> +static int snb_pci2phy_map_init(int devid)
> +{
> + struct pci_dev *dev = NULL;
> + int bus;
> +
> + dev = pci_get_device(PCI_VENDOR_ID_INTEL, devid, dev);
> + if (!dev)
> + return -ENOTTY;
> +
> + bus = dev->bus->number;
> +
> + pcibus_to_physid[bus] = 0;
> +
> + pci_dev_put(dev);
> +
> + return 0;
> +}
> +
> +static struct pmu snb_uncore_imc_pmu = {
> + .task_ctx_nr = perf_invalid_context,
> + .event_init = snb_uncore_imc_event_init,
> + .add = snb_uncore_imc_event_add,
> + .del = snb_uncore_imc_event_del,
> + .start = snb_uncore_imc_event_start,
> + .stop = snb_uncore_imc_event_stop,
> + .read = uncore_pmu_event_read,
> +};
> +
> +static struct intel_uncore_ops snb_uncore_imc_ops = {
> + .init_box = snb_uncore_imc_init_box,
> + .enable_box = snb_uncore_imc_enable_box,
> + .disable_box = snb_uncore_imc_disable_box,
> + .disable_event = snb_uncore_imc_disable_event,
> + .enable_event = snb_uncore_imc_enable_event,
> + .hw_config = snb_uncore_imc_hw_config,
> + .read_counter = snb_uncore_imc_read_counter,
> +};
> +
> +static struct intel_uncore_type snb_uncore_imc = {
> + .name = "imc",
> + .num_counters = 2,
> + .num_boxes = 1,
> + .fixed_ctr_bits = 32,
> + .fixed_ctr = SNB_UNCORE_PCI_IMC_CTR_BASE,
> + .event_descs = snb_uncore_imc_events,
> + .format_group = &snb_uncore_imc_format_group,
> + .perf_ctr = SNB_UNCORE_PCI_IMC_DATA_READS_BASE,
> + .event_mask = SNB_UNCORE_PCI_IMC_EVENT_MASK,
> + .ops = &snb_uncore_imc_ops,
> + .pmu = &snb_uncore_imc_pmu,
> +};
> +
> +static struct intel_uncore_type *snb_pci_uncores[] = {
> + [SNB_PCI_UNCORE_IMC] = &snb_uncore_imc,
> + NULL,
> +};
> +
> +static DEFINE_PCI_DEVICE_TABLE(snb_uncore_pci_ids) = {
> + { /* IMC */
> + PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_SNB_IMC),
> + .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
> + },
> +};
> +
> +static DEFINE_PCI_DEVICE_TABLE(ivb_uncore_pci_ids) = {
> + { /* IMC */
> + PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_IMC),
> + .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
> + },
> +};
> +
> +static DEFINE_PCI_DEVICE_TABLE(hsw_uncore_pci_ids) = {
> + { /* IMC */
> + PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_HSW_IMC),
> + .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
> + },
> +};
> +
> +static struct pci_driver snb_uncore_pci_driver = {
> + .name = "snb_uncore",
> + .id_table = snb_uncore_pci_ids,
> +};
> +
> +static struct pci_driver ivb_uncore_pci_driver = {
> + .name = "ivb_uncore",
> + .id_table = ivb_uncore_pci_ids,
> +};
> +
> +static struct pci_driver hsw_uncore_pci_driver = {
> + .name = "hsw_uncore",
> + .id_table = hsw_uncore_pci_ids,
> +};
> +
> /* end of Sandy Bridge uncore support */
>
> /* Nehalem uncore support */
> @@ -3502,6 +3850,28 @@ static int __init uncore_pci_init(void)
> pci_uncores = ivt_pci_uncores;
> uncore_pci_driver = &ivt_uncore_pci_driver;
> break;
> + case 42: /* Sandy Bridge */
> + ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_SNB_IMC);
> + if (ret)
> + return ret;
> + pci_uncores = snb_pci_uncores;
> + uncore_pci_driver = &snb_uncore_pci_driver;
> + break;
> + case 60: /* Haswell */
> + case 69: /* Haswell Celeron */
> + ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_HSW_IMC);
> + if (ret)
> + return ret;
> + pci_uncores = snb_pci_uncores;
> + uncore_pci_driver = &hsw_uncore_pci_driver;
> + break;
> + case 58: /* Ivy Bridge */
> + ret = snb_pci2phy_map_init(PCI_DEVICE_ID_INTEL_IVB_IMC);
> + if (ret)
> + return ret;
> + pci_uncores = snb_pci_uncores;
> + uncore_pci_driver = &ivb_uncore_pci_driver;
> + break;
> default:
> return 0;
> }
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> index c63a3ff..0770da2 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -492,6 +492,7 @@ struct intel_uncore_box {
> u64 hrtimer_duration; /* hrtimer timeout for this box */
> struct hrtimer hrtimer;
> struct list_head list;
> + void *io_addr;
> struct intel_uncore_extra_reg shared_regs[0];
> };
>
>

2014-02-11 11:12:25

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH v1 01/10] perf/x86/uncore: fix initialization of cpumask

On Mon, Feb 10, 2014 at 3:49 AM, Yan, Zheng <[email protected]> wrote:
> Why not something like the below? I think it's simpler.
>
I don't like that too much: it assumes msr_uncores is initialized
properly, and I have seen compilers complain about a missing default: case.
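
Keeping the explicit default also documents the unsupported-model path.
Roughly (sketch, assuming the existing snb_msr_uncores table):

	switch (boot_cpu_data.x86_model) {
	case 42: /* SandyBridge */
	case 58: /* IvyBridge */
		msr_uncores = snb_msr_uncores;
		break;
	default:
		return 0;	/* unknown model: no MSR uncore support */
	}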

> ---
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index 29c2487..169ef4a 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -3799,9 +3799,6 @@ static int __init uncore_cpu_init(void)
> ivt_uncore_cbox.num_boxes = max_cores;
> msr_uncores = ivt_msr_uncores;
> break;
> -
> - default:
> - return 0;
> }
>
> ret = uncore_types_init(msr_uncores);
>
>
> Regards
> Yan, Zheng
>
>
> On 02/03/2014 08:55 PM, Stephane Eranian wrote:
>> On certain processors, the uncore PMU boxes may only be
>> msr-based or PCI-based. But in both cases, the cpumask,
>> suggesting on which CPUs to monitor to get full coverage
>> of the particular PMU, must be created.
>>
>> However with the current code base, the cpumask was only
>> created on processors which had at least one MSR-based
>> uncore PMU. This patch removes that restriction and
>> ensures the cpumask is created even when there is no
>> msr-based PMU, for instance on SNB client, where only
>> a PCI-based memory controller PMU is supported.
>>
>> Signed-off-by: Stephane Eranian <[email protected]>
>> ---
>> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 61 +++++++++++++++----------
>> 1 file changed, 37 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> index 29c2487..fe4255b 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
>> @@ -3764,7 +3764,7 @@ static void __init uncore_cpu_setup(void *dummy)
>>
>> static int __init uncore_cpu_init(void)
>> {
>> - int ret, cpu, max_cores;
>> + int ret, max_cores;
>>
>> max_cores = boot_cpu_data.x86_max_cores;
>> switch (boot_cpu_data.x86_model) {
>> @@ -3808,29 +3808,6 @@ static int __init uncore_cpu_init(void)
>> if (ret)
>> return ret;
>>
>> - get_online_cpus();
>> -
>> - for_each_online_cpu(cpu) {
>> - int i, phys_id = topology_physical_package_id(cpu);
>> -
>> - for_each_cpu(i, &uncore_cpu_mask) {
>> - if (phys_id == topology_physical_package_id(i)) {
>> - phys_id = -1;
>> - break;
>> - }
>> - }
>> - if (phys_id < 0)
>> - continue;
>> -
>> - uncore_cpu_prepare(cpu, phys_id);
>> - uncore_event_init_cpu(cpu);
>> - }
>> - on_each_cpu(uncore_cpu_setup, NULL, 1);
>> -
>> - register_cpu_notifier(&uncore_cpu_nb);
>> -
>> - put_online_cpus();
>> -
>> return 0;
>> }
>>
>> @@ -3859,6 +3836,41 @@ static int __init uncore_pmus_register(void)
>> return 0;
>> }
>>
>> +static void uncore_cpumask_init(void)
>> +{
>> + int cpu;
>> +
>> + /*
>> + * only invoke once, from msr or pci init code
>> + */
>> + if (!cpumask_empty(&uncore_cpu_mask))
>> + return;
>> +
>> + get_online_cpus();
>> +
>> + for_each_online_cpu(cpu) {
>> + int i, phys_id = topology_physical_package_id(cpu);
>> +
>> + for_each_cpu(i, &uncore_cpu_mask) {
>> + if (phys_id == topology_physical_package_id(i)) {
>> + phys_id = -1;
>> + break;
>> + }
>> + }
>> + if (phys_id < 0)
>> + continue;
>> +
>> + uncore_cpu_prepare(cpu, phys_id);
>> + uncore_event_init_cpu(cpu);
>> + }
>> + on_each_cpu(uncore_cpu_setup, NULL, 1);
>> +
>> + register_cpu_notifier(&uncore_cpu_nb);
>> +
>> + put_online_cpus();
>> +}
>> +
>> +
>> static int __init intel_uncore_init(void)
>> {
>> int ret;
>> @@ -3877,6 +3889,7 @@ static int __init intel_uncore_init(void)
>> uncore_pci_exit();
>> goto fail;
>> }
>> + uncore_cpumask_init();
>>
>> uncore_pmus_register();
>> return 0;
>>
>

2014-02-11 13:54:54

by Stephane Eranian

[permalink] [raw]
Subject: Re: [PATCH v1 03/10] perf/x86/uncore: do not assume PCI fixed ctrs have more than 32 bits

Yan,

In fact, I realized I was not even using all those fixed counter
changes, thanks to the pmu bypass from patch 1. So I will drop all
of this in V2. It will make the patch simpler.

Thanks for catching this.
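
(And to confirm the arithmetic below: round_up(bits, 8) yields a bit
count, not a byte stride. For a 32-bit counter it gives 32, i.e. a
32-byte stride, whereas round_up(32, 32) / 8 = 4 bytes is what the
counter layout needs.)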


On Mon, Feb 10, 2014 at 4:11 AM, Yan, Zheng <[email protected]> wrote:
> On 02/03/2014 08:55 PM, Stephane Eranian wrote:
>> The current code assumes all PCI fixed counters implement more than
>> 32-bit hardware counters. The actual width is then rounded up to 64 to
>> enable base + 8 * idx calculations.
>>
>> Not all PMUs necessarily implement counters with more than 32 bits.
>> The patch makes the uncore_pci_perf_ctr() function dynamically
>> determine the actual bit width of a PCI perf counter.
>>
>> This patch paves the way for handling more than one uncore fixed
>> counter per PMU box.
>>
>> Signed-off-by: Stephane Eranian <[email protected]>
>> ---
>> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 31 ++++++++++++++++---------
>> 1 file changed, 20 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> index 77dc9a5..f5549cf 100644
>> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
>> @@ -548,10 +548,29 @@ unsigned uncore_pci_event_ctl(struct intel_uncore_box *box, int idx)
>> return idx * 4 + box->pmu->type->event_ctl;
>> }
>>
>> +static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
>> +{
>> + return box->pmu->type->perf_ctr_bits;
>> +}
>> +
>> +static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
>> +{
>> + return box->pmu->type->fixed_ctr_bits;
>> +}
>> +
>> static inline
>> unsigned uncore_pci_perf_ctr(struct intel_uncore_box *box, int idx)
>> {
>> - return idx * 8 + box->pmu->type->perf_ctr;
>> + int bits, bytes;
>> +
>> + if (idx == UNCORE_PMC_IDX_FIXED)
>> + bits = uncore_fixed_ctr_bits(box);
>> + else
>> + bits = uncore_perf_ctr_bits(box);
>> +
>> + bytes = round_up(bits, 8);
>
> should this be "round_up(bits, 32) / 8"?
>
> Regards
> Yan, Zheng
>
>> +
>> + return idx * bytes + box->pmu->type->perf_ctr;
>> }
>>
>> static inline unsigned uncore_msr_box_offset(struct intel_uncore_box *box)
>> @@ -633,16 +652,6 @@ unsigned uncore_perf_ctr(struct intel_uncore_box *box, int idx)
>> return uncore_msr_perf_ctr(box, idx);
>> }
>>
>> -static inline int uncore_perf_ctr_bits(struct intel_uncore_box *box)
>> -{
>> - return box->pmu->type->perf_ctr_bits;
>> -}
>> -
>> -static inline int uncore_fixed_ctr_bits(struct intel_uncore_box *box)
>> -{
>> - return box->pmu->type->fixed_ctr_bits;
>> -}
>> -
>> static inline int uncore_num_counters(struct intel_uncore_box *box)
>> {
>> return box->pmu->type->num_counters;
>>
>