2015-06-25 09:57:06

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 0/7]powerpc/powernv: Nest Instrumentation support

This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Intrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From
kernel side, OPAL Call interface is used to activate/deactivate
PORE Engine for nest data collection.

OPAL at boot, detects the feature, initializes it and pass on
the nest units and other related information such as memory
region, events supported so on, to kernel via device-tree.

Kernel code then, parses the device-tree for nest pmu support
and registers nest pmu with the events available. PORE Engine collects
and accumulate nest counter data in per-chip reserved memory region, hence
device-tree also exports per-chip nest accumulation memory region.
And individual event offset are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list
....
iTLB-load-misses [Hardware cache event]

Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

Performance counter stats for 'system wide':

CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/

1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

Performance counter stats for 'system wide':

CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....

OPAL side patches are posted in the skiboot mailing list.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use first online
cpu of the chip, there by making code lot simplier.

Changelog from v1:

1) No logic changes, re-ordered patches make each patch compile
without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog from RFC:

1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message

Thanks to intput from Sukadev Bhattiprolu, Preeti Murthy, Daniel Axtens,
Suzuki Poulose and Michael Ellerman

Kindly let me know you comments and feedback.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>

Madhavan Srinivasan (7):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Nest PMU detection and device tree parser
powerpc/powernv: detect supported nest pmus and its events
powerpc/powernv: add event attribute and group to nest pmu
powerpc/powernv: generic nest pmu event functions
powerpc/powernv: nest pmu cpumask and cpu hotplug support

arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 515 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 53 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 574 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h

--
1.9.1


2015-06-25 09:57:13

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 1/7]powerpc/powernv: Data structure and macros definition

Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 53 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 0000000..ecb5d26
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS 32
+#define P8_NEST_MAX_PMUS 32
+#define P8_NEST_MAX_PMU_NAME_LEN 256
+#define P8_NEST_MAX_EVENTS_SUPPORTED 256
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ uint32_t chip_id;
+ uint64_t pbase;
+ uint64_t vbase;
+ uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+ const char *ev_name;
+ const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
--
1.9.1

2015-06-25 09:57:22

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 2/7]powerpc/powernv: Add OPAL support for Nest PMU

Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing.

Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 2 ++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..8011a75 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,8 @@
#define OPAL_FLASH_READ 110
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
-#define OPAL_LAST 112
+#define OPAL_NEST_IMA_CONTROL 116
+#define OPAL_LAST 116

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..f86e5e9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -201,6 +201,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);

+int64_t opal_nest_ima_control(uint32_t value);
+
/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..ce36a68 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1

2015-06-25 09:58:04

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 3/7]powerpc/powernv: Nest PMU detection and device tree parser

Create a file "nest-pmu.c" to contain nest pmu related functions. Code
to detect nest pmu support and parser to collect per-chip reserved memory
region information from device tree (DT).

Detection mechanism is to look for specific property "ibm,ima-chip" in DT.
For Nest pmu, device tree will have two set of information.
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

Device tree layout for the Nest PMU as follows.

/ -- DT root folder
|
-nest-ima -- Nest PMU folder
|

-ima-chip@<chip-id> -- Per-chip folder for reserved region information
|
-ibm,chip-id -- Chip id
-ibm,ima-chip
-reg -- HOMER PORE Nest Counter collection Address (RA)
-size -- size to map in kernel space

-Alink_BW -- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit -- Event unit file
-device_type -- "nest-ima-unit" marker
....

Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o

obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 0000000..e7d45ed
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+ const __be32 *gcid;
+ const __be64 *chip_ima_reg;
+ const __be64 *chip_ima_size;
+ struct device_node *dev;
+ struct perchip_nest_info *p8ni;
+ int idx;
+
+ /*
+ * "nest-ima" folder contains two things,
+ * a) per-chip reserved memory region for Nest PMU Counter data
+ * b) Support Nest PMU units and their event files
+ */
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ gcid = of_get_property(dev, "ibm,chip-id", NULL);
+ chip_ima_reg = of_get_property(dev, "reg", NULL);
+ chip_ima_size = of_get_property(dev, "size", NULL);
+
+ if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+ pr_err("Nest_PMU: device %s missing property\n",
+ dev->full_name);
+ return -ENODEV;
+ }
+
+ /* chip id to save reserve memory region */
+ idx = (uint32_t)be32_to_cpup(gcid);
+
+ /*
+ * Using a local variable to make it compact and
+ * easier to read
+ */
+ p8ni = &p8_nest_perchip_info[idx];
+ p8ni->pbase = be64_to_cpup(chip_ima_reg);
+ p8ni->size = be64_to_cpup(chip_ima_size);
+ p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+ }
+
+ return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+ int ret = -ENODEV;
+
+ /*
+ * Lets do this only if we are hypervisor
+ */
+ if (!cur_cpu_spec->oprofile_cpu_type ||
+ !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+ !cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ /*
+ * Nest PMU information is grouped under "nest-ima" node
+ * of the top-level device-tree directory. Detect Nest PMU
+ * by the "ibm,ima-chip" property.
+ */
+ if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+ return ret;
+
+ /*
+ * Parse device-tree for Nest PMU information
+ */
+ ret = nest_ima_dt_parser();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1

2015-06-25 09:57:44

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 4/7]powerpc/powernv: detect supported nest pmus and its events

Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from DT, will contain the offset in the reserved memory
region to get the counter data for a given event. Kernel code uses this offset
as event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed..6116ff3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,119 @@
#include "nest-pmu.h"

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *start,
+ struct nest_ima_events *p8_events, int flg, u32 val)
+{
+ char *buf;
+
+ /* memory for event name */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ /* memory for content */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (flg) {
+ /* string content*/
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ } else
+ sprintf(buf, "event=0x%x", val);
+
+ p8_events->ev_value = buf;
+ return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct nest_ima_events **p8_events_arr, *p8_events;
+ struct nest_pmu *pmu_ptr;
+ struct property *pp;
+ char *buf, *start;
+ const __be32 *lval;
+ u32 val;
+ int idx = 0, ret;
+
+ if (!dev)
+ return -EINVAL;
+
+ /* memory for nest pmus */
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ /* memory for nest pmu events */
+ p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+ GFP_KERNEL);
+ if (!p8_events_arr)
+ return -ENOMEM;
+ p8_events = (struct nest_ima_events *)p8_events_arr;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Save the name to register it later */
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ continue;
+ }
+
+ /* Skip these, we dont need it */
+ if (!strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ /* Skip first few chars in the name */
+ start += 5;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ /* Skip first few chars in the name */
+ start += 6;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else {
+ lval = of_get_property(dev, pp->name, NULL);
+ val = (uint32_t)be32_to_cpup(lval);
+
+ ret = nest_event_info(pp, start, p8_events++, 0, val);
+ }
+
+ if (ret)
+ return ret;
+
+ /* book keeping */
+ idx++;
+ }
+
+ return 0;
+}

static int nest_ima_dt_parser(void)
{
@@ -19,7 +132,7 @@ static int nest_ima_dt_parser(void)
const __be64 *chip_ima_size;
struct device_node *dev;
struct perchip_nest_info *p8ni;
- int idx;
+ int idx, ret;

/*
* "nest-ima" folder contains two things,
@@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
}

+ /* Look for supported Nest PMU units */
+ idx = 0;
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ ret = nest_pmu_create(dev, idx);
+ if (ret)
+ return ret;
+ idx++;
+ }
+
return 0;
}

--
1.9.1

2015-06-25 09:57:52

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 5/7]powerpc/powernv: add event attribute and group to nest pmu

Add code to create event/format attributes and attribute groups for
each nest pmu.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 6116ff3..20ed9f8 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];

+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
{
@@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
return 0;
}

+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+int update_events_in_group(
+ struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /* Allocate memory for event attribute group */
+ attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+ sizeof(*attr_group)), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ attrs = (struct attribute **)(attr_group + 1);
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+
+ for (i = 0; i < idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}

@@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}

+ update_events_in_group(
+ (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
}

--
1.9.1

2015-06-25 10:00:10

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 6/7]powerpc/powernv: generic nest pmu event functions

Add set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 104 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 20ed9f8..c2ada13 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,100 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_nest_perchip_info[chip_id].vbase;
+
+ return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+ uint64_t *addr;
+ u64 data = 0;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu(*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+ uint64_t *addr;
+
+ addr = (uint64_t *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu(*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+ p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_UPDATE)
+ p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ p8_nest_event_start(event, flags);
+
+ return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_stop;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
{
@@ -179,6 +273,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);

+ update_pmu_ops(pmu_ptr);
+ /* Register the pmu */
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret) {
+ pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+ return ret;
+ }
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
return 0;
}

--
1.9.1

2015-06-25 09:58:08

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v3 7/7]powerpc/powernv: nest pmu cpumask and cpu hotplug support

Adds cpumask attribute to be used by each nest pmu since nest
units are per-chip. Only one cpu (first online cpu) from each node/chip
is designated to read counters.

On cpu hotplug, dying cpu is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same node/chip is designated
as new cpu to read counters.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 146 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 146 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c2ada13..31943c5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;

PMU_FORMAT_ATTR(event, "config:0-20");
struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,147 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+ .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+ opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int nid, target = -1;
+ struct cpumask *l_cpumask;
+
+ /*
+ * Check in the designated list for this cpu. Dont bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+ return;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a next cpu a) which is online and b) in same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_next(cpu, l_cpumask);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target <= nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(cpu, target);
+ }
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int nid, fcpu, ncpu;
+ struct cpumask *l_cpumask, tmp_mask;
+
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+
+ /*
+ * if empty cpumask, just add incoming cpu and move on.
+ */
+ if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ return;
+ }
+
+ /*
+ * Alway have the first online cpu of a chip as designated one.
+ */
+ fcpu = cpumask_first(l_cpumask);
+ ncpu = cpumask_next(cpu, l_cpumask);
+ if (cpu == fcpu) {
+ if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(ncpu, cpu);
+ }
+ }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+ .notifier_call = nest_pmu_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+};
+
+void nest_pmu_cpumask_init(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ }
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&nest_pmu_cpu_mask, (smp_call_func_t)nest_init, NULL, 1);
+
+ __register_cpu_notifier(&nest_pmu_cpu_nb);
+
+ cpu_notifier_register_done();
+}
+
static int p8_nest_event_init(struct perf_event *event)
{
int chip_id;
@@ -239,6 +381,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &nest_pmu_cpumask_attr_group;
continue;
}

@@ -363,6 +506,9 @@ static int __init nest_pmu_init(void)
if (ret)
return ret;

+ /* Add cpumask and register for hotplug notification */
+ nest_pmu_cpumask_init();
+
return 0;
}
device_initcall(nest_pmu_init);
--
1.9.1