2015-08-03 07:37:42

by Madhavan Srinivasan

Subject: [PATCH v7 0/7] powerpc/powernv: Nest Instrumentation support

This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Instrumentation, which provides various
per-chip metrics such as memory, PowerBus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via the PORE Engine)
to configure the nest counters and move their data to memory. On
the kernel side, the OPAL call interface is used to activate/deactivate
the PORE Engine for nest data collection.

At boot, OPAL detects the feature, initializes it and passes the
nest units and other related information (memory regions, supported
events and so on) to the kernel via the device tree.

The kernel code then parses the device tree for nest pmu support
and registers the nest pmus with the available events. The PORE Engine
collects and accumulates nest counter data in a per-chip reserved memory
region, so the device tree also exports the per-chip nest accumulation
memory region. Individual event offsets within that region are used as
the event configuration values.
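As a rough illustration of how an event offset becomes a counter value, here is a userspace C sketch that models one chip's accumulation region as a plain byte buffer. It assumes, as the read path added in patch 6/7 (p8_nest_read_counter()) does, that each counter is a big-endian 64-bit value stored at the event's offset from the chip's base address. The helper names and the offsets used are hypothetical, and the bounds check is an addition of this sketch; the kernel read path does not validate the offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 64-bit value regardless of host byte order. */
static uint64_t load_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Encode a 64-bit value big-endian (stands in for the PORE Engine
 * writing an accumulated counter into the region). */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/*
 * Read the counter for one event: the event's config value is treated
 * as a byte offset into the per-chip region. Returns 0 on success, or
 * -1 if the 8-byte counter would fall outside the region.
 */
static int read_nest_counter(const uint8_t *chip_base, size_t region_size,
			     uint64_t event_offset, uint64_t *out)
{
	if (event_offset > region_size || region_size - event_offset < 8)
		return -1;
	*out = load_be64(chip_base + event_offset);
	return 0;
}
```

Decoding byte by byte keeps the sketch independent of host endianness; in the kernel the same effect comes from __be64_to_cpu().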

Here is a sample perf usage to illustrate the interface.

#./perf list
....
iTLB-load-misses [Hardware cache event]

Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

Performance counter stats for 'system wide':

CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/

1.001062038 seconds time elapsed


# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

Performance counter stats for 'system wide':

CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....
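The perf invocations above resolve a name such as Nest_Xlink_BW/Xlink1/ through the pmu's sysfs entries. A minimal sketch of doing the same from C follows, assuming the usual dynamic-pmu sysfs layout (/sys/bus/event_source/devices/<pmu>/type, and events/<event> holding a string such as "event=0x..."), which is what these patches register. Because the pmu uses perf_invalid_context and p8_nest_event_init() rejects cpu < 0, the event has to be opened system-wide (pid = -1) on a specific cpu, ideally one listed in the pmu's cpumask file. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Parse an events/<name> string such as "event=0x118\n" into a config
 * value. Returns 0 on success, -1 if the string has an unexpected shape. */
static int parse_event_config(const char *s, uint64_t *config)
{
	unsigned long long v;
	char extra;
	int n = sscanf(s, "event=%llx%c", &v, &extra);

	if (n == 1 || (n == 2 && extra == '\n')) {
		*config = v;
		return 0;
	}
	return -1;
}

/* Read the dynamic pmu type number assigned by perf_pmu_register(). */
static int read_pmu_type(const char *pmu, int *type)
{
	char path[256];
	FILE *f;
	int ok;

	snprintf(path, sizeof(path),
		 "/sys/bus/event_source/devices/%s/type", pmu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", type) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Open one counting (non-sampling) nest event on @cpu. */
static int open_nest_event(int type, uint64_t config, int cpu)
{
	struct perf_event_attr attr;

	/* Zeroing leaves sample_period and all exclude_* bits clear,
	 * which p8_nest_event_init() requires. */
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = config;
	return (int)syscall(__NR_perf_event_open, &attr, -1 /* pid */, cpu,
			    -1 /* group_fd */, 0 /* flags */);
}
```

On a POWER8 host one would read the type of Nest_Xlink_BW, parse the contents of its events/Xlink1 file, call open_nest_event() on the designated cpu of each chip, and read() a u64 count from each descriptor.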

OPAL side patches are posted in the skiboot mailing list.

Changelog from v6:

1) Modified the OPAL call number to sync with OPAL side changes.
2) Rebased to latest upstream.

Changelog from v5:

1) Replaced be32_to_cpup with of_property_read_u32 when reading from device tree
2) Replaced uintxx_t to u[32/64] variable types
3) Split the memory allocation for the event attribute group and the
events into two alloc functions for simplicity.
4) Added separate functions for event string and event value as suggested.
5) Added P8_NEST_MAX_PMUS check for array overflow
6) Made changes to commit messages.

Changelog from v4:

1) Variable name changes for consistency and added more comments
2) Added sysfs_attr_init to keep lockdep happy
3) Updated for the OPAL call interface changes and added code to handle
the failure case.
4) Added new macro "P8_NEST_MODE_PRODUCTION" to specify PORE Engine mode
5) Modified nest_pmu_cpumask_init function to return a value to the
nest pmu init function in case of OPAL call failure.

Changelog from v3:

No logic change, just a rebase to latest upstream kernel.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use the first online
cpu of the chip, thereby making the code a lot simpler.

Changelog from v1:

1) No logic changes; re-ordered patches so that each patch compiles
without errors
2) Added comments based on the review feedback.
3) Removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) Squashed the Makefile changes instead of keeping them in a separate patch
7) Squashed the cpumask and hotplug patches into a single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog from RFC:

1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message

Thanks to input from Sukadev Bhattiprolu, Preeti Murthy, Daniel Axtens,
Suzuki Poulose and Michael Ellerman

Kindly let me know your comments and feedback.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Reviewed-by: Sukadev Bhattiprolu <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>

Madhavan Srinivasan (7):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Nest PMU detection and device tree parser
powerpc/powernv: detect supported nest pmus and its events
powerpc/powernv: add event attribute and group to nest pmu
powerpc/powernv: generic nest pmu event functions
powerpc/powernv: nest pmu cpumask and cpu hotplug support

arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 544 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 54 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 603 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h

--
1.9.1


2015-08-03 07:36:16

by Madhavan Srinivasan

Subject: [PATCH v7 1/7] powerpc/powernv: Data structure and macros definition

Create a new header file "nest-pmu.h" to add the data structures
and macros needed for nest pmu support.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 54 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 000000000000..73aff2de3350
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,54 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS 32
+#define P8_NEST_MAX_PMUS 32
+#define P8_NEST_MAX_PMU_NAME_LEN 256
+#define P8_NEST_MAX_EVENTS_SUPPORTED 64
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+#define P8_NEST_MODE_PRODUCTION 1
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ u32 chip_id;
+ u64 pbase;
+ u64 vbase;
+ u32 size;
+};
+
+/*
+ * Placeholder for nest pmu events and values.
+ */
+struct nest_ima_events {
+ char *ev_name;
+ char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
--
1.9.1

2015-08-03 07:38:29

by Madhavan Srinivasan

Subject: [PATCH v7 2/7] powerpc/powernv: Add OPAL support for Nest PMU

Nest counters can be configured via the PORE Engine, and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing list.

Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index e9e4c52f3685..52903449d435 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,8 @@
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
#define OPAL_PRD_MSG 113
-#define OPAL_LAST 113
+#define OPAL_NEST_IMA_CONTROL 117
+#define OPAL_LAST 117

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 958e941c0cda..7c813ed52ab4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint64_t mode, uint64_t value);

/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index d6a7b8252e4d..c475c04468fb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
OPAL_CALL(opal_prd_msg, OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1

2015-08-03 07:36:30

by Madhavan Srinivasan

Subject: [PATCH v7 3/7] powerpc/powernv: Nest PMU detection and device tree parser

Create a file "nest-pmu.c" to contain nest pmu related functions. Add code
to detect nest pmu support and a parser to collect per-chip reserved memory
region information from the device tree (DT).

The detection mechanism looks for the specific property "ibm,ima-chip" in
the DT. For the nest pmu, the device tree carries two sets of information:
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

The device tree layout for the Nest PMU is as follows:

/                              -- DT root folder
|
-nest-ima                      -- Nest PMU folder
   |
   -ima-chip@<chip-id>         -- Per-chip folder for reserved region information
   |  |
   |  -ibm,chip-id             -- Chip id
   |  -ibm,ima-chip
   |  -reg                     -- HOMER PORE Nest Counter collection Address (RA)
   |  -size                    -- size to map in kernel space
   |
   -Alink_BW                   -- Nest PMU folder
      |
      -Alink0                  -- Nest PMU Alink Event file
      -scale.Alink0.scale      -- Event scale file
      -unit.Alink0.unit        -- Event unit file
      -device_type             -- "nest-ima-unit" marker
   ....

A subsequent patch will parse the next part of the DT to find the various
Nest PMUs and their events.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a5652a..6da656b50e3c 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o

obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 000000000000..48738f9f6426
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,72 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+ struct device_node *dev;
+ struct perchip_nest_info *p8ni;
+ u32 idx;
+
+ /*
+ * "nest-ima" folder contains two things,
+ * a) per-chip reserved memory region for Nest PMU Counter data
+ * b) Supported Nest PMU units and their event files
+ */
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ if (of_property_read_u32(dev, "ibm,chip-id", &idx)) {
+ pr_err("Nest_PMU: device %s missing property\n",
+ dev->full_name);
+ return -ENODEV;
+ }
+ p8ni = &p8_nest_perchip_info[idx];
+
+ if (of_property_read_u64(dev, "reg", &p8ni->pbase)) {
+ pr_err("Nest_PMU: device %s missing property\n",
+ dev->full_name);
+ return -ENODEV;
+ }
+ p8ni->vbase = (u64)phys_to_virt(p8ni->pbase);
+ }
+
+ return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+ int ret = -ENODEV;
+
+ /*
+ * Let's do this only if we are the hypervisor
+ */
+ if (!cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ /*
+ * Nest PMU information is grouped under "nest-ima" node
+ * of the top-level device-tree directory. Detect Nest PMU
+ * by the "ibm,ima-chip" property.
+ */
+ if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+ return ret;
+
+ /*
+ * Parse device-tree for Nest PMU information
+ */
+ ret = nest_ima_dt_parser();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1

2015-08-03 07:38:04

by Madhavan Srinivasan

Subject: [PATCH v7 4/7] powerpc/powernv: detect supported nest pmus and its events

Parse the device tree to detect supported nest pmu units. Traverse
each nest pmu unit folder to find the supported events and the
corresponding unit/scale files (if any).

Each nest unit event file from the device tree contains the offset in
the reserved memory region at which the counter data for that event is
found. The kernel code uses this offset as the event configuration value.

The device tree parser code also looks for a "unit." or "scale." prefix
in the property name and passes the property on as an event attribute
for the perf tool to use in post-processing.
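To make the naming rule concrete, here is a small userspace C model of how one property of a nest unit node maps to a sysfs event file. It mirrors nest_pmu_create(): "name" is consumed as the pmu name, "phandle", "linux,phandle" and "device_type" are ignored, a leading "unit." or "scale." marker is stripped, and any other property is treated as an event whose file contents are formatted with the same "event=0x%x" string nest_event_info_val() uses. The enum and function name are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum prop_kind { PROP_SKIP, PROP_STRING, PROP_EVENT };

/*
 * Classify a DT property of a nest unit node and compute the sysfs
 * event file name it maps to. For PROP_EVENT, also format the file
 * contents from the u32 offset stored in the property.
 */
static enum prop_kind map_nest_property(const char *prop, unsigned int offset,
					char *name, size_t name_len,
					char *value, size_t value_len)
{
	/* "name" becomes the pmu name; the others carry no event data. */
	if (!strcmp(prop, "name") || !strcmp(prop, "phandle") ||
	    !strcmp(prop, "linux,phandle") || !strcmp(prop, "device_type"))
		return PROP_SKIP;

	/* "unit.Alink0.unit" -> "Alink0.unit"; value is a DT string */
	if (!strncmp(prop, "unit.", 5)) {
		snprintf(name, name_len, "%s", prop + 5);
		return PROP_STRING;
	}
	/* "scale.Alink0.scale" -> "Alink0.scale"; value is a DT string */
	if (!strncmp(prop, "scale.", 6)) {
		snprintf(name, name_len, "%s", prop + 6);
		return PROP_STRING;
	}

	/* Plain event: the property value is the offset in the region. */
	snprintf(name, name_len, "%s", prop);
	snprintf(value, value_len, "event=0x%x", offset);
	return PROP_EVENT;
}
```

The stripped names end in ".unit" and ".scale", which is the suffix convention perf looks for next to an event's file in the pmu's events/ directory.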

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 137 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 137 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 48738f9f6426..292fc2b91ed0 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,12 +11,140 @@
#include "nest-pmu.h"

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(char *name, struct nest_ima_events *p8_events)
+{
+ char *buf;
+
+ /* memory for event name */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, name, strlen(name));
+ p8_events->ev_name = buf;
+
+ /* memory for content */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ p8_events->ev_value = buf;
+ return 0;
+}
+
+static int nest_event_info_str(struct property *pp, char *name,
+ struct nest_ima_events *p8_events)
+{
+ if (nest_event_info(name, p8_events))
+ return -ENOMEM;
+
+ if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+
+ strncpy(p8_events->ev_value, (const char *)pp->value, pp->length);
+
+ return 0;
+}
+
+static int nest_event_info_val(char *name, u32 val,
+ struct nest_ima_events *p8_events)
+{
+ if (nest_event_info(name, p8_events))
+ return -ENOMEM;
+
+ sprintf(p8_events->ev_value, "event=0x%x", val);
+
+ return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct nest_ima_events *p8_events;
+ struct nest_pmu *pmu_ptr;
+ struct property *pp;
+ char *buf, *start;
+ u32 val;
+ int idx = 0, ret;
+
+ if (!dev)
+ return -EINVAL;
+
+ /* memory for nest pmus */
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ /* memory for nest pmu events */
+ p8_events = kzalloc((sizeof(struct nest_ima_events) *
+ P8_NEST_MAX_EVENTS_SUPPORTED), GFP_KERNEL);
+ if (!p8_events)
+ return -ENOMEM;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Save the name to register it later */
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ continue;
+ }
+
+ /* Skip these, we don't need it */
+ if (!strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ /*
+ * Strip the prefix from "unit" and "scale" property name
+ * since it is only a marker search
+ */
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ start += 5;
+ ret = nest_event_info_str(pp, start, &p8_events[idx]);
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ start += 6;
+ ret = nest_event_info_str(pp, start, &p8_events[idx]);
+ } else {
+ of_property_read_u32(dev, pp->name, &val);
+ ret = nest_event_info_val(start, val, &p8_events[idx]);
+ }
+
+ if (ret)
+ return ret;
+
+ /* book keeping */
+ idx++;
+ }
+
+ return 0;
+}

static int nest_ima_dt_parser(void)
{
struct device_node *dev;
struct perchip_nest_info *p8ni;
u32 idx;
+ int ret;

/*
* "nest-ima" folder contains two things,
@@ -39,6 +167,15 @@ static int nest_ima_dt_parser(void)
p8ni->vbase = (u64)phys_to_virt(p8ni->pbase);
}

+ /* Look for supported Nest PMU units */
+ idx = 0;
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ ret = nest_pmu_create(dev, idx);
+ if (ret)
+ return ret;
+ idx++;
+ }
+
return 0;
}

--
1.9.1

2015-08-03 07:36:42

by Madhavan Srinivasan

Subject: [PATCH v7 5/7] powerpc/powernv: add event attribute and group to nest pmu

Add code to create event/format attributes and attribute groups for
each nest pmu.
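The format attribute added here, "config:0-20", tells perf that the event field occupies bits 0-20 of perf_event_attr.config. A small C sketch of that packing follows, assuming a tool that prefers to reject offsets wider than the field rather than truncate them silently. The macro and function names are hypothetical; note that p8_nest_event_init() in the next patch adds attr.config to the chip base without masking it.

```c
#include <stdint.h>

/* Width of the "event" field as declared by "config:0-20". */
#define NEST_EVENT_BITS 21
#define NEST_EVENT_MASK ((UINT64_C(1) << NEST_EVENT_BITS) - 1)

/* Place an event offset into config bits 0-20.
 * Returns -1 if the offset does not fit in the field. */
static int nest_pack_config(uint64_t offset, uint64_t *config)
{
	if (offset & ~NEST_EVENT_MASK)
		return -1;
	*config = offset;	/* field starts at bit 0, so no shift */
	return 0;
}

/* Recover the event field from a config value. */
static uint64_t nest_unpack_config(uint64_t config)
{
	return config & NEST_EVENT_MASK;
}
```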

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 292fc2b91ed0..4a74ffb97741 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];

+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
static int nest_event_info(char *name, struct nest_ima_events *p8_events)
{
char *buf;
@@ -60,6 +71,53 @@ static int nest_event_info_val(char *name, u32 val,
return 0;
}

+/*
+ * Populate event name and string in attribute
+ */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ sysfs_attr_init(&attr->attr.attr);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+static int update_events_in_group(
+ struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /* Allocate memory for attribute group */
+ attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ /* Allocate memory for attributes */
+ attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+ if (!attrs)
+ return -ENOMEM;
+
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+
+ for (i = 0; i < idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct nest_ima_events *p8_events;
@@ -105,6 +163,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}

@@ -136,6 +195,8 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}

+ update_events_in_group(p8_events, idx, pmu_ptr);
+
return 0;
}

--
1.9.1

2015-08-03 07:36:27

by Madhavan Srinivasan

Subject: [PATCH v7 6/7] powerpc/powernv: generic nest pmu event functions

Add a set of generic nest pmu event functions to be used by
each nest pmu, and add code to register the nest pmus.
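The event functions in this patch use a snapshot-and-delta scheme: start() records the current accumulated value in hw.prev_count, and read()/stop() add the difference since the last snapshot to event->count. Below is a userspace C model of that bookkeeping. Plain uint64_t fields stand in for local64_t, and the counter value is passed in directly instead of being read from the per-chip region; both are simplifications of this sketch, and the names are hypothetical.

```c
#include <stdint.h>

/* Stand-ins for event->hw.prev_count and event->count. */
struct nest_event_model {
	uint64_t prev_count;
	uint64_t count;
};

/* Like p8_nest_read_counter(): remember the current hardware value. */
static void model_start(struct nest_event_model *ev, uint64_t hw_value)
{
	ev->prev_count = hw_value;
}

/* Like p8_nest_perf_event_update(): accumulate the delta. Unsigned
 * subtraction keeps the delta correct if the 64-bit counter wraps. */
static void model_update(struct nest_event_model *ev, uint64_t hw_value)
{
	uint64_t delta = hw_value - ev->prev_count;

	ev->prev_count = hw_value;
	ev->count += delta;
}
```

Because the PORE Engine only ever accumulates, the pmu never has to program or reset hardware; start() and stop() reduce to taking snapshots.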

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 101 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 4a74ffb97741..f653a7ab6ed7 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,97 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_nest_perchip_info[chip_id].vbase;
+
+ return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+ u64 *addr, data;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu(*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count, *addr;
+
+ addr = (u64 *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu(*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_UPDATE)
+ p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ p8_nest_event_start(event, flags);
+
+ return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_stop;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
static int nest_event_info(char *name, struct nest_ima_events *p8_events)
{
char *buf;
@@ -197,6 +288,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)

update_events_in_group(p8_events, idx, pmu_ptr);

+ update_pmu_ops(pmu_ptr);
+ /* Register the pmu */
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret) {
+ pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+ return ret;
+ }
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
return 0;
}

--
1.9.1

2015-08-03 07:36:37

by Madhavan Srinivasan

Subject: [PATCH v7 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

Add a cpumask attribute to be used by each nest pmu, since nest
units are per-chip. Only one cpu (the first online cpu) from each chip
is designated to read the counters.

On cpu hotplug, the dying cpu is checked to see whether it is one of the
designated cpus; if so, the next online cpu from the same chip is
designated as the new cpu to read the counters.
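A userspace C model of the designation policy described above follows, using a hypothetical fixed topology (a cpu-to-chip table and an online flag per cpu) instead of cpumask_of_node() and the hotplug notifier. Unlike nest_exit_cpu(), which only searches cpus numbered after the dying one, this sketch scans the whole chip, so a replacement is found whichever cpu goes down; that is a choice of the sketch, not of the patch.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 8

/* Hypothetical topology: cpus 0-3 on chip 0, cpus 4-7 on chip 1. */
static const int cpu_to_chip[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };
static bool cpu_online[MODEL_NR_CPUS] = {
	true, true, true, true, true, true, true, true
};

/* First online cpu on @chip, skipping @exclude (-1 skips none).
 * Returns -1 if the chip has no usable cpu. */
static int first_online_on_chip(int chip, int exclude)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != exclude && cpu_online[cpu] &&
		    cpu_to_chip[cpu] == chip)
			return cpu;
	return -1;
}

/*
 * Called before @cpu goes offline. If it is the designated reader for
 * its chip, return the cpu that takes over (the kernel would migrate
 * the perf context to it); otherwise return @designated unchanged.
 */
static int handle_cpu_down(int cpu, int designated)
{
	int next = designated;

	if (cpu == designated)
		next = first_online_on_chip(cpu_to_chip[cpu], cpu);
	cpu_online[cpu] = false;
	return next;
}
```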

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 173 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 173 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index f653a7ab6ed7..ec173b7c36e5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;

PMU_FORMAT_ATTR(event, "config:0-20");
static struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,173 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+ .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(int *loc)
+{
+ int rc;
+
+ rc = opal_nest_ima_control(
+ P8_NEST_MODE_PRODUCTION, P8_NEST_ENGINE_START);
+ if (rc)
+ loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i = 0;
+ (i < P8_NEST_MAX_PMUS) && (per_nest_pmu_arr[i] != NULL); i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int nid, target = -1;
+ struct cpumask *l_cpumask;
+
+ /*
+ * Check in the designated list for this cpu. Don't bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+ return;
+
+ /*
+ * This cpu is one of the designated cpus, so
+ * find the next cpu that is a) online and b) on the same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_next(cpu, l_cpumask);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target < nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(cpu, target);
+ }
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int nid, fcpu, ncpu;
+ struct cpumask *l_cpumask, tmp_mask;
+
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+
+ /*
+ * If no cpu of this chip is designated yet, add the incoming cpu and move on.
+ */
+ if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ return;
+ }
+
+ /*
+ * Always have the first online cpu of a chip as the designated one.
+ */
+ fcpu = cpumask_first(l_cpumask);
+ ncpu = cpumask_next(cpu, l_cpumask);
+ if (cpu == fcpu) {
+ if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(ncpu, cpu);
+ }
+ }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+ .notifier_call = nest_pmu_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+};
+
+static int nest_pmu_cpumask_init(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+ int *cpus_opal_rc;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ }
+
+ /*
+ * Memory for OPAL call return value.
+ */
+ cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+ if (!cpus_opal_rc)
+ goto fail;
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&nest_pmu_cpu_mask, (smp_call_func_t)nest_init,
+ (void *)cpus_opal_rc, 1);
+
+ /* Check return value array for any OPAL call failure */
+ for_each_cpu(cpu, &nest_pmu_cpu_mask) {
+ if (cpus_opal_rc[cpu])
+ goto fail;
+ }
+
+ __register_cpu_notifier(&nest_pmu_cpu_nb);
+
+ cpu_notifier_register_done();
+ return 0;
+
+fail:
+ cpu_notifier_register_done();
+ return -ENODEV;
+}
+
static int p8_nest_event_init(struct perf_event *event)
{
int chip_id;
@@ -255,6 +423,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &nest_pmu_cpumask_attr_group;
continue;
}

@@ -359,6 +528,10 @@ static int __init nest_pmu_init(void)
if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
return ret;

+ /* Add cpumask and register for hotplug notification */
+ if (nest_pmu_cpumask_init())
+ return ret;
+
/*
* Parse device-tree for Nest PMU information
*/
--
1.9.1