2015-07-08 13:03:07

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 0/7]powerpc/powernv: Nest Instrumentation support

This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Intrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From
kernel side, OPAL Call interface is used to activate/deactivate
PORE Engine for nest data collection.

OPAL at boot, detects the feature, initializes it and pass on
the nest units and other related information such as memory
region, events supported so on, to kernel via device-tree.

Kernel code then, parses the device-tree for nest pmu support
and registers nest pmu with the events available. PORE Engine collects
and accumulate nest counter data in per-chip reserved memory region, hence
device-tree also exports per-chip nest accumulation memory region.
And individual event offset are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list
....
iTLB-load-misses [Hardware cache event]

Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

Performance counter stats for 'system wide':

CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/

1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

Performance counter stats for 'system wide':

CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....

OPAL side patches are posted in the skiboot mailing list.

Changelog from v3:

No logic change, just a rebase to latest upstream kernel.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use first online
cpu of the chip, there by making code lot simplier.

Changelog from v1:

1) No logic changes, re-ordered patches make each patch compile
without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog from RFC:

1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message

Thanks to input from Sukadev Bhattiprolu, Preeti Murthy, Daniel Axtens,
Suzuki Poulose and Michael Ellerman

Kindly let me know you comments and feedback.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>

Madhavan Srinivasan (7):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Nest PMU detection and device tree parser
powerpc/powernv: detect supported nest pmus and its events
powerpc/powernv: add event attribute and group to nest pmu
powerpc/powernv: generic nest pmu event functions
powerpc/powernv: nest pmu cpumask and cpu hotplug support

arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 514 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 53 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 572 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h

--
1.9.1


2015-07-08 13:03:11

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 1/7] powerpc/powernv: Data structure and macros definition

Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 53 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 0000000..ecb5d26
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS 32
+#define P8_NEST_MAX_PMUS 32
+#define P8_NEST_MAX_PMU_NAME_LEN 256
+#define P8_NEST_MAX_EVENTS_SUPPORTED 256
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ uint32_t chip_id;
+ uint64_t pbase;
+ uint64_t vbase;
+ uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+ const char *ev_name;
+ const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
--
1.9.1

2015-07-08 13:03:15

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 2/7] powerpc/powernv: Add OPAL support for Nest PMU

Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing.

Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index e9e4c52..4cd8128 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,8 @@
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
#define OPAL_PRD_MSG 113
-#define OPAL_LAST 113
+#define OPAL_NEST_IMA_CONTROL 116
+#define OPAL_LAST 116

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 958e941..7cb6215 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint32_t value);

/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index d6a7b82..c475c04 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
OPAL_CALL(opal_prd_msg, OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1

2015-07-08 13:03:30

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 3/7] powerpc/powernv: Nest PMU detection and device tree parser

Create a file "nest-pmu.c" to contain nest pmu related functions. Code
to detect nest pmu support and parser to collect per-chip reserved memory
region information from device tree (DT).

Detection mechanism is to look for specific property "ibm,ima-chip" in DT.
For Nest pmu, device tree will have two set of information.
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

Device tree layout for the Nest PMU as follows.

/ -- DT root folder
|
-nest-ima -- Nest PMU folder
|

-ima-chip@<chip-id> -- Per-chip folder for reserved region information
|
-ibm,chip-id -- Chip id
-ibm,ima-chip
-reg -- HOMER PORE Nest Counter collection Address (RA)
-size -- size to map in kernel space

-Alink_BW -- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit -- Event unit file
-device_type -- "nest-ima-unit" marker
....

Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o

obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 0000000..e7d45ed
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+ const __be32 *gcid;
+ const __be64 *chip_ima_reg;
+ const __be64 *chip_ima_size;
+ struct device_node *dev;
+ struct perchip_nest_info *p8ni;
+ int idx;
+
+ /*
+ * "nest-ima" folder contains two things,
+ * a) per-chip reserved memory region for Nest PMU Counter data
+ * b) Support Nest PMU units and their event files
+ */
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ gcid = of_get_property(dev, "ibm,chip-id", NULL);
+ chip_ima_reg = of_get_property(dev, "reg", NULL);
+ chip_ima_size = of_get_property(dev, "size", NULL);
+
+ if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+ pr_err("Nest_PMU: device %s missing property\n",
+ dev->full_name);
+ return -ENODEV;
+ }
+
+ /* chip id to save reserve memory region */
+ idx = (uint32_t)be32_to_cpup(gcid);
+
+ /*
+ * Using a local variable to make it compact and
+ * easier to read
+ */
+ p8ni = &p8_nest_perchip_info[idx];
+ p8ni->pbase = be64_to_cpup(chip_ima_reg);
+ p8ni->size = be64_to_cpup(chip_ima_size);
+ p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+ }
+
+ return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+ int ret = -ENODEV;
+
+ /*
+ * Lets do this only if we are hypervisor
+ */
+ if (!cur_cpu_spec->oprofile_cpu_type ||
+ !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+ !cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ /*
+ * Nest PMU information is grouped under "nest-ima" node
+ * of the top-level device-tree directory. Detect Nest PMU
+ * by the "ibm,ima-chip" property.
+ */
+ if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+ return ret;
+
+ /*
+ * Parse device-tree for Nest PMU information
+ */
+ ret = nest_ima_dt_parser();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1

2015-07-08 13:03:37

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from DT, will contain the offset in the
reserved memory region to get the counter data for a given event.
Kernel code uses this offset as event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed..6116ff3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,119 @@
#include "nest-pmu.h"

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *start,
+ struct nest_ima_events *p8_events, int flg, u32 val)
+{
+ char *buf;
+
+ /* memory for event name */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ /* memory for content */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (flg) {
+ /* string content*/
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ } else
+ sprintf(buf, "event=0x%x", val);
+
+ p8_events->ev_value = buf;
+ return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct nest_ima_events **p8_events_arr, *p8_events;
+ struct nest_pmu *pmu_ptr;
+ struct property *pp;
+ char *buf, *start;
+ const __be32 *lval;
+ u32 val;
+ int idx = 0, ret;
+
+ if (!dev)
+ return -EINVAL;
+
+ /* memory for nest pmus */
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ /* memory for nest pmu events */
+ p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+ GFP_KERNEL);
+ if (!p8_events_arr)
+ return -ENOMEM;
+ p8_events = (struct nest_ima_events *)p8_events_arr;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Save the name to register it later */
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ continue;
+ }
+
+ /* Skip these, we dont need it */
+ if (!strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ /* Skip first few chars in the name */
+ start += 5;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ /* Skip first few chars in the name */
+ start += 6;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else {
+ lval = of_get_property(dev, pp->name, NULL);
+ val = (uint32_t)be32_to_cpup(lval);
+
+ ret = nest_event_info(pp, start, p8_events++, 0, val);
+ }
+
+ if (ret)
+ return ret;
+
+ /* book keeping */
+ idx++;
+ }
+
+ return 0;
+}

static int nest_ima_dt_parser(void)
{
@@ -19,7 +132,7 @@ static int nest_ima_dt_parser(void)
const __be64 *chip_ima_size;
struct device_node *dev;
struct perchip_nest_info *p8ni;
- int idx;
+ int idx, ret;

/*
* "nest-ima" folder contains two things,
@@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
}

+ /* Look for supported Nest PMU units */
+ idx = 0;
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ ret = nest_pmu_create(dev, idx);
+ if (ret)
+ return ret;
+ idx++;
+ }
+
return 0;
}

--
1.9.1

2015-07-08 13:03:43

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 5/7] powerpc/powernv: add event attribute and group to nest pmu

Add code to create event/format attributes and attribute groups for
each nest pmu.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 57 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 6116ff3..20ed9f8 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];

+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
{
@@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
return 0;
}

+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+int update_events_in_group(
+ struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /* Allocate memory for event attribute group */
+ attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+ sizeof(*attr_group)), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ attrs = (struct attribute **)(attr_group + 1);
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+
+ for (i = 0; i < idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}

@@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}

+ update_events_in_group(
+ (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
}

--
1.9.1

2015-07-08 13:03:51

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 6/7] powerpc/powernv: generic nest pmu event functions

Add set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 104 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 20ed9f8..c2ada13 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,100 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_nest_perchip_info[chip_id].vbase;
+
+ return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+ uint64_t *addr;
+ u64 data = 0;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu(*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+ uint64_t *addr;
+
+ addr = (uint64_t *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu(*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+ p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_UPDATE)
+ p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ p8_nest_event_start(event, flags);
+
+ return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_stop;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
static int nest_event_info(struct property *pp, char *start,
struct nest_ima_events *p8_events, int flg, u32 val)
{
@@ -179,6 +273,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);

+ update_pmu_ops(pmu_ptr);
+ /* Register the pmu */
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret) {
+ pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+ return ret;
+ }
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
return 0;
}

--
1.9.1

2015-07-08 13:03:57

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v4 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

Adds cpumask attribute to be used by each nest pmu since nest
units are per-chip. Only one cpu (first online cpu) from each node/chip
is designated to read counters.

On cpu hotplug, dying cpu is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same node/chip is
designated as new cpu to read counters.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 146 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 146 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c2ada13..31943c5 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;

PMU_FORMAT_ATTR(event, "config:0-20");
struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,147 @@ struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+ .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+ opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int nid, target = -1;
+ struct cpumask *l_cpumask;
+
+ /*
+ * Check in the designated list for this cpu. Dont bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+ return;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a next cpu a) which is online and b) in same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_next(cpu, l_cpumask);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target <= nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(cpu, target);
+ }
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int nid, fcpu, ncpu;
+ struct cpumask *l_cpumask, tmp_mask;
+
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+
+ /*
+ * if empty cpumask, just add incoming cpu and move on.
+ */
+ if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ return;
+ }
+
+ /*
+ * Alway have the first online cpu of a chip as designated one.
+ */
+ fcpu = cpumask_first(l_cpumask);
+ ncpu = cpumask_next(cpu, l_cpumask);
+ if (cpu == fcpu) {
+ if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(ncpu, cpu);
+ }
+ }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+ .notifier_call = nest_pmu_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+};
+
+void nest_pmu_cpumask_init(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ }
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&nest_pmu_cpu_mask, (smp_call_func_t)nest_init, NULL, 1);
+
+ __register_cpu_notifier(&nest_pmu_cpu_nb);
+
+ cpu_notifier_register_done();
+}
+
static int p8_nest_event_init(struct perf_event *event)
{
int chip_id;
@@ -239,6 +381,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &nest_pmu_cpumask_attr_group;
continue;
}

@@ -363,6 +506,9 @@ static int __init nest_pmu_init(void)
if (ret)
return ret;

+ /* Add cpumask and register for hotplug notification */
+ nest_pmu_cpumask_init();
+
return 0;
}
device_initcall(nest_pmu_init);
--
1.9.1

2015-07-08 22:02:43

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

Madhavan Srinivasan [[email protected]] wrote:
| Parse device tree to detect supported nest pmu units. Traverse
| through each nest pmu unit folder to find supported events and
| corresponding unit/scale files (if any).
|
| The nest unit event file from DT, will contain the offset in the
| reserved memory region to get the counter data for a given event.
| Kernel code uses this offset as event configuration value.
|
| Device tree parser code also looks for scale/unit in the file name and
| passes on the file as an event attr for perf tool to use in the post
| processing.
|
| Cc: Michael Ellerman <[email protected]>
| Cc: Benjamin Herrenschmidt <[email protected]>
| Cc: Paul Mackerras <[email protected]>
| Cc: Anton Blanchard <[email protected]>
| Cc: Sukadev Bhattiprolu <[email protected]>
| Cc: Anshuman Khandual <[email protected]>
| Cc: Stephane Eranian <[email protected]>
| Signed-off-by: Madhavan Srinivasan <[email protected]>
| ---
| arch/powerpc/perf/nest-pmu.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
| 1 file changed, 123 insertions(+), 1 deletion(-)
|
| diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
| index e7d45ed..6116ff3 100644
| --- a/arch/powerpc/perf/nest-pmu.c
| +++ b/arch/powerpc/perf/nest-pmu.c
| @@ -11,6 +11,119 @@
| #include "nest-pmu.h"
|
| static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
| +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
| +
| +static int nest_event_info(struct property *pp, char *start,

nit: s/start/name/?

| + struct nest_ima_events *p8_events, int flg, u32 val)

nit: s/flg/string/?
| +{
| + char *buf;
| +
| + /* memory for event name */
| + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
| + if (!buf)
| + return -ENOMEM;
| +
| + strncpy(buf, start, strlen(start));
| + p8_events->ev_name = buf;
| +
| + /* memory for content */
| + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
| + if (!buf)
| + return -ENOMEM;
| +
| + if (flg) {
| + /* string content*/
| + if (!pp->value ||
| + (strnlen(pp->value, pp->length) == pp->length))
| + return -EINVAL;
| +
| + strncpy(buf, (const char *)pp->value, pp->length);
| + } else
| + sprintf(buf, "event=0x%x", val);
| +
| + p8_events->ev_value = buf;
| + return 0;
| +}
| +
| +static int nest_pmu_create(struct device_node *dev, int pmu_index)
| +{
| + struct nest_ima_events **p8_events_arr, *p8_events;
| + struct nest_pmu *pmu_ptr;
| + struct property *pp;
| + char *buf, *start;
| + const __be32 *lval;
| + u32 val;
| + int idx = 0, ret;
| +
| + if (!dev)
| + return -EINVAL;
| +
| + /* memory for nest pmus */
| + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
| + if (!pmu_ptr)
| + return -ENOMEM;
| +
| + /* Needed for hotplug/migration */
| + per_nest_pmu_arr[pmu_index] = pmu_ptr;
| +
| + /* memory for nest pmu events */
| + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
| + GFP_KERNEL);
| + if (!p8_events_arr)
| + return -ENOMEM;
| + p8_events = (struct nest_ima_events *)p8_events_arr;
| +
| + /*
| + * Loop through each property
| + */
| + for_each_property_of_node(dev, pp) {
| + start = pp->name;
| +
| + if (!strcmp(pp->name, "name")) {
| + if (!pp->value ||
| + (strnlen(pp->value, pp->length) == pp->length))
| + return -EINVAL;

Do we need to check the string length here? If so, should we check against
size we are going to allocate below (P8_NEST_MAX_PMU_NAME_LEN)? Or is it
possible pp->value is not NULL terminated?

| +
| + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
| + if (!buf)
| + return -ENOMEM;
| +
| + /* Save the name to register it later */
| + sprintf(buf, "Nest_%s", (char *)pp->value);
| + pmu_ptr->pmu.name = (char *)buf;
| + continue;
| + }
| +
| + /* Skip these, we dont need it */
| + if (!strcmp(pp->name, "phandle") ||
| + !strcmp(pp->name, "device_type") ||
| + !strcmp(pp->name, "linux,phandle"))
| + continue;
| +
| + if (strncmp(pp->name, "unit.", 5) == 0) {
| + /* Skip first few chars in the name */
| + start += 5;
| + ret = nest_event_info(pp, start, p8_events++, 1, 0);
| + } else if (strncmp(pp->name, "scale.", 6) == 0) {
| + /* Skip first few chars in the name */
| + start += 6;
| + ret = nest_event_info(pp, start, p8_events++, 1, 0);

Are the 'start.*' and 'unit.*' files events by themselves or just attributes
of events?

These will show up in sysfs as:

$ ls /sys/bus/event_source/devices/Nest_Alink_BW//events/
Alink0 Alink0.unit Alink1.scale Alink2 Alink2.unit
Alink0.scale Alink1 Alink1.unit Alink2.scale

>From the other PMUs, the 'events' directory contains just events.

Attributes of events, like descriptions go into a separate directory:
Eg: For 24x7 counters:

$ ls -d /sys/bus/event_source/devices/hv_24x7/event*
/sys/bus/event_source/devices/hv_24x7/event_descs
/sys/bus/event_source/devices/hv_24x7/event_long_descs
/sys/bus/event_source/devices/hv_24x7/events

If events have/need parameters, they go into the event file itself:
Eg on an x86 box:

$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
event=0x2e,umask=0x41

For the nest events, are the 'units' and 'scale' files needed to
identify the event (like the umask in x86 example above) or are they
informational attributes (like descriptions for 24x7 counters)?

IOW, following works on my x86 system (and with 24x7 counters):

for i in `ls /sys/bus/event_source/devices/cpu/events/`; do
perf stat -e cpu/$i/ sleep 1;
done

This loop will fail for Nest events, when it hits files like Alink0.unit.

That said, I am not sure if the above for loop is supposed to work always!
Maybe Peter Zijlstra can comment on that.

Are the units and scales needed/used by perf in computations? If just
informational and, given that we can locate them from the device-tree,
can we drop them altogether?

If they are needed by perf and are attributes of an event, can we
move them to separate directories?

/sys/bus/event_source/devices/Nest_Alink_BW/events
/sys/bus/event_source/devices/Nest_Alink_BW/units
/sys/bus/event_source/devices/Nest_Alink_BW/scales

Sukadev



| + } else {
| + lval = of_get_property(dev, pp->name, NULL);
| + val = (uint32_t)be32_to_cpup(lval);
| +
| + ret = nest_event_info(pp, start, p8_events++, 0, val);
| + }
| +
| + if (ret)
| + return ret;
| +
| + /* book keeping */
| + idx++;

I don't see idx being used after the increment?

| + }
| +
| + return 0;
| +}
|
| static int nest_ima_dt_parser(void)
| {
| @@ -19,7 +132,7 @@ static int nest_ima_dt_parser(void)
| const __be64 *chip_ima_size;
| struct device_node *dev;
| struct perchip_nest_info *p8ni;
| - int idx;
| + int idx, ret;
|
| /*
| * "nest-ima" folder contains two things,
| @@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
| p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
| }
|
| + /* Look for supported Nest PMU units */
| + idx = 0;
| + for_each_node_by_type(dev, "nest-ima-unit") {
| + ret = nest_pmu_create(dev, idx);
| + if (ret)
| + return ret;
| + idx++;

idx not used?

| + }
| +
| return 0;
| }
|
| --
| 1.9.1

2015-07-09 13:12:04

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

Sukadev Bhattiprolu [[email protected]] wrote:
| | @@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
| | p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
| | }
| |
| | + /* Look for supported Nest PMU units */
| | + idx = 0;
| | + for_each_node_by_type(dev, "nest-ima-unit") {
| | + ret = nest_pmu_create(dev, idx);
| | + if (ret)
| | + return ret;
| | + idx++;
|
| idx not used?

Sorry, disregard this. Had my blinders on :-(

2015-07-08 22:44:35

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] powerpc/powernv: add event attribute and group to nest pmu

Madhavan Srinivasan [[email protected]] wrote:
| Add code to create event/format attributes and attribute groups for
| each nest pmu.
|
| Cc: Michael Ellerman <[email protected]>
| Cc: Benjamin Herrenschmidt <[email protected]>
| Cc: Paul Mackerras <[email protected]>
| Cc: Anton Blanchard <[email protected]>
| Cc: Sukadev Bhattiprolu <[email protected]>
| Cc: Anshuman Khandual <[email protected]>
| Cc: Stephane Eranian <[email protected]>
| Signed-off-by: Madhavan Srinivasan <[email protected]>
| ---
| arch/powerpc/perf/nest-pmu.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
| 1 file changed, 57 insertions(+)
|
| diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
| index 6116ff3..20ed9f8 100644
| --- a/arch/powerpc/perf/nest-pmu.c
| +++ b/arch/powerpc/perf/nest-pmu.c
| @@ -13,6 +13,17 @@
| static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
| static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
|
| +PMU_FORMAT_ATTR(event, "config:0-20");
| +struct attribute *p8_nest_format_attrs[] = {

name collision unlikely, but could this be static struct?

| + &format_attr_event.attr,
| + NULL,
| +};
| +
| +struct attribute_group p8_nest_format_group = {

static struct?

| + .name = "format",
| + .attrs = p8_nest_format_attrs,
| +};
| +
| static int nest_event_info(struct property *pp, char *start,
| struct nest_ima_events *p8_events, int flg, u32 val)
| {
| @@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
| return 0;
| }
|
| +/*
| + * Populate event name and string in attribute
| + */
| +struct attribute *dev_str_attr(const char *name, const char *str)

static function?

| +{
| + struct perf_pmu_events_attr *attr;
| +
| + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
| +

We recently needed following in 24x7 counters to keep lockdep happy:

sysfs_attr_init(&attr->attr.attr);

| + attr->event_str = str;
| + attr->attr.attr.name = name;
| + attr->attr.attr.mode = 0444;
| + attr->attr.show = perf_event_sysfs_show;
| +
| + return &attr->attr.attr;
| +}
| +
| +int update_events_in_group(

static function?

nit: do we need a new line before the first parameter? some functions
in the file don't add the new line.

| + struct nest_ima_events *p8_events, int nevents, struct nest_pmu *pmu)

s/idx/nevents/?

| +{
| + struct attribute_group *attr_group;
| + struct attribute **attrs;
| + int i;
| +
| + /* Allocate memory for event attribute group */
| + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
| + sizeof(*attr_group)), GFP_KERNEL);
| + if (!attr_group)
| + return -ENOMEM;
| +
| + attrs = (struct attribute **)(attr_group + 1);

Can you add a comment on the +1?

| + attr_group->name = "events";
| + attr_group->attrs = attrs;
| +
| + for (i = 0; i < idx; i++, p8_events++)
| + attrs[i] = dev_str_attr((char *)p8_events->ev_name,
| + (char *)p8_events->ev_value);
| +
| + pmu->attr_groups[0] = attr_group;

The ->attr_groups[0] is initialized here, after the ->attr_groups[1] and
attr_groups[2] are initialized in caller. Since, ->attr_groups[1] and
->attr_groups[2] are set to global (loop-invariant) values, can we
initialize all the attribute-groups here? May need to rename function.

| + return 0;
| +}
| +
| static int nest_pmu_create(struct device_node *dev, int pmu_index)
| {
| struct nest_ima_events **p8_events_arr, *p8_events;
| @@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
| /* Save the name to register it later */
| sprintf(buf, "Nest_%s", (char *)pp->value);
| pmu_ptr->pmu.name = (char *)buf;
| + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
| continue;
| }
|
| @@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
| idx++;
| }
|
| + update_events_in_group(

nit: need newline before first param?

| + (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
| +
| return 0;
| }
|
| --
| 1.9.1

2015-07-09 08:03:40

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events



On Thursday 09 July 2015 03:31 AM, Sukadev Bhattiprolu wrote:
> Madhavan Srinivasan [[email protected]] wrote:
> | Parse device tree to detect supported nest pmu units. Traverse
> | through each nest pmu unit folder to find supported events and
> | corresponding unit/scale files (if any).
> |
> | The nest unit event file from DT, will contain the offset in the
> | reserved memory region to get the counter data for a given event.
> | Kernel code uses this offset as event configuration value.
> |
> | Device tree parser code also looks for scale/unit in the file name and
> | passes on the file as an event attr for perf tool to use in the post
> | processing.
> |
> | Cc: Michael Ellerman <[email protected]>
> | Cc: Benjamin Herrenschmidt <[email protected]>
> | Cc: Paul Mackerras <[email protected]>
> | Cc: Anton Blanchard <[email protected]>
> | Cc: Sukadev Bhattiprolu <[email protected]>
> | Cc: Anshuman Khandual <[email protected]>
> | Cc: Stephane Eranian <[email protected]>
> | Signed-off-by: Madhavan Srinivasan <[email protected]>
> | ---
> | arch/powerpc/perf/nest-pmu.c | 124 ++++++++++++++++++++++++++++++++++++++++++-
> | 1 file changed, 123 insertions(+), 1 deletion(-)
> |
> | diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> | index e7d45ed..6116ff3 100644
> | --- a/arch/powerpc/perf/nest-pmu.c
> | +++ b/arch/powerpc/perf/nest-pmu.c
> | @@ -11,6 +11,119 @@
> | #include "nest-pmu.h"
> |
> | static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> | +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> | +
> | +static int nest_event_info(struct property *pp, char *start,
>
> nit: s/start/name/?

Yes. Will change it.

>
> | + struct nest_ima_events *p8_events, int flg, u32 val)
>
> nit: s/flg/string/?

Yes. Will change it.

> | +{
> | + char *buf;
> | +
> | + /* memory for event name */
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + strncpy(buf, start, strlen(start));
> | + p8_events->ev_name = buf;
> | +
> | + /* memory for content */
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + if (flg) {
> | + /* string content*/
> | + if (!pp->value ||
> | + (strnlen(pp->value, pp->length) == pp->length))
> | + return -EINVAL;
> | +
> | + strncpy(buf, (const char *)pp->value, pp->length);
> | + } else
> | + sprintf(buf, "event=0x%x", val);
> | +
> | + p8_events->ev_value = buf;
> | + return 0;
> | +}
> | +
> | +static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | +{
> | + struct nest_ima_events **p8_events_arr, *p8_events;
> | + struct nest_pmu *pmu_ptr;
> | + struct property *pp;
> | + char *buf, *start;
> | + const __be32 *lval;
> | + u32 val;
> | + int idx = 0, ret;
> | +
> | + if (!dev)
> | + return -EINVAL;
> | +
> | + /* memory for nest pmus */
> | + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
> | + if (!pmu_ptr)
> | + return -ENOMEM;
> | +
> | + /* Needed for hotplug/migration */
> | + per_nest_pmu_arr[pmu_index] = pmu_ptr;
> | +
> | + /* memory for nest pmu events */
> | + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
> | + GFP_KERNEL);
> | + if (!p8_events_arr)
> | + return -ENOMEM;
> | + p8_events = (struct nest_ima_events *)p8_events_arr;
> | +
> | + /*
> | + * Loop through each property
> | + */
> | + for_each_property_of_node(dev, pp) {
> | + start = pp->name;
> | +
> | + if (!strcmp(pp->name, "name")) {
> | + if (!pp->value ||
> | + (strnlen(pp->value, pp->length) == pp->length))
> | + return -EINVAL;
>
> Do we need to check the string length here? If so, should we check against
> size we are going to allocate below (P8_NEST_MAX_PMU_NAME_LEN)? Or is it
> possible pp->value is not NULL terminated?

Yes we need and can add a check for size is below the
P8_NEST_MAX_PMU_NAME_LEN .

> | +
> | + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> | + if (!buf)
> | + return -ENOMEM;
> | +
> | + /* Save the name to register it later */
> | + sprintf(buf, "Nest_%s", (char *)pp->value);
> | + pmu_ptr->pmu.name = (char *)buf;
> | + continue;
> | + }
> | +
> | + /* Skip these, we dont need it */
> | + if (!strcmp(pp->name, "phandle") ||
> | + !strcmp(pp->name, "device_type") ||
> | + !strcmp(pp->name, "linux,phandle"))
> | + continue;
> | +
> | + if (strncmp(pp->name, "unit.", 5) == 0) {
> | + /* Skip first few chars in the name */
> | + start += 5;
> | + ret = nest_event_info(pp, start, p8_events++, 1, 0);
> | + } else if (strncmp(pp->name, "scale.", 6) == 0) {
> | + /* Skip first few chars in the name */
> | + start += 6;
> | + ret = nest_event_info(pp, start, p8_events++, 1, 0);
>
> Are the 'start.*' and 'unit.*' files events by themselves or just attributes
> of events?

These are attributes needed for computation. unit and scale attributes
will be used by perf tool in post-processing the counter data. These
can also use by other tools like pcp.

> These will show up in sysfs as:
>
> $ ls /sys/bus/event_source/devices/Nest_Alink_BW//events/
> Alink0 Alink0.unit Alink1.scale Alink2 Alink2.unit
> Alink0.scale Alink1 Alink1.unit Alink2.scale

Yes true.

> From the other PMUs, the 'events' directory contains just events.

For ex from x86 sandybridge system:

# ls
cas_count_read cas_count_read.unit cas_count_write.scale clockticks
cas_count_read.scale cas_count_write cas_count_write.unit

> Attributes of events, like descriptions go into a separate directory:
> Eg: For 24x7 counters:
>
> $ ls -d /sys/bus/event_source/devices/hv_24x7/event*
> /sys/bus/event_source/devices/hv_24x7/event_descs
> /sys/bus/event_source/devices/hv_24x7/event_long_descs
> /sys/bus/event_source/devices/hv_24x7/events

Yes. But these attributes are not informational like event description.

> If events have/need parameters, they go into the event file itself:
> Eg on an x86 box:
>
> $ cat /sys/bus/event_source/devices/cpu/events/cache-misses
> event=0x2e,umask=0x41
>
> For the nest events, are the 'units' and 'scale' files needed to
> identify the event (like the umask in x86 example above) or are they
> informational attributes (like descriptions for 24x7 counters)?

Yes. Again, these are not event parameter value to add in
the event configuration value or informational.

> IOW, following works on my x86 system (and with 24x7 counters):
>
> for i in `ls /sys/bus/event_source/devices/cpu/events/`; do
> perf stat -e cpu/$i/ sleep 1;
> done
>
> This loop will fail for Nest events, when it hits files like Alink0.unit.
>
> That said, I am not sure if the above for loop is supposed to work always!
> Maybe Peter Zijlstra can comment on that.

Yes.

> Are the units and scales needed/used by perf in computations? If just
> informational and, given that we can locate them from the device-tree,
> can we drop them altogether?
>
> If they are needed by perf and are attributes of an event, can we
> move them to separate directories?

I could prefer not to and here is my reason. Today perf tool
already have a mechanism to get the unit and scale values from
kernel, why to add one more and add code to perf tool to support it?

Thanks for review

Maddy


> /sys/bus/event_source/devices/Nest_Alink_BW/events
> /sys/bus/event_source/devices/Nest_Alink_BW/units
> /sys/bus/event_source/devices/Nest_Alink_BW/scales
>
> Sukadev
>
>
>
> | + } else {
> | + lval = of_get_property(dev, pp->name, NULL);
> | + val = (uint32_t)be32_to_cpup(lval);
> | +
> | + ret = nest_event_info(pp, start, p8_events++, 0, val);
> | + }
> | +
> | + if (ret)
> | + return ret;
> | +
> | + /* book keeping */
> | + idx++;
>
> I don't see idx being used after the increment?

Used in next patch.

> | + }
> | +
> | + return 0;
> | +}
> |
> | static int nest_ima_dt_parser(void)
> | {
> | @@ -19,7 +132,7 @@ static int nest_ima_dt_parser(void)
> | const __be64 *chip_ima_size;
> | struct device_node *dev;
> | struct perchip_nest_info *p8ni;
> | - int idx;
> | + int idx, ret;
> |
> | /*
> | * "nest-ima" folder contains two things,
> | @@ -50,6 +163,15 @@ static int nest_ima_dt_parser(void)
> | p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
> | }
> |
> | + /* Look for supported Nest PMU units */
> | + idx = 0;
> | + for_each_node_by_type(dev, "nest-ima-unit") {
> | + ret = nest_pmu_create(dev, idx);
> | + if (ret)
> | + return ret;
> | + idx++;
>
> idx not used?

used by code in patch 6 of this series.

Thanks
Maddy

> | + }
> | +
> | return 0;
> | }
> |
> | --
> | 1.9.1

2015-07-09 08:15:48

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] powerpc/powernv: add event attribute and group to nest pmu



On Thursday 09 July 2015 04:13 AM, Sukadev Bhattiprolu wrote:
> Madhavan Srinivasan [[email protected]] wrote:
> | Add code to create event/format attributes and attribute groups for
> | each nest pmu.
> |
> | Cc: Michael Ellerman <[email protected]>
> | Cc: Benjamin Herrenschmidt <[email protected]>
> | Cc: Paul Mackerras <[email protected]>
> | Cc: Anton Blanchard <[email protected]>
> | Cc: Sukadev Bhattiprolu <[email protected]>
> | Cc: Anshuman Khandual <[email protected]>
> | Cc: Stephane Eranian <[email protected]>
> | Signed-off-by: Madhavan Srinivasan <[email protected]>
> | ---
> | arch/powerpc/perf/nest-pmu.c | 57 ++++++++++++++++++++++++++++++++++++++++++++
> | 1 file changed, 57 insertions(+)
> |
> | diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> | index 6116ff3..20ed9f8 100644
> | --- a/arch/powerpc/perf/nest-pmu.c
> | +++ b/arch/powerpc/perf/nest-pmu.c
> | @@ -13,6 +13,17 @@
> | static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> | static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> |
> | +PMU_FORMAT_ATTR(event, "config:0-20");
> | +struct attribute *p8_nest_format_attrs[] = {
>
> name collision unlikely, but could this be static struct?
>
> | + &format_attr_event.attr,
> | + NULL,
> | +};
> | +
> | +struct attribute_group p8_nest_format_group = {
>
> static struct?

Sure will check it.

>
> | + .name = "format",
> | + .attrs = p8_nest_format_attrs,
> | +};
> | +
> | static int nest_event_info(struct property *pp, char *start,
> | struct nest_ima_events *p8_events, int flg, u32 val)
> | {
> | @@ -45,6 +56,48 @@ static int nest_event_info(struct property *pp, char *start,
> | return 0;
> | }
> |
> | +/*
> | + * Populate event name and string in attribute
> | + */
> | +struct attribute *dev_str_attr(const char *name, const char *str)
>
> static function?

Sure. Will check it.

>
> | +{
> | + struct perf_pmu_events_attr *attr;
> | +
> | + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> | +
>
> We recently needed following in 24x7 counters to keep lockdep happy:

Yeah, i saw your patch and wanted to add it, but missed it. My bad
Will add it in the next version.

>
> sysfs_attr_init(&attr->attr.attr);
>
> | + attr->event_str = str;
> | + attr->attr.attr.name = name;
> | + attr->attr.attr.mode = 0444;
> | + attr->attr.show = perf_event_sysfs_show;
> | +
> | + return &attr->attr.attr;
> | +}
> | +
> | +int update_events_in_group(
>
> static function?
>
> nit: do we need a new line before the first parameter? some functions
> in the file don't add the new line.

Checkpatch complained abt 80 character limit, hence did a split.

>
> | + struct nest_ima_events *p8_events, int nevents, struct nest_pmu *pmu)
>
> s/idx/nevents/?

Weird, I dont see "nevents" in the patch I sent? it is "idx". Dont know
where
it came from :)

>
> | +{
> | + struct attribute_group *attr_group;
> | + struct attribute **attrs;
> | + int i;
> | +
> | + /* Allocate memory for event attribute group */
> | + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
> | + sizeof(*attr_group)), GFP_KERNEL);
> | + if (!attr_group)
> | + return -ENOMEM;
> | +
> | + attrs = (struct attribute **)(attr_group + 1);
>
> Can you add a comment on the +1?

Sure, will do.

>
> | + attr_group->name = "events";
> | + attr_group->attrs = attrs;
> | +
> | + for (i = 0; i < idx; i++, p8_events++)
> | + attrs[i] = dev_str_attr((char *)p8_events->ev_name,
> | + (char *)p8_events->ev_value);
> | +
> | + pmu->attr_groups[0] = attr_group;
>
> The ->attr_groups[0] is initialized here, after the ->attr_groups[1] and
> attr_groups[2] are initialized in caller. Since, ->attr_groups[1] and
> ->attr_groups[2] are set to global (loop-invariant) values, can we
> initialize all the attribute-groups here? May need to rename function.

I dont get it, reinitialize all the three here in this function?

> | + return 0;
> | +}
> | +
> | static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | {
> | struct nest_ima_events **p8_events_arr, *p8_events;
> | @@ -91,6 +144,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | /* Save the name to register it later */
> | sprintf(buf, "Nest_%s", (char *)pp->value);
> | pmu_ptr->pmu.name = (char *)buf;
> | + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
> | continue;
> | }
> |
> | @@ -122,6 +176,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | idx++;
> | }
> |
> | + update_events_in_group(
>
> nit: need newline before first param?

once again, checkpatch warning of 80 character.

>
> | + (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
> | +
> | return 0;
> | }
> |
> | --
> | 1.9.1

2015-07-09 21:32:11

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] powerpc/powernv: detect supported nest pmus and its events

Madhavan Srinivasan [[email protected]] wrote:
|
| > Are the 'start.*' and 'unit.*' files events by themselves or just attributes
| > of events?
|
| These are attributes needed for computation. unit and scale attributes
| will be used by perf tool in post-processing the counter data. These
| can also use by other tools like pcp.

OK. Thanks for clarifying.