2015-07-16 11:13:49

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 0/7]powerpc/powernv: Nest Instrumentation support

This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Intrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.

Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From
kernel side, OPAL Call interface is used to activate/deactivate
PORE Engine for nest data collection.

OPAL at boot, detects the feature, initializes it and pass on
the nest units and other related information such as memory
region, events supported so on, to kernel via device-tree.

Kernel code then, parses the device-tree for nest pmu support
and registers nest pmu with the events available. PORE Engine collects
and accumulate nest counter data in per-chip reserved memory region, hence
device-tree also exports per-chip nest accumulation memory region.
And individual event offset are used as event configuration values.

Here is sample perf usage to explain the interface.

#./perf list
....
iTLB-load-misses [Hardware cache event]

Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]

rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....

# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1

Performance counter stats for 'system wide':

CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/

1.001062038 seconds time elapsed

# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5

Performance counter stats for 'system wide':

CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....

OPAL side patches are posted in the skiboot mailing list.

Changelog from v4:

1) Variable name changes for consistency and added more comments
2) Added sysfs_att_init to have lockdep happy
3) Updated OPAL Call interface changes and added code to handle
failure case.
4) Added new macro "P8_NEST_MODE_PRODUCTION" to specify PORE Engine mode
5) Modified nest_pmu_cpumask_init function to return value to
nest pmu init function incase of OPAL call failure.

Changelog from v3:

No logic change, just a rebase to latest upstream kernel.

Changelog from v2:

1) Changed variable and macro names to be consistent.
2) Made changes to commit message and code comment messages
3) Moved "format attribute" related code from patch 6 to 5
4) Added check for pmu register function
5) Changed cpu_init and cpu_exit functions to use first online
cpu of the chip, there by making code lot simplier.

Changelog from v1:

1) No logic changes, re-ordered patches make each patch compile
without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.

Changelog from RFC:

1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message

Thanks to input from Sukadev Bhattiprolu, Preeti Murthy, Daniel Axtens,
Suzuki Poulose and Michael Ellerman

Kindly let me know you comments and feedback.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>

Madhavan Srinivasan (7):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Nest PMU detection and device tree parser
powerpc/powernv: detect supported nest pmus and its events
powerpc/powernv: add event attribute and group to nest pmu
powerpc/powernv: generic nest pmu event functions
powerpc/powernv: nest pmu cpumask and cpu hotplug support

arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 551 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 54 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 610 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h

--
1.9.1


2015-07-16 11:15:43

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 1/7] powerpc/powernv: Data structure and macros definition

Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 54 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h

diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 000000000000..28e3c6e024a6
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,54 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_NEST_MAX_CHIPS 32
+#define P8_NEST_MAX_PMUS 32
+#define P8_NEST_MAX_PMU_NAME_LEN 256
+#define P8_NEST_MAX_EVENTS_SUPPORTED 256
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+#define P8_NEST_MODE_PRODUCTION 1
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ uint32_t chip_id;
+ uint64_t pbase;
+ uint64_t vbase;
+ uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct nest_ima_events {
+ const char *ev_name;
+ const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
--
1.9.1

2015-07-16 11:13:50

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 2/7] powerpc/powernv: Add OPAL support for Nest PMU

Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.

OPAL side patches are posted in the skiboot mailing.

Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 1 +
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index e9e4c52f3685..4cd8128c6ebc 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -154,7 +154,8 @@
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
#define OPAL_PRD_MSG 113
-#define OPAL_LAST 113
+#define OPAL_NEST_IMA_CONTROL 116
+#define OPAL_LAST 116

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 958e941c0cda..7c813ed52ab4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -202,6 +202,7 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
uint64_t size, uint64_t token);
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint64_t mode, uint64_t value);

/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index d6a7b8252e4d..c475c04468fb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -297,3 +297,4 @@ OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
OPAL_CALL(opal_prd_msg, OPAL_PRD_MSG);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1

2015-07-16 11:13:58

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser

Create a file "nest-pmu.c" to contain nest pmu related functions. Code
to detect nest pmu support and parser to collect per-chip reserved memory
region information from device tree (DT).

Detection mechanism is to look for specific property "ibm,ima-chip" in DT.
For Nest pmu, device tree will have two set of information.
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events

Device tree layout for the Nest PMU as follows.

/ -- DT root folder
|
-nest-ima -- Nest PMU folder
|

-ima-chip@<chip-id> -- Per-chip folder for reserved region information
|
-ibm,chip-id -- Chip id
-ibm,ima-chip
-reg -- HOMER PORE Nest Counter collection Address (RA)
-size -- size to map in kernel space

-Alink_BW -- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit -- Event unit file
-device_type -- "nest-ima-unit" marker
....

Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 85 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 86 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a5652a..6da656b50e3c 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o

obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 000000000000..e7d45ed4922d
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,85 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+
+static int nest_ima_dt_parser(void)
+{
+ const __be32 *gcid;
+ const __be64 *chip_ima_reg;
+ const __be64 *chip_ima_size;
+ struct device_node *dev;
+ struct perchip_nest_info *p8ni;
+ int idx;
+
+ /*
+ * "nest-ima" folder contains two things,
+ * a) per-chip reserved memory region for Nest PMU Counter data
+ * b) Support Nest PMU units and their event files
+ */
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ gcid = of_get_property(dev, "ibm,chip-id", NULL);
+ chip_ima_reg = of_get_property(dev, "reg", NULL);
+ chip_ima_size = of_get_property(dev, "size", NULL);
+
+ if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+ pr_err("Nest_PMU: device %s missing property\n",
+ dev->full_name);
+ return -ENODEV;
+ }
+
+ /* chip id to save reserve memory region */
+ idx = (uint32_t)be32_to_cpup(gcid);
+
+ /*
+ * Using a local variable to make it compact and
+ * easier to read
+ */
+ p8ni = &p8_nest_perchip_info[idx];
+ p8ni->pbase = be64_to_cpup(chip_ima_reg);
+ p8ni->size = be64_to_cpup(chip_ima_size);
+ p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
+ }
+
+ return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+ int ret = -ENODEV;
+
+ /*
+ * Lets do this only if we are hypervisor
+ */
+ if (!cur_cpu_spec->oprofile_cpu_type ||
+ !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+ !cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ /*
+ * Nest PMU information is grouped under "nest-ima" node
+ * of the top-level device-tree directory. Detect Nest PMU
+ * by the "ibm,ima-chip" property.
+ */
+ if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+ return ret;
+
+ /*
+ * Parse device-tree for Nest PMU information
+ */
+ ret = nest_ima_dt_parser();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1

2015-07-16 11:13:52

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events

Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).

The nest unit event file from DT, will contain the offset in the
reserved memory region to get the counter data for a given event.
Kernel code uses this offset as event configuration value.

Device tree parser code also looks for scale/unit in the file name and
passes on the file as an event attr for perf tool to use in the post
processing.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 126 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 125 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e7d45ed4922d..c4c08e4dee55 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,121 @@
#include "nest-pmu.h"

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+
+static int nest_event_info(struct property *pp, char *name,
+ struct nest_ima_events *p8_events, int string, u32 val)
+{
+ char *buf;
+
+ /* memory for event name */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, name, strlen(name));
+ p8_events->ev_name = buf;
+
+ /* memory for content */
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (string) {
+ /* string content*/
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ } else
+ sprintf(buf, "event=0x%x", val);
+
+ p8_events->ev_value = buf;
+ return 0;
+}
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct nest_ima_events **p8_events_arr, *p8_events;
+ struct nest_pmu *pmu_ptr;
+ struct property *pp;
+ char *buf, *start;
+ const __be32 *lval;
+ u32 val;
+ int idx = 0, ret;
+
+ if (!dev)
+ return -EINVAL;
+
+ /* memory for nest pmus */
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ /* memory for nest pmu events */
+ p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
+ GFP_KERNEL);
+ if (!p8_events_arr)
+ return -ENOMEM;
+ p8_events = (struct nest_ima_events *)p8_events_arr;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+
+ buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Save the name to register it later */
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ continue;
+ }
+
+ /* Skip these, we dont need it */
+ if (!strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ /* Skip first few chars in the name */
+ start += 5;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ /* Skip first few chars in the name */
+ start += 6;
+ ret = nest_event_info(pp, start, p8_events++, 1, 0);
+ } else {
+ lval = of_get_property(dev, pp->name, NULL);
+ val = (uint32_t)be32_to_cpup(lval);
+
+ ret = nest_event_info(pp, start, p8_events++, 0, val);
+ }
+
+ if (ret)
+ return ret;
+
+ /* book keeping */
+ idx++;
+ }
+
+ return 0;
+}

static int nest_ima_dt_parser(void)
{
@@ -19,7 +134,7 @@ static int nest_ima_dt_parser(void)
const __be64 *chip_ima_size;
struct device_node *dev;
struct perchip_nest_info *p8ni;
- int idx;
+ int idx, ret;

/*
* "nest-ima" folder contains two things,
@@ -50,6 +165,15 @@ static int nest_ima_dt_parser(void)
p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
}

+ /* Look for supported Nest PMU units */
+ idx = 0;
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ ret = nest_pmu_create(dev, idx);
+ if (ret)
+ return ret;
+ idx++;
+ }
+
return 0;
}

--
1.9.1

2015-07-16 11:14:05

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 5/7] powerpc/powernv: add event attribute and group to nest pmu

Add code to create event/format attributes and attribute groups for
each nest pmu.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c4c08e4dee55..f3418bdec1cd 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,17 @@
static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];

+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
static int nest_event_info(struct property *pp, char *name,
struct nest_ima_events *p8_events, int string, u32 val)
{
@@ -46,6 +57,56 @@ static int nest_event_info(struct property *pp, char *name,
return 0;
}

+/*
+ * Populate event name and string in attribute
+ */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ sysfs_attr_init(&attr->attr.attr);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+static int update_events_in_group(
+ struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /*
+ * Allocate memory for both event attribute group and for
+ * event attributes array.
+ */
+ attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+ sizeof(*attr_group)), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ /*
+ * Assign memory for event attribute array
+ */
+ attrs = (struct attribute **)(attr_group + 1);
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+
+ for (i = 0; i < idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct nest_ima_events **p8_events_arr, *p8_events;
@@ -93,6 +154,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register it later */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
continue;
}

@@ -124,6 +186,9 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
idx++;
}

+ update_events_in_group(
+ (struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);
+
return 0;
}

--
1.9.1

2015-07-16 11:14:09

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions

Add set of generic nest pmu related event functions to be used by
each nest pmu. Add code to register nest pmus.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 105 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index f3418bdec1cd..2ebd0508e9b3 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -24,6 +24,101 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_nest_perchip_info[chip_id].vbase;
+
+ return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+ uint64_t *addr;
+ u64 data = 0;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu(*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+ uint64_t *addr;
+
+ addr = (uint64_t *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu(*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+ p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_UPDATE)
+ p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ p8_nest_event_start(event, flags);
+
+ return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_stop;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
+
static int nest_event_info(struct property *pp, char *name,
struct nest_ima_events *p8_events, int string, u32 val)
{
@@ -189,6 +284,16 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
update_events_in_group(
(struct nest_ima_events *)p8_events_arr, idx, pmu_ptr);

+ update_pmu_ops(pmu_ptr);
+ /* Register the pmu */
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret) {
+ pr_err("Nest PMU %s Register failed\n", pmu_ptr->pmu.name);
+ return ret;
+ }
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
return 0;
}

--
1.9.1

2015-07-16 11:13:56

by Madhavan Srinivasan

[permalink] [raw]
Subject: [PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

Adds cpumask attribute to be used by each nest pmu since nest
units are per-chip. Only one cpu (first online cpu) from each node/chip
is designated to read counters.

On cpu hotplug, dying cpu is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same node/chip is
designated as new cpu to read counters.

Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 172 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 172 insertions(+)

diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 2ebd0508e9b3..d3a2fd746cf9 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@

static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
+static cpumask_t nest_pmu_cpu_mask;

PMU_FORMAT_ATTR(event, "config:0-20");
static struct attribute *p8_nest_format_attrs[] = {
@@ -24,6 +25,172 @@ static struct attribute_group p8_nest_format_group = {
.attrs = p8_nest_format_attrs,
};

+static ssize_t nest_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &nest_pmu_cpu_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, nest_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *nest_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group nest_pmu_cpumask_attr_group = {
+ .attrs = nest_pmu_cpumask_attrs,
+};
+
+static void nest_init(int *loc)
+{
+ int rc;
+
+ rc = opal_nest_ima_control(
+ P8_NEST_MODE_PRODUCTION, P8_NEST_ENGINE_START);
+ if (rc)
+ loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int nid, target = -1;
+ struct cpumask *l_cpumask;
+
+ /*
+ * Check in the designated list for this cpu. Dont bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_pmu_cpu_mask))
+ return;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a next cpu a) which is online and b) in same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_next(cpu, l_cpumask);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target <= nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(cpu, target);
+ }
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int nid, fcpu, ncpu;
+ struct cpumask *l_cpumask, tmp_mask;
+
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+
+ /*
+ * if empty cpumask, just add incoming cpu and move on.
+ */
+ if (!cpumask_and(&tmp_mask, l_cpumask, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ return;
+ }
+
+ /*
+ * Alway have the first online cpu of a chip as designated one.
+ */
+ fcpu = cpumask_first(l_cpumask);
+ ncpu = cpumask_next(cpu, l_cpumask);
+ if (cpu == fcpu) {
+ if (cpumask_test_and_clear_cpu(ncpu, &nest_pmu_cpu_mask)) {
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ nest_change_cpu_context(ncpu, cpu);
+ }
+ }
+}
+
+static int nest_pmu_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_ONLINE:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_pmu_cpu_nb = {
+ .notifier_call = nest_pmu_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+};
+
+static int nest_pmu_cpumask_init(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+ int *cpus_opal_rc;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &nest_pmu_cpu_mask);
+ }
+
+ /*
+ * Memory for OPAL call return value.
+ */
+ cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+ if (!cpus_opal_rc)
+ goto fail;
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&nest_pmu_cpu_mask, (smp_call_func_t)nest_init,
+ (void *)cpus_opal_rc, 1);
+
+ /* Check return value array for any OPAL call failure */
+ for_each_cpu(cpu, &nest_pmu_cpu_mask) {
+ if (cpus_opal_rc[cpu])
+ goto fail;
+ }
+
+ __register_cpu_notifier(&nest_pmu_cpu_nb);
+
+ cpu_notifier_register_done();
+ return 0;
+
+fail:
+ cpu_notifier_register_done();
+ return -ENODEV;
+}
+
static int p8_nest_event_init(struct perf_event *event)
{
int chip_id;
@@ -250,6 +417,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &nest_pmu_cpumask_attr_group;
continue;
}

@@ -359,6 +527,10 @@ static int __init nest_pmu_init(void)
!cpu_has_feature(CPU_FTR_HVMODE))
return ret;

+ /* Add cpumask and register for hotplug notification */
+ if (nest_pmu_cpumask_init())
+ return ret;
+
/*
* Nest PMU information is grouped under "nest-ima" node
* of the top-level device-tree directory. Detect Nest PMU
--
1.9.1

2015-07-20 18:44:29

by Sukadev Bhattiprolu

[permalink] [raw]
Subject: Re: [PATCH v5 0/7]powerpc/powernv: Nest Instrumentation support

Madhavan Srinivasan [[email protected]] wrote:
| This patchset enables Nest Instrumentation support on powerpc.
| POWER8 has per-chip Nest Intrumentation which provides various
| per-chip metrics like memory, powerbus, Xlink and Alink
| bandwidth.
|
<snip>

| Cc: Michael Ellerman <[email protected]>
| Cc: Benjamin Herrenschmidt <[email protected]>
| Cc: Paul Mackerras <[email protected]>
| Cc: Anton Blanchard <[email protected]>
| Cc: Sukadev Bhattiprolu <[email protected]>
| Cc: Anshuman Khandual <[email protected]>
| Cc: Stephane Eranian <[email protected]>
| Signed-off-by: Madhavan Srinivasan <[email protected]>

Thanks for addressing my comments from earlier version.

Reviewed-by: Sukadev Bhattiprolu <[email protected]>

2015-07-22 03:51:35

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser

Hi,

> +static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> +
> +static int nest_ima_dt_parser(void)
> +{
> + const __be32 *gcid;
> + const __be64 *chip_ima_reg;
> + const __be64 *chip_ima_size;
> + struct device_node *dev;
> + struct perchip_nest_info *p8ni;
> + int idx;
> +
> + /*
> + * "nest-ima" folder contains two things,
> + * a) per-chip reserved memory region for Nest PMU Counter data
> + * b) Support Nest PMU units and their event files
> + */
> + for_each_node_with_property(dev, "ibm,ima-chip") {
> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
> + chip_ima_reg = of_get_property(dev, "reg", NULL);
> + chip_ima_size = of_get_property(dev, "size", NULL);
> +
> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
> + pr_err("Nest_PMU: device %s missing property\n",
> + dev->full_name);
> + return -ENODEV;
> + }
> +
> + /* chip id to save reserve memory region */
> + idx = (uint32_t)be32_to_cpup(gcid);
So be32_to_cpup returns a __u32. You're casting to a uint32_t and then
assigning to an int.
- Do you need the intermediate cast?
- Should idx be an unsigned type?
> +
> + /*
> + * Using a local variable to make it compact and
> + * easier to read
> + */
We probably don't need this comment. But a better variable name would be
helpful!
> + p8ni = &p8_nest_perchip_info[idx];
> + p8ni->pbase = be64_to_cpup(chip_ima_reg);
> + p8ni->size = be64_to_cpup(chip_ima_size);
> + p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
> + }
> +
> + return 0;
> +}
> +
> +static int __init nest_pmu_init(void)
> +{
> + int ret = -ENODEV;
> +
> + /*
> + * Lets do this only if we are hypervisor
> + */
> + if (!cur_cpu_spec->oprofile_cpu_type ||
> + !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
> + !cpu_has_feature(CPU_FTR_HVMODE))
> + return ret;
> +
I'm still really uncomfortable with doing this via oprofile_cpu_type.
If the kernel is compiled without oprofile support, will that get
populated?

I'm also curious about why we need checking for power8 at all. We
already know we're not going to run on hardware without a nest PMU
because of the device tree check below.

What happens if there's a future generation of chip that supports nest
PMUs?

If it's really important to check versions in this function, maybe you
could put a version property in the ibm,ima-chip node?
> + /*
> + * Nest PMU information is grouped under "nest-ima" node
> + * of the top-level device-tree directory. Detect Nest PMU
> + * by the "ibm,ima-chip" property.
> + */
> + if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
> + return ret;
> +
> + /*
> + * Parse device-tree for Nest PMU information
> + */
> + ret = nest_ima_dt_parser();
> + if (ret)
> + return ret;
> +
> + return 0;
> +}
> +device_initcall(nest_pmu_init);

--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-22 04:09:37

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events


> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> +
> +static int nest_event_info(struct property *pp, char *name,
> + struct nest_ima_events *p8_events, int string, u32 val)
'int string' is a bit confusing. 'bool is_string' might be clearer, but
I think it would be even better still to have different functions for
string and non-string cases, especially because you only need val in the
non-string case.

That will also allow you to give the functions clearer names. I think
the function is populating the event with info from the dt property (in
the string case) or the val argument (non-string case) - maybe the names
could reflect that somehow?
> +{
> + char *buf;
> +


> +
> +static int nest_pmu_create(struct device_node *dev, int pmu_index)
> +{
> + struct nest_ima_events **p8_events_arr, *p8_events;
> + struct nest_pmu *pmu_ptr;
> + struct property *pp;
> + char *buf, *start;
> + const __be32 *lval;
> + u32 val;
> + int idx = 0, ret;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + /* memory for nest pmus */
> + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
> + if (!pmu_ptr)
> + return -ENOMEM;
> +
> + /* Needed for hotplug/migration */
> + per_nest_pmu_arr[pmu_index] = pmu_ptr;
> +
> + /* memory for nest pmu events */
> + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
> + GFP_KERNEL);
> + if (!p8_events_arr)
> + return -ENOMEM;
> + p8_events = (struct nest_ima_events *)p8_events_arr;

I'm still quite uncomfortable with this.
- Why * 64? Should it be * P8_NEST_MAX_EVENTS_SUPPORTED? Or is it a
different constant?
- p8_events = p8_events_arr[0] would be clearer

> +
> + /*
> + * Loop through each property
> + */
> + for_each_property_of_node(dev, pp) {
> + start = pp->name;
> +
> + if (!strcmp(pp->name, "name")) {
> + if (!pp->value ||
> + (strnlen(pp->value, pp->length) == pp->length) ||
> + (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
> + return -EINVAL;
> +
> + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + /* Save the name to register it later */
> + sprintf(buf, "Nest_%s", (char *)pp->value);
> + pmu_ptr->pmu.name = (char *)buf;
> + continue;
> + }
> +
> + /* Skip these, we dont need it */
"don't" instead of "dont".
> + if (!strcmp(pp->name, "phandle") ||
> + !strcmp(pp->name, "device_type") ||
> + !strcmp(pp->name, "linux,phandle"))
> + continue;
> +
> + if (strncmp(pp->name, "unit.", 5) == 0) {
> + /* Skip first few chars in the name */
The whole comment is pretty uninformative, as is the similar comment
below. If you need a comment at all, maybe something along the lines of
"Strip the prefix because <reason we don't need/want the prefix>"?
> + start += 5;
> + ret = nest_event_info(pp, start, p8_events++, 1, 0);
> + } else if (strncmp(pp->name, "scale.", 6) == 0) {
> + /* Skip first few chars in the name */
> + start += 6;
> + ret = nest_event_info(pp, start, p8_events++, 1, 0);
> + } else {
> + lval = of_get_property(dev, pp->name, NULL);
> + val = (uint32_t)be32_to_cpup(lval);
> +
> + ret = nest_event_info(pp, start, p8_events++, 0, val);
> + }
> +
> + if (ret)
> + return ret;
> +
> + /* book keeping */
> + idx++;
You don't seem to use idx in this function, apart from incrementing it
here...?
> + }
> +
> + return 0;
> +}


--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-22 04:46:48

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] powerpc/powernv: add event attribute and group to nest pmu

On Thu, 2015-07-16 at 16:43 +0530, Madhavan Srinivasan wrote:
> Add code to create event/format attributes and attribute groups for
> each nest pmu.
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: Sukadev Bhattiprolu <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Signed-off-by: Madhavan Srinivasan <[email protected]>
> ---
> arch/powerpc/perf/nest-pmu.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 65 insertions(+)
>
> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> index c4c08e4dee55..f3418bdec1cd 100644
> --- a/arch/powerpc/perf/nest-pmu.c
> +++ b/arch/powerpc/perf/nest-pmu.c
> @@ -13,6 +13,17 @@
> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>
> +PMU_FORMAT_ATTR(event, "config:0-20");
> +static struct attribute *p8_nest_format_attrs[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +static struct attribute_group p8_nest_format_group = {
> + .name = "format",
> + .attrs = p8_nest_format_attrs,
> +};
> +
> static int nest_event_info(struct property *pp, char *name,
> struct nest_ima_events *p8_events, int string, u32 val)
> {
> @@ -46,6 +57,56 @@ static int nest_event_info(struct property *pp, char *name,
> return 0;
> }
>
> +/*
> + * Populate event name and string in attribute
> + */
> +static struct attribute *dev_str_attr(const char *name, const char *str)
> +{
> + struct perf_pmu_events_attr *attr;
> +
> + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> +
> + sysfs_attr_init(&attr->attr.attr);
> +
> + attr->event_str = str;
> + attr->attr.attr.name = name;
> + attr->attr.attr.mode = 0444;
> + attr->attr.show = perf_event_sysfs_show;
> +
> + return &attr->attr.attr;
So I asked you about this before, and you pointed me to
perf_event_sysfs_show. Looking at that in kernel/events/core.c, it looks
like that uses container_of to pull out the perf_pmu_events_attr. So I
guess that is at least mostly correct.

I'm hoping something else uses container_of to pull out attr->attr, so
that they can actually grab the attr->attr.show function pointer, so
that perf_event_sysfs_show actually gets called. Where would that be?

> +}
> +
> +static int update_events_in_group(
> + struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
> +{
> + struct attribute_group *attr_group;
> + struct attribute **attrs;
> + int i;
> +
> + /*
> + * Allocate memory for both event attribute group and for
> + * event attributes array.
> + */
> + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
> + sizeof(*attr_group)), GFP_KERNEL);
> + if (!attr_group)
> + return -ENOMEM;
> +
> + /*
> + * Assign memory for event attribute array
> + */
> + attrs = (struct attribute **)(attr_group + 1);
> + attr_group->name = "events";
> + attr_group->attrs = attrs;
I am super uncomfortable with this block, especially the assignment to
attrs. I *think* you're trying to allocate an attribute group and a set
of attributes, but you've combined the allocation into one big
contiguous chunk, and then you're trying to tease them apart. Is that
necessary? Could it be two allocs, one for the attribute_group and one
for the attribute?

--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-22 04:57:53

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions


> +static void p8_nest_read_counter(struct perf_event *event)
> +{
> + uint64_t *addr;
> + u64 data = 0;
You've got a u64 and a uint64_t, and then...
> +
> + addr = (u64 *)event->hw.event_base;
... you cast to event_base to a u64 pointer, which you assign to a
uint64_t pointer.
> + data = __be64_to_cpu(*addr);
And now you dereference the pointer.
Could you just have:
data = __be64_to_cpu(*event->hw.event_base);

> + local64_set(&event->hw.prev_count, data);
> +}
> +
> +static void p8_nest_perf_event_update(struct perf_event *event)
> +{
> + u64 counter_prev, counter_new, final_count;
> + uint64_t *addr;
> +
> + addr = (uint64_t *)event->hw.event_base;
Here at least the cast type is the same as the type of addr, but again,
why do you need the different types, and why local variable?
> + counter_prev = local64_read(&event->hw.prev_count);
> + counter_new = __be64_to_cpu(*addr);
> + final_count = counter_new - counter_prev;
> +
> + local64_set(&event->hw.prev_count, counter_new);
> + local64_add(final_count, &event->count);
> +}
> +
> +static void p8_nest_event_start(struct perf_event *event, int flags)
> +{
> + event->hw.state = 0;
Should this be an enum or a #define rather than a bare 0? (It may not
need to be, I was just wondering because I don't know what 0 means.)
> + p8_nest_read_counter(event);
> +}
> +
--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-22 05:05:19

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> +{
> + int i;
> +
> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> + old_cpu, new_cpu);
From patch 4, I see per_nest_pmu_arr is defined as:
+static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];

Therefore, does this loop need to have a check that
i < P8_NEST_MAX_PMUS?


--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-23 05:56:03

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser



On Wednesday 22 July 2015 09:19 AM, Daniel Axtens wrote:
> Hi,
>
>> +static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
>> +
>> +static int nest_ima_dt_parser(void)
>> +{
>> + const __be32 *gcid;
>> + const __be64 *chip_ima_reg;
>> + const __be64 *chip_ima_size;
>> + struct device_node *dev;
>> + struct perchip_nest_info *p8ni;
>> + int idx;
>> +
>> + /*
>> + * "nest-ima" folder contains two things,
>> + * a) per-chip reserved memory region for Nest PMU Counter data
>> + * b) Support Nest PMU units and their event files
>> + */
>> + for_each_node_with_property(dev, "ibm,ima-chip") {
>> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
>> + chip_ima_reg = of_get_property(dev, "reg", NULL);
>> + chip_ima_size = of_get_property(dev, "size", NULL);
>> +
>> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
>> + pr_err("Nest_PMU: device %s missing property\n",
>> + dev->full_name);
>> + return -ENODEV;
>> + }
>> +
>> + /* chip id to save reserve memory region */
>> + idx = (uint32_t)be32_to_cpup(gcid);
> So be32_to_cpup returns a __u32. You're casting to a uint32_t and then
> assigning to an int.
> - Do you need the intermediate cast?
> - Should idx be an unsigned type?

my bad, sorry abt type case of uint to int.
And your are right, idx can be __u32 (__u32 and uint32_t are same i
guess).

>> +
>> + /*
>> + * Using a local variable to make it compact and
>> + * easier to read
>> + */
> We probably don't need this comment. But a better variable name would be
> helpful!

I dont want a long name since i end up with 80 char limit warning.
but let me see.

>> + p8ni = &p8_nest_perchip_info[idx];
>> + p8ni->pbase = be64_to_cpup(chip_ima_reg);
>> + p8ni->size = be64_to_cpup(chip_ima_size);
>> + p8ni->vbase = (uint64_t) phys_to_virt(p8ni->pbase);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int __init nest_pmu_init(void)
>> +{
>> + int ret = -ENODEV;
>> +
>> + /*
>> + * Lets do this only if we are hypervisor
>> + */
>> + if (!cur_cpu_spec->oprofile_cpu_type ||
>> + !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
>> + !cpu_has_feature(CPU_FTR_HVMODE))
>> + return ret;
>> +
> I'm still really uncomfortable with doing this via oprofile_cpu_type.
> If the kernel is compiled without oprofile support, will that get
> populated?

I checked the per thread pmu register code and it all does the name.
But that should not stop nest pmu to enable. So probability,
I can carry only the HV mode check and drop the oprofile check.

>
> I'm also curious about why we need checking for power8 at all. We
> already know we're not going to run on hardware without a nest PMU
> because of the device tree check below.
> What happens if there's a future generation of chip that supports nest
> PMUs?
>
> If it's really important to check versions in this function, maybe you
> could put a version property in the ibm,ima-chip node?

True. I should not checkout power8 now, since we enable based on device tree
entries for Nest pmu.

>> + /*
>> + * Nest PMU information is grouped under "nest-ima" node
>> + * of the top-level device-tree directory. Detect Nest PMU
>> + * by the "ibm,ima-chip" property.
>> + */
>> + if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
>> + return ret;
>> +
>> + /*
>> + * Parse device-tree for Nest PMU information
>> + */
>> + ret = nest_ima_dt_parser();
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +device_initcall(nest_pmu_init);

2015-07-23 06:04:18

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events



On Wednesday 22 July 2015 09:37 AM, Daniel Axtens wrote:
>
>> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
>> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>> +
>> +static int nest_event_info(struct property *pp, char *name,
>> + struct nest_ima_events *p8_events, int string, u32 val)
> 'int string' is a bit confusing. 'bool is_string' might be clearer, but
> I think it would be even better still to have different functions for
> string and non-string cases, especially because you only need val in the
> non-string case.

I would perfer to be a single function, since most of the code is same
just of the sting or val part. yes. We can make is as is_string and will
add
comment explaining what is done here?

> That will also allow you to give the functions clearer names. I think
> the function is populating the event with info from the dt property (in
> the string case) or the val argument (non-string case) - maybe the names
> could reflect that somehow?
>> +{
>> + char *buf;
>> +
>
>> +
>> +static int nest_pmu_create(struct device_node *dev, int pmu_index)
>> +{
>> + struct nest_ima_events **p8_events_arr, *p8_events;
>> + struct nest_pmu *pmu_ptr;
>> + struct property *pp;
>> + char *buf, *start;
>> + const __be32 *lval;
>> + u32 val;
>> + int idx = 0, ret;
>> +
>> + if (!dev)
>> + return -EINVAL;
>> +
>> + /* memory for nest pmus */
>> + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
>> + if (!pmu_ptr)
>> + return -ENOMEM;
>> +
>> + /* Needed for hotplug/migration */
>> + per_nest_pmu_arr[pmu_index] = pmu_ptr;
>> +
>> + /* memory for nest pmu events */
>> + p8_events_arr = kzalloc((sizeof(struct nest_ima_events) * 64),
>> + GFP_KERNEL);
>> + if (!p8_events_arr)
>> + return -ENOMEM;
>> + p8_events = (struct nest_ima_events *)p8_events_arr;
> I'm still quite uncomfortable with this.
> - Why * 64? Should it be * P8_NEST_MAX_EVENTS_SUPPORTED? Or is it a
> different constant?

Yes it should be P8_NEST_MAX_EVENTS_SUPPORTED, and it should be
define as 64. Missed it to replace the macro. Nice catch.

> - p8_events = p8_events_arr[0] would be clearer
>
>> +
>> + /*
>> + * Loop through each property
>> + */
>> + for_each_property_of_node(dev, pp) {
>> + start = pp->name;
>> +
>> + if (!strcmp(pp->name, "name")) {
>> + if (!pp->value ||
>> + (strnlen(pp->value, pp->length) == pp->length) ||
>> + (pp->length > P8_NEST_MAX_PMU_NAME_LEN))
>> + return -EINVAL;
>> +
>> + buf = kzalloc(P8_NEST_MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + /* Save the name to register it later */
>> + sprintf(buf, "Nest_%s", (char *)pp->value);
>> + pmu_ptr->pmu.name = (char *)buf;
>> + continue;
>> + }
>> +
>> + /* Skip these, we dont need it */
> "don't" instead of "dont".

Will change it.

>> + if (!strcmp(pp->name, "phandle") ||
>> + !strcmp(pp->name, "device_type") ||
>> + !strcmp(pp->name, "linux,phandle"))
>> + continue;
>> +
>> + if (strncmp(pp->name, "unit.", 5) == 0) {
>> + /* Skip first few chars in the name */
> The whole comment is pretty uninformative, as is the similar comment
> below. If you need a comment at all, maybe something along the lines of
> "Strip the prefix because <reason we don't need/want the prefix>"?

Yes will change it. Thanks

>> + start += 5;
>> + ret = nest_event_info(pp, start, p8_events++, 1, 0);
>> + } else if (strncmp(pp->name, "scale.", 6) == 0) {
>> + /* Skip first few chars in the name */
>> + start += 6;
>> + ret = nest_event_info(pp, start, p8_events++, 1, 0);
>> + } else {
>> + lval = of_get_property(dev, pp->name, NULL);
>> + val = (uint32_t)be32_to_cpup(lval);
>> +
>> + ret = nest_event_info(pp, start, p8_events++, 0, val);
>> + }
>> +
>> + if (ret)
>> + return ret;
>> +
>> + /* book keeping */
>> + idx++;
> You don't seem to use idx in this function, apart from incrementing it
> here...?

Used in next patch.

>> + }
>> +
>> + return 0;
>> +}
>

2015-07-23 06:33:13

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 5/7] powerpc/powernv: add event attribute and group to nest pmu



On Wednesday 22 July 2015 10:14 AM, Daniel Axtens wrote:
> On Thu, 2015-07-16 at 16:43 +0530, Madhavan Srinivasan wrote:
>> Add code to create event/format attributes and attribute groups for
>> each nest pmu.
>>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Anton Blanchard <[email protected]>
>> Cc: Sukadev Bhattiprolu <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>> ---
>> arch/powerpc/perf/nest-pmu.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 65 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
>> index c4c08e4dee55..f3418bdec1cd 100644
>> --- a/arch/powerpc/perf/nest-pmu.c
>> +++ b/arch/powerpc/perf/nest-pmu.c
>> @@ -13,6 +13,17 @@
>> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
>> static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>>
>> +PMU_FORMAT_ATTR(event, "config:0-20");
>> +static struct attribute *p8_nest_format_attrs[] = {
>> + &format_attr_event.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group p8_nest_format_group = {
>> + .name = "format",
>> + .attrs = p8_nest_format_attrs,
>> +};
>> +
>> static int nest_event_info(struct property *pp, char *name,
>> struct nest_ima_events *p8_events, int string, u32 val)
>> {
>> @@ -46,6 +57,56 @@ static int nest_event_info(struct property *pp, char *name,
>> return 0;
>> }
>>
>> +/*
>> + * Populate event name and string in attribute
>> + */
>> +static struct attribute *dev_str_attr(const char *name, const char *str)
>> +{
>> + struct perf_pmu_events_attr *attr;
>> +
>> + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
>> +
>> + sysfs_attr_init(&attr->attr.attr);
>> +
>> + attr->event_str = str;
>> + attr->attr.attr.name = name;
>> + attr->attr.attr.mode = 0444;
>> + attr->attr.show = perf_event_sysfs_show;
>> +
>> + return &attr->attr.attr;
> So I asked you about this before, and you pointed me to
> perf_event_sysfs_show. Looking at that in kernel/events/core.c, it looks
> like that uses container_of to pull out the perf_pmu_events_attr. So I
> guess that is at least mostly correct.
>
> I'm hoping something else uses container_of to pull out attr->attr, so
> that they can actually grab the attr->attr.show function pointer, so
> that perf_event_sysfs_show actually gets called. Where would that be?

OK, what we return is the device attribute struct which also have sysfs_ops.
So ->show and ->store are those entries in the strucutre and here we only
populate show ops using perf_event_sysfs_show. Now at the time of
pmu registering, we end up calling device_add->device_create_file->
sysfs_create_file which will end up adding a sysfs device file linked to
this
->show ops.

>> +}
>> +
>> +static int update_events_in_group(
>> + struct nest_ima_events *p8_events, int idx, struct nest_pmu *pmu)
>> +{
>> + struct attribute_group *attr_group;
>> + struct attribute **attrs;
>> + int i;
>> +
>> + /*
>> + * Allocate memory for both event attribute group and for
>> + * event attributes array.
>> + */
>> + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
>> + sizeof(*attr_group)), GFP_KERNEL);
>> + if (!attr_group)
>> + return -ENOMEM;
>> +
>> + /*
>> + * Assign memory for event attribute array
>> + */
>> + attrs = (struct attribute **)(attr_group + 1);
>> + attr_group->name = "events";
>> + attr_group->attrs = attrs;
> I am super uncomfortable with this block, especially the assignment to
> attrs. I *think* you're trying to allocate an attribute group and a set
> of attributes, but you've combined the allocation into one big
> contiguous chunk, and then you're trying to tease them apart. Is that
> necessary? Could it be two allocs, one for the attribute_group and one
> for the attribute?

I wanted to avoid two function calls here, but this is not a hot path
This happens at the pmu init time (booting), so I guess we can have
two allocs here.

>

2015-07-23 06:45:08

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions



On Wednesday 22 July 2015 10:26 AM, Daniel Axtens wrote:
>> +static void p8_nest_read_counter(struct perf_event *event)
>> +{
>> + uint64_t *addr;
>> + u64 data = 0;
> You've got a u64 and a uint64_t, and then...
>> +
>> + addr = (u64 *)event->hw.event_base;
> ... you cast to event_base to a u64 pointer, which you assign to a
> uint64_t pointer.
>> + data = __be64_to_cpu(*addr);
> And now you dereference the pointer.
> Could you just have:
> data = __be64_to_cpu(*event->hw.event_base);
>> + local64_set(&event->hw.prev_count, data);
>> +}
>> +
>> +static void p8_nest_perf_event_update(struct perf_event *event)
>> +{
>> + u64 counter_prev, counter_new, final_count;
>> + uint64_t *addr;
>> +
>> + addr = (uint64_t *)event->hw.event_base;
> Here at least the cast type is the same as the type of addr, but again,
> why do you need the different types, and why local variable?

Damn sorry, copy paste errors. When I added debug prints i messed
the type case in both the functions. I will make them as uint64_t.

Thanks for this detail review
Maddy

>> + counter_prev = local64_read(&event->hw.prev_count);
>> + counter_new = __be64_to_cpu(*addr);
>> + final_count = counter_new - counter_prev;
>> +
>> + local64_set(&event->hw.prev_count, counter_new);
>> + local64_add(final_count, &event->count);
>> +}
>> +
>> +static void p8_nest_event_start(struct perf_event *event, int flags)
>> +{
>> + event->hw.state = 0;
> Should this be an enum or a #define rather than a bare 0? (It may not
> need to be, I was just wondering because I don't know what 0 means.)

I could remove it since was just initializing at the start.

>> + p8_nest_read_counter(event);
>> +}
>> +

2015-07-23 06:45:21

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 0/7]powerpc/powernv: Nest Instrumentation support



On Tuesday 21 July 2015 12:13 AM, Sukadev Bhattiprolu wrote:
> Madhavan Srinivasan [[email protected]] wrote:
> | This patchset enables Nest Instrumentation support on powerpc.
> | POWER8 has per-chip Nest Intrumentation which provides various
> | per-chip metrics like memory, powerbus, Xlink and Alink
> | bandwidth.
> |
> <snip>
>
> | Cc: Michael Ellerman <[email protected]>
> | Cc: Benjamin Herrenschmidt <[email protected]>
> | Cc: Paul Mackerras <[email protected]>
> | Cc: Anton Blanchard <[email protected]>
> | Cc: Sukadev Bhattiprolu <[email protected]>
> | Cc: Anshuman Khandual <[email protected]>
> | Cc: Stephane Eranian <[email protected]>
> | Signed-off-by: Madhavan Srinivasan <[email protected]>
>
> Thanks for addressing my comments from earlier version.
>
> Reviewed-by: Sukadev Bhattiprolu <[email protected]>

Thanks for the review
Maddy

2015-07-23 06:49:12

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support



On Wednesday 22 July 2015 10:33 AM, Daniel Axtens wrote:
>> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
>> +{
>> + int i;
>> +
>> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
>> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
>> + old_cpu, new_cpu);
> From patch 4, I see per_nest_pmu_arr is defined as:
> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>
> Therefore, does this loop need to have a check that
> i < P8_NEST_MAX_PMUS?

No, that is max possible pmu, but we may have only couple for nest pmus
registered.

Thanks for the review comments
Maddy
>

2015-07-23 06:51:04

by Daniel Axtens

[permalink] [raw]
Subject: Re: [PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support

On Thu, 2015-07-23 at 12:18 +0530, Madhavan Srinivasan wrote:
>
> On Wednesday 22 July 2015 10:33 AM, Daniel Axtens wrote:
> >> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> >> +{
> >> + int i;
> >> +
> >> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
> >> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> >> + old_cpu, new_cpu);
> > From patch 4, I see per_nest_pmu_arr is defined as:
> > +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> >
> > Therefore, does this loop need to have a check that
> > i < P8_NEST_MAX_PMUS?
>
> No, that is max possible pmu, but we may have only couple for nest pmus
> registered.
>
What if we have P8_NEST_MAX_PMUS registered? Then we'll check beyond the
end of the array...

> Thanks for the review comments
> Maddy
> >
>

--
Regards,
Daniel


Attachments:
signature.asc (860.00 B)
This is a digitally signed message part

2015-07-23 07:28:31

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 7/7] powerpc/powernv: nest pmu cpumask and cpu hotplug support



On Thursday 23 July 2015 12:19 PM, Daniel Axtens wrote:
> On Thu, 2015-07-23 at 12:18 +0530, Madhavan Srinivasan wrote:
>> On Wednesday 22 July 2015 10:33 AM, Daniel Axtens wrote:
>>>> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
>>>> +{
>>>> + int i;
>>>> +
>>>> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
>>>> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
>>>> + old_cpu, new_cpu);
>>> From patch 4, I see per_nest_pmu_arr is defined as:
>>> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>>>
>>> Therefore, does this loop need to have a check that
>>> i < P8_NEST_MAX_PMUS?
>> No, that is max possible pmu, but we may have only couple for nest pmus
>> registered.
>>
> What if we have P8_NEST_MAX_PMUS registered? Then we'll check beyond the
> end of the array...

OK, i will add check for P8_NEST_MAX_PMUS also.

>> Thanks for the review comments
>> Maddy

2015-07-23 09:04:32

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions

On Thu, 2015-07-23 at 12:14 +0530, Madhavan Srinivasan wrote:
>
> On Wednesday 22 July 2015 10:26 AM, Daniel Axtens wrote:
> >> +static void p8_nest_read_counter(struct perf_event *event)
> >> +{
> >> + uint64_t *addr;
> >> + u64 data = 0;
> > You've got a u64 and a uint64_t, and then...
> >> +
> >> + addr = (u64 *)event->hw.event_base;
> > ... you cast to event_base to a u64 pointer, which you assign to a
> > uint64_t pointer.
> >> + data = __be64_to_cpu(*addr);
> > And now you dereference the pointer.
> > Could you just have:
> > data = __be64_to_cpu(*event->hw.event_base);
> >> + local64_set(&event->hw.prev_count, data);
> >> +}
> >> +
> >> +static void p8_nest_perf_event_update(struct perf_event *event)
> >> +{
> >> + u64 counter_prev, counter_new, final_count;
> >> + uint64_t *addr;
> >> +
> >> + addr = (uint64_t *)event->hw.event_base;
> > Here at least the cast type is the same as the type of addr, but again,
> > why do you need the different types, and why local variable?
>
> Damn sorry, copy paste errors. When I added debug prints i messed
> the type case in both the functions. I will make them as uint64_t.

No please use u64/u32 etc. Most code in powerpc does and I prefer them.

cheers

2015-07-23 09:11:44

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events

On Thu, 2015-07-23 at 11:33 +0530, Madhavan Srinivasan wrote:
>
> On Wednesday 22 July 2015 09:37 AM, Daniel Axtens wrote:
> >
> >> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> >> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
> >> +
> >> +static int nest_event_info(struct property *pp, char *name,
> >> + struct nest_ima_events *p8_events, int string, u32 val)
> > 'int string' is a bit confusing. 'bool is_string' might be clearer, but
> > I think it would be even better still to have different functions for
> > string and non-string cases, especially because you only need val in the
> > non-string case.
>
> I would perfer to be a single function, since most of the code is same
> just of the sting or val part. yes. We can make is as is_string and will
> add comment explaining what is done here?

I think Daniel's right, it would be better as two functions.

The only part that is common after the if (string) check is the
p8_events->ev_value = buf assignment.

So you should be able to keep all the code up to the if (string) check in a
shared function and just have two wrappers that use it.

cheers

2015-07-23 09:17:06

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser

On Thu, 2015-07-23 at 11:24 +0530, Madhavan Srinivasan wrote:
>
> On Wednesday 22 July 2015 09:19 AM, Daniel Axtens wrote:
> > Hi,
> >
> >> +static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
> >> +
> >> +static int nest_ima_dt_parser(void)
> >> +{
> >> + const __be32 *gcid;
> >> + const __be64 *chip_ima_reg;
> >> + const __be64 *chip_ima_size;
> >> + struct device_node *dev;
> >> + struct perchip_nest_info *p8ni;
> >> + int idx;
> >> +
> >> + /*
> >> + * "nest-ima" folder contains two things,
> >> + * a) per-chip reserved memory region for Nest PMU Counter data
> >> + * b) Support Nest PMU units and their event files
> >> + */
> >> + for_each_node_with_property(dev, "ibm,ima-chip") {
> >> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
> >> + chip_ima_reg = of_get_property(dev, "reg", NULL);
> >> + chip_ima_size = of_get_property(dev, "size", NULL);
> >> +
> >> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
> >> + pr_err("Nest_PMU: device %s missing property\n",
> >> + dev->full_name);
> >> + return -ENODEV;
> >> + }
> >> +
> >> + /* chip id to save reserve memory region */
> >> + idx = (uint32_t)be32_to_cpup(gcid);
> > So be32_to_cpup returns a __u32. You're casting to a uint32_t and then
> > assigning to an int.
> > - Do you need the intermediate cast?
> > - Should idx be an unsigned type?
>
> my bad, sorry abt type case of uint to int.
> And your are right, idx can be __u32 (__u32 and uint32_t are same i
> guess).

It should be u32. Don't use the uintx_t types in kernel code unless there's
some good reason for it.

The __u32 etc. types are for things that are exposed to userspace, which this
is not, so u32 is correct.

Having said that, this code should be using of_property_read_u32() etc.

And having said that, this is all based on a device tree binding that hasn't
been reviewed yet on the OPAL side, so it's subject to change too.

cheers

2015-07-23 09:22:50

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] powerpc/powernv: generic nest pmu event functions



On Thursday 23 July 2015 02:34 PM, Michael Ellerman wrote:
> On Thu, 2015-07-23 at 12:14 +0530, Madhavan Srinivasan wrote:
>> On Wednesday 22 July 2015 10:26 AM, Daniel Axtens wrote:
>>>> +static void p8_nest_read_counter(struct perf_event *event)
>>>> +{
>>>> + uint64_t *addr;
>>>> + u64 data = 0;
>>> You've got a u64 and a uint64_t, and then...
>>>> +
>>>> + addr = (u64 *)event->hw.event_base;
>>> ... you cast to event_base to a u64 pointer, which you assign to a
>>> uint64_t pointer.
>>>> + data = __be64_to_cpu(*addr);
>>> And now you dereference the pointer.
>>> Could you just have:
>>> data = __be64_to_cpu(*event->hw.event_base);
>>>> + local64_set(&event->hw.prev_count, data);
>>>> +}
>>>> +
>>>> +static void p8_nest_perf_event_update(struct perf_event *event)
>>>> +{
>>>> + u64 counter_prev, counter_new, final_count;
>>>> + uint64_t *addr;
>>>> +
>>>> + addr = (uint64_t *)event->hw.event_base;
>>> Here at least the cast type is the same as the type of addr, but again,
>>> why do you need the different types, and why local variable?
>> Damn sorry, copy paste errors. When I added debug prints i messed
>> the type case in both the functions. I will make them as uint64_t.
> No please use u64/u32 etc. Most code in powerpc does and I prefer them.
>
> cheers
Ok Sure. Will use only u64 and u32.

Thanks
maddy

>

2015-07-23 09:24:30

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] powerpc/powernv: detect supported nest pmus and its events



On Thursday 23 July 2015 02:41 PM, Michael Ellerman wrote:
> On Thu, 2015-07-23 at 11:33 +0530, Madhavan Srinivasan wrote:
>> On Wednesday 22 July 2015 09:37 AM, Daniel Axtens wrote:
>>>
>>>> static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
>>>> +static struct nest_pmu *per_nest_pmu_arr[P8_NEST_MAX_PMUS];
>>>> +
>>>> +static int nest_event_info(struct property *pp, char *name,
>>>> + struct nest_ima_events *p8_events, int string, u32 val)
>>> 'int string' is a bit confusing. 'bool is_string' might be clearer, but
>>> I think it would be even better still to have different functions for
>>> string and non-string cases, especially because you only need val in the
>>> non-string case.
>> I would perfer to be a single function, since most of the code is same
>> just of the sting or val part. yes. We can make is as is_string and will
>> add comment explaining what is done here?
> I think Daniel's right, it would be better as two functions.
>
> The only part that is common after the if (string) check is the
> p8_events->ev_value = buf assignment.
>
> So you should be able to keep all the code up to the if (string) check in a
> shared function and just have two wrappers that use it.
>
> cheers

Sure. Will do.

Maddy

>
>
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev

2015-07-23 09:28:43

by Madhavan Srinivasan

[permalink] [raw]
Subject: Re: [PATCH v5 3/7] powerpc/powernv: Nest PMU detection and device tree parser



On Thursday 23 July 2015 02:46 PM, Michael Ellerman wrote:
> On Thu, 2015-07-23 at 11:24 +0530, Madhavan Srinivasan wrote:
>> On Wednesday 22 July 2015 09:19 AM, Daniel Axtens wrote:
>>> Hi,
>>>
>>>> +static struct perchip_nest_info p8_nest_perchip_info[P8_NEST_MAX_CHIPS];
>>>> +
>>>> +static int nest_ima_dt_parser(void)
>>>> +{
>>>> + const __be32 *gcid;
>>>> + const __be64 *chip_ima_reg;
>>>> + const __be64 *chip_ima_size;
>>>> + struct device_node *dev;
>>>> + struct perchip_nest_info *p8ni;
>>>> + int idx;
>>>> +
>>>> + /*
>>>> + * "nest-ima" folder contains two things,
>>>> + * a) per-chip reserved memory region for Nest PMU Counter data
>>>> + * b) Support Nest PMU units and their event files
>>>> + */
>>>> + for_each_node_with_property(dev, "ibm,ima-chip") {
>>>> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
>>>> + chip_ima_reg = of_get_property(dev, "reg", NULL);
>>>> + chip_ima_size = of_get_property(dev, "size", NULL);
>>>> +
>>>> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
>>>> + pr_err("Nest_PMU: device %s missing property\n",
>>>> + dev->full_name);
>>>> + return -ENODEV;
>>>> + }
>>>> +
>>>> + /* chip id to save reserve memory region */
>>>> + idx = (uint32_t)be32_to_cpup(gcid);
>>> So be32_to_cpup returns a __u32. You're casting to a uint32_t and then
>>> assigning to an int.
>>> - Do you need the intermediate cast?
>>> - Should idx be an unsigned type?
>> my bad, sorry abt type case of uint to int.
>> And your are right, idx can be __u32 (__u32 and uint32_t are same i
>> guess).
> It should be u32. Don't use the uintx_t types in kernel code unless there's
> some good reason for it.
>
> The __u32 etc. types are for things that are exposed to userspace, which this
> is not, so u32 is correct.
>
> Having said that, this code should be using of_property_read_u32() etc.

Ok will change it to use of_property_read_u32.

> And having said that, this is all based on a device tree binding that hasn't
> been reviewed yet on the OPAL side, so it's subject to change too.

Have posted new version in the skiboot mailinglist based on the reviews.
hoping to get it reviewed soon.

Maddy
> cheers
>
>