This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Intrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.
Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From
kernel side, OPAL Call interface is used to activate/deactivate
PORE Engine for nest data collection.
OPAL at boot, detects the feature, initializes it and pass on
the nest units and other related information such as memory
region, events supported so on, to kernel via device-tree.
Kernel code then, parses the device-tree for nest pmu support
and registers nest pmu with the events available. PORE Engine collects
and accumulate nest counter data in per-chip reserved memory region, hence
device-tree also exports per-chip nest accumulation memory region.
And individual event offset are used as event configuration values.
Here is sample perf usage to explain the interface.
#./perf list
....
iTLB-load-misses [Hardware cache event]
Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....
# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1
Performance counter stats for 'system wide':
CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/
1.001062038 seconds time elapsed
# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5
Performance counter stats for 'system wide':
CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....
OPAL side patches are posted in the skiboot mailing list.
Changelog from v1:
1) No logic changes, re-ordered patches make each patch compile
without errors
2) Added comments based on the review feedback.
3) removed perf_event_del function and replaced it with perf_event_stop.
4) Moved Nest feature detection code out of parser function.
5) Optimized functions and removed some variables.
6) squashed the makefile changes, instead of the separate patch
7) squashed the cpumask and hotplug patches as single patch
8) Added cpu checks in nest_change_cpu_context and nest_exit_cpu functions
9) Made changes to commit messages.
Changelog from RFC:
1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message
Thanks to intput from Daniel Axtens, Suzuki Poulose and Michael Ellerman
Kindly let me know you comments and feedback.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Madhavan Srinivasan (7):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Nest PMU detection and device tree parser
powerpc/powernv: detect supported nest pmus and its events
powerpc/powernv: add event attribute and group to nest pmu
powerpc/powernv: generic nest pmu event functions
powerpc/powernv: nest pmu cpumask and cpu hotplug support
arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 520 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 53 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 579 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h
--
1.9.1
Create new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 53 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h
diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 0000000..ebe7689
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,53 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_MAX_CHIP 32
+#define MAX_PMU_NAME_LEN 256
+#define MAX_EVENTS_SUPPORTED 256
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+#define P8_MAX_NEST_PMUS 32
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ uint32_t chip_id;
+ uint64_t pbase;
+ uint64_t vbase;
+ uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct ppc64_nest_ima_events {
+ const char *ev_name;
+ const char *ev_value;
+};
+
+/*
+ * Device tree parser code detects nest pmu support and
+ * registers new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
--
1.9.1
Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.
OPAL side patches are posted in the skiboot mailing.
Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 2 ++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..8011a75 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,8 @@
#define OPAL_FLASH_READ 110
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
-#define OPAL_LAST 112
+#define OPAL_NEST_IMA_CONTROL 116
+#define OPAL_LAST 116
/* Device tree flags */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..f86e5e9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -201,6 +201,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint32_t value);
+
/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..ce36a68 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1
Create a file "nest-pmu.c" to contain nest pmu related functions. Code
to detect nest pmu support and parser to collect per-chip reserved memory
region information from device tree (DT).
Detection mechanism is to look for specific property "ibm,ima-chip" in DT.
For Nest pmu, device tree will have two set of information.
1) Per-chip reserved memory region for nest pmu counter collection area.
2) Supported Nest PMUs and events
Device tree layout for the Nest PMU as follows.
/ -- DT root folder
|
-nest-ima -- Nest PMU folder
|
-ima-chip@<chip-id> -- Per-chip folder for reserved region information
|
-ibm,chip-id -- Chip id
-ibm,ima-chip
-reg -- HOMER PORE Nest Counter collection Address (RA)
-size -- size to map in kernel space
-Alink_BW -- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit -- Event unit file
-device_type -- "nest-ima-unit" marker
....
Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 78 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 0000000..e993630
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,78 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright (C) 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include "nest-pmu.h"
+
+static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
+
+static int nest_ima_dt_parser(void)
+{
+ const __be32 *gcid;
+ const __be64 *chip_ima_reg;
+ const __be64 *chip_ima_size;
+ struct device_node *dev;
+ int idx;
+
+ /*
+ * "nest-ima" folder contains two things,
+ * a) per-chip reserved memory region for Nest PMU Counter data
+ * b) Support Nest PMU units and their event files
+ */
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ gcid = of_get_property(dev, "ibm,chip-id", NULL);
+ chip_ima_reg = of_get_property(dev, "reg", NULL);
+ chip_ima_size = of_get_property(dev, "size", NULL);
+ if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+ pr_err("Nest_PMU: device %s missing property \n",
+ dev->full_name);
+ return -ENODEV;
+ }
+
+ /* Save per-chip reserve memory region */
+ idx = (uint32_t)be32_to_cpup(gcid);
+ p8_perchip_nest_info[idx].pbase = be64_to_cpup(chip_ima_reg);
+ p8_perchip_nest_info[idx].size = be64_to_cpup(chip_ima_size);
+ p8_perchip_nest_info[idx].vbase = (uint64_t)
+ phys_to_virt(p8_perchip_nest_info[idx].pbase);
+ }
+
+ return 0;
+}
+
+static int __init nest_pmu_init(void)
+{
+ int ret = -ENODEV;
+
+ /*
+ * Lets do this only if we are hypervisor
+ */
+ if (!cur_cpu_spec->oprofile_cpu_type ||
+ !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0) ||
+ !cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ /*
+ * Nest PMU information are grouped under "nest-ima" node
+ * of the top-level Device-Tree directory. Detect Nest PMU
+ * with "ibm,ima-chip" property.
+ */
+ if (!of_find_node_with_property(NULL, "ibm,ima-chip"))
+ return ret;
+
+ /*
+ * Parser Device-tree for Nest PMU information
+ */
+ ret = nest_ima_dt_parser();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1
Parse device tree to detect supported nest pmu units. Traverse
through each nest pmu unit folder to find supported events and
corresponding unit/scale files (if any).
Nest unit event file from DT, will contain the offset in the reserves memory
region to get the counter data for a gievn event. Kernel code uses the same
as event configuration value.
Device tree parser code also look for scale/unit in the file name and
pass on the file as an event attr for perf tool to use in the post
processing.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 129 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 128 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index e993630..43d5fde 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,124 @@
#include "nest-pmu.h"
static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
+static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
+
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct ppc64_nest_ima_events **p8_events_arr, *p8_events;
+ struct nest_pmu *pmu_ptr;
+ struct property *pp;
+ char *buf, *start;
+ const __be32 *lval;
+ u32 val;
+ int idx = 0;
+
+ if (!dev)
+ return -EINVAL;
+
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ p8_events_arr = kzalloc((sizeof(struct ppc64_nest_ima_events) * 64),
+ GFP_KERNEL);
+ if (!p8_events_arr)
+ return -ENOMEM;
+ p8_events = (struct ppc64_nest_ima_events *)p8_events_arr;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ /* Save the name to register the PMU with it */
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ }
+
+ /* Skip these, we dont need it */
+ if (!strcmp(pp->name, "name") ||
+ !strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ start += 5;
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ p8_events->ev_value = buf;
+ idx++;
+ p8_events++;
+
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ start += 6;
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ p8_events->ev_value = buf;
+ idx++;
+ p8_events++;
+
+ } else {
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+ lval = of_get_property(dev, pp->name, NULL);
+ val = (uint32_t)be32_to_cpup(lval);
+
+ /*
+ * Use DT property value as the event
+ */
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ sprintf(buf, "event=0x%x", val);
+ p8_events->ev_value = buf;
+ p8_events++;
+ idx++;
+ }
+ }
+
+ return 0;
+}
+
static int nest_ima_dt_parser(void)
{
@@ -18,7 +136,7 @@ static int nest_ima_dt_parser(void)
const __be64 *chip_ima_reg;
const __be64 *chip_ima_size;
struct device_node *dev;
- int idx;
+ int idx, ret;
/*
* "nest-ima" folder contains two things,
@@ -43,6 +161,15 @@ static int nest_ima_dt_parser(void)
phys_to_virt(p8_perchip_nest_info[idx].pbase);
}
+ /* Look for supported Nest PMU units */
+ idx = 0;
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ ret = nest_pmu_create(dev, idx);
+ if (ret)
+ return ret;
+ idx++;
+ }
+
return 0;
}
--
1.9.1
Add code to create event attribute and attribute group for
each nest pmu.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 55 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 43d5fde..8fad2d9 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,55 @@
static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+int update_events_in_group(
+ struct ppc64_nest_ima_events *p8_events, int idx,
+ struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /* Allocate memory add supported event as event attribute */
+ attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+ sizeof(*attr_group)), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ attrs = (struct attribute **)(attr_group + 1);
+ attr_group->name = "events";
+
+ /*
+ * struct attribute group contains, "name" and an
+ * array of attributes, along other elemsnts.
+ * link attrs with attribute group's attr array.
+ */
+ attr_group->attrs = attrs;
+
+ for (i = 0; i < idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct ppc64_nest_ima_events **p8_events_arr, *p8_events;
@@ -68,7 +117,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
if (!buf)
- return -ENOMEM;
+ return -ENOMEM;
if (strncmp(pp->name, "unit.", 5) == 0) {
start += 5;
@@ -126,6 +175,10 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
}
}
+ update_events_in_group(
+ (struct ppc64_nest_ima_events *)p8_events_arr,
+ idx, pmu_ptr);
+
return 0;
}
--
1.9.1
Add generic format attribute and set of generic nest pmu related
event functions to be used by each nest pmu. Add code to register nest pmus.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 109 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 109 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 8fad2d9..a662c14 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -13,6 +13,108 @@
static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
+static int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_perchip_nest_info[chip_id].vbase;
+
+ return 0;
+}
+
+static void p8_nest_read_counter(struct perf_event *event)
+{
+ u64 *addr;
+ u64 data = 0;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu((uint64_t)*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+ uint64_t *addr;
+
+ addr = (u64 *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu((uint64_t)*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+ p8_nest_read_counter(event);
+}
+
+static void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ p8_nest_perf_event_update(event);
+}
+
+static int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ p8_nest_event_start(event, flags);
+ return 0;
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_stop;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
/*
* Populate event name and string in attribute
*/
@@ -106,6 +208,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
/* Save the name to register the PMU with it */
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
}
/* Skip these, we dont need it */
@@ -179,6 +282,12 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
(struct ppc64_nest_ima_events *)p8_events_arr,
idx, pmu_ptr);
+ update_pmu_ops(pmu_ptr);
+
+ /* Register the pmu */
+ perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
+
return 0;
}
--
1.9.1
Adds cpumask attribute to be used by each nest pmu since nest
units are per-chip. Only one cpu (first online cpu) from each node/chip
is designated to read counters.
On cpu hotplug, dying cpu is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same node/chip is designated
as new cpu to read counters.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 153 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 153 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index a662c14..95f8ecc 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,155 @@
static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
+static cpumask_t cpu_mask_nest_pmu;
+
+static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &cpu_mask_nest_pmu);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_nest_pmu_get_attr, NULL);
+
+static struct attribute *cpumask_nest_pmu_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group cpumask_nest_pmu_attr_group = {
+ .attrs = cpumask_nest_pmu_attrs,
+};
+
+static void nest_init(void *dummy)
+{
+ opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ if (new_cpu >= 0) {
+ for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+ }
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int i, nid, target = -1;
+ const struct cpumask *l_cpumask;
+ int src_chipid;
+
+ /*
+ * Check in the designated list for this cpu. Dont bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
+ return;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a new cpu a) which is not dying and
+ * b) is in same node/chip.
+ */
+ nid = cpu_to_node(cpu);
+ src_chipid = topology_physical_package_id(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ for_each_cpu(i, l_cpumask) {
+ if (i == cpu)
+ continue;
+ if (src_chipid == topology_physical_package_id(i)) {
+ target = i;
+ break;
+ }
+ }
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0) {
+ cpumask_set_cpu(target, &cpu_mask_nest_pmu);
+ nest_change_cpu_context (cpu, target);
+ }
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int i, src_chipid;
+
+ /*
+ * Search for any existing designated thread from
+ * the incoming cpu's node/chip. If found, do nothing.
+ */
+ src_chipid = topology_physical_package_id(cpu);
+ for_each_cpu(i, &cpu_mask_nest_pmu)
+ if (src_chipid == topology_physical_package_id(i))
+ return;
+
+ /*
+ * Make incoming cpu as a designated thread for
+ * this node/chip
+ */
+ cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
+}
+
+static int nest_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ long cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_DOWN_FAILED:
+ case CPU_STARTING:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_cpu_nb = {
+ .notifier_call = nest_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+};
+
+void cpumask_chip(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+
+ if (!cpumask_empty(&cpu_mask_nest_pmu))
+ return;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each node/chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
+ }
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&cpu_mask_nest_pmu, (smp_call_func_t)nest_init, NULL, 1);
+
+ __register_cpu_notifier(&nest_cpu_nb);
+
+ cpu_notifier_register_done();
+}
PMU_FORMAT_ATTR(event, "config:0-20");
struct attribute *p8_nest_format_attrs[] = {
@@ -209,6 +358,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
sprintf(buf, "Nest_%s", (char *)pp->value);
pmu_ptr->pmu.name = (char *)buf;
pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &cpumask_nest_pmu_attr_group;
}
/* Skip these, we dont need it */
@@ -362,6 +512,9 @@ static int __init nest_pmu_init(void)
if (ret)
return ret;
+ /* Add cpumask and register for hotplug notification */
+ cpumask_chip();
+
return 0;
}
device_initcall(nest_pmu_init);
--
1.9.1
On 06/11/2015 10:47 AM, Madhavan Srinivasan wrote:
> Adds cpumask attribute to be used by each nest pmu since nest
> units are per-chip. Only one cpu (first online cpu) from each node/chip
> is designated to read counters.
>
> On cpu hotplug, dying cpu is checked to see whether it is one of the
> designated cpus, if yes, next online cpu from the same node/chip is designated
> as new cpu to read counters.
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Anton Blanchard <[email protected]>
> Cc: Sukadev Bhattiprolu <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Madhavan Srinivasan <[email protected]>
> ---
> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> +{
> + int i;
> +
> + if (new_cpu >= 0) {
> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> + old_cpu, new_cpu);
> + }
> +}
> +
> +static void nest_exit_cpu(int cpu)
> +{
> + int i, nid, target = -1;
> + const struct cpumask *l_cpumask;
> + int src_chipid;
> +
> + /*
> + * Check in the designated list for this cpu. Dont bother
> + * if not one of them.
> + */
> + if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
> + return;
> +
> + /*
> + * Now that this cpu is one of the designated,
> + * find a new cpu a) which is not dying and
This comment is not right. nest_exit_cpu() is called in the hotplug
path, so another cpu cannot be dying in parallel. Hotplug operations are
done serially. The comment ought to be "a) which is online" instead.
> + * b) is in same node/chip.
node is not the same as a chip right ? And you are interested in cpus on
the same chip alone. So shouldn't the above comment be b) in the same chip ?
> + */
> + nid = cpu_to_node(cpu);
> + src_chipid = topology_physical_package_id(cpu);
What is the relation between a node and a chip ? Can't we have a
function which returns the cpumask of a chip straight away, since that
is what you seem to be more interested in ? You can then simply choose
the next cpu in this cpumask rather than going through the below loop.
> + l_cpumask = cpumask_of_node(nid);
> + for_each_cpu(i, l_cpumask) {
> + if (i == cpu)
> + continue;
> + if (src_chipid == topology_physical_package_id(i)) {
> + target = i;
> + break;
> + }
> + }
> +
> + /*
> + * Update the cpumask with the target cpu and
> + * migrate the context if needed
> + */
> + if (target >= 0) {
You already check for target >= 0 here. So you don't need to check for
new_cpu >= 0 in nest_change_cpu_context() above ?
> + cpumask_set_cpu(target, &cpu_mask_nest_pmu);
> + nest_change_cpu_context (cpu, target);
> + }
> +}
> +
> +static void nest_init_cpu(int cpu)
> +{
> + int i, src_chipid;
> +
> + /*
> + * Search for any existing designated thread from
> + * the incoming cpu's node/chip. If found, do nothing.
> + */
> + src_chipid = topology_physical_package_id(cpu);
> + for_each_cpu(i, &cpu_mask_nest_pmu)
> + if (src_chipid == topology_physical_package_id(i))
> + return;
> +
> + /*
> + * Make incoming cpu as a designated thread for
> + * this node/chip
> + */
> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
Why can't we simply check if cpu is the first one coming online in the
chip and designate it as the cpu_mask_nest_pmu for that chip ? If it is
not the first cpu, it means that another cpu in the same chip already
took over this duty and it needn't bother.
And shouldn't we also call nest_init() on this cpu, just like you do in
cpumask_chip() on all cpu_mask_nest_pmu cpus ?
> +}
> +
> +static int nest_cpu_notifier(struct notifier_block *self,
> + unsigned long action, void *hcpu)
> +{
> + long cpu = (long)hcpu;
> +
> + switch (action & ~CPU_TASKS_FROZEN) {
> + case CPU_DOWN_FAILED:
Why do we need to handle the DOWN_FAILED case ? In DOWN_PREPARE, you
have ensured that the function moves on to another cpu. So even if the
offline failed, its no issue. The duty is safely taken over.
> + case CPU_STARTING:
I would suggest initializing nest in the CPU_ONLINE stage. This is
because CPU_STARTING phase can fail. In that case, we will be
unnecessarily initializing nest pre-maturely. CPU_ONLINE phase assures
that the cpu is successfully online and you can then initiate nest.
> + nest_init_cpu(cpu);
> + break;
> + case CPU_DOWN_PREPARE:
> + nest_exit_cpu(cpu);
> + break;
> + default:
> + break;
> + }
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block nest_cpu_nb = {
> + .notifier_call = nest_cpu_notifier,
> + .priority = CPU_PRI_PERF + 1,
> +};
> +
> +void cpumask_chip(void)
This name ^^ is not apt IMO. You are initiating the cpumask necessary
for nest pmu. So why not call it nest_pmu_cpumask_init() ?
> +{
> + const struct cpumask *l_cpumask;
> + int cpu, nid;
> +
> + if (!cpumask_empty(&cpu_mask_nest_pmu))
When can this condition become true ?
> + return;
> +
> + cpu_notifier_register_begin();
> +
> + /*
> + * Nest PMUs are per-chip counters. So designate a cpu
> + * from each node/chip for counter collection.
> + */
> + for_each_online_node(nid) {
> + l_cpumask = cpumask_of_node(nid);
> +
> + /* designate first online cpu in this node */
> + cpu = cpumask_first(l_cpumask);
> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
> + }
> +
> + /* Initialize Nest PMUs in each node using designated cpus */
> + on_each_cpu_mask(&cpu_mask_nest_pmu, (smp_call_func_t)nest_init, NULL, 1);
> +
> + __register_cpu_notifier(&nest_cpu_nb);
> +
> + cpu_notifier_register_done();
> +}
Regards
Preeti U Murthy
On Tuesday 16 June 2015 11:58 AM, Preeti U Murthy wrote:
> On 06/11/2015 10:47 AM, Madhavan Srinivasan wrote:
>> Adds cpumask attribute to be used by each nest pmu since nest
>> units are per-chip. Only one cpu (first online cpu) from each node/chip
>> is designated to read counters.
>>
>> On cpu hotplug, dying cpu is checked to see whether it is one of the
>> designated cpus, if yes, next online cpu from the same node/chip is designated
>> as new cpu to read counters.
>>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Anton Blanchard <[email protected]>
>> Cc: Sukadev Bhattiprolu <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> Cc: Preeti U Murthy <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>> ---
>> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
>> +{
>> + int i;
>> +
>> + if (new_cpu >= 0) {
>> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
>> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
>> + old_cpu, new_cpu);
>> + }
>> +}
>> +
>> +static void nest_exit_cpu(int cpu)
>> +{
>> + int i, nid, target = -1;
>> + const struct cpumask *l_cpumask;
>> + int src_chipid;
>> +
>> + /*
>> + * Check in the designated list for this cpu. Dont bother
>> + * if not one of them.
>> + */
>> + if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
>> + return;
>> +
>> + /*
>> + * Now that this cpu is one of the designated,
>> + * find a new cpu a) which is not dying and
> This comment is not right. nest_exit_cpu() is called in the hotplug
> path, so another cpu cannot be dying in parallel. Hotplug operations are
> done serially. The comment ought to be "a) which is online" instead.
Ok will change it.
>> + * b) is in same node/chip.
> node is not the same as a chip right ? And you are interested in cpus on
> the same chip alone. So shouldn't the above comment be b) in the same chip ?
I was hoping it to be, but i will change comment to chip.
>> + */
>> + nid = cpu_to_node(cpu);
>> + src_chipid = topology_physical_package_id(cpu);
> What is the relation between a node and a chip ? Can't we have a
> function which returns the cpumask of a chip straight away, since that
> is what you seem to be more interested in ? You can then simply choose
> the next cpu in this cpumask rather than going through the below loop.
>
Make sense. I can separate it out.
>> + l_cpumask = cpumask_of_node(nid);
>> + for_each_cpu(i, l_cpumask) {
>> + if (i == cpu)
>> + continue;
>> + if (src_chipid == topology_physical_package_id(i)) {
>> + target = i;
>> + break;
>> + }
>> + }
>> +
>> + /*
>> + * Update the cpumask with the target cpu and
>> + * migrate the context if needed
>> + */
>> + if (target >= 0) {
> You already check for target >= 0 here. So you don't need to check for
> new_cpu >= 0 in nest_change_cpu_context() above ?
I guess i was way too cautious :) Will change it
>> + cpumask_set_cpu(target, &cpu_mask_nest_pmu);
>> + nest_change_cpu_context (cpu, target);
>> + }
>> +}
>> +
>> +static void nest_init_cpu(int cpu)
>> +{
>> + int i, src_chipid;
>> +
>> + /*
>> + * Search for any existing designated thread from
>> + * the incoming cpu's node/chip. If found, do nothing.
>> + */
>> + src_chipid = topology_physical_package_id(cpu);
>> + for_each_cpu(i, &cpu_mask_nest_pmu)
>> + if (src_chipid == topology_physical_package_id(i))
>> + return;
>> +
>> + /*
>> + * Make incoming cpu as a designated thread for
>> + * this node/chip
>> + */
>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
> Why can't we simply check if cpu is the first one coming online in the
> chip and designate it as the cpu_mask_nest_pmu for that chip ? If it is
> not the first cpu, it means that another cpu in the same chip already
> took over this duty and it needn't bother.
Looks to be right. let me try it out.
> And shouldn't we also call nest_init() on this cpu, just like you do in
> cpumask_chip() on all cpu_mask_nest_pmu cpus ?
Yes. I missed that. We should init. Nice catch.
>> +}
>> +
>> +static int nest_cpu_notifier(struct notifier_block *self,
>> + unsigned long action, void *hcpu)
>> +{
>> + long cpu = (long)hcpu;
>> +
>> + switch (action & ~CPU_TASKS_FROZEN) {
>> + case CPU_DOWN_FAILED:
> Why do we need to handle the DOWN_FAILED case ? In DOWN_PREPARE, you
> have ensured that the function moves on to another cpu. So even if the
> offline failed, its no issue. The duty is safely taken over.
>
>> + case CPU_STARTING:
> I would suggest initializing nest in the CPU_ONLINE stage. This is
> because CPU_STARTING phase can fail. In that case, we will be
> unnecessarily initializing nest pre-maturely. CPU_ONLINE phase assures
> that the cpu is successfully online and you can then initiate nest.
Ok sure. Will do that.
>> + nest_init_cpu(cpu);
>> + break;
>> + case CPU_DOWN_PREPARE:
>> + nest_exit_cpu(cpu);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block nest_cpu_nb = {
>> + .notifier_call = nest_cpu_notifier,
>> + .priority = CPU_PRI_PERF + 1,
>> +};
>> +
>> +void cpumask_chip(void)
> This name ^^ is not apt IMO. You are initiating the cpumask necessary
> for nest pmu. So why not call it nest_pmu_cpumask_init() ?
Ok.
>> +{
>> + const struct cpumask *l_cpumask;
>> + int cpu, nid;
>> +
>> + if (!cpumask_empty(&cpu_mask_nest_pmu))
> When can this condition become true ?
My bad. This code ended up here from the initial RFC patch, but after
reviewing this again, i dont think this is needed. Once again nice catch.
>> + return;
>> +
>> + cpu_notifier_register_begin();
>> +
>> + /*
>> + * Nest PMUs are per-chip counters. So designate a cpu
>> + * from each node/chip for counter collection.
>> + */
>> + for_each_online_node(nid) {
>> + l_cpumask = cpumask_of_node(nid);
>> +
>> + /* designate first online cpu in this node */
>> + cpu = cpumask_first(l_cpumask);
>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
>> + }
>> +
>> + /* Initialize Nest PMUs in each node using designated cpus */
>> + on_each_cpu_mask(&cpu_mask_nest_pmu, (smp_call_func_t)nest_init, NULL, 1);
>> +
>> + __register_cpu_notifier(&nest_cpu_nb);
>> +
>> + cpu_notifier_register_done();
>> +}
> Regards
> Preeti U Murthy
Madhavan Srinivasan [[email protected]] wrote:
| From: Madhavan Srinivasan <[email protected]>
| Subject: [PATCH v2 6/7]powerpc/powernv: generic nest pmu event functions
|
| Add generic format attribute and set of generic nest pmu related
| event functions to be used by each nest pmu. Add code to register nest pmus.
|
| Cc: Michael Ellerman <[email protected]>
| Cc: Benjamin Herrenschmidt <[email protected]>
| Cc: Paul Mackerras <[email protected]>
| Cc: Anton Blanchard <[email protected]>
| Cc: Sukadev Bhattiprolu <[email protected]>
| Cc: Anshuman Khandual <[email protected]>
| Cc: Stephane Eranian <[email protected]>
| Signed-off-by: Madhavan Srinivasan <[email protected]>
| ---
| arch/powerpc/perf/nest-pmu.c | 109 +++++++++++++++++++++++++++++++++++++++++++
| 1 file changed, 109 insertions(+)
|
| diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
| index 8fad2d9..a662c14 100644
| --- a/arch/powerpc/perf/nest-pmu.c
| +++ b/arch/powerpc/perf/nest-pmu.c
| @@ -13,6 +13,108 @@
| static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
| static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
|
| +PMU_FORMAT_ATTR(event, "config:0-20");
| +struct attribute *p8_nest_format_attrs[] = {
| + &format_attr_event.attr,
| + NULL,
| +};
| +
| +struct attribute_group p8_nest_format_group = {
| + .name = "format",
| + .attrs = p8_nest_format_attrs,
| +};
Could this be included in previous/separate patch? That way,
this patch could focus on just registering the nest-pmu.
| +
| +static int p8_nest_event_init(struct perf_event *event)
| +{
| + int chip_id;
| +
| + if (event->attr.type != event->pmu->type)
| + return -ENOENT;
| +
| + /* Sampling not supported yet */
| + if (event->hw.sample_period)
| + return -EINVAL;
| +
| + /* unsupported modes and filters */
| + if (event->attr.exclude_user ||
| + event->attr.exclude_kernel ||
| + event->attr.exclude_hv ||
| + event->attr.exclude_idle ||
| + event->attr.exclude_host ||
| + event->attr.exclude_guest)
| + return -EINVAL;
| +
| + if (event->cpu < 0)
| + return -EINVAL;
| +
| + chip_id = topology_physical_package_id(event->cpu);
| + event->hw.event_base = event->attr.config +
| + p8_perchip_nest_info[chip_id].vbase;
| +
| + return 0;
| +}
| +
| +static void p8_nest_read_counter(struct perf_event *event)
| +{
| + u64 *addr;
|
Define as uint64_t so we can eliminate one cast below? Would also
be consistent with p8_nest_perf_event_update().
|
| + u64 data = 0;
| +
| + addr = (u64 *)event->hw.event_base;
| + data = __be64_to_cpu((uint64_t)*addr);
| + local64_set(&event->hw.prev_count, data);
| +}
| +
| +static void p8_nest_perf_event_update(struct perf_event *event)
| +{
| + u64 counter_prev, counter_new, final_count;
| + uint64_t *addr;
| +
| + addr = (u64 *)event->hw.event_base;
uint64_t *?
| + counter_prev = local64_read(&event->hw.prev_count);
| + counter_new = __be64_to_cpu((uint64_t)*addr);
Redundant cast? addr is already uint64_t *?
| + final_count = counter_new - counter_prev;
| +
| + local64_set(&event->hw.prev_count, counter_new);
| + local64_add(final_count, &event->count);
| +}
| +
| +static void p8_nest_event_start(struct perf_event *event, int flags)
| +{
Check PERF_EF_RELOAD before reloading?
| + event->hw.state = 0;
| + p8_nest_read_counter(event);
| +}
| +
| +static void p8_nest_event_stop(struct perf_event *event, int flags)
| +{
Check PERF_EF_UPDATE when stopping?
| + p8_nest_perf_event_update(event);
| +}
| +
| +static int p8_nest_event_add(struct perf_event *event, int flags)
| +{
Check PERF_EF_START flags before starting the counter on an ->add()?
| + p8_nest_event_start(event, flags);
| + return 0;
| +}
| +
| +/*
| + * Populate pmu ops in the structure
| + */
| +static int update_pmu_ops(struct nest_pmu *pmu)
| +{
| + if (!pmu)
| + return -EINVAL;
| +
| + pmu->pmu.task_ctx_nr = perf_invalid_context;
| + pmu->pmu.event_init = p8_nest_event_init;
| + pmu->pmu.add = p8_nest_event_add;
| + pmu->pmu.del = p8_nest_event_stop;
| + pmu->pmu.start = p8_nest_event_start;
| + pmu->pmu.stop = p8_nest_event_stop;
| + pmu->pmu.read = p8_nest_perf_event_update;
| + pmu->pmu.attr_groups = pmu->attr_groups;
| +
| + return 0;
| +}
| +
| /*
| * Populate event name and string in attribute
| */
| @@ -106,6 +208,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
| /* Save the name to register the PMU with it */
| sprintf(buf, "Nest_%s", (char *)pp->value);
| pmu_ptr->pmu.name = (char *)buf;
| + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
| }
|
| /* Skip these, we dont need it */
| @@ -179,6 +282,12 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
| (struct ppc64_nest_ima_events *)p8_events_arr,
| idx, pmu_ptr);
|
| + update_pmu_ops(pmu_ptr);
| +
| + /* Register the pmu */
| + perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
There is a small chance that perf_pmu_register() can fail.
| + printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
| +
| return 0;
| }
|
| --
| 1.9.1
|
On Tuesday 23 June 2015 07:19 AM, Sukadev Bhattiprolu wrote:
> Madhavan Srinivasan [[email protected]] wrote:
> | From: Madhavan Srinivasan <[email protected]>
> | Subject: [PATCH v2 6/7]powerpc/powernv: generic nest pmu event functions
> |
> | Add generic format attribute and set of generic nest pmu related
> | event functions to be used by each nest pmu. Add code to register nest pmus.
> |
> | Cc: Michael Ellerman <[email protected]>
> | Cc: Benjamin Herrenschmidt <[email protected]>
> | Cc: Paul Mackerras <[email protected]>
> | Cc: Anton Blanchard <[email protected]>
> | Cc: Sukadev Bhattiprolu <[email protected]>
> | Cc: Anshuman Khandual <[email protected]>
> | Cc: Stephane Eranian <[email protected]>
> | Signed-off-by: Madhavan Srinivasan <[email protected]>
> | ---
> | arch/powerpc/perf/nest-pmu.c | 109 +++++++++++++++++++++++++++++++++++++++++++
> | 1 file changed, 109 insertions(+)
> |
> | diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> | index 8fad2d9..a662c14 100644
> | --- a/arch/powerpc/perf/nest-pmu.c
> | +++ b/arch/powerpc/perf/nest-pmu.c
> | @@ -13,6 +13,108 @@
> | static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
> | static struct nest_pmu *per_nest_pmu_arr[P8_MAX_NEST_PMUS];
> |
> | +PMU_FORMAT_ATTR(event, "config:0-20");
> | +struct attribute *p8_nest_format_attrs[] = {
> | + &format_attr_event.attr,
> | + NULL,
> | +};
> | +
> | +struct attribute_group p8_nest_format_group = {
> | + .name = "format",
> | + .attrs = p8_nest_format_attrs,
> | +};
>
> Could this be included in previous/separate patch? That way,
> this patch could focus on just registering the nest-pmu.
Yes. Will move it.
> | +
> | +static int p8_nest_event_init(struct perf_event *event)
> | +{
> | + int chip_id;
> | +
> | + if (event->attr.type != event->pmu->type)
> | + return -ENOENT;
> | +
> | + /* Sampling not supported yet */
> | + if (event->hw.sample_period)
> | + return -EINVAL;
> | +
> | + /* unsupported modes and filters */
> | + if (event->attr.exclude_user ||
> | + event->attr.exclude_kernel ||
> | + event->attr.exclude_hv ||
> | + event->attr.exclude_idle ||
> | + event->attr.exclude_host ||
> | + event->attr.exclude_guest)
> | + return -EINVAL;
> | +
> | + if (event->cpu < 0)
> | + return -EINVAL;
> | +
> | + chip_id = topology_physical_package_id(event->cpu);
> | + event->hw.event_base = event->attr.config +
> | + p8_perchip_nest_info[chip_id].vbase;
> | +
> | + return 0;
> | +}
> | +
> | +static void p8_nest_read_counter(struct perf_event *event)
> | +{
> | + u64 *addr;
> |
>
> Define as uint64_t so we can eliminate one cast below? Would also
> be consistent with p8_nest_perf_event_update().
Yes make sense.
> |
> | + u64 data = 0;
> | +
> | + addr = (u64 *)event->hw.event_base;
> | + data = __be64_to_cpu((uint64_t)*addr);
> | + local64_set(&event->hw.prev_count, data);
> | +}
> | +
> | +static void p8_nest_perf_event_update(struct perf_event *event)
> | +{
> | + u64 counter_prev, counter_new, final_count;
> | + uint64_t *addr;
> | +
> | + addr = (u64 *)event->hw.event_base;
>
> uint64_t *?
My bad. will change it.
> | + counter_prev = local64_read(&event->hw.prev_count);
> | + counter_new = __be64_to_cpu((uint64_t)*addr);
>
> Redundant cast? addr is already uint64_t *?
Nice catch. Will remove it.
> | + final_count = counter_new - counter_prev;
> | +
> | + local64_set(&event->hw.prev_count, counter_new);
> | + local64_add(final_count, &event->count);
> | +}
> | +
> | +static void p8_nest_event_start(struct perf_event *event, int flags)
> | +{
>
> Check PERF_EF_RELOAD before reloading?
>
> | + event->hw.state = 0;
> | + p8_nest_read_counter(event);
> | +}
> | +
> | +static void p8_nest_event_stop(struct perf_event *event, int flags)
> | +{
>
> Check PERF_EF_UPDATE when stopping?
>
> | + p8_nest_perf_event_update(event);
> | +}
> | +
> | +static int p8_nest_event_add(struct perf_event *event, int flags)
> | +{
>
> Check PERF_EF_START flags before starting the counter on an ->add()?
Will add the flags.
> | + p8_nest_event_start(event, flags);
> | + return 0;
> | +}
> | +
> | +/*
> | + * Populate pmu ops in the structure
> | + */
> | +static int update_pmu_ops(struct nest_pmu *pmu)
> | +{
> | + if (!pmu)
> | + return -EINVAL;
> | +
> | + pmu->pmu.task_ctx_nr = perf_invalid_context;
> | + pmu->pmu.event_init = p8_nest_event_init;
> | + pmu->pmu.add = p8_nest_event_add;
> | + pmu->pmu.del = p8_nest_event_stop;
> | + pmu->pmu.start = p8_nest_event_start;
> | + pmu->pmu.stop = p8_nest_event_stop;
> | + pmu->pmu.read = p8_nest_perf_event_update;
> | + pmu->pmu.attr_groups = pmu->attr_groups;
> | +
> | + return 0;
> | +}
> | +
> | /*
> | * Populate event name and string in attribute
> | */
> | @@ -106,6 +208,7 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | /* Save the name to register the PMU with it */
> | sprintf(buf, "Nest_%s", (char *)pp->value);
> | pmu_ptr->pmu.name = (char *)buf;
> | + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
> | }
> |
> | /* Skip these, we dont need it */
> | @@ -179,6 +282,12 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
> | (struct ppc64_nest_ima_events *)p8_events_arr,
> | idx, pmu_ptr);
> |
> | + update_pmu_ops(pmu_ptr);
> | +
> | + /* Register the pmu */
> | + perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
>
> There is a small chance that perf_pmu_register() can fail.
Sure. Will have a check for the return value.
Thanks for review
Maddy
> | + printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
> | +
> | return 0;
> | }
> |
> | --
> | 1.9.1
> |
On Monday 22 June 2015 02:45 PM, Madhavan Srinivasan wrote:
>
> On Tuesday 16 June 2015 11:58 AM, Preeti U Murthy wrote:
>> On 06/11/2015 10:47 AM, Madhavan Srinivasan wrote:
>>> Adds cpumask attribute to be used by each nest pmu since nest
>>> units are per-chip. Only one cpu (first online cpu) from each node/chip
>>> is designated to read counters.
>>>
>>> On cpu hotplug, dying cpu is checked to see whether it is one of the
>>> designated cpus, if yes, next online cpu from the same node/chip is designated
>>> as new cpu to read counters.
>>>
>>> Cc: Michael Ellerman <[email protected]>
>>> Cc: Benjamin Herrenschmidt <[email protected]>
>>> Cc: Paul Mackerras <[email protected]>
>>> Cc: Anton Blanchard <[email protected]>
>>> Cc: Sukadev Bhattiprolu <[email protected]>
>>> Cc: Anshuman Khandual <[email protected]>
>>> Cc: Stephane Eranian <[email protected]>
>>> Cc: Preeti U Murthy <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: Peter Zijlstra <[email protected]>
>>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>>> ---
>>> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
>>> +{
>>> + int i;
>>> +
>>> + if (new_cpu >= 0) {
>>> + for (i = 0; per_nest_pmu_arr[i] != NULL; i++)
>>> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
>>> + old_cpu, new_cpu);
>>> + }
>>> +}
>>> +
>>> +static void nest_exit_cpu(int cpu)
>>> +{
>>> + int i, nid, target = -1;
>>> + const struct cpumask *l_cpumask;
>>> + int src_chipid;
>>> +
>>> + /*
>>> + * Check in the designated list for this cpu. Dont bother
>>> + * if not one of them.
>>> + */
>>> + if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
>>> + return;
>>> +
>>> + /*
>>> + * Now that this cpu is one of the designated,
>>> + * find a new cpu a) which is not dying and
>> This comment is not right. nest_exit_cpu() is called in the hotplug
>> path, so another cpu cannot be dying in parallel. Hotplug operations are
>> done serially. The comment ought to be "a) which is online" instead.
> Ok will change it.
>
>>> + * b) is in same node/chip.
>> node is not the same as a chip right ? And you are interested in cpus on
>> the same chip alone. So shouldn't the above comment be b) in the same chip ?
> I was hoping it to be, but i will change comment to chip.
>
>>> + */
>>> + nid = cpu_to_node(cpu);
>>> + src_chipid = topology_physical_package_id(cpu);
>> What is the relation between a node and a chip ? Can't we have a
>> function which returns the cpumask of a chip straight away, since that
>> is what you seem to be more interested in ? You can then simply choose
>> the next cpu in this cpumask rather than going through the below loop.
>>
> Make sense. I can separate it out.
>
>>> + l_cpumask = cpumask_of_node(nid);
>>> + for_each_cpu(i, l_cpumask) {
>>> + if (i == cpu)
>>> + continue;
>>> + if (src_chipid == topology_physical_package_id(i)) {
>>> + target = i;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + /*
>>> + * Update the cpumask with the target cpu and
>>> + * migrate the context if needed
>>> + */
>>> + if (target >= 0) {
>> You already check for target >= 0 here. So you don't need to check for
>> new_cpu >= 0 in nest_change_cpu_context() above ?
> I guess i was way too cautious :) Will change it
>
>>> + cpumask_set_cpu(target, &cpu_mask_nest_pmu);
>>> + nest_change_cpu_context (cpu, target);
>>> + }
>>> +}
>>> +
>>> +static void nest_init_cpu(int cpu)
>>> +{
>>> + int i, src_chipid;
>>> +
>>> + /*
>>> + * Search for any existing designated thread from
>>> + * the incoming cpu's node/chip. If found, do nothing.
>>> + */
>>> + src_chipid = topology_physical_package_id(cpu);
>>> + for_each_cpu(i, &cpu_mask_nest_pmu)
>>> + if (src_chipid == topology_physical_package_id(i))
>>> + return;
>>> +
>>> + /*
>>> + * Make incoming cpu as a designated thread for
>>> + * this node/chip
>>> + */
>>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
>> Why can't we simply check if cpu is the first one coming online in the
>> chip and designate it as the cpu_mask_nest_pmu for that chip ? If it is
>> not the first cpu, it means that another cpu in the same chip already
>> took over this duty and it needn't bother.
> Looks to be right. let me try it out.
>
>> And shouldn't we also call nest_init() on this cpu, just like you do in
>> cpumask_chip() on all cpu_mask_nest_pmu cpus ?
> Yes. I missed that. We should init. Nice catch.
I guess, we dont need to init again, since we dont stop the pore engine,
we are ok.
>>> +}
>>> +
>>> +static int nest_cpu_notifier(struct notifier_block *self,
>>> + unsigned long action, void *hcpu)
>>> +{
>>> + long cpu = (long)hcpu;
>>> +
>>> + switch (action & ~CPU_TASKS_FROZEN) {
>>> + case CPU_DOWN_FAILED:
>> Why do we need to handle the DOWN_FAILED case ? In DOWN_PREPARE, you
>> have ensured that the function moves on to another cpu. So even if the
>> offline failed, its no issue. The duty is safely taken over.
>>
>>> + case CPU_STARTING:
>> I would suggest initializing nest in the CPU_ONLINE stage. This is
>> because CPU_STARTING phase can fail. In that case, we will be
>> unnecessarily initializing nest pre-maturely. CPU_ONLINE phase assures
>> that the cpu is successfully online and you can then initiate nest.
> Ok sure. Will do that.
>
>>> + nest_init_cpu(cpu);
>>> + break;
>>> + case CPU_DOWN_PREPARE:
>>> + nest_exit_cpu(cpu);
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> +
>>> + return NOTIFY_OK;
>>> +}
>>> +
>>> +static struct notifier_block nest_cpu_nb = {
>>> + .notifier_call = nest_cpu_notifier,
>>> + .priority = CPU_PRI_PERF + 1,
>>> +};
>>> +
>>> +void cpumask_chip(void)
>> This name ^^ is not apt IMO. You are initiating the cpumask necessary
>> for nest pmu. So why not call it nest_pmu_cpumask_init() ?
> Ok.
>>> +{
>>> + const struct cpumask *l_cpumask;
>>> + int cpu, nid;
>>> +
>>> + if (!cpumask_empty(&cpu_mask_nest_pmu))
>> When can this condition become true ?
> My bad. This code ended up here from the initial RFC patch, but after
> reviewing this again, i dont think this is needed. Once again nice catch.
>
>>> + return;
>>> +
>>> + cpu_notifier_register_begin();
>>> +
>>> + /*
>>> + * Nest PMUs are per-chip counters. So designate a cpu
>>> + * from each node/chip for counter collection.
>>> + */
>>> + for_each_online_node(nid) {
>>> + l_cpumask = cpumask_of_node(nid);
>>> +
>>> + /* designate first online cpu in this node */
>>> + cpu = cpumask_first(l_cpumask);
>>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
>>> + }
>>> +
>>> + /* Initialize Nest PMUs in each node using designated cpus */
>>> + on_each_cpu_mask(&cpu_mask_nest_pmu, (smp_call_func_t)nest_init, NULL, 1);
>>> +
>>> + __register_cpu_notifier(&nest_cpu_nb);
>>> +
>>> + cpu_notifier_register_done();
>>> +}
>> Regards
>> Preeti U Murthy
> _______________________________________________
> Linuxppc-dev mailing list
> [email protected]
> https://lists.ozlabs.org/listinfo/linuxppc-dev