This patchset enables Nest Instrumentation support on powerpc.
POWER8 has per-chip Nest Intrumentation which provides various
per-chip metrics like memory, powerbus, Xlink and Alink
bandwidth.
Nest Instrumentation provides an interface (via PORE Engine)
to configure and move the nest counter data to memory. From
kernel side, OPAL Call interface is used to activate/deactivate
PORE Engine for nest data collection.
OPAL at boot, detects the feature, initializes it and pass on
the nest units and other related information such as memory
region, events supported so on, to kernel via device-tree.
Kernel code then, parses the device-tree for nest pmu supports
and registers nest pmu with the events available. PORE Engine collects
and accumulate nest counter data in per-chip HOMER region, hence
device-tree also exports per-chip HOMER nest accumulation region.
And individual event offset are used as event values.
Here is sample perf usage to explain the interface.
#./perf list
....
iTLB-load-misses [Hardware cache event]
Nest_Alink_BW/Alink0/ [Kernel PMU event]
Nest_Alink_BW/Alink1/ [Kernel PMU event]
Nest_Alink_BW/Alink2/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Read_BW/MCS_03/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_00/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_01/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_02/ [Kernel PMU event]
Nest_MCS_Write_BW/MCS_03/ [Kernel PMU event]
Nest_PowerBus_BW/External/ [Kernel PMU event]
Nest_PowerBus_BW/Internal/ [Kernel PMU event]
Nest_Xlink_BW/Xlink0/ [Kernel PMU event]
Nest_Xlink_BW/Xlink1/ [Kernel PMU event]
Nest_Xlink_BW/Xlink2/ [Kernel PMU event]
rNNN [Raw hardware event descriptor]
cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor]
.....
# ./perf stat -e 'Nest_Xlink_BW/Xlink1/' -a -A sleep 1
Performance counter stats for 'system wide':
CPU0 15,913.18 MiB Nest_Xlink_BW/Xlink1/
CPU32 11,955.88 MiB Nest_Xlink_BW/Xlink1/
CPU64 11,042.43 MiB Nest_Xlink_BW/Xlink1/
CPU96 14,065.27 MiB Nest_Xlink_BW/Xlink1/
1.001062038 seconds time elapsed
# ./perf stat -e 'Nest_Alink_BW/Alink0/,Nest_Alink_BW/Alink1/,Nest_Alink_BW/Alink2/' -a -A -I 1000 sleep 5
Performance counter stats for 'system wide':
CPU0 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU32 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU64 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU96 0.00 MiB Nest_Alink_BW/Alink0/ (100.00%)
CPU0 1,430.43 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU32 320.99 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU64 3,443.83 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU96 1,904.41 MiB Nest_Alink_BW/Alink1/ (100.00%)
CPU0 2,856.85 MiB Nest_Alink_BW/Alink2/
CPU32 7.50 MiB Nest_Alink_BW/Alink2/
CPU64 4,034.29 MiB Nest_Alink_BW/Alink2/
CPU96 288.49 MiB Nest_Alink_BW/Alink2/
.....
OPAL side patches are posted in the skiboot mailing list.
Changelog from RFC:
1) Removed "uncore" code and made each Nest Unit a separate PMU.
2) Removed uncore type abstraction and uncore related functions.
3) Added simple cpumask function since these are per-chip counters
4) Redesigned device-tree parser based on the latest platform enablement code
5) Made changes to commit message
Kindly let me know you comments and feedback.
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Madhavan Srinivasan (9):
powerpc/powernv: Data structure and macros definition
powerpc/powernv: nest pmu init function with cpumask attr
powerpc/powernv: Add cpu hotplug support
powerpc/powernv: Add generic nest pmu ops
powerpc/powernv: nest pmu feature detection support
powerpc/powernv: dt parser function for nest pmu and its events
powerpc/powernv: Event attr creation and PMU registration
powerpc/powernv: Add OPAL support for Nest PMU
powerpc/powernv: Makefile changes to include nest pmu
arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/perf/Makefile | 2 +-
arch/powerpc/perf/nest-pmu.c | 489 +++++++++++++++++++++++++
arch/powerpc/perf/nest-pmu.h | 55 +++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
6 files changed, 550 insertions(+), 2 deletions(-)
create mode 100644 arch/powerpc/perf/nest-pmu.c
create mode 100644 arch/powerpc/perf/nest-pmu.h
--
1.9.1
Patch creates a new header file "nest-pmu.h" to add the data structures
and macros needed for the nest pmu support.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.h | 55 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.h
diff --git a/arch/powerpc/perf/nest-pmu.h b/arch/powerpc/perf/nest-pmu.h
new file mode 100644
index 0000000..ed3f79f
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.h
@@ -0,0 +1,55 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define P8_MAX_CHIP 32
+#define MAX_PMU_NAME_LEN 256
+#define MAX_EVENTS_SUPPORTED 256
+#define P8_NEST_ENGINE_START 1
+#define P8_NEST_ENGINE_STOP 0
+#define P8_MAX_NEST_PMUS 32
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip HOMER region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ uint32_t chip_id;
+ uint64_t pbase;
+ uint64_t vbase;
+ uint32_t size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct ppc64_nest_ima_events {
+ const char *ev_name;
+ const char *ev_value;
+};
+
+/*
+ * Device tree parser code detect nest pmu support
+ * and create new nest pmus. This structure will
+ * hold the pmu functions and attrs for each nest pmu and
+ * will be referenced at the time of pmu registering.
+ */
+struct nest_pmu {
+ struct pmu pmu;
+ const struct attribute_group *attr_groups[4];
+};
+
--
1.9.1
Patch creates a file "nest-pmu-c" to contain nest pmu related functions.
Patch adds nest pmu init function and cpumask function since Nest pmu units
are per-chip. First online cpu for a given node is picked as
designated thread to read the counter data.
Subsequent patch adds the hotplug support.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
create mode 100644 arch/powerpc/perf/nest-pmu.c
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
new file mode 100644
index 0000000..d4413bb
--- /dev/null
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -0,0 +1,70 @@
+/*
+ * Nest Performance Monitor counter support for POWER8 processors.
+ *
+ * Copyright 2015 Madhavan Srinivasan, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "nest-pmu.h"
+
+static cpumask_t cpu_mask_nest_pmu;
+
+static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &cpu_mask_nest_pmu);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_nest_pmu_get_attr, NULL);
+
+static struct attribute *cpumask_nest_pmu_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group cpumask_nest_pmu_attr_group = {
+ .attrs = cpumask_nest_pmu_attrs,
+};
+
+void cpumask_chip(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+
+ if (!cpumask_empty(&cpu_mask_nest_pmu)) {
+ printk(KERN_INFO "cpumask not empty\n");
+ return;
+ }
+
+ cpu_notifier_register_begin();
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
+ }
+
+ cpu_notifier_register_done();
+}
+
+
+static int __init nest_pmu_init(void)
+{
+ int ret = 0;
+
+ /*
+ * Lets do this only if we are hypervisor
+ */
+ if (!cur_cpu_spec->oprofile_cpu_type ||
+ strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") ||
+ !cpu_has_feature(CPU_FTR_HVMODE))
+ return ret;
+
+ cpumask_chip();
+
+ return 0;
+}
+device_initcall(nest_pmu_init);
--
1.9.1
Patch adds cpu hotplug support. First online cpu in a node is picked as
designated thread to read the Nest pmu counter data, and at the time of
hotplug, next online cpu from the same node is picked up.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Preeti U Murthy <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 84 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 84 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index d4413bb..3e7010e 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -30,6 +30,86 @@ static struct attribute_group cpumask_nest_pmu_attr_group = {
.attrs = cpumask_nest_pmu_attrs,
};
+static void nest_init(void *dummy)
+{
+ opal_nest_ima_control(P8_NEST_ENGINE_START);
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i=0; per_nestpmu_arr[i] != NULL; i++)
+ perf_pmu_migrate_context(&per_nestpmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+ return;
+}
+
+static void nest_exit_cpu(int cpu)
+{
+ int i, nid, target = -1;
+ const struct cpumask *l_cpumask;
+ int src_chipid;
+
+ if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
+ return;
+
+ nid = cpu_to_node(cpu);
+ src_chipid = topology_physical_package_id(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ for_each_cpu(i, l_cpumask) {
+ if (i == cpu)
+ continue;
+ if (src_chipid == topology_physical_package_id(i)) {
+ target = i;
+ break;
+ }
+ }
+
+ cpumask_set_cpu(target, &cpu_mask_nest_pmu);
+ nest_change_cpu_context (cpu, target);
+ return;
+}
+
+static void nest_init_cpu(int cpu)
+{
+ int i, src_chipid;
+
+ src_chipid = topology_physical_package_id(cpu);
+ for_each_cpu(i, &cpu_mask_nest_pmu)
+ if (src_chipid == topology_physical_package_id(i))
+ return;
+
+ cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
+ nest_change_cpu_context ( -1, cpu);
+ return;
+}
+
+static int nest_cpu_notifier(struct notifier_block *self,
+ unsigned long action, void *hcpu)
+{
+ unsigned int cpu = (long)hcpu;
+
+ switch (action & ~CPU_TASKS_FROZEN) {
+ case CPU_DOWN_FAILED:
+ case CPU_STARTING:
+ nest_init_cpu(cpu);
+ break;
+ case CPU_DOWN_PREPARE:
+ nest_exit_cpu(cpu);
+ break;
+ default:
+ break;
+ }
+
+ return NOTIFY_OK;
+}
+
+static struct notifier_block nest_cpu_nb = {
+ .notifier_call = nest_cpu_notifier,
+ .priority = CPU_PRI_PERF + 1,
+}
+
void cpumask_chip(void)
{
const struct cpumask *l_cpumask;
@@ -47,6 +127,10 @@ void cpumask_chip(void)
cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
}
+ on_each_cpu_mask(&cpu_mask_nest_pmu, (smp_call_func_t)nest_init, NULL, 1);
+
+ __register_cpu_notifier(&nest_cpu_nb);
+
cpu_notifier_register_done();
}
--
1.9.1
Patch adds generic nest pmu functions and format attribute.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 107 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 107 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 3e7010e..345707c 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -134,6 +134,113 @@ void cpumask_chip(void)
cpu_notifier_register_done();
}
+PMU_FORMAT_ATTR(event, "config:0-20");
+struct attribute *p8_nest_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+struct attribute_group p8_nest_format_group = {
+ .name = "format",
+ .attrs = p8_nest_format_attrs,
+};
+
+int p8_nest_event_init(struct perf_event *event)
+{
+ int chip_id;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported yet */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest ||
+ event->attr.sample_period) /* no sampling */
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ event->hw.event_base = event->attr.config +
+ p8_perchip_nest_info[chip_id].vbase;
+
+ return 0;
+}
+
+void p8_nest_read_counter(struct perf_event *event)
+{
+ u64 *addr;
+ u64 data = 0;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu((uint64_t)*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+void p8_nest_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count;
+ uint64_t *addr;
+
+ addr = (u64 *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu((uint64_t)*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+void p8_nest_event_start(struct perf_event *event, int flags)
+{
+ event->hw.state = 0;
+ p8_nest_read_counter(event);
+}
+
+void p8_nest_event_stop(struct perf_event *event, int flags)
+{
+ p8_nest_perf_event_update(event);
+}
+
+int p8_nest_event_add(struct perf_event *event, int flags)
+{
+ p8_nest_event_start(event, flags);
+ return 0;
+}
+
+void p8_nest_event_del(struct perf_event *event, int flags)
+{
+ p8_nest_event_stop(event, flags);
+}
+
+/*
+ * Populate pmu ops in the structure
+ */
+static int update_pmu_ops(struct nest_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = p8_nest_event_init;
+ pmu->pmu.add = p8_nest_event_add;
+ pmu->pmu.del = p8_nest_event_del;
+ pmu->pmu.start = p8_nest_event_start;
+ pmu->pmu.stop = p8_nest_event_stop;
+ pmu->pmu.read = p8_nest_perf_event_update;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
static int __init nest_pmu_init(void)
{
--
1.9.1
Patch adds a device tree function to detect the nest pmu
support. Function will look for specific dt property "ibm,ima-chip"
as a detection mechanism for the nest pmu.
For Nest pmu, device tree will have two set of information.
1) Per-chip Homer address region for nest pmu counter collection area.
2) Supported Nest PMUs and events
Device tree layout for the Nest PMU as follows.
/ -- DT root folder
|
-nest-ima -- Nest PMU folder
|
-ima-chip@<chip-id> -- Per-chip folder for HOMER region information
|
-ibm,chip-id -- Chip id
-ibm,ima-chip
-reg -- HOMER PORE Nest Counter collection Address (RA)
-size -- size to map in kernel space
-Alink_BW -- Nest PMU folder
|
-Alink0 -- Nest PMU Alink Event file
-scale.Alink0.scale -- Event scale file
-unit.Alink0.unit -- Event unit file
-device_type -- "nest-ima-unit" marker
....
Patch save per-chip HOMER offset and maps the same to kernel structure.
Subsequent patch will parse the next part of the DT to find various
Nest PMUs and their events.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 345707c..c979e57 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -11,6 +11,7 @@
#include "nest-pmu.h"
+static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
static cpumask_t cpu_mask_nest_pmu;
static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
@@ -242,6 +243,36 @@ static int update_pmu_ops(struct nest_pmu *pmu)
return 0;
}
+static int nest_ima_detect_parse(void)
+{
+ const __be32 *gcid;
+ const __be64 *chip_ima_reg;
+ const __be64 *chip_ima_size;
+ struct device_node *dev;
+ int rc = -EINVAL, idx;
+
+ for_each_node_with_property(dev, "ibm,ima-chip") {
+ gcid = of_get_property(dev, "ibm,chip-id", NULL);
+ chip_ima_reg = of_get_property(dev, "reg", NULL);
+ chip_ima_size = of_get_property(dev, "size", NULL);
+ if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
+ pr_err("%s: device %s missing property \n",
+ __func__, dev->full_name);
+ return rc;
+ }
+
+ idx = (uint32_t)be32_to_cpup(gcid);
+ p8_perchip_nest_info[idx].pbase = be64_to_cpup(chip_ima_reg);
+ p8_perchip_nest_info[idx].size = be64_to_cpup(chip_ima_size);
+ p8_perchip_nest_info[idx].vbase = (uint64_t)
+ phys_to_virt(p8_perchip_nest_info[idx].pbase);
+
+ rc = 0;
+ }
+
+ return rc;
+}
+
static int __init nest_pmu_init(void)
{
int ret = 0;
@@ -256,6 +287,12 @@ static int __init nest_pmu_init(void)
cpumask_chip();
+ /*
+ * Detect the Nest PMU feature
+ */
+ if (nest_ima_detect_parse())
+ return 0;
+
return 0;
}
device_initcall(nest_pmu_init);
--
1.9.1
Patch adds a device-tree parser function to detect each Nest PMU
and will traverse through the folder to find the supported events and
corresponding unit and scale files, if any.
Event file will contain the offset in HOMER region to get the counter data
for a given event. Kernel DT parser looks for scale/unit in the file name
and pass on the file as an event attr for perf tool to use in the
post processing. For scale and unit, DT will have file name starting with
"scale.<event name>.scale" and "unit.<event name>.unit".
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 141 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 140 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index c979e57..514a0be 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -12,6 +12,7 @@
#include "nest-pmu.h"
static struct perchip_nest_info p8_perchip_nest_info[P8_MAX_CHIP];
+static struct nest_pmu *per_nestpmu_arr[P8_MAX_NEST_PMUS];
static cpumask_t cpu_mask_nest_pmu;
static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
@@ -243,6 +244,129 @@ static int update_pmu_ops(struct nest_pmu *pmu)
return 0;
}
+static int nest_pmu_create(struct device_node *dev, int pmu_index)
+{
+ struct ppc64_nest_ima_events **p8_events_arr;
+ struct ppc64_nest_ima_events *p8_events;
+ struct property *pp;
+ char *buf;
+ const __be32 *lval;
+ u32 val;
+ int len, idx = 0;
+ struct nest_pmu *pmu_ptr;
+ const char *start, *end;
+
+ if (!dev)
+ return -EINVAL;
+
+ pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ /* Needed for hotplug/migration */
+ per_nestpmu_arr[pmu_index] = pmu_ptr;
+
+ p8_events_arr = kzalloc((sizeof(struct ppc64_nest_ima_events) * 64),
+ GFP_KERNEL);
+ if (!p8_events_arr)
+ return -ENOMEM;
+ p8_events = (struct ppc64_nest_ima_events *)p8_events_arr;
+
+ /*
+ * Loop through each property
+ */
+ for_each_property_of_node(dev, pp) {
+ start = pp->name;
+ end = start + strlen(start);
+ len = strlen(start);
+
+ if (!strcmp(pp->name, "name")) {
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) >= pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ sprintf(buf, "Nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+ pmu_ptr->attr_groups[1] = &p8_nest_format_group;
+ pmu_ptr->attr_groups[2] = &cpumask_nest_pmu_attr_group;
+ }
+
+ /* Skip these, we dont need it */
+ if (!strcmp(pp->name, "name") ||
+ !strcmp(pp->name, "phandle") ||
+ !strcmp(pp->name, "device_type") ||
+ !strcmp(pp->name, "linux,phandle"))
+ continue;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ if (strncmp(pp->name, "unit.", 5) == 0) {
+ start += 5;
+ len = strlen(start);
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) >= pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ p8_events->ev_value = buf;
+ idx++;
+ p8_events++;
+
+ } else if (strncmp(pp->name, "scale.", 6) == 0) {
+ start += 6;
+ len = strlen(start);
+ strncpy(buf, start, strlen(start));
+ p8_events->ev_name = buf;
+
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) >= pp->length))
+ return -EINVAL;
+
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ strncpy(buf, (const char *)pp->value, pp->length);
+ p8_events->ev_value = buf;
+ idx++;
+ p8_events++;
+
+ } else {
+ strncpy(buf, start, len);
+ p8_events->ev_name = buf;
+ lval = of_get_property(dev, pp->name, NULL);
+ val = (uint32_t)be32_to_cpup(lval);
+
+ /*
+ * Use DT property value as the event
+ */
+ buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ sprintf(buf,"event=0x%x", val);
+ p8_events->ev_value = buf;
+ p8_events++;
+ idx++;
+ }
+ }
+
+ return 0;
+}
+
static int nest_ima_detect_parse(void)
{
const __be32 *gcid;
@@ -270,6 +394,21 @@ static int nest_ima_detect_parse(void)
rc = 0;
}
+ /*
+ * At this point if nest-ima not found in DT, return.
+ */
+ if (rc)
+ return rc;
+
+ /*
+ * Look for Nest IMA units supported here.
+ */
+ idx = 0; /* Reuse for nest pmu counts */
+ for_each_node_by_type(dev, "nest-ima-unit") {
+ nest_pmu_create(dev, idx);
+ idx++;
+ }
+
return rc;
}
@@ -288,7 +427,7 @@ static int __init nest_pmu_init(void)
cpumask_chip();
/*
- * Detect the Nest PMU feature
+ * Detect the Nest PMU feature and register the pmus
*/
if (nest_ima_detect_parse())
return 0;
--
1.9.1
Patch adds common event attribute function and Nest pmu registration call.
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/nest-pmu.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
index 514a0be..dd84fd7 100644
--- a/arch/powerpc/perf/nest-pmu.c
+++ b/arch/powerpc/perf/nest-pmu.c
@@ -244,6 +244,49 @@ static int update_pmu_ops(struct nest_pmu *pmu)
return 0;
}
+/*
+ * Populate event name and string in attribute
+ */
+struct attribute *dev_str_attr(char *name, char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ attr->event_str = (const char *)str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+int update_events_in_group(
+ struct ppc64_nest_ima_events *p8_events, int idx,
+ struct nest_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
+ sizeof(*attr_group)), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ attrs = (struct attribute **)(attr_group + 1);
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+
+ for (i=0; i< idx; i++, p8_events++)
+ attrs[i] = dev_str_attr((char *)p8_events->ev_name,
+ (char *)p8_events->ev_value);
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
+
static int nest_pmu_create(struct device_node *dev, int pmu_index)
{
struct ppc64_nest_ima_events **p8_events_arr;
@@ -364,6 +407,15 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
}
}
+ update_events_in_group(
+ (struct ppc64_nest_ima_events *)p8_events_arr,
+ idx, pmu_ptr);
+ update_pmu_ops(pmu_ptr);
+
+ /* Register the pmu */
+ perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
+
return 0;
}
--
1.9.1
Nest Counters can be configured via PORE Engine and OPAL
provides an interface to start/stop it.
OPAL side patches are posted in the skiboot mailing.
Cc: Stewart Smith <[email protected]>
Cc: Jeremy Kerr <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 ++-
arch/powerpc/include/asm/opal.h | 2 ++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0321a90..8011a75 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -153,7 +153,8 @@
#define OPAL_FLASH_READ 110
#define OPAL_FLASH_WRITE 111
#define OPAL_FLASH_ERASE 112
-#define OPAL_LAST 112
+#define OPAL_NEST_IMA_CONTROL 116
+#define OPAL_LAST 116
/* Device tree flags */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 042af1a..f86e5e9 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -201,6 +201,8 @@ int64_t opal_flash_write(uint64_t id, uint64_t offset, uint64_t buf,
int64_t opal_flash_erase(uint64_t id, uint64_t offset, uint64_t size,
uint64_t token);
+int64_t opal_nest_ima_control(uint32_t value);
+
/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index a7ade94..ce36a68 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
+OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
--
1.9.1
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
---
arch/powerpc/perf/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f9c083a..6da656b 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- power8-pmu.o
+ power8-pmu.o nest-pmu.o
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
--
1.9.1
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
I'm not certain about this, but I _think_ this is supposed to be version
2 only:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/COPYING#n9
> +/*
> + * Device tree parser code detect nest pmu support
> + * and create new nest pmus. This structure will
> + * hold the pmu functions and attrs for each nest pmu and
> + * will be referenced at the time of pmu registering.
> + */
The first sentence of that comment is unclear: I think your trying to
say "Device tree parser code detects nest pmu support and registers new
nest pmus"? Also s/registering/registration/.
> +struct nest_pmu {
> + struct pmu pmu;
> + const struct attribute_group *attr_groups[4];
> +};
> +
Regards,
Daniel Axtens
On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
> Patch creates a file "nest-pmu-c" to contain nest pmu related functions.
"nest-pmu.c"
> Patch adds nest pmu init function and cpumask function since Nest pmu units
> are per-chip. First online cpu for a given node is picked as
> designated thread to read the counter data.
>
> Subsequent patch adds the hotplug support.
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Sukadev Bhattiprolu <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Preeti U Murthy <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Signed-off-by: Madhavan Srinivasan <[email protected]>
> ---
> arch/powerpc/perf/nest-pmu.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 70 insertions(+)
> create mode 100644 arch/powerpc/perf/nest-pmu.c
>
> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> new file mode 100644
> index 0000000..d4413bb
> --- /dev/null
> +++ b/arch/powerpc/perf/nest-pmu.c
> @@ -0,0 +1,70 @@
> +/*
> + * Nest Performance Monitor counter support for POWER8 processors.
> + *
> + * Copyright 2015 Madhavan Srinivasan, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
Again, I think this is supposed to be v2 only.
> +#include "nest-pmu.h"
> +
> +static cpumask_t cpu_mask_nest_pmu;
> +
> +static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + return cpumap_print_to_pagebuf(true, buf, &cpu_mask_nest_pmu);
> +}
> +
> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_nest_pmu_get_attr, NULL);
> +
> +static struct attribute *cpumask_nest_pmu_attrs[] = {
> + &dev_attr_cpumask.attr,
> + NULL,
> +};
> +
> +static struct attribute_group cpumask_nest_pmu_attr_group = {
> + .attrs = cpumask_nest_pmu_attrs,
> +};
> +
> +void cpumask_chip(void)
> +{
> + const struct cpumask *l_cpumask;
> + int cpu, nid;
> +
> + if (!cpumask_empty(&cpu_mask_nest_pmu)) {
> + printk(KERN_INFO "cpumask not empty\n");
> + return;
> + }
> +
> + cpu_notifier_register_begin();
> + for_each_online_node(nid) {
> + l_cpumask = cpumask_of_node(nid);
> + cpu = cpumask_first(l_cpumask);
> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
> + }
> +
> + cpu_notifier_register_done();
> +}
It's not clear from the name of this function what it does. I don't
think I actually understand what it does: it appears to register a
notifier on the first cpu of each node; maybe that should be reflected
in the name.
> +static int __init nest_pmu_init(void)
> +{
> + int ret = 0;
> +
> + /*
> + * Lets do this only if we are hypervisor
> + */
> + if (!cur_cpu_spec->oprofile_cpu_type ||
> + strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") ||
> + !cpu_has_feature(CPU_FTR_HVMODE))
> + return ret;
> +
> + cpumask_chip();
> +
> + return 0;
> +}
- Where is ret set? I can only see it set when it's defined: the if
statment doesn't change the value of ret as far as I can see...
- Would it be clearer if you said
!(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0)
That would make it clearer that you're trying to get a list of
possible failure conditions.
- Is there really no better way to check if a CPU is a power 8 than an
string comparison?
> +device_initcall(nest_pmu_init);
Regards,
Daniel Axtens
On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
> Patch adds cpu hotplug support. First online cpu in a node is picked as
> designated thread to read the Nest pmu counter data, and at the time of
> hotplug, next online cpu from the same node is picked up.
I'm not sure I understand this commit message. I think I understand the
first half - I think you're trying to say: "At boot, the first online
CPU in a node is picked as the designated thread to read the Nest PMU
counter data." I'm not sure I understand the second half: "picked up"
how and for what?
(I did eventually figure it out by reading the patch, but it'd be really
nice to have it spelled out nicely in the commit message.)
> +static void nest_exit_cpu(int cpu)
> +{
> + int i, nid, target = -1;
> + const struct cpumask *l_cpumask;
> + int src_chipid;
> +
> + if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
> + return;
> +
> + nid = cpu_to_node(cpu);
> + src_chipid = topology_physical_package_id(cpu);
> + l_cpumask = cpumask_of_node(nid);
> + for_each_cpu(i, l_cpumask) {
> + if (i == cpu)
> + continue;
> + if (src_chipid == topology_physical_package_id(i)) {
> + target = i;
> + break;
> + }
> + }
Some comments here would really help. I think you're looking for the
first CPU that's (a) not the cpu you're removing and (b) on the same
physical package, so sharing the same nest, but it took me a lot of
staring at the code to figure it out.
> +
> + cpumask_set_cpu(target, &cpu_mask_nest_pmu);
> + nest_change_cpu_context (cpu, target);
> + return;
Return is redundant here and in several other functions in this patch.
> +}
> +
> +static void nest_init_cpu(int cpu)
> +{
> + int i, src_chipid;
> +
> + src_chipid = topology_physical_package_id(cpu);
> + for_each_cpu(i, &cpu_mask_nest_pmu)
> + if (src_chipid == topology_physical_package_id(i))
> + return;
> +
> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
> + nest_change_cpu_context ( -1, cpu);
Weird extra spaces here.
> + return;
> +}
This function could also do with a comment: AFAICT, you've structured
the function so that it only calls nest_change_cpu_context if you've
picked up a cpu on a physical package that previously didn't have a nest
pmu thread on it.
> +
> +static int nest_cpu_notifier(struct notifier_block *self,
> + unsigned long action, void *hcpu)
> +{
> + unsigned int cpu = (long)hcpu;
What's with this cast? You cast it to a long and then assign it to an
unsigned int?
> +
> + switch (action & ~CPU_TASKS_FROZEN) {
> + case CPU_DOWN_FAILED:
Is it necessary to move the thread back if the CPU fails to go down?
You've moved it to another online CPU already; what's the benefit of
paying the time-penalty to move it back?
> + case CPU_STARTING:
> + nest_init_cpu(cpu);
> + break;
> + case CPU_DOWN_PREPARE:
> + nest_exit_cpu(cpu);
> + break;
> + default:
> + break;
> + }
> +
> + return NOTIFY_OK;
> +}
>
Now, I don't know the details of CPU hotplug _at all_, so this may be
stupid, but what happens if you hotplug a lot of CPUs all at once? Is
everything properly serialised or is this going to race and end up with
either multiple cpus trying to do PMU or no cpus?
Regards,
Daniel Axtens
On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
> Patch adds generic nest pmu functions and format attribute.
>
I'm not sure this commit message accurately reflects the content of the
patch. At any rate, please could you:
- say what the patch adds the functions and attributes to.
- phrase your message as "Add generic ..." not "Patch adds
generic ...": see
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n155
>
> +PMU_FORMAT_ATTR(event, "config:0-20");
> +struct attribute *p8_nest_format_attrs[] = {
> + &format_attr_event.attr,
> + NULL,
> +};
> +
> +struct attribute_group p8_nest_format_group = {
> + .name = "format",
> + .attrs = p8_nest_format_attrs,
> +};
Can these structs be constified?
> +
> +int p8_nest_event_init(struct perf_event *event)
> +{
> + int chip_id;
> +
> + if (event->attr.type != event->pmu->type)
> + return -ENOENT;
> +
> + /* Sampling not supported yet */
> + if (event->hw.sample_period)
> + return -EINVAL;
> +
> + /* unsupported modes and filters */
> + if (event->attr.exclude_user ||
> + event->attr.exclude_kernel ||
> + event->attr.exclude_hv ||
> + event->attr.exclude_idle ||
> + event->attr.exclude_host ||
> + event->attr.exclude_guest ||
> + event->attr.sample_period) /* no sampling */
> + return -EINVAL;
You test for sample period twice here.
> +
> + if (event->cpu < 0)
> + return -EINVAL;
> +
> + chip_id = topology_physical_package_id(event->cpu);
> + event->hw.event_base = event->attr.config +
> + p8_perchip_nest_info[chip_id].vbase;
> +
> + return 0;
> +}
> +
> +void p8_nest_read_counter(struct perf_event *event)
> +{
> + u64 *addr;
> + u64 data = 0;
> +
> + addr = (u64 *)event->hw.event_base;
> + data = __be64_to_cpu((uint64_t)*addr);
> + local64_set(&event->hw.prev_count, data);
> +}
> +
> +void p8_nest_perf_event_update(struct perf_event *event)
> +{
> + u64 counter_prev, counter_new, final_count;
> + uint64_t *addr;
> +
> + addr = (u64 *)event->hw.event_base;
> + counter_prev = local64_read(&event->hw.prev_count);
> + counter_new = __be64_to_cpu((uint64_t)*addr);
> + final_count = counter_new - counter_prev;
> +
> + local64_set(&event->hw.prev_count, counter_new);
> + local64_add(final_count, &event->count);
> +}
> +
> +void p8_nest_event_start(struct perf_event *event, int flags)
> +{
> + event->hw.state = 0;
> + p8_nest_read_counter(event);
> +}
> +
> +void p8_nest_event_stop(struct perf_event *event, int flags)
> +{
> + p8_nest_perf_event_update(event);
> +}
> +
> +int p8_nest_event_add(struct perf_event *event, int flags)
> +{
> + p8_nest_event_start(event, flags);
> + return 0;
> +}
> +
> +void p8_nest_event_del(struct perf_event *event, int flags)
> +{
> + p8_nest_event_stop(event, flags);
Is this necessary?
Stop calls update, which I guess makes sense as it finalises the value.
But if the event is being deleted anyway, why not just do nothing here?
> +}
> +
Regards,
Daniel Axtens
On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
> Patch adds a device tree function to detect the nest pmu
> support. Function will look for specific dt property "ibm,ima-chip"
> as a detection mechanism for the nest pmu.
>
> For Nest pmu, device tree will have two set of information.
> 1) Per-chip Homer address region for nest pmu counter collection area.
> 2) Supported Nest PMUs and events
What's HOMER?
>
> +static int nest_ima_detect_parse(void)
> +{
> + const __be32 *gcid;
> + const __be64 *chip_ima_reg;
> + const __be64 *chip_ima_size;
> + struct device_node *dev;
> + int rc = -EINVAL, idx;
> +
> + for_each_node_with_property(dev, "ibm,ima-chip") {
> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
> + chip_ima_reg = of_get_property(dev, "reg", NULL);
> + chip_ima_size = of_get_property(dev, "size", NULL);
> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
> + pr_err("%s: device %s missing property \n",
> + __func__, dev->full_name);
This is not a particularly informative error message. It'd be good if it
mentioned that it was for PMU.
> + return rc;
> + }
> +
> + idx = (uint32_t)be32_to_cpup(gcid);
> + p8_perchip_nest_info[idx].pbase = be64_to_cpup(chip_ima_reg);
> + p8_perchip_nest_info[idx].size = be64_to_cpup(chip_ima_size);
> + p8_perchip_nest_info[idx].vbase = (uint64_t)
> + phys_to_virt(p8_perchip_nest_info[idx].pbase);
> +
> + rc = 0;
> + }
> +
> + return rc;
I'm not sure your rc handling is correct. As I understand it:
- Start with rc = -EINVAL.
- If your first node is missing a property, return -EINVAL.
- Once your first node succeeds, set rc = 0
- If any subsequent node is missing a property, return 0.
- Return 0 if any node is successfully processed, otherwise return
-EINVAL.
If that's what you intended (especially with regards to returning 0 when
a subsequent node is missing a property), a comment explaining it would
be great.
Also, why bail out if a property is missing on any node? Why not try all
of them and see if any succeed?
> +}
> +
> static int __init nest_pmu_init(void)
> {
> int ret = 0;
> @@ -256,6 +287,12 @@ static int __init nest_pmu_init(void)
>
> cpumask_chip();
>
> + /*
> + * Detect the Nest PMU feature
> + */
> + if (nest_ima_detect_parse())
> + return 0;
> +
> return 0;
> }
Zero is returned regardless of the output of nest_ima_detect_parse. Is
that intentional? If so, do you need the 'if'?
> device_initcall(nest_pmu_init);
Regards,
Daniel Axtens
> +static int nest_pmu_create(struct device_node *dev, int pmu_index)
> +{
> + struct ppc64_nest_ima_events **p8_events_arr;
> + struct ppc64_nest_ima_events *p8_events;
> + struct property *pp;
> + char *buf;
> + const __be32 *lval;
> + u32 val;
> + int len, idx = 0;
> + struct nest_pmu *pmu_ptr;
> + const char *start, *end;
> +
> + if (!dev)
> + return -EINVAL;
> +
> + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
> + if (!pmu_ptr)
> + return -ENOMEM;
> +
> + /* Needed for hotplug/migration */
> + per_nestpmu_arr[pmu_index] = pmu_ptr;
> +
> + p8_events_arr = kzalloc((sizeof(struct ppc64_nest_ima_events) * 64),
> + GFP_KERNEL);
> + if (!p8_events_arr)
> + return -ENOMEM;
> + p8_events = (struct ppc64_nest_ima_events *)p8_events_arr;
I think you're trying to get the first element of the array here: why
not just `p8_events = p8_events_arr[0];`?
> +
> + /*
> + * Loop through each property
> + */
> + for_each_property_of_node(dev, pp) {
> + start = pp->name;
> + end = start + strlen(start);
> + len = strlen(start);
> +
> + if (!strcmp(pp->name, "name")) {
> + if (!pp->value ||
> + (strnlen(pp->value, pp->length) >= pp->length))
> + return -EINVAL;
> +
> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + sprintf(buf, "Nest_%s", (char *)pp->value);
> + pmu_ptr->pmu.name = (char *)buf;
> + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
> + pmu_ptr->attr_groups[2] = &cpumask_nest_pmu_attr_group;
> + }
> +
> + /* Skip these, we dont need it */
> + if (!strcmp(pp->name, "name") ||
> + !strcmp(pp->name, "phandle") ||
> + !strcmp(pp->name, "device_type") ||
> + !strcmp(pp->name, "linux,phandle"))
> + continue;
> +
> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + if (strncmp(pp->name, "unit.", 5) == 0) {
> + start += 5;
> + len = strlen(start);
> + strncpy(buf, start, strlen(start));
You've just saved strlen(start), you could just use len. This also
applies in the next case below.
> + p8_events->ev_name = buf;
> +
> + if (!pp->value ||
> + (strnlen(pp->value, pp->length) >= pp->length))
> + return -EINVAL;
The strnlen will never be greater than pp->length, so the only case this
will hit is if strnlen(pp->value, pp->length) == pp->length. This also
applies again below.
> +
> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + strncpy(buf, (const char *)pp->value, pp->length);
> + p8_events->ev_value = buf;
> + idx++;
> + p8_events++;
> +
> + } else if (strncmp(pp->name, "scale.", 6) == 0) {
> + start += 6;
> + len = strlen(start);
> + strncpy(buf, start, strlen(start));
> + p8_events->ev_name = buf;
> +
> + if (!pp->value ||
> + (strnlen(pp->value, pp->length) >= pp->length))
> + return -EINVAL;
> +
> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + strncpy(buf, (const char *)pp->value, pp->length);
> + p8_events->ev_value = buf;
> + idx++;
> + p8_events++;
> +
> + } else {
> + strncpy(buf, start, len);
This is the only case where you actually use the orignal version of len.
This makes me think you could drop the variable entirely and just use
strlen(start) in all cases. I also don't see where `end` is used
anywhere in this function: could that be dropped?
> + p8_events->ev_name = buf;
> + lval = of_get_property(dev, pp->name, NULL);
> + val = (uint32_t)be32_to_cpup(lval);
> +
> + /*
> + * Use DT property value as the event
> + */
I'm not sure if this is my mailer, but it looks like lines 2 and 3 of
that comment need to be indented to line up under the * in the first
line.
> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + sprintf(buf,"event=0x%x", val);
> + p8_events->ev_value = buf;
> + p8_events++;
> + idx++;
> + }
> + }
> +
> + return 0;
> +}
> +
> @@ -288,7 +427,7 @@ static int __init nest_pmu_init(void)
> cpumask_chip();
>
> /*
> - * Detect the Nest PMU feature
> + * Detect the Nest PMU feature and register the pmus
> */
> if (nest_ima_detect_parse())
> return 0;
As the changed comment indicates, this function changes the behaviour of
nest_ima_detect_parse. Given that it's a new function introduced by this
patch series, maybe it should also change names.
> +int64_t opal_nest_ima_control(uint32_t value);
If I'm understanding things correctly, you call this function in patch
3. Quoting from that patch:
> +static void nest_init(void *dummy)
> +{
> + opal_nest_ima_control(P8_NEST_ENGINE_START);
> +}
Does this patch need to be moved earlier in the series?
Have you tested that the series compiles at every point?
(I've found that this can be done quite easily with
git rebase --interactive using x to run the compile)
> +
> /* Internal functions */
> extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> int depth, void *data);
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index a7ade94..ce36a68 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
> OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
> OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
> OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
> +OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
> Patch adds common event attribute function and Nest pmu registration call.
>
> Cc: Michael Ellerman <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Sukadev Bhattiprolu <[email protected]>
> Cc: Anshuman Khandual <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Signed-off-by: Madhavan Srinivasan <[email protected]>
> ---
> arch/powerpc/perf/nest-pmu.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
> index 514a0be..dd84fd7 100644
> --- a/arch/powerpc/perf/nest-pmu.c
> +++ b/arch/powerpc/perf/nest-pmu.c
> @@ -244,6 +244,49 @@ static int update_pmu_ops(struct nest_pmu *pmu)
> return 0;
> }
>
> +/*
> + * Populate event name and string in attribute
> + */
> +struct attribute *dev_str_attr(char *name, char *str)
> +{
> + struct perf_pmu_events_attr *attr;
> +
> + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
> +
> + attr->event_str = (const char *)str;
Erk. Two things:
- Is str const or not? If you're treating it as const here, should you
pass that through the function signature?
- Who is responsible for the memory behind it? It looks like a caller
can't construct str dynamically, pass it to this function and then free
it, because that will invalidate attr->event_str. Is this documented?
> + attr->attr.attr.name = name;
> + attr->attr.attr.mode = 0444;
> + attr->attr.show = perf_event_sysfs_show;
> +
> + return &attr->attr.attr;
If you're returning the address of attr->attr.attr, then:
- why don't you just deal directly with struct attribute * in the
function? Why an entire struct perf_pmu_events_attr *?
- with the function as written, if you return just &attr->attr.attr,
don't attr->event_str and attr->attr.show get lost?
> +}
> +
> +int update_events_in_group(
> + struct ppc64_nest_ima_events *p8_events, int idx,
> + struct nest_pmu *pmu)
> +{
> + struct attribute_group *attr_group;
> + struct attribute **attrs;
> + int i;
> +
> + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
> + sizeof(*attr_group)), GFP_KERNEL);
> + if (!attr_group)
> + return -ENOMEM;
> +
> + attrs = (struct attribute **)(attr_group + 1);
> + attr_group->name = "events";
> + attr_group->attrs = attrs;
> +
> + for (i=0; i< idx; i++, p8_events++)
> + attrs[i] = dev_str_attr((char *)p8_events->ev_name,
> + (char *)p8_events->ev_value);
> +
> + pmu->attr_groups[0] = attr_group;
> + return 0;
> +}
I'm very confused by what this function is trying to do. Could you add
some comments? I'm particularly confused by the relationship between
attrs and attr_group.
> +
> +
> static int nest_pmu_create(struct device_node *dev, int pmu_index)
> {
> struct ppc64_nest_ima_events **p8_events_arr;
> @@ -364,6 +407,15 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
> }
> }
>
> + update_events_in_group(
> + (struct ppc64_nest_ima_events *)p8_events_arr,
> + idx, pmu_ptr);
> + update_pmu_ops(pmu_ptr);
> +
> + /* Register the pmu */
> + perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
> + printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
> +
> return 0;
> }
>
Regards,
Daniel
On Wednesday 03 June 2015 04:41 AM, Daniel Axtens wrote:
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
> I'm not certain about this, but I _think_ this is supposed to be version
> 2 only:
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/COPYING#n9
I referred other files for license information from the same folder. So
I could wait
for the maintainers comment on this.
>
>> +/*
>> + * Device tree parser code detect nest pmu support
>> + * and create new nest pmus. This structure will
>> + * hold the pmu functions and attrs for each nest pmu and
>> + * will be referenced at the time of pmu registering.
>> + */
> The first sentence of that comment is unclear: I think your trying to
> say "Device tree parser code detects nest pmu support and registers new
> nest pmus"? Also s/registering/registration/.
Yes. I will rewrite the comment.
>> +struct nest_pmu {
>> + struct pmu pmu;
>> + const struct attribute_group *attr_groups[4];
>> +};
>> +
> Regards,
> Daniel Axtens
Thanks for the review
Maddy
On Wednesday 03 June 2015 04:44 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch creates a file "nest-pmu-c" to contain nest pmu related functions.
> "nest-pmu.c"
>> Patch adds nest pmu init function and cpumask function since Nest pmu units
>> are per-chip. First online cpu for a given node is picked as
>> designated thread to read the counter data.
>>
>> Subsequent patch adds the hotplug support.
>>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Sukadev Bhattiprolu <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> Cc: Preeti U Murthy <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>> ---
>> arch/powerpc/perf/nest-pmu.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 70 insertions(+)
>> create mode 100644 arch/powerpc/perf/nest-pmu.c
>>
>> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
>> new file mode 100644
>> index 0000000..d4413bb
>> --- /dev/null
>> +++ b/arch/powerpc/perf/nest-pmu.c
>> @@ -0,0 +1,70 @@
>> +/*
>> + * Nest Performance Monitor counter support for POWER8 processors.
>> + *
>> + * Copyright 2015 Madhavan Srinivasan, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
> Again, I think this is supposed to be v2 only.
>
>> +#include "nest-pmu.h"
>> +
>> +static cpumask_t cpu_mask_nest_pmu;
>> +
>> +static ssize_t cpumask_nest_pmu_get_attr(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + return cpumap_print_to_pagebuf(true, buf, &cpu_mask_nest_pmu);
>> +}
>> +
>> +static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_nest_pmu_get_attr, NULL);
>> +
>> +static struct attribute *cpumask_nest_pmu_attrs[] = {
>> + &dev_attr_cpumask.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group cpumask_nest_pmu_attr_group = {
>> + .attrs = cpumask_nest_pmu_attrs,
>> +};
>> +
>> +void cpumask_chip(void)
>> +{
>> + const struct cpumask *l_cpumask;
>> + int cpu, nid;
>> +
>> + if (!cpumask_empty(&cpu_mask_nest_pmu)) {
>> + printk(KERN_INFO "cpumask not empty\n");
>> + return;
>> + }
>> +
>> + cpu_notifier_register_begin();
>> + for_each_online_node(nid) {
>> + l_cpumask = cpumask_of_node(nid);
>> + cpu = cpumask_first(l_cpumask);
>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
>> + }
>> +
>> + cpu_notifier_register_done();
>> +}
> It's not clear from the name of this function what it does. I don't
> think I actually understand what it does: it appears to register a
> notifier on the first cpu of each node; maybe that should be reflected
> in the name.
My bad. Hotplug notification registration happens in the next patch.
could merge both as single patch.
>
>> +static int __init nest_pmu_init(void)
>> +{
>> + int ret = 0;
>> +
>> + /*
>> + * Lets do this only if we are hypervisor
>> + */
>> + if (!cur_cpu_spec->oprofile_cpu_type ||
>> + strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") ||
>> + !cpu_has_feature(CPU_FTR_HVMODE))
>> + return ret;
>> +
>> + cpumask_chip();
>> +
>> + return 0;
>> +}
> - Where is ret set? I can only see it set when it's defined: the if
> statment doesn't change the value of ret as far as I can see...
Yes. It should have set to error value. Will fix it.
> - Would it be clearer if you said
> !(strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power8") == 0)
> That would make it clearer that you're trying to get a list of
> possible failure conditions.
>
Yes. Sure will change it.
> - Is there really no better way to check if a CPU is a power 8 than an
> string comparison?
One other way I can think of is using PVR (Processor Version Register),
but then will end up having multiple checks for Power8 itself, so this
is lot simpler.
>> +device_initcall(nest_pmu_init);
> Regards,
> Daniel Axtens
Thanks for the review
Maddy
On Wednesday 03 June 2015 05:08 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch adds cpu hotplug support. First online cpu in a node is picked as
>> designated thread to read the Nest pmu counter data, and at the time of
>> hotplug, next online cpu from the same node is picked up.
> I'm not sure I understand this commit message. I think I understand the
> first half - I think you're trying to say: "At boot, the first online
I will rephrase it.
> CPU in a node is picked as the designated thread to read the Nest PMU
> counter data." I'm not sure I understand the second half: "picked up"
> how and for what?
When the designated thread is hotplugged, next online cpu in the
same node is picked up as the designated thread to read the PMU counter
data.
> (I did eventually figure it out by reading the patch, but it'd be really
> nice to have it spelled out nicely in the commit message.)
Sure. Will fix the commit message.
>> +static void nest_exit_cpu(int cpu)
>> +{
>> + int i, nid, target = -1;
>> + const struct cpumask *l_cpumask;
>> + int src_chipid;
>> +
>> + if (!cpumask_test_and_clear_cpu(cpu, &cpu_mask_nest_pmu))
>> + return;
>> +
>> + nid = cpu_to_node(cpu);
>> + src_chipid = topology_physical_package_id(cpu);
>> + l_cpumask = cpumask_of_node(nid);
>> + for_each_cpu(i, l_cpumask) {
>> + if (i == cpu)
>> + continue;
>> + if (src_chipid == topology_physical_package_id(i)) {
>> + target = i;
>> + break;
>> + }
>> + }
> Some comments here would really help. I think you're looking for the
> first CPU that's (a) not the cpu you're removing and (b) on the same
> physical package, so sharing the same nest, but it took me a lot of
> staring at the code to figure it out.
My bad. I will comment it.
>> +
>> + cpumask_set_cpu(target, &cpu_mask_nest_pmu);
>> + nest_change_cpu_context (cpu, target);
>> + return;
> Return is redundant here and in several other functions in this patch.
Ok.
>> +}
>> +
>> +static void nest_init_cpu(int cpu)
>> +{
>> + int i, src_chipid;
>> +
>> + src_chipid = topology_physical_package_id(cpu);
>> + for_each_cpu(i, &cpu_mask_nest_pmu)
>> + if (src_chipid == topology_physical_package_id(i))
>> + return;
>> +
>> + cpumask_set_cpu(cpu, &cpu_mask_nest_pmu);
>> + nest_change_cpu_context ( -1, cpu);
> Weird extra spaces here.
Yes. Nice catch. Will fix it.
>> + return;
>> +}
> This function could also do with a comment: AFAICT, you've structured
> the function so that it only calls nest_change_cpu_context if you've
> picked up a cpu on a physical package that previously didn't have a nest
> pmu thread on it.
>
>> +
>> +static int nest_cpu_notifier(struct notifier_block *self,
>> + unsigned long action, void *hcpu)
>> +{
>> + unsigned int cpu = (long)hcpu;
> What's with this cast? You cast it to a long and then assign it to an
> unsigned int?
Facepalm. My bad, will fix it.
>> +
>> + switch (action & ~CPU_TASKS_FROZEN) {
>> + case CPU_DOWN_FAILED:
> Is it necessary to move the thread back if the CPU fails to go down?
No. not need.
> You've moved it to another online CPU already; what's the benefit of
> paying the time-penalty to move it back?
Why should go through that. Because, there is no restriction saying only
the first
cpu has to read it, why should we complicate it further instead of
moving to another
cpu in the same node.
>> + case CPU_STARTING:
>> + nest_init_cpu(cpu);
>> + break;
>> + case CPU_DOWN_PREPARE:
>> + nest_exit_cpu(cpu);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return NOTIFY_OK;
>> +}
>>
> Now, I don't know the details of CPU hotplug _at all_, so this may be
> stupid, but what happens if you hotplug a lot of CPUs all at once? Is
> everything properly serialised or is this going to race and end up with
> either multiple cpus trying to do PMU or no cpus?
I did test the code with hotplug test. If all the cpus in the node is
offlined,
then we will have no cpus designated for that node.
Thanks for review
Maddy
> Regards,
> Daniel Axtens
>
>
On Wednesday 03 June 2015 05:33 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch adds generic nest pmu functions and format attribute.
>>
> I'm not sure this commit message accurately reflects the content of the
> patch. At any rate, please could you:
> - say what the patch adds the functions and attributes to.
> - phrase your message as "Add generic ..." not "Patch adds
> generic ...": see
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n155
Sure. Will rephrase it.
>
>>
>> +PMU_FORMAT_ATTR(event, "config:0-20");
>> +struct attribute *p8_nest_format_attrs[] = {
>> + &format_attr_event.attr,
>> + NULL,
>> +};
>> +
>> +struct attribute_group p8_nest_format_group = {
>> + .name = "format",
>> + .attrs = p8_nest_format_attrs,
>> +};
> Can these structs be constified?
I guess it can. Will check it out.
>> +
>> +int p8_nest_event_init(struct perf_event *event)
>> +{
>> + int chip_id;
>> +
>> + if (event->attr.type != event->pmu->type)
>> + return -ENOENT;
>> +
>> + /* Sampling not supported yet */
>> + if (event->hw.sample_period)
>> + return -EINVAL;
>> +
>> + /* unsupported modes and filters */
>> + if (event->attr.exclude_user ||
>> + event->attr.exclude_kernel ||
>> + event->attr.exclude_hv ||
>> + event->attr.exclude_idle ||
>> + event->attr.exclude_host ||
>> + event->attr.exclude_guest ||
>> + event->attr.sample_period) /* no sampling */
>> + return -EINVAL;
> You test for sample period twice here.
Yes right. I will remove it.
>> +
>> + if (event->cpu < 0)
>> + return -EINVAL;
>> +
>> + chip_id = topology_physical_package_id(event->cpu);
>> + event->hw.event_base = event->attr.config +
>> + p8_perchip_nest_info[chip_id].vbase;
>> +
>> + return 0;
>> +}
>> +
>> +void p8_nest_read_counter(struct perf_event *event)
>> +{
>> + u64 *addr;
>> + u64 data = 0;
>> +
>> + addr = (u64 *)event->hw.event_base;
>> + data = __be64_to_cpu((uint64_t)*addr);
>> + local64_set(&event->hw.prev_count, data);
>> +}
>> +
>> +void p8_nest_perf_event_update(struct perf_event *event)
>> +{
>> + u64 counter_prev, counter_new, final_count;
>> + uint64_t *addr;
>> +
>> + addr = (u64 *)event->hw.event_base;
>> + counter_prev = local64_read(&event->hw.prev_count);
>> + counter_new = __be64_to_cpu((uint64_t)*addr);
>> + final_count = counter_new - counter_prev;
>> +
>> + local64_set(&event->hw.prev_count, counter_new);
>> + local64_add(final_count, &event->count);
>> +}
>> +
>> +void p8_nest_event_start(struct perf_event *event, int flags)
>> +{
>> + event->hw.state = 0;
>> + p8_nest_read_counter(event);
>> +}
>> +
>> +void p8_nest_event_stop(struct perf_event *event, int flags)
>> +{
>> + p8_nest_perf_event_update(event);
>> +}
>> +
>> +int p8_nest_event_add(struct perf_event *event, int flags)
>> +{
>> + p8_nest_event_start(event, flags);
>> + return 0;
>> +}
>> +
>> +void p8_nest_event_del(struct perf_event *event, int flags)
>> +{
>> + p8_nest_event_stop(event, flags);
> Is this necessary?
>
> Stop calls update, which I guess makes sense as it finalises the value.
> But if the event is being deleted anyway, why not just do nothing here?
Since these Nest PMUs does not support sampling. IIUC, "perf record"
interface uses
the event start/stop ops. Incase of perf stat interface event add/del
interface are used to enable and disable the counters. Now, when we
disable or delete, we update the event counter with the delta value.
>> +}
>> +
> Regards,
> Daniel Axtens
Thanks for the review
Maddy
On Wednesday 03 June 2015 05:33 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch adds generic nest pmu functions and format attribute.
>>
> I'm not sure this commit message accurately reflects the content of the
> patch. At any rate, please could you:
> - say what the patch adds the functions and attributes to.
> - phrase your message as "Add generic ..." not "Patch adds
> generic ...": see
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches#n155
>
I will rephrase the commit message.
>>
>> +PMU_FORMAT_ATTR(event, "config:0-20");
>> +struct attribute *p8_nest_format_attrs[] = {
>> + &format_attr_event.attr,
>> + NULL,
>> +};
>> +
>> +struct attribute_group p8_nest_format_group = {
>> + .name = "format",
>> + .attrs = p8_nest_format_attrs,
>> +};
> Can these structs be constified?
I guess so. Will try it out.
>> +
>> +int p8_nest_event_init(struct perf_event *event)
>> +{
>> + int chip_id;
>> +
>> + if (event->attr.type != event->pmu->type)
>> + return -ENOENT;
>> +
>> + /* Sampling not supported yet */
>> + if (event->hw.sample_period)
>> + return -EINVAL;
>> +
>> + /* unsupported modes and filters */
>> + if (event->attr.exclude_user ||
>> + event->attr.exclude_kernel ||
>> + event->attr.exclude_hv ||
>> + event->attr.exclude_idle ||
>> + event->attr.exclude_host ||
>> + event->attr.exclude_guest ||
>> + event->attr.sample_period) /* no sampling */
>> + return -EINVAL;
> You test for sample period twice here.
My bad. Will remove it.
>> +
>> + if (event->cpu < 0)
>> + return -EINVAL;
>> +
>> + chip_id = topology_physical_package_id(event->cpu);
>> + event->hw.event_base = event->attr.config +
>> + p8_perchip_nest_info[chip_id].vbase;
>> +
>> + return 0;
>> +}
>> +
>> +void p8_nest_read_counter(struct perf_event *event)
>> +{
>> + u64 *addr;
>> + u64 data = 0;
>> +
>> + addr = (u64 *)event->hw.event_base;
>> + data = __be64_to_cpu((uint64_t)*addr);
>> + local64_set(&event->hw.prev_count, data);
>> +}
>> +
>> +void p8_nest_perf_event_update(struct perf_event *event)
>> +{
>> + u64 counter_prev, counter_new, final_count;
>> + uint64_t *addr;
>> +
>> + addr = (u64 *)event->hw.event_base;
>> + counter_prev = local64_read(&event->hw.prev_count);
>> + counter_new = __be64_to_cpu((uint64_t)*addr);
>> + final_count = counter_new - counter_prev;
>> +
>> + local64_set(&event->hw.prev_count, counter_new);
>> + local64_add(final_count, &event->count);
>> +}
>> +
>> +void p8_nest_event_start(struct perf_event *event, int flags)
>> +{
>> + event->hw.state = 0;
>> + p8_nest_read_counter(event);
>> +}
>> +
>> +void p8_nest_event_stop(struct perf_event *event, int flags)
>> +{
>> + p8_nest_perf_event_update(event);
>> +}
>> +
>> +int p8_nest_event_add(struct perf_event *event, int flags)
>> +{
>> + p8_nest_event_start(event, flags);
>> + return 0;
>> +}
>> +
>> +void p8_nest_event_del(struct perf_event *event, int flags)
>> +{
>> + p8_nest_event_stop(event, flags);
> Is this necessary?
>
> Stop calls update, which I guess makes sense as it finalises the value.
> But if the event is being deleted anyway, why not just do nothing here?
IIUC, "perf record" will use the event start/stop interface. Incase of
"perf stat" (for PMUs which
does not support sampling), event add/del interface is used. Now when
event is disable or deleted,
event count should get updated with the delta value.
>> +}
>> +
> Regards,
> Daniel Axtens
Thanks for the review
Maddy
On Wednesday 03 June 2015 05:51 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch adds a device tree function to detect the nest pmu
>> support. Function will look for specific dt property "ibm,ima-chip"
>> as a detection mechanism for the nest pmu.
>>
>> For Nest pmu, device tree will have two set of information.
>> 1) Per-chip Homer address region for nest pmu counter collection area.
>> 2) Supported Nest PMUs and events
> What's HOMER?
Nest PMUs are configured via PORE engine interface and PORE Engine
collections the Nest counter
value and updates in the main memory which is reserved for this use.
>>
>> +static int nest_ima_detect_parse(void)
>> +{
>> + const __be32 *gcid;
>> + const __be64 *chip_ima_reg;
>> + const __be64 *chip_ima_size;
>> + struct device_node *dev;
>> + int rc = -EINVAL, idx;
>> +
>> + for_each_node_with_property(dev, "ibm,ima-chip") {
>> + gcid = of_get_property(dev, "ibm,chip-id", NULL);
>> + chip_ima_reg = of_get_property(dev, "reg", NULL);
>> + chip_ima_size = of_get_property(dev, "size", NULL);
>> + if ((!gcid) || (!chip_ima_reg) || (!chip_ima_size)) {
>> + pr_err("%s: device %s missing property \n",
>> + __func__, dev->full_name);
> This is not a particularly informative error message. It'd be good if it
> mentioned that it was for PMU.
Sure will changes.
>> + return rc
>> + }
>> +
>> + idx = (uint32_t)be32_to_cpup(gcid);
>> + p8_perchip_nest_info[idx].pbase = be64_to_cpup(chip_ima_reg);
>> + p8_perchip_nest_info[idx].size = be64_to_cpup(chip_ima_size);
>> + p8_perchip_nest_info[idx].vbase = (uint64_t)
>> + phys_to_virt(p8_perchip_nest_info[idx].pbase);
>> +
>> + rc = 0;
>> + }
>> +
>> + return rc;
> I'm not sure your rc handling is correct. As I understand it:
> - Start with rc = -EINVAL.
> - If your first node is missing a property, return -EINVAL.
> - Once your first node succeeds, set rc = 0
> - If any subsequent node is missing a property, return 0.
> - Return 0 if any node is successfully processed, otherwise return
> -EINVAL.
Main loop is only for nodes with property "ibm,ima-chip". Not all the
nodes will have this
property.
> If that's what you intended (especially with regards to returning 0 when
> a subsequent node is missing a property), a comment explaining it would
> be great.
Yes. I will add comment explaining it. But i did add this in the commit
message.
> Also, why bail out if a property is missing on any node? Why not try all
> of them and see if any succeed?
Only the Nest Unit nodes in the device tree will have this property.
Commit has the
device tree hierarchy for the Nest instrumentation. So if we dont find
this property
then Nest instrumentation is not supported, hence bail out.
>
>> +}
>> +
>> static int __init nest_pmu_init(void)
>> {
>> int ret = 0;
>> @@ -256,6 +287,12 @@ static int __init nest_pmu_init(void)
>>
>> cpumask_chip();
>>
>> + /*
>> + * Detect the Nest PMU feature
>> + */
>> + if (nest_ima_detect_parse())
>> + return 0;
>> +
>> return 0;
>> }
> Zero is returned regardless of the output of nest_ima_detect_parse. Is
> that intentional? If so, do you need the 'if'?
No it should return "ret" which should be initialized to error value.
WIll fix it
>> device_initcall(nest_pmu_init);
> Regards,
> Daniel Axtens
>
Thanks for the review
MAddy
On Wednesday 03 June 2015 06:16 AM, Daniel Axtens wrote:
>> +static int nest_pmu_create(struct device_node *dev, int pmu_index)
>> +{
>> + struct ppc64_nest_ima_events **p8_events_arr;
>> + struct ppc64_nest_ima_events *p8_events;
>> + struct property *pp;
>> + char *buf;
>> + const __be32 *lval;
>> + u32 val;
>> + int len, idx = 0;
>> + struct nest_pmu *pmu_ptr;
>> + const char *start, *end;
>> +
>> + if (!dev)
>> + return -EINVAL;
>> +
>> + pmu_ptr = kzalloc(sizeof(struct nest_pmu), GFP_KERNEL);
>> + if (!pmu_ptr)
>> + return -ENOMEM;
>> +
>> + /* Needed for hotplug/migration */
>> + per_nestpmu_arr[pmu_index] = pmu_ptr;
>> +
>> + p8_events_arr = kzalloc((sizeof(struct ppc64_nest_ima_events) * 64),
>> + GFP_KERNEL);
>> + if (!p8_events_arr)
>> + return -ENOMEM;
>> + p8_events = (struct ppc64_nest_ima_events *)p8_events_arr;
> I think you're trying to get the first element of the array here: why
> not just `p8_events = p8_events_arr[0];`?
Yes. Will change it.
>> +
>> + /*
>> + * Loop through each property
>> + */
>> + for_each_property_of_node(dev, pp) {
>> + start = pp->name;
>> + end = start + strlen(start);
>> + len = strlen(start);
>> +
>> + if (!strcmp(pp->name, "name")) {
>> + if (!pp->value ||
>> + (strnlen(pp->value, pp->length) >= pp->length))
>> + return -EINVAL;
>> +
>> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + sprintf(buf, "Nest_%s", (char *)pp->value);
>> + pmu_ptr->pmu.name = (char *)buf;
>> + pmu_ptr->attr_groups[1] = &p8_nest_format_group;
>> + pmu_ptr->attr_groups[2] = &cpumask_nest_pmu_attr_group;
>> + }
>> +
>> + /* Skip these, we dont need it */
>> + if (!strcmp(pp->name, "name") ||
>> + !strcmp(pp->name, "phandle") ||
>> + !strcmp(pp->name, "device_type") ||
>> + !strcmp(pp->name, "linux,phandle"))
>> + continue;
>> +
>> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + if (strncmp(pp->name, "unit.", 5) == 0) {
>> + start += 5;
>> + len = strlen(start);
>> + strncpy(buf, start, strlen(start));
> You've just saved strlen(start), you could just use len. This also
> applies in the next case below.
Yes. That is true.
>> + p8_events->ev_name = buf;
>> +
>> + if (!pp->value ||
>> + (strnlen(pp->value, pp->length) >= pp->length))
>> + return -EINVAL;
> The strnlen will never be greater than pp->length, so the only case this
> will hit is if strnlen(pp->value, pp->length) == pp->length. This also
> applies again below.
True will change it.
>
>> +
>> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + strncpy(buf, (const char *)pp->value, pp->length);
>> + p8_events->ev_value = buf;
>> + idx++;
>> + p8_events++;
>> +
>> + } else if (strncmp(pp->name, "scale.", 6) == 0) {
>> + start += 6;
>> + len = strlen(start);
>> + strncpy(buf, start, strlen(start));
>> + p8_events->ev_name = buf;
>> +
>> + if (!pp->value ||
>> + (strnlen(pp->value, pp->length) >= pp->length))
>> + return -EINVAL;
>> +
>> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + strncpy(buf, (const char *)pp->value, pp->length);
>> + p8_events->ev_value = buf;
>> + idx++;
>> + p8_events++;
>> +
>> + } else {
>> + strncpy(buf, start, len);
> This is the only case where you actually use the orignal version of len.
> This makes me think you could drop the variable entirely and just use
> strlen(start) in all cases. I also don't see where `end` is used
> anywhere in this function: could that be dropped?
Correct. I guess we can drop both len and end. I used "end" for my
prints during debug.
>> + p8_events->ev_name = buf;
>> + lval = of_get_property(dev, pp->name, NULL);
>> + val = (uint32_t)be32_to_cpup(lval);
>> +
>> + /*
>> + * Use DT property value as the event
>> + */
> I'm not sure if this is my mailer, but it looks like lines 2 and 3 of
> that comment need to be indented to line up under the * in the first
> line.
No, it is not your mail :). Will fix the indentation.
>> + buf = kzalloc(MAX_PMU_NAME_LEN, GFP_KERNEL);
>> + if (!buf)
>> + return -ENOMEM;
>> +
>> + sprintf(buf,"event=0x%x", val);
>> + p8_events->ev_value = buf;
>> + p8_events++;
>> + idx++;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> @@ -288,7 +427,7 @@ static int __init nest_pmu_init(void)
>> cpumask_chip();
>>
>> /*
>> - * Detect the Nest PMU feature
>> + * Detect the Nest PMU feature and register the pmus
>> */
>> if (nest_ima_detect_parse())
>> return 0;
> As the changed comment indicates, this function changes the behaviour of
> nest_ima_detect_parse. Given that it's a new function introduced by this
> patch series, maybe it should also change names.
>
Ok will fix it.
Thanks for the review.
Maddy
On Wednesday 03 June 2015 06:24 AM, Daniel Axtens wrote:
>> +int64_t opal_nest_ima_control(uint32_t value);
> If I'm understanding things correctly, you call this function in patch
> 3. Quoting from that patch:
>> +static void nest_init(void *dummy)
>> +{
>> + opal_nest_ima_control(P8_NEST_ENGINE_START);
>> +}
> Does this patch need to be moved earlier in the series?
I applied all the patches together and tested it since the Makefile
inclusion is
the final patch in the series. I guess it is better rearrange the series.
> Have you tested that the series compiles at every point?
> (I've found that this can be done quite easily with
> git rebase --interactive using x to run the compile)
Nice. will try this out.
Thanks for the review
Maddy
>> +
>> /* Internal functions */
>> extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>> int depth, void *data);
>> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
>> index a7ade94..ce36a68 100644
>> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
>> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
>> @@ -295,3 +295,4 @@ OPAL_CALL(opal_i2c_request, OPAL_I2C_REQUEST);
>> OPAL_CALL(opal_flash_read, OPAL_FLASH_READ);
>> OPAL_CALL(opal_flash_write, OPAL_FLASH_WRITE);
>> OPAL_CALL(opal_flash_erase, OPAL_FLASH_ERASE);
>> +OPAL_CALL(opal_nest_ima_control, OPAL_NEST_IMA_CONTROL);
On Wed, 2015-06-03 at 09:11 +1000, Daniel Axtens wrote:
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
>
> I'm not certain about this, but I _think_ this is supposed to be version
> 2 only:
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/COPYING#n9
Urgh.
Yeah you're right, but we've been very sloppy about that over the years. And so
we have a lot of files that say .. or later.
And now that they say that we can't change them.
For a new file though we should just use the v2 only wording.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License version 2 as published
by the Free Software Foundation.
cheers
On Wednesday 03 June 2015 06:36 AM, Daniel Axtens wrote:
> On Tue, 2015-06-02 at 21:29 +0530, Madhavan Srinivasan wrote:
>> Patch adds common event attribute function and Nest pmu registration call.
>>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: Benjamin Herrenschmidt <[email protected]>
>> Cc: Paul Mackerras <[email protected]>
>> Cc: Sukadev Bhattiprolu <[email protected]>
>> Cc: Anshuman Khandual <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> Signed-off-by: Madhavan Srinivasan <[email protected]>
>> ---
>> arch/powerpc/perf/nest-pmu.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 52 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/nest-pmu.c b/arch/powerpc/perf/nest-pmu.c
>> index 514a0be..dd84fd7 100644
>> --- a/arch/powerpc/perf/nest-pmu.c
>> +++ b/arch/powerpc/perf/nest-pmu.c
>> @@ -244,6 +244,49 @@ static int update_pmu_ops(struct nest_pmu *pmu)
>> return 0;
>> }
>>
>> +/*
>> + * Populate event name and string in attribute
>> + */
>> +struct attribute *dev_str_attr(char *name, char *str)
>> +{
>> + struct perf_pmu_events_attr *attr;
>> +
>> + attr = kzalloc(sizeof(*attr), GFP_KERNEL);
>> +
>> + attr->event_str = (const char *)str;
> Erk. Two things:
> - Is str const or not? If you're treating it as const here, should you
> pass that through the function signature?
> - Who is responsible for the memory behind it? It looks like a caller
> can't construct str dynamically, pass it to this function and then free
> it, because that will invalidate attr->event_str. Is this documented?
Yes. Valid point. str should be and it is const. My bad, will fix the
function
signature.
>> + attr->attr.attr.name = name;
>> + attr->attr.attr.mode = 0444;
>> + attr->attr.show = perf_event_sysfs_show;
>> +
>> + return &attr->attr.attr;
> If you're returning the address of attr->attr.attr, then:
> - why don't you just deal directly with struct attribute * in the
> function? Why an entire struct perf_pmu_events_attr *?
> - with the function as written, if you return just &attr->attr.attr,
> don't attr->event_str and attr->attr.show get lost?
Kindly have should look at perf_event_sysfs_show function in
include/linux/perf_event.h.
Even though we return only &attr->attr.attr, we are not freeing the
memory of
perf_pmu_event_attr, hence will not be lost :) .
>> +}
>> +
>> +int update_events_in_group(
>> + struct ppc64_nest_ima_events *p8_events, int idx,
>> + struct nest_pmu *pmu)
>> +{
>> + struct attribute_group *attr_group;
>> + struct attribute **attrs;
>> + int i;
>> +
>> + attr_group = kzalloc(((sizeof(struct attribute *) * (idx + 1)) +
>> + sizeof(*attr_group)), GFP_KERNEL);
>> + if (!attr_group)
>> + return -ENOMEM;
>> +
>> + attrs = (struct attribute **)(attr_group + 1);
>> + attr_group->name = "events";
>> + attr_group->attrs = attrs;
>> +
>> + for (i=0; i< idx; i++, p8_events++)
>> + attrs[i] = dev_str_attr((char *)p8_events->ev_name,
>> + (char *)p8_events->ev_value);
>> +
>> + pmu->attr_groups[0] = attr_group;
>> + return 0;
>> +}
> I'm very confused by what this function is trying to do. Could you add
> some comments? I'm particularly confused by the relationship between
> attrs and attr_group.
This function mainly creates a "event" attribute group for this PMU.
It does so with the list of event files parsed from the device tree
for this pmu in the nest_pmu_create function.
WIll add comments in the next version.
>> +
>> +
>> static int nest_pmu_create(struct device_node *dev, int pmu_index)
>> {
>> struct ppc64_nest_ima_events **p8_events_arr;
>> @@ -364,6 +407,15 @@ static int nest_pmu_create(struct device_node *dev, int pmu_index)
>> }
>> }
>>
>> + update_events_in_group(
>> + (struct ppc64_nest_ima_events *)p8_events_arr,
>> + idx, pmu_ptr);
>> + update_pmu_ops(pmu_ptr);
>> +
>> + /* Register the pmu */
>> + perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
>> + printk(KERN_INFO "Nest PMU %s Registered\n", pmu_ptr->pmu.name);
>> +
>> return 0;
>> }
>>
> Regards,
> Daniel
>
Apologizes on late response to this mail. Missed it.
Thanks for review
Maddy