2016-12-20 06:40:29

by Hemant Kumar

Subject: [PATCH v3 0/6] IMC Instrumentation Support

POWER9 has In-Memory-Collection (IMC) infrastructure which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core). These Nest PMU counters are handled by the Nest
IMC microcode. This microcode runs in the OCC (On-Chip Controller)
complex and its purpose is to program the nest counters, collect the
counter data and move the counter data to memory.

The IMC infrastructure encapsulates nest (per-chip), core and thread
level counters. While the nest IMC PMUs are handled by the nest IMC
microcode, the core and thread level PMUs are handled by the Core-HPMC
engine. This patchset enables the nest IMC PMUs and is based on the
initial work done by Madhavan Srinivasan.
"Nest Instrumentation Support" : https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 for this patchset can be found here :
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

PMU Events' Information:
OPAL obtains the Nest PMU and event information from the IMC Catalog
and passes it on to the kernel via the device tree. The events' information
contains :
- Event name
- Event Offset
- Event description
and, optionally :
- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their
supported events. In those cases, the scale and unit properties for
those events must be inherited from the PMU.

The event offset in the memory is where the counter data gets
accumulated.
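
The inheritance rule above can be sketched roughly as follows. This is
only an illustration: "struct prop_pair" and "resolve_scale_unit" are
hypothetical names, not part of this patchset, and letting an event's own
value take precedence over the PMU's value is an assumption.

```c
#include <stddef.h>

/* Hypothetical, simplified view of the "scale" and "unit" properties. */
struct prop_pair {
	const char *scale;	/* NULL when the node has no "scale" property */
	const char *unit;	/* NULL when the node has no "unit" property */
};

/*
 * Resolve the effective scale/unit for one event. Assumption: a value
 * set on the event node wins, otherwise the value is inherited from
 * the parent PMU node.
 */
static struct prop_pair resolve_scale_unit(const struct prop_pair *pmu,
					   const struct prop_pair *event)
{
	struct prop_pair out;

	out.scale = event->scale ? event->scale : pmu->scale;
	out.unit = event->unit ? event->unit : pmu->unit;
	return out;
}
```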

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2016-November/005552.html

The kernel discovers the IMC counters information in the device tree
at the "imc-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMC PMUs and events information, the kernel has to
discover the "imc-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the device tree showing the imc-counters and mcs nodes:
/dts-v1/;

[...]
imc-counters {
	imc-nest-offset = <0x320000>;
	compatible = "ibm,opal-in-memory-counters";
	imc-nest-size = <0x30000>;
	#address-cells = <0x1>;
	#size-cells = <0x1>;
	phandle = <0x10000238>;
	version-id = [00];

	mcs0 {
		compatible = "ibm,imc-counters-chip";
		ranges;
		#address-cells = <0x1>;
		#size-cells = <0x1>;
		phandle = <0x10000279>;
		scale = "1.2207e-4";
		unit = "MiB";

		event@528 {
			event-name = "PM_MCS_UP_128B_DATA_XFER_MC0";
			desc = "Total Read Bandwidth seen on both MCS of MC0";
			phandle = <0x1000028c>;
			reg = <0x118 0x8>;
		};
[...]

From the device tree, the kernel parses the PMUs and their events'
information.
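
As a worked example of how the scale and unit are meant to be used, the
sketch below converts a raw counter value to the PMU's unit. The helper
name "imc_scaled_value" is hypothetical. The scale from the mcs0 node,
1.2207e-4, is approximately 128 / 2^20, which would be consistent with
each count standing for one 128-byte transfer reported in MiB; that
reading is an inference from the event name, not something stated here.

```c
#include <stdint.h>

/* "scale" value copied from the mcs0 node in the excerpt above. */
#define MCS0_SCALE	1.2207e-4

/* Convert a raw counter value to the PMU's unit (MiB for mcs0). */
static double imc_scaled_value(uint64_t raw_count, double scale)
{
	return (double)raw_count * scale;
}
```

With these values, a raw count of 8192 scales to roughly 1 MiB.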

After parsing the nest IMC PMUs and their events, the PMUs and their
attributes are registered in the kernel.

Example Usage :
# perf list

[...]
nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/ [Kernel PMU event]
nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0_LAST_SAMPLE/ [Kernel PMU event]
[...]

# perf stat -e "nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/" -a --per-socket

TODOs:
- Add support for Core IMC.
- Add support for thread IMC.

Comments/feedback/suggestions are welcome.

Changelog:
v2 -> v3 :
- Changed all references for IMA (In-Memory Accumulation) to IMC (In-Memory
Collection).
v1 -> v2 :
- Account for the cases where a PMU can have common scale and unit
values for all its supported events (Patch 3/6).
- Fixed a build error (for maple_defconfig) by enabling imc-pmu.o
only for CONFIG_PPC_POWERNV=y (Patch 4/6)
- Read from the "event-name" property instead of "name" for an event
node (Patch 3/6).

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>

Hemant Kumar (6):
powerpc/powernv: Data structure and macros definitions
powerpc/powernv: Autoload IMC device driver module
powerpc/powernv: Detect supported IMC units and its events
powerpc/perf: Add event attribute and group to IMC pmus
powerpc/perf: Generic imc pmu event functions
powerpc/perf: IMC pmu cpumask and cpu hotplug support

arch/powerpc/include/asm/imc-pmu.h | 74 ++++
arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/perf/Makefile | 6 +-
arch/powerpc/perf/imc-pmu.c | 383 ++++++++++++++++++++
arch/powerpc/platforms/powernv/Makefile | 2 +-
arch/powerpc/platforms/powernv/opal-imc.c | 478 +++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
arch/powerpc/platforms/powernv/opal.c | 13 +
9 files changed, 959 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/include/asm/imc-pmu.h
create mode 100644 arch/powerpc/perf/imc-pmu.c
create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

--
2.7.4


2016-12-20 06:40:41

by Hemant Kumar

Subject: [PATCH v3 2/6] powerpc/powernv: Autoload IMC device driver module

This patch does three things :
- Enables "opal.c" to create a platform device for the IMC interface
according to the appropriate compatibility string.
- Finds the reserved-memory region details in the system device tree
and gets the base address of the HOMER region for each chip.
- Gets the Nest PMU counter data offsets (in the HOMER region)
and their sizes. The offsets for the counters' data are fixed and
won't change from chip to chip.

The device tree parsing logic is separated from the PMU creation
functions (which are added in subsequent patches). Right now, only Nest
units are taken care of.
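
The address arithmetic described above can be sketched in plain C as
follows. This is a standalone illustration, not the driver code: the
helper names and SKETCH_PAGE_SIZE are hypothetical, and a 64 KiB page
size is assumed (the kernel uses PAGE_SIZE, which depends on the
configuration).

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x10000ULL	/* assumption: 64 KiB pages */

/* Join the two 32-bit address cells of a "reg" property into one address. */
static uint64_t cells_to_addr(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Physical address of page "page" of a chip's nest counter region. */
static uint64_t nest_page_addr(uint64_t homer_base, uint32_t nest_offset,
			       unsigned int page)
{
	return homer_base + nest_offset + (uint64_t)page * SKETCH_PAGE_SIZE;
}

/* Number of pages needed to cover a nest counter region of "nest_size". */
static unsigned int nest_page_count(uint32_t nest_size)
{
	return (unsigned int)((nest_size + SKETCH_PAGE_SIZE - 1) /
			      SKETCH_PAGE_SIZE);
}
```

With the imc-nest-size of 0x30000 from the cover letter and 64 KiB pages,
the region spans three pages per chip.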

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/platforms/powernv/Makefile | 2 +-
arch/powerpc/platforms/powernv/opal-imc.c | 117 ++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal.c | 13 ++++
3 files changed, 131 insertions(+), 1 deletion(-)
create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..44909fe 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o opal-async.o idle.o
obj-y += opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
obj-y += opal-msglog.o opal-hmi.o opal-power.o opal-irqchip.o
-obj-y += opal-kmsg.o
+obj-y += opal-kmsg.o opal-imc.o

obj-$(CONFIG_SMP) += smp.o subcore.o subcore-asm.o
obj-$(CONFIG_PCI) += pci.o pci-ioda.o npu-dma.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 0000000..ee2ae45
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,117 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/poll.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/io.h>
+#include <asm/uaccess.h>
+#include <asm/cputable.h>
+#include <asm/imc-pmu.h>
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+ struct device_node *child, *imc_dev, *rm_node = NULL;
+ struct perchip_nest_info *pcni;
+ u32 reg[4], pages, nest_offset, nest_size, idx;
+ int i = 0;
+ const char *node_name;
+
+ if (!pdev || !pdev->dev.of_node)
+ return -ENODEV;
+
+ imc_dev = pdev->dev.of_node;
+
+ /*
+ * nest_offset : where the nest-counters' data start.
+ * size : size of the entire nest-counters region
+ */
+ if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
+ goto err;
+ if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
+ goto err;
+
+ /* Find the "homer region" for each chip */
+ rm_node = of_find_node_by_path("/reserved-memory");
+ if (!rm_node)
+ goto err;
+
+ for_each_child_of_node(rm_node, child) {
+ if (of_property_read_string_index(child, "name", 0,
+ &node_name))
+ continue;
+ if (strncmp("ibm,homer-image", node_name,
+ strlen("ibm,homer-image")))
+ continue;
+
+ /* Get the chip id to which the above homer region belongs */
+ if (of_property_read_u32(child, "ibm,chip-id", &idx))
+ goto err;
+
+ /* reg property will have four u32 cells. */
+ if (of_property_read_u32_array(child, "reg", reg, 4))
+ goto err;
+
+ pcni = &nest_perchip_info[idx];
+
+ /* Fetch the homer region base address */
+ pcni->pbase = reg[0];
+ pcni->pbase = pcni->pbase << 32 | reg[1];
+ /* Add the nest IMC Base offset */
+ pcni->pbase = pcni->pbase + nest_offset;
+ /* Record the size of the nest counters region */
+ pcni->size = nest_size;
+
+ for (i = 0; i < (pcni->size / PAGE_SIZE); i++) {
+ pages = PAGE_SIZE * i;
+ pcni->vbase[i] = (u64)phys_to_virt(pcni->pbase +
+ pages);
+ }
+ }
+
+ return 0;
+err:
+ return -ENODEV;
+}
+
+static const struct of_device_id opal_imc_match[] = {
+ { .compatible = IMC_DTB_COMPAT },
+ {},
+};
+
+static struct platform_driver opal_imc_driver = {
+ .driver = {
+ .name = "opal-imc-counters",
+ .of_match_table = opal_imc_match,
+ },
+ .probe = opal_imc_counters_probe,
+};
+
+MODULE_DEVICE_TABLE(of, opal_imc_match);
+module_platform_driver(opal_imc_driver);
+MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 6c9a65b..a0bb336 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -30,6 +30,7 @@
#include <asm/opal.h>
#include <asm/firmware.h>
#include <asm/mce.h>
+#include <asm/imc-pmu.h>

#include "powernv.h"

@@ -650,6 +651,15 @@ static void opal_i2c_create_devs(void)
of_platform_device_create(np, NULL, NULL);
}

+static void opal_imc_init_dev(void)
+{
+ struct device_node *np;
+
+ np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
+ if (np)
+ of_platform_device_create(np, NULL, NULL);
+}
+
static int kopald(void *unused)
{
unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
@@ -723,6 +733,9 @@ static int __init opal_init(void)
/* Setup a heatbeat thread if requested by OPAL */
opal_init_heartbeat();

+ /* Detect IMC pmu counters support and create PMUs */
+ opal_imc_init_dev();
+
/* Create leds platform devices */
leds = of_find_node_by_path("/ibm,opal/leds");
if (leds) {
--
2.7.4

2016-12-20 06:40:47

by Hemant Kumar

Subject: [PATCH v3 1/6] powerpc/powernv: Data structure and macros definitions

Create a new header file, "imc-pmu.h", to hold the data structures
and macros needed for IMC PMU support.

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/include/asm/imc-pmu.h | 73 ++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)
create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 0000000..911d837
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,73 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2016 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+#define IMC_MAX_CHIPS 32
+#define IMC_MAX_PMUS 32
+#define IMC_MAX_PMU_NAME_LEN 256
+
+#define NEST_IMC_ENGINE_START 1
+#define NEST_IMC_ENGINE_STOP 0
+#define NEST_MAX_PAGES 16
+
+#define NEST_IMC_PRODUCTION_MODE 1
+
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT "ibm,imc-counters-chip"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+ u32 chip_id;
+ u64 pbase;
+ u64 vbase[NEST_MAX_PAGES];
+ u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+ char *ev_name;
+ char *ev_value;
+};
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+ struct pmu pmu;
+ int domain;
+ const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST 1
+
+#define UNKNOWN_DOMAIN -1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
--
2.7.4

2016-12-20 06:40:53

by Hemant Kumar

Subject: [PATCH v3 5/6] powerpc/perf: Generic imc pmu event functions

Since the IMC counters' data is periodically written to a memory location,
the functions to read/update, start/stop, add/del can be generic and can
be used by all IMC PMU units.
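
The read/update scheme can be sketched outside the kernel as follows.
The names "read_be64", "struct sketch_event", "sketch_event_start" and
"sketch_event_update" are hypothetical; the sketch only assumes, as the
patch below does with __be64_to_cpu(), that each counter is a 64-bit
big-endian value in memory.

```c
#include <stdint.h>

/* Decode a 64-bit big-endian counter, independent of host endianness. */
static uint64_t read_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

struct sketch_event {
	uint64_t prev_count;	/* counter value at the previous read */
	uint64_t count;		/* accumulated event count */
};

/* Starting an event just records the current counter value. */
static void sketch_event_start(struct sketch_event *ev,
			       const unsigned char *counter)
{
	ev->prev_count = read_be64(counter);
}

/* Updating adds the delta since the previous read to the total. */
static void sketch_event_update(struct sketch_event *ev,
				const unsigned char *counter)
{
	uint64_t now = read_be64(counter);

	ev->count += now - ev->prev_count;
	ev->prev_count = now;
}
```

Because only deltas are accumulated, the same helpers work for any
counter that is exposed as a free-running value in memory.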

This patch adds a set of generic IMC PMU event functions to be
used by each IMC PMU unit. Add code to set up the format attribute and to
register IMC PMUs. Add an event_init function for nest_imc events.
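
As a rough illustration of what the event_init function does with the
event offset (the "config" value), the sketch below splits an offset into
a page index and an in-page offset. The helper names and SKETCH_PAGE_SIZE
are hypothetical, and a 64 KiB page size is assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x10000u	/* assumption: 64 KiB pages */

/* Reject offsets beyond the largest event offset seen while parsing. */
static int sketch_config_valid(uint32_t config, uint64_t max_offset)
{
	return config <= max_offset;
}

/*
 * Address of an event's counter: pick the page holding the offset, then
 * add the remainder of the offset within that page. "vbase" holds one
 * mapped base address per page of the counter region.
 */
static uint64_t sketch_event_base(const uint64_t *vbase, uint32_t config)
{
	return vbase[config / SKETCH_PAGE_SIZE] + (config % SKETCH_PAGE_SIZE);
}
```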

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/include/asm/imc-pmu.h | 1 +
arch/powerpc/perf/imc-pmu.c | 122 ++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-imc.c | 29 ++++++-
3 files changed, 148 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 911d837..ceb6b1f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -70,4 +70,5 @@ struct imc_pmu {

#define UNKNOWN_DOMAIN -1

+int imc_get_domain(struct device_node *pmu_dev);
#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 7b6ce50..f12ece8 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -17,6 +17,117 @@
struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];

+/* Needed for sanity check */
+extern u64 nest_max_offset;
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *imc_format_attrs[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group imc_format_group = {
+ .name = "format",
+ .attrs = imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+ int chip_id;
+ u32 config = event->attr.config;
+ struct perchip_nest_info *pcni;
+
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* Sampling not supported */
+ if (event->hw.sample_period)
+ return -EINVAL;
+
+ /* unsupported modes and filters */
+ if (event->attr.exclude_user ||
+ event->attr.exclude_kernel ||
+ event->attr.exclude_hv ||
+ event->attr.exclude_idle ||
+ event->attr.exclude_host ||
+ event->attr.exclude_guest)
+ return -EINVAL;
+
+ if (event->cpu < 0)
+ return -EINVAL;
+
+ /* Sanity check for config (event offset) */
+ if (config > nest_max_offset)
+ return -EINVAL;
+
+ chip_id = topology_physical_package_id(event->cpu);
+ pcni = &nest_perchip_info[chip_id];
+ event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
+ (config & ~PAGE_MASK);
+
+ return 0;
+}
+
+static void imc_read_counter(struct perf_event *event)
+{
+ u64 *addr, data;
+
+ addr = (u64 *)event->hw.event_base;
+ data = __be64_to_cpu(*addr);
+ local64_set(&event->hw.prev_count, data);
+}
+
+static void imc_perf_event_update(struct perf_event *event)
+{
+ u64 counter_prev, counter_new, final_count, *addr;
+
+ addr = (u64 *)event->hw.event_base;
+ counter_prev = local64_read(&event->hw.prev_count);
+ counter_new = __be64_to_cpu(*addr);
+ final_count = counter_new - counter_prev;
+
+ local64_set(&event->hw.prev_count, counter_new);
+ local64_add(final_count, &event->count);
+}
+
+static void imc_event_start(struct perf_event *event, int flags)
+{
+ imc_read_counter(event);
+}
+
+static void imc_event_stop(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_UPDATE)
+ imc_perf_event_update(event);
+}
+
+static int imc_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ imc_event_start(event, flags);
+
+ return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct imc_pmu *pmu)
+{
+ if (!pmu)
+ return -EINVAL;
+
+ pmu->pmu.task_ctx_nr = perf_invalid_context;
+ pmu->pmu.event_init = nest_imc_event_init;
+ pmu->pmu.add = imc_event_add;
+ pmu->pmu.del = imc_event_stop;
+ pmu->pmu.start = imc_event_start;
+ pmu->pmu.stop = imc_event_stop;
+ pmu->pmu.read = imc_perf_event_update;
+ pmu->attr_groups[1] = &imc_format_group;
+ pmu->pmu.attr_groups = pmu->attr_groups;
+
+ return 0;
+}
+
/* dev_str_attr : Populate event "name" and string "str" in attribute */
static struct attribute *dev_str_attr(const char *name, const char *str)
{
@@ -83,6 +194,17 @@ int init_imc_pmu(struct imc_events *events, int idx,
if (ret)
goto err_free;

+ ret = update_pmu_ops(pmu_ptr);
+ if (ret)
+ goto err_free;
+
+ ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+ if (ret)
+ goto err_free;
+
+ pr_info("%s performance monitor hardware support registered\n",
+ pmu_ptr->pmu.name);
+
return 0;

err_free:
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 7870401..a2ca8e4 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -36,6 +36,7 @@ extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];

extern int init_imc_pmu(struct imc_events *events,
int idx, struct imc_pmu *pmu_ptr);
+u64 nest_max_offset;

static int imc_event_info(char *name, struct imc_events *events)
{
@@ -68,8 +69,25 @@ static int imc_event_info_str(struct property *pp, char *name,
return 0;
}

+/*
+ * Updates the maximum offset for an event in the pmu with domain
+ * "pmu_domain". Right now, only nest domain is supported.
+ */
+static void update_max_value(u32 value, int pmu_domain)
+{
+ switch (pmu_domain) {
+ case IMC_DOMAIN_NEST:
+ if (nest_max_offset < value)
+ nest_max_offset = value;
+ break;
+ default:
+ /* Unknown domain, return */
+ return;
+ }
+}
+
static int imc_event_info_val(char *name, u32 val,
- struct imc_events *events)
+ struct imc_events *events, int pmu_domain)
{
int ret;

@@ -77,6 +95,7 @@ static int imc_event_info_val(char *name, u32 val,
if (ret)
return ret;
sprintf(events->ev_value, "event=0x%x", val);
+ update_max_value(val, pmu_domain);

return 0;
}
@@ -111,7 +130,8 @@ static int set_event_property(struct property *pp, char *event_prop,
static int imc_events_node_parser(struct device_node *dev,
struct imc_events *events,
struct property *event_scale,
- struct property *event_unit)
+ struct property *event_unit,
+ int pmu_domain)
{
struct property *name, *pp;
char *ev_name;
@@ -153,7 +173,8 @@ static int imc_events_node_parser(struct device_node *dev,
*/
if (strncmp(pp->name, "reg", 3) == 0) {
of_property_read_u32(dev, pp->name, &val);
- ret = imc_event_info_val(ev_name, val, &events[idx]);
+ ret = imc_event_info_val(ev_name, val, &events[idx],
+ pmu_domain);
if (ret) {
kfree(events[idx].ev_name);
kfree(events[idx].ev_value);
@@ -322,7 +343,7 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index)
/* Loop through event nodes */
for_each_child_of_node(parent, ev_node) {
ret = imc_events_node_parser(ev_node, &events[idx], scale_pp,
- unit_pp);
+ unit_pp, pmu_ptr->domain);
if (ret < 0) {
/* Unable to parse this event */
if (ret == -ENOMEM)
--
2.7.4

2016-12-20 06:41:00

by Hemant Kumar

Subject: [PATCH v3 6/6] powerpc/perf: IMC pmu cpumask and cpu hotplug support

Add a cpumask attribute to be used by each IMC PMU. For nest PMUs, only
one CPU (any online CPU) from each chip is designated to read the counters.

On CPU hotplug, the dying CPU is checked to see whether it is one of the
designated CPUs. If it is, the next online CPU from the same chip (for nest
units) is designated as the new CPU to read the counters.
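
The hand-over on hotplug can be sketched with a plain bitmask instead of
the kernel's cpumask API. The function name is hypothetical, and unlike
the patch (which looks only at higher-numbered CPUs via cpumask_next()),
this sketch accepts any other online CPU of the chip; that difference is
a deliberate simplification.

```c
#include <stdint.h>

/*
 * Pick a replacement reader for a chip. "chip_online" is a bitmask of
 * the chip's online CPUs (bit n = CPU n, at most 64 CPUs in this
 * sketch). Returns -1 if no other CPU of the chip is online.
 */
static int pick_new_designated_cpu(uint64_t chip_online, int dying_cpu)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (cpu == dying_cpu)
			continue;
		if (chip_online & (1ULL << cpu))
			return cpu;
	}
	return -1;
}
```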

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/include/asm/opal-api.h | 3 +-
arch/powerpc/include/asm/opal.h | 2 +
arch/powerpc/perf/imc-pmu.c | 167 ++++++++++++++++++++++++-
arch/powerpc/platforms/powernv/opal-wrappers.S | 1 +
4 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index 0e2e57b..48e1d3e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -167,7 +167,8 @@
#define OPAL_INT_EOI 124
#define OPAL_INT_SET_MFRR 125
#define OPAL_PCI_TCE_KILL 126
-#define OPAL_LAST 126
+#define OPAL_NEST_IMC_COUNTERS_CONTROL 128
+#define OPAL_LAST 128

/* Device tree flags */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index e958b70..fe72b57 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -229,6 +229,8 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
uint32_t pe_num, uint32_t tce_size,
uint64_t dma_addr, uint32_t npages);
+int64_t opal_nest_imc_counters_control(uint64_t mode, uint64_t value1,
+ uint64_t value2, uint64_t value3);

/* Internal functions */
extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f12ece8..49f6486 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -16,6 +16,7 @@

struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;

/* Needed for sanity check */
extern u64 nest_max_offset;
@@ -31,6 +32,164 @@ static struct attribute_group imc_format_group = {
.attrs = imc_format_attrs,
};

+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ cpumask_t *active_mask;
+
+ active_mask = &nest_imc_cpumask;
+ return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+ .attrs = imc_pmu_cpumask_attrs,
+};
+
+/*
+ * nest_init : Initializes the nest imc engine for the current chip.
+ */
+static void nest_init(int *loc)
+{
+ int rc;
+
+ rc = opal_nest_imc_counters_control(NEST_IMC_PRODUCTION_MODE,
+ NEST_IMC_ENGINE_START, 0, 0);
+ if (rc)
+ loc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+ int i;
+
+ for (i = 0;
+ (i < IMC_MAX_PMUS) && (per_nest_pmu_arr[i] != NULL); i++)
+ perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+ old_cpu, new_cpu);
+}
+
+static int ppc_nest_imc_cpu_online(unsigned int cpu)
+{
+ int nid, fcpu, ncpu;
+ struct cpumask *l_cpumask, tmp_mask;
+
+ /* Find the cpumask of this node */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+
+ /*
+ * If any of the cpu from this node is already present in the mask,
+ * just return, if not, then set this cpu in the mask.
+ */
+ if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
+ cpumask_set_cpu(cpu, &nest_imc_cpumask);
+ return 0;
+ }
+
+ fcpu = cpumask_first(l_cpumask);
+ ncpu = cpumask_next(cpu, l_cpumask);
+ if (cpu == fcpu) {
+ if (cpumask_test_and_clear_cpu(ncpu, &nest_imc_cpumask)) {
+ cpumask_set_cpu(cpu, &nest_imc_cpumask);
+ nest_change_cpu_context(ncpu, cpu);
+ }
+ }
+
+ return 0;
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+ int nid, target = -1;
+ struct cpumask *l_cpumask;
+
+ /*
+ * Check in the designated list for this cpu. Don't bother
+ * if not one of them.
+ */
+ if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
+ return 0;
+
+ /*
+ * Now that this cpu is one of the designated,
+ * find a next cpu a) which is online and b) in same chip.
+ */
+ nid = cpu_to_node(cpu);
+ l_cpumask = cpumask_of_node(nid);
+ target = cpumask_next(cpu, l_cpumask);
+
+ /*
+ * Update the cpumask with the target cpu and
+ * migrate the context if needed
+ */
+ if (target >= 0 && target < nr_cpu_ids) {
+ cpumask_set_cpu(target, &nest_imc_cpumask);
+ nest_change_cpu_context(cpu, target);
+ }
+ return 0;
+}
+
+static int nest_pmu_cpumask_init(void)
+{
+ const struct cpumask *l_cpumask;
+ int cpu, nid;
+ int *cpus_opal_rc;
+
+ if (!cpumask_empty(&nest_imc_cpumask))
+ return 0;
+
+ cpu_notifier_register_begin();
+
+ /*
+ * Nest PMUs are per-chip counters. So designate a cpu
+ * from each chip for counter collection.
+ */
+ for_each_online_node(nid) {
+ l_cpumask = cpumask_of_node(nid);
+
+ /* designate first online cpu in this node */
+ cpu = cpumask_first(l_cpumask);
+ cpumask_set_cpu(cpu, &nest_imc_cpumask);
+ }
+
+ /*
+ * Memory for OPAL call return value.
+ */
+ cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+ if (!cpus_opal_rc)
+ goto fail;
+
+ /* Initialize Nest PMUs in each node using designated cpus */
+ on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
+ (void *)cpus_opal_rc, 1);
+
+ /* Check return value array for any OPAL call failure */
+ for_each_cpu(cpu, &nest_imc_cpumask) {
+ if (cpus_opal_rc[cpu])
+ goto fail;
+ }
+
+ cpuhp_setup_state(CPUHP_AP_PERF_ONLINE,
+ "POWER_NEST_IMC_ONLINE",
+ ppc_nest_imc_cpu_online,
+ ppc_nest_imc_cpu_offline);
+
+ cpu_notifier_register_done();
+ return 0;
+
+fail:
+ cpu_notifier_register_done();
+ return -ENODEV;
+}
+
static int nest_imc_event_init(struct perf_event *event)
{
int chip_id;
@@ -63,7 +222,7 @@ static int nest_imc_event_init(struct perf_event *event)
chip_id = topology_physical_package_id(event->cpu);
pcni = &nest_perchip_info[chip_id];
event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
- (config & ~PAGE_MASK);
+ (config & ~PAGE_MASK);

return 0;
}
@@ -123,6 +282,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
pmu->pmu.stop = imc_event_stop;
pmu->pmu.read = imc_perf_event_update;
pmu->attr_groups[1] = &imc_format_group;
+ pmu->attr_groups[2] = &imc_pmu_cpumask_attr_group;
pmu->pmu.attr_groups = pmu->attr_groups;

return 0;
@@ -190,6 +350,11 @@ int init_imc_pmu(struct imc_events *events, int idx,
{
int ret = -ENODEV;

+ /* Add cpumask and register for hotplug notification */
+ ret = nest_pmu_cpumask_init();
+ if (ret)
+ return ret;
+
ret = update_events_in_group(events, idx, pmu_ptr);
if (ret)
goto err_free;
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 44d2d84..c615990 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -309,3 +309,4 @@ OPAL_CALL(opal_int_eoi, OPAL_INT_EOI);
OPAL_CALL(opal_int_set_mfrr, OPAL_INT_SET_MFRR);
OPAL_CALL(opal_pci_tce_kill, OPAL_PCI_TCE_KILL);
OPAL_CALL_REAL(opal_rm_pci_tce_kill, OPAL_PCI_TCE_KILL);
+OPAL_CALL(opal_nest_imc_counters_control, OPAL_NEST_IMC_COUNTERS_CONTROL);
--
2.7.4

2016-12-20 06:41:27

by Hemant Kumar

Subject: [PATCH v3 4/6] powerpc/perf: Add event attribute and group to IMC pmus

The device tree IMC driver code parses the IMC units and their events. It
passes the information to the IMC PMU code, which is placed in powerpc/perf
as "imc-pmu.c".

This patch creates only event attributes and attribute groups for the
IMC pmus.
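
For illustration, the value exposed by each event attribute is a string
of the form "event=0x<offset>" (the format used by the parser in this
series). A bounds-checked way of building such a string could look like
the sketch below; "sketch_event_value" is a hypothetical helper, not a
function from this patchset.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Format the attribute value for one event into "buf".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int sketch_event_value(char *buf, size_t len, uint32_t offset)
{
	int n = snprintf(buf, len, "event=0x%x", offset);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```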

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/perf/Makefile | 6 +-
arch/powerpc/perf/imc-pmu.c | 96 +++++++++++++++++++++++++++++++
arch/powerpc/platforms/powernv/opal-imc.c | 12 +++-
3 files changed, 111 insertions(+), 3 deletions(-)
create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index f102d53..6f1d0ac 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -2,10 +2,14 @@ subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror

obj-$(CONFIG_PERF_EVENTS) += callchain.o perf_regs.o

+imc-$(CONFIG_PPC_POWERNV) += imc-pmu.o
+
obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o bhrb.o
obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
power5+-pmu.o power6-pmu.o power7-pmu.o \
- isa207-common.o power8-pmu.o power9-pmu.o
+ isa207-common.o power8-pmu.o power9-pmu.o \
+ $(imc-y)
+
obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o

obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 0000000..7b6ce50
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,96 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2016 Madhavan Srinivasan, IBM Corporation.
+ * (C) 2016 Hemant K Shaw, IBM Corporation.
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/imc-pmu.h>
+#include <asm/cputhreads.h>
+#include <linux/string.h>
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+/* dev_str_attr : Populate event "name" and string "str" in attribute */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ sysfs_attr_init(&attr->attr.attr);
+
+ attr->event_str = str;
+ attr->attr.attr.name = name;
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = perf_event_sysfs_show;
+
+ return &attr->attr.attr;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ * and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct imc_events *events,
+ int idx, struct imc_pmu *pmu)
+{
+ struct attribute_group *attr_group;
+ struct attribute **attrs;
+ int i;
+
+ /* Allocate memory for attribute group */
+ attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+ if (!attr_group)
+ return -ENOMEM;
+
+ /* Allocate memory for attributes */
+ attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+ if (!attrs) {
+ kfree(attr_group);
+ return -ENOMEM;
+ }
+
+ attr_group->name = "events";
+ attr_group->attrs = attrs;
+ for (i = 0; i < idx; i++, events++) {
+ attrs[i] = dev_str_attr((char *)events->ev_name,
+ (char *)events->ev_value);
+ }
+
+ pmu->attr_groups[0] = attr_group;
+ return 0;
+}
+
+/*
+ * init_imc_pmu : Setup the IMC pmu device in "pmu_ptr" and its events
+ * "events".
+ * Setup the cpu mask information for these pmus and setup the state machine
+ * hotplug notifiers as well.
+ */
+int init_imc_pmu(struct imc_events *events, int idx,
+ struct imc_pmu *pmu_ptr)
+{
+ int ret = -ENODEV;
+
+ ret = update_events_in_group(events, idx, pmu_ptr);
+ if (ret)
+ goto err_free;
+
+ return 0;
+
+err_free:
+ /* Only free the attr_groups which are dynamically allocated */
+ if (pmu_ptr->attr_groups[0]) {
+ kfree(pmu_ptr->attr_groups[0]->attrs);
+ kfree(pmu_ptr->attr_groups[0]);
+ }
+
+ return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 5ee93402..7870401 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -31,8 +31,11 @@
#include <asm/cputable.h>
#include <asm/imc-pmu.h>

-struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
-struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+extern int init_imc_pmu(struct imc_events *events,
+ int idx, struct imc_pmu *pmu_ptr);

static int imc_event_info(char *name, struct imc_events *events)
{
@@ -335,6 +338,11 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index)
idx += ret;
}

+ ret = init_imc_pmu(events, idx, pmu_ptr);
+ if (ret) {
+ pr_err("IMC PMU %s Register failed\n", pmu_ptr->pmu.name);
+ goto free_events;
+ }
return 0;

free_events:
--
2.7.4

2016-12-20 06:41:44

by Hemant Kumar

Subject: [PATCH v3 3/6] powerpc/powernv: Detect supported IMC units and its events

Parse the device tree to detect IMC units. Traverse each IMC unit node
to find its supported events and their unit/scale properties (if any).

Right now, only nest IMC units are supported. Each nest IMC event node
in the device tree contains the offset, within the reserved memory
region, at which the counter data for that event is found. This offset
is stored in the "reg" property of the event node.

The kernel uses this offset as the event configuration value.
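
A minimal user-space sketch of turning such an offset into the event
string is shown here. The "event=0x%x" format is the one used by
imc_event_info_val() in this patch. The helper name, the buffer length
EV_VALUE_LEN and the bounds check via snprintf() are assumptions made for
the illustration; the patch itself uses sprintf() into a buffer of
IMC_MAX_PMU_NAME_LEN bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed length; stands in for IMC_MAX_PMU_NAME_LEN. */
#define EV_VALUE_LEN 256

/*
 * Format the "reg" offset of an event as its config string.
 * Returns 0 on success, -1 if the string would not fit in buf.
 */
static int format_event_config(char *buf, size_t len, uint32_t offset)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "event=0x%x", (unsigned int)offset);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For example, a hypothetical offset of 0x118 would produce the string
"event=0x118".
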

The device tree parser also looks for scale/unit properties in each
event node and passes their values on as event attributes, which the
perf tool uses in post-processing. Some PMUs have common scale and unit
properties, which means that all events supported by that PMU inherit
the scale and unit of the PMU itself. For those events, the common unit
and scale values are set.
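
The inheritance rule can be sketched in user space as follows. The
struct su_props type and both helpers are hypothetical. The precedence
shown (an event's own property wins over the PMU-wide one) is an
assumption made for this sketch: the patch adds the PMU-wide values when
it sees "reg" and also adds any per-event "unit"/"scale" properties it
finds. The "<event>.<property>" naming follows set_event_property().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for scale/unit strings; NULL means "not present". */
struct su_props { const char *scale; const char *unit; };

/* Prefer the event's own value, else fall back to the PMU-wide one. */
static const char *pick(const char *own, const char *common)
{
	return own ? own : common;
}

/* Compute the scale/unit that apply to one event of a PMU. */
static struct su_props effective_props(const struct su_props *pmu,
				       const struct su_props *ev)
{
	struct su_props out;

	assert(pmu != NULL && ev != NULL);
	out.scale = pick(ev->scale, pmu->scale);
	out.unit = pick(ev->unit, pmu->unit);
	return out;
}

/* Build the "<event>.<prop>" attribute name; -1 if it does not fit. */
static int prop_attr_name(char *buf, size_t len, const char *ev_name,
			  const char *prop)
{
	int n = snprintf(buf, len, "%s.%s", ev_name, prop);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```
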

If any unit or any of its events fails to initialize, disable that unit
and continue setting up the rest.
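
The "disable and continue" policy described above could look roughly like
this in user space. Everything here is hypothetical: unit_init_fn,
setup_units() and demo_init() are illustration-only names, and the
enabled[] array simply records which units came up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-unit init callback: returns 0 on success. */
typedef int (*unit_init_fn)(int unit_index);

/* Demo callback for illustration: unit 1 fails, all others succeed. */
static int demo_init(int unit_index)
{
	return unit_index == 1 ? -1 : 0;
}

/*
 * Try to initialize every unit. A failing unit is marked disabled and
 * the loop moves on to the next one. Returns the number of enabled units.
 */
static size_t setup_units(size_t nr_units, unit_init_fn init,
			  unsigned char *enabled)
{
	size_t i, ok = 0;

	assert(init != NULL && (enabled != NULL || nr_units == 0));
	for (i = 0; i < nr_units; i++) {
		enabled[i] = (init((int)i) == 0);
		if (enabled[i])
			ok++;
	}
	return ok;
}
```
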

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Anton Blanchard <[email protected]>
Cc: Sukadev Bhattiprolu <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Stewart Smith <[email protected]>
Cc: Daniel Axtens <[email protected]>
Cc: Stephane Eranian <[email protected]>
Signed-off-by: Hemant Kumar <[email protected]>
---
arch/powerpc/platforms/powernv/opal-imc.c | 332 ++++++++++++++++++++++++++++++
1 file changed, 332 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index ee2ae45..5ee93402 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -32,6 +32,337 @@
#include <asm/imc-pmu.h>

struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_info(char *name, struct imc_events *events)
+{
+ char *buf;
+
+ /* memory for content */
+ buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ events->ev_name = name;
+ events->ev_value = buf;
+ return 0;
+}
+
+static int imc_event_info_str(struct property *pp, char *name,
+ struct imc_events *events)
+{
+ int ret;
+
+ ret = imc_event_info(name, events);
+ if (ret)
+ return ret;
+
+ if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > IMC_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+ strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+ return 0;
+}
+
+static int imc_event_info_val(char *name, u32 val,
+ struct imc_events *events)
+{
+ int ret;
+
+ ret = imc_event_info(name, events);
+ if (ret)
+ return ret;
+ sprintf(events->ev_value, "event=0x%x", val);
+
+ return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+ struct imc_events *events, char *ev_name)
+{
+ char *buf;
+ int ret;
+
+ buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ sprintf(buf, "%s.%s", ev_name, event_prop);
+ ret = imc_event_info_str(pp, buf, events);
+ if (ret) {
+ kfree(events->ev_name);
+ kfree(events->ev_value);
+ }
+
+ return ret;
+}
+
+/*
+ * imc_events_node_parser: Parse the event node "dev" and assign the parsed
+ * information to event "events".
+ *
+ * Parses the "reg" property of this event. "reg" gives us the event offset.
+ * Also, parse the "scale" and "unit" properties, if any.
+ */
+static int imc_events_node_parser(struct device_node *dev,
+ struct imc_events *events,
+ struct property *event_scale,
+ struct property *event_unit)
+{
+ struct property *name, *pp;
+ char *ev_name;
+ u32 val;
+ int idx = 0, ret;
+
+ if (!dev)
+ return -EINVAL;
+
+ /*
+ * Look up the "event-name" property of this event node
+ */
+ name = of_find_property(dev, "event-name", NULL);
+ if (!name)
+ return -ENODEV;
+
+ if (!name->value ||
+ (strnlen(name->value, name->length) == name->length) ||
+ (name->length > IMC_MAX_PMU_NAME_LEN))
+ return -EINVAL;
+
+ ev_name = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!ev_name)
+ return -ENOMEM;
+
+ strncpy(ev_name, name->value, name->length);
+
+ /*
+ * Parse each property of this event node "dev". Property "reg" has
+ * the offset which is assigned to the event name. Other properties
+ * like "scale" and "unit" are assigned to event.scale and event.unit
+ * accordingly.
+ */
+ for_each_property_of_node(dev, pp) {
+ /*
+ * If there is an issue in parsing a single property of
+ * this event, we just clean up the buffers, but we still
+ * continue to parse.
+ */
+ if (strncmp(pp->name, "reg", 3) == 0) {
+ of_property_read_u32(dev, pp->name, &val);
+ ret = imc_event_info_val(ev_name, val, &events[idx]);
+ if (ret) {
+ kfree(events[idx].ev_name);
+ kfree(events[idx].ev_value);
+ continue;
+ }
+ /*
+ * If the common scale and unit properties are available,
+ * assign them to this event
+ */
+ if (event_scale) {
+ idx++;
+ ret = set_event_property(event_scale, "scale",
+ &events[idx],
+ ev_name);
+ if (ret)
+ continue;
+ idx++;
+ }
+ if (event_unit) {
+ ret = set_event_property(event_unit, "unit",
+ &events[idx],
+ ev_name);
+ if (ret)
+ continue;
+ }
+ idx++;
+ } else if (strncmp(pp->name, "unit", 4) == 0) {
+ ret = set_event_property(pp, "unit", &events[idx],
+ ev_name);
+ if (ret)
+ continue;
+ idx++;
+ } else if (strncmp(pp->name, "scale", 5) == 0) {
+ ret = set_event_property(pp, "scale", &events[idx],
+ ev_name);
+ if (ret)
+ continue;
+ idx++;
+ }
+ }
+
+ return idx;
+}
+
+/*
+ * imc_get_domain : Returns the domain for pmu "pmu_dev".
+ */
+int imc_get_domain(struct device_node *pmu_dev)
+{
+ if (of_device_is_compatible(pmu_dev, IMC_DTB_NEST_COMPAT))
+ return IMC_DOMAIN_NEST;
+ else
+ return UNKNOWN_DOMAIN;
+}
+
+/*
+ * get_nr_children : Returns the number of children for a pmu device node.
+ */
+static int get_nr_children(struct device_node *pmu_node)
+{
+ struct device_node *child;
+ int i = 0;
+
+ for_each_child_of_node(pmu_node, child)
+ i++;
+ return i;
+}
+
+/*
+ * imc_free_events : Cleanup the "events" list having "nr_entries" entries.
+ */
+static void imc_free_events(struct imc_events *events, int nr_entries)
+{
+ int i;
+
+ /* Nothing to clean, return */
+ if (!events)
+ return;
+ for (i = 0; i < nr_entries; i++) {
+ kfree(events[i].ev_name);
+ kfree(events[i].ev_value);
+ }
+
+ kfree(events);
+}
+
+/*
+ * imc_pmu_create : Takes the parent device which is the pmu unit and a
+ * pmu_index as the inputs.
+ * Allocates memory for the pmu, sets up its domain (NEST or CORE), and
+ * allocates memory for the events supported by this pmu. Assigns a name for
+ * the pmu. Calls imc_events_node_parser() to setup the individual events.
+ * If everything goes fine, it calls init_imc_pmu() to setup the pmu device
+ * and register it.
+ */
+static int imc_pmu_create(struct device_node *parent, int pmu_index)
+{
+ struct device_node *ev_node;
+ struct imc_events *events;
+ struct imc_pmu *pmu_ptr;
+ struct property *pp, *scale_pp, *unit_pp;
+ char *buf;
+ int idx = 0, ret = -EINVAL, nr_children = 0;
+
+ if (!parent)
+ return -EINVAL;
+
+ /* memory for pmu */
+ pmu_ptr = kzalloc(sizeof(struct imc_pmu), GFP_KERNEL);
+ if (!pmu_ptr)
+ return -ENOMEM;
+
+ pmu_ptr->domain = imc_get_domain(parent);
+ if (pmu_ptr->domain == UNKNOWN_DOMAIN)
+ goto free_pmu;
+
+ /* Needed for hotplug/migration */
+ per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+ /*
+ * Get the maximum no. of events in this node.
+ * Multiply by 3 to account for .scale and .unit properties
+ * This number suggests the amount of memory needed to setup the
+ * events for this pmu.
+ */
+ nr_children = get_nr_children(parent) * 3;
+
+ /* memory for pmu events */
+ events = kzalloc((sizeof(struct imc_events) * nr_children),
+ GFP_KERNEL);
+ if (!events) {
+ ret = -ENOMEM;
+ goto free_pmu;
+ }
+
+ pp = of_find_property(parent, "name", NULL);
+ if (!pp) {
+ ret = -ENODEV;
+ goto free_events;
+ }
+
+ if (!pp->value ||
+ (strnlen(pp->value, pp->length) == pp->length) ||
+ (pp->length > IMC_MAX_PMU_NAME_LEN)) {
+ ret = -EINVAL;
+ goto free_events;
+ }
+
+ buf = kzalloc(IMC_MAX_PMU_NAME_LEN, GFP_KERNEL);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto free_events;
+ }
+
+ /* Save the name to register it later */
+ sprintf(buf, "nest_%s", (char *)pp->value);
+ pmu_ptr->pmu.name = (char *)buf;
+
+ /*
+ * Check whether there are common "scale" and "unit" properties inside
+ * the PMU node for all the events supported by this PMU.
+ */
+ scale_pp = of_find_property(parent, "scale", NULL);
+ unit_pp = of_find_property(parent, "unit", NULL);
+
+ /* Loop through event nodes */
+ for_each_child_of_node(parent, ev_node) {
+ ret = imc_events_node_parser(ev_node, &events[idx], scale_pp,
+ unit_pp);
+ if (ret < 0) {
+ /* Unable to parse this event */
+ if (ret == -ENOMEM)
+ goto free_events;
+ continue;
+ }
+
+ /*
+ * imc_events_node_parser will return the number of
+ * event entries created for this node. This could include
+ * event scale and unit files also.
+ */
+ idx += ret;
+ }
+
+ return 0;
+
+free_events:
+ imc_free_events(events, idx);
+free_pmu:
+ kfree(pmu_ptr);
+ return ret;
+}
+
+/*
+ * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
+ */
+static void imc_pmu_setup(struct device_node *parent)
+{
+ struct device_node *child;
+ int pmu_count = 0, rc = 0;
+
+ if (!parent)
+ return;
+
+ /* Setup all the IMC pmus */
+ for_each_child_of_node(parent, child) {
+ imc_pmu_create(child, pmu_count);
+ if (rc)
+ return;
+ pmu_count++;
+ }
+}

static int opal_imc_counters_probe(struct platform_device *pdev)
{
@@ -93,6 +424,7 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
} while (i < (pcni->size / PAGE_SIZE));
}

+ imc_pmu_setup(imc_dev);
return 0;
err:
return -ENODEV;
--
2.7.4