2022-02-17 17:17:52

by kajoljain

Subject: [PATCH v6 0/4] Add perf interface to expose nvdimm

This patchset adds performance stats reporting support for nvdimm
devices. The added interface includes support for pmu
register/unregister functions. A new structure called nvdimm_pmu is
added to carry arch/platform specific data such as the cpumask, the
nvdimm device pointer and the pmu event functions like
event_init/add/read/del. Users can use the standard perf tool to
access the perf events exposed via the pmu.

The interface also defines the supported event list and the config
fields for the event attributes, together with their corresponding bit
values, which are exported via sysfs. Patch 3 exposes IBM pseries
platform nmem* device performance stats using this interface.

Result from a power9 pseries lpar with 2 nvdimm devices:

Ex: List all events with perf list

command:# perf list nmem

nmem0/cache_rh_cnt/ [Kernel PMU event]
nmem0/cache_wh_cnt/ [Kernel PMU event]
nmem0/cri_res_util/ [Kernel PMU event]
nmem0/ctl_res_cnt/ [Kernel PMU event]
nmem0/ctl_res_tm/ [Kernel PMU event]
nmem0/fast_w_cnt/ [Kernel PMU event]
nmem0/host_l_cnt/ [Kernel PMU event]
nmem0/host_l_dur/ [Kernel PMU event]
nmem0/host_s_cnt/ [Kernel PMU event]
nmem0/host_s_dur/ [Kernel PMU event]
nmem0/med_r_cnt/ [Kernel PMU event]
nmem0/med_r_dur/ [Kernel PMU event]
nmem0/med_w_cnt/ [Kernel PMU event]
nmem0/med_w_dur/ [Kernel PMU event]
nmem0/mem_life/ [Kernel PMU event]
nmem0/poweron_secs/ [Kernel PMU event]
...
nmem1/mem_life/ [Kernel PMU event]
nmem1/poweron_secs/ [Kernel PMU event]
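
Any of the listed events can then be counted with the standard perf
tool; the invocation below is only an illustrative example, not taken
from the patchset:

command:# perf stat -e nmem0/ctl_res_cnt/ -a sleep 1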

Patch1:
Introduces the nvdimm_pmu structure.
Patch2:
Adds a common interface to add arch/platform specific data,
including the nvdimm device pointer and pmu data, along with
the pmu event functions. It also defines the supported event
list, adds attribute groups for format, events and cpumask,
and adds code for cpu hotplug support.
Patch3:
Adds code in arch/powerpc/platforms/pseries/papr_scm.c to expose
nmem* pmus. It fills in the nvdimm_pmu structure with the pmu name,
capabilities, cpumask and event functions and then registers
the pmu by passing those callbacks to register_nvdimm_pmu
(a minimal usage sketch of this flow follows below).
Patch4:
Sysfs documentation patch.
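
To make the registration flow concrete, here is a minimal usage sketch
of the interface (not part of the patches; the my_drv_* names are
placeholder stubs; only struct nvdimm_pmu, register_nvdimm_pmu and
unregister_nvdimm_pmu come from this series, and Patch 3 shows the real
papr_scm implementation):

#include <linux/nd.h>
#include <linux/perf_event.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/topology.h>

static int my_drv_event_init(struct perf_event *event)
{
	/* only accept events that target this pmu and reject sampling */
	if (event->attr.type != event->pmu->type)
		return -ENOENT;
	return is_sampling_event(event) ? -EOPNOTSUPP : 0;
}

static int my_drv_add(struct perf_event *event, int flags)  { return 0; }
static void my_drv_del(struct perf_event *event, int flags) { }
static void my_drv_read(struct perf_event *event)           { }

static int my_drv_pmu_register(struct platform_device *pdev)
{
	struct nvdimm_pmu *nd_pmu;
	int rc;

	nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
	if (!nd_pmu)
		return -ENOMEM;

	/* name and the event_init/add/del/read callbacks must be filled in */
	nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
	nd_pmu->pmu.name = "nmemX";		/* placeholder name */
	nd_pmu->pmu.event_init = my_drv_event_init;
	nd_pmu->pmu.add = my_drv_add;
	nd_pmu->pmu.del = my_drv_del;
	nd_pmu->pmu.read = my_drv_read;
	nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
				   PERF_PMU_CAP_NO_EXCLUDE;

	/* optional: cpumask used to pick the designated counter-access cpu */
	nd_pmu->arch_cpumask = *cpumask_of_node(dev_to_node(&pdev->dev));

	/* common code adds the format/events/cpumask groups and hotplug state */
	rc = register_nvdimm_pmu(nd_pmu, pdev);
	if (rc)
		kfree(nd_pmu);
	return rc;
}

On teardown the driver calls unregister_nvdimm_pmu(nd_pmu), which
unregisters the pmu, removes the hotplug state and frees nd_pmu.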

Changelog
---
Resend v5 -> v6
- No logic change, just a rebase to latest upstream and
tested the patchset.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

v5 -> Resend v5
- Resend the patchset

- Link to the patchset v5: https://lkml.org/lkml/2021/9/28/643

v4 -> v5:
- Remove multiple variables defined in the nvdimm_pmu structure,
including name and the pmu functions (event_init/add/del/read), as
they were only used to copy the same data into the pmu variable again.
This step is now done directly in arch specific code, as suggested by
Dan Williams.

- Remove the attribute group field from the nvdimm pmu structure and
define these attribute groups (format, event list and cpumask) in the
common interface, as suggested by Dan Williams.
Since the attribute groups needed by the common interface are now
defined statically there, the corresponding code is removed from
papr_scm.

- Add nvdimm pmu event list with event codes in the common interface.

- Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
to handle review comments from Dan.

- Make nvdimm_pmu_free_hotplug_memory function static as reported
by kernel test robot, also add corresponding Reported-by tag.

- Link to the patchset v4: https://lkml.org/lkml/2021/9/3/45

v3 -> v4
- Rebase code on top of current papr_scm code without any logical
changes.

- Added Acked-by tag from Peter Zijlstra and Reviewed-by tag
from Madhavan Srinivasan.

- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605

v2 -> v3
- Added Tested-by tag.

- Fix nvdimm mailing list in the ABI Documentation.

- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25

v1 -> v2
- Fix hotplug code by adding a pmu migration call
in case the current designated cpu goes offline, as
pointed out by Peter Zijlstra.

- Removed the return -1 part from the cpu hotplug offline
function.

- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500

Kajol Jain (4):
drivers/nvdimm: Add nvdimm pmu structure
drivers/nvdimm: Add perf interface to expose nvdimm performance stats
powerpc/papr_scm: Add perf interface support
docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for
nvdimm pmu

Documentation/ABI/testing/sysfs-bus-nvdimm | 35 +++
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++
include/linux/nd.h | 41 +++
6 files changed, 635 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c

--
2.31.1


2022-02-17 17:28:54

by kajoljain

Subject: [PATCH v6 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats

A common interface is added to provide performance stats reporting
support for nvdimm devices. The added interface defines the supported
event list and the config fields for the event attributes, together
with their corresponding bit values, which are exported via sysfs.

The interface also adds support for pmu register/unregister functions
and the cpu hotplug feature, along with macros for adding events via
sysfs. It adds attribute groups for format, cpumask and events to the
pmu structure.

Users can use the standard perf tool to access the perf events exposed
via the nvdimm pmu.

Signed-off-by: Kajol Jain <[email protected]>
[Make hotplug function static as reported by kernel test robot]
Reported-by: kernel test robot <[email protected]>
---
Changelog:
Resend v5 -> v6
- No logic change, just a rebase to latest upstream and
tested the patch.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++++++++++++++++++++
include/linux/nd.h | 21 +++
3 files changed, 350 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
libnvdimm-y := core.o
libnvdimm-y += bus.o
libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
libnvdimm-y += dimm.o
libnvdimm-y += region_devs.o
libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..314415894acf
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+#define EVENT(_name, _code) enum{_name = _code}
+
+/*
+ * NVDIMM Events codes.
+ */
+
+/* Controller Reset Count */
+EVENT(CTL_RES_CNT, 0x1);
+/* Controller Reset Elapsed Time */
+EVENT(CTL_RES_TM, 0x2);
+/* Power-on Seconds */
+EVENT(POWERON_SECS, 0x3);
+/* Life Remaining */
+EVENT(MEM_LIFE, 0x4);
+/* Critical Resource Utilization */
+EVENT(CRI_RES_UTIL, 0x5);
+/* Host Load Count */
+EVENT(HOST_L_CNT, 0x6);
+/* Host Store Count */
+EVENT(HOST_S_CNT, 0x7);
+/* Host Store Duration */
+EVENT(HOST_S_DUR, 0x8);
+/* Host Load Duration */
+EVENT(HOST_L_DUR, 0x9);
+/* Media Read Count */
+EVENT(MED_R_CNT, 0xa);
+/* Media Write Count */
+EVENT(MED_W_CNT, 0xb);
+/* Media Read Duration */
+EVENT(MED_R_DUR, 0xc);
+/* Media Write Duration */
+EVENT(MED_W_DUR, 0xd);
+/* Cache Read Hit Count */
+EVENT(CACHE_RH_CNT, 0xe);
+/* Cache Write Hit Count */
+EVENT(CACHE_WH_CNT, 0xf);
+/* Fast Write Count */
+EVENT(FAST_W_CNT, 0x10);
+
+NVDIMM_EVENT_ATTR(ctl_res_cnt, CTL_RES_CNT);
+NVDIMM_EVENT_ATTR(ctl_res_tm, CTL_RES_TM);
+NVDIMM_EVENT_ATTR(poweron_secs, POWERON_SECS);
+NVDIMM_EVENT_ATTR(mem_life, MEM_LIFE);
+NVDIMM_EVENT_ATTR(cri_res_util, CRI_RES_UTIL);
+NVDIMM_EVENT_ATTR(host_l_cnt, HOST_L_CNT);
+NVDIMM_EVENT_ATTR(host_s_cnt, HOST_S_CNT);
+NVDIMM_EVENT_ATTR(host_s_dur, HOST_S_DUR);
+NVDIMM_EVENT_ATTR(host_l_dur, HOST_L_DUR);
+NVDIMM_EVENT_ATTR(med_r_cnt, MED_R_CNT);
+NVDIMM_EVENT_ATTR(med_w_cnt, MED_W_CNT);
+NVDIMM_EVENT_ATTR(med_r_dur, MED_R_DUR);
+NVDIMM_EVENT_ATTR(med_w_dur, MED_W_DUR);
+NVDIMM_EVENT_ATTR(cache_rh_cnt, CACHE_RH_CNT);
+NVDIMM_EVENT_ATTR(cache_wh_cnt, CACHE_WH_CNT);
+NVDIMM_EVENT_ATTR(fast_w_cnt, FAST_W_CNT);
+
+static struct attribute *nvdimm_events_attr[] = {
+ NVDIMM_EVENT_PTR(CTL_RES_CNT),
+ NVDIMM_EVENT_PTR(CTL_RES_TM),
+ NVDIMM_EVENT_PTR(POWERON_SECS),
+ NVDIMM_EVENT_PTR(MEM_LIFE),
+ NVDIMM_EVENT_PTR(CRI_RES_UTIL),
+ NVDIMM_EVENT_PTR(HOST_L_CNT),
+ NVDIMM_EVENT_PTR(HOST_S_CNT),
+ NVDIMM_EVENT_PTR(HOST_S_DUR),
+ NVDIMM_EVENT_PTR(HOST_L_DUR),
+ NVDIMM_EVENT_PTR(MED_R_CNT),
+ NVDIMM_EVENT_PTR(MED_W_CNT),
+ NVDIMM_EVENT_PTR(MED_R_DUR),
+ NVDIMM_EVENT_PTR(MED_W_DUR),
+ NVDIMM_EVENT_PTR(CACHE_RH_CNT),
+ NVDIMM_EVENT_PTR(CACHE_WH_CNT),
+ NVDIMM_EVENT_PTR(FAST_W_CNT),
+ NULL
+};
+
+static struct attribute_group nvdimm_pmu_events_group = {
+ .name = "events",
+ .attrs = nvdimm_events_attr,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+ .name = "format",
+ .attrs = nvdimm_pmu_format_attr,
+};
+
+ssize_t nvdimm_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr, char *page)
+{
+ struct perf_pmu_events_attr *pmu_attr;
+
+ pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct pmu *pmu = dev_get_drvdata(dev);
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+ return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+ u32 target;
+ int nodeid;
+ const struct cpumask *cpumask;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ /* Clear it, in case the given cpu is set in nd_pmu->arch_cpumask */
+ cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+ /*
+ * If the given cpu is not the same as the current designated cpu for
+ * counter access, just return.
+ */
+ if (cpu != nd_pmu->cpu)
+ return 0;
+
+ /* Check for any active cpu in nd_pmu->arch_cpumask */
+ target = cpumask_any(&nd_pmu->arch_cpumask);
+
+ /*
+ * In case we don't have any active cpu in nd_pmu->arch_cpumask,
+ * check in the given cpu's numa node list.
+ */
+ if (target >= nr_cpu_ids) {
+ nodeid = cpu_to_node(cpu);
+ cpumask = cpumask_of_node(nodeid);
+ target = cpumask_any_but(cpumask, cpu);
+ }
+ nd_pmu->cpu = target;
+
+ /* Migrate nvdimm pmu events to the new target cpu if valid */
+ if (target >= 0 && target < nr_cpu_ids)
+ perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
+
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+ struct nvdimm_pmu *nd_pmu;
+
+ nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+ if (nd_pmu->cpu >= nr_cpu_ids)
+ nd_pmu->cpu = cpu;
+
+ return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+ struct perf_pmu_events_attr *pmu_events_attr;
+ struct attribute **attrs_group;
+ struct attribute_group *nvdimm_pmu_cpumask_group;
+
+ pmu_events_attr = kzalloc(sizeof(*pmu_events_attr), GFP_KERNEL);
+ if (!pmu_events_attr)
+ return -ENOMEM;
+
+ attrs_group = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+ if (!attrs_group) {
+ kfree(pmu_events_attr);
+ return -ENOMEM;
+ }
+
+ /* Allocate memory for cpumask attribute group */
+ nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+ if (!nvdimm_pmu_cpumask_group) {
+ kfree(pmu_events_attr);
+ kfree(attrs_group);
+ return -ENOMEM;
+ }
+
+ sysfs_attr_init(&pmu_events_attr->attr.attr);
+ pmu_events_attr->attr.attr.name = "cpumask";
+ pmu_events_attr->attr.attr.mode = 0444;
+ pmu_events_attr->attr.show = nvdimm_pmu_cpumask_show;
+ attrs_group[0] = &pmu_events_attr->attr.attr;
+ attrs_group[1] = NULL;
+
+ nvdimm_pmu_cpumask_group->attrs = attrs_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+ return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+ int nodeid, rc;
+ const struct cpumask *cpumask;
+
+ /*
+ * In the case of the cpu hotplug feature, arch specific code
+ * can provide the required cpumask which can be used
+ * to get the designated cpu for counter access.
+ * Check for any active cpu in nd_pmu->arch_cpumask.
+ */
+ if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+ nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+ } else {
+ /* pick active cpu from the cpumask of device numa node. */
+ nodeid = dev_to_node(nd_pmu->dev);
+ cpumask = cpumask_of_node(nodeid);
+ nd_pmu->cpu = cpumask_any(cpumask);
+ }
+
+ rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+ nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+ if (rc < 0)
+ return rc;
+
+ nd_pmu->cpuhp_state = rc;
+
+ /* Register the pmu instance for cpu hotplug */
+ rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ if (rc) {
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ /* Create cpumask attribute group */
+ rc = create_cpumask_attr_group(nd_pmu);
+ if (rc) {
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+ return rc;
+ }
+
+ return 0;
+}
+
+static void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+ cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+ cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+ if (nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+ kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+ kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+ int rc;
+
+ if (!nd_pmu || !pdev)
+ return -EINVAL;
+
+ /* event functions like add/del/read/event_init and pmu name should not be NULL */
+ if (WARN_ON_ONCE(!(nd_pmu->pmu.event_init && nd_pmu->pmu.add &&
+ nd_pmu->pmu.del && nd_pmu->pmu.read && nd_pmu->pmu.name)))
+ return -EINVAL;
+
+ nd_pmu->pmu.attr_groups = kzalloc((NVDIMM_PMU_NULL_ATTR + 1) *
+ sizeof(struct attribute_group *), GFP_KERNEL);
+ if (!nd_pmu->pmu.attr_groups)
+ return -ENOMEM;
+
+ /*
+ * Add platform_device->dev pointer to nvdimm_pmu to access
+ * device data in events functions.
+ */
+ nd_pmu->dev = &pdev->dev;
+
+ /* Fill attribute groups for the nvdimm pmu device */
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_EVENT_ATTR] = &nvdimm_pmu_events_group;
+ nd_pmu->pmu.attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+ /* Fill attribute group for cpumask */
+ rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+ if (rc) {
+ pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->pmu.name);
+ kfree(nd_pmu->pmu.attr_groups);
+ return rc;
+ }
+
+ rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->pmu.name, -1);
+ if (rc) {
+ kfree(nd_pmu->pmu.attr_groups);
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+ return rc;
+ }
+
+ pr_info("%s NVDIMM performance monitor support registered\n",
+ nd_pmu->pmu.name);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+ perf_pmu_unregister(&nd_pmu->pmu);
+ nvdimm_pmu_free_hotplug_memory(nd_pmu);
+ kfree(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ad186e828263..f28bda6c5512 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -9,6 +9,7 @@
#include <linux/device.h>
#include <linux/badblocks.h>
#include <linux/perf_event.h>
+#include <linux/platform_device.h>

enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -24,6 +25,19 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};

+#define NVDIMM_EVENT_VAR(_id) event_attr_##_id
+#define NVDIMM_EVENT_PTR(_id) (&event_attr_##_id.attr.attr)
+
+#define NVDIMM_EVENT_ATTR(_name, _id) \
+ PMU_EVENT_ATTR(_name, NVDIMM_EVENT_VAR(_id), _id, \
+ nvdimm_events_sysfs_show)
+
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR 0
+#define NVDIMM_PMU_EVENT_ATTR 1
+#define NVDIMM_PMU_CPUMASK_ATTR 2
+#define NVDIMM_PMU_NULL_ATTR 3
+
/**
* struct nvdimm_pmu - data structure for nvdimm perf driver
* @pmu: pmu data structure for nvdimm performance stats.
@@ -43,6 +57,13 @@ struct nvdimm_pmu {
struct cpumask arch_cpumask;
};

+extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *page);
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.31.1

2022-02-17 19:32:17

by kajoljain

Subject: [PATCH v6 4/4] docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu

Details are added for the event, cpumask and format attributes
in the ABI documentation.

Signed-off-by: Kajol Jain <[email protected]>
---
Changelog:
Resend v5 -> v6
- No logic change, just a rebase to latest upstream.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

Documentation/ABI/testing/sysfs-bus-nvdimm | 35 ++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-nvdimm b/Documentation/ABI/testing/sysfs-bus-nvdimm
index bff84a16812a..64004d5e4840 100644
--- a/Documentation/ABI/testing/sysfs-bus-nvdimm
+++ b/Documentation/ABI/testing/sysfs-bus-nvdimm
@@ -6,3 +6,38 @@ Description:

The libnvdimm sub-system implements a common sysfs interface for
platform nvdimm resources. See Documentation/driver-api/nvdimm/.
+
+What: /sys/bus/event_source/devices/nmemX/format
+Date: February 2022
+KernelVersion: 5.18
+Contact: Kajol Jain <[email protected]>
+Description: (RO) Attribute group to describe the magic bits
+ that go into perf_event_attr.config for a particular pmu.
+ (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+ Each attribute under this group defines a bit range of the
+ perf_event_attr.config. Supported attribute is listed
+ below::
+ event = "config:0-4" - event ID
+
+ For example::
+ ctl_res_cnt = "event=0x1"
+
+What: /sys/bus/event_source/devices/nmemX/events
+Date: February 2022
+KernelVersion: 5.18
+Contact: Kajol Jain <[email protected]>
+Description: (RO) Attribute group to describe performance monitoring events
+ for the nvdimm memory device. Each attribute in this group
+ describes a single performance monitoring event supported by
+ this nvdimm pmu. The name of the file is the name of the event.
+ (See ABI/testing/sysfs-bus-event_source-devices-events). A
+ listing of the events supported by a given nvdimm provider type
+ can be found in Documentation/driver-api/nvdimm/$provider.
+
+What: /sys/bus/event_source/devices/nmemX/cpumask
+Date: February 2022
+KernelVersion: 5.18
+Contact: Kajol Jain <[email protected]>
+Description: (RO) This sysfs file exposes the cpumask which is designated
+ to retrieve nvdimm pmu event counter data.
--
2.31.1

2022-02-17 23:04:20

by kajoljain

Subject: [PATCH v6 1/4] drivers/nvdimm: Add nvdimm pmu structure

A new structure called nvdimm_pmu is added for performance stats
reporting support of nvdimm devices. It can be used to carry device
pmu data such as the pmu data structure for performance stats and the
nvdimm device pointer, along with cpumask attributes.

Signed-off-by: Kajol Jain <[email protected]>
---
Changelog:
Resend v5 -> v6
- No logic change, just a rebase to latest upstream and
tested the patch.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

include/linux/nd.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/include/linux/nd.h b/include/linux/nd.h
index 8a8c63edb1b2..ad186e828263 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,7 @@
#include <linux/ndctl.h>
#include <linux/device.h>
#include <linux/badblocks.h>
+#include <linux/perf_event.h>

enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -23,6 +24,25 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};

+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+ struct pmu pmu;
+ struct device *dev;
+ int cpu;
+ struct hlist_node node;
+ enum cpuhp_state cpuhp_state;
+ /* cpumask provided by arch/platform specific code */
+ struct cpumask arch_cpumask;
+};
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.31.1

2022-02-18 00:05:52

by kajoljain

Subject: [PATCH v6 3/4] powerpc/papr_scm: Add perf interface support

Performance monitoring support for papr-scm nvdimm devices via the
perf interface is added, which includes the addition of pmu functions
like add/del/read/event_init for the nvdimm_pmu structure.

A new field 'priv' is added to the pdev_archdata structure to save the
nvdimm_pmu device pointer, which is used to handle the unregistering
of the pmu device.

The papr_scm_pmu_register function populates the nvdimm_pmu structure
with the name, capabilities and cpumask along with the event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device. The event handling functions internally use
an hcall to get the events and counter data.

Result on a power9 machine with 2 nvdimm devices:

Ex: List all events with perf list

command:# perf list nmem

nmem0/cache_rh_cnt/ [Kernel PMU event]
nmem0/cache_wh_cnt/ [Kernel PMU event]
nmem0/cri_res_util/ [Kernel PMU event]
nmem0/ctl_res_cnt/ [Kernel PMU event]
nmem0/ctl_res_tm/ [Kernel PMU event]
nmem0/fast_w_cnt/ [Kernel PMU event]
nmem0/host_l_cnt/ [Kernel PMU event]
nmem0/host_l_dur/ [Kernel PMU event]
nmem0/host_s_cnt/ [Kernel PMU event]
nmem0/host_s_dur/ [Kernel PMU event]
nmem0/med_r_cnt/ [Kernel PMU event]
nmem0/med_r_dur/ [Kernel PMU event]
nmem0/med_w_cnt/ [Kernel PMU event]
nmem0/med_w_dur/ [Kernel PMU event]
nmem0/mem_life/ [Kernel PMU event]
nmem0/poweron_secs/ [Kernel PMU event]
...
nmem1/mem_life/ [Kernel PMU event]
nmem1/poweron_secs/ [Kernel PMU event]

Signed-off-by: Kajol Jain <[email protected]>
---
Changelog:
Resend v5 -> v6
- No logic change, just a rebase to latest upstream and
tested the patch.

- Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979

arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++++++++++
2 files changed, 230 insertions(+)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {

struct pdev_archdata {
u64 dma_mask;
+ /*
+ * Pointer to nvdimm_pmu structure, to handle the unregistering
+ * of pmu device
+ */
+ void *priv;
};

#endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..bdf2620db461 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,7 @@
#include <asm/papr_pdsm.h>
#include <asm/mce.h>
#include <asm/unaligned.h>
+#include <linux/perf_event.h>

#define BIND_ANY_ADDR (~0ul)

@@ -68,6 +69,8 @@
#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
#define PAPR_SCM_PERF_STATS_VERSION 0x1

+#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)
+
/* Struct holding a single performance metric */
struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +123,9 @@ struct papr_scm_priv {

/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+ /* array to have event_code and stat_id mappings */
+ char **nvdimm_events_map;
};

static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +346,218 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
}

+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+ int rc, size;
+
+ /* Allocate a request buffer large enough to hold a single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) +
+ sizeof(struct papr_scm_perf_stat);
+
+ if (!p || !p->nvdimm_events_map)
+ return -EINVAL;
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ stat = &stats->scm_statistic[0];
+ memcpy(&stat->stat_id,
+ p->nvdimm_events_map[event->attr.config],
+ sizeof(stat->stat_id));
+ stat->stat_val = 0;
+
+ rc = drc_pmem_query_stats(p, stats, 1);
+ if (rc < 0) {
+ kfree(stats);
+ return rc;
+ }
+
+ *count = be64_to_cpu(stat->stat_val);
+ kfree(stats);
+ return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+ struct papr_scm_priv *p;
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* it does not support event sampling mode */
+ if (is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+ if (!p)
+ return -EINVAL;
+
+ /* Invalid eventcode */
+ if (event->attr.config == 0 || event->attr.config > 16)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+ u64 count;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ if (flags & PERF_EF_START) {
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+ if (rc)
+ return rc;
+
+ local64_set(&event->hw.prev_count, count);
+ }
+
+ return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+ u64 prev, now;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return;
+
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+ if (rc)
+ return;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+ papr_scm_pmu_read(event);
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ char *statid;
+ int index, rc, count;
+ u32 available_events;
+
+ if (!p->stat_buffer_len)
+ return -ENOENT;
+
+ available_events = (p->stat_buffer_len - sizeof(struct papr_scm_perf_stats))
+ / sizeof(struct papr_scm_perf_stat);
+
+ /* Allocate the buffer for phyp where stats are written */
+ stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+ if (!stats) {
+ rc = -ENOMEM;
+ return rc;
+ }
+
+ /* Allocate memory for nvdimm_events_map */
+ p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+ if (!p->nvdimm_events_map) {
+ rc = -ENOMEM;
+ goto out_stats;
+ }
+
+ /* Called to get list of events supported */
+ rc = drc_pmem_query_stats(p, stats, 0);
+ if (rc)
+ goto out_nvdimm_events_map;
+
+ for (index = 0, stat = stats->scm_statistic, count = 0;
+ index < available_events; index++, ++stat) {
+ statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+ if (!statid) {
+ rc = -ENOMEM;
+ goto out_nvdimm_events_map;
+ }
+
+ strcpy(statid, stat->stat_id);
+ p->nvdimm_events_map[count] = statid;
+ count++;
+ }
+ p->nvdimm_events_map[count] = NULL;
+ kfree(stats);
+ return 0;
+
+out_nvdimm_events_map:
+ kfree(p->nvdimm_events_map);
+out_stats:
+ kfree(stats);
+ return rc;
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+ struct nvdimm_pmu *nd_pmu;
+ int rc, nodeid;
+
+ nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+ if (!nd_pmu) {
+ rc = -ENOMEM;
+ goto pmu_err_print;
+ }
+
+ rc = papr_scm_pmu_check_events(p, nd_pmu);
+ if (rc)
+ goto pmu_check_events_err;
+
+ nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+ nd_pmu->pmu.name = nvdimm_name(p->nvdimm);
+ nd_pmu->pmu.event_init = papr_scm_pmu_event_init;
+ nd_pmu->pmu.read = papr_scm_pmu_read;
+ nd_pmu->pmu.add = papr_scm_pmu_add;
+ nd_pmu->pmu.del = papr_scm_pmu_del;
+
+ nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+ PERF_PMU_CAP_NO_EXCLUDE;
+
+ /*updating the cpumask variable */
+ nodeid = dev_to_node(&p->pdev->dev);
+ nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+ rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+ if (rc)
+ goto pmu_register_err;
+
+ /*
+ * Set archdata.priv value to nvdimm_pmu structure, to handle the
+ * unregistering of pmu device.
+ */
+ p->pdev->archdata.priv = nd_pmu;
+ return;
+
+pmu_register_err:
+ kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+ kfree(nd_pmu);
+pmu_err_print:
+ dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
@@ -1236,6 +1454,7 @@ static int papr_scm_probe(struct platform_device *pdev)
goto err2;

platform_set_drvdata(pdev, p);
+ papr_scm_pmu_register(p);

return 0;

@@ -1254,6 +1473,12 @@ static int papr_scm_remove(struct platform_device *pdev)

nvdimm_bus_unregister(p->bus);
drc_pmem_unbind(p);
+
+ if (pdev->archdata.priv)
+ unregister_nvdimm_pmu(pdev->archdata.priv);
+
+ pdev->archdata.priv = NULL;
+ kfree(p->nvdimm_events_map);
kfree(p->bus_desc.provider_name);
kfree(p);

--
2.31.1

2022-02-21 09:08:06

by Dan Williams

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

On Thu, Feb 17, 2022 at 8:34 AM Kajol Jain <[email protected]> wrote:
>
> - Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
> to handle review comments from Dan.

I don't think review comments should invalidate the Acked-by tags in
this case. Nothing fundamentally changed in the approach, and I would
like to have the perf ack before taking this through the nvdimm tree.

Otherwise this looks good to me.

Peter, might you have a chance to re-Ack this series, or any concerns
about me retrieving those Acks from the previous postings?

2022-02-24 01:12:08

by Dan Williams

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

On Fri, Feb 18, 2022 at 10:06 AM Dan Williams <[email protected]> wrote:
>
> I don't think review comments should invalidate the Acked-by tags in
> this case. Nothing fundamentally changed in the approach, and I would
> like to have the perf ack before taking this through the nvdimm tree.
>
> Otherwise this looks good to me.
>
> Peter, might you have a chance to re-Ack this series, or any concerns
> about me retrieving those Acks from the previous postings?

Reached Peter offline and he refreshed his Acked-by.

2022-02-24 01:51:25

by Dan Williams

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

On Wed, Feb 23, 2022 at 11:07 AM Dan Williams <[email protected]> wrote:
>
> > Peter, might you have a chance to re-Ack this series, or any concerns
> > about me retrieving those Acks from the previous postings?
>
> Reached Peter offline and he refreshed his Acked-by.

There's still time for the tags from:

"Madhavan Srinivasan"
"Nageswara R Sastry"

...to be reapplied, but I'll go ahead with pushing this to Linux-next
in the meantime.

2022-02-25 07:48:49

by kajoljain

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm



On 2/25/22 11:25, Nageswara Sastry wrote:
>
>
> Tested these patches with the automated tests at
> avocado-misc-tests/perf/perf_nmem.py
> URL:
> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
>
>
> 1. On the system where the target id and online id were different, no
> value was seen in 'cpumask' and those tests failed.
>
> Example:
> Log from dmesg
> ...
> papr_scm ibm,persistent-memory:ibm,pmemory@44100003: Region registered
> with target node 1 and online node 0
> ...

Hi Nageswara Sastry,
Thanks for testing the patch set. Yes, you are right: the target node
id and the online node id can differ when the target node is not
online, and that can cause this issue. Thanks for pointing it out.

dev_to_node returns the node id for a given nvdimm device, and that
node can be offline in some scenarios. In that case we should use the
numa node id returned by numa_map_to_online_node instead. When the
given node is offline, this function looks up the next closest online
node and returns that node id.

Can you try the change below and see if you are still hitting this
issue? Please let me know.

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index bdf2620db461..4dd513d7c029 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct papr_scm_priv *p)
 PERF_PMU_CAP_NO_EXCLUDE;

 /*updating the cpumask variable */
- nodeid = dev_to_node(&p->pdev->dev);
+ nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
 nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);

Thanks,
Kajol Jain

>
> tests log:
> (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.13 s)
> (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
> (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.07 s)
> (4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.14 s)
> (5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.18 s)
> (6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
> mixed events test is not possible. (1.10 s)
> (7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: ERROR: invalid literal
> for int() with base 10: '' (1.10 s)
> (8/9) perf_nmem.py:perfNMEM.test_cpumask: ERROR: invalid literal for
> int() with base 10: '' (1.10 s)
> (9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: ERROR: invalid
> literal for int() with base 10: '' (1.07 s)
>
> 2. On the system where the target id and online id were the same, the
> value was seen in 'cpumask' and those tests passed.
>
> tests log:
> (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.16 s)
> (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
> (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.12 s)
> (4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.10 s)
> (5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.23 s)
> (6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
> mixed events test is not possible. (1.13 s)
> (7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: PASS (1.08 s)
> (8/9) perf_nmem.py:perfNMEM.test_cpumask: PASS (1.09 s)
> (9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: PASS (1.62 s)

2022-02-25 09:19:59

by kajoljain

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm



On 2/25/22 13:17, Aneesh Kumar K V wrote:
>
> Can you use p->region->numa_node?

Hi Aneesh,
Thanks for reviewing the changes. Actually, we can't use the numa_node
field of the region structure directly inside papr_scm.c, as we get a
dereferencing error.

Result of build with this change:

arch/powerpc/platforms/pseries/papr_scm.c: In function 'papr_scm_pmu_register':
arch/powerpc/platforms/pseries/papr_scm.c:539:21: error: dereferencing pointer to incomplete type 'struct nd_region'
nodeid = &p->region->numa_node;
^~
make[3]: *** [scripts/Makefile.build:288: arch/powerpc/platforms/pseries/papr_scm.o] Error 1
make[2]: *** [scripts/Makefile.build:550: arch/powerpc/platforms/pseries] Erro

This is because the structure is defined in drivers/nvdimm/nd.h,
which is not included in this file.
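
For illustration, a tiny standalone snippet (not from the driver)
showing the same failure mode, since papr_scm.c only sees a forward
declaration of struct nd_region:

struct nd_region;	/* forward declaration only, no definition visible */

static int get_region_node(struct nd_region *region)
{
	return region->numa_node;	/* error: dereferencing pointer to incomplete type */
}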

That is why I am using
`numa_map_to_online_node(dev_to_node(&p->pdev->dev))` directly.

Let me know if you are ok with the initial change.

Thanks,
Kajol Jain

>
> -aneesh
>
>

2022-02-25 11:41:06

by kajoljain

Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm



On 2/25/22 16:41, Nageswara Sastry wrote:
>
>
> On 25/02/22 12:08 pm, kajoljain wrote:
>>
>>
>> On 2/25/22 11:25, Nageswara Sastry wrote:
>>>
>>>
>>> On 17/02/22 10:03 pm, Kajol Jain wrote:
>>>> Patchset adds performance stats reporting support for nvdimm.
>>>> Added interface includes support for pmu register/unregister
>>>> functions. A structure is added called nvdimm_pmu to be used for
>>>> adding arch/platform specific data such as cpumask, nvdimm device
>>>> pointer and pmu event functions like event_init/add/read/del.
>>>> User could use the standard perf tool to access perf events
>>>> exposed via pmu.
>>>>
>>>> Interface also defines supported event list, config fields for the
>>>> event attributes and their corresponding bit values which are exported
>>>> via sysfs. Patch 3 exposes IBM pseries platform nmem* device
>>>> performance stats using this interface.
>>>>
>>>> Result from power9 pseries lpar with 2 nvdimm device:
>>>>
>>>> Ex: List all event by perf list
>>>>
>>>> command:# perf list nmem
>>>>
>>>>     nmem0/cache_rh_cnt/                                [Kernel PMU
>>>> event]
>>>>     nmem0/cache_wh_cnt/                                [Kernel PMU
>>>> event]
>>>>     nmem0/cri_res_util/                                [Kernel PMU
>>>> event]
>>>>     nmem0/ctl_res_cnt/                                 [Kernel PMU
>>>> event]
>>>>     nmem0/ctl_res_tm/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/fast_w_cnt/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/host_l_cnt/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/host_l_dur/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/host_s_cnt/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/host_s_dur/                                  [Kernel PMU
>>>> event]
>>>>     nmem0/med_r_cnt/                                   [Kernel PMU
>>>> event]
>>>>     nmem0/med_r_dur/                                   [Kernel PMU
>>>> event]
>>>>     nmem0/med_w_cnt/                                   [Kernel PMU
>>>> event]
>>>>     nmem0/med_w_dur/                                   [Kernel PMU
>>>> event]
>>>>     nmem0/mem_life/                                    [Kernel PMU
>>>> event]
>>>>     nmem0/poweron_secs/                                [Kernel PMU
>>>> event]
>>>>     ...
>>>>     nmem1/mem_life/                                    [Kernel PMU
>>>> event]
>>>>     nmem1/poweron_secs/                                [Kernel PMU
>>>> event]
>>>>
>>>> Patch1:
>>>>           Introduces the nvdimm_pmu structure
>>>> Patch2:
>>>>           Adds common interface to add arch/platform specific data
>>>>           includes nvdimm device pointer, pmu data along with
>>>>           pmu event functions. It also defines supported event list
>>>>           and adds attribute groups for format, events and cpumask.
>>>>           It also adds code for cpu hotplug support.
>>>> Patch3:
>>>>           Add code in arch/powerpc/platform/pseries/papr_scm.c to
>>>> expose
>>>>           nmem* pmu. It fills in the nvdimm_pmu structure with pmu
>>>> name,
>>>>           capabilities, cpumask and event functions and then registers
>>>>           the pmu by adding callbacks to register_nvdimm_pmu.
>>>> Patch4:
>>>>           Sysfs documentation patch
>>>>
>>>> Changelog
>>>
>>> Tested these patches with the automated tests at
>>> avocado-misc-tests/perf/perf_nmem.py
>>> URL:
>>> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
>>>
>>>
>>>
>>> 1. On the system where the target id and online id were different, the
>>> 'cpumask' file showed no value and those tests failed.
>>>
>>> Example:
>>> Log from dmesg
>>> ...
>>> papr_scm ibm,persistent-memory:ibm,pmemory@44100003: Region registered
>>> with target node 1 and online node 0
>>> ...
>>
>> Hi Nageswara Sastry,
>>         Thanks for testing the patch set. Yes, you are right: when the
>> target node id and the online node id differ, which can happen when the
>> target node is not online, this issue can occur. Thanks for pointing it
>> out.
>>
>> dev_to_node() returns the node id for a given nvdimm device, and that
>> node can be offline in some scenarios. In that case we should use the
>> numa node id returned by numa_map_to_online_node() instead: if the
>> given node is offline, it looks up the next closest online node and
>> returns that node id.
>>
>> Can you try the below change and see if you are still getting this
>> issue? Please let me know.
>>
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index bdf2620db461..4dd513d7c029 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct
>> papr_scm_priv *p)
>>                                  PERF_PMU_CAP_NO_EXCLUDE;
>>
>>          /*updating the cpumask variable */
>> -       nodeid = dev_to_node(&p->pdev->dev);
>> +       nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
>>          nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
>>
>> Thanks,
>> Kajol Jain
>>
>
> With the above patch all the tests pass on the system where the
> target id and online id were different. Here is the result:
>
> (1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (3.47 s)
> (2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.15 s)
> (3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.08 s)
> (4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.15 s)
> (5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.22 s)
> (6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
> mixed events test is not possible. (1.18 s)
> (7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: PASS (1.12 s)
> (8/9) perf_nmem.py:perfNMEM.test_cpumask: PASS (1.17 s)
> (9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: PASS (1.81 s)
>
> Tested-by: Nageswara R Sastry <[email protected]>

Hi Nageswara,
Thanks for testing this change. I will send a new patch series v7 with
this change and also include your Tested-by tag.

Thanks,
Kajol Jain

>

2022-02-25 12:23:29

by R Nageswara Sastry

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm



On 17/02/22 10:03 pm, Kajol Jain wrote:
> Patchset adds performance stats reporting support for nvdimm.
> Added interface includes support for pmu register/unregister
> functions. A structure is added called nvdimm_pmu to be used for
> adding arch/platform specific data such as cpumask, nvdimm device
> pointer and pmu event functions like event_init/add/read/del.
> User could use the standard perf tool to access perf events
> exposed via pmu.
>
> Interface also defines supported event list, config fields for the
> event attributes and their corresponding bit values which are exported
> via sysfs. Patch 3 exposes IBM pseries platform nmem* device
> performance stats using this interface.
>
> Result from power9 pseries lpar with 2 nvdimm device:
>
> Ex: List all event by perf list
>
> command:# perf list nmem
>
> nmem0/cache_rh_cnt/ [Kernel PMU event]
> nmem0/cache_wh_cnt/ [Kernel PMU event]
> nmem0/cri_res_util/ [Kernel PMU event]
> nmem0/ctl_res_cnt/ [Kernel PMU event]
> nmem0/ctl_res_tm/ [Kernel PMU event]
> nmem0/fast_w_cnt/ [Kernel PMU event]
> nmem0/host_l_cnt/ [Kernel PMU event]
> nmem0/host_l_dur/ [Kernel PMU event]
> nmem0/host_s_cnt/ [Kernel PMU event]
> nmem0/host_s_dur/ [Kernel PMU event]
> nmem0/med_r_cnt/ [Kernel PMU event]
> nmem0/med_r_dur/ [Kernel PMU event]
> nmem0/med_w_cnt/ [Kernel PMU event]
> nmem0/med_w_dur/ [Kernel PMU event]
> nmem0/mem_life/ [Kernel PMU event]
> nmem0/poweron_secs/ [Kernel PMU event]
> ...
> nmem1/mem_life/ [Kernel PMU event]
> nmem1/poweron_secs/ [Kernel PMU event]
>
> Patch1:
> Introduces the nvdimm_pmu structure
> Patch2:
> Adds common interface to add arch/platform specific data
> includes nvdimm device pointer, pmu data along with
> pmu event functions. It also defines supported event list
> and adds attribute groups for format, events and cpumask.
> It also adds code for cpu hotplug support.
> Patch3:
> Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
> nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
> capabilities, cpumask and event functions and then registers
> the pmu by adding callbacks to register_nvdimm_pmu.
> Patch4:
> Sysfs documentation patch
>
> Changelog

Tested these patches with the automated tests at
avocado-misc-tests/perf/perf_nmem.py
URL:
https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py

1. On the system where the target id and online id were different, the
'cpumask' file showed no value and those tests failed.

Example:
Log from dmesg
...
papr_scm ibm,persistent-memory:ibm,pmemory@44100003: Region registered
with target node 1 and online node 0
...

tests log:
(1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.13 s)
(2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
(3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.07 s)
(4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.14 s)
(5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.18 s)
(6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
mixed events test is not possible. (1.10 s)
(7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: ERROR: invalid literal
for int() with base 10: '' (1.10 s)
(8/9) perf_nmem.py:perfNMEM.test_cpumask: ERROR: invalid literal for
int() with base 10: '' (1.10 s)
(9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: ERROR: invalid
literal for int() with base 10: '' (1.07 s)

2. On the system where the target id and online id were the same,
'cpumask' showed a value and those tests passed.

tests log:
(1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (1.16 s)
(2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.10 s)
(3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.12 s)
(4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.10 s)
(5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.23 s)
(6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
mixed events test is not possible. (1.13 s)
(7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: PASS (1.08 s)
(8/9) perf_nmem.py:perfNMEM.test_cpumask: PASS (1.09 s)
(9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: PASS (1.62 s)

> ---
> Resend v5 -> v6
> - No logic change, just a rebase to latest upstream and
> tested the patchset.
>
> - Link to the patchset Resend v5: https://lkml.org/lkml/2021/11/15/3979
>
> v5 -> Resend v5
> - Resend the patchset
>
> - Link to the patchset v5: https://lkml.org/lkml/2021/9/28/643
>
> v4 -> v5:
> - Remove multiple variables defined in nvdimm_pmu structure include
> name and pmu functions(event_int/add/del/read) as they are just
> used to copy them again in pmu variable. Now we are directly doing
> this step in arch specific code as suggested by Dan Williams.
>
> - Remove attribute group field from nvdimm pmu structure and
> defined these attribute groups in common interface which
> includes format, event list along with cpumask as suggested by
> Dan Williams.
> Since we added static defination for attrbute groups needed in
> common interface, removes corresponding code from papr.
>
> - Add nvdimm pmu event list with event codes in the common interface.
>
> - Remove Acked-by/Reviewed-by/Tested-by tags as code is refactored
> to handle review comments from Dan.
>
> - Make nvdimm_pmu_free_hotplug_memory function static as reported
> by kernel test robot, also add corresponding Reported-by tag.
>
> - Link to the patchset v4: https://lkml.org/lkml/2021/9/3/45
>
> v3 -> v4
> - Rebase code on top of current papr_scm code without any logical
> changes.
>
> - Added Acked-by tag from Peter Zijlstra and Reviewed by tag
> from Madhavan Srinivasan.
>
> - Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605
>
> v2 -> v3
> - Added Tested-by tag.
>
> - Fix nvdimm mailing list in the ABI Documentation.
>
> - Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25
>
> v1 -> v2
> - Fix hotplug code by adding pmu migration call
> in case the current designated cpu goes offline, as
> pointed out by Peter Zijlstra.
>
> - Removed the return -1 part from the cpu hotplug offline
> function.
>
> - Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500
>
> Kajol Jain (4):
> drivers/nvdimm: Add nvdimm pmu structure
> drivers/nvdimm: Add perf interface to expose nvdimm performance stats
> powerpc/papr_scm: Add perf interface support
> docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for
> nvdimm pmu
>
> Documentation/ABI/testing/sysfs-bus-nvdimm | 35 +++
> arch/powerpc/include/asm/device.h | 5 +
> arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++
> drivers/nvdimm/Makefile | 1 +
> drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++
> include/linux/nd.h | 41 +++
> 6 files changed, 635 insertions(+)
> create mode 100644 drivers/nvdimm/nd_perf.c
>

--
Thanks and Regards
R.Nageswara Sastry

2022-02-25 17:25:00

by R Nageswara Sastry

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm



On 25/02/22 12:08 pm, kajoljain wrote:
>
>
> On 2/25/22 11:25, Nageswara Sastry wrote:
>>
>>
>> On 17/02/22 10:03 pm, Kajol Jain wrote:
>>> Patchset adds performance stats reporting support for nvdimm.
>>> Added interface includes support for pmu register/unregister
>>> functions. A structure is added called nvdimm_pmu to be used for
>>> adding arch/platform specific data such as cpumask, nvdimm device
>>> pointer and pmu event functions like event_init/add/read/del.
>>> User could use the standard perf tool to access perf events
>>> exposed via pmu.
>>>
>>> Interface also defines supported event list, config fields for the
>>> event attributes and their corresponding bit values which are exported
>>> via sysfs. Patch 3 exposes IBM pseries platform nmem* device
>>> performance stats using this interface.
>>>
>>> Result from power9 pseries lpar with 2 nvdimm device:
>>>
>>> Ex: List all event by perf list
>>>
>>> command:# perf list nmem
>>>
>>>    nmem0/cache_rh_cnt/                                [Kernel PMU event]
>>>    nmem0/cache_wh_cnt/                                [Kernel PMU event]
>>>    nmem0/cri_res_util/                                [Kernel PMU event]
>>>    nmem0/ctl_res_cnt/                                 [Kernel PMU event]
>>>    nmem0/ctl_res_tm/                                  [Kernel PMU event]
>>>    nmem0/fast_w_cnt/                                  [Kernel PMU event]
>>>    nmem0/host_l_cnt/                                  [Kernel PMU event]
>>>    nmem0/host_l_dur/                                  [Kernel PMU event]
>>>    nmem0/host_s_cnt/                                  [Kernel PMU event]
>>>    nmem0/host_s_dur/                                  [Kernel PMU event]
>>>    nmem0/med_r_cnt/                                   [Kernel PMU event]
>>>    nmem0/med_r_dur/                                   [Kernel PMU event]
>>>    nmem0/med_w_cnt/                                   [Kernel PMU event]
>>>    nmem0/med_w_dur/                                   [Kernel PMU event]
>>>    nmem0/mem_life/                                    [Kernel PMU event]
>>>    nmem0/poweron_secs/                                [Kernel PMU event]
>>>    ...
>>>    nmem1/mem_life/                                    [Kernel PMU event]
>>>    nmem1/poweron_secs/                                [Kernel PMU event]
>>>
>>> Patch1:
>>>          Introduces the nvdimm_pmu structure
>>> Patch2:
>>>          Adds common interface to add arch/platform specific data
>>>          includes nvdimm device pointer, pmu data along with
>>>          pmu event functions. It also defines supported event list
>>>          and adds attribute groups for format, events and cpumask.
>>>          It also adds code for cpu hotplug support.
>>> Patch3:
>>>          Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
>>>          nmem* pmu. It fills in the nvdimm_pmu structure with pmu name,
>>>          capabilities, cpumask and event functions and then registers
>>>          the pmu by adding callbacks to register_nvdimm_pmu.
>>> Patch4:
>>>          Sysfs documentation patch
>>>
>>> Changelog
>>
>> Tested these patches with the automated tests at
>> avocado-misc-tests/perf/perf_nmem.py
>> URL:
>> https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
>>
>>
>> 1. On the system where the target id and online id were different, the
>> 'cpumask' file showed no value and those tests failed.
>>
>> Example:
>> Log from dmesg
>> ...
>> papr_scm ibm,persistent-memory:ibm,pmemory@44100003: Region registered
>> with target node 1 and online node 0
>> ...
>
> Hi Nageswara Sastry,
> Thanks for testing the patch set. Yes, you are right: when the target
> node id and the online node id differ, which can happen when the target
> node is not online, this issue can occur. Thanks for pointing it out.
>
> dev_to_node() returns the node id for a given nvdimm device, and that
> node can be offline in some scenarios. In that case we should use the
> numa node id returned by numa_map_to_online_node() instead: if the
> given node is offline, it looks up the next closest online node and
> returns that node id.
>
> Can you try the below change and see if you are still getting this
> issue? Please let me know.
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index bdf2620db461..4dd513d7c029 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct
> papr_scm_priv *p)
> PERF_PMU_CAP_NO_EXCLUDE;
>
> /*updating the cpumask variable */
> - nodeid = dev_to_node(&p->pdev->dev);
> + nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
> nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
>
> Thanks,
> Kajol Jain
>

With the above patch all the tests pass on the system where the
target id and online id were different. Here is the result:

(1/9) perf_nmem.py:perfNMEM.test_pmu_register_dmesg: PASS (3.47 s)
(2/9) perf_nmem.py:perfNMEM.test_sysfs: PASS (1.15 s)
(3/9) perf_nmem.py:perfNMEM.test_pmu_count: PASS (1.08 s)
(4/9) perf_nmem.py:perfNMEM.test_all_events: PASS (18.15 s)
(5/9) perf_nmem.py:perfNMEM.test_all_group_events: PASS (2.22 s)
(6/9) perf_nmem.py:perfNMEM.test_mixed_events: CANCEL: With single PMU
mixed events test is not possible. (1.18 s)
(7/9) perf_nmem.py:perfNMEM.test_pmu_cpumask: PASS (1.12 s)
(8/9) perf_nmem.py:perfNMEM.test_cpumask: PASS (1.17 s)
(9/9) perf_nmem.py:perfNMEM.test_cpumask_cpu_off: PASS (1.81 s)

Tested-by: Nageswara R Sastry <[email protected]>

--
Thanks and Regards
R.Nageswara Sastry

2022-02-26 01:51:05

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH v6 0/4] Add perf interface to expose nvdimm

On Fri, 2022-02-25 at 12:08 +0530, kajoljain wrote:
>
>
> On 2/25/22 11:25, Nageswara Sastry wrote:
> >
> >
> > On 17/02/22 10:03 pm, Kajol Jain wrote:
> > >
....
> > >
> > > Changelog
> >
> > Tested these patches with the automated tests at
> > avocado-misc-tests/perf/perf_nmem.py
> > URL:
> > https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/perf/perf_nmem.py
> >
> >
> > 1. On the system where the target id and online id were different,
> > the 'cpumask' file showed no value and those tests failed.
> >
> > Example:
> > Log from dmesg
> > ...
> > papr_scm ibm,persistent-memory:ibm,pmemory@44100003: Region
> > registered
> > with target node 1 and online node 0
> > ...
>
> Hi Nageswara Sastry,
>        Thanks for testing the patch set. Yes, you are right: when the
> target node id and the online node id differ, which can happen when the
> target node is not online, this issue can occur. Thanks for pointing it
> out.
>
> dev_to_node() returns the node id for a given nvdimm device, and that
> node can be offline in some scenarios. In that case we should use the
> numa node id returned by numa_map_to_online_node() instead: if the
> given node is offline, it looks up the next closest online node and
> returns that node id.
>
> Can you try the below change and see if you are still getting this
> issue? Please let me know.
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index bdf2620db461..4dd513d7c029 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -536,7 +536,7 @@ static void papr_scm_pmu_register(struct
> papr_scm_priv *p)
>                                 PERF_PMU_CAP_NO_EXCLUDE;
>
>         /*updating the cpumask variable */
> -       nodeid = dev_to_node(&p->pdev->dev);
> +       nodeid = numa_map_to_online_node(dev_to_node(&p->pdev->dev));
>         nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
>
> >

Can you use p->region->numa_node?

-aneesh