Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as supported events, cpumask
pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.
Added implementation to expose IBM pseries platform nmem*
device performance stats using this interface.
Result from power9 pseries lpar with 2 nvdimm device:
command:# perf list nmem
nmem0/cchrhcnt/ [Kernel PMU event]
nmem0/cchwhcnt/ [Kernel PMU event]
nmem0/critrscu/ [Kernel PMU event]
nmem0/ctlresct/ [Kernel PMU event]
nmem0/ctlrestm/ [Kernel PMU event]
nmem0/fastwcnt/ [Kernel PMU event]
nmem0/hostlcnt/ [Kernel PMU event]
nmem0/hostldur/ [Kernel PMU event]
nmem0/hostscnt/ [Kernel PMU event]
nmem0/hostsdur/ [Kernel PMU event]
nmem0/medrcnt/ [Kernel PMU event]
nmem0/medrdur/ [Kernel PMU event]
nmem0/medwcnt/ [Kernel PMU event]
nmem0/medwdur/ [Kernel PMU event]
nmem0/memlife/ [Kernel PMU event]
nmem0/noopstat/ [Kernel PMU event]
nmem0/ponsecs/ [Kernel PMU event]
nmem1/cchrhcnt/ [Kernel PMU event]
nmem1/cchwhcnt/ [Kernel PMU event]
nmem1/critrscu/ [Kernel PMU event]
...
nmem1/noopstat/ [Kernel PMU event]
nmem1/ponsecs/ [Kernel PMU event]
Patch1:
Introduces the nvdimm_pmu structure
Patch2:
Adds common interface to add arch/platform specific data
includes supported events, pmu event functions. It also
adds code for cpu hotplug support.
Patch3:
Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
cpumask andevent functions and then registers the pmu by adding
callbacks to register_nvdimm_pmu.
Patch4:
Sysfs documentation patch
Changelog
---
v3 -> v4
- Rebase code on top of current papr_scm code without any logical
changes.
- Added Acked-by tag from Peter Zijlstra and Reviewed by tag
from Madhavan Srinivasan.
- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605
v2 -> v3
- Added Tested-by tag.
- Fix nvdimm mailing list in the ABI Documentation.
- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25
v1 -> v2
- Fix hotplug code by adding pmu migration call
incase current designated cpu got offline. As
pointed by Peter Zijlstra.
- Removed the retun -1 part from cpu hotplug offline
function.
- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500
Kajol Jain (4):
drivers/nvdimm: Add nvdimm pmu structure
drivers/nvdimm: Add perf interface to expose nvdimm performance stats
powerpc/papr_scm: Add perf interface support
powerpc/papr_scm: Document papr_scm sysfs event format entries
Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 ++
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++
drivers/nvdimm/Makefile | 1 +
drivers/nvdimm/nd_perf.c | 230 +++++++++++
include/linux/nd.h | 46 +++
6 files changed, 678 insertions(+)
create mode 100644 drivers/nvdimm/nd_perf.c
--
2.26.2
A structure is added, called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
nvdimm pmu data such as supported events and pmu event functions
like event_init/add/read/del with cpu hotplug support.
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Madhavan Srinivasan <[email protected]>
Tested-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Kajol Jain <[email protected]>
---
include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..712499cf7335 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
#include <linux/ndctl.h>
#include <linux/device.h>
#include <linux/badblocks.h>
+#include <linux/platform_device.h>
+#include <linux/perf_event.h>
enum nvdimm_event {
NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,47 @@ enum nvdimm_claim_class {
NVDIMM_CCLASS_UNKNOWN,
};
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR 0
+#define NVDIMM_PMU_EVENT_ATTR 1
+#define NVDIMM_PMU_CPUMASK_ATTR 2
+#define NVDIMM_PMU_NULL_ATTR 3
+
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific pmu functions.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+ const char *name;
+ struct pmu pmu;
+ struct device *dev;
+ int (*event_init)(struct perf_event *event);
+ int (*add)(struct perf_event *event, int flags);
+ void (*del)(struct perf_event *event, int flags);
+ void (*read)(struct perf_event *event);
+ /*
+ * Attribute groups for the nvdimm pmu. Index 0 used for
+ * format attribute, index 1 used for event attribute,
+ * index 2 used for cpusmask attribute and index 3 kept as NULL.
+ */
+ const struct attribute_group *attr_groups[4];
+ int cpu;
+ struct hlist_node node;
+ enum cpuhp_state cpuhp_state;
+
+ /* cpumask provided by arch/platform specific code */
+ struct cpumask arch_cpumask;
+};
+
struct nd_device_driver {
struct device_driver drv;
unsigned long type;
--
2.26.2
Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.
A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.
papr_scm_pmu_register function populates the nvdimm_pmu structure
with events, cpumask, attribute groups along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device.
Event handling functions internally uses hcall to get events and
counter data.
Result in power9 machine with 2 nvdimm device:
Ex: List all event by perf list
command:# perf list nmem
nmem0/cchrhcnt/ [Kernel PMU event]
nmem0/cchwhcnt/ [Kernel PMU event]
nmem0/critrscu/ [Kernel PMU event]
nmem0/ctlresct/ [Kernel PMU event]
nmem0/ctlrestm/ [Kernel PMU event]
nmem0/fastwcnt/ [Kernel PMU event]
nmem0/hostlcnt/ [Kernel PMU event]
nmem0/hostldur/ [Kernel PMU event]
nmem0/hostscnt/ [Kernel PMU event]
nmem0/hostsdur/ [Kernel PMU event]
nmem0/medrcnt/ [Kernel PMU event]
nmem0/medrdur/ [Kernel PMU event]
nmem0/medwcnt/ [Kernel PMU event]
nmem0/medwdur/ [Kernel PMU event]
nmem0/memlife/ [Kernel PMU event]
nmem0/noopstat/ [Kernel PMU event]
nmem0/ponsecs/ [Kernel PMU event]
nmem1/cchrhcnt/ [Kernel PMU event]
nmem1/cchwhcnt/ [Kernel PMU event]
nmem1/critrscu/ [Kernel PMU event]
...
nmem1/noopstat/ [Kernel PMU event]
nmem1/ponsecs/ [Kernel PMU event]
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Madhavan Srinivasan <[email protected]>
Tested-by: Nageswara R Sastry <[email protected]>
Signed-off-by: Kajol Jain <[email protected]>
---
arch/powerpc/include/asm/device.h | 5 +
arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++++++
2 files changed, 370 insertions(+)
diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
struct pdev_archdata {
u64 dma_mask;
+ /*
+ * Pointer to nvdimm_pmu structure, to handle the unregistering
+ * of pmu device
+ */
+ void *priv;
};
#endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..26900101e638 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,8 @@
#include <asm/papr_pdsm.h>
#include <asm/mce.h>
#include <asm/unaligned.h>
+#include <linux/perf_event.h>
+#include <linux/ctype.h>
#define BIND_ANY_ADDR (~0ul)
@@ -68,6 +70,8 @@
#define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
#define PAPR_SCM_PERF_STATS_VERSION 0x1
+#define to_nvdimm_pmu(_pmu) container_of(_pmu, struct nvdimm_pmu, pmu)
+
/* Struct holding a single performance metric */
struct papr_scm_perf_stat {
u8 stat_id[8];
@@ -120,6 +124,12 @@ struct papr_scm_priv {
/* length of the stat buffer as expected by phyp */
size_t stat_buffer_len;
+
+ /* array to have event_code and stat_id mappings */
+ char **nvdimm_events_map;
+
+ /* count of supported events */
+ u32 total_events;
};
static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +350,354 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
return 0;
}
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+ &format_attr_event.attr,
+ NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+ .name = "format",
+ .attrs = nvdimm_pmu_format_attr,
+};
+
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats;
+ struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+ int rc, size;
+
+ /* Allocate request buffer enough to hold single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) +
+ sizeof(struct papr_scm_perf_stat);
+
+ if (!p || !p->nvdimm_events_map)
+ return -EINVAL;
+
+ stats = kzalloc(size, GFP_KERNEL);
+ if (!stats)
+ return -ENOMEM;
+
+ stat = &stats->scm_statistic[0];
+ memcpy(&stat->stat_id,
+ p->nvdimm_events_map[event->attr.config - 1],
+ sizeof(stat->stat_id));
+ stat->stat_val = 0;
+
+ rc = drc_pmem_query_stats(p, stats, 1);
+ if (rc < 0) {
+ kfree(stats);
+ return rc;
+ }
+
+ *count = be64_to_cpu(stat->stat_val);
+ kfree(stats);
+ return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+ struct papr_scm_priv *p;
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ /* test the event attr type for PMU enumeration */
+ if (event->attr.type != event->pmu->type)
+ return -ENOENT;
+
+ /* it does not support event sampling mode */
+ if (is_sampling_event(event))
+ return -EOPNOTSUPP;
+
+ /* no branch sampling */
+ if (has_branch_stack(event))
+ return -EOPNOTSUPP;
+
+ p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+ if (!p)
+ return -EINVAL;
+
+ /* Invalid eventcode */
+ if (event->attr.config == 0 || event->attr.config > p->total_events)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+ u64 count;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return -EINVAL;
+
+ if (flags & PERF_EF_START) {
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+ if (rc)
+ return rc;
+
+ local64_set(&event->hw.prev_count, count);
+ }
+
+ return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+ u64 prev, now;
+ int rc;
+ struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+ if (!nd_pmu)
+ return;
+
+ rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+ if (rc)
+ return;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+ papr_scm_pmu_read(event);
+}
+
+static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct perf_pmu_events_attr *d;
+
+ d = container_of(attr, struct perf_pmu_events_attr, attr);
+
+ return sysfs_emit(buf, "%s\n", (char *)d->event_str);
+}
+
+static char *strtolower(char *updated_name)
+{
+ int i = 0;
+
+ while (updated_name[i]) {
+ if (isupper(updated_name[i]))
+ updated_name[i] = tolower(updated_name[i]);
+ i++;
+ }
+ updated_name[i] = '\0';
+ return strim(updated_name);
+}
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create_(char *name, char *str)
+{
+ struct perf_pmu_events_attr *attr;
+
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+ if (!attr)
+ return NULL;
+
+ sysfs_attr_init(&attr->attr.attr);
+ attr->event_str = str;
+ attr->attr.attr.name = strtolower(name);
+ attr->attr.attr.mode = 0444;
+ attr->attr.show = device_show_string;
+
+ return &attr->attr.attr;
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+ struct papr_scm_perf_stat *stat;
+ struct papr_scm_perf_stats *stats, *single_stats;
+ int index, size, rc, count;
+ u32 available_events;
+ struct attribute **events;
+ char *eventcode, *eventname, *statid;
+ struct attribute_group *nvdimm_pmu_events_group;
+
+ if (!p->stat_buffer_len)
+ return -ENOENT;
+
+ available_events = (p->stat_buffer_len - sizeof(struct papr_scm_perf_stats))
+ / sizeof(struct papr_scm_perf_stat);
+
+ /* Allocate memory for events attribute group */
+ nvdimm_pmu_events_group = kzalloc(sizeof(*nvdimm_pmu_events_group), GFP_KERNEL);
+ if (!nvdimm_pmu_events_group)
+ return -ENOMEM;
+
+ /* Allocate the buffer for phyp where stats are written */
+ stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+ if (!stats) {
+ rc = -ENOMEM;
+ goto out_nvdimm_pmu_events_group;
+ }
+
+ /* Allocate memory to nvdimm_event_map */
+ p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+ if (!p->nvdimm_events_map) {
+ rc = -ENOMEM;
+ goto out_stats;
+ }
+
+ /* Called to get list of events supported */
+ rc = drc_pmem_query_stats(p, stats, 0);
+ if (rc)
+ goto out_nvdimm_events_map;
+
+ /* Allocate buffer to hold single performance stat */
+ size = sizeof(struct papr_scm_perf_stats) + sizeof(struct papr_scm_perf_stat);
+
+ single_stats = kzalloc(size, GFP_KERNEL);
+ if (!single_stats) {
+ rc = -ENOMEM;
+ goto out_nvdimm_events_map;
+ }
+
+ events = kzalloc(available_events * sizeof(struct attribute *), GFP_KERNEL);
+ if (!events) {
+ rc = -ENOMEM;
+ goto out_single_stats;
+ }
+
+ for (index = 0, stat = stats->scm_statistic, count = 0;
+ index < available_events; index++, ++stat) {
+
+ single_stats->scm_statistic[0] = *stat;
+ rc = drc_pmem_query_stats(p, single_stats, 1);
+
+ if (rc < 0) {
+ pr_info("Event not supported %s for device %s\n",
+ stat->stat_id, nvdimm_name(p->nvdimm));
+ } else {
+ eventcode = kasprintf(GFP_KERNEL, "event=0x%x", count + 1);
+ eventname = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+ statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+
+ if (!eventname || !statid || !eventcode)
+ goto out;
+
+ strcpy(eventname, stat->stat_id);
+ events[count] = device_str_attr_create_(eventname,
+ eventcode);
+ if (!events[count])
+ goto out;
+
+ strcpy(statid, stat->stat_id);
+ p->nvdimm_events_map[count] = statid;
+ count++;
+ continue;
+out:
+ kfree(eventcode);
+ kfree(eventname);
+ kfree(statid);
+ }
+ }
+
+ if (!count)
+ goto out_events;
+
+ events[count] = NULL;
+ p->nvdimm_events_map[count] = NULL;
+ p->total_events = count;
+
+ nvdimm_pmu_events_group->name = "events";
+ nvdimm_pmu_events_group->attrs = events;
+
+ /* Fill attribute groups for the nvdimm pmu device */
+ nd_pmu->attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+ nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR] = nvdimm_pmu_events_group;
+ nd_pmu->attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+ kfree(single_stats);
+ kfree(stats);
+ return 0;
+
+out_events:
+ kfree(events);
+out_single_stats:
+ kfree(single_stats);
+out_nvdimm_events_map:
+ kfree(p->nvdimm_events_map);
+out_stats:
+ kfree(stats);
+out_nvdimm_pmu_events_group:
+ kfree(nvdimm_pmu_events_group);
+ return rc;
+}
+
+/* Function to free the attr_groups which are dynamically allocated */
+static void papr_scm_pmu_mem_free(struct nvdimm_pmu *nd_pmu)
+{
+ if (nd_pmu) {
+ if (nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR])
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]->attrs);
+ kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]);
+ }
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+ struct nvdimm_pmu *nd_pmu;
+ int rc, nodeid;
+
+ nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+ if (!nd_pmu) {
+ rc = -ENOMEM;
+ goto pmu_err_print;
+ }
+
+ rc = papr_scm_pmu_check_events(p, nd_pmu);
+ if (rc)
+ goto pmu_check_events_err;
+
+ nd_pmu->name = nvdimm_name(p->nvdimm);
+ nd_pmu->event_init = papr_scm_pmu_event_init;
+ nd_pmu->read = papr_scm_pmu_read;
+ nd_pmu->add = papr_scm_pmu_add;
+ nd_pmu->del = papr_scm_pmu_del;
+
+ /*updating the cpumask variable */
+ nodeid = dev_to_node(&p->pdev->dev);
+ nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+ /* cpumask should not be NULL */
+ WARN_ON_ONCE(cpumask_empty(&nd_pmu->arch_cpumask));
+
+ rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+ if (rc)
+ goto pmu_register_err;
+
+ /*
+ * Set archdata.priv value to nvdimm_pmu structure, to handle the
+ * unregistering of pmu device.
+ */
+ p->pdev->archdata.priv = nd_pmu;
+ return;
+
+pmu_register_err:
+ papr_scm_pmu_mem_free(nd_pmu);
+ kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+ kfree(nd_pmu);
+pmu_err_print:
+ dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
+static void papr_scm_pmu_uninit(struct nvdimm_pmu *nd_pmu)
+{
+ unregister_nvdimm_pmu(nd_pmu);
+ papr_scm_pmu_mem_free(nd_pmu);
+ kfree(nd_pmu);
+}
+
/*
* Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
* health information.
@@ -1236,6 +1594,7 @@ static int papr_scm_probe(struct platform_device *pdev)
goto err2;
platform_set_drvdata(pdev, p);
+ papr_scm_pmu_register(p);
return 0;
@@ -1254,6 +1613,12 @@ static int papr_scm_remove(struct platform_device *pdev)
nvdimm_bus_unregister(p->bus);
drc_pmem_unbind(p);
+
+ if (pdev->archdata.priv)
+ papr_scm_pmu_uninit(pdev->archdata.priv);
+
+ pdev->archdata.priv = NULL;
+ kfree(p->nvdimm_events_map);
kfree(p->bus_desc.provider_name);
kfree(p);
--
2.26.2
Hi Kajol,
Apologies for the delay in responding to this series, some comments below:
On Thu, Sep 2, 2021 at 10:10 PM Kajol Jain <[email protected]> wrote:
>
> A structure is added, called nvdimm_pmu, for performance
> stats reporting support of nvdimm devices. It can be used to add
> nvdimm pmu data such as supported events and pmu event functions
> like event_init/add/read/del with cpu hotplug support.
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> Reviewed-by: Madhavan Srinivasan <[email protected]>
> Tested-by: Nageswara R Sastry <[email protected]>
> Signed-off-by: Kajol Jain <[email protected]>
> ---
> include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/include/linux/nd.h b/include/linux/nd.h
> index ee9ad76afbba..712499cf7335 100644
> --- a/include/linux/nd.h
> +++ b/include/linux/nd.h
> @@ -8,6 +8,8 @@
> #include <linux/ndctl.h>
> #include <linux/device.h>
> #include <linux/badblocks.h>
> +#include <linux/platform_device.h>
> +#include <linux/perf_event.h>
>
> enum nvdimm_event {
> NVDIMM_REVALIDATE_POISON,
> @@ -23,6 +25,47 @@ enum nvdimm_claim_class {
> NVDIMM_CCLASS_UNKNOWN,
> };
>
> +/* Event attribute array index */
> +#define NVDIMM_PMU_FORMAT_ATTR 0
> +#define NVDIMM_PMU_EVENT_ATTR 1
> +#define NVDIMM_PMU_CPUMASK_ATTR 2
> +#define NVDIMM_PMU_NULL_ATTR 3
> +
> +/**
> + * struct nvdimm_pmu - data structure for nvdimm perf driver
> + *
> + * @name: name of the nvdimm pmu device.
> + * @pmu: pmu data structure for nvdimm performance stats.
> + * @dev: nvdimm device pointer.
> + * @functions(event_init/add/del/read): platform specific pmu functions.
This is not valid kernel-doc:
include/linux/nd.h:67: warning: Function parameter or member
'event_init' not described in 'nvdimm_pmu'
include/linux/nd.h:67: warning: Function parameter or member 'add' not
described in 'nvdimm_pmu'
include/linux/nd.h:67: warning: Function parameter or member 'del' not
described in 'nvdimm_pmu'
include/linux/nd.h:67: warning: Function parameter or member 'read'
not described in 'nvdimm_pmu'
...but I think rather than fixing those up 'struct nvdimm_pmu' should be pruned.
It's not clear to me that it is worth the effort to describe these
details to the nvdimm core which is just going to turn around and call
the pmu core. I'd just as soon have the driver call the pmu core
directly, optionally passing in attributes and callbacks that come
from the nvdimm core and/or the nvdimm provider.
Otherwise it's also not clear which of these structure members are
used at runtime vs purely used as temporary storage to pass parameters
to the pmu core.
> + * @attr_groups: data structure for events, formats and cpumask
> + * @cpu: designated cpu for counter access.
> + * @node: node for cpu hotplug notifier link.
> + * @cpuhp_state: state for cpu hotplug notification.
> + * @arch_cpumask: cpumask to get designated cpu for counter access.
> + */
> +struct nvdimm_pmu {
> + const char *name;
> + struct pmu pmu;
> + struct device *dev;
> + int (*event_init)(struct perf_event *event);
> + int (*add)(struct perf_event *event, int flags);
> + void (*del)(struct perf_event *event, int flags);
> + void (*read)(struct perf_event *event);
> + /*
> + * Attribute groups for the nvdimm pmu. Index 0 used for
> + * format attribute, index 1 used for event attribute,
> + * index 2 used for cpusmask attribute and index 3 kept as NULL.
> + */
> + const struct attribute_group *attr_groups[4];
Following from above, I'd rather this was organized as static
attributes with an is_visible() helper for the groups for any dynamic
aspects. That mirrors the behavior of nvdimm_create() and allows for
device drivers to compose the attribute groups from a core set and /
or a provider specific set.
> + int cpu;
> + struct hlist_node node;
> + enum cpuhp_state cpuhp_state;
> +
> + /* cpumask provided by arch/platform specific code */
> + struct cpumask arch_cpumask;
> +};
> +
> struct nd_device_driver {
> struct device_driver drv;
> unsigned long type;
> --
> 2.26.2
>
On 9/8/21 3:29 AM, Dan Williams wrote:
> Hi Kajol,
>
> Apologies for the delay in responding to this series, some comments below:
Hi Dan,
No issues, thanks for reviewing the patches.
>
> On Thu, Sep 2, 2021 at 10:10 PM Kajol Jain <[email protected]> wrote:
>>
>> A structure is added, called nvdimm_pmu, for performance
>> stats reporting support of nvdimm devices. It can be used to add
>> nvdimm pmu data such as supported events and pmu event functions
>> like event_init/add/read/del with cpu hotplug support.
>>
>> Acked-by: Peter Zijlstra (Intel) <[email protected]>
>> Reviewed-by: Madhavan Srinivasan <[email protected]>
>> Tested-by: Nageswara R Sastry <[email protected]>
>> Signed-off-by: Kajol Jain <[email protected]>
>> ---
>> include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 43 insertions(+)
>>
>> diff --git a/include/linux/nd.h b/include/linux/nd.h
>> index ee9ad76afbba..712499cf7335 100644
>> --- a/include/linux/nd.h
>> +++ b/include/linux/nd.h
>> @@ -8,6 +8,8 @@
>> #include <linux/ndctl.h>
>> #include <linux/device.h>
>> #include <linux/badblocks.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/perf_event.h>
>>
>> enum nvdimm_event {
>> NVDIMM_REVALIDATE_POISON,
>> @@ -23,6 +25,47 @@ enum nvdimm_claim_class {
>> NVDIMM_CCLASS_UNKNOWN,
>> };
>>
>> +/* Event attribute array index */
>> +#define NVDIMM_PMU_FORMAT_ATTR 0
>> +#define NVDIMM_PMU_EVENT_ATTR 1
>> +#define NVDIMM_PMU_CPUMASK_ATTR 2
>> +#define NVDIMM_PMU_NULL_ATTR 3
>> +
>> +/**
>> + * struct nvdimm_pmu - data structure for nvdimm perf driver
>> + *
>> + * @name: name of the nvdimm pmu device.
>> + * @pmu: pmu data structure for nvdimm performance stats.
>> + * @dev: nvdimm device pointer.
>> + * @functions(event_init/add/del/read): platform specific pmu functions.
>
> This is not valid kernel-doc:
>
> include/linux/nd.h:67: warning: Function parameter or member
> 'event_init' not described in 'nvdimm_pmu'
> include/linux/nd.h:67: warning: Function parameter or member 'add' not
> described in 'nvdimm_pmu'
> include/linux/nd.h:67: warning: Function parameter or member 'del' not
> described in 'nvdimm_pmu'
> include/linux/nd.h:67: warning: Function parameter or member 'read'
> not described in 'nvdimm_pmu'
>
> ...but I think rather than fixing those up 'struct nvdimm_pmu' should be pruned.
>
> It's not clear to me that it is worth the effort to describe these
> details to the nvdimm core which is just going to turn around and call
> the pmu core. I'd just as soon have the driver call the pmu core
> directly, optionally passing in attributes and callbacks that come
> from the nvdimm core and/or the nvdimm provider.
The intend for adding these callbacks(event_init/add/del/read) is to give
flexibility to the nvdimm core to add some common checks/routines if required
in the future. Those checks can be common for all architecture with still having the
ability to call arch/platform specific driver code to use its own routines.
But as you said, currently we don't have any common checks and it directly
calling platform specific code, so we can get rid of it.
Should we remove this part for now?
>
> Otherwise it's also not clear which of these structure members are
> used at runtime vs purely used as temporary storage to pass parameters
> to the pmu core.
>
>> + * @attr_groups: data structure for events, formats and cpumask
>> + * @cpu: designated cpu for counter access.
>> + * @node: node for cpu hotplug notifier link.
>> + * @cpuhp_state: state for cpu hotplug notification.
>> + * @arch_cpumask: cpumask to get designated cpu for counter access.
>> + */
>> +struct nvdimm_pmu {
>> + const char *name;
>> + struct pmu pmu;
>> + struct device *dev;
>> + int (*event_init)(struct perf_event *event);
>> + int (*add)(struct perf_event *event, int flags);
>> + void (*del)(struct perf_event *event, int flags);
>> + void (*read)(struct perf_event *event);
>> + /*
>> + * Attribute groups for the nvdimm pmu. Index 0 used for
>> + * format attribute, index 1 used for event attribute,
>> + * index 2 used for cpusmask attribute and index 3 kept as NULL.
>> + */
>> + const struct attribute_group *attr_groups[4];
>
> Following from above, I'd rather this was organized as static
> attributes with an is_visible() helper for the groups for any dynamic
> aspects. That mirrors the behavior of nvdimm_create() and allows for
> device drivers to compose the attribute groups from a core set and /
> or a provider specific set.
Since we don't have any common events right now, Can I use papr
attributes directly or should we create dummy events for common thing and
then merged it with papr event list.
Thanks,
Kajol Jain
>
>> + int cpu;
>> + struct hlist_node node;
>> + enum cpuhp_state cpuhp_state;
>> +
>> + /* cpumask provided by arch/platform specific code */
>> + struct cpumask arch_cpumask;
>> +};
>> +
>> struct nd_device_driver {
>> struct device_driver drv;
>> unsigned long type;
>> --
>> 2.26.2
>>
On Thu, Sep 9, 2021 at 12:56 AM kajoljain <[email protected]> wrote:
>
>
>
> On 9/8/21 3:29 AM, Dan Williams wrote:
> > Hi Kajol,
> >
> > Apologies for the delay in responding to this series, some comments below:
>
> Hi Dan,
> No issues, thanks for reviewing the patches.
>
> >
> > On Thu, Sep 2, 2021 at 10:10 PM Kajol Jain <[email protected]> wrote:
> >>
> >> A structure is added, called nvdimm_pmu, for performance
> >> stats reporting support of nvdimm devices. It can be used to add
> >> nvdimm pmu data such as supported events and pmu event functions
> >> like event_init/add/read/del with cpu hotplug support.
> >>
> >> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> >> Reviewed-by: Madhavan Srinivasan <[email protected]>
> >> Tested-by: Nageswara R Sastry <[email protected]>
> >> Signed-off-by: Kajol Jain <[email protected]>
> >> ---
> >> include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 43 insertions(+)
> >>
> >> diff --git a/include/linux/nd.h b/include/linux/nd.h
> >> index ee9ad76afbba..712499cf7335 100644
> >> --- a/include/linux/nd.h
> >> +++ b/include/linux/nd.h
> >> @@ -8,6 +8,8 @@
> >> #include <linux/ndctl.h>
> >> #include <linux/device.h>
> >> #include <linux/badblocks.h>
> >> +#include <linux/platform_device.h>
> >> +#include <linux/perf_event.h>
> >>
> >> enum nvdimm_event {
> >> NVDIMM_REVALIDATE_POISON,
> >> @@ -23,6 +25,47 @@ enum nvdimm_claim_class {
> >> NVDIMM_CCLASS_UNKNOWN,
> >> };
> >>
> >> +/* Event attribute array index */
> >> +#define NVDIMM_PMU_FORMAT_ATTR 0
> >> +#define NVDIMM_PMU_EVENT_ATTR 1
> >> +#define NVDIMM_PMU_CPUMASK_ATTR 2
> >> +#define NVDIMM_PMU_NULL_ATTR 3
> >> +
> >> +/**
> >> + * struct nvdimm_pmu - data structure for nvdimm perf driver
> >> + *
> >> + * @name: name of the nvdimm pmu device.
> >> + * @pmu: pmu data structure for nvdimm performance stats.
> >> + * @dev: nvdimm device pointer.
> >> + * @functions(event_init/add/del/read): platform specific pmu functions.
> >
> > This is not valid kernel-doc:
> >
> > include/linux/nd.h:67: warning: Function parameter or member
> > 'event_init' not described in 'nvdimm_pmu'
> > include/linux/nd.h:67: warning: Function parameter or member 'add' not
> > described in 'nvdimm_pmu'
> > include/linux/nd.h:67: warning: Function parameter or member 'del' not
> > described in 'nvdimm_pmu'
> > include/linux/nd.h:67: warning: Function parameter or member 'read'
> > not described in 'nvdimm_pmu'
> >
> > ...but I think rather than fixing those up 'struct nvdimm_pmu' should be pruned.
> >
> > It's not clear to me that it is worth the effort to describe these
> > details to the nvdimm core which is just going to turn around and call
> > the pmu core. I'd just as soon have the driver call the pmu core
> > directly, optionally passing in attributes and callbacks that come
> > from the nvdimm core and/or the nvdimm provider.
>
> The intend for adding these callbacks(event_init/add/del/read) is to give
> flexibility to the nvdimm core to add some common checks/routines if required
> in the future. Those checks can be common for all architecture with still having the
> ability to call arch/platform specific driver code to use its own routines.
>
> But as you said, currently we don't have any common checks and it directly
> calling platform specific code, so we can get rid of it.
> Should we remove this part for now?
Yes, lets go direct to the perf api for now and await the need for a
common core wrapper to present itself.
>
>
> >
> > Otherwise it's also not clear which of these structure members are
> > used at runtime vs purely used as temporary storage to pass parameters
> > to the pmu core.
> >
> >> + * @attr_groups: data structure for events, formats and cpumask
> >> + * @cpu: designated cpu for counter access.
> >> + * @node: node for cpu hotplug notifier link.
> >> + * @cpuhp_state: state for cpu hotplug notification.
> >> + * @arch_cpumask: cpumask to get designated cpu for counter access.
> >> + */
> >> +struct nvdimm_pmu {
> >> + const char *name;
> >> + struct pmu pmu;
> >> + struct device *dev;
> >> + int (*event_init)(struct perf_event *event);
> >> + int (*add)(struct perf_event *event, int flags);
> >> + void (*del)(struct perf_event *event, int flags);
> >> + void (*read)(struct perf_event *event);
> >> + /*
> >> + * Attribute groups for the nvdimm pmu. Index 0 used for
> >> + * format attribute, index 1 used for event attribute,
> >> + * index 2 used for cpusmask attribute and index 3 kept as NULL.
> >> + */
> >> + const struct attribute_group *attr_groups[4];
> >
> > Following from above, I'd rather this was organized as static
> > attributes with an is_visible() helper for the groups for any dynamic
> > aspects. That mirrors the behavior of nvdimm_create() and allows for
> > device drivers to compose the attribute groups from a core set and /
> > or a provider specific set.
>
> Since we don't have any common events right now, Can I use papr
> attributes directly or should we create dummy events for common thing and
> then merged it with papr event list.
Just use papr events directly.
On Tue, Sep 14, 2021 at 9:08 PM Dan Williams <[email protected]> wrote:
>
> On Thu, Sep 9, 2021 at 12:56 AM kajoljain <[email protected]> wrote:
> >
> >
> >
> > On 9/8/21 3:29 AM, Dan Williams wrote:
> > > Hi Kajol,
> > >
> > > Apologies for the delay in responding to this series, some comments below:
> >
> > Hi Dan,
> > No issues, thanks for reviewing the patches.
> >
> > >
> > > On Thu, Sep 2, 2021 at 10:10 PM Kajol Jain <[email protected]> wrote:
> > >>
> > >> A structure is added, called nvdimm_pmu, for performance
> > >> stats reporting support of nvdimm devices. It can be used to add
> > >> nvdimm pmu data such as supported events and pmu event functions
> > >> like event_init/add/read/del with cpu hotplug support.
> > >>
> > >> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> > >> Reviewed-by: Madhavan Srinivasan <[email protected]>
> > >> Tested-by: Nageswara R Sastry <[email protected]>
> > >> Signed-off-by: Kajol Jain <[email protected]>
> > >> ---
> > >> include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
> > >> 1 file changed, 43 insertions(+)
> > >>
> > >> diff --git a/include/linux/nd.h b/include/linux/nd.h
> > >> index ee9ad76afbba..712499cf7335 100644
> > >> --- a/include/linux/nd.h
> > >> +++ b/include/linux/nd.h
> > >> @@ -8,6 +8,8 @@
> > >> #include <linux/ndctl.h>
> > >> #include <linux/device.h>
> > >> #include <linux/badblocks.h>
> > >> +#include <linux/platform_device.h>
> > >> +#include <linux/perf_event.h>
> > >>
> > >> enum nvdimm_event {
> > >> NVDIMM_REVALIDATE_POISON,
> > >> @@ -23,6 +25,47 @@ enum nvdimm_claim_class {
> > >> NVDIMM_CCLASS_UNKNOWN,
> > >> };
> > >>
> > >> +/* Event attribute array index */
> > >> +#define NVDIMM_PMU_FORMAT_ATTR 0
> > >> +#define NVDIMM_PMU_EVENT_ATTR 1
> > >> +#define NVDIMM_PMU_CPUMASK_ATTR 2
> > >> +#define NVDIMM_PMU_NULL_ATTR 3
> > >> +
> > >> +/**
> > >> + * struct nvdimm_pmu - data structure for nvdimm perf driver
> > >> + *
> > >> + * @name: name of the nvdimm pmu device.
> > >> + * @pmu: pmu data structure for nvdimm performance stats.
> > >> + * @dev: nvdimm device pointer.
> > >> + * @functions(event_init/add/del/read): platform specific pmu functions.
> > >
> > > This is not valid kernel-doc:
> > >
> > > include/linux/nd.h:67: warning: Function parameter or member
> > > 'event_init' not described in 'nvdimm_pmu'
> > > include/linux/nd.h:67: warning: Function parameter or member 'add' not
> > > described in 'nvdimm_pmu'
> > > include/linux/nd.h:67: warning: Function parameter or member 'del' not
> > > described in 'nvdimm_pmu'
> > > include/linux/nd.h:67: warning: Function parameter or member 'read'
> > > not described in 'nvdimm_pmu'
> > >
> > > ...but I think rather than fixing those up 'struct nvdimm_pmu' should be pruned.
> > >
> > > It's not clear to me that it is worth the effort to describe these
> > > details to the nvdimm core which is just going to turn around and call
> > > the pmu core. I'd just as soon have the driver call the pmu core
> > > directly, optionally passing in attributes and callbacks that come
> > > from the nvdimm core and/or the nvdimm provider.
> >
> > The intend for adding these callbacks(event_init/add/del/read) is to give
> > flexibility to the nvdimm core to add some common checks/routines if required
> > in the future. Those checks can be common for all architecture with still having the
> > ability to call arch/platform specific driver code to use its own routines.
> >
> > But as you said, currently we don't have any common checks and it directly
> > calling platform specific code, so we can get rid of it.
> > Should we remove this part for now?
>
> Yes, lets go direct to the perf api for now and await the need for a
> common core wrapper to present itself.
>
> >
> >
> > >
> > > Otherwise it's also not clear which of these structure members are
> > > used at runtime vs purely used as temporary storage to pass parameters
> > > to the pmu core.
> > >
> > >> + * @attr_groups: data structure for events, formats and cpumask
> > >> + * @cpu: designated cpu for counter access.
> > >> + * @node: node for cpu hotplug notifier link.
> > >> + * @cpuhp_state: state for cpu hotplug notification.
> > >> + * @arch_cpumask: cpumask to get designated cpu for counter access.
> > >> + */
> > >> +struct nvdimm_pmu {
> > >> + const char *name;
> > >> + struct pmu pmu;
> > >> + struct device *dev;
> > >> + int (*event_init)(struct perf_event *event);
> > >> + int (*add)(struct perf_event *event, int flags);
> > >> + void (*del)(struct perf_event *event, int flags);
> > >> + void (*read)(struct perf_event *event);
> > >> + /*
> > >> + * Attribute groups for the nvdimm pmu. Index 0 used for
> > >> + * format attribute, index 1 used for event attribute,
> > >> + * index 2 used for cpusmask attribute and index 3 kept as NULL.
> > >> + */
> > >> + const struct attribute_group *attr_groups[4];
> > >
> > > Following from above, I'd rather this was organized as static
> > > attributes with an is_visible() helper for the groups for any dynamic
> > > aspects. That mirrors the behavior of nvdimm_create() and allows for
> > > device drivers to compose the attribute groups from a core set and /
> > > or a provider specific set.
> >
> > Since we don't have any common events right now, Can I use papr
> > attributes directly or should we create dummy events for common thing and
> > then merged it with papr event list.
>
> Just use papr events directly.
That is to say...I think if another implementation followed it should
try to match as many common event names as papr_scm picked, and
possibly extend with its own rather than start with a papr_scm
specific namespace for everything.