The PAPR standard[1][3] provides mechanisms to query the health and
performance stats of an NVDIMM via various hcalls as described in
Ref[2]. Until now these stats were not exposed to user-space tools
like 'ndctl'. This is partly because the PAPR platform does not
support ACPI and NFIT. Hence 'ndctl' is unable to query and report
the dimm health status, and a user has no way to determine the
current health status of an NVDIMM.
To overcome this limitation, this patch-set updates the papr_scm
kernel module to query and fetch NVDIMM health stats using hcalls
described in Ref[2]. These health and performance stats are then
exposed to userspace via sysfs and PAPR-NVDIMM-Specific-Methods (PDSM)
issued by libndctl.
These changes, coupled with the proposed ndctl changes located at
Ref[4], should provide a way for the user to retrieve NVDIMM health
status using ndctl.
Below is a sample output using proposed kernel + ndctl for PAPR NVDIMM
in an emulation environment:
# ndctl list -DH
[
{
"dev":"nmem0",
"health":{
"health_state":"fatal",
"shutdown_state":"dirty"
}
}
]
Dimm health report output on a pseries guest lpar with vPMEM or HMS
based NVDIMMs that are in perfectly healthy condition:
# ndctl list -d nmem0 -H
[
{
"dev":"nmem0",
"health":{
"health_state":"ok",
"shutdown_state":"clean"
}
}
]
PAPR NVDIMM-Specific-Methods(PDSM)
==================================
PDSM requests are issued by vendor specific code in libndctl to
execute certain operations or fetch information from NVDIMMs. PDSM
requests can be sent to the papr_scm module via libndctl (userspace)
and libnvdimm (kernel) using the ND_CMD_CALL ioctl command, which is
handled in the dimm control function papr_scm_ndctl(). The current
patchset proposes a single PDSM to retrieve NVDIMM health, defined in
the newly introduced uapi header named 'papr_scm_pdsm.h'. Support for
more PDSMs will be added in the future.
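The envelope that travels through ND_CMD_CALL can be sketched in user
space as below. This is a minimal sketch, assuming the 64-byte layout
of 'struct nd_cmd_pkg' from include/uapi/linux/ndctl.h and the header
fields proposed in 'papr_scm_pdsm.h'. All 'sketch_' names and helpers
are hypothetical mirrors for illustration, not the uapi definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of 'struct nd_cmd_pkg' (flexible payload member omitted) */
struct sketch_nd_cmd_pkg {
	uint64_t nd_family;       /* command family */
	uint64_t nd_command;      /* sub-command, here the PDSM */
	uint32_t nd_size_in;      /* size of input args */
	uint32_t nd_size_out;     /* size of output payload */
	uint32_t nd_reserved2[9]; /* reserved, must be zero */
	uint32_t nd_fw_size;      /* size firmware wants to return */
};

/* Hypothetical mirror of the PDSM header that precedes the payload */
struct sketch_pdsm_hdr {
	struct sketch_nd_cmd_pkg hdr;
	int32_t  cmd_status;      /* Out: status of the PDSM */
	uint16_t payload_offset;  /* In: payload offset from start of envelope */
	uint16_t payload_version; /* In/Out: version of the payload struct */
};

#define SKETCH_ENVELOPE_MAX    256u /* envelope limit quoted in the header */
#define SKETCH_FAMILY_PAPR_SCM 5u   /* value proposed for ndctl.h */

/* Bytes left for the payload once the header is accounted for */
size_t sketch_max_payload(void)
{
	return SKETCH_ENVELOPE_MAX - sizeof(struct sketch_pdsm_hdr);
}

/* Zero the header and point payload_offset just past it */
void sketch_init_header(struct sketch_pdsm_hdr *h, uint64_t command,
			uint16_t version)
{
	memset(h, 0, sizeof(*h));
	h->hdr.nd_family = SKETCH_FAMILY_PAPR_SCM;
	h->hdr.nd_command = command;
	h->payload_offset = (uint16_t)sizeof(*h);
	h->payload_version = version;
}

/* Locate the payload relative to the start of the envelope */
void *sketch_payload(struct sketch_pdsm_hdr *h)
{
	return (unsigned char *)h + h->payload_offset;
}
```

With these mirrors the header occupies 72 bytes, which is where the
"Max 184-Bytes" payload figure in the header comments comes from.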
Structure of the patch-set
==========================
The patch-set starts with a doc patch documenting details of hcall
H_SCM_HEALTH. The second patch exports the kernel symbol
seq_buf_printf() that is used in subsequent patches to generate sysfs
attribute content.
The third patch implements support for fetching NVDIMM health
information from PHYP and partially exposing it to user-space via an
NVDIMM sysfs flag.
The fourth patch implements support for servicing PDSM commands in
the papr_scm module.
Finally the last patch implements support for servicing PDSM
'PAPR_SCM_PDSM_HEALTH' that returns the NVDIMM health information to
libndctl.
Changelog:
==========
Resend:
* Added ack from Steven Rostedt on Patch-2 that exports kernel symbol
seq_buf_printf()
v6..v7:
* Incorporate various review comments from Mpe.
* Removed papr_scm.h header file and moved its contents to papr_scm.c.
* Added a patch to export seq_buf_printf() [Mpe, Steven Rostedt]
* Split function drc_pmem_query_health() into two functions, one that takes
care of caching and concurrency and other one that doesn't.
* Fixed a possible incorrect way to make local copy of nvdimm health data.
* Some variable renames changed as suggested in previous review.
* Removed unused macros/defines from papr_scm_pdsm.h
* Updated papr_scm_pdsm.h to remove usage of __KERNEL__ define.
* Updated papr_scm_pdsm.h to remove redefinition of __packed macro.
v5..v6:
* Incorporate review comments from Mpe and Dan Williams.
* Changed the usage of term DSM to PDSM as former conflicted with
usage in ACPI context.
* UAPI updates to remove usage of bool and marking the structs
defined as 'packed'.
* Simplified the health-bitmap handling in papr_scm to use u64
instead of __be64 integers.
* Caching of the health information so reading the dimm-flag file
doesn't result in costly hcalls every time.
* Changed dimm-flag 'save_fail' to 'flush_fail'
* Moved the dimm flag file from 'papr_flags' to 'papr/flags'.
* Added a patch to document H_SCM_HEALTH hcall return values.
* Added sysfs ABI documentation for the newly introduced dimm-flag
sysfs file 'papr/flags'
v4..v5:
* Fixed a bug in new implementation of papr_scm_ndctl() that was triggering
a false error condition.
v3..v4:
* Restructured papr_scm_ndctl() to dispatch ND_CMD_CALL commands to a new
function named papr_scm_service_dsm() to service DSM requests. [Aneesh]
v2..v3:
* Updated the papr_scm_dsm.h header to be more conformant to general
kernel guidelines for UAPI headers. [Aneesh]
* Changed the definition of macro PAPR_SCM_DIMM_UNARMED_MASK to not
include the case when the NVDIMM is unarmed because it's a vPMEM
NVDIMM. [Aneesh]
v1..v2:
* Restructured the patch-set based on review comments on V1 patch-set to
simplify the patch review. Multiple small patches have been combined into
single patches to reduce cross referencing that was needed in earlier
patch-set. Hence most of the patches in this patch-set are now new. [Aneesh]
* Removed the initial work done for fetching NVDIMM performance statistics.
These changes will be re-proposed in a separate patch-set. [Aneesh]
* Simplified handling of versioning of 'struct
nd_papr_scm_dimm_health_stat_v1' as only one version of the structure is
currently in existence.
References:
[1] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] commit 58b278f568f0
("powerpc: Provide initial documentation for PAPR hcalls")
[3] "Linux on Power Architecture Platform Reference"
https://members.openpowerfoundation.org/document/dl/469
[4] https://github.com/vaibhav92/ndctl/tree/papr_scm_health_v7
Vaibhav Jain (5):
powerpc: Document details on H_SCM_HEALTH hcall
seq_buf: Export seq_buf_printf() to external modules
powerpc/papr_scm: Fetch nvdimm health information from PHYP
ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
powerpc/papr_scm: Implement support for PAPR_SCM_PDSM_HEALTH
Documentation/ABI/testing/sysfs-bus-papr-scm | 27 ++
Documentation/powerpc/papr_hcalls.rst | 43 ++-
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 173 +++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 363 +++++++++++++++++-
include/uapi/linux/ndctl.h | 1 +
lib/seq_buf.c | 1 +
6 files changed, 595 insertions(+), 13 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
--
2.26.2
Implement support for fetching nvdimm health information via the
H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
of 64-bit big-endian integers, the bitwise-and of which is then stored
in 'struct papr_scm_priv' and subsequently partially exposed to
user-space via the newly introduced dimm specific attribute
'papr/flags'. Since the hcall is costly, the health information is
cached and only re-queried once 60s have elapsed since the previous
successful hcall.
The patch also adds documentation describing the flags reported by
the new sysfs attribute 'papr/flags' at
Documentation/ABI/testing/sysfs-bus-papr-scm.
[1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
PAPR hcalls")
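The masking and flag reporting described above can be sketched in
plain C as follows. This is only an illustrative user-space sketch:
the 'SKETCH_' names and sketch_format_flags() are hypothetical, the
bit positions and flag groupings follow the macros added by this
patch, and snprintf() stands in for the seq_buf helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PAPR numbers bits from the most significant end: bit 0 is the MSB */
#define SKETCH_PAPR_BIT(n)      (1ULL << (63 - (n)))

#define SKETCH_UNARMED          SKETCH_PAPR_BIT(0)
#define SKETCH_SHUTDOWN_DIRTY   SKETCH_PAPR_BIT(1)
#define SKETCH_EMPTY            SKETCH_PAPR_BIT(3)
#define SKETCH_CRITICAL         SKETCH_PAPR_BIT(4)
#define SKETCH_FATAL            SKETCH_PAPR_BIT(5)
#define SKETCH_UNHEALTHY        SKETCH_PAPR_BIT(6)
#define SKETCH_ENCRYPTED        SKETCH_PAPR_BIT(8)
#define SKETCH_SCRUBBED_LOCKED  SKETCH_PAPR_BIT(9)

struct sketch_flag {
	uint64_t mask;     /* any of these bits set reports the flag */
	const char *name;  /* text emitted in the flags file */
};

static const struct sketch_flag sketch_flags[] = {
	{ SKETCH_UNARMED | SKETCH_UNHEALTHY, "not_armed" },
	{ SKETCH_SHUTDOWN_DIRTY, "flush_fail" },
	{ SKETCH_EMPTY, "restore_fail" },
	{ SKETCH_ENCRYPTED, "encrypted" },
	{ SKETCH_CRITICAL | SKETCH_FATAL | SKETCH_UNHEALTHY, "smart_notify" },
	{ SKETCH_SCRUBBED_LOCKED, "scrubbed locked" },
};

/* Keep only bits marked valid, then emit one name per matching mask */
size_t sketch_format_flags(uint64_t bitmap, uint64_t valid,
			   char *buf, size_t len)
{
	uint64_t health = bitmap & valid;
	size_t used = 0;
	size_t i;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(sketch_flags) / sizeof(sketch_flags[0]); i++) {
		int n;

		if (!(health & sketch_flags[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s ",
			     sketch_flags[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0'; /* drop the truncated name */
			return used;
		}
		used += (size_t)n;
	}

	/* Terminate a non-empty list with a newline, if it fits */
	if (used > 0 && used + 1 < len) {
		buf[used++] = '\n';
		buf[used] = '\0';
	}
	return used;
}
```

A bit that is asserted in the health bitmap but cleared in the valid
bitmap is dropped before any flag is reported.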
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* None
v6..v7 :
* Used the exported seq_buf_printf() function to generate content for
'papr/flags'
* Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
and removed the papr_scm.h file [Mpe]
* Fixed some minor consistency issues in sysfs-bus-papr-scm
documentation. [Mpe]
* s/dimm_mutex/health_mutex/g [Mpe]
* Split drc_pmem_query_health() into two functions, one of which takes
care of caching and locking. [Mpe]
* Fixed a local copy creation of dimm health information using
READ_ONCE(). [Mpe]
v5..v6 :
* Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
[Dan Williams]
* Include documentation for 'papr/flags' attr [Dan Williams]
* Change flag 'save_fail' to 'flush_fail' [Dan Williams]
* Caching of health bitmap to reduce expensive hcalls [Dan Williams]
* Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
* Replaced two __be64 integers from papr_scm_priv to a single u64
integer [Mpe]
* Updated patch description to reflect the changes made in this
version.
* Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
flags_show() [Dan Williams]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
NVDIMM unarmed [Aneesh]
v1..v2 :
* New patch in the series.
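
The caching policy mentioned in the patch description can be sketched
as below. This is a hedged user-space illustration only: the 'sketch_'
names are hypothetical, a plain tick counter stands in for jiffies,
sketch_fake_query() stands in for the hcall, and locking is left out.
The signed-difference comparison mirrors the idea behind the kernel's
time_after() and relies on the usual two's-complement conversion.

```c
#include <assert.h>
#include <limits.h>

/* Ticks for which cached health data is assumed to stay valid */
#define SKETCH_MIN_QUERY_INTERVAL 60UL

/* Wraparound-safe "a is after b" on a free-running tick counter */
int sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct sketch_health_cache {
	unsigned long last_query;   /* tick of the last successful query */
	unsigned long long bitmap;  /* cached health bitmap */
	unsigned int queries;       /* number of expensive queries issued */
};

typedef int (*sketch_query_fn)(unsigned long long *bitmap);

/* Stand-in for the expensive hcall; reports PAPR bit 0 as set */
int sketch_fake_query(unsigned long long *bitmap)
{
	*bitmap = 1ULL << 63;
	return 0;
}

/* Return cached data unless it is older than the minimum interval */
int sketch_get_health(struct sketch_health_cache *c, unsigned long now,
		      sketch_query_fn query, unsigned long long *out)
{
	if (sketch_time_after(now, c->last_query + SKETCH_MIN_QUERY_INTERVAL)) {
		unsigned long long fresh;
		int rc = query(&fresh);

		if (rc)
			return rc; /* timestamp untouched, next read retries */
		c->bitmap = fresh;
		c->last_query = now;
		c->queries++;
	}
	*out = c->bitmap;
	return 0;
}
```

Because the timestamp is only advanced on success, a failed query is
retried on the next read instead of being cached for a full interval.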
---
Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++
arch/powerpc/platforms/pseries/papr_scm.c | 169 ++++++++++++++++++-
2 files changed, 194 insertions(+), 2 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm b/Documentation/ABI/testing/sysfs-bus-papr-scm
new file mode 100644
index 000000000000..6143d06072f1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-papr-scm
@@ -0,0 +1,27 @@
+What: /sys/bus/nd/devices/nmemX/papr/flags
+Date: Apr, 2020
+KernelVersion: v5.8
+Contact: linuxppc-dev <[email protected]>, [email protected],
+Description:
+ (RO) Report flags indicating various states of a
+ papr-scm NVDIMM device. Each flag maps to one or
+ more bits set in the dimm-health-bitmap retrieved in
+ response to H_SCM_HEALTH hcall. The details of the bit
+ flags returned in response to this hcall is available
+ at 'Documentation/powerpc/papr_hcalls.rst' . Below are
+ the flags reported in this sysfs file:
+
+ * "not_armed" : Indicates that NVDIMM contents will not
+ survive a power cycle.
+ * "flush_fail" : Indicates that NVDIMM contents
+ couldn't be flushed during last
+ shut-down event.
+ * "restore_fail": Indicates that NVDIMM contents
+ couldn't be restored during NVDIMM
+ initialization.
+ * "encrypted" : NVDIMM contents are encrypted.
+ * "smart_notify": There is health event for the NVDIMM.
+ * "scrubbed" : Indicating that contents of the
+ NVDIMM have been scrubbed.
+ * "locked" : Indicating that NVDIMM contents can't
+ be modified until next power cycle.
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..142636e1a59f 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -12,6 +12,7 @@
#include <linux/libnvdimm.h>
#include <linux/platform_device.h>
#include <linux/delay.h>
+#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
@@ -22,6 +23,44 @@
(1ul << ND_CMD_GET_CONFIG_DATA) | \
(1ul << ND_CMD_SET_CONFIG_DATA))
+/* DIMM health bitmap bitmap indicators */
+/* SCM device is unable to persist memory contents */
+#define PAPR_SCM_DIMM_UNARMED (1ULL << (63 - 0))
+/* SCM device failed to persist memory contents */
+#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
+/* SCM device contents are persisted from previous IPL */
+#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
+/* SCM device contents are not persisted from previous IPL */
+#define PAPR_SCM_DIMM_EMPTY (1ULL << (63 - 3))
+/* SCM device memory life remaining is critically low */
+#define PAPR_SCM_DIMM_HEALTH_CRITICAL (1ULL << (63 - 4))
+/* SCM device will be garded off next IPL due to failure */
+#define PAPR_SCM_DIMM_HEALTH_FATAL (1ULL << (63 - 5))
+/* SCM contents cannot persist due to current platform health status */
+#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
+/* SCM device is unable to persist memory contents in certain conditions */
+#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
+/* SCM device is encrypted */
+#define PAPR_SCM_DIMM_ENCRYPTED (1ULL << (63 - 8))
+/* SCM device has been scrubbed and locked */
+#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
+
+/* Bits status indicators for health bitmap indicating unarmed dimm */
+#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED | \
+ PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+/* Bits status indicators for health bitmap indicating unflushed dimm */
+#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
+
+/* Bits status indicators for health bitmap indicating unrestored dimm */
+#define PAPR_SCM_DIMM_BAD_RESTORE_MASK (PAPR_SCM_DIMM_EMPTY)
+
+/* Bit status indicators for smart event notification */
+#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
+ PAPR_SCM_DIMM_HEALTH_FATAL | \
+ PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+
+/* private struct associated with each region */
struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
@@ -39,6 +78,15 @@ struct papr_scm_priv {
struct resource res;
struct nd_region *region;
struct nd_interleave_set nd_set;
+
+ /* Protect dimm health data from concurrent read/writes */
+ struct mutex health_mutex;
+
+ /* Last time the health information of the dimm was updated */
+ unsigned long lasthealth_jiffies;
+
+ /* Health information for the dimm */
+ u64 health_bitmap;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
return drc_pmem_bind(p);
}
+/*
+ * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
+ * health information.
+ */
+static int __drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ s64 rc;
+
+ /* issue the hcall */
+ rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
+ if (rc != H_SUCCESS) {
+ dev_err(&p->pdev->dev,
+ "Failed to query health information, Err:%lld\n", rc);
+ rc = -ENXIO;
+ goto out;
+ }
+
+ p->lasthealth_jiffies = jiffies;
+ p->health_bitmap = ret[0] & ret[1];
+
+ dev_dbg(&p->pdev->dev,
+ "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
+ ret[0], ret[1]);
+out:
+ return rc;
+}
+
+/* Min interval in seconds for assuming stable dimm health */
+#define MIN_HEALTH_QUERY_INTERVAL 60
+
+/* Query cached health info and if needed call drc_pmem_query_health */
+static int drc_pmem_query_health(struct papr_scm_priv *p)
+{
+ unsigned long cache_timeout;
+ s64 rc;
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ /* Jiffies offset for which the health data is assumed to be same */
+ cache_timeout = p->lasthealth_jiffies +
+ msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
+
+ /* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
+ if (time_after(jiffies, cache_timeout))
+ rc = __drc_pmem_query_health(p);
+ else
+ /* Assume cached health data is valid */
+ rc = 0;
+
+ mutex_unlock(&p->health_mutex);
+ return rc;
+}
static int papr_scm_meta_get(struct papr_scm_priv *p,
struct nd_cmd_get_config_data_hdr *hdr)
@@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
return 0;
}
+static ssize_t flags_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct nvdimm *dimm = to_nvdimm(dev);
+ struct papr_scm_priv *p = nvdimm_provider_data(dimm);
+ struct seq_buf s;
+ u64 health;
+ int rc;
+
+ rc = drc_pmem_query_health(p);
+ if (rc)
+ return rc;
+
+ /* Copy health_bitmap locally, check masks & update out buffer */
+ health = READ_ONCE(p->health_bitmap);
+
+ seq_buf_init(&s, buf, PAGE_SIZE);
+ if (health & PAPR_SCM_DIMM_UNARMED_MASK)
+ seq_buf_printf(&s, "not_armed ");
+
+ if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
+ seq_buf_printf(&s, "flush_fail ");
+
+ if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
+ seq_buf_printf(&s, "restore_fail ");
+
+ if (health & PAPR_SCM_DIMM_ENCRYPTED)
+ seq_buf_printf(&s, "encrypted ");
+
+ if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
+ seq_buf_printf(&s, "smart_notify ");
+
+ if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
+ seq_buf_printf(&s, "scrubbed locked ");
+
+ if (seq_buf_used(&s))
+ seq_buf_printf(&s, "\n");
+
+ return seq_buf_used(&s);
+}
+DEVICE_ATTR_RO(flags);
+
+/* papr_scm specific dimm attributes */
+static struct attribute *papr_scm_nd_attributes[] = {
+ &dev_attr_flags.attr,
+ NULL,
+};
+
+static struct attribute_group papr_scm_nd_attribute_group = {
+ .name = "papr",
+ .attrs = papr_scm_nd_attributes,
+};
+
+static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
+ &papr_scm_nd_attribute_group,
+ NULL,
+};
+
static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
{
struct device *dev = &p->pdev->dev;
@@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
dimm_flags = 0;
set_bit(NDD_LABELING, &dimm_flags);
- p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
- PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
+ p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
+ dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
if (!p->nvdimm) {
dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
goto err;
@@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
if (!p)
return -ENOMEM;
+ /* Initialize the dimm mutex */
+ mutex_init(&p->health_mutex);
+
/* optional DT properties */
of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
--
2.26.2
Introduce support for Papr nvDimm Specific Methods (PDSM) in the
papr_scm module and add the command family to the white list of
NVDIMM command sets. Also advertise support for ND_CMD_CALL in the
dimm command mask and implement the necessary scaffolding in the
module to handle the ND_CMD_CALL ioctl and the PDSM requests that we
receive.
The layout of the PDSM request as we expect it from libnvdimm/libndctl
is described in the newly introduced uapi header 'papr_scm_pdsm.h',
which defines a new 'struct nd_pdsm_cmd_pkg' header. This header is
used to communicate the PDSM request via member
'nd_pdsm_cmd_pkg.hdr.nd_command' and the size of the payload that
needs to be sent/received for servicing the PDSM.
A new function is_cmd_valid() is implemented that reads the args to
papr_scm_ndctl() and performs sanity tests on them. A new function
papr_scm_service_pdsm() is introduced and is called from
papr_scm_ndctl() when a PDSM request is received via the ND_CMD_CALL
command from libnvdimm.
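The dispatch performed for a PDSM request can be sketched as below.
This is a simplified user-space illustration: the 'sketch_' names are
hypothetical, and SKETCH_PDSM_HEALTH only marks where a later command
such as PAPR_SCM_PDSM_HEALTH would sit between the two sentinels. As
in this patch, errors are reported in-band through 'cmd_status' while
the function itself returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum sketch_pdsm {
	SKETCH_PDSM_MIN = 0x0,  /* sentinel, never a valid request */
	SKETCH_PDSM_HEALTH,     /* hypothetical slot for a later command */
	SKETCH_PDSM_MAX,        /* sentinel, one past the last command */
};

/* Range-check the command, then dispatch; status goes into cmd_status */
int sketch_service_pdsm(uint64_t command, int32_t *cmd_status)
{
	/* Anything at or outside the sentinels is an unknown request */
	if (command <= SKETCH_PDSM_MIN || command >= SKETCH_PDSM_MAX) {
		*cmd_status = -EINVAL;
		return 0;
	}

	switch (command) {
	/* Handlers for individual commands would be added here */
	default:
		*cmd_status = -ENOENT; /* known range, but no handler yet */
		return 0;
	}
}
```

Keeping the sentinels at both ends means a new command only needs a
new enum entry and a new case; the range check does not change.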
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* None
v6..v7 :
* Removed the re-definitions of __packed macro from papr_scm_pdsm.h
[Mpe].
* Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
* Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
[Mpe].
* Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
v5..v6 :
* Changed the usage of the term DSM to PDSM to distinguish it from the
ACPI term [ Dan Williams ]
* Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
to reflect the new terminology.
* Updated the patch description and title to reflect the new terminology.
* Squashed patch to introduce new command family in 'ndctl.h' with
this patch [ Dan Williams ]
* Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
[ Dan Williams ]
* Removed redundant license text from the papr_scm_pdsm.h file.
[ Dan Williams ]
* s/envelop/envelope/ at various places [ Dan Williams ]
* Added '__packed' attribute to command package header to guard
against different compilers adding padding between the fields.
[ Dan Williams]
* Converted various pr_debug to dev_dbg [ Dan Williams ]
v4..v5 :
* None
v3..v4 :
* None
v2..v3 :
* Updated the patch prefix to 'ndctl/uapi' [Aneesh]
v1..v2 :
* None
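
The payload-version negotiation documented in 'papr_scm_pdsm.h' can be
sketched as below. This is a hedged illustration only: both payload
structs, their fields and the 'sketch_' helpers are hypothetical, and
just demonstrate the rule that the serviced version is the minimum of
the requested version and the highest version the servicing side
supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload versions: v2 keeps every v1 field and appends one */
struct sketch_health_v1 {
	uint8_t dimm_health;
};

struct sketch_health_v2 {
	uint8_t dimm_health;
	uint8_t dimm_locked;
};

/* Highest payload version the servicing side is assumed to support */
#define SKETCH_KERNEL_MAX_VERSION 2u

/* Version actually serviced: MIN(requested, supported maximum) */
uint16_t sketch_negotiate(uint16_t requested, uint16_t supported_max)
{
	return requested < supported_max ? requested : supported_max;
}

/* Size of the payload struct for a negotiated version, 0 if unknown */
size_t sketch_payload_size(uint16_t version)
{
	switch (version) {
	case 1:
		return sizeof(struct sketch_health_v1);
	case 2:
		return sizeof(struct sketch_health_v2);
	default:
		return 0;
	}
}
```

A caller asking for version 3 against a side that supports up to
version 2 gets version 2 back and parses the result with the v2 struct.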
---
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++++++++++++++++++
arch/powerpc/platforms/pseries/papr_scm.c | 101 ++++++++++++-
include/uapi/linux/ndctl.h | 1 +
3 files changed, 230 insertions(+), 6 deletions(-)
create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
new file mode 100644
index 000000000000..671693439c1c
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl
+ *
+ * (C) Copyright IBM 2020
+ *
+ * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
+ */
+
+#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
+#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
+
+#include <linux/types.h>
+
+/*
+ * PDSM Envelope:
+ *
+ * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
+ * 'envelopes' which consists of a header and user-defined payload sections.
+ * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
+ * payload following it and offset of which relative to the struct is provided
+ * by 'nd_pdsm_cmd_pkg.payload_offset'.
+ *
+ * +-------------+---------------------+---------------------------+
+ * | 64-Bytes | 8-Bytes | Max 184-Bytes |
+ * +-------------+---------------------+---------------------------+
+ * | nd_pdsm_cmd_pkg | |
+ * |-------------+ | |
+ * | nd_cmd_pkg | | |
+ * +-------------+---------------------+---------------------------+
+ * | nd_family | | |
+ * | nd_size_out | cmd_status | |
+ * | nd_size_in | payload_version | PAYLOAD |
+ * | nd_command | payload_offset -----> |
+ * | nd_fw_size | | |
+ * +-------------+---------------------+---------------------------+
+ *
+ * PDSM Header:
+ *
+ * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
+ * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
+ * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
+ * contained in 'struct nd_cmd_pkg', the header also has the following
+ * members:
+ *
+ * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
+ * 'payload_version' : (In/Out) Version number associated with the payload.
+ * 'payload_offset' : (In)Relative offset of payload from start of envelope.
+ *
+ * PDSM Payload:
+ *
+ * The layout of the PDSM Payload is defined by various structs shared between
+ * papr_scm and libndctl so that contents of payload can be interpreted. During
+ * servicing of a PDSM the papr_scm module will read input args from the payload
+ * field by casting its contents to an appropriate struct pointer based on the
+ * PDSM command. Similarly the output of servicing the PDSM command will be
+ * copied to the payload field using the same struct.
+ *
+ * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
+ * leaves around 184 bytes for the envelope payload (ignoring any padding that
+ * the compiler may silently introduce).
+ *
+ * Payload Version:
+ *
+ * A 'payload_version' field is present in PDSM header that indicates a specific
+ * version of the structure present in PDSM Payload for a given PDSM command.
+ * This provides backward compatibility in case the PDSM Payload structure
+ * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
+ *
+ * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
+ * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
+ * module when servicing the PDSM envelope checks the 'payload_version' and then
+ * uses 'payload struct version' == MIN('payload_version field',
+ * 'max payload-struct-version supported by papr_scm') to service the PDSM.
+ * After servicing the PDSM, 'papr_scm' puts the negotiated version of the
+ * payload struct in the returned 'payload_version' field.
+ *
+ * Libndctl, on receiving the envelope back from papr_scm, again checks the
+ * 'payload_version' field and based on it uses the appropriate version of the
+ * PDSM struct to parse the results.
+ *
+ * Backward Compatibility:
+ *
+ * The above scheme of exchanging differently versioned PDSM structs between
+ * libndctl and papr_scm should provide backward compatibility as long as the
+ * following two conditions hold when defining new PDSM structs:
+ *
+ * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
+ *
+ * 1. T(Y) is a proper subset of T(X) if X > Y.
+ * i.e Each new version of PDSM struct should retain existing struct
+ * attributes from previous version
+ *
+ * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
+ * it should also support T(1), T(2)...T(X - 1).
+ * i.e When adding support for new version of a PDSM struct, libndctl
+ * and papr_scm should retain support of the existing PDSM struct
+ * version they support.
+ */
+
+/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
+struct nd_pdsm_cmd_pkg {
+ struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
+ __s32 cmd_status; /* Out: Sub-cmd status returned back */
+ __u16 payload_offset; /* In: offset from start of struct */
+ __u16 payload_version; /* In/Out: version of the payload */
+ __u8 payload[]; /* In/Out: Sub-cmd data buffer */
+} __packed;
+
+/*
+ * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
+ * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
+ */
+enum papr_scm_pdsm {
+ PAPR_SCM_PDSM_MIN = 0x0,
+ PAPR_SCM_PDSM_MAX,
+};
+
+/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
+static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
+{
+ return (struct nd_pdsm_cmd_pkg *) cmd;
+}
+
+/* Return the payload pointer for a given pcmd */
+static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
+{
+ if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
+ return NULL;
+ else
+ return (void *)((__u8 *) pcmd + pcmd->payload_offset);
+}
+
+#endif /* _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index 142636e1a59f..ed4b49a6f1e1 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -15,13 +15,15 @@
#include <linux/seq_buf.h>
#include <asm/plpar_wrappers.h>
+#include <asm/papr_scm_pdsm.h>
#define BIND_ANY_ADDR (~0ul)
#define PAPR_SCM_DIMM_CMD_MASK \
((1ul << ND_CMD_GET_CONFIG_SIZE) | \
(1ul << ND_CMD_GET_CONFIG_DATA) | \
- (1ul << ND_CMD_SET_CONFIG_DATA))
+ (1ul << ND_CMD_SET_CONFIG_DATA) | \
+ (1ul << ND_CMD_CALL))
/* DIMM health bitmap bitmap indicators */
/* SCM device is unable to persist memory contents */
@@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
return 0;
}
+/*
+ * Validate the inputs args to dimm-control function and return '0' if valid.
+ * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
+ */
+static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
+ unsigned int buf_len)
+{
+ unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
+ struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
+ struct papr_scm_priv *p;
+
+ /* Only dimm-specific calls are supported atm */
+ if (!nvdimm)
+ return -EINVAL;
+
+ /* get the provider data from struct nvdimm */
+ p = nvdimm_provider_data(nvdimm);
+
+ if (!test_bit(cmd, &cmd_mask)) {
+ dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
+ return -EINVAL;
+ } else if (cmd == ND_CMD_CALL) {
+
+ /* Verify the envelope package */
+ if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
+ buf_len);
+ return -EINVAL;
+ }
+
+ /* Verify that the PDSM family is valid */
+ if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR_SCM) {
+ dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
+ pkg->hdr.nd_family);
+ return -EINVAL;
+
+ }
+
+ /* We expect a payload with all PDSM commands */
+ if (pdsm_cmd_to_payload(pkg) == NULL) {
+ dev_dbg(&p->pdev->dev,
+ "Empty payload for sub-command=0x%llx\n",
+ pkg->hdr.nd_command);
+ return -EINVAL;
+ }
+ }
+
+ /* Command looks valid */
+ return 0;
+}
+
+static int papr_scm_service_pdsm(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *call_pkg)
+{
+ /* unknown subcommands return error in packages */
+ if (call_pkg->hdr.nd_command <= PAPR_SCM_PDSM_MIN ||
+ call_pkg->hdr.nd_command >= PAPR_SCM_PDSM_MAX) {
+ dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -EINVAL;
+ return 0;
+ }
+
+ /* Depending on the DSM command call appropriate service routine */
+ switch (call_pkg->hdr.nd_command) {
+ default:
+ dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
+ call_pkg->hdr.nd_command);
+ call_pkg->cmd_status = -ENOENT;
+ return 0;
+ }
+}
+
static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc)
{
struct nd_cmd_get_config_size *get_size_hdr;
struct papr_scm_priv *p;
+ struct nd_pdsm_cmd_pkg *call_pkg = NULL;
+ int rc;
- /* Only dimm-specific calls are supported atm */
- if (!nvdimm)
- return -EINVAL;
+ /* Use a local variable in case cmd_rc pointer is NULL */
+ if (cmd_rc == NULL)
+ cmd_rc = &rc;
+
+ *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
+ if (*cmd_rc) {
+ pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
+ return *cmd_rc;
+ }
p = nvdimm_provider_data(nvdimm);
@@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
*cmd_rc = papr_scm_meta_set(p, buf);
break;
+ case ND_CMD_CALL:
+ call_pkg = nd_to_pdsm_cmd_pkg(buf);
+ *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
+ break;
+
default:
- return -EINVAL;
+ dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
+ *cmd_rc = -EINVAL;
}
dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
- return 0;
+ return *cmd_rc;
}
static ssize_t flags_show(struct device *dev,
diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
index de5d90212409..99fb60600ef8 100644
--- a/include/uapi/linux/ndctl.h
+++ b/include/uapi/linux/ndctl.h
@@ -244,6 +244,7 @@ struct nd_cmd_pkg {
#define NVDIMM_FAMILY_HPE2 2
#define NVDIMM_FAMILY_MSFT 3
#define NVDIMM_FAMILY_HYPERV 4
+#define NVDIMM_FAMILY_PAPR_SCM 5
#define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
struct nd_cmd_pkg)
--
2.26.2
Add documentation to 'papr_hcalls.rst' describing the bitmap flags
that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
specification.
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* None
v6..v7:
* None
v5..v6:
* New patch in the series
---
Documentation/powerpc/papr_hcalls.rst | 43 ++++++++++++++++++++++++---
1 file changed, 39 insertions(+), 4 deletions(-)
diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
index 3493631a60f8..9a5ba5eaf323 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -220,13 +220,48 @@ from the LPAR memory.
**H_SCM_HEALTH**
| Input: drcIndex
-| Out: *health-bitmap, health-bit-valid-bitmap*
+| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
| Return Value: *H_Success, H_Parameter, H_Hardware*
Given a DRC Index return the info on predictive failure and overall health of
-the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
-failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
-valid.
+the NVDIMM. The asserted bits in the health-bitmap indicate one or more states
+(described in table below) of the NVDIMM and health-bit-valid-bitmap indicate
+which bits in health-bitmap are valid.
+
+Health Bitmap Flags:
+
++------+-----------------------------------------------------------------------+
+| Bit | Definition |
++======+=======================================================================+
+| 00 | SCM device is unable to persist memory contents. |
+| | If the system is powered down, nothing will be saved. |
++------+-----------------------------------------------------------------------+
+| 01 | SCM device failed to persist memory contents. Either contents were not|
+| | saved successfully on power down or were not restored properly on |
+| | power up. |
++------+-----------------------------------------------------------------------+
+| 02 | SCM device contents are persisted from previous IPL. The data from |
+| | the last boot were successfully restored. |
++------+-----------------------------------------------------------------------+
+| 03 | SCM device contents are not persisted from previous IPL. There was no |
+| | data to restore from the last boot. |
++------+-----------------------------------------------------------------------+
+| 04 | SCM device memory life remaining is critically low |
++------+-----------------------------------------------------------------------+
+| 05 | SCM device will be garded off next IPL due to failure |
++------+-----------------------------------------------------------------------+
+| 06 | SCM contents cannot persist due to current platform health status. A |
+| | hardware failure may prevent data from being saved or restored. |
++------+-----------------------------------------------------------------------+
+| 07 | SCM device is unable to persist memory contents in certain conditions |
++------+-----------------------------------------------------------------------+
+| 08 | SCM device is encrypted |
++------+-----------------------------------------------------------------------+
+| 09 | SCM device has successfully completed a requested erase or secure |
+| | erase procedure. |
++------+-----------------------------------------------------------------------+
+|10:63 | Reserved / Unused |
++------+-----------------------------------------------------------------------+
**H_SCM_PERFORMANCE_STATS**
--
2.26.2
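For readers following the bit numbering in the table above, here is a minimal userspace sketch (not part of the patch) of how a single health bit could be tested. It assumes PAPR big-endian bit numbering, where bit 0 is the most significant bit of the 64-bit register; this is the same convention as the `1ULL << (63 - n)` macros used in papr_scm.c later in this series. The helpers scm_health_bit() and scm_health_is_set() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask for PAPR bit 'n', where bit 0 is the MSB. */
static inline uint64_t scm_health_bit(unsigned int n)
{
	assert(n < 64);
	return 1ULL << (63 - n);
}

/*
 * Hypothetical helper: a health bit only counts when it is asserted in
 * health-bitmap (r4) and also marked valid in health-bit-valid-bitmap (r5).
 */
static inline bool scm_health_is_set(uint64_t bitmap, uint64_t valid,
				     unsigned int n)
{
	return ((bitmap & valid) & scm_health_bit(n)) != 0;
}
```

With these helpers, scm_health_is_set(r4, r5, 8) would report whether the "SCM device is encrypted" state from the table is both asserted and valid.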
'seq_buf' provides a very useful abstraction for writing to a string
buffer without needing to worry about it overflowing. However, even
though the API has been stable for a couple of years now, it is still
not exported to external modules, which limits its usage.
Hence this patch proposes an update to 'seq_buf.c' to mark
seq_buf_printf(), which is part of the seq_buf API, to be exported to
external GPL modules. This symbol will be used in later parts of this
patchset to simplify content creation for a sysfs attribute.
Cc: Steven Rostedt <[email protected]>
Cc: Piotr Maziarz <[email protected]>
Cc: Cezary Rojewski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* Added ack from Steven Rostedt
v6..v7:
* New patch in the series
---
lib/seq_buf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/seq_buf.c b/lib/seq_buf.c
index 4e865d42ab03..707453f5d58e 100644
--- a/lib/seq_buf.c
+++ b/lib/seq_buf.c
@@ -91,6 +91,7 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
return ret;
}
+EXPORT_SYMBOL_GPL(seq_buf_printf);
#ifdef CONFIG_BINARY_PRINTF
/**
--
2.26.2
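To illustrate why a bounded, append-only buffer is convenient for building sysfs output, here is a rough userspace analogue. This is not the kernel's seq_buf implementation: 'struct bounded_buf' and its two functions are hypothetical names, and the overflow policy (reject the write and keep the old contents) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_buf. */
struct bounded_buf {
	char *buffer;
	size_t size;	/* total capacity of buffer */
	size_t len;	/* bytes used so far, excluding the NUL */
};

static void bounded_buf_init(struct bounded_buf *b, char *buf, size_t size)
{
	assert(b != NULL && (buf != NULL || size == 0));
	b->buffer = buf;
	b->size = size;
	b->len = 0;
	if (size)
		buf[0] = '\0';
}

/* Append formatted text; returns 0 on success, -1 if it would not fit. */
static int bounded_buf_printf(struct bounded_buf *b, const char *fmt, ...)
{
	va_list args;
	int n;

	if (b->len >= b->size)
		return -1;

	va_start(args, fmt);
	n = vsnprintf(b->buffer + b->len, b->size - b->len, fmt, args);
	va_end(args);

	if (n < 0 || (size_t)n >= b->size - b->len) {
		/* Did not fit: drop the partial write, keep old contents. */
		b->buffer[b->len] = '\0';
		return -1;
	}
	b->len += (size_t)n;
	return 0;
}
```

A caller can chain several bounded_buf_printf() calls, one per flag, without tracking the remaining space by hand, which is the same convenience flags_show() gets from seq_buf_printf().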
This patch implements support for PDSM request 'PAPR_SCM_PDSM_HEALTH'
that returns a newly introduced 'struct nd_papr_pdsm_health' instance
containing dimm health information back to user space in response to
ND_CMD_CALL. This functionality is implemented in the newly introduced
papr_scm_get_health() that queries the scm-dimm health information and
then copies this information to the package payload whose layout is
defined by 'struct nd_papr_pdsm_health'.
The patch also introduces a new member 'struct papr_scm_priv.health'
that is an instance of 'struct nd_papr_pdsm_health' to cache the health
information of an nvdimm. As a result, functions drc_pmem_query_health()
and flags_show() are updated to populate and use this new struct
instead of the u64 integer that was used earlier.
Cc: Dan Williams <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Signed-off-by: Vaibhav Jain <[email protected]>
---
Changelog:
Resend:
* None
v6..v7:
* Updated flags_show() to use seq_buf_printf(). [Mpe]
* Updated papr_scm_get_health() to use newly introduced
__drc_pmem_query_health() bypassing the cache [Mpe].
v5..v6:
* Added attribute '__packed' to 'struct nd_papr_pdsm_health_v1' to
guard against the possibility of different compilers adding different
padding to the struct [ Dan Williams ]
* Updated 'struct nd_papr_pdsm_health_v1' to use __u8 instead of
'bool' and also updated drc_pmem_query_health() to take this into
account. [ Dan Williams ]
v4..v5:
* None
v3..v4:
* Call the DSM_PAPR_SCM_HEALTH service function from
papr_scm_service_dsm() instead of papr_scm_ndctl(). [Aneesh]
v2..v3:
* Updated struct nd_papr_scm_dimm_health_stat_v1 to use '__xx' types
as it's exported to the userspace [Aneesh]
* Changed the constants DSM_PAPR_SCM_DIMM_XX indicating dimm health
from enum to #defines [Aneesh]
v1..v2:
* New patch in the series
---
arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 39 ++++++
arch/powerpc/platforms/pseries/papr_scm.c | 125 +++++++++++++++---
2 files changed, 147 insertions(+), 17 deletions(-)
diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
index 671693439c1c..db0cf550dabe 100644
--- a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
+++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
@@ -113,6 +113,7 @@ struct nd_pdsm_cmd_pkg {
*/
enum papr_scm_pdsm {
PAPR_SCM_PDSM_MIN = 0x0,
+ PAPR_SCM_PDSM_HEALTH,
PAPR_SCM_PDSM_MAX,
};
@@ -131,4 +132,42 @@ static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
return (void *)((__u8 *) pcmd + pcmd->payload_offset);
}
+/* Various scm-dimm health indicators */
+#define PAPR_PDSM_DIMM_HEALTHY 0
+#define PAPR_PDSM_DIMM_UNHEALTHY 1
+#define PAPR_PDSM_DIMM_CRITICAL 2
+#define PAPR_PDSM_DIMM_FATAL 3
+
+/*
+ * Struct exchanged between kernel & ndctl for PAPR_SCM_PDSM_HEALTH.
+ * Various flags indicate the health status of the dimm.
+ *
+ * dimm_unarmed : Dimm not armed. So contents won't persist.
+ * dimm_bad_shutdown : Previous shutdown did not persist contents.
+ * dimm_bad_restore : Contents from previous shutdown weren't restored.
+ * dimm_scrubbed : Contents of the dimm have been scrubbed.
+ * dimm_locked : Contents of the dimm can't be modified until CEC reboot
+ * dimm_encrypted : Contents of dimm are encrypted.
+ * dimm_health : Dimm health indicator. One of PAPR_PDSM_DIMM_XXXX
+ */
+struct nd_papr_pdsm_health_v1 {
+ __u8 dimm_unarmed;
+ __u8 dimm_bad_shutdown;
+ __u8 dimm_bad_restore;
+ __u8 dimm_scrubbed;
+ __u8 dimm_locked;
+ __u8 dimm_encrypted;
+ __u16 dimm_health;
+} __packed;
+
+/*
+ * Alias the current struct for dimm_health so that any application
+ * or kernel recompiled after introducing a new version automatically
+ * supports the new version.
+ */
+#define nd_papr_pdsm_health nd_papr_pdsm_health_v1
+
+/* Current version number for the dimm health struct */
+#define ND_PAPR_PDSM_HEALTH_VERSION 1
+
#endif /* _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index ed4b49a6f1e1..c59bf17ad054 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -88,7 +88,7 @@ struct papr_scm_priv {
unsigned long lasthealth_jiffies;
/* Health information for the dimm */
- u64 health_bitmap;
+ struct nd_papr_pdsm_health health;
};
static int drc_pmem_bind(struct papr_scm_priv *p)
@@ -201,6 +201,7 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
static int __drc_pmem_query_health(struct papr_scm_priv *p)
{
unsigned long ret[PLPAR_HCALL_BUFSIZE];
+ u64 health;
s64 rc;
/* issue the hcall */
@@ -208,18 +209,46 @@ static int __drc_pmem_query_health(struct papr_scm_priv *p)
if (rc != H_SUCCESS) {
dev_err(&p->pdev->dev,
"Failed to query health information, Err:%lld\n", rc);
- rc = -ENXIO;
- goto out;
+ return -ENXIO;
}
p->lasthealth_jiffies = jiffies;
- p->health_bitmap = ret[0] & ret[1];
+ health = ret[0] & ret[1];
dev_dbg(&p->pdev->dev,
"Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
ret[0], ret[1]);
-out:
- return rc;
+
+ memset(&p->health, 0, sizeof(p->health));
+
+ /* Check for various masks in bitmap and set the buffer */
+ if (health & PAPR_SCM_DIMM_UNARMED_MASK)
+ p->health.dimm_unarmed = 1;
+
+ if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
+ p->health.dimm_bad_shutdown = 1;
+
+ if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
+ p->health.dimm_bad_restore = 1;
+
+ if (health & PAPR_SCM_DIMM_ENCRYPTED)
+ p->health.dimm_encrypted = 1;
+
+ if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED) {
+ p->health.dimm_locked = 1;
+ p->health.dimm_scrubbed = 1;
+ }
+
+ if (health & PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
+ p->health.dimm_health = PAPR_PDSM_DIMM_UNHEALTHY;
+
+ if (health & PAPR_SCM_DIMM_HEALTH_CRITICAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_CRITICAL;
+
+ if (health & PAPR_SCM_DIMM_HEALTH_FATAL)
+ p->health.dimm_health = PAPR_PDSM_DIMM_FATAL;
+
+ return 0;
}
/* Min interval in seconds for assuming stable dimm health */
@@ -403,6 +432,58 @@ static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
return 0;
}
+/* Fetch the DIMM health info and populate it in provided package. */
+static int papr_scm_get_health(struct papr_scm_priv *p,
+ struct nd_pdsm_cmd_pkg *pkg)
+{
+ int rc;
+ size_t copysize = sizeof(p->health);
+
+ /* Ensure dimm health mutex is taken preventing concurrent access */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ goto out;
+
+ /* Always fetch up-to-date dimm health data, ignoring cached values */
+ rc = __drc_pmem_query_health(p);
+ if (rc)
+ goto out_unlock;
+ /*
+ * If the requested payload version is greater than one we know
+ * about, return the payload version we know about and let
+ * caller/userspace handle.
+ */
+ if (pkg->payload_version > ND_PAPR_PDSM_HEALTH_VERSION)
+ pkg->payload_version = ND_PAPR_PDSM_HEALTH_VERSION;
+
+ if (pkg->hdr.nd_size_out < copysize) {
+ dev_dbg(&p->pdev->dev, "Truncated payload (%u). Expected (%zu)\n",
+ pkg->hdr.nd_size_out, copysize);
+ rc = -ENOSPC;
+ goto out_unlock;
+ }
+
+ dev_dbg(&p->pdev->dev, "Copying payload size=%zu version=0x%x\n",
+ copysize, pkg->payload_version);
+
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+ pkg->hdr.nd_fw_size = copysize;
+
+out_unlock:
+ mutex_unlock(&p->health_mutex);
+
+out:
+ /*
+ * Put the error in the out package and return success from the function
+ * so that errors, if any, are propagated back to userspace.
+ */
+ pkg->cmd_status = rc;
+ dev_dbg(&p->pdev->dev, "completion code = %d\n", rc);
+
+ return 0;
+}
+
static int papr_scm_service_pdsm(struct papr_scm_priv *p,
struct nd_pdsm_cmd_pkg *call_pkg)
{
@@ -417,6 +498,9 @@ static int papr_scm_service_pdsm(struct papr_scm_priv *p,
/* Depending on the DSM command call appropriate service routine */
switch (call_pkg->hdr.nd_command) {
+ case PAPR_SCM_PDSM_HEALTH:
+ return papr_scm_get_health(p, call_pkg);
+
default:
dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
call_pkg->hdr.nd_command);
@@ -485,34 +569,41 @@ static ssize_t flags_show(struct device *dev,
struct nvdimm *dimm = to_nvdimm(dev);
struct papr_scm_priv *p = nvdimm_provider_data(dimm);
struct seq_buf s;
- u64 health;
int rc;
rc = drc_pmem_query_health(p);
if (rc)
return rc;
- /* Copy health_bitmap locally, check masks & update out buffer */
- health = READ_ONCE(p->health_bitmap);
-
seq_buf_init(&s, buf, PAGE_SIZE);
- if (health & PAPR_SCM_DIMM_UNARMED_MASK)
+
+ /* Protect concurrent modifications to papr_scm_priv */
+ rc = mutex_lock_interruptible(&p->health_mutex);
+ if (rc)
+ return rc;
+
+ if (p->health.dimm_unarmed)
seq_buf_printf(&s, "not_armed ");
- if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
+ if (p->health.dimm_bad_shutdown)
seq_buf_printf(&s, "flush_fail ");
- if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
+ if (p->health.dimm_bad_restore)
seq_buf_printf(&s, "restore_fail ");
- if (health & PAPR_SCM_DIMM_ENCRYPTED)
+ if (p->health.dimm_encrypted)
seq_buf_printf(&s, "encrypted ");
- if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
+ if (p->health.dimm_health)
seq_buf_printf(&s, "smart_notify ");
- if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
- seq_buf_printf(&s, "scrubbed locked ");
+ if (p->health.dimm_scrubbed)
+ seq_buf_printf(&s, "scrubbed ");
+
+ if (p->health.dimm_locked)
+ seq_buf_printf(&s, "locked ");
+
+ mutex_unlock(&p->health_mutex);
if (seq_buf_used(&s))
seq_buf_printf(&s, "\n");
--
2.26.2
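One detail of __drc_pmem_query_health() in the patch above is easy to miss: the three severity checks run in the order unhealthy, critical, fatal, so the most severe asserted state is the one that ends up in 'dimm_health'. The following userspace sketch restates that precedence. The SKETCH_* names and sketch_health_level() are hypothetical; the bit positions (4, 5, 6) and the values 0..3 are taken from the PAPR_SCM_DIMM_HEALTH_* and PAPR_PDSM_DIMM_* definitions in this series.

```c
#include <assert.h>
#include <stdint.h>

/* PAPR numbers bits from the MSB, so bit n maps to 1 << (63 - n). */
#define SKETCH_BIT(n)		(1ULL << (63 - (n)))

#define SKETCH_HEALTH_CRITICAL	SKETCH_BIT(4)
#define SKETCH_HEALTH_FATAL	SKETCH_BIT(5)
#define SKETCH_HEALTH_UNHEALTHY	SKETCH_BIT(6)

/* Same numeric values as the PAPR_PDSM_DIMM_* defines in the patch. */
enum sketch_dimm_health {
	SKETCH_DIMM_HEALTHY = 0,
	SKETCH_DIMM_UNHEALTHY = 1,
	SKETCH_DIMM_CRITICAL = 2,
	SKETCH_DIMM_FATAL = 3,
};

static_assert(SKETCH_DIMM_UNHEALTHY < SKETCH_DIMM_CRITICAL &&
	      SKETCH_DIMM_CRITICAL < SKETCH_DIMM_FATAL,
	      "checks below must run in ascending order of severity");

/* Later checks overwrite earlier ones, so the worst state wins. */
static enum sketch_dimm_health sketch_health_level(uint64_t health)
{
	enum sketch_dimm_health level = SKETCH_DIMM_HEALTHY;

	if (health & SKETCH_HEALTH_UNHEALTHY)
		level = SKETCH_DIMM_UNHEALTHY;
	if (health & SKETCH_HEALTH_CRITICAL)
		level = SKETCH_DIMM_CRITICAL;
	if (health & SKETCH_HEALTH_FATAL)
		level = SKETCH_DIMM_FATAL;

	return level;
}
```

Because flags_show() now tests 'dimm_health' for any non-zero value, each of the three levels above would result in "smart_notify" being reported.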
Vaibhav Jain <[email protected]> writes:
....
+
> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> + __u16 payload_offset; /* In: offset from start of struct */
> + __u16 payload_version; /* In/Out: version of the payload */
> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> +} __packed;
that payload_offset can be avoided if we prevent userspace from using a
different variant of nd_pdsm_cmd_pkg with a different header. We can keep
things simpler if we can always find the payload at
nd_pdsm_cmd_pkg->payload.
> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_scm_pdsm {
> + PAPR_SCM_PDSM_MIN = 0x0,
> + PAPR_SCM_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> + return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> + return NULL;
> + else
> + return (void *)((__u8 *) pcmd + pcmd->payload_offset);
> +}
> +
we need to make sure userspace is not passing a wrong payload_offset.
and in the next patch you do
+ /* Copy the health struct to the payload */
+ memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
+ pkg->hdr.nd_fw_size = copysize;
+
All this can be simplified if you can keep payload at
nd_pdsm_cmd_pkg->payload.
If you still want to have the ability to extend the header, then add a
reserved field similar to nd_cmd_pkg.
-aneesh
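To make the concern about a wrong payload_offset concrete, here is a minimal sketch of the kind of bounds check it implies. It is not code from the series: sketch_payload_offset_ok() is a hypothetical helper, and it works on plain sizes rather than on the real structs so that it can be compiled in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: the payload must start at or after the end of the
 * header and must lie entirely inside the buffer handed to the ioctl.
 */
static bool sketch_payload_offset_ok(size_t hdr_size, size_t buf_len,
				     size_t payload_offset, size_t payload_len)
{
	/* Callers are expected to have checked that the header itself fits. */
	assert(hdr_size <= buf_len);

	if (payload_offset < hdr_size)
		return false;	/* would overlap the header */
	if (payload_offset > buf_len)
		return false;	/* starts past the end of the buffer */

	/* Written as a subtraction so the comparison cannot overflow. */
	return payload_len <= buf_len - payload_offset;
}
```

Without a check of this kind, the memcpy() in papr_scm_get_health() would trust an offset supplied by userspace.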
Thanks for reviewing this patch, Aneesh.
"Aneesh Kumar K.V" <[email protected]> writes:
> Vaibhav Jain <[email protected]> writes:
>
> ....
>
> +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 payload_offset; /* In: offset from start of struct */
>> + __u16 payload_version; /* In/Out: version of the payload */
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>
> that payload_offset can be avoided if we prevent userspace from using a
> different variant of nd_pdsm_cmd_pkg with a different header. We can keep
> things simpler if we can always find the payload at
> nd_pdsm_cmd_pkg->payload.
I had introduced this member to handle the case where new fields are
added to 'struct nd_pdsm_cmd_pkg', without having to break the ABI. But
I agree with the point you made later that this can be simplified by
replacing 'payload_offset' with a set of reserved fields. I will address
this in the next iteration of this patchset.
>
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> + PAPR_SCM_PDSM_MIN = 0x0,
>> + PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>
> we need to make sure userspace is not passing a wrong payload_offset.
Agreed, this function should check the payload_offset field more
strictly. However, I will be getting rid of 'payload_offset' altogether
in the next iteration, as you previously suggested.
> and in the next patch you do
>
> + /* Copy the health struct to the payload */
> + memcpy(pdsm_cmd_to_payload(pkg), &p->health, copysize);
> + pkg->hdr.nd_fw_size = copysize;
> +
Yes, this is already being done in the patchset, and the changes
proposed to pdsm_cmd_to_payload() should not impact other patches, as
pdsm_cmd_to_payload() abstracts the rest of the code from how the
payload is accessed.
> All this can be simplified if you can keep payload at
> nd_pdsm_cmd_pkg->payload.
>
> If you still want to have the ability to extend the header, then add a
> reserved field similar to nd_cmd_pkg.
>
Agreed, I will address this in v8.
>
> -aneesh
--
Cheers
~ Vaibhav
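As a rough picture of the direction agreed on in this exchange, the sketch below replaces 'payload_offset' with a reserved field so that the payload always sits directly after the header. This is only an illustration and not the layout of any later revision: the sketch_* names are hypothetical, a single 16-bit reserved field is an assumption, and the 64-byte size of the embedded header is taken from the envelope diagram in papr_scm_pdsm.h. The `__attribute__((packed))` spelling assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the 64-byte 'struct nd_cmd_pkg' libnvdimm header. */
struct sketch_nd_cmd_pkg {
	uint8_t bytes[64];
};

/* Hypothetical layout: 'payload_offset' replaced by a reserved field. */
struct sketch_pdsm_cmd_pkg {
	struct sketch_nd_cmd_pkg hdr;
	int32_t cmd_status;		/* Out: sub-cmd status */
	uint16_t reserved;		/* Must be zero, kept for future use */
	uint16_t payload_version;	/* In/Out: version of the payload */
	uint8_t payload[];		/* Always directly after the header */
} __attribute__((packed));

static_assert(offsetof(struct sketch_pdsm_cmd_pkg, payload) == 72,
	      "payload must sit at a fixed offset after the header");

/* Reject requests that set bits the kernel does not yet understand. */
static inline bool sketch_reserved_ok(const struct sketch_pdsm_cmd_pkg *pkg)
{
	return pkg->reserved == 0;
}

/* No offset arithmetic on user-supplied values is needed any more. */
static inline void *sketch_payload(struct sketch_pdsm_cmd_pkg *pkg)
{
	return pkg->payload;
}
```

With a fixed payload location, the validation problem raised in the review reduces to checking that the reserved field is zero.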
On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
> Implement support for fetching nvdimm health information via
> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
> of 64-bit big-endian integers, bitwise-and of which is then stored in
> 'struct papr_scm_priv' and subsequently partially exposed to
> user-space via newly introduced dimm specific attribute
> 'papr/flags'. Since the hcall is costly, the health information is
> cached and only re-queried, 60s after the previous successful hcall.
>
> The patch also adds documentation describing the flags reported by
> the new sysfs attribute 'papr/flags' at
> Documentation/ABI/testing/sysfs-bus-papr-scm.
>
> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
> PAPR hcalls")
>
> Cc: Dan Williams <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: "Aneesh Kumar K . V" <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> Changelog:
>
> Resend:
> * None
>
> v6..v7 :
> * Used the exported seq_buf_printf() function to generate content for
> 'papr/flags'
> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
> and removed the papr_scm.h file [Mpe]
> * Fixed some minor consistency issues in sysfs-bus-papr-scm
> documentation. [Mpe]
> * s/dimm_mutex/health_mutex/g [Mpe]
> * Split drc_pmem_query_health() into two function one of which takes
> care of caching and locking. [Mpe]
> * Fixed a local copy creation of dimm health information using
> READ_ONCE(). [Mpe]
>
> v5..v6 :
> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
> [Dan Williams]
> * Include documentation for 'papr/flags' attr [Dan Williams]
> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
> * Replaced two __be64 integers from papr_scm_priv to a single u64
> integer [Mpe]
> * Updated patch description to reflect the changes made in this
> version.
> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
> flags_show() [Dan Williams]
>
> v4..v5 :
> * None
>
> v3..v4 :
> * None
>
> v2..v3 :
> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
> NVDIMM unarmed [Aneesh]
>
> v1..v2 :
> * New patch in the series.
> ---
> Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++
> arch/powerpc/platforms/pseries/papr_scm.c | 169 ++++++++++++++++++-
> 2 files changed, 194 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm b/Documentation/ABI/testing/sysfs-bus-papr-scm
> new file mode 100644
> index 000000000000..6143d06072f1
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm
> @@ -0,0 +1,27 @@
> +What: /sys/bus/nd/devices/nmemX/papr/flags
> +Date: Apr, 2020
> +KernelVersion: v5.8
> +Contact: linuxppc-dev <[email protected]>, [email protected],
> +Description:
> + (RO) Report flags indicating various states of a
> + papr-scm NVDIMM device. Each flag maps to a one or
> + more bits set in the dimm-health-bitmap retrieved in
> + response to H_SCM_HEALTH hcall. The details of the bit
> + flags returned in response to this hcall is available
> + at 'Documentation/powerpc/papr_hcalls.rst' . Below are
> + the flags reported in this sysfs file:
> +
> + * "not_armed" : Indicates that NVDIMM contents will not
> + survive a power cycle.
> + * "flush_fail" : Indicates that NVDIMM contents
> + couldn't be flushed during last
> + shut-down event.
> + * "restore_fail": Indicates that NVDIMM contents
> + couldn't be restored during NVDIMM
> + initialization.
> + * "encrypted" : NVDIMM contents are encrypted.
> + * "smart_notify": There is health event for the NVDIMM.
> + * "scrubbed" : Indicating that contents of the
> + NVDIMM have been scrubbed.
> + * "locked" : Indicating that NVDIMM contents cant
> + be modified until next power cycle.
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index f35592423380..142636e1a59f 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,7 @@
> #include <linux/libnvdimm.h>
> #include <linux/platform_device.h>
> #include <linux/delay.h>
> +#include <linux/seq_buf.h>
>
> #include <asm/plpar_wrappers.h>
>
> @@ -22,6 +23,44 @@
> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> (1ul << ND_CMD_SET_CONFIG_DATA))
>
> +/* DIMM health bitmap indicators */
> +/* SCM device is unable to persist memory contents */
> +#define PAPR_SCM_DIMM_UNARMED (1ULL << (63 - 0))
> +/* SCM device failed to persist memory contents */
> +#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
> +/* SCM device contents are persisted from previous IPL */
> +#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
> +/* SCM device contents are not persisted from previous IPL */
> +#define PAPR_SCM_DIMM_EMPTY (1ULL << (63 - 3))
> +/* SCM device memory life remaining is critically low */
> +#define PAPR_SCM_DIMM_HEALTH_CRITICAL (1ULL << (63 - 4))
> +/* SCM device will be garded off next IPL due to failure */
> +#define PAPR_SCM_DIMM_HEALTH_FATAL (1ULL << (63 - 5))
> +/* SCM contents cannot persist due to current platform health status */
> +#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
> +/* SCM device is unable to persist memory contents in certain conditions */
> +#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
> +/* SCM device is encrypted */
> +#define PAPR_SCM_DIMM_ENCRYPTED (1ULL << (63 - 8))
> +/* SCM device has been scrubbed and locked */
> +#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
> +
> +/* Bits status indicators for health bitmap indicating unarmed dimm */
> +#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED | \
> + PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +/* Bits status indicators for health bitmap indicating unflushed dimm */
> +#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
> +
> +/* Bits status indicators for health bitmap indicating unrestored dimm */
> +#define PAPR_SCM_DIMM_BAD_RESTORE_MASK (PAPR_SCM_DIMM_EMPTY)
> +
> +/* Bit status indicators for smart event notification */
> +#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
> + PAPR_SCM_DIMM_HEALTH_FATAL | \
> + PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
> +
> +/* private struct associated with each region */
> struct papr_scm_priv {
> struct platform_device *pdev;
> struct device_node *dn;
> @@ -39,6 +78,15 @@ struct papr_scm_priv {
> struct resource res;
> struct nd_region *region;
> struct nd_interleave_set nd_set;
> +
> + /* Protect dimm health data from concurrent read/writes */
> + struct mutex health_mutex;
> +
> + /* Last time the health information of the dimm was updated */
> + unsigned long lasthealth_jiffies;
> +
> + /* Health information for the dimm */
> + u64 health_bitmap;
I wonder if this should be typed big endian as you mention that it is in the
commit message?
> };
>
> static int drc_pmem_bind(struct papr_scm_priv *p)
> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> return drc_pmem_bind(p);
> }
>
> +/*
> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> + * health information.
> + */
> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> + unsigned long ret[PLPAR_HCALL_BUFSIZE];
Is this exclusive to 64bit? Why not u64?
> + s64 rc;
plpar_hcall() returns long and this function returns int and rc is declared
s64?
Why not have them all be long to follow plpar_hcall?
> +
> + /* issue the hcall */
> + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> + if (rc != H_SUCCESS) {
> + dev_err(&p->pdev->dev,
> + "Failed to query health information, Err:%lld\n", rc);
> + rc = -ENXIO;
> + goto out;
> + }
> +
> + p->lasthealth_jiffies = jiffies;
> + p->health_bitmap = ret[0] & ret[1];
> +
> + dev_dbg(&p->pdev->dev,
> + "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> + ret[0], ret[1]);
> +out:
> + return rc;
> +}
> +
> +/* Min interval in seconds for assuming stable dimm health */
> +#define MIN_HEALTH_QUERY_INTERVAL 60
> +
> +/* Query cached health info and if needed call drc_pmem_query_health */
> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> +{
> + unsigned long cache_timeout;
> + s64 rc;
> +
> + /* Protect concurrent modifications to papr_scm_priv */
> + rc = mutex_lock_interruptible(&p->health_mutex);
> + if (rc)
> + return rc;
> +
> + /* Jiffies offset for which the health data is assumed to be same */
> + cache_timeout = p->lasthealth_jiffies +
> + msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> +
> + /* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
> + if (time_after(jiffies, cache_timeout))
> + rc = __drc_pmem_query_health(p);
And back to s64 after returning int?
> + else
> + /* Assume cached health data is valid */
> + rc = 0;
> +
> + mutex_unlock(&p->health_mutex);
> + return rc;
> +}
>
> static int papr_scm_meta_get(struct papr_scm_priv *p,
> struct nd_cmd_get_config_data_hdr *hdr)
> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> return 0;
> }
>
> +static ssize_t flags_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct nvdimm *dimm = to_nvdimm(dev);
> + struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> + struct seq_buf s;
> + u64 health;
> + int rc;
> +
> + rc = drc_pmem_query_health(p);
and back to int...
Just make them long all through...
Ira
> + if (rc)
> + return rc;
> +
> + /* Copy health_bitmap locally, check masks & update out buffer */
> + health = READ_ONCE(p->health_bitmap);
> +
> + seq_buf_init(&s, buf, PAGE_SIZE);
> + if (health & PAPR_SCM_DIMM_UNARMED_MASK)
> + seq_buf_printf(&s, "not_armed ");
> +
> + if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
> + seq_buf_printf(&s, "flush_fail ");
> +
> + if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
> + seq_buf_printf(&s, "restore_fail ");
> +
> + if (health & PAPR_SCM_DIMM_ENCRYPTED)
> + seq_buf_printf(&s, "encrypted ");
> +
> + if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
> + seq_buf_printf(&s, "smart_notify ");
> +
> + if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
> + seq_buf_printf(&s, "scrubbed locked ");
> +
> + if (seq_buf_used(&s))
> + seq_buf_printf(&s, "\n");
> +
> + return seq_buf_used(&s);
> +}
> +DEVICE_ATTR_RO(flags);
> +
> +/* papr_scm specific dimm attributes */
> +static struct attribute *papr_scm_nd_attributes[] = {
> + &dev_attr_flags.attr,
> + NULL,
> +};
> +
> +static struct attribute_group papr_scm_nd_attribute_group = {
> + .name = "papr",
> + .attrs = papr_scm_nd_attributes,
> +};
> +
> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
> + &papr_scm_nd_attribute_group,
> + NULL,
> +};
> +
> static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> {
> struct device *dev = &p->pdev->dev;
> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> dimm_flags = 0;
> set_bit(NDD_LABELING, &dimm_flags);
>
> - p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> - PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> if (!p->nvdimm) {
> dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
> goto err;
> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
> if (!p)
> return -ENOMEM;
>
> + /* Initialize the dimm mutex */
> + mutex_init(&p->health_mutex);
> +
> /* optional DT properties */
> of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>
> --
> 2.26.2
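As background for the caching logic quoted above: drc_pmem_query_health() relies on time_after() so that the 60s expiry check keeps working when the jiffies counter wraps around. The sketch below restates that comparison in userspace. The sketch_* names are hypothetical, the tick values are plain numbers rather than real jiffies, and the cast assumes the usual two's-complement conversion from unsigned long to long.

```c
#include <assert.h>
#include <stdbool.h>

static_assert(sizeof(long) == sizeof(unsigned long),
	      "the signed cast below relies on matching widths");

/* True if tick 'a' is strictly after tick 'b', even across a wrap. */
static bool sketch_tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Cached data is stale once 'now' has moved past last + interval. */
static bool sketch_health_cache_expired(unsigned long now,
					unsigned long last,
					unsigned long interval)
{
	return sketch_tick_after(now, last + interval);
}
```

A plain `now > last + interval` comparison would give the wrong answer when the sum wraps past the maximum counter value, which is the case the subtraction-based form handles.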
On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
> modules and add the command family to the white list of NVDIMM command
> sets. Also advertise support for ND_CMD_CALL for the dimm
> command mask and implement necessary scaffolding in the module to
> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>
> The layout of the PDSM request as we expect from libnvdimm/libndctl is
> described in newly introduced uapi header 'papr_scm_pdsm.h' which
> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
> to communicate the PDSM request via member
> 'nd_pkg_papr_scm->nd_command' and size of payload that need to be
> sent/received for servicing the PDSM.
>
> A new function is_cmd_valid() is implemented that reads the args to
> papr_scm_ndctl() and performs sanity tests on them. A new function
> papr_scm_service_pdsm() is introduced and is called from
> papr_scm_ndctl() when a PDSM request is received via the ND_CMD_CALL
> command from libnvdimm.
>
> Cc: Dan Williams <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: "Aneesh Kumar K . V" <[email protected]>
> Signed-off-by: Vaibhav Jain <[email protected]>
> ---
> Changelog:
>
> Resend:
> * None
>
> v6..v7 :
> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
> [Mpe].
> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
> [Mpe].
> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>
> v5..v6 :
> * Changed the usage of the term DSM to PDSM to distinguish it from the
> ACPI term [ Dan Williams ]
> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
> to reflect the new terminology.
> * Updated the patch description and title to reflect the new terminology.
> * Squashed patch to introduce new command family in 'ndctl.h' with
> this patch [ Dan Williams ]
> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
> [ Dan Williams ]
> * Removed redundant license text from the papr_scm_pdsm.h file.
> [ Dan Williams ]
> * s/envelop/envelope/ at various places [ Dan Williams ]
> * Added '__packed' attribute to command package header to guard
> against different compilers adding padding between the fields.
> [ Dan Williams]
> * Converted various pr_debug to dev_dbg [ Dan Williams ]
>
> v4..v5 :
> * None
>
> v3..v4 :
> * None
>
> v2..v3 :
> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>
> v1..v2 :
> * None
> ---
> arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++++++++++++++++++
> arch/powerpc/platforms/pseries/papr_scm.c | 101 ++++++++++++-
> include/uapi/linux/ndctl.h | 1 +
> 3 files changed, 230 insertions(+), 6 deletions(-)
> create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
>
> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
> new file mode 100644
> index 000000000000..671693439c1c
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/*
> + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl
> + *
> + * (C) Copyright IBM 2020
> + *
> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
> +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
> +
> +#include <linux/types.h>
> +
> +/*
> + * PDSM Envelope:
> + *
> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
> + * 'envelopes' which consists of a header and user-defined payload sections.
> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
> + * payload following it and offset of which relative to the struct is provided
> + * by 'nd_pdsm_cmd_pkg.payload_offset'. *
> + *
> + * +-------------+---------------------+---------------------------+
> + * | 64-Bytes | 8-Bytes | Max 184-Bytes |
> + * +-------------+---------------------+---------------------------+
> + * | nd_pdsm_cmd_pkg | |
> + * |-------------+ | |
> + * | nd_cmd_pkg | | |
> + * +-------------+---------------------+---------------------------+
> + * | nd_family | | |
> + * | nd_size_out | cmd_status | |
> + * | nd_size_in | payload_version | PAYLOAD |
> + * | nd_command | payload_offset -----> |
> + * | nd_fw_size | | |
> + * +-------------+---------------------+---------------------------+
> + *
> + * PDSM Header:
> + *
> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
> + * contained in 'struct nd_cmd_pkg', the header also has members following
> + * members:
> + *
> + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
> + * 'payload_version' : (In/Out) Version number associated with the payload.
> + * 'payload_offset' : (In)Relative offset of payload from start of envelope.
> + *
> + * PDSM Payload:
> + *
> + * The layout of the PDSM Payload is defined by various structs shared between
> + * papr_scm and libndctl so that contents of payload can be interpreted. During
> + * servicing of a PDSM the papr_scm module will read input args from the payload
> + * field by casting its contents to an appropriate struct pointer based on the
> + * PDSM command. Similarly the output of servicing the PDSM command will be
> + * copied to the payload field using the same struct.
> + *
> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
> + * leaves around 184 bytes for the envelope payload (ignoring any padding that
> + * the compiler may silently introduce).
> + *
> + * Payload Version:
> + *
> + * A 'payload_version' field is present in PDSM header that indicates a specific
> + * version of the structure present in PDSM Payload for a given PDSM command.
> + * This provides backward compatibility in case the PDSM Payload structure
> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
> + *
> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
> + * module when servicing the PDSM envelope checks the 'payload_version' and then
> + * uses 'payload struct version' == MIN('payload_version field',
> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
> + * struct in returned 'payload_version' field.
FWIW many people believe using a size rather than version is more sustainable.
It is expected that new payload structures are larger (more features) than the
previous payload structure.
I can't find references at the moment though.
What does payload_version provide us that the command size in/out does not?
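To make the size-based alternative concrete, here is a minimal userspace sketch of how a receiver could accept payloads of different sizes without a version field. The struct names and copy_sized_payload() are hypothetical and not part of the patch. The idea is similar in spirit to the kernel's copy_struct_from_user() helper: shorter (older) payloads are zero-extended, and longer (newer) payloads are accepted only if the bytes the receiver does not understand are zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical payload layouts, for illustration only. */
struct health_v1 { unsigned long long a; };
struct health_v2 { unsigned long long a; unsigned long long b; };

/*
 * Copy a caller-supplied payload of 'usize' bytes into the newest layout
 * known to the receiver. Shorter (older) payloads are zero-extended;
 * longer (newer) payloads are accepted only if the extra bytes are zero.
 * Returns 0 on success, -1 if unknown non-zero trailing data is present.
 */
static int copy_sized_payload(struct health_v2 *dst, const void *src,
			      size_t usize)
{
	const unsigned char *bytes = src;
	size_t ksize = sizeof(*dst);
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	/* Reject data beyond what this receiver understands unless it is zero */
	for (i = ksize; i < usize; i++)
		if (bytes[i] != 0)
			return -1;

	/* Zero-fill first so that fields missing from old payloads read as 0 */
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With this scheme the size passed by the caller plays the role of the version, on the assumption that newer layouts only ever append fields.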
> + *
> + * Libndctl on receiving the envelope back from papr_scm again checks the
> + * 'payload_version' field and based on it use the appropriate version dsm
> + * struct to parse the results.
> + *
> + * Backward Compatibility:
> + *
> + * Above scheme of exchanging different versioned PDSM struct between libndctl
> + * and papr_scm should provide backward compatibility until following two
> + * assumptions/conditions when defining new PDSM structs hold:
> + *
> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
> + *
> + * 1. T(X) is a proper subset of T(Y) if X > Y.
Proper superset? Or Y > X?
Ira
> + * i.e Each new version of PDSM struct should retain existing struct
> + * attributes from previous version
> + *
> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
> + * it should also support T(1), T(2)...T(X - 1).
> + * i.e When adding support for new version of a PDSM struct, libndctl
> + * and papr_scm should retain support of the existing PDSM struct
> + * version they support.
> + */
> +
> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
> +struct nd_pdsm_cmd_pkg {
> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
> + __u16 payload_offset; /* In: offset from start of struct */
> + __u16 payload_version; /* In/Out: version of the payload */
> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
> +} __packed;
> +
> +/*
> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
> + */
> +enum papr_scm_pdsm {
> + PAPR_SCM_PDSM_MIN = 0x0,
> + PAPR_SCM_PDSM_MAX,
> +};
> +
> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
> +{
> + return (struct nd_pdsm_cmd_pkg *) cmd;
> +}
> +
> +/* Return the payload pointer for a given pcmd */
> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
> +{
> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
> + return NULL;
> + else
> + return (void *)((__u8 *) pcmd + pcmd->payload_offset);
> +}
> +
> +#endif /* _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index 142636e1a59f..ed4b49a6f1e1 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -15,13 +15,15 @@
> #include <linux/seq_buf.h>
>
> #include <asm/plpar_wrappers.h>
> +#include <asm/papr_scm_pdsm.h>
>
> #define BIND_ANY_ADDR (~0ul)
>
> #define PAPR_SCM_DIMM_CMD_MASK \
> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
> (1ul << ND_CMD_GET_CONFIG_DATA) | \
> - (1ul << ND_CMD_SET_CONFIG_DATA))
> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
> + (1ul << ND_CMD_CALL))
>
> /* DIMM health bitmap bitmap indicators */
> /* SCM device is unable to persist memory contents */
> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
> return 0;
> }
>
> +/*
> + * Validate the inputs args to dimm-control function and return '0' if valid.
> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
> + */
> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> + unsigned int buf_len)
> +{
> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
> + struct papr_scm_priv *p;
> +
> + /* Only dimm-specific calls are supported atm */
> + if (!nvdimm)
> + return -EINVAL;
> +
> + /* get the provider date from struct nvdimm */
> + p = nvdimm_provider_data(nvdimm);
> +
> + if (!test_bit(cmd, &cmd_mask)) {
> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
> + return -EINVAL;
> + } else if (cmd == ND_CMD_CALL) {
> +
> + /* Verify the envelope package */
> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
> + buf_len);
> + return -EINVAL;
> + }
> +
> + /* Verify that the PDSM family is valid */
> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR_SCM) {
> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
> + pkg->hdr.nd_family);
> + return -EINVAL;
> +
> + }
> +
> + /* We except a payload with all PDSM commands */
> + if (pdsm_cmd_to_payload(pkg) == NULL) {
> + dev_dbg(&p->pdev->dev,
> + "Empty payload for sub-command=0x%llx\n",
> + pkg->hdr.nd_command);
> + return -EINVAL;
> + }
> + }
> +
> + /* Command looks valid */
> + return 0;
> +}
> +
> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
> + struct nd_pdsm_cmd_pkg *call_pkg)
> +{
> + /* unknown subcommands return error in packages */
> + if (call_pkg->hdr.nd_command <= PAPR_SCM_PDSM_MIN ||
> + call_pkg->hdr.nd_command >= PAPR_SCM_PDSM_MAX) {
> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
> + call_pkg->hdr.nd_command);
> + call_pkg->cmd_status = -EINVAL;
> + return 0;
> + }
> +
> + /* Depending on the DSM command call appropriate service routine */
> + switch (call_pkg->hdr.nd_command) {
> + default:
> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
> + call_pkg->hdr.nd_command);
> + call_pkg->cmd_status = -ENOENT;
> + return 0;
> + }
> +}
> +
> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
> unsigned int buf_len, int *cmd_rc)
> {
> struct nd_cmd_get_config_size *get_size_hdr;
> struct papr_scm_priv *p;
> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
> + int rc;
>
> - /* Only dimm-specific calls are supported atm */
> - if (!nvdimm)
> - return -EINVAL;
> + /* Use a local variable in case cmd_rc pointer is NULL */
> + if (cmd_rc == NULL)
> + cmd_rc = &rc;
> +
> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
> + if (*cmd_rc) {
> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
> + return *cmd_rc;
> + }
>
> p = nvdimm_provider_data(nvdimm);
>
> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> *cmd_rc = papr_scm_meta_set(p, buf);
> break;
>
> + case ND_CMD_CALL:
> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
> + break;
> +
> default:
> - return -EINVAL;
> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
> + *cmd_rc = -EINVAL;
> }
>
> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>
> - return 0;
> + return *cmd_rc;
> }
>
> static ssize_t flags_show(struct device *dev,
> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
> index de5d90212409..99fb60600ef8 100644
> --- a/include/uapi/linux/ndctl.h
> +++ b/include/uapi/linux/ndctl.h
> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
> #define NVDIMM_FAMILY_HPE2 2
> #define NVDIMM_FAMILY_MSFT 3
> #define NVDIMM_FAMILY_HYPERV 4
> +#define NVDIMM_FAMILY_PAPR_SCM 5
>
> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
> struct nd_cmd_pkg)
> --
> 2.26.2
> _______________________________________________
> Linux-nvdimm mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
s/seq_buf: Export seq_buf_printf() to external modules/seq_buf: export seq_buf_printf/
Thanks for reviewing this patch Ira. My responses below:
Ira Weiny <[email protected]> writes:
> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>> Implement support for fetching nvdimm health information via
>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>> 'struct papr_scm_priv' and subsequently partially exposed to
>> user-space via newly introduced dimm specific attribute
>> 'papr/flags'. Since the hcall is costly, the health information is
>> cached and only re-queried, 60s after the previous successful hcall.
>>
>> The patch also adds a documentation text describing flags reported by
>> the the new sysfs attribute 'papr/flags' is also introduced at
>> Documentation/ABI/testing/sysfs-bus-papr-scm.
>>
>> [1] commit 58b278f568f0 ("powerpc: Provide initial documentation for
>> PAPR hcalls")
>>
>> Cc: Dan Williams <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: "Aneesh Kumar K . V" <[email protected]>
>> Signed-off-by: Vaibhav Jain <[email protected]>
>> ---
>> Changelog:
>>
>> Resend:
>> * None
>>
>> v6..v7 :
>> * Used the exported buf_seq_printf() function to generate content for
>> 'papr/flags'
>> * Moved the PAPR_SCM_DIMM_* bit-flags macro definitions to papr_scm.c
>> and removed the papr_scm.h file [Mpe]
>> * Some minor consistency issued in sysfs-bus-papr-scm
>> documentation. [Mpe]
>> * s/dimm_mutex/health_mutex/g [Mpe]
>> * Split drc_pmem_query_health() into two function one of which takes
>> care of caching and locking. [Mpe]
>> * Fixed a local copy creation of dimm health information using
>> READ_ONCE(). [Mpe]
>>
>> v5..v6 :
>> * Change the flags sysfs attribute from 'papr_flags' to 'papr/flags'
>> [Dan Williams]
>> * Include documentation for 'papr/flags' attr [Dan Williams]
>> * Change flag 'save_fail' to 'flush_fail' [Dan Williams]
>> * Caching of health bitmap to reduce expensive hcalls [Dan Williams]
>> * Removed usage of PPC_BIT from 'papr-scm.h' header [Mpe]
>> * Replaced two __be64 integers from papr_scm_priv to a single u64
>> integer [Mpe]
>> * Updated patch description to reflect the changes made in this
>> version.
>> * Removed avoidable usage of 'papr_scm_priv.dimm_mutex' from
>> flags_show() [Dan Williams]
>>
>> v4..v5 :
>> * None
>>
>> v3..v4 :
>> * None
>>
>> v2..v3 :
>> * Removed PAPR_SCM_DIMM_HEALTH_NON_CRITICAL as a condition for
>> NVDIMM unarmed [Aneesh]
>>
>> v1..v2 :
>> * New patch in the series.
>> ---
>> Documentation/ABI/testing/sysfs-bus-papr-scm | 27 +++
>> arch/powerpc/platforms/pseries/papr_scm.c | 169 ++++++++++++++++++-
>> 2 files changed, 194 insertions(+), 2 deletions(-)
>> create mode 100644 Documentation/ABI/testing/sysfs-bus-papr-scm
>>
>> diff --git a/Documentation/ABI/testing/sysfs-bus-papr-scm b/Documentation/ABI/testing/sysfs-bus-papr-scm
>> new file mode 100644
>> index 000000000000..6143d06072f1
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/sysfs-bus-papr-scm
>> @@ -0,0 +1,27 @@
>> +What: /sys/bus/nd/devices/nmemX/papr/flags
>> +Date: Apr, 2020
>> +KernelVersion: v5.8
>> +Contact: linuxppc-dev <[email protected]>, [email protected],
>> +Description:
>> + (RO) Report flags indicating various states of a
>> + papr-scm NVDIMM device. Each flag maps to a one or
>> + more bits set in the dimm-health-bitmap retrieved in
>> + response to H_SCM_HEALTH hcall. The details of the bit
>> + flags returned in response to this hcall is available
>> + at 'Documentation/powerpc/papr_hcalls.rst' . Below are
>> + the flags reported in this sysfs file:
>> +
>> + * "not_armed" : Indicates that NVDIMM contents will not
>> + survive a power cycle.
>> + * "flush_fail" : Indicates that NVDIMM contents
>> + couldn't be flushed during last
>> + shut-down event.
>> + * "restore_fail": Indicates that NVDIMM contents
>> + couldn't be restored during NVDIMM
>> + initialization.
>> + * "encrypted" : NVDIMM contents are encrypted.
>> + * "smart_notify": There is health event for the NVDIMM.
>> + * "scrubbed" : Indicating that contents of the
>> + NVDIMM have been scrubbed.
>> + * "locked" : Indicating that NVDIMM contents cant
>> + be modified until next power cycle.
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index f35592423380..142636e1a59f 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -12,6 +12,7 @@
>> #include <linux/libnvdimm.h>
>> #include <linux/platform_device.h>
>> #include <linux/delay.h>
>> +#include <linux/seq_buf.h>
>>
>> #include <asm/plpar_wrappers.h>
>>
>> @@ -22,6 +23,44 @@
>> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> (1ul << ND_CMD_SET_CONFIG_DATA))
>>
>> +/* DIMM health bitmap bitmap indicators */
>> +/* SCM device is unable to persist memory contents */
>> +#define PAPR_SCM_DIMM_UNARMED (1ULL << (63 - 0))
>> +/* SCM device failed to persist memory contents */
>> +#define PAPR_SCM_DIMM_SHUTDOWN_DIRTY (1ULL << (63 - 1))
>> +/* SCM device contents are persisted from previous IPL */
>> +#define PAPR_SCM_DIMM_SHUTDOWN_CLEAN (1ULL << (63 - 2))
>> +/* SCM device contents are not persisted from previous IPL */
>> +#define PAPR_SCM_DIMM_EMPTY (1ULL << (63 - 3))
>> +/* SCM device memory life remaining is critically low */
>> +#define PAPR_SCM_DIMM_HEALTH_CRITICAL (1ULL << (63 - 4))
>> +/* SCM device will be garded off next IPL due to failure */
>> +#define PAPR_SCM_DIMM_HEALTH_FATAL (1ULL << (63 - 5))
>> +/* SCM contents cannot persist due to current platform health status */
>> +#define PAPR_SCM_DIMM_HEALTH_UNHEALTHY (1ULL << (63 - 6))
>> +/* SCM device is unable to persist memory contents in certain conditions */
>> +#define PAPR_SCM_DIMM_HEALTH_NON_CRITICAL (1ULL << (63 - 7))
>> +/* SCM device is encrypted */
>> +#define PAPR_SCM_DIMM_ENCRYPTED (1ULL << (63 - 8))
>> +/* SCM device has been scrubbed and locked */
>> +#define PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED (1ULL << (63 - 9))
>> +
>> +/* Bits status indicators for health bitmap indicating unarmed dimm */
>> +#define PAPR_SCM_DIMM_UNARMED_MASK (PAPR_SCM_DIMM_UNARMED | \
>> + PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
>> +
>> +/* Bits status indicators for health bitmap indicating unflushed dimm */
>> +#define PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK (PAPR_SCM_DIMM_SHUTDOWN_DIRTY)
>> +
>> +/* Bits status indicators for health bitmap indicating unrestored dimm */
>> +#define PAPR_SCM_DIMM_BAD_RESTORE_MASK (PAPR_SCM_DIMM_EMPTY)
>> +
>> +/* Bit status indicators for smart event notification */
>> +#define PAPR_SCM_DIMM_SMART_EVENT_MASK (PAPR_SCM_DIMM_HEALTH_CRITICAL | \
>> + PAPR_SCM_DIMM_HEALTH_FATAL | \
>> + PAPR_SCM_DIMM_HEALTH_UNHEALTHY)
>> +
>> +/* private struct associated with each region */
>> struct papr_scm_priv {
>> struct platform_device *pdev;
>> struct device_node *dn;
>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>> struct resource res;
>> struct nd_region *region;
>> struct nd_interleave_set nd_set;
>> +
>> + /* Protect dimm health data from concurrent read/writes */
>> + struct mutex health_mutex;
>> +
>> + /* Last time the health information of the dimm was updated */
>> + unsigned long lasthealth_jiffies;
>> +
>> + /* Health information for the dimm */
>> + u64 health_bitmap;
>
> I wonder if this should be typed big endian as you mention that it is in the
> commit message?
This was discussed in an earlier review of the patch series at
https://lore.kernel.org/linux-nvdimm/[email protected]
Even though the health bitmap is returned in big-endian format (for
example, the value 0xC000000000000000 indicates that bits 0 and 1 are
set), its numeric value is never used. Instead, the code only tests
whether specific bits are set in the register.
Hence a native CPU type is used instead of __be64 to store this value.
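As a small illustration of the bit tests described above, the sketch below uses the same '(1ULL << (63 - n))' convention as the patch, where bit 0 is the most significant bit. The PAPR_BIT() helper, the shortened macro names and health_test() are made up for this example and are not part of the patch.

```c
#include <stdint.h>

/* PAPR numbers bits from the most significant end: bit 0 is the MSB. */
#define PAPR_BIT(n)		(1ULL << (63 - (n)))

/* Same values as the patch's definitions, written with the helper above. */
#define DIMM_UNARMED		PAPR_BIT(0)
#define DIMM_SHUTDOWN_DIRTY	PAPR_BIT(1)

/* Returns non-zero if any bit of 'mask' is set in 'bitmap'. */
static int health_test(uint64_t bitmap, uint64_t mask)
{
	return (bitmap & mask) != 0;
}
```

Since only mask tests like this are performed, the bitmap can be held in a plain u64 once the hcall has returned it in a register.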
>
>> };
>>
>> static int drc_pmem_bind(struct papr_scm_priv *p)
>> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
>> return drc_pmem_bind(p);
>> }
>>
>> +/*
>> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
>> + * health information.
>> + */
>> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
>> +{
>> + unsigned long ret[PLPAR_HCALL_BUFSIZE];
>
> Is this exclusive to 64bit? Why not u64?
Yes, this is specific to 64-bit, as the array holds 64-bit register values
returned from PHYP. We could use u64 here, but that would be a departure
from the existing practice within arch/powerpc code of using an unsigned
long array to fetch the values returned from PHYP.
>
>> + s64 rc;
>
> plpar_hcall() returns long and this function returns int and rc is declared
> s64?
>
> Why not have them all be long to follow plpar_hcall?
Yes, the 'long' type is better suited for the variable 'rc' and I will get
it fixed. But the hcall status held in 'rc' is never directly returned from
this function; we always return kernel error codes instead. Hence the
return type of this function is consistent.
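A rough sketch of that split between the hcall status (a 'long') and the function's own return value (an 'int' error code) could look like the following. The names SKETCH_H_SUCCESS and hcall_status_to_errno() are hypothetical and only stand in for what __drc_pmem_query_health() does inline.

```c
#include <errno.h>

/* Stand-in for the hypervisor success code; H_SUCCESS is 0 in the kernel. */
#define SKETCH_H_SUCCESS	0L

/*
 * Translate a raw hcall status (long) into a kernel-style error code (int).
 * The raw status itself is never handed back to the caller.
 */
static int hcall_status_to_errno(long hcall_rc)
{
	if (hcall_rc != SKETCH_H_SUCCESS)
		return -ENXIO;

	return 0;
}
```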
>
>> +
>> + /* issue the hcall */
>> + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
>> + if (rc != H_SUCCESS) {
>> + dev_err(&p->pdev->dev,
>> + "Failed to query health information, Err:%lld\n", rc);
>> + rc = -ENXIO;
>> + goto out;
>> + }
>> +
>> + p->lasthealth_jiffies = jiffies;
>> + p->health_bitmap = ret[0] & ret[1];
>> +
>> + dev_dbg(&p->pdev->dev,
>> + "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
>> + ret[0], ret[1]);
>> +out:
>> + return rc;
>> +}
>> +
>> +/* Min interval in seconds for assuming stable dimm health */
>> +#define MIN_HEALTH_QUERY_INTERVAL 60
>> +
>> +/* Query cached health info and if needed call drc_pmem_query_health */
>> +static int drc_pmem_query_health(struct papr_scm_priv *p)
>> +{
>> + unsigned long cache_timeout;
>> + s64 rc;
>> +
>> + /* Protect concurrent modifications to papr_scm_priv */
>> + rc = mutex_lock_interruptible(&p->health_mutex);
>> + if (rc)
>> + return rc;
>> +
>> + /* Jiffies offset for which the health data is assumed to be same */
>> + cache_timeout = p->lasthealth_jiffies +
>> + msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
>> +
>> + /* Fetch new health info is its older than MIN_HEALTH_QUERY_INTERVAL */
>> + if (time_after(jiffies, cache_timeout))
>> + rc = __drc_pmem_query_health(p);
>
> And back to s64 after returning int?
Agree, will change 's64 rc' to 'int rc'.
>
>> + else
>> + /* Assume cached health data is valid */
>> + rc = 0;
>> +
>> + mutex_unlock(&p->health_mutex);
>> + return rc;
>> +}
>>
>> static int papr_scm_meta_get(struct papr_scm_priv *p,
>> struct nd_cmd_get_config_data_hdr *hdr)
>> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> return 0;
>> }
>>
>> +static ssize_t flags_show(struct device *dev,
>> + struct device_attribute *attr, char *buf)
>> +{
>> + struct nvdimm *dimm = to_nvdimm(dev);
>> + struct papr_scm_priv *p = nvdimm_provider_data(dimm);
>> + struct seq_buf s;
>> + u64 health;
>> + int rc;
>> +
>> + rc = drc_pmem_query_health(p);
>
> and back to int...
>
drc_pmem_query_health() returns an 'int' so the type of variable 'rc'
looks correct to me.
> Just make them long all through...
I think the return type of all the functions above is already 'int'; the
only inconsistency is the 's64 rc' in drc_pmem_query_health() that you
pointed out. With that fixed, the usage of the 'int' return type across
these functions will be consistent.
>
> Ira
>
>> + if (rc)
>> + return rc;
>> +
>> + /* Copy health_bitmap locally, check masks & update out buffer */
>> + health = READ_ONCE(p->health_bitmap);
>> +
>> + seq_buf_init(&s, buf, PAGE_SIZE);
>> + if (health & PAPR_SCM_DIMM_UNARMED_MASK)
>> + seq_buf_printf(&s, "not_armed ");
>> +
>> + if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
>> + seq_buf_printf(&s, "flush_fail ");
>> +
>> + if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
>> + seq_buf_printf(&s, "restore_fail ");
>> +
>> + if (health & PAPR_SCM_DIMM_ENCRYPTED)
>> + seq_buf_printf(&s, "encrypted ");
>> +
>> + if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
>> + seq_buf_printf(&s, "smart_notify ");
>> +
>> + if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
>> + seq_buf_printf(&s, "scrubbed locked ");
>> +
>> + if (seq_buf_used(&s))
>> + seq_buf_printf(&s, "\n");
>> +
>> + return seq_buf_used(&s);
>> +}
>> +DEVICE_ATTR_RO(flags);
>> +
>> +/* papr_scm specific dimm attributes */
>> +static struct attribute *papr_scm_nd_attributes[] = {
>> + &dev_attr_flags.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group papr_scm_nd_attribute_group = {
>> + .name = "papr",
>> + .attrs = papr_scm_nd_attributes,
>> +};
>> +
>> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
>> + &papr_scm_nd_attribute_group,
>> + NULL,
>> +};
>> +
>> static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>> {
>> struct device *dev = &p->pdev->dev;
>> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>> dimm_flags = 0;
>> set_bit(NDD_LABELING, &dimm_flags);
>>
>> - p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
>> - PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>> + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
>> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
>> if (!p->nvdimm) {
>> dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
>> goto err;
>> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
>> if (!p)
>> return -ENOMEM;
>>
>> + /* Initialize the dimm mutex */
>> + mutex_init(&p->health_mutex);
>> +
>> /* optional DT properties */
>> of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
>>
>> --
>> 2.26.2
--
Cheers
~ Vaibhav
Thanks for reviewing this patch Ira. My responses below:
Ira Weiny <[email protected]> writes:
> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>> modules and add the command family to the white list of NVDIMM command
>> sets. Also advertise support for ND_CMD_CALL for the dimm
>> command mask and implement necessary scaffolding in the module to
>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>
>> The layout of the PDSM request as we expect from libnvdimm/libndctl is
>> described in newly introduced uapi header 'papr_scm_pdsm.h' which
>> defines a new 'struct nd_pdsm_cmd_pkg' header. This header is used
>> to communicate the PDSM request via member
>> 'nd_pkg_papr_scm->nd_command' and size of payload that need to be
>> sent/received for servicing the PDSM.
>>
>> A new function is_cmd_valid() is implemented that reads the args to
>> papr_scm_ndctl() and performs sanity tests on them. A new function
>> papr_scm_service_pdsm() is introduced and is called from
>> papr_scm_ndctl() in case of a PDSM request is received via ND_CMD_CALL
>> command from libnvdimm.
>>
>> Cc: Dan Williams <[email protected]>
>> Cc: Michael Ellerman <[email protected]>
>> Cc: "Aneesh Kumar K . V" <[email protected]>
>> Signed-off-by: Vaibhav Jain <[email protected]>
>> ---
>> Changelog:
>>
>> Resend:
>> * None
>>
>> v6..v7 :
>> * Removed the re-definitions of __packed macro from papr_scm_pdsm.h
>> [Mpe].
>> * Removed the usage of __KERNEL__ macros in papr_scm_pdsm.h [Mpe].
>> * Removed macros that were unused in papr_scm.c from papr_scm_pdsm.h
>> [Mpe].
>> * Made functions defined in papr_scm_pdsm.h as static inline. [Mpe]
>>
>> v5..v6 :
>> * Changed the usage of the term DSM to PDSM to distinguish it from the
>> ACPI term [ Dan Williams ]
>> * Renamed papr_scm_dsm.h to papr_scm_pdsm.h and updated various struct
>> to reflect the new terminology.
>> * Updated the patch description and title to reflect the new terminology.
>> * Squashed patch to introduce new command family in 'ndctl.h' with
>> this patch [ Dan Williams ]
>> * Updated the papr_scm_pdsm method starting index from 0x10000 to 0x0
>> [ Dan Williams ]
>> * Removed redundant license text from the papr_scm_psdm.h file.
>> [ Dan Williams ]
>> * s/envelop/envelope/ at various places [ Dan Williams ]
>> * Added '__packed' attribute to command package header to gaurd
>> against different compiler adding paddings between the fields.
>> [ Dan Williams]
>> * Converted various pr_debug to dev_debug [ Dan Williams ]
>>
>> v4..v5 :
>> * None
>>
>> v3..v4 :
>> * None
>>
>> v2..v3 :
>> * Updated the patch prefix to 'ndctl/uapi' [Aneesh]
>>
>> v1..v2 :
>> * None
>> ---
>> arch/powerpc/include/uapi/asm/papr_scm_pdsm.h | 134 ++++++++++++++++++
>> arch/powerpc/platforms/pseries/papr_scm.c | 101 ++++++++++++-
>> include/uapi/linux/ndctl.h | 1 +
>> 3 files changed, 230 insertions(+), 6 deletions(-)
>> create mode 100644 arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
>>
>> diff --git a/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
>> new file mode 100644
>> index 000000000000..671693439c1c
>> --- /dev/null
>> +++ b/arch/powerpc/include/uapi/asm/papr_scm_pdsm.h
>> @@ -0,0 +1,134 @@
>> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> +/*
>> + * PAPR-SCM Dimm specific methods (PDSM) and structs for libndctl
>> + *
>> + * (C) Copyright IBM 2020
>> + *
>> + * Author: Vaibhav Jain <vaibhav at linux.ibm.com>
>> + */
>> +
>> +#ifndef _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
>> +#define _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/*
>> + * PDSM Envelope:
>> + *
>> + * The ioctl ND_CMD_CALL transfers data between user-space and kernel via
>> + * 'envelopes' which consists of a header and user-defined payload sections.
>> + * The header is described by 'struct nd_pdsm_cmd_pkg' which expects a
>> + * payload following it and offset of which relative to the struct is provided
>> + * by 'nd_pdsm_cmd_pkg.payload_offset'. *
>> + *
>> + * +-------------+---------------------+---------------------------+
>> + * | 64-Bytes | 8-Bytes | Max 184-Bytes |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_pdsm_cmd_pkg | |
>> + * |-------------+ | |
>> + * | nd_cmd_pkg | | |
>> + * +-------------+---------------------+---------------------------+
>> + * | nd_family | | |
>> + * | nd_size_out | cmd_status | |
>> + * | nd_size_in | payload_version | PAYLOAD |
>> + * | nd_command | payload_offset -----> |
>> + * | nd_fw_size | | |
>> + * +-------------+---------------------+---------------------------+
>> + *
>> + * PDSM Header:
>> + *
>> + * The header is defined as 'struct nd_pdsm_cmd_pkg' which embeds a
>> + * 'struct nd_cmd_pkg' instance. The PDSM command is assigned to member
>> + * 'nd_cmd_pkg.nd_command'. Apart from size information of the envelope which is
>> + * contained in 'struct nd_cmd_pkg', the header also has members following
>> + * members:
>> + *
>> + * 'cmd_status' : (Out) Errors if any encountered while servicing PDSM.
>> + * 'payload_version' : (In/Out) Version number associated with the payload.
>> + * 'payload_offset' : (In)Relative offset of payload from start of envelope.
>> + *
>> + * PDSM Payload:
>> + *
>> + * The layout of the PDSM Payload is defined by various structs shared between
>> + * papr_scm and libndctl so that contents of payload can be interpreted. During
>> + * servicing of a PDSM the papr_scm module will read input args from the payload
>> + * field by casting its contents to an appropriate struct pointer based on the
>> + * PDSM command. Similarly the output of servicing the PDSM command will be
>> + * copied to the payload field using the same struct.
>> + *
>> + * 'libnvdimm' enforces a hard limit of 256 bytes on the envelope size, which
>> + * leaves around 184 bytes for the envelope payload (ignoring any padding that
>> + * the compiler may silently introduce).
>> + *
>> + * Payload Version:
>> + *
>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> + * version of the structure present in PDSM Payload for a given PDSM command.
>> + * This provides backward compatibility in case the PDSM Payload structure
>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> + *
>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> + * uses 'payload struct version' == MIN('payload_version field',
>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> + * struct in returned 'payload_version' field.
>
> FWIW many people believe using a size rather than version is more sustainable.
> It is expected that new payload structures are larger (more features) than the
> previous payload structure.
>
> I can't find references at the moment though.
>
> What does payload_version provide us that the command size in/out does
> not?
Taking constraints 1 & 2 mentioned below in section "Backward
Compatibility" into account, there should be a 1:1 mapping between the set
of valid 'payload_versions' and the set of corresponding in/out sizes.
However, it's much more intuitive and convenient to figure out the struct
type to use from a version number rather than from sizes. For example:
struct v1 { u64 a; }; sizeof(struct v1) == 8
struct v2 { u64 a, b; }; sizeof(struct v2) == 16
With version numbers it's easy to figure out which type to use. With
in/out sizes, however, an extra lookup is needed to identify the type to be used.
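To make the difference concrete, here is a rough sketch of the two dispatch
styles. The struct names follow the v1/v2 example above; the helper names
and the size table are hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload layouts, mirroring the v1/v2 example above. */
struct v1 { uint64_t a; };
struct v2 { uint64_t a, b; };

/* Version-based: the version number names the struct type directly. */
static size_t payload_size_for_version(uint16_t version)
{
	switch (version) {
	case 1: return sizeof(struct v1);
	case 2: return sizeof(struct v2);
	default: return 0;	/* unknown version */
	}
}

/* Size-based: the size has to be mapped back to a version first. */
static uint16_t version_for_payload_size(size_t size)
{
	static const size_t known_sizes[] = {
		sizeof(struct v1),	/* version 1 */
		sizeof(struct v2),	/* version 2 */
	};
	size_t i;

	for (i = 0; i < sizeof(known_sizes) / sizeof(known_sizes[0]); i++)
		if (known_sizes[i] == size)
			return (uint16_t)(i + 1);
	return 0;	/* no struct of this size */
}
```

The size-based variant only works while every version has a distinct size,
which is what constraints 1 & 2 are meant to guarantee.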
>
>> + *
>> + * Libndctl on receiving the envelope back from papr_scm again checks the
>> + * 'payload_version' field and based on it use the appropriate version dsm
>> + * struct to parse the results.
>> + *
>> + * Backward Compatibility:
>> + *
>> + * Above scheme of exchanging different versioned PDSM struct between libndctl
>> + * and papr_scm should provide backward compatibility until following two
>> + * assumptions/conditions when defining new PDSM structs hold:
>> + *
>> + * Let T(X) = { set of attributes in PDSM struct 'T' versioned X }
>> + *
>> + * 1. T(X) is a proper subset of T(Y) if X > Y.
>
> Proper superset? Or Y > X?
>
Good catch. This should be Y > X. Will get this fixed.
> Ira
>
>> + * i.e Each new version of PDSM struct should retain existing struct
>> + * attributes from previous version
>> + *
>> + * 2. If an entity (libndctl or papr_scm) supports a PDSM struct T(X) then
>> + * it should also support T(1), T(2)...T(X - 1).
>> + * i.e When adding support for new version of a PDSM struct, libndctl
>> + * and papr_scm should retain support of the existing PDSM struct
>> + * version they support.
>> + */
>> +
>> +/* Papr-scm-header + payload expected with ND_CMD_CALL ioctl from libnvdimm */
>> +struct nd_pdsm_cmd_pkg {
>> + struct nd_cmd_pkg hdr; /* Package header containing sub-cmd */
>> + __s32 cmd_status; /* Out: Sub-cmd status returned back */
>> + __u16 payload_offset; /* In: offset from start of struct */
>> + __u16 payload_version; /* In/Out: version of the payload */
>> + __u8 payload[]; /* In/Out: Sub-cmd data buffer */
>> +} __packed;
>> +
>> +/*
>> + * Methods to be embedded in ND_CMD_CALL request. These are sent to the kernel
>> + * via 'nd_pdsm_cmd_pkg.hdr.nd_command' member of the ioctl struct
>> + */
>> +enum papr_scm_pdsm {
>> + PAPR_SCM_PDSM_MIN = 0x0,
>> + PAPR_SCM_PDSM_MAX,
>> +};
>> +
>> +/* Convert a libnvdimm nd_cmd_pkg to pdsm specific pkg */
>> +static inline struct nd_pdsm_cmd_pkg *nd_to_pdsm_cmd_pkg(struct nd_cmd_pkg *cmd)
>> +{
>> + return (struct nd_pdsm_cmd_pkg *) cmd;
>> +}
>> +
>> +/* Return the payload pointer for a given pcmd */
>> +static inline void *pdsm_cmd_to_payload(struct nd_pdsm_cmd_pkg *pcmd)
>> +{
>> + if (pcmd->hdr.nd_size_in == 0 && pcmd->hdr.nd_size_out == 0)
>> + return NULL;
>> + else
>> + return (void *)((__u8 *) pcmd + pcmd->payload_offset);
>> +}
>> +
>> +#endif /* _UAPI_ASM_POWERPC_PAPR_SCM_PDSM_H_ */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>> index 142636e1a59f..ed4b49a6f1e1 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -15,13 +15,15 @@
>> #include <linux/seq_buf.h>
>>
>> #include <asm/plpar_wrappers.h>
>> +#include <asm/papr_scm_pdsm.h>
>>
>> #define BIND_ANY_ADDR (~0ul)
>>
>> #define PAPR_SCM_DIMM_CMD_MASK \
>> ((1ul << ND_CMD_GET_CONFIG_SIZE) | \
>> (1ul << ND_CMD_GET_CONFIG_DATA) | \
>> - (1ul << ND_CMD_SET_CONFIG_DATA))
>> + (1ul << ND_CMD_SET_CONFIG_DATA) | \
>> + (1ul << ND_CMD_CALL))
>>
>> /* DIMM health bitmap bitmap indicators */
>> /* SCM device is unable to persist memory contents */
>> @@ -350,16 +352,97 @@ static int papr_scm_meta_set(struct papr_scm_priv *p,
>> return 0;
>> }
>>
>> +/*
>> + * Validate the inputs args to dimm-control function and return '0' if valid.
>> + * This also does initial sanity validation to ND_CMD_CALL sub-command packages.
>> + */
>> +static int is_cmd_valid(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> + unsigned int buf_len)
>> +{
>> + unsigned long cmd_mask = PAPR_SCM_DIMM_CMD_MASK;
>> + struct nd_pdsm_cmd_pkg *pkg = nd_to_pdsm_cmd_pkg(buf);
>> + struct papr_scm_priv *p;
>> +
>> + /* Only dimm-specific calls are supported atm */
>> + if (!nvdimm)
>> + return -EINVAL;
>> +
>> + /* get the provider data from struct nvdimm */
>> + p = nvdimm_provider_data(nvdimm);
>> +
>> + if (!test_bit(cmd, &cmd_mask)) {
>> + dev_dbg(&p->pdev->dev, "Unsupported cmd=%u\n", cmd);
>> + return -EINVAL;
>> + } else if (cmd == ND_CMD_CALL) {
>> +
>> + /* Verify the envelope package */
>> + if (!buf || buf_len < sizeof(struct nd_pdsm_cmd_pkg)) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg size=%u\n",
>> + buf_len);
>> + return -EINVAL;
>> + }
>> +
>> + /* Verify that the PDSM family is valid */
>> + if (pkg->hdr.nd_family != NVDIMM_FAMILY_PAPR_SCM) {
>> + dev_dbg(&p->pdev->dev, "Invalid pkg family=0x%llx\n",
>> + pkg->hdr.nd_family);
>> + return -EINVAL;
>> +
>> + }
>> +
>> + /* We expect a payload with all PDSM commands */
>> + if (pdsm_cmd_to_payload(pkg) == NULL) {
>> + dev_dbg(&p->pdev->dev,
>> + "Empty payload for sub-command=0x%llx\n",
>> + pkg->hdr.nd_command);
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + /* Command looks valid */
>> + return 0;
>> +}
>> +
>> +static int papr_scm_service_pdsm(struct papr_scm_priv *p,
>> + struct nd_pdsm_cmd_pkg *call_pkg)
>> +{
>> + /* unknown subcommands return error in packages */
>> + if (call_pkg->hdr.nd_command <= PAPR_SCM_PDSM_MIN ||
>> + call_pkg->hdr.nd_command >= PAPR_SCM_PDSM_MAX) {
>> + dev_dbg(&p->pdev->dev, "Invalid PDSM request 0x%llx\n",
>> + call_pkg->hdr.nd_command);
>> + call_pkg->cmd_status = -EINVAL;
>> + return 0;
>> + }
>> +
>> + /* Depending on the DSM command call appropriate service routine */
>> + switch (call_pkg->hdr.nd_command) {
>> + default:
>> + dev_dbg(&p->pdev->dev, "Unsupported PDSM request 0x%llx\n",
>> + call_pkg->hdr.nd_command);
>> + call_pkg->cmd_status = -ENOENT;
>> + return 0;
>> + }
>> +}
>> +
>> static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> struct nvdimm *nvdimm, unsigned int cmd, void *buf,
>> unsigned int buf_len, int *cmd_rc)
>> {
>> struct nd_cmd_get_config_size *get_size_hdr;
>> struct papr_scm_priv *p;
>> + struct nd_pdsm_cmd_pkg *call_pkg = NULL;
>> + int rc;
>>
>> - /* Only dimm-specific calls are supported atm */
>> - if (!nvdimm)
>> - return -EINVAL;
>> + /* Use a local variable in case cmd_rc pointer is NULL */
>> + if (cmd_rc == NULL)
>> + cmd_rc = &rc;
>> +
>> + *cmd_rc = is_cmd_valid(nvdimm, cmd, buf, buf_len);
>> + if (*cmd_rc) {
>> + pr_debug("Invalid cmd=0x%x. Err=%d\n", cmd, *cmd_rc);
>> + return *cmd_rc;
>> + }
>>
>> p = nvdimm_provider_data(nvdimm);
>>
>> @@ -381,13 +464,19 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>> *cmd_rc = papr_scm_meta_set(p, buf);
>> break;
>>
>> + case ND_CMD_CALL:
>> + call_pkg = nd_to_pdsm_cmd_pkg(buf);
>> + *cmd_rc = papr_scm_service_pdsm(p, call_pkg);
>> + break;
>> +
>> default:
>> - return -EINVAL;
>> + dev_dbg(&p->pdev->dev, "Unknown command = %d\n", cmd);
>> + *cmd_rc = -EINVAL;
>> }
>>
>> dev_dbg(&p->pdev->dev, "returned with cmd_rc = %d\n", *cmd_rc);
>>
>> - return 0;
>> + return *cmd_rc;
>> }
>>
>> static ssize_t flags_show(struct device *dev,
>> diff --git a/include/uapi/linux/ndctl.h b/include/uapi/linux/ndctl.h
>> index de5d90212409..99fb60600ef8 100644
>> --- a/include/uapi/linux/ndctl.h
>> +++ b/include/uapi/linux/ndctl.h
>> @@ -244,6 +244,7 @@ struct nd_cmd_pkg {
>> #define NVDIMM_FAMILY_HPE2 2
>> #define NVDIMM_FAMILY_MSFT 3
>> #define NVDIMM_FAMILY_HYPERV 4
>> +#define NVDIMM_FAMILY_PAPR_SCM 5
>>
>> #define ND_IOCTL_CALL _IOWR(ND_IOCTL, ND_CMD_CALL,\
>> struct nd_cmd_pkg)
>> --
>> 2.26.2
>> _______________________________________________
>> Linux-nvdimm mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
--
Cheers
~ Vaibhav
Ira Weiny <[email protected]> writes:
> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>> modules and add the command family to the white list of NVDIMM command
>> sets. Also advertise support for ND_CMD_CALL for the dimm
>> command mask and implement necessary scaffolding in the module to
>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
...
>> + *
>> + * Payload Version:
>> + *
>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>> + * version of the structure present in PDSM Payload for a given PDSM command.
>> + * This provides backward compatibility in case the PDSM Payload structure
>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>> + *
>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>> + * uses 'payload struct version' == MIN('payload_version field',
>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>> + * struct in returned 'payload_version' field.
>
> FWIW many people believe using a size rather than version is more sustainable.
> It is expected that new payload structures are larger (more features) than the
> previous payload structure.
>
> I can't find references at the moment though.
I think clone_args is a good modern example:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
cheers
Vaibhav Jain <[email protected]> writes:
> Thanks for reviewing this patch Ira. My responses below:
> Ira Weiny <[email protected]> writes:
>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>>> Implement support for fetching nvdimm health information via
>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>>> 'struct papr_scm_priv' and subsequently partially exposed to
>>> user-space via newly introduced dimm specific attribute
>>> 'papr/flags'. Since the hcall is costly, the health information is
>>> cached and only re-queried, 60s after the previous successful hcall.
...
>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>> index f35592423380..142636e1a59f 100644
>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>> struct resource res;
>>> struct nd_region *region;
>>> struct nd_interleave_set nd_set;
>>> +
>>> + /* Protect dimm health data from concurrent read/writes */
>>> + struct mutex health_mutex;
>>> +
>>> + /* Last time the health information of the dimm was updated */
>>> + unsigned long lasthealth_jiffies;
>>> +
>>> + /* Health information for the dimm */
>>> + u64 health_bitmap;
>>
>> I wonder if this should be typed big endian as you mention that it is in the
>> commit message?
> This was discussed in an earlier review of the patch series at
> https://lore.kernel.org/linux-nvdimm/[email protected]
>
> Even though health bitmap is returned in big endian format (For ex
> value 0xC000000000000000 indicates bits 0,1 set), its value is never
> used. Instead only test for specific bits being set in the register is
> done.
This has already caused a lot of confusion, so let me try and clear it
up. I will probably fail :)
The value is not big endian.
It's returned in a GPR (a register), from the hypervisor. The ordering
of bytes in a register is not dependent on what endian we're executing
in.
It's true that the hypervisor will have been running big endian, and
when it returns to us we will now be running little endian. But the
value is unchanged, it was 0xC000000000000000 in the GPR while the HV was
running and it's still 0xC000000000000000 when we return to Linux. You
can see this in mambo, see below for an example.
_However_, the specification of the bits in the bitmap value uses MSB 0
ordering, as is traditional for IBM documentation. That means the most
significant bit, aka. the left most bit, is called "bit 0".
See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering
That is the opposite numbering from what most people use, and in
particular what most code in Linux uses, which is that bit 0 is the
least significant bit.
Which is where the confusion comes in. It's not that the bytes are
returned in a different order, it's that the bits are numbered
differently in the IBM documentation.
The way to fix this kind of thing is to read the docs, and convert all
the bits into correct numbering (LSB=0), and then throw away the docs ;)
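As a small illustration of that conversion, the sketch below maps an MSB-0
bit number onto an ordinary mask. The macro and function names are made up
for this example; as far as I know arch/powerpc has a PPC_BIT() helper for
the same purpose, but nothing here depends on it. With this mapping, bits 0
and 1 in MSB-0 numbering correspond to 0xC000000000000000.

```c
#include <stdint.h>

/*
 * Hypothetical helper: mask for bit 'n' in MSB-0 numbering of a 64-bit
 * value, i.e. n == 0 is the most significant bit. Valid for 0 <= n <= 63.
 */
#define MSB0_BIT64(n)	(UINT64_C(1) << (63 - (n)))

/* Mask for bit 'n' in the usual LSB-0 numbering. */
#define LSB0_BIT64(n)	(UINT64_C(1) << (n))

/* Test a bit that the documentation names using MSB-0 numbering. */
static int msb0_bit_is_set(uint64_t value, unsigned int n)
{
	return (value & MSB0_BIT64(n)) != 0;
}
```

No byte swapping is involved anywhere; only the bit index is translated.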
cheers
In mambo we can set a breakpoint just before the kernel enters skiboot,
towards the end of __opal_call. The kernel is running LE and skiboot
runs BE.
systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc0000000000c1744
breakpoint set at [0:0:0]: 0xc0000000000c1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
Then run:
systemsim-p9 [~/skiboot/skiboot/external/mambo] c
[0:0:0]: 0xC0000000000C1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
INFO: 121671618: (121671618): ** Execution stopped: user (tcl), **
121671618: ** finished running 121671618 instructions **
And we stop there, on an hrfid that we haven't executed yet.
We can print r0, to see the OPAL token:
systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
0x0000000000000019
ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25).
And we can print the MSR:
systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
0x9000000002001033
64-bit mode (SF): 0x1 [64-bit mode]
Hypervisor State (HV): 0x1
Vector Available (VEC): 0x1
Machine Check Interrupt Enable (ME): 0x1
Instruction Relocate (IR): 0x1
Data Relocate (DR): 0x1
Recoverable Interrupt (RI): 0x1
Little-Endian Mode (LE): 0x1 [little-endian]
ie. we're little endian.
We then step one instruction:
systemsim-p9 [~/skiboot/skiboot/external/mambo] s
[0:0:0]: 0x0000000030002BF0 (0x0000000030002BF0) Enc:0x7D9FFAA6 : mfspr r12,PIR
Now we're in skiboot. Print the MSR again:
systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
0x9000000002001002
64-bit mode (SF): 0x1 [64-bit mode]
Hypervisor State (HV): 0x1
Vector Available (VEC): 0x1
Machine Check Interrupt Enable (ME): 0x1
Recoverable Interrupt (RI): 0x1
We're big endian.
Print r0:
systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
0x0000000000000019
r0 is unchanged!
Michael Ellerman <[email protected]> writes:
> Vaibhav Jain <[email protected]> writes:
>> Thanks for reviewing this patch Ira. My responses below:
>> Ira Weiny <[email protected]> writes:
>>> On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
>>>> Implement support for fetching nvdimm health information via
>>>> H_SCM_HEALTH hcall as documented in Ref[1]. The hcall returns a pair
>>>> of 64-bit big-endian integers, bitwise-and of which is then stored in
>>>> 'struct papr_scm_priv' and subsequently partially exposed to
>>>> user-space via newly introduced dimm specific attribute
>>>> 'papr/flags'. Since the hcall is costly, the health information is
>>>> cached and only re-queried, 60s after the previous successful hcall.
> ...
>>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> index f35592423380..142636e1a59f 100644
>>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>>> @@ -39,6 +78,15 @@ struct papr_scm_priv {
>>>> struct resource res;
>>>> struct nd_region *region;
>>>> struct nd_interleave_set nd_set;
>>>> +
>>>> + /* Protect dimm health data from concurrent read/writes */
>>>> + struct mutex health_mutex;
>>>> +
>>>> + /* Last time the health information of the dimm was updated */
>>>> + unsigned long lasthealth_jiffies;
>>>> +
>>>> + /* Health information for the dimm */
>>>> + u64 health_bitmap;
>>>
>>> I wonder if this should be typed big endian as you mention that it is in the
>>> commit message?
>> This was discussed in an earlier review of the patch series at
>> https://lore.kernel.org/linux-nvdimm/[email protected]
>>
>> Even though health bitmap is returned in big endian format (For ex
>> value 0xC000000000000000 indicates bits 0,1 set), its value is never
>> used. Instead only test for specific bits being set in the register is
>> done.
>
> This has already caused a lot of confusion, so let me try and clear it
> up. I will probably fail :)
>
> The value is not big endian.
>
> It's returned in a GPR (a register), from the hypervisor. The ordering
> of bytes in a register is not dependent on what endian we're executing
> in.
>
> It's true that the hypervisor will have been running big endian, and
> when it returns to us we will now be running little endian. But the
> value is unchanged, it was 0xC000000000000000 in the GPR while the HV was
> running and it's still 0xC000000000000000 when we return to Linux. You
> can see this in mambo, see below for an example.
>
>
> _However_, the specification of the bits in the bitmap value uses MSB 0
> ordering, as is traditional for IBM documentation. That means the most
> significant bit, aka. the left most bit, is called "bit 0".
>
> See: https://en.wikipedia.org/wiki/Bit_numbering#MSB_0_bit_numbering
>
> That is the opposite numbering from what most people use, and in
> particular what most code in Linux uses, which is that bit 0 is the
> least significant bit.
>
> Which is where the confusion comes in. It's not that the bytes are
> returned in a different order, it's that the bits are numbered
> differently in the IBM documentation.
>
> The way to fix this kind of thing is to read the docs, and convert all
> the bits into correct numbering (LSB=0), and then throw away the docs ;)
>
> cheers
Thanks a lot for clarifying this, Mpe, and for the detailed explanation.
I have removed the term Big-Endian from the v8 patch description to avoid
any further confusion.
>
>
>
> In mambo we can set a breakpoint just before the kernel enters skiboot,
> towards the end of __opal_call. The kernel is running LE and skiboot
> runs BE.
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] b 0xc0000000000c1744
> breakpoint set at [0:0:0]: 0xc0000000000c1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
>
> Then run:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] c
> [0:0:0]: 0xC0000000000C1744 (0x00000000000C1744) Enc:0x2402004C : hrfid
> INFO: 121671618: (121671618): ** Execution stopped: user (tcl), **
> 121671618: ** finished running 121671618 instructions **
>
> And we stop there, on an hrfid that we haven't executed yet.
> We can print r0, to see the OPAL token:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
> 0x0000000000000019
>
> ie. we're calling OPAL_CONSOLE_WRITE_BUFFER_SPACE (25).
>
> And we can print the MSR:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
> 0x9000000002001033
>
> 64-bit mode (SF): 0x1 [64-bit mode]
> Hypervisor State (HV): 0x1
> Vector Available (VEC): 0x1
> Machine Check Interrupt Enable (ME): 0x1
> Instruction Relocate (IR): 0x1
> Data Relocate (DR): 0x1
> Recoverable Interrupt (RI): 0x1
> Little-Endian Mode (LE): 0x1 [little-endian]
>
> ie. we're little endian.
>
> We then step one instruction:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] s
> [0:0:0]: 0x0000000030002BF0 (0x0000000030002BF0) Enc:0x7D9FFAA6 : mfspr r12,PIR
>
> Now we're in skiboot. Print the MSR again:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] p msr
> 0x9000000002001002
>
> 64-bit mode (SF): 0x1 [64-bit mode]
> Hypervisor State (HV): 0x1
> Vector Available (VEC): 0x1
> Machine Check Interrupt Enable (ME): 0x1
> Recoverable Interrupt (RI): 0x1
>
> We're big endian.
> Print r0:
>
> systemsim-p9 [~/skiboot/skiboot/external/mambo] p r0
> 0x0000000000000019
>
> r0 is unchanged!
Got it. Thanks again.
--
Cheers
~ Vaibhav
On Wed, May 20, 2020 at 10:45:58PM +0530, Vaibhav Jain wrote:
...
> > On Wed, May 20, 2020 at 12:30:56AM +0530, Vaibhav Jain wrote:
...
> >> @@ -39,6 +78,15 @@ struct papr_scm_priv {
> >> struct resource res;
> >> struct nd_region *region;
> >> struct nd_interleave_set nd_set;
> >> +
> >> + /* Protect dimm health data from concurrent read/writes */
> >> + struct mutex health_mutex;
> >> +
> >> + /* Last time the health information of the dimm was updated */
> >> + unsigned long lasthealth_jiffies;
> >> +
> >> + /* Health information for the dimm */
> >> + u64 health_bitmap;
> >
> > I wonder if this should be typed big endian as you mention that it is in the
> > commit message?
> This was discussed in an earlier review of the patch series at
> https://lore.kernel.org/linux-nvdimm/[email protected]
>
> Even though health bitmap is returned in big endian format (For ex
> value 0xC000000000000000 indicates bits 0,1 set), its value is never
> used. Instead only test for specific bits being set in the register is
> done.
>
> Hence using native cpu type instead of __be64 to store this value.
ok.
>
> >
> >> };
> >>
> >> static int drc_pmem_bind(struct papr_scm_priv *p)
> >> @@ -144,6 +192,62 @@ static int drc_pmem_query_n_bind(struct papr_scm_priv *p)
> >> return drc_pmem_bind(p);
> >> }
> >>
> >> +/*
> >> + * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
> >> + * health information.
> >> + */
> >> +static int __drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> + unsigned long ret[PLPAR_HCALL_BUFSIZE];
> >
> > Is this exclusive to 64bit? Why not u64?
> Yes, this is specific to 64 bit as the array holds 64 bit register values
> returned from PHYP. We could use u64 here, but that would be a departure
> from the existing practice within arch/powerpc code of using an unsigned
> long array to fetch values returned from PHYP.
>
> >
> >> + s64 rc;
> >
> > plpar_hcall() returns long and this function returns int and rc is declared
> > s64?
> >
> > Why not have them all be long to follow plpar_hcall?
> Yes 'long' type is better suited for variable 'rc' and I will get it fixed.
>
> But the value of variable 'rc' is never directly returned from this
> function, we always return kernel error codes instead. Hence the
> return type of this function is consistent.
Honestly masking the error return of plpar_hcall() seems a problem as well...
but ok.
Ira
>
> >
> >> +
> >> + /* issue the hcall */
> >> + rc = plpar_hcall(H_SCM_HEALTH, ret, p->drc_index);
> >> + if (rc != H_SUCCESS) {
> >> + dev_err(&p->pdev->dev,
> >> + "Failed to query health information, Err:%lld\n", rc);
> >> + rc = -ENXIO;
> >> + goto out;
> >> + }
> >> +
> >> + p->lasthealth_jiffies = jiffies;
> >> + p->health_bitmap = ret[0] & ret[1];
> >> +
> >> + dev_dbg(&p->pdev->dev,
> >> + "Queried dimm health info. Bitmap:0x%016lx Mask:0x%016lx\n",
> >> + ret[0], ret[1]);
> >> +out:
> >> + return rc;
> >> +}
> >> +
> >> +/* Min interval in seconds for assuming stable dimm health */
> >> +#define MIN_HEALTH_QUERY_INTERVAL 60
> >> +
> >> +/* Query cached health info and if needed call drc_pmem_query_health */
> >> +static int drc_pmem_query_health(struct papr_scm_priv *p)
> >> +{
> >> + unsigned long cache_timeout;
> >> + s64 rc;
> >> +
> >> + /* Protect concurrent modifications to papr_scm_priv */
> >> + rc = mutex_lock_interruptible(&p->health_mutex);
> >> + if (rc)
> >> + return rc;
> >> +
> >> + /* Jiffies offset for which the health data is assumed to be same */
> >> + cache_timeout = p->lasthealth_jiffies +
> >> + msecs_to_jiffies(MIN_HEALTH_QUERY_INTERVAL * 1000);
> >> +
> >> + /* Fetch new health info if it's older than MIN_HEALTH_QUERY_INTERVAL */
> >> + if (time_after(jiffies, cache_timeout))
> >> + rc = __drc_pmem_query_health(p);
> >
> > And back to s64 after returning int?
> Agree, will change 's64 rc' to 'int rc'.
>
> >
> >> + else
> >> + /* Assume cached health data is valid */
> >> + rc = 0;
> >> +
> >> + mutex_unlock(&p->health_mutex);
> >> + return rc;
> >> +}
> >>
> >> static int papr_scm_meta_get(struct papr_scm_priv *p,
> >> struct nd_cmd_get_config_data_hdr *hdr)
> >> @@ -286,6 +390,64 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
> >> return 0;
> >> }
> >>
> >> +static ssize_t flags_show(struct device *dev,
> >> + struct device_attribute *attr, char *buf)
> >> +{
> >> + struct nvdimm *dimm = to_nvdimm(dev);
> >> + struct papr_scm_priv *p = nvdimm_provider_data(dimm);
> >> + struct seq_buf s;
> >> + u64 health;
> >> + int rc;
> >> +
> >> + rc = drc_pmem_query_health(p);
> >
> > and back to int...
> >
> drc_pmem_query_health() returns an 'int' so the type of variable 'rc'
> looks correct to me.
>
> > Just make them long all through...
> I think the return type for above all functions is 'int' with
> an issue in drc_pmem_query_health() that you pointed out.
>
> With that fixed the usage of 'int' return type for functions will become
> consistent.
>
> >
> > Ira
> >
> >> + if (rc)
> >> + return rc;
> >> +
> >> + /* Copy health_bitmap locally, check masks & update out buffer */
> >> + health = READ_ONCE(p->health_bitmap);
> >> +
> >> + seq_buf_init(&s, buf, PAGE_SIZE);
> >> + if (health & PAPR_SCM_DIMM_UNARMED_MASK)
> >> + seq_buf_printf(&s, "not_armed ");
> >> +
> >> + if (health & PAPR_SCM_DIMM_BAD_SHUTDOWN_MASK)
> >> + seq_buf_printf(&s, "flush_fail ");
> >> +
> >> + if (health & PAPR_SCM_DIMM_BAD_RESTORE_MASK)
> >> + seq_buf_printf(&s, "restore_fail ");
> >> +
> >> + if (health & PAPR_SCM_DIMM_ENCRYPTED)
> >> + seq_buf_printf(&s, "encrypted ");
> >> +
> >> + if (health & PAPR_SCM_DIMM_SMART_EVENT_MASK)
> >> + seq_buf_printf(&s, "smart_notify ");
> >> +
> >> + if (health & PAPR_SCM_DIMM_SCRUBBED_AND_LOCKED)
> >> + seq_buf_printf(&s, "scrubbed locked ");
> >> +
> >> + if (seq_buf_used(&s))
> >> + seq_buf_printf(&s, "\n");
> >> +
> >> + return seq_buf_used(&s);
> >> +}
> >> +DEVICE_ATTR_RO(flags);
> >> +
> >> +/* papr_scm specific dimm attributes */
> >> +static struct attribute *papr_scm_nd_attributes[] = {
> >> + &dev_attr_flags.attr,
> >> + NULL,
> >> +};
> >> +
> >> +static struct attribute_group papr_scm_nd_attribute_group = {
> >> + .name = "papr",
> >> + .attrs = papr_scm_nd_attributes,
> >> +};
> >> +
> >> +static const struct attribute_group *papr_scm_dimm_attr_groups[] = {
> >> + &papr_scm_nd_attribute_group,
> >> + NULL,
> >> +};
> >> +
> >> static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >> {
> >> struct device *dev = &p->pdev->dev;
> >> @@ -312,8 +474,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
> >> dimm_flags = 0;
> >> set_bit(NDD_LABELING, &dimm_flags);
> >>
> >> - p->nvdimm = nvdimm_create(p->bus, p, NULL, dimm_flags,
> >> - PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> >> + p->nvdimm = nvdimm_create(p->bus, p, papr_scm_dimm_attr_groups,
> >> + dimm_flags, PAPR_SCM_DIMM_CMD_MASK, 0, NULL);
> >> if (!p->nvdimm) {
> >> dev_err(dev, "Error creating DIMM object for %pOF\n", p->dn);
> >> goto err;
> >> @@ -399,6 +561,9 @@ static int papr_scm_probe(struct platform_device *pdev)
> >> if (!p)
> >> return -ENOMEM;
> >>
> >> + /* Initialize the dimm mutex */
> >> + mutex_init(&p->health_mutex);
> >> +
> >> /* optional DT properties */
> >> of_property_read_u32(dn, "ibm,metadata-size", &metadata_size);
> >>
> >> --
> >> 2.26.2
>
> --
> Cheers
> ~ Vaibhav
Michael Ellerman <[email protected]> writes:
> Ira Weiny <[email protected]> writes:
>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>>> modules and add the command family to the white list of NVDIMM command
>>> sets. Also advertise support for ND_CMD_CALL for the dimm
>>> command mask and implement necessary scaffolding in the module to
>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
> ...
>>> + *
>>> + * Payload Version:
>>> + *
>>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>>> + * version of the structure present in PDSM Payload for a given PDSM command.
>>> + * This provides backward compatibility in case the PDSM Payload structure
>>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>>> + *
>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>>> + * uses 'payload struct version' == MIN('payload_version field',
>>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>>> + * struct in returned 'payload_version' field.
>>
>> FWIW many people believe using a size rather than version is more sustainable.
>> It is expected that new payload structures are larger (more features) than the
>> previous payload structure.
>>
>> I can't find references at the moment though.
>
> I think clone_args is a good modern example:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
>
> cheers
Thanks Ira and Mpe for pointing this out. I looked into how the clone3
syscall handles clone_args, and a few differences came up:
* Unlike clone_args, which is always transferred in one direction from
user-space to kernel, the payload contents of PDSMs are transferred in
both directions. Having a single version number makes it easier for
user-space and kernel to determine what data will be exchanged.
* For PDSMs, the version number is negotiated between libndctl and the
kernel. For example, if the kernel only supports an older version of
a structure, it's free to send a lower version number back to
libndctl. Such negotiation doesn't happen with the clone3 syscall.
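A minimal sketch of the negotiation described in the second point, assuming
the MIN() rule from the patch's header comment. The constant and the
function name are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Hypothetical: highest payload struct version this kernel module knows. */
#define PDSM_MAX_SUPPORTED_VERSION	2

/*
 * Pick the payload struct version used to service a request:
 * MIN(version requested by user-space, max version the kernel supports).
 * The result is what would be written back to 'payload_version'.
 */
static uint16_t pdsm_negotiate_version(uint16_t requested)
{
	if (requested > PDSM_MAX_SUPPORTED_VERSION)
		return PDSM_MAX_SUPPORTED_VERSION;
	return requested;
}
```

libndctl would then read the returned version and parse the payload with
the matching struct. Rejecting an invalid version such as 0 is left out of
this sketch.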
--
Cheers
~ Vaibhav
Hi Ira, Mpe and Aneesh,
Vaibhav Jain <[email protected]> writes:
> Michael Ellerman <[email protected]> writes:
>
>> Ira Weiny <[email protected]> writes:
>>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>>>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>>>> modules and add the command family to the white list of NVDIMM command
>>>> sets. Also advertise support for ND_CMD_CALL for the dimm
>>>> command mask and implement necessary scaffolding in the module to
>>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>> ...
>>>> + *
>>>> + * Payload Version:
>>>> + *
>>>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>>>> + * version of the structure present in PDSM Payload for a given PDSM command.
>>>> + * This provides backward compatibility in case the PDSM Payload structure
>>>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>>>> + *
>>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>>>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>>>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>>>> + * uses 'payload struct version' == MIN('payload_version field',
>>>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>>>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>>>> + * struct in returned 'payload_version' field.
>>>
>>> FWIW many people believe using a size rather than version is more sustainable.
>>> It is expected that new payload structures are larger (more features) than the
>>> previous payload structure.
>>>
>>> I can't find references at the moment though.
>>
>> I think clone_args is a good modern example:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
>>
>> cheers
>
> Thank Ira and Mpe for pointing this out. I looked into how clone3 sycall
> handles clone_args and few differences came out:
>
> * Unlike clone_args that are always transferred in one direction from
> user-space to kernel, payload contents of pdsms are transferred in both
> directions. Having a single version number makes it easier for
> user-space and kernel to determine what data will be exchanged.
>
> * For PDSMs, the version number is negotiated between libndctl and
> the kernel. For example, if the kernel only supports an older version
> of a structure, it's free to send a lower version number back to
> libndctl. Such negotiation doesn't happen with the clone3 syscall.
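
To make the negotiation concrete, here is a minimal sketch of the
MIN() rule described in the patch comment. It is illustrative only and
is not the code from the patch: the struct layout, the
PDSM_MAX_PAYLOAD_VERSION value and both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: highest payload struct version the kernel side understands. */
#define PDSM_MAX_PAYLOAD_VERSION 2

/* Hypothetical, simplified PDSM header; the real header has more fields. */
struct pdsm_header {
	uint16_t payload_version;
};

/* Pick the lower of the requested version and the supported maximum. */
static uint16_t pdsm_negotiate_version(uint16_t requested,
				       uint16_t max_supported)
{
	return requested < max_supported ? requested : max_supported;
}

/*
 * Service a request: negotiate the version, then write the negotiated
 * value back into the header so user-space knows which payload struct
 * layout it actually received.
 */
static void pdsm_service(struct pdsm_header *hdr)
{
	hdr->payload_version = pdsm_negotiate_version(hdr->payload_version,
						      PDSM_MAX_PAYLOAD_VERSION);
}
```

With these assumed values, a libndctl that asks for version 5 gets
version 2 back, while one that asks for version 1 keeps version 1.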
If you are OK with the explanation above, please let me know. I will
quickly spin off a v8 addressing your review comments.
Thanks,
--
Cheers
~ Vaibhav
Vaibhav Jain <[email protected]> writes:
> Hi Ira, Mpe and Aneesh,
>
> Vaibhav Jain <[email protected]> writes:
>
>> Michael Ellerman <[email protected]> writes:
>>
>>> Ira Weiny <[email protected]> writes:
>>>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>>>>> Introduce support for Papr nvDimm Specific Methods (PDSM) in papr_scm
>>>>> modules and add the command family to the white list of NVDIMM command
>>>>> sets. Also advertise support for ND_CMD_CALL for the dimm
>>>>> command mask and implement necessary scaffolding in the module to
>>>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
>>> ...
>>>>> + *
>>>>> + * Payload Version:
>>>>> + *
>>>>> + * A 'payload_version' field is present in PDSM header that indicates a specific
>>>>> + * version of the structure present in PDSM Payload for a given PDSM command.
>>>>> + * This provides backward compatibility in case the PDSM Payload structure
>>>>> + * evolves and different structures are supported by 'papr_scm' and 'libndctl'.
>>>>> + *
>>>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the version
>>>>> + * of the payload struct it supports via 'payload_version' field. The 'papr_scm'
>>>>> + * module when servicing the PDSM envelope checks the 'payload_version' and then
>>>>> + * uses 'payload struct version' == MIN('payload_version field',
>>>>> + * 'max payload-struct-version supported by papr_scm') to service the PDSM.
>>>>> + * After servicing the PDSM, 'papr_scm' put the negotiated version of payload
>>>>> + * struct in returned 'payload_version' field.
>>>>
>>>> FWIW many people believe using a size rather than version is more sustainable.
>>>> It is expected that new payload structures are larger (more features) than the
>>>> previous payload structure.
>>>>
>>>> I can't find references at the moment though.
>>>
>>> I think clone_args is a good modern example:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
>>
>> Thanks Ira and Mpe for pointing this out. I looked into how the clone3
>> syscall handles clone_args and a few differences came out:
>>
>> * Unlike clone_args that are always transferred in one direction from
>> user-space to kernel, payload contents of pdsms are transferred in both
>> directions. Having a single version number makes it easier for
>> user-space and kernel to determine what data will be exchanged.
>>
>> * For PDSMs, the version number is negotiated between libndctl and
>> the kernel. For example, if the kernel only supports an older version
>> of a structure, it's free to send a lower version number back to
>> libndctl. Such negotiation doesn't happen with the clone3 syscall.
>
> If you are OK with the explanation above, please let me know. I will
> quickly spin off a v8 addressing your review comments.
I don't have strong opinions about the user API, it's really up to the
nvdimm folks.
cheers